Commit Graph

4 Commits

SHA1 Message Date
38c03478a3 CUDA: fix FA out-of-bounds writes (#7465) 2024-05-22 17:58:25 +02:00
133d99c599 CUDA: deduplicate FlashAttention code (#7352) 2024-05-18 12:36:25 +02:00
0fc1e820a9 CUDA: faster large batch FA without tensor cores (#7314) 2024-05-17 18:54:52 +02:00
dc685be466 CUDA: add FP32 FlashAttention vector kernel (#7188) 2024-05-12 19:40:45 +02:00
    * CUDA: add FP32 FlashAttention vector kernel
    * fixup! CUDA: add FP32 FlashAttention vector kernel
    * fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
    * fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel