Johannes Gäßler
9b596417af
CUDA: quantized KV support for FA vec (#7527)
* CUDA: quantized KV support for FA vec
* try CI fix
* fix commented-out kernel variants
* add q8_0 q4_0 tests
* fix nwarps > batch size
* split fattn compile via extern templates
* fix flake8
* fix metal tests
* fix cmake
* make generate_cu_files.py executable
* add autogenerated .cu files
* fix AMD
* error if type_v != FP16 and not flash_attn
* remove obsolete code
2024-06-01 08:44:14 +02:00