llama.cpp/ggml
Jeff Bolz 7ecd780b1a vulkan: Use fp16 for the flash attention P*V multiplication (#12783)
This is consistent with the ggml-cuda behavior and the mul_mat fallback.
2025-04-09 07:12:57 +02:00
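To illustrate what "using fp16 for the flash attention P*V multiplication" means, here is a minimal, hypothetical CUDA sketch (not the actual ggml-vulkan shader or ggml-cuda kernel): the softmax probabilities P and the values V are kept in half precision and the product is accumulated with half-precision fused multiply-adds. The kernel name, data layout, and launch configuration are assumptions made for the example only.

// Conceptual sketch: O = P * V with the product accumulated in fp16.
// Not ggml code; row-major layouts and one block per attention row are assumed.
#include <cuda_fp16.h>
#include <cstdio>

__global__ void pv_fp16(const __half *P, const __half *V, __half *O, int n, int d) {
    int row = blockIdx.x;   // one attention row per block (assumed layout)
    int col = threadIdx.x;  // one output column per thread
    if (row >= n || col >= d) return;

    __half acc = __float2half(0.0f);
    for (int k = 0; k < n; ++k) {
        // half-precision fused multiply-add of P[row,k] * V[k,col]
        acc = __hfma(P[row * n + k], V[k * d + col], acc);
    }
    O[row * d + col] = acc;
}

int main() {
    const int n = 4, d = 8;
    __half *P, *V, *O;
    cudaMallocManaged(&P, n * n * sizeof(__half));
    cudaMallocManaged(&V, n * d * sizeof(__half));
    cudaMallocManaged(&O, n * d * sizeof(__half));
    for (int i = 0; i < n * n; ++i) P[i] = __float2half(1.0f / n);  // uniform "softmax" row
    for (int i = 0; i < n * d; ++i) V[i] = __float2half(0.5f);
    pv_fp16<<<n, d>>>(P, V, O, n, d);
    cudaDeviceSynchronize();
    printf("O[0] = %f\n", __half2float(O[0]));  // expect ~0.5
    cudaFree(P); cudaFree(V); cudaFree(O);
    return 0;
}

Keeping this stage in fp16 matches how the commit describes the ggml-cuda path and the mul_mat fallback; the real kernels are tiled and fused with the rest of flash attention rather than written as a standalone matrix product like this sketch.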