llama.cpp/ggml
Jeff Bolz 7ecd780b1a vulkan: Use fp16 for the flash attention P*V multiplication (#12783)
This is consistent with the ggml-cuda behavior and the mul_mat fallback.
2025-04-09 07:12:57 +02:00
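To illustrate what "using fp16 for the flash attention P*V multiplication" means, here is a minimal, hypothetical CUDA sketch (not the actual ggml-vulkan shader or ggml-cuda kernel): the softmax probabilities P and the values V are kept in half precision and the product is accumulated with half-precision fused multiply-adds. The kernel name, data layout, and launch configuration are assumptions made for the example only.

// Conceptual sketch: O = P * V with the product accumulated in fp16.
// Not ggml code; row-major layouts and one block per attention row are assumed.
#include <cuda_fp16.h>
#include <cstdio>

__global__ void pv_fp16(const __half *P, const __half *V, __half *O, int n, int d) {
    int row = blockIdx.x;   // one attention row per block (assumed layout)
    int col = threadIdx.x;  // one output column per thread
    if (row >= n || col >= d) return;

    __half acc = __float2half(0.0f);
    for (int k = 0; k < n; ++k) {
        // half-precision fused multiply-add of P[row,k] * V[k,col]
        acc = __hfma(P[row * n + k], V[k * d + col], acc);
    }
    O[row * d + col] = acc;
}

int main() {
    const int n = 4, d = 8;
    __half *P, *V, *O;
    cudaMallocManaged(&P, n * n * sizeof(__half));
    cudaMallocManaged(&V, n * d * sizeof(__half));
    cudaMallocManaged(&O, n * d * sizeof(__half));
    for (int i = 0; i < n * n; ++i) P[i] = __float2half(1.0f / n);  // uniform "softmax" row
    for (int i = 0; i < n * d; ++i) V[i] = __float2half(0.5f);
    pv_fp16<<<n, d>>>(P, V, O, n, d);
    cudaDeviceSynchronize();
    printf("O[0] = %f\n", __half2float(O[0]));  // expect ~0.5
    cudaFree(P); cudaFree(V); cudaFree(O);
    return 0;
}

Keeping this stage in fp16 matches how the commit describes the ggml-cuda path and the mul_mat fallback; the real kernels are tiled and fused with the rest of flash attention rather than written as a standalone matrix product like this sketch.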