llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-06-27 03:55:20 +00:00

Files

Jeff Bolz 24e86cae72 vulkan: KHR_coopmat flash attention (#13506)

This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may be from other optimizations like staging
through shared memory, or splitting by rows.

2025-05-14 11:55:26 +02:00

cmake

scripts : update sync + fix cmake merge

2025-03-27 10:09:29 +02:00

include

llama/ggml: add LLM training support (#10544)

2025-05-12 14:44:49 +02:00

src

vulkan: KHR_coopmat flash attention (#13506)

2025-05-14 11:55:26 +02:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

whisper: remove MSVC warnings pragmas (whisper/3090)

2025-05-07 17:28:36 +03:00