The grouped query attention optimization doesn't require a power-of-two ratio; the only thing relying on it was the modulo operation written as a bitwise &.

split_k need not depend on gqa_ratio - enable it any time there's only one workgroup in the X dimension. The shader gets the split index from the X coordinate, and multiple workgroups in the X dimension (pre-split) indicate a larger FA operation that wouldn't need splitting.
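A minimal sketch of why the bitwise form imposed the restriction (names are illustrative, not the shader's actual variables): `x & (n - 1)` equals `x % n` only when `n` is a power of two, so rewriting the operation as a true modulo lifts the power-of-two requirement on gqa_ratio.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: not the actual shader code.
static uint32_t mod_bitwise(uint32_t x, uint32_t n) {
    return x & (n - 1); // valid only when n is a power of two
}

static uint32_t mod_generic(uint32_t x, uint32_t n) {
    return x % n;       // valid for any n >= 1
}

int main() {
    // Power-of-two ratio: both forms agree.
    for (uint32_t x = 0; x < 32; ++x)
        assert(mod_bitwise(x, 8) == mod_generic(x, 8));

    // Non-power-of-two ratio: the bitwise form is wrong,
    // e.g. 6 & (6 - 1) == 4, but 6 % 6 == 0.
    assert(mod_bitwise(6, 6) != mod_generic(6, 6));
    return 0;
}
```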
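A hypothetical host-side sketch of the split_k decision under these assumptions (function and parameter names are mine, not the actual ggml-vulkan code): split_k is considered only when the pre-split dispatch has a single workgroup in X, and the split shader then recovers its split index from the X coordinate of the workgroup ID (gl_WorkGroupID.x in GLSL terms).

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative sketch, not the actual ggml-vulkan dispatch logic.
static bool should_split_k(uint32_t workgroups_x) {
    // Pre-split, more than one workgroup in X indicates a larger FA op
    // that already fills the GPU and wouldn't need splitting.
    return workgroups_x == 1;
}

int main() {
    uint32_t workgroups_x = 1; // e.g. a small FA op during token generation
    uint32_t split_k = 4;      // assumed split factor

    if (should_split_k(workgroups_x)) {
        // Re-dispatch with split_k workgroups along X; each shader
        // invocation takes split_idx = gl_WorkGroupID.x and processes
        // its slice of the KV dimension.
        workgroups_x = split_k;
    }
    printf("dispatch X = %u\n", workgroups_x);
    return 0;
}
```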