llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-26 03:03:25 -04:00

Johannes Gäßler 808aba3916 CUDA: optimize and refactor MMQ (#8416)

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation

2024-07-11 16:47:47 +02:00

CMakeLists.txt

2024-07-10 15:23:29 +03:00

ggml_vk_generate_shaders.py

2024-07-07 15:04:39 -04:00