CUDA: optimize and refactor MMQ (#8416)

* CUDA: optimize and refactor MMQ
* explicit q8_1 memory layouts, add documentation
2025-08-11 11:05:39 -04:00 · 2024-07-11 16:47:47 +02:00
parent a977c11544
commit 808aba3916
5 changed files with 867 additions and 687 deletions
--- a/ggml/src/ggml-cuda/mmq.cuh
+++ b/ggml/src/ggml-cuda/mmq.cuh