mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2025-08-18 05:56:00 -04:00
CUDA: Quantized matrix matrix multiplication (#2160)
* mmq implementation for non k-quants * q6_K * q2_K * q3_k * q4_K * vdr * q5_K * faster q8_1 loading * loop unrolling * add __restrict__ * q2_K sc_high * GGML_CUDA_MMQ_Y * Updated Makefile * Update Makefile * DMMV_F16 -> F16 * Updated README, CMakeLists * Fix CMakeLists.txt * Fix CMakeLists.txt * Fix multi GPU out-of-bounds
This commit is contained in:
1590
ggml-cuda.cu
1590
ggml-cuda.cu
File diff suppressed because it is too large
Load Diff
Reference in New Issue
Block a user