mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2025-08-19 06:25:15 -04:00
ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)
* ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0 * llama : bump LLAMA_FILE_VERSION to 3 * cuda : update Q4 and Q8 dequantize kernels * ggml : fix AVX dot products * readme : update performance table + hot topics
This commit is contained in:
Reference in New Issue
Block a user