Srihari-mcw | 2f8bd2b901 | llamafile : extend sgemm.cpp support for Q5_0 models (#10010) | 2024-10-25 10:27:41 +03:00
slaren | 23e0d70bac | ggml : move common CPU backend impl to new header (#9509) | 2024-09-16 16:22:07 +02:00
Eve | 5c3d0f1824 | ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422) | 2024-09-16 09:48:24 +03:00
    * squashed:
      - re-add my IQ4_NL sgemm PR (https://github.com/ggerganov/llama.cpp/pull/8049)
      - have ggml_vec_dot_q4_0 do two blocks per loop iteration on AVX (a minimal sketch of the pattern follows below)
      - try out an F16C ggml_vec_dot_iq4_nl, but it is not really faster; as per https://github.com/ggerganov/llama.cpp/pull/8549, several blocks can be calculated at a time with no issue
    * shuffle
    * remove the F16C iq4_nl path, as I can't make it faster than before
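The "two blocks per loop" change above is a standard unrolling technique: keeping two independent accumulator chains per iteration exposes more instruction-level parallelism, which AVX code can map onto separate vector accumulators. Below is a minimal scalar sketch of the pattern, not the actual ggml kernel: the block layouts mirror Q4_0 and Q8_0 (32 weights per block, low/high nibble packing with an offset of 8), but the scales are plain floats where ggml stores fp16, and the inner loop is scalar where the real code uses AVX intrinsics.

```cpp
#include <cstdint>

constexpr int QK4_0 = 32;

struct block_q4_0 {            // same layout idea as ggml's Q4_0 block
    float   d;                 // scale (fp16 in ggml; float here for simplicity)
    uint8_t qs[QK4_0 / 2];     // 32 weights, two 4-bit values per byte
};

struct block_q8_0 {
    float  d;                  // scale
    int8_t qs[QK4_0];          // 32 8-bit weights
};

// Dot product over n values, unrolled to two blocks per iteration so the two
// partial sums form independent dependency chains (more ILP; under AVX each
// chain gets its own vector accumulator).
float vec_dot_q4_0_q8_0_x2(int n, const block_q4_0 *x, const block_q8_0 *y) {
    const int nb = n / QK4_0;
    float sum0 = 0.0f, sum1 = 0.0f;

    int i = 0;
    for (; i + 1 < nb; i += 2) {           // two blocks per loop iteration
        int32_t acc0 = 0, acc1 = 0;
        for (int j = 0; j < QK4_0 / 2; ++j) {
            // unpack nibbles; Q4_0 stores weights with an offset of 8
            acc0 += ((x[i  ].qs[j] & 0x0F) - 8) * y[i  ].qs[j]
                  + ((x[i  ].qs[j] >>   4) - 8) * y[i  ].qs[j + QK4_0/2];
            acc1 += ((x[i+1].qs[j] & 0x0F) - 8) * y[i+1].qs[j]
                  + ((x[i+1].qs[j] >>   4) - 8) * y[i+1].qs[j + QK4_0/2];
        }
        sum0 += acc0 * x[i  ].d * y[i  ].d;
        sum1 += acc1 * x[i+1].d * y[i+1].d;
    }
    for (; i < nb; ++i) {                  // scalar tail for an odd block count
        int32_t acc = 0;
        for (int j = 0; j < QK4_0 / 2; ++j) {
            acc += ((x[i].qs[j] & 0x0F) - 8) * y[i].qs[j]
                 + ((x[i].qs[j] >>   4) - 8) * y[i].qs[j + QK4_0/2];
        }
        sum0 += acc * x[i].d * y[i].d;
    }
    return sum0 + sum1;
}
```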
Eve | e536426ded | llamafile : disable sgemm for batch-size 1 (#9330) | 2024-09-07 22:02:26 +03:00
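The batch-size-1 change above is a dispatch decision: with a single-column activation matrix the multiplication is a matrix-vector product, where the tiling that makes sgemm pay off on larger batches does not help, so the tiled path declines and the generic vec_dot path runs instead. A hedged sketch of such a guard, with a hypothetical simplified signature (the real llamafile_sgemm takes many more parameters):

```cpp
#include <cstdint>

// Returns true if the tiled sgemm kernel handled the multiplication, false to
// make the caller fall back to ggml's regular vec_dot-based matmul path.
bool sgemm_dispatch(int64_t m, int64_t n, int64_t k /*, A, B, C, types ... */) {
    if (n < 2) {
        // batch size 1: the register tiling that speeds up larger batches
        // does not pay off on a matrix-vector product, so decline and let
        // the generic path run
        return false;
    }
    // ... tiled kernels for n >= 2 would run here ...
    return true;
}
```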
Srihari-mcw | ea5d7478b1 | sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908) | 2024-08-31 11:20:35 +03:00
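The 4xN and Mx4 kernels above are register-tiling specializations: computing several output rows and columns per pass lets each loaded element of A and B feed multiple multiply-adds, cutting memory traffic per FLOP. Below is a minimal float sketch of the pattern with a 4x4 tile; the names and parameters are illustrative, not ggml's API, and the real kernels are templated over quantized block types with SIMD accumulators.

```cpp
#include <cstddef>

// Compute a 4x4 tile of C = A * B^T (row-major; B stored as rows of length k).
// Hypothetical helper for illustration only.
void gemm_tile_4x4(const float *A, const float *B, float *C,
                   size_t lda, size_t ldb, size_t ldc, size_t k) {
    float acc[4][4] = {};                 // 16 accumulators, kept in registers
    for (size_t l = 0; l < k; ++l) {
        // each loaded element of A is reused across 4 columns and each
        // element of B across 4 rows: 8 loads feed 16 multiply-adds
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                acc[i][j] += A[i * lda + l] * B[j * ldb + l];
    }
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            C[i * ldc + j] = acc[i][j];
}
```

With a 1x1 kernel every multiply-add needs two loads; the 4x4 tile does 16 multiply-adds on 8 loads, and the 4xN / Mx4 variants extend the same reuse to the tall or wide matrix shapes that come up in LLM inference.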
Georgi Gerganov | 6b2a849d1f | ggml : move sgemm sources to llamafile subfolder (#8394) | 2024-07-10 15:23:29 +03:00
    ggml-ci