tqcq/llama.cpp
mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-09-06 07:11:25 -04:00
Path: llama.cpp/ggml/src
Commit: d9c3ba2b7749c00df477599aa141a98b4521aa2c

Latest commit: Georgi Gerganov d9c3ba2b77 ggml : disable iq4_nl interleave size 8 (#10709) [ggml-ci], 2024-12-07 18:38:15 +02:00
Name                  Last commit                                                                        Date
ggml-blas             …
ggml-cann             ggml : refactor online repacking (#10446)                                          2024-12-07 14:37:50 +02:00
ggml-cpu              ggml : disable iq4_nl interleave size 8 (#10709)                                   2024-12-07 18:38:15 +02:00
ggml-cuda             ggml : refactor online repacking (#10446)                                          2024-12-07 14:37:50 +02:00
ggml-hip              …
ggml-kompute          …
ggml-metal            metal : Extend how Llama.cpp locates metal resources (#10676)                      2024-12-07 09:55:01 +02:00
ggml-musa             …
ggml-rpc              …
ggml-sycl             ggml : refactor online repacking (#10446)                                          2024-12-07 14:37:50 +02:00
ggml-vulkan           Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (#10597)   2024-12-07 10:24:15 +01:00
CMakeLists.txt        ggml : refactor online repacking (#10446)                                          2024-12-07 14:37:50 +02:00
ggml-alloc.c          …
ggml-backend-impl.h   …
ggml-backend-reg.cpp  ggml : add predefined list of CPU backend variants to build (#10626)               2024-12-04 14:45:40 +01:00
ggml-backend.cpp      …
ggml-common.h         ggml : refactor online repacking (#10446)                                          2024-12-07 14:37:50 +02:00
ggml-impl.h           Avoid using __fp16 on ARM with old nvcc (#10616)                                   2024-12-04 01:41:37 +01:00
ggml-opt.cpp          …
ggml-quants.c         ggml : refactor online repacking (#10446)                                          2024-12-07 14:37:50 +02:00
ggml-quants.h         …
ggml-threading.cpp    …
ggml-threading.h      …
ggml.c                ggml : refactor online repacking (#10446)                                          2024-12-07 14:37:50 +02:00