tqcq / llama.cpp
Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-09-01 12:52:17 -04:00)
Branch: compilade/imatrix-neutral-prior
Path: llama.cpp / common
Latest commit: ec428b02c3 by Diego Devesa, 2025-08-05 01:05:36 +02:00
llama : add --n-cpu-moe option (#15077): keeps the MoE weights of the first N layers on the CPU
arg.cpp                     llama : add --n-cpu-moe option (#15077)                                            2025-08-05 01:05:36 +02:00
arg.h                       …
base64.hpp                  …
build-info.cpp.in           …
chat-parser.cpp             …
chat-parser.h               …
chat.cpp                    chat : fix multiple tool_calls on hermes-2-pro (#14962)                            2025-08-02 18:04:48 +08:00
chat.h                      …
CMakeLists.txt              cmake : do not search for curl libraries by ourselves (#14613)                     2025-07-10 15:29:05 +03:00
common.cpp                  llama : allow other bufts when overriding to CPU, add --no-repack option (#14990)  2025-07-31 18:11:34 +02:00
common.h                    imatrix : warn when GGUF imatrix is saved without .gguf suffix (#15076)            2025-08-04 23:26:52 +02:00
console.cpp                 …
console.h                   …
json-partial.cpp            …
json-partial.h              …
json-schema-to-grammar.cpp  …
json-schema-to-grammar.h    …
llguidance.cpp              …
log.cpp                     …
log.h                       …
ngram-cache.cpp             …
ngram-cache.h               …
regex-partial.cpp           …
regex-partial.h             …
sampling.cpp                …
sampling.h                  …
speculative.cpp             server : implement universal assisted decoding (#12635)                            2025-07-31 14:25:23 +02:00
speculative.h               server : implement universal assisted decoding (#12635)                            2025-07-31 14:25:23 +02:00