llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (last synced 2025-08-14 04:17:53 -04:00)

Files

Diego Devesa ec428b02c3 llama : add --n-cpu-moe option (#15077)

* llama : add --n-cpu-moe option

Keeps the MoE weights of the first N layers in the CPU

2025-08-05 01:05:36 +02:00

arg.cpp

llama : add --n-cpu-moe option (#15077)

2025-08-05 01:05:36 +02:00

arg.h

common : add common_remote_get_content (#13123)

2025-04-26 22:58:12 +02:00

base64.hpp

llava : expose as a shared library for downstream projects (#3613)

2023-11-07 00:36:23 +03:00

build-info.cpp.in

cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)

2025-06-13 10:38:52 +02:00

chat-parser.cpp

llama-chat : Do not throw when tool parsing fails (#14012)

2025-06-14 17:25:15 +01:00

chat-parser.h

llama-chat : Do not throw when tool parsing fails (#14012)

2025-06-14 17:25:15 +01:00

chat.cpp

chat : fix multiple tool_calls on hermes-2-pro (#14962)

2025-08-02 18:04:48 +08:00

chat.h

server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196)

2025-06-29 20:02:53 +02:00

CMakeLists.txt

cmake : do not search for curl libraries by ourselves (#14613)

2025-07-10 15:29:05 +03:00

common.cpp

llama : allow other bufts when overriding to CPU, add --no-repack option (#14990)

2025-07-31 18:11:34 +02:00

common.h

imatrix : warn when GGUF imatrix is saved without .gguf suffix (#15076)

2025-08-04 23:26:52 +02:00

console.cpp

console : utf-8 fix for windows stdin (#9690)

2024-09-30 11:23:42 +03:00

console.h

…

json-partial.cpp

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

json-partial.h

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

json-schema-to-grammar.cpp

common : use std::string_view now that we target c++17 (#14319)

2025-06-22 08:37:43 +03:00

json-schema-to-grammar.h

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

llguidance.cpp

llguidance : set tokenizer slices to default (#13424)

2025-05-10 17:19:52 +02:00

log.cpp

Fix: Compile failure due to Microsoft STL breaking change (#11836)

2025-02-12 21:36:11 +01:00

log.h

cleanup: fix compile warnings associated with gnu_printf (#11811)

2025-02-12 10:06:53 -04:00

ngram-cache.cpp

ggml : portability fixes for VS 2017 (#12150)

2025-03-04 18:53:26 +02:00

ngram-cache.h

llama : use LLAMA_TOKEN_NULL (#11062)

2025-01-06 10:52:15 +02:00

regex-partial.cpp

common: add partial regex support (#12808)

2025-05-14 19:50:57 +01:00

regex-partial.h

common: add partial regex support (#12808)

2025-05-14 19:50:57 +01:00

sampling.cpp

server: streaming of tool calls and thoughts when --jinja is on (#12379)

2025-05-25 01:48:08 +01:00

sampling.h

sampling : support for llguidance grammars (#10224)

2025-02-02 09:55:32 +02:00

speculative.cpp

server : implement universal assisted decoding (#12635)

2025-07-31 14:25:23 +02:00

speculative.h

server : implement universal assisted decoding (#12635)

2025-07-31 14:25:23 +02:00