4a4f426944
model : add Kimi-K2 support (#14654)
* Kimi-K2 conversion
* add Kimi_K2 pre type
* Kimi-K2
* Kimi-K2 unicode
* Kimi-K2
* LLAMA_MAX_EXPERTS 384
* fix vocab iteration
* regex space fix
* add kimi-k2 to pre_computed_hashes
* Updated with kimi-k2 get_vocab_base_pre hash
* fix whitespaces
* fix flake errors
* remove more unicode.cpp whitespaces
* change set_vocab() flow
* add moonshotai-Kimi-K2.jinja to /models/templates/
* update moonshotai-Kimi-K2.jinja
* add kimi-k2 chat template
* add kimi-k2
* update NotImplementedError
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* except Exception
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* LLM_CHAT_TEMPLATE_KIMI_K2 if(add_ass){}
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-15 21:54:22 +02:00
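The pre_computed_hashes / get_vocab_base_pre changes refer to the tokenizer-fingerprinting step in convert_hf_to_gguf.py: the converter encodes a fixed probe string with the upstream tokenizer and hashes the resulting token IDs to select a pre-tokenizer type, raising NotImplementedError when the hash is unknown. A minimal sketch of that idea; the probe string, hash entry, and function shape below are illustrative, not the script's actual values:

    from hashlib import sha256
    from transformers import AutoTokenizer  # assumes transformers is installed

    # Placeholder table: real entries map a known hash to a pre-tokenizer name such as "kimi-k2".
    PRE_COMPUTED_HASHES = {
        "0000000000000000000000000000000000000000000000000000000000000000": "kimi-k2",
    }

    def get_vocab_base_pre(model_dir: str, probe_text: str = "Hello, world! 123") -> str:
        tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
        # Hash the token-ID sequence produced for the probe text.
        chkhsh = sha256(str(tok.encode(probe_text)).encode()).hexdigest()
        if chkhsh not in PRE_COMPUTED_HASHES:
            raise NotImplementedError(f"unrecognized tokenizer, chkhsh={chkhsh}")
        return PRE_COMPUTED_HASHES[chkhsh]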
0d9226763c
llama : add jinja template for rwkv-world (#14665)
* llama : add jinja template for rwkv-world
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-14 07:43:43 +08:00
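Templates added under models/templates/ can be sanity-checked offline with plain jinja2 before the server renders them with its own engine; a rough sketch, where the file path and messages are placeholders and some templates may use constructs jinja2 handles differently:

    import jinja2

    # Placeholder path: substitute the actual template file from models/templates/.
    with open("models/templates/some-model.jinja") as f:
        tmpl = jinja2.Environment(trim_blocks=True, lstrip_blocks=True).from_string(f.read())

    # Render a minimal conversation the way a server-side template would see it.
    print(tmpl.render(
        messages=[{"role": "user", "content": "Hello!"}],
        add_generation_prompt=True,
    ))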
901e20bbe5
jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349)
This allows tools to be used with llama-server
2025-06-24 09:17:58 +03:00
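Once the server is running with --jinja and a tool-capable template such as this one, tool definitions can be passed through the OpenAI-compatible chat endpoint. A hedged sketch of such a request; the URL, tool schema, and field values are examples only:

    import requests  # assumes a llama-server instance on localhost:8080

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
            "tools": [{
                "type": "function",
                "function": {
                    "name": "get_weather",  # example tool, not part of llama.cpp
                    "description": "Look up the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }],
        },
    )
    # The model either answers directly or emits a tool_calls entry on the message.
    print(resp.json()["choices"][0]["message"])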
e121edc432
server : add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-26 00:30:51 +01:00
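A sketch of starting the server with thinking disabled; the model path and port below are placeholders, and only --jinja and --reasoning-budget 0 come from the commits in this log:

    import subprocess

    # Placeholder model path and port; --reasoning-budget 0 turns off thinking output.
    subprocess.run([
        "llama-server",
        "-m", "models/qwen3.gguf",
        "--jinja",
        "--reasoning-budget", "0",
        "--port", "8080",
    ])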
f5cd27b71d
server : streaming of tool calls and thoughts when --jinja is on (#12379)
* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
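On the client side the streamed tool-call and content diffs arrive as ordinary OpenAI-style SSE chunks; a rough sketch of accumulating them with requests, assuming a local server started with --jinja (the URL is a placeholder and only the standard delta/content/tool_calls layout is relied on):

    import json
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Call a tool if you need to."}],
            "stream": True,
        },
        stream=True,
    )

    content, tool_args = "", ""
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames look like "data: {...}", terminated by "data: [DONE]".
        if not line or not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0].get("delta", {})
        content += delta.get("content") or ""
        for tc in delta.get("tool_calls") or []:
            tool_args += tc.get("function", {}).get("arguments", "") or ""

    print(content)
    print(tool_args)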
669912d9a5
tool-call : fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes
* add scripts/tool_bench.sh & .py
* constrain llama json output regardless of function name if matches at beginning
* update relaxed newline space rule in grammar tests
* support add_generation_prompt query parameter (useful for /apply_template)
* Update src/llama-grammar.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
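The add_generation_prompt parameter mentioned above pairs with the template-apply endpoint; a sketch of exercising it, with the endpoint path and parameter spelling taken from the commit message (verify both against the current server README) and the host a placeholder:

    import requests

    resp = requests.post(
        "http://localhost:8080/apply_template",
        json={
            "messages": [{"role": "user", "content": "Write a sorting function."}],
            # Controls whether the assistant turn prefix is appended to the rendered prompt.
            "add_generation_prompt": True,
        },
    )
    print(resp.json())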
c7f460ab88
server : fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
* extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B
* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template
* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out
* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability
* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
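Client side, the extracted thoughts arrive in a separate reasoning_content field on the message, placed before content, unless the server runs with --reasoning-format none. A sketch of reading both fields from a non-streamed response; the URL and prompt are placeholders:

    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"messages": [{"role": "user", "content": "Why is the sky blue?"}]},
    )
    msg = resp.json()["choices"][0]["message"]
    print("reasoning:", msg.get("reasoning_content"))
    print("answer:", msg.get("content"))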
bfcce4d693
tool-call : support Command R7B (+ return tool_plan "thoughts" in API) (#11585)
* `tool-call`: support Command R7B (w/ tool_plan return)
* `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override
* `tool-call`: test cleanup / handle lazy grammar triggers
2025-02-02 09:25:38 +00:00
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00