`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-06-26 19:55:04 +00:00

* add common_json w/ support for truncated json healing

* add common_chat_msg_diff

* partial common_chat_parse

* refactor parser w/ optionals

* server: wire chat diffs in stream mode

* fix trigger of thinking models (must happen after thoughts are closed)

* fix functionary v3.2 raw python!

* rename: common_chat_syntax (now contains format)

* rm common_regex.at_start

* don't return empty <think></think>

* accommodate yet another deepseek r1 distill fantasy syntax (`<｜tool▁calls｜>`)

* fix QwQ 32B tool call parsing after thoughts (hermes2)

* better logs for grammar triggers

* consume spaces after parse_json_tool_calls

* fix required tool calls w/ thinking models that have pre-opened thinking tags

* fix thinking model's initial trigger + test qwq's template

* run most test_tool_call tests in stream + non-stream modes

* make functionary v3.2 parsing more strict (differentiate first match from others)

* send final diff from server, to close off raw python arguments

* support partial content streaming in Generic mode

* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)

* Update function-calling.md

* Update tool_bench.py

* chat-parser: remove input from exception (llm output may contain PII)
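
For readers unfamiliar with the "truncated json healing" item above: while a tool call is still streaming, its JSON arguments arrive cut off mid-value, so they must be closed off before a regular parser will accept them. The sketch below illustrates the general idea only. `heal_truncated_json` is a hypothetical helper, not the API added in `json-partial.cpp`, and the commit's implementation may work quite differently. It assumes the cut falls inside a string value or between complete values; dangling keys, colons, trailing commas, partial literals such as `tru`, and partial `\u` escapes are not handled.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (NOT the llama.cpp API): closes whatever a truncated
// JSON prefix left open so that a standard parser can accept it.
std::string heal_truncated_json(const std::string & partial) {
    std::vector<char> closers; // pending '}' / ']' in the order they were opened
    bool in_string = false;
    bool escaped   = false;

    for (char c : partial) {
        if (in_string) {
            if (escaped) {
                escaped = false;      // character after a backslash is literal
            } else if (c == '\\') {
                escaped = true;
            } else if (c == '"') {
                in_string = false;    // string closed normally
            }
            continue;
        }
        switch (c) {
            case '"': in_string = true;       break;
            case '{': closers.push_back('}'); break;
            case '[': closers.push_back(']'); break;
            case '}':
            case ']':
                if (!closers.empty()) {
                    closers.pop_back();
                }
                break;
            default:
                break;
        }
    }

    std::string healed = partial;
    if (in_string) {
        if (escaped) {
            healed.pop_back();        // drop a dangling backslash
        }
        healed += '"';                // close the open string
    }
    // Close containers innermost first.
    for (auto it = closers.rbegin(); it != closers.rend(); ++it) {
        healed += *it;
    }
    return healed;
}
```

Under these assumptions, a streamed prefix such as `{"name": "get_weather", "arguments": {"city": "Par` would be healed to `{"name": "get_weather", "arguments": {"city": "Par"}}`, which can then be parsed and compared against the previously sent state to produce a diff.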

---------

Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>

Author: Olivier Chafik
Date: 2025-05-25 01:48:08 +01:00
Committed by: GitHub
Parent: a2d02d5793
Commit: f5cd27b71d

23 changed files with 3245 additions and 1091 deletions

									
common/CMakeLists.txt (4 changes)

@@ -60,12 +60,16 @@ add_library(${TARGET} STATIC
     base64.hpp
     chat.cpp
     chat.h
+    chat-parser.cpp
+    chat-parser.h
     common.cpp
     common.h
     console.cpp
     console.h
     json-schema-to-grammar.cpp
     json.hpp
+    json-partial.h
+    json-partial.cpp
     llguidance.cpp
     log.cpp
     log.h
