f3a4b1659c
sync : ggml
...
ggml-ci
2025-06-01 13:43:57 +03:00
53f925074d
sync : vendor (#13901)
...
* sync : vendor
ggml-ci
* cont : fix httplib version
ggml-ci
* cont : fix lint
* cont : fix lint
* vendor : move to common folder /vendor
ggml-ci
* cont : fix lint
* cont : move httplib to /vendor + use json_fwd.hpp
ggml-ci
* cont : fix server build
ggml-ci
* cont : add missing headers
ggml-ci
* cont : header clean-up
ggml-ci
2025-05-30 16:25:45 +03:00
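The json_fwd.hpp change above is the standard nlohmann/json trick for trimming header dependencies: a header that only passes json values by reference can forward-declare the type instead of pulling in the full json.hpp. A minimal sketch of the pattern (the format_error helper is hypothetical, not code from the PR):
```cpp
// In a header: <nlohmann/json_fwd.hpp> only declares nlohmann::json,
// so every includer avoids recompiling the full library header.
#include <nlohmann/json_fwd.hpp>
#include <string>

std::string format_error(const nlohmann::json & body);

// In the matching .cpp: only here do we pay for the full definition.
#include <nlohmann/json.hpp>

std::string format_error(const nlohmann::json & body) {
    return body.dump(2); // pretty-print with a 2-space indent
}
```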
1c49c70d07
sync : ggml
2025-05-27 18:05:33 +03:00
a26c4cc11e
scripts : add option to compare commits in Debug (#13806)
...
* scripts : add option to compare commits in Debug
* cont : reuse existing CMAKE_OPTS
2025-05-26 22:24:01 +03:00
f5cd27b71d
server : streaming of tool calls and thoughts when --jinja is on (#12379)
...
* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
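The common_json "healing" mentioned above exists because a streamed tool call is truncated JSON on almost every token: the parser has to close dangling strings and brackets before it can diff the arguments against the previous chunk. A much-simplified sketch of the idea, not the actual common_json implementation:
```cpp
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

// Append whatever closers a truncated JSON prefix still needs, then parse.
// Simplified: a prefix cut mid-escape or mid-literal (e.g. "tru") still fails.
static nlohmann::json heal_truncated_json(std::string s) {
    std::vector<char> closers;
    bool in_string = false;
    for (size_t i = 0; i < s.size(); i++) {
        const char c = s[i];
        if (in_string) {
            if (c == '\\') { i++; }                    // skip the escaped char
            else if (c == '"') { in_string = false; }
        } else if (c == '"') { in_string = true;
        } else if (c == '{') { closers.push_back('}');
        } else if (c == '[') { closers.push_back(']');
        } else if (c == '}' || c == ']') { if (!closers.empty()) closers.pop_back(); }
    }
    if (in_string) { s += '"'; }                       // close a dangling string
    for (auto it = closers.rbegin(); it != closers.rend(); ++it) { s += *it; }
    return nlohmann::json::parse(s);                   // throws if still malformed
}
```
Re-parsing the healed prefix on every chunk and diffing against the previous parse is roughly where the common_chat_msg_diff objects come from.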
d30cb5a7fa
sync : ggml
...
ggml-ci
2025-05-19 13:29:56 +03:00
be1d4a13db
scripts : fix compare-llama-bench.py show parameter (#13514)
2025-05-14 08:41:01 +02:00
bf79371120
scripts : support arbitrary input file formats in compare-llama-bench.py (#13455)
2025-05-13 15:31:12 +02:00
1e2809bc4b
sync : ggml
2025-05-13 14:02:28 +03:00
09232370fc
scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451)
2025-05-11 16:20:39 +02:00
d879433824
sync : ggml
...
ggml-ci
2025-05-07 17:28:36 +03:00
1d36b3670b
llama : move end-user examples to tools directory (#13249)
...
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
b34443923c
sync : ggml (#13268)
...
* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)
* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)
* review: remove src_x/y < 0 checks; add performance tests
* sync : ggml
ggml-ci
* vulkan : fix lint (#0)
---------
Co-authored-by: Acly <aclysia@gmail.com>
2025-05-02 20:54:30 +03:00
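For orientation, CONV_2D_DW (the op the Vulkan kernels above accelerate) is depthwise convolution: every channel is convolved with its own small filter and channels never mix, unlike a full convolution. A plain C++ reference of the core loop, assuming a single batch, stride 1 and no padding:
```cpp
// Depthwise conv2d: each of the C channels is convolved with its own
// kh x kw kernel. Flattened layouts: src[C][H][W], ker[C][kh][kw],
// dst[C][H-kh+1][W-kw+1].
void conv_2d_dw_ref(const float * src, const float * ker, float * dst,
                    int C, int H, int W, int kh, int kw) {
    const int OH = H - kh + 1, OW = W - kw + 1;
    for (int c = 0; c < C; c++) {
        for (int oy = 0; oy < OH; oy++) {
            for (int ox = 0; ox < OW; ox++) {
                float acc = 0.0f;
                for (int ky = 0; ky < kh; ky++) {
                    for (int kx = 0; kx < kw; kx++) {
                        acc += src[(c*H + oy + ky)*W + ox + kx]
                             * ker[(c*kh + ky)*kw + kx];
                    }
                }
                dst[(c*OH + oy)*OW + ox] = acc;
            }
        }
    }
}
```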
b1dd4d08e8
sync : ggml
...
ggml-ci
2025-05-01 20:15:34 +03:00
8d33d740c3
sync : ggml
2025-05-01 10:00:39 +03:00
19e899ce21
scripts: n_depth for compare-llama-bench [no ci] (#13201)
2025-04-29 23:32:04 +02:00
63b4911494
sync : ggml
...
ggml-ci
2025-04-24 17:32:47 +03:00
526739b879
sync : ggml
...
ggml-ci
2025-04-14 09:26:15 +03:00
47ba87d0a4
sync : ggml
2025-04-11 00:17:47 +03:00
eb420e1148
sync : ggml
...
ggml-ci
2025-04-11 00:17:47 +03:00
e4bf72d631
scripts : fix sync-ggml-am.sh
2025-04-11 00:17:47 +03:00
a4e46e28f9
sync : ggml
...
ggml-ci
2025-04-07 18:44:17 +03:00
0114a32da0
sync : ggml
...
ggml-ci
2025-03-31 15:07:32 +03:00
d3f1f0acfb
sync : ggml
...
ggml-ci
2025-03-30 08:33:31 +03:00
029c693fdc
sync : ggml
...
ggml-ci
2025-03-27 10:09:29 +02:00
771d84371c
scripts : update sync + fix cmake merge
...
ggml-ci
2025-03-27 10:09:29 +02:00
df0665a483
sync : ggml
...
ggml-ci
2025-03-27 09:04:38 +02:00
102ac1891d
sync : ggml
...
ggml-ci
2025-03-07 14:49:44 +02:00
669912d9a5
tool-call : fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
...
* sampler: turn lazy grammar trigger words to regexes
* add scripts/tool_bench.sh & .py
* constrain llama json output regardless of function name if matches at beginning
* update relaxed newline space rule in grammar tests
* support add_generation_prompt query parameter (useful for /apply_template)
* Update src/llama-grammar.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
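The "trigger patterns" in #12034 generalize lazy grammars: sampling runs unconstrained until the raw output matches a trigger regex (rather than a fixed literal word), and only then does the tool-call grammar start constraining tokens. A rough sketch of the matching side, with assumed trigger patterns and hypothetical names rather than the real ones:
```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch: the grammar stays inactive until any trigger
// pattern matches the text generated so far, e.g. the opening of a
// tool-call block.
struct lazy_grammar {
    std::vector<std::regex> triggers = {
        std::regex(R"(<tool_call>)"),          // assumed trigger pattern
        std::regex(R"(\{\s*"name"\s*:)"),      // assumed JSON tool-call prelude
    };
    bool active = false;

    // Call after every decoded token; returns true once constrained
    // sampling should apply from here on.
    bool should_constrain(const std::string & generated) {
        if (!active) {
            for (const auto & re : triggers) {
                if (std::regex_search(generated, re)) { active = true; break; }
            }
        }
        return active;
    }
};
```
Matching regexes instead of literal words is what lets the same machinery catch the varied tool-call openings that different models emit.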
a057897ad4
llama : add xcframework build script (#11996)
...
* llama : add xcframework build script
This commit adds a script to build an XCFramework for Apple
iOS, macOS, visionOS, and tvOS platforms.
The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```
Refs: https://github.com/ggml-org/llama.cpp/issues/10747
* examples : remove llama.cpp (source dir ref) from project.pbxproj
This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.
* ci : updated build.yml to use build-xcframework.sh
* ci : add xcframework build to github releases
This commit adds the ability to create a GitHub release with the
xcframework build artifact.
* scripts : add apple app validation scripts
This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.
The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.
* llama : remove Package.swift
This commit removes the Package.swift file, as we are now building an
XCFramework for the project.
* llama : remove Sources and spm-headers directories
* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
dfd6b2c0be
sync : ggml
...
ggml-ci
2025-03-03 18:18:11 +02:00
3d1cf3cf33
sync : ggml
...
ggml-ci
2025-03-03 18:18:11 +02:00
8371d44595
sync : ggml
...
ggml-ci
2025-03-03 18:18:11 +02:00
aede2074f6
scripts : sync-ggml-am.sh fix
2025-03-03 18:18:11 +02:00
5137da7b8c
scripts: corrected encoding when getting chat template (#11866) (#11907)
...
Signed-off-by: MoonRide303 <moonride303@gmail.com>
2025-02-18 10:30:16 +01:00
6dde178248
scripts: fix compare-llama-bench commit hash logic (#11891)
2025-02-15 20:23:22 +01:00
68ff663a04
repo : update links to new url (#11886)
...
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
c7f460ab88
server : fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
...
* extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B
* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template
* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out
* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability
* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
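The reasoning_content behaviour above boils down to splitting the model output at the thinking tags before the OAI-style response is built, with the extra wrinkle that the R1 distills often omit or mangle the opening tag. A simplified sketch of the split (the real parser also handles the malformed variants listed above):
```cpp
#include <string>

struct parsed_msg {
    std::string reasoning_content;  // text between the thinking tags
    std::string content;            // everything after the closing tag
};

// Simplified split on DeepSeek R1 style <think>...</think> tags.
// If the opening tag is missing (some distills pre-open it), everything
// before </think> is treated as reasoning.
static parsed_msg split_reasoning(const std::string & out) {
    const std::string open = "<think>", close = "</think>";
    const size_t end = out.find(close);
    if (end == std::string::npos) {
        return { "", out };         // no thoughts to extract
    }
    const size_t begin = out.find(open);
    const size_t start = (begin != std::string::npos && begin < end)
                             ? begin + open.size() : 0;
    return {
        out.substr(start, end - start),
        out.substr(end + close.size()),
    };
}
```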
0fb77f821f
sync : ggml
2025-02-12 21:46:02 +02:00
8a59053f63
sync : ggml
2025-02-06 21:23:03 +02:00
7c9e0ca520
sync : ggml
2025-02-04 12:59:21 +02:00
8ec05832fa
sync : ggml
2025-02-03 14:57:08 +02:00
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
...
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
815857791d
sync : ggml
2025-01-29 11:25:29 +02:00
6171c9d258
Add Jinja template support (#11016)
...
* Copy minja from 58f0ca6dd7
* Add --jinja and --chat-template-file flags
* Add missing <optional> include
* Avoid print in get_hf_chat_template.py
* No designated initializers yet
* Try and work around msvc++ non-macro max resolution quirk
* Update test_chat_completion.py
* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
* Refactor test-chat-template
* Test templates w/ minja
* Fix deprecation
* Add --jinja to llama-run
* Update common_chat_format_example to use minja template wrapper
* Test chat_template in e2e test
* Update utils.py
* Update test_chat_completion.py
* Update run.cpp
* Update arg.cpp
* Refactor common_chat_* functions to accept minja template + use_jinja option
* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
* Revert LLAMA_CHATML_TEMPLATE refactor
* Normalize newlines in test-chat-templates for windows tests
* Forward decl minja::chat_template to avoid eager json dep
* Flush stdout in chat template before potential crash
* Fix copy elision warning
* Rm unused optional include
* Add missing optional include to server.cpp
* Disable jinja test that has a cryptic windows failure
* minja: fix vigogne (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626
* Update minja to https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
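For contrast with the Jinja path this PR adds: without --jinja the prompt is assembled by hard-coded C++ per template family, while with --jinja the model's own template, executed via minja, drives the formatting. A hand-rolled ChatML-style formatter showing what such a template expands to (illustrative only, not the minja API):
```cpp
#include <string>
#include <vector>

struct chat_msg { std::string role, content; };

// What a ChatML chat template expands to; with --jinja the equivalent
// logic comes from the Jinja template shipped with the model.
static std::string format_chatml(const std::vector<chat_msg> & msgs,
                                 bool add_generation_prompt) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
    }
    if (add_generation_prompt) {
        out += "<|im_start|>assistant\n";  // cue the model to answer
    }
    return out;
}
```
The add_generation_prompt flag here corresponds to the query parameter wired up in the commit body above.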
f26c874179
scripts : restore hf.sh (#11288)
...
ggml-ci
2025-01-18 13:18:32 +02:00
f11cfdfd7f
ci : use -no-cnv in gguf-split tests (#11254)
...
* ci : use -no-cnv in gguf-split tests
ggml-ci
* ci : use -no-cnv in requantize tests
ggml-ci
* scripts : fix [no ci]
2025-01-15 18:28:35 +02:00
44d1e796d0
sync : ggml
2025-01-14 10:39:42 +02:00
a4f3f5d8e6
scripts : sync gguf (cont)
2025-01-14 09:40:52 +02:00
48e1ae0e61
scripts : sync gguf
2025-01-14 09:36:58 +02:00