
243 Commits

SHA1 Message Date
f61c05d4b1 sync : ggml
ggml-ci
2025-07-01 11:06:39 +03:00
e9b6350e61 scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
06cbedfca1 sync : ggml
ggml-ci
2025-06-20 21:02:47 +03:00
d03172cc79 sync : ggml
ggml-ci
2025-06-18 09:59:21 +03:00
2e42be42bd compare-llama-bench: add option to plot (#14169)
* compare-llama-bench: add option to plot

* Address review comments: convert case + add type hints

* Add matplotlib to requirements

* fix tests

* Improve comment and fix assert condition for test

* Add back default test_name, add --plot_log_scale

* use log_scale regardless of x_values
2025-06-14 10:34:20 +02:00
ae92c1855b sync : ggml
ggml-ci
2025-06-10 18:39:33 +03:00
b8e2194efc sync : ggml
ggml-ci
2025-06-10 09:21:56 +03:00
f3a4b1659c sync : ggml
ggml-ci
2025-06-01 13:43:57 +03:00
53f925074d sync : vendor (#13901)
* sync : vendor

ggml-ci

* cont : fix httplib version

ggml-ci

* cont : fix lint

* cont : fix lint

* vendor : move to common folder /vendor

ggml-ci

* cont : fix lint

* cont : move httplib to /vendor + use json_fwd.hpp

ggml-ci

* cont : fix server build

ggml-ci

* cont : add missing headers

ggml-ci

* cont : header clean-up

ggml-ci
2025-05-30 16:25:45 +03:00
1c49c70d07 sync : ggml 2025-05-27 18:05:33 +03:00
a26c4cc11e scripts : add option to compare commits in Debug (#13806)
* scripts : add option to compare commits in Debug

* cont : reuse existing CMAKE_OPTS
2025-05-26 22:24:01 +03:00
f5cd27b71d server: streaming of tool calls and thoughts when --jinja is on (#12379)
* add common_json w/ support for truncated json healing

* add common_chat_msg_diff

* partial common_chat_parse

* refactor parser w/ optionals

* server: wire chat diffs in stream mode

* fix trigger of thinking models (must happen after thoughts are closed)

* fix functionary v3.2 raw python!

* rename: common_chat_syntax (now contains format)

* rm common_regex.at_start

* don't return empty <think></think>

* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)

* fix QwQ 32B tool call parsing after thoughts (hermes2)

* better logs for grammar triggers

* consume spaces after parse_json_tool_calls

* fix required tool calls w/ thinking models that have pre-opened thinking tags

* fix thinking model's initial trigger + test qwq's template

* run most test_tool_call tests in stream + non-stream modes

* make functionary v3.2 parsing more strict (differentiate first match from others)

* send final diff from server, to close off raw python arguments

* support partial content streaming in Generic mode

* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)

* Update function-calling.md

* Update tool_bench.py

* chat-parser: remove input from exception (llm output may contain PII)

---------

Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
d30cb5a7fa sync : ggml
ggml-ci
2025-05-19 13:29:56 +03:00
be1d4a13db scripts : fix compare-llama-bench.py show parameter (#13514) 2025-05-14 08:41:01 +02:00
bf79371120 scripts : support arbitrary input file formats in compare-llama-bench.py (#13455) 2025-05-13 15:31:12 +02:00
1e2809bc4b sync : ggml 2025-05-13 14:02:28 +03:00
09232370fc scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451) 2025-05-11 16:20:39 +02:00
d879433824 sync : ggml
ggml-ci
2025-05-07 17:28:36 +03:00
1d36b3670b llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
b34443923c sync : ggml (#13268)
* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)

* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)

* review: remove src_x/y < 0 checks; add performance tests

* sync : ggml

ggml-ci

* vulkan : fix lint (#0)

---------

Co-authored-by: Acly <aclysia@gmail.com>
2025-05-02 20:54:30 +03:00
b1dd4d08e8 sync : ggml
ggml-ci
2025-05-01 20:15:34 +03:00
8d33d740c3 sync : ggml 2025-05-01 10:00:39 +03:00
19e899ce21 scripts: n_depth for compare-llama-bench [no ci] (#13201) 2025-04-29 23:32:04 +02:00
63b4911494 sync : ggml
ggml-ci
2025-04-24 17:32:47 +03:00
526739b879 sync : ggml
ggml-ci
2025-04-14 09:26:15 +03:00
47ba87d0a4 sync : ggml 2025-04-11 00:17:47 +03:00
eb420e1148 sync : ggml
ggml-ci
2025-04-11 00:17:47 +03:00
e4bf72d631 scripts : fix sync-ggml-am.sh 2025-04-11 00:17:47 +03:00
a4e46e28f9 sync : ggml
ggml-ci
2025-04-07 18:44:17 +03:00
0114a32da0 sync : ggml
ggml-ci
2025-03-31 15:07:32 +03:00
d3f1f0acfb sync : ggml
ggml-ci
2025-03-30 08:33:31 +03:00
029c693fdc sync : ggml
ggml-ci
2025-03-27 10:09:29 +02:00
771d84371c scripts : update sync + fix cmake merge
ggml-ci
2025-03-27 10:09:29 +02:00
df0665a483 sync : ggml
ggml-ci
2025-03-27 09:04:38 +02:00
102ac1891d sync : ggml
ggml-ci
2025-03-07 14:49:44 +02:00
669912d9a5 tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain llama json output regardless of function name if it matches at the beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
a057897ad4 llama : add xcframework build script (#11996)
* llama : add xcframework build script

This commit adds a script to build an XCFramework for the Apple
iOS, macOS, visionOS, and tvOS platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : update build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
dfd6b2c0be sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
3d1cf3cf33 sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
8371d44595 sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
aede2074f6 scripts : sync-ggml-am.sh fix 2025-03-03 18:18:11 +02:00
5137da7b8c scripts: corrected encoding when getting chat template (#11866) (#11907)
Signed-off-by: MoonRide303 <moonride303@gmail.com>
2025-02-18 10:30:16 +01:00
6dde178248 scripts: fix compare-llama-bench commit hash logic (#11891) 2025-02-15 20:23:22 +01:00
68ff663a04 repo : update links to new url (#11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-15 16:40:57 +02:00
c7f460ab88 server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
* extract & return thoughts in reasoning_content field (unless --reasoning-format none) for DeepSeek R1 & Command R7B

* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template

* tool-calls: accommodate the variety of wrong tool call opening tags that both the R1 Qwen 32B and 7B distills like to spit out

* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability

* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
0fb77f821f sync : ggml 2025-02-12 21:46:02 +02:00
8a59053f63 sync : ggml 2025-02-06 21:23:03 +02:00
7c9e0ca520 sync : ggml 2025-02-04 12:59:21 +02:00
8ec05832fa sync : ggml 2025-02-03 14:57:08 +02:00
8b576b6c55 Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00