Commit Graph

231 Commits

SHA1 Message Date
d30cb5a7fa sync : ggml
ggml-ci
2025-05-19 13:29:56 +03:00
be1d4a13db scripts : fix compare-llama-bench.py show parameter (#13514) 2025-05-14 08:41:01 +02:00
bf79371120 scripts : support arbitrary input file formats in compare-llama-bench.py (#13455) 2025-05-13 15:31:12 +02:00
1e2809bc4b sync : ggml 2025-05-13 14:02:28 +03:00
09232370fc scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451) 2025-05-11 16:20:39 +02:00
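The cluster of compare-llama-bench.py fixes above (show parameter, arbitrary input formats, graceful exit) all concern the same workflow: llama-bench writes results into an SQLite database and the script diffs two commits against it. A minimal sketch of that workflow, assuming the -b/-c flag names and the default llama-bench.sqlite database; the model path is a placeholder:

```console
# run the benchmark on each commit of interest and append the results to SQLite
$ ./llama-bench -m model.gguf -o sql | sqlite3 llama-bench.sqlite
# diff two commits (flag names assumed; see the script's --help)
$ ./scripts/compare-llama-bench.py -b master -c my-branch
```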
d879433824 sync : ggml
ggml-ci
2025-05-07 17:28:36 +03:00
1d36b3670b llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
b34443923c sync : ggml (#13268)
* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)

* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)

* review: remove src_x/y < 0 checks; add performance tests

* sync : ggml

ggml-ci

* vulkan : fix lint (#0)

---------

Co-authored-by: Acly <aclysia@gmail.com>
2025-05-02 20:54:30 +03:00
b1dd4d08e8 sync : ggml
ggml-ci
2025-05-01 20:15:34 +03:00
8d33d740c3 sync : ggml 2025-05-01 10:00:39 +03:00
19e899ce21 scripts: n_depth for compare-llama-bench [no ci] (#13201) 2025-04-29 23:32:04 +02:00
63b4911494 sync : ggml
ggml-ci
2025-04-24 17:32:47 +03:00
526739b879 sync : ggml
ggml-ci
2025-04-14 09:26:15 +03:00
47ba87d0a4 sync : ggml 2025-04-11 00:17:47 +03:00
eb420e1148 sync : ggml
ggml-ci
2025-04-11 00:17:47 +03:00
e4bf72d631 scripts : fix sync-ggml-am.sh 2025-04-11 00:17:47 +03:00
a4e46e28f9 sync : ggml
ggml-ci
2025-04-07 18:44:17 +03:00
0114a32da0 sync : ggml
ggml-ci
2025-03-31 15:07:32 +03:00
d3f1f0acfb sync : ggml
ggml-ci
2025-03-30 08:33:31 +03:00
029c693fdc sync : ggml
ggml-ci
2025-03-27 10:09:29 +02:00
771d84371c scripts : update sync + fix cmake merge
ggml-ci
2025-03-27 10:09:29 +02:00
df0665a483 sync : ggml
ggml-ci
2025-03-27 09:04:38 +02:00
102ac1891d sync : ggml
ggml-ci
2025-03-07 14:49:44 +02:00
669912d9a5 tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain Llama JSON output regardless of function name if it matches at the beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
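One bullet above adds an add_generation_prompt query parameter for the server's template-rendering endpoint. A hedged sketch of exercising it with curl; the endpoint spelling follows the commit message and the JSON shape is an assumption:

```console
# render the chat template for a message list, with the generation prompt appended
# (endpoint name and query parameter taken from the commit message; may differ)
$ curl 'http://localhost:8080/apply_template?add_generation_prompt=true' \
    -H 'Content-Type: application/json' \
    -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```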
a057897ad4 llama : add xcframework build script (#11996)
* llama : add xcframework build script

This commit adds a script to build an XCFramework for Apple's
iOS, macOS, visionOS, and tvOS platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
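For reference, the whole pipeline above reduces to one script invocation; build-xcframework.sh is named in the CI bullet, and the open command is taken from the commit body:

```console
# build llama.xcframework for iOS, macOS, visionOS, and tvOS
$ ./build-xcframework.sh
# open the example project that now consumes the framework
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```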
dfd6b2c0be sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
3d1cf3cf33 sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
8371d44595 sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
aede2074f6 scripts : sync-ggml-am.sh fix 2025-03-03 18:18:11 +02:00
5137da7b8c scripts: corrected encoding when getting chat template (#11866) (#11907)
Signed-off-by: MoonRide303 <moonride303@gmail.com>
2025-02-18 10:30:16 +01:00
6dde178248 scripts: fix compare-llama-bench commit hash logic (#11891) 2025-02-15 20:23:22 +01:00
68ff663a04 repo : update links to new url (#11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-15 16:40:57 +02:00
c7f460ab88 server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
* extract & return thoughts in reasoning_content field (unless --reasoning-format none) for DeepSeek R1 & Command R7B

* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + accommodate broken official template

* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out

* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability

* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
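A minimal sketch of the flag this commit adds; the model path is a placeholder, and --jinja is assumed to be required for the reasoning extraction to kick in:

```console
# default: thoughts are returned in the separate reasoning_content field
$ llama-server -m DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf --jinja
# opt out and keep the raw <think> text inline in content
$ llama-server -m DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf --jinja --reasoning-format none
```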
0fb77f821f sync : ggml 2025-02-12 21:46:02 +02:00
8a59053f63 sync : ggml 2025-02-06 21:23:03 +02:00
7c9e0ca520 sync : ggml 2025-02-04 12:59:21 +02:00
8ec05832fa sync : ggml 2025-02-03 14:57:08 +02:00
8b576b6c55 Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
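The tool-call support lands in the OpenAI-compatible chat endpoint. A hedged request sketch, assuming a server started with --jinja and a model whose template supports tools; the get_weather function is purely illustrative:

```console
# start the server with Jinja templating so tool calls can be parsed
# (model path is a placeholder)
$ llama-server -m model.gguf --jinja
# declare a hypothetical tool in an OpenAI-style request
$ curl http://localhost:8080/v1/chat/completions -d '{
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    }]
  }'
```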
815857791d sync : ggml 2025-01-29 11:25:29 +02:00
6171c9d258 Add Jinja template support (#11016)
* Copy minja from 58f0ca6dd7

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (https://github.com/google/minja/pull/22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to b8437df626

* Update minja to https://github.com/google/minja/pull/25

* Update minja from https://github.com/google/minja/pull/27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
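The two flags introduced here combine naturally: --jinja switches template rendering to the bundled minja engine, and --chat-template-file overrides the model's embedded template. A minimal sketch; the model path and template file name are placeholders:

```console
# render the model's embedded chat template with minja
$ llama-server -m model.gguf --jinja
# or supply a custom Jinja template from disk
$ llama-server -m model.gguf --jinja --chat-template-file my-template.jinja
```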
f26c874179 scripts : restore hf.sh (#11288)
ggml-ci
2025-01-18 13:18:32 +02:00
f11cfdfd7f ci : use -no-cnv in gguf-split tests (#11254)
* ci : use -no-cnv in gguf-split tests

ggml-ci

* ci : use -no-cnv in requantize tests

ggml-ci

* scripts : fix [no ci]
2025-01-15 18:28:35 +02:00
44d1e796d0 sync : ggml 2025-01-14 10:39:42 +02:00
a4f3f5d8e6 scripts : sync gguf (cont) 2025-01-14 09:40:52 +02:00
48e1ae0e61 scripts : sync gguf 2025-01-14 09:36:58 +02:00
d00a80e89d scripts : sync opencl 2025-01-14 09:19:58 +02:00
99a3755a3c sync : ggml 2025-01-08 13:40:30 +02:00
78c6785175 sync : ggml 2025-01-04 16:09:53 +02:00
2cd43f4900 ggml : more performance with llamafile tinyblas on x86_64 (#10714)
* more performance with llamafile tinyblas on x86_64.

- add bf16 support
- change dispatch strategy (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71 )
- reduce memory bandwidth

simple tinyblas dispatch and more cache friendly

* tinyblas dynamic dispatching

* sgemm: add M blocks.

* - git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2

* remove unstable test
2024-12-24 18:54:49 +01:00
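Since the sgemm changes above target prompt-processing throughput on CPU, llama-bench is the natural way to quantify them; a minimal sketch, assuming a bf16 model to exercise the newly added path (model path and thread count are placeholders):

```console
# measure prompt processing on the CPU; compare before/after the commit
$ ./llama-bench -m model-bf16.gguf -p 512 -t 8
```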
5437d4aaf5 sync : ggml 2024-12-17 18:36:02 +02:00