Commit Graph

1411 Commits

Author SHA1 Message Date
Xuan Son Nguyen
a363251fac qwen2vl: use llama_batch_ext_set_pos 2025-03-14 11:25:36 +01:00
Xuan Son Nguyen
ba79369615 fix llama_batch_ext_init_from_embd 2025-03-14 11:17:22 +01:00
Xuan Son Nguyen
07d84fa3c2 fix missing n_past in various places
this is actually a revert of cda0e4b648
2025-03-14 10:47:08 +01:00
Xuan Son Nguyen
32940369d3 fix gemma3-cli 2025-03-14 10:33:28 +01:00
Xuan Son Nguyen
5e6a6d4e1c fix llama-run n_past 2025-03-14 10:32:43 +01:00
Xuan Son Nguyen
04f8641815 rm redundant llama_batch_ext_set_output_last 2025-03-13 23:14:16 +01:00
Xuan Son Nguyen
c3dd79007b fix llama_batch_ext_init_from_text 2025-03-13 23:09:27 +01:00
Xuan Son Nguyen
65f0184517 compile ok 2025-03-13 22:56:35 +01:00
Xuan Son Nguyen
47086fa82d apply to the rest 2025-03-13 22:36:27 +01:00
Xuan Son Nguyen
4aabf4e8f4 return output ID from llama_batch_ext_add/set 2025-03-13 17:47:07 +01:00
Xuan Son Nguyen
17f954c8e2 Merge branch 'master' into xsn/private_batch_api 2025-03-13 15:55:18 +01:00
Georgi Gerganov
e0dbec0bc6 llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
* llama : refactor llama_context, llama_kv_cache, llm_build_context

ggml-ci

* graph : don't mutate the KV cache during defrag

ggml-ci

* context : reduce virtuals + remove test function

ggml-ci

* context : move interface implementation to source file + factory

ggml-ci

* graph : move KV cache build functions to llama_context impl

ggml-ci

* graph : remove model reference from build_pooling

ggml-ci

* graph : remove llama_model reference

ggml-ci

* kv_cache : provide rope factors

ggml-ci

* graph : rework inputs to use only unique_ptr, remove attn input abstraction

ggml-ci

* context : remove llama_context_i abstraction

ggml-ci

* context : clean-up

ggml-ci

* graph : clean-up

ggml-ci

* llama : remove redundant keywords (struct, enum)

ggml-ci

* model : adapt gemma3

ggml-ci

* graph : restore same attention ops as on master

ggml-ci

* llama : remove TODO + fix indent

ggml-ci
2025-03-13 12:35:44 +02:00
Ishaan Gandhi
2048b5913d server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338)
* Fix DOS index bug

* Remove new APIs

* remove extra line

* Remove from API

* Add extra newline

* Update examples/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-13 11:10:05 +01:00
Daniel Bevenius
80a02aa858 llama.swiftui : fix xcframework dir in README [no ci] (#12353)
This commit fixes the path to the xcframework in the README file which I
had forgotten to change after renaming the build directory.
2025-03-12 13:45:32 +01:00
Xuan-Son Nguyen
7841fc723e llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
* llama : Add Gemma 3 text-only support

* fix python coding style

* fix compile on ubuntu

* python: fix style

* fix ubuntu compile

* fix build on ubuntu (again)

* fix ubuntu build, finally

* clip : Experimental support for Gemma 3 vision (#12344)

* clip : Experimental support for Gemma 3 vision

* fix build

* PRId64
2025-03-12 09:30:24 +01:00
Xuan-Son Nguyen
96e1280839 clip : bring back GPU support (#12322)
* clip : bring back GPU support

* use n_gpu_layers param

* fix double free

* ggml_backend_init_by_type

* clean up
2025-03-11 09:20:16 +01:00
marcoStocchi
6ef79a67ca common : refactor '-o' option (#12278)
As discussed in PR 'llama-tts : add -o option' (#12042):

* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.

* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
2025-03-10 13:34:13 +02:00
Olivier Chafik
be421fc429 tool-call: ensure there's always a non-empty tool call id (#12292) 2025-03-10 09:45:29 +00:00
Olivier Chafik
2b3a25c212 sampler: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291)
* Fix typo in lazy grammar handling (fixes trigger tokens)

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-10 09:44:42 +00:00
tc-mb
8352cdc87b llava : fix bug in minicpm-v code (#11513)
* fix bug in minicpm-v code

* update readme of minicpm-v
2025-03-10 10:33:24 +02:00
Georgi Gerganov
7ab364390f server : infill gen ends on new line (#12254) 2025-03-07 20:54:30 +02:00
Sigbjørn Skjæret
8fad3c7a7c server : Log original chat template parsing error (#12233) 2025-03-07 11:15:33 +01:00
Aaron Teo
e9b2f84f14 llava: add big-endian conversion for image encoder (#12218)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-03-06 09:33:21 +01:00
Han Yin
57b6abf85a android : fix KV cache log message condition (#12212) 2025-03-06 08:22:49 +02:00
Olivier Chafik
669912d9a5 tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain llama json output regardless of function name if matches at beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
Clauszy
06a92a193a server : fix cache reuse logic (#12161)
The first kv shift offsets the positions of all tokens after head_c.
When using llama_kv_cache_seq_rm next, using head_c will remove the valid tokens because their positions have already been offset.
2025-03-05 09:25:45 +02:00
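
The position problem described in 06a92a193a can be reproduced in a toy model. The sketch below is not the server code: `seq_rm`, `seq_add`, and `reuse_chunks` are hypothetical stand-ins that only mimic removing and shifting cells by position range, and the `bounded` switch is an assumption about one way to avoid the problem, not a statement of what the actual patch does.

```python
# Toy model of a KV cache: each cell maps a token label to its position.
# seq_rm / seq_add are simplified stand-ins, NOT the llama.cpp functions.

def seq_rm(cells, p0, p1):
    """Drop every cell whose position lies in [p0, p1)."""
    return {tok: pos for tok, pos in cells.items() if not (p0 <= pos < p1)}

def seq_add(cells, p0, p1, delta):
    """Shift by delta every cell with position in [p0, p1); p1 < 0 means 'to the end'."""
    def in_range(pos):
        return pos >= p0 and (p1 < 0 or pos < p1)
    return {tok: pos + delta if in_range(pos) else pos for tok, pos in cells.items()}

def reuse_chunks(cells, head_p, chunks, bounded):
    """Keep only the cached chunks given as (head_c, n_match) pairs,
    where head_c is expressed in the ORIGINAL cache coordinates."""
    for head_c, n_match in chunks:
        shift = head_p - head_c
        # Discard the cells between the reused prefix and the matching chunk.
        cells = seq_rm(cells, head_p, head_c)
        # Move the matching chunk down so it follows the reused prefix.
        end = head_c + n_match if bounded else -1
        cells = seq_add(cells, head_c, end, shift)
        head_p += n_match
    return cells

cache = {f"t{i}": i for i in range(10)}   # tokens t0..t9 at positions 0..9
chunks = [(4, 2), (8, 2)]                 # reuse t4,t5 and t8,t9 after prefix t0,t1

unbounded = reuse_chunks(cache, 2, chunks, bounded=False)
bounded = reuse_chunks(cache, 2, chunks, bounded=True)

print(unbounded)  # t8 and t9 are gone: they had already been shifted into the removed range
print(bounded)    # t0, t1, t4, t5, t8, t9 at positions 0..5
```

With the unbounded shift, the first pass moves t6..t9 down by two, so the second removal, which still uses the original `head_c`, deletes t8 and t9 along with t6 and t7. Limiting the shift to the matched chunk keeps the later cells at their original positions until their own pass.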
Daniel Bevenius
a057897ad4 llama : add xcframework build script (#11996)
* llama : add xcframework build script

This commit adds a script to build an XCFramework for Apple
iOS, macOS, visionOS, and tvOS platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
mgroeber9110
5bbe6a9fe9 ggml : portability fixes for VS 2017 (#12150)
* Add include files for std::min/max and std::toupper/tolower

* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined

* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode

* win32: only use __restrict in MSVC if C11/C17 support is not enabled

---------

Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
2025-03-04 18:53:26 +02:00
Sigbjørn Skjæret
56d7a9f812 main: allow preloading conversation with -p and add -st / --single-turn (#12145)
* Add chat template formatting to -no-cnv

* only enable prompt formatting if explicitly enabled

* add -st / --single-turn

* add --single-turn and -p in conversation mode

* fix -sys + -p

* reword warning

* small readability change and fix (long) outdated example usage

* only activate single turn in conversation mode
2025-03-04 12:19:39 -04:00
Olivier Chafik
1a24c4621f server: fix deadly typo in response_format.json_schema.schema handling (#12168) 2025-03-04 08:24:07 +02:00
dm4
c43af9276b tts: add speaker file support (#12048)
* tts: add speaker file support

Signed-off-by: dm4 <sunrisedm4@gmail.com>

* tts: handle outetts-0.3

* tts : add new line in error message

---------

Signed-off-by: dm4 <sunrisedm4@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-03 15:09:29 +02:00
Eric Curtin
c950a1f692 Adding UTF-8 support to llama.cpp (#12111)
For emojis, non-alpha characters, etc.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-03-03 12:44:56 +00:00
Xuan-Son Nguyen
7b69003af7 webui : add ?m=... and ?q=... params (#12148)
* webui : add ?m=... and ?q=... params

* also clear prefilledMessage variable

* better approach

* fix comment

* test: bump timeout on GITHUB_ACTION
2025-03-03 11:42:45 +01:00
Sigbjørn Skjæret
14dec0c2f2 main: use jinja chat template system prompt by default (#12118)
* Use jinja chat template system prompt by default

* faster conditional order

* remove nested ternary

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-03-02 14:53:48 +01:00
Xuan Son Nguyen
46596caf6d apply various in places 2025-03-01 20:42:18 +01:00
Xuan Son Nguyen
1d6ba97789 remove token_info API 2025-03-01 16:21:16 +01:00
Sigbjørn Skjæret
1782cdfed6 main: update outdated system prompt message (followup to #12131) (#12132)
* Update outdated message

* wording

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-01 15:22:27 +01:00
Xuan Son Nguyen
1170135dfb llama_batch_ext_add_text 2025-03-01 14:00:14 +01:00
Sigbjørn Skjæret
45a8e76745 common : add --system-prompt parameter, replace behavior of -p in conversation mode (#12131)
* Add --system-prompt parameter

* use user defined system prompt

* clarify

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* add warning

* clarify

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-01 13:56:45 +01:00
Xuan Son Nguyen
9e75c49d35 Merge branch 'master' into xsn/private_batch_api 2025-03-01 12:13:03 +01:00
Vivian
2cc4a5e44a webui : minor typo fixes (#12116)
* fix typos and improve menu text clarity

* rename variable trimedValue to trimmedValue

* add updated index.html.gz

* rebuild

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-03-01 11:15:09 +01:00
Alex Brooks
84d5f4bc19 Update granite vision docs for 3.2 model (#12105)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-28 11:31:47 +00:00
Ting Lou
a800ae46da llava : add struct for FFI bindgen (#12079)
* add struct for FFI bindgen

* Apply suggestions from code review

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-26 15:26:52 +01:00
Olivier Chafik
d7cfe1ffe0 docs: add docs/function-calling.md to lighten server/README.md's plight (#12069) 2025-02-25 18:52:56 +00:00
rhjdvsgsgks
401af80b54 server: handle echo=false on /v1/completions (#12060) 2025-02-25 12:52:52 +01:00
Olivier Chafik
0b52745649 server: support add_generation_prompt query param (#12062) 2025-02-25 10:40:22 +00:00
Alex Brooks
4d1051a40f Add Doc for Converting Granite Vision -> GGUF (#12006)
* Add example docs for granite vision

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-25 10:46:05 +01:00
Alex Brooks
7a2c913e66 llava : Add Granite Vision Support (#11794)
* Add super wip scripts for multimodal granite gguf

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add example for converting mmgranite to gguf

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* remove hardcoded path

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add vision feature layer to gguf params

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Clean up llava surgery and remove name substitution hacks

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add transformers llava next tensor name mapping

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Make siglip / openclip mutually exclusive

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Fix projector linear substitution

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Fix linear 2 substitution index

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Increase max flattened gridpoints to 64

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Fix hardcoded concat for multiple feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Pull vision feature layers out of gguf keys

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* fix num gridpoints and use all layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Avoid dropping last image encoder layer in llava models

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Use 10 for max number of patches

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Standardize vision feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Cleanup logs

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Update comment for vision feature layer init

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Update notes for alternative to legacy llm conversion script

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Fix notes rendering

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add v prefix to vision feature layer log

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Use current defaults for feature layer

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Use constant for max gridpoints / feat layers, style fixes

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* clarify non-negative feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Remove CLIP_API from func signature

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* USE MAX_IMAGE_FEATURE_LAYERS const in layer calc

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Clarify feature layers are non negative ints and not uint

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Fix condition for reading feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* pop last llava layer when feature layers are unset

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Fix unset vision layer 0

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Update examples/llava/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Reenable assertion for out of bounds get_rows

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Use std vector for gridpoints and feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Calculate max feature layer at load time

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Include base patch for granite vision allocation

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Fix trailing whitespace

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add max num patches = 10 back for minicpmv

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Use unordered set to store feature layers

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Use max feature layer for postnorm

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Apply suggestions from code review

---------

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-24 17:09:51 +01:00
Xuan Son Nguyen
a1b1dea33b Merge branch 'master' into xsn/private_batch_api 2025-02-24 17:01:30 +01:00
Xuan Son Nguyen
4bf7ca3943 llama_decode_ext 2025-02-24 17:01:20 +01:00