Xuan Son Nguyen
a363251fac
qwen2vl: use llama_batch_ext_set_pos
2025-03-14 11:25:36 +01:00
Xuan Son Nguyen
ba79369615
fix llama_batch_ext_init_from_embd
2025-03-14 11:17:22 +01:00
Xuan Son Nguyen
07d84fa3c2
fix missing n_past in various places
...
this is actually a revert of cda0e4b648
2025-03-14 10:47:08 +01:00
Xuan Son Nguyen
32940369d3
fix gemma3-cli
2025-03-14 10:33:28 +01:00
Xuan Son Nguyen
5e6a6d4e1c
fix llama-run n_past
2025-03-14 10:32:43 +01:00
Xuan Son Nguyen
04f8641815
rm redundant llama_batch_ext_set_output_last
2025-03-13 23:14:16 +01:00
Xuan Son Nguyen
c3dd79007b
fix llama_batch_ext_init_from_text
2025-03-13 23:09:27 +01:00
Xuan Son Nguyen
65f0184517
compile ok
2025-03-13 22:56:35 +01:00
Xuan Son Nguyen
47086fa82d
apply to the rest
2025-03-13 22:36:27 +01:00
Xuan Son Nguyen
4aabf4e8f4
return output ID from llama_batch_ext_add/set
2025-03-13 17:47:07 +01:00
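This commit belongs to the in-progress batch-API rework on this branch (llama_batch_ext_init_from_text, llama_batch_ext_add_text, llama_batch_ext_set_pos, llama_decode_ext). A minimal C++ sketch of the idea — add/set now return an output ID, so the caller can fetch the right logits row after decoding. All llama_batch_ext_* and llama_decode_ext signatures below are assumptions about a private, in-flux API, not the final interface; only llama_get_logits_ith mirrors the existing public API.
```cpp
// Hypothetical sketch only: every llama_batch_ext_* / llama_decode_ext
// signature below is an assumption about this private branch.
#include <cstddef>
#include <cstdint>
#include <vector>

struct llama_context;
struct llama_batch_ext;

llama_batch_ext * llama_batch_ext_init(int32_t n_tokens_max, int32_t n_seq_max);
void              llama_batch_ext_free(llama_batch_ext * batch);
// Per this commit, add/set return the output ID of the added token.
int32_t llama_batch_ext_add_text(llama_batch_ext * batch, int32_t token,
                                 int32_t pos, const int32_t * seq_ids,
                                 size_t n_seq_ids, bool output);
int32_t llama_decode_ext(llama_context * ctx, llama_batch_ext * batch);
float * llama_get_logits_ith(llama_context * ctx, int32_t i);

void decode_prompt(llama_context * ctx, const std::vector<int32_t> & tokens) {
    llama_batch_ext * batch = llama_batch_ext_init((int32_t) tokens.size(), 1);
    const int32_t seq_id = 0;
    int32_t out_id = -1;
    for (size_t i = 0; i < tokens.size(); ++i) {
        const bool last = i + 1 == tokens.size();
        // Keep the output ID of the last token: it indexes the logits row.
        out_id = llama_batch_ext_add_text(batch, tokens[i], (int32_t) i,
                                          &seq_id, 1, /*output=*/last);
    }
    llama_decode_ext(ctx, batch);
    const float * logits = llama_get_logits_ith(ctx, out_id);
    (void) logits; // sample the next token from these logits
    llama_batch_ext_free(batch);
}
```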
Xuan Son Nguyen
17f954c8e2
Merge branch 'master' into xsn/private_batch_api
2025-03-13 15:55:18 +01:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
...
* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci
2025-03-13 12:35:44 +02:00
Ishaan Gandhi
2048b5913d
server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338)
...
* Fix DOS index bug
* Remove new APIs
* remove extra line
* Remove from API
* Add extra newline
* Update examples/server/server.cpp
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-13 11:10:05 +01:00
Daniel Bevenius
80a02aa858
llama.swiftui : fix xcframework dir in README [no ci] (#12353)
...
This commit fixes the path to the xcframework in the README file, which I
had forgotten to change after renaming the build directory.
2025-03-12 13:45:32 +01:00
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
...
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344)
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
2025-03-12 09:30:24 +01:00
Xuan-Son Nguyen
96e1280839
clip : bring back GPU support (#12322)
...
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
2025-03-11 09:20:16 +01:00
marcoStocchi
6ef79a67ca
common : refactor '-o' option (#12278)
...
As discussed in PR 'llama-tts : add -o option' (#12042):
* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.
* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
2025-03-10 13:34:13 +02:00
Olivier Chafik
be421fc429
tool-call : ensure there's always a non-empty tool call id (#12292)
2025-03-10 09:45:29 +00:00
Olivier Chafik
2b3a25c212
sampler : fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291)
...
* Fix typo in lazy grammar handling (fixes trigger tokens)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-10 09:44:42 +00:00
tc-mb
8352cdc87b
llava : fix bug in minicpm-v code (#11513)
...
* fix bug in minicpm-v code
* update readme of minicpm-v
2025-03-10 10:33:24 +02:00
Georgi Gerganov
7ab364390f
server : infill gen ends on new line (#12254)
2025-03-07 20:54:30 +02:00
Sigbjørn Skjæret
8fad3c7a7c
server : Log original chat template parsing error (#12233)
2025-03-07 11:15:33 +01:00
Aaron Teo
e9b2f84f14
llava: add big-endian conversion for image encoder (#12218)
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-03-06 09:33:21 +01:00
Han Yin
57b6abf85a
android : fix KV cache log message condition (#12212)
2025-03-06 08:22:49 +02:00
Olivier Chafik
669912d9a5
tool-call : fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
...
* sampler: turn lazy grammar trigger words to regexes
* add scripts/tool_bench.sh & .py
* constrain llama json output regardless of function name if matches at beginning
* update relaxed newline space rule in grammar tests
* support add_generation_prompt query parameter (useful for /apply_template)
* Update src/llama-grammar.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
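As a follow-up illustration of the commit above, a hedged C++ sketch of how a trigger pattern might be wired up. The initializer's prototype is an assumption (the PR adds a patterns-based variant of the lazy grammar sampler; check llama.h at this commit for the real signature), and the `<tool_call>` pattern is a hypothetical example of a Qwen-2.5-Coder-style trigger.
```cpp
// Hedged sketch: #12034 turns lazy-grammar trigger words into trigger
// patterns (regexes). The prototype below is an assumption about the new
// sampler initializer's shape, not the verified API.
#include <cstddef>
#include <cstdint>

struct llama_vocab;
struct llama_sampler;
typedef int32_t llama_token;

llama_sampler * llama_sampler_init_grammar_lazy_patterns(
        const llama_vocab  * vocab,
        const char         * grammar_str,
        const char         * grammar_root,
        const char        ** trigger_patterns,
        size_t               num_trigger_patterns,
        const llama_token  * trigger_tokens,
        size_t               num_trigger_tokens);

llama_sampler * make_tool_call_sampler(const llama_vocab * vocab,
                                       const char * grammar) {
    // The grammar stays dormant until the output matches a trigger pattern,
    // e.g. a hypothetical "<tool_call>" tag followed by anything.
    static const char * patterns[] = { "<tool_call>[\\s\\S]*" };
    return llama_sampler_init_grammar_lazy_patterns(
            vocab, grammar, "root", patterns, 1, /*trigger_tokens=*/nullptr, 0);
}
```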
Clauszy
06a92a193a
server : fix cache reuse logic (#12161)
...
The first KV shift offsets the positions of all tokens after head_c.
If llama_kv_cache_seq_rm is then called with head_c, it removes valid tokens, because their positions have already been offset.
2025-03-05 09:25:45 +02:00
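To make the ordering issue above concrete, a minimal sketch using the public KV-cache calls; `head_c` and `n_discard` are hypothetical stand-ins for the server's actual bookkeeping.
```cpp
// Simplified illustration of the ordering bug fixed above; the two KV-cache
// calls are the public llama.cpp API, the surrounding logic is hypothetical.
#include "llama.h"

void drop_cached_span(llama_context * ctx, llama_seq_id seq,
                      llama_pos head_c, llama_pos n_discard) {
    // Buggy order: shifting first moves every position after head_c back by
    // n_discard, so a seq_rm that still uses head_c then deletes valid tokens.
    //
    //   llama_kv_cache_seq_add(ctx, seq, head_c, -1, -n_discard);
    //   llama_kv_cache_seq_rm (ctx, seq, head_c, head_c + n_discard);
    //
    // Fixed order: remove the stale span while positions are unshifted, then
    // shift the surviving tokens down to close the gap.
    llama_kv_cache_seq_rm (ctx, seq, head_c, head_c + n_discard);
    llama_kv_cache_seq_add(ctx, seq, head_c + n_discard, -1, -n_discard);
}
```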
Daniel Bevenius
a057897ad4
llama : add xcframework build script (#11996)
...
* llama : add xcframework build script
This commit adds a script to build an XCFramework for Apple's
iOS, macOS, visionOS, and tvOS platforms.
The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```
Refs: https://github.com/ggml-org/llama.cpp/issues/10747
* examples : remove llama.cpp (source dir ref) from project.pbxproj
This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.
* ci : updated build.yml to use build-xcframework.sh
* ci : add xcframework build to github releases
This commit adds the ability to create a GitHub release with the
xcframework build artifact.
* scripts : add apple app validation scripts
This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.
The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.
* llama : remove Package.swift
This commit removes the Package.swift file, as we are now building an
XCFramework for the project.
* llama : remove Sources and spm-headers directories
* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
mgroeber9110
5bbe6a9fe9
ggml : portability fixes for VS 2017 (#12150)
...
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
2025-03-04 18:53:26 +02:00
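The _USE_MATH_DEFINES bullet in the commit above is worth spelling out, since it bites many Windows ports. A self-contained example of the ordering requirement:
```cpp
// On MSVC, _USE_MATH_DEFINES must be defined before the first math header is
// included, otherwise M_PI is never declared and VS 2017 fails to compile.
#define _USE_MATH_DEFINES // must precede <cmath> (or any header pulling it in)
#include <cmath>
#include <cstdio>

int main() {
    std::printf("pi = %f\n", M_PI);
    return 0;
}
```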
Sigbjørn Skjæret
56d7a9f812
main: allow preloading conversation with -p and add -st / --single-turn (#12145)
...
* Add chat template formatting to -no-cnv
* only enable prompt formatting if explicitly enabled
* add -st / --single-turn
* add --single-turn and -p in conversation mode
* fix -sys + -p
* reword warning
* small readability change and fix (long) outdated example usage
* only activate single turn in conversation mode
2025-03-04 12:19:39 -04:00
Olivier Chafik
1a24c4621f
server : fix deadly typo in response_format.json_schema.schema handling (#12168)
2025-03-04 08:24:07 +02:00
dm4
c43af9276b
tts: add speaker file support (#12048)
...
* tts: add speaker file support
Signed-off-by: dm4 <sunrisedm4@gmail.com>
* tts: handle outetts-0.3
* tts : add new line in error message
---------
Signed-off-by: dm4 <sunrisedm4@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-03 15:09:29 +02:00
Eric Curtin
c950a1f692
Adding UTF-8 support to llama.cpp (#12111)
...
For emojis, non-alpha characters, etc.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-03-03 12:44:56 +00:00
Xuan-Son Nguyen
7b69003af7
webui : add ?m=... and ?q=... params (#12148)
...
* webui : add ?m=... and ?q=... params
* also clear prefilledMessage variable
* better approach
* fix comment
* test: bump timeout on GITHUB_ACTION
2025-03-03 11:42:45 +01:00
Sigbjørn Skjæret
14dec0c2f2
main: use jinja chat template system prompt by default (#12118)
...
* Use jinja chat template system prompt by default
* faster conditional order
* remove nested ternary
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-03-02 14:53:48 +01:00
Xuan Son Nguyen
46596caf6d
apply in various places
2025-03-01 20:42:18 +01:00
Xuan Son Nguyen
1d6ba97789
remove token_info API
2025-03-01 16:21:16 +01:00
Sigbjørn Skjæret
1782cdfed6
main: update outdated system prompt message (follow-up to #12131) (#12132)
...
* Update outdated message
* wording
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-01 15:22:27 +01:00
Xuan Son Nguyen
1170135dfb
llama_batch_ext_add_text
2025-03-01 14:00:14 +01:00
Sigbjørn Skjæret
45a8e76745
common : add --system-prompt parameter, replace behavior of -p in conversation mode (#12131)
...
* Add --system-prompt parameter
* use user defined system prompt
* clarify
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* add warning
* clarify
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-01 13:56:45 +01:00
Xuan Son Nguyen
9e75c49d35
Merge branch 'master' into xsn/private_batch_api
2025-03-01 12:13:03 +01:00
Vivian
2cc4a5e44a
webui : minor typo fixes (#12116)
...
* fix typos and improve menu text clarity
* rename variable trimedValue to trimmedValue
* add updated index.html.gz
* rebuild
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-03-01 11:15:09 +01:00
Alex Brooks
84d5f4bc19
Update granite vision docs for 3.2 model (#12105)
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-28 11:31:47 +00:00
Ting Lou
a800ae46da
llava : add struct for FFI bindgen (#12079)
...
* add struct for FFI bindgen
* Apply suggestions from code review
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-26 15:26:52 +01:00
Olivier Chafik
d7cfe1ffe0
docs: add docs/function-calling.md to lighten server/README.md's plight (#12069)
2025-02-25 18:52:56 +00:00
rhjdvsgsgks
401af80b54
server: handle echo=false on /v1/completions (#12060)
2025-02-25 12:52:52 +01:00
Olivier Chafik
0b52745649
server: support add_generation_prompt query param (#12062)
2025-02-25 10:40:22 +00:00
Alex Brooks
4d1051a40f
Add Doc for Converting Granite Vision -> GGUF (#12006)
...
* Add example docs for granite vision
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-25 10:46:05 +01:00
Alex Brooks
7a2c913e66
llava : Add Granite Vision Support (#11794)
...
* Add super wip scripts for multimodal granite gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add example for converting mmgranite to gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* remove hardcoded path
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add vision feature layer to gguf params
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Clean up llava surgery and remove name substitution hacks
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add transformers llava next tensor name mapping
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Make siglip / openclip mutually exclusive
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix projector linear substitution
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix linear 2 substitution index
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Increase max flattened gridpoints to 64
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix hardcoded concat for multiple feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Pull vision feature layers out of gguf keys
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* fix num gridpoints and use all layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Avoid dropping last image encoder layer in llava models
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use 10 for max number of patches
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Standardize vision feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Cleanup logs
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update comment for vision feature layer init
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update notes for alternative to legacy llm conversion script
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix notes rendering
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add v prefix to vision feature layer log
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use current defaults for feature layer
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use constant for max gridpoints / feat layers, style fixes
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* clarify non-negative feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Remove CLIP_API from func signature
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use MAX_IMAGE_FEATURE_LAYERS const in layer calc
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Clarify feature layers are non-negative ints and not uint
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix condition for reading feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* pop last llava layer when feature layers are unset
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix unset vision layer 0
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update examples/llava/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Reenable assertion for out of bounds get_rows
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use std vector for gridpoints and feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Calculate max feature layer at load time
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Include base patch for granite vision allocation
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix trailing whitespace
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add max num patches = 10 back for minicpmv
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use unordered set to store feature layers
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use max feature layer for postnorm
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Apply suggestions from code review
---------
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-24 17:09:51 +01:00
Xuan Son Nguyen
a1b1dea33b
Merge branch 'master' into xsn/private_batch_api
2025-02-24 17:01:30 +01:00
Xuan Son Nguyen
4bf7ca3943
llama_decode_ext
2025-02-24 17:01:20 +01:00