Georgi Gerganov
ed3290ab34
metal : add mean kernel (#14267)
* metal : add mean kernel
ggml-ci
* cont : dedup implementation
ggml-ci
b5701
2025-06-19 08:05:21 +03:00
Aaron Teo
8d94713654
docs: add s390x build documentation (#14264)
* docs: add s390x-specific build docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add s390x model conversion steps
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x build indent
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update hyperlinks for s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update llama.h docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x add accelerator and perf optimizations
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x indent blocks
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: revert block indentation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add support information for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: remove indentation for accelerator section s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: remove redundant words s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: reword for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x reword simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: fix trailing whitespace for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-18 18:10:26 +01:00
Aaron Teo
50d2227953
ggml-cpu: reduce asm calls for hsum (#14037)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
b5699
2025-06-18 18:10:08 +01:00
Aaron Teo
6231c5cd6d
ggml-cpu: fix uncaught underscore terminators (#14023)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
b5698
2025-06-18 18:06:49 +01:00
Charles Xu
ef035803eb
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258)
b5697
2025-06-18 12:40:07 +01:00
Xuan-Son Nguyen
413977de32
mtmd : refactor llava-uhd preprocessing logic (#14247)
* mtmd : refactor llava-uhd preprocessing logic
* fix editorconfig
b5696
2025-06-18 10:43:57 +02:00
Xuan-Son Nguyen
95402553a5
llama-chat : fix multiple system messages for gemma, orion (#14246)
b5695
2025-06-18 09:58:43 +02:00
Sigbjørn Skjæret
3865cff4f5
convert : fix null head_dim AutoConfig regression (#14248)
2025-06-18 09:52:07 +02:00
Georgi Gerganov
d03172cc79
sync : ggml
ggml-ci
b5693
2025-06-18 09:59:21 +03:00
Daniel Bevenius
dd8e59f443
ggml : disable warnings for tests when using MSVC (ggml/1273)
* ggml : disable warnings for tests when using MSVC
This commit disables warnings for tests on Windows when using MSVC.
The motivation is that this brings the build output more in line with
what Linux/macOS systems produce.
There is still one warning generated for the tests, which is:
```console
Building Custom Rule C:/ggml/tests/CMakeLists.txt
cl : command line warning D9025: overriding '/DNDEBUG' with '/UNDEBUG'
[C:\ggml\build\tests\test-arange.vcxproj]
test-arange.cpp
test-arange.vcxproj -> C:\ggml\build\bin\Release\test-arange.exe
```
* ggml : fix typo in tests disable list
2025-06-18 09:59:21 +03:00
Daniel Bevenius
bbe98d2784
ggml : remove unused ggml_context_container (ggml/1272)
This commit removes the unused `ggml_context_container` structure from
the ggml library. It looks like the usage of this struct was removed in
Commit 4757fe18d56ec11bf9c07feaca6e9d5b5357e7f4 ("ggml : alloc
ggml_contexts on the heap (whisper/2525)").
The motivation for this change is to improve code clarity/readability.
2025-06-18 09:59:21 +03:00
Daniel Bevenius
c2056ed6d4
examples : include examples in msvc disable warn (ggml/1270)
This commit adds the examples to the "list" of targets for which MSVC
warnings are ignored.
The motivation for this is that currently the examples generate a number
of warnings that are ignored/disabled for the core ggml project. This
makes for cleaner output when building.
2025-06-18 09:59:21 +03:00
bandoti
c46503014d
cmake: remove shader-gen step-targets from ggml-vulkan (#14226)
* Remove step-targets from vulkan-shaders-gen
* Unset DESTDIR when building vulkan-shaders-gen
b5689
2025-06-17 22:33:25 +02:00
xctan
860a9e4eef
ggml-cpu : remove the weak alias trick (#14221)
b5688
2025-06-17 12:58:32 +03:00
R0CKSTAR
fe9d60e74a
musa: fix build warning (unused variable) (#14231)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b5687
2025-06-17 17:48:08 +08:00
Sigbjørn Skjæret
e434e69183
common : suggest --jinja when autodetection fails (#14222)
b5686
2025-06-16 21:58:42 +02:00
Georgi Gerganov
89fea80d29
server : fix incorrect usage of llama_get_embeddings() (#14225)
* server : fix incorrect usage of llama_get_embeddings()
ggml-ci
* cont : fix the fix
ggml-ci
b5685
2025-06-16 22:33:27 +03:00
Diego Devesa
6adc3c3ebc
llama : add thread safety test (#14035)
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0, GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b5684
2025-06-16 08:11:43 -07:00
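A minimal sketch of the LLAMA_SPLIT_MODE_NONE device-selection rule described in the commit above; the enum and helper names here are illustrative stand-ins, not the actual llama.cpp internals:
```cpp
#include <vector>

// Hypothetical stand-ins for the real llama.cpp split-mode types.
enum split_mode { SPLIT_MODE_NONE, SPLIT_MODE_LAYER, SPLIT_MODE_ROW };

// With SPLIT_MODE_NONE, a negative main_gpu selects no GPU devices at all
// (CPU-only execution); otherwise exactly one GPU is used.
std::vector<int> select_devices(split_mode mode, int main_gpu, int n_gpus) {
    std::vector<int> devices;
    if (mode == SPLIT_MODE_NONE) {
        if (main_gpu >= 0 && main_gpu < n_gpus) {
            devices.push_back(main_gpu);
        }
    } else {
        for (int i = 0; i < n_gpus; ++i) {
            devices.push_back(i);
        }
    }
    return devices;
}
```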
bandoti
0dbcabde8c
cmake: clean up external project logic for vulkan-shaders-gen (#14179)
* Remove install step for vulkan-shaders-gen
* Add install step to normalize msvc with make
* Regenerate modified shaders at build-time
b5683
2025-06-16 10:32:13 -03:00
Đinh Trọng Huy
ad590be98c
model : add NeoBERT (#14164)
* convert neobert model to gguf
* add inference graph
* fix flake8 lint
* followed reviewer suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* follow reviewers' suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* override NeoBERT feed-forward length
---------
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b5682
2025-06-16 14:53:41 +02:00
uvos
7d6d91babf
HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202)
b5681
2025-06-16 13:47:38 +02:00
Georgi Gerganov
d3e64b9f49
llama : rework embeddings logic (#14208)
* llama : rework embeddings logic
ggml-ci
* cont : fix rerank
ggml-ci
* cont : engrish [no ci]
* cont : fix rerank
ggml-ci
* server : support both embeddings and completions with single model
ggml-ci
* cont : avoid embeddings_org
ggml-ci
2025-06-16 14:14:00 +03:00
Charles Xu
3ba0d843c6
ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)
b5679
2025-06-16 11:47:57 +02:00
Bartowski
0bf49eb668
convert : remove arcee change in convert_hf_to_gguf_update.py (#14207)
2025-06-16 10:16:06 +02:00
Đinh Trọng Huy
4ad243677b
gguf-py : allow key override when adding value to GGUFWriter (#14194)
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-06-16 09:20:59 +02:00
Jeff Bolz
c89c2d1ab9
vulkan: mutex around vkQueueSubmit (#14127)
This fixes the remaining crash in test-thread-safety on my system.
b5676
2025-06-16 08:21:08 +02:00
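The Vulkan spec requires callers to externally synchronize access to a VkQueue, which is what the mutex in the commit above provides. A minimal sketch of the pattern, with illustrative names rather than the actual ggml-vulkan symbols:
```cpp
#include <mutex>
#include <vulkan/vulkan.h>

// One mutex per queue would also work; a single global lock is the simplest
// way to keep concurrent contexts from submitting to the same VkQueue at once.
static std::mutex g_queue_mutex;

VkResult locked_queue_submit(VkQueue queue, uint32_t submit_count,
                             const VkSubmitInfo * submits, VkFence fence) {
    std::lock_guard<std::mutex> lock(g_queue_mutex); // serialize submissions
    return vkQueueSubmit(queue, submit_count, submits, fence);
}
```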
xctan
3555b3004b
ggml-cpu : rework weak alias on apple targets (#14146)
* ggml-cpu : rework weak alias on apple targets
* fix powerpc detection
* fix ppc detection
* fix powerpc detection on darwin
b5675
2025-06-16 13:54:15 +08:00
Bartowski
d7da8dc83a
model : Add support for Arcee AI's upcoming AFM model (#14185)
* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Remove accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
b5674
2025-06-16 01:04:06 +02:00
Eric Curtin
cd355eda7d
server : When listening on a unix domain socket, don't print http:// and port (#14180)
Instead show something like this:
main: server is listening on file.sock - starting the main loop
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b5673
2025-06-15 23:36:22 +02:00
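A rough sketch of the logging change described above, assuming a flag that records whether the listen address is a unix domain socket; the function and variable names are hypothetical, not the real server code:
```cpp
#include <cstdio>
#include <string>

void print_listening(const std::string & addr, int port, bool is_unix_socket) {
    if (is_unix_socket) {
        // a socket path has no scheme or port worth showing
        std::printf("main: server is listening on %s - starting the main loop\n",
                    addr.c_str());
    } else {
        std::printf("main: server is listening on http://%s:%d - starting the main loop\n",
                    addr.c_str(), port);
    }
}
```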
Ed Addario
30e5b01de2
quantize : change int to unsigned int for KV overrides (#14197)
b5672
2025-06-15 18:53:45 +02:00
uvos
e54b394082
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196)
b5671
2025-06-15 17:30:13 +02:00
uvos
2c2caa4443
HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183)
b5670
2025-06-15 15:45:27 +02:00
Georgi Gerganov
5fce5f948d
kv-cache : fix use-after-move of defrag info (#14189)
ggml-ci
b5669
2025-06-15 10:52:11 +03:00
Mikko Juola
9ae4143bc6
model : add dots.llm1 architecture support (#14044) (#14118)
Adds:
* Dots1Model to convert_hf_to_gguf.py
* Computation graph code to llama-model.cpp
* Chat template to llama-chat.cpp to detect this model's template.
---
The model architecture is called "dots.llm1" (I decided to shorten it to
dots1 or DOTS1 in the code generally).
The only models that exist as of this commit's writing and follow this
architecture are "dots.llm1.inst" and "dots.llm1.base", from here:
* https://huggingface.co/rednote-hilab/dots.llm1.inst
* https://huggingface.co/rednote-hilab/dots.llm1.base
The model architecture is a combination of Qwen and Deepseek parts, as
seen here:
ffe12627b4/src/transformers/models/dots1/modular_dots1.py
b5668
2025-06-15 09:52:06 +02:00
Georgi Gerganov
c311ac664d
cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)
ggml-ci
b5667
2025-06-15 10:08:58 +03:00
Georgi Gerganov
b9912ac570
batch : auto-gen positions + verify multi-sequence input (#14177)
* batch : verify multi-sequence input batches
ggml-ci
* cont : auto-gen positions + verify multi-seq input
ggml-ci
* cont : first print debug info, then perform validation
ggml-ci
* cont : fix position auto-gen + add comments
ggml-ci
b5666
2025-06-15 09:18:37 +03:00
Pepijn de Vos
00ba772610
docs : remove WIP since PR has been merged (#13912)
2025-06-15 08:06:37 +02:00
Piotr
3cb203c89f
llama-chat : Do not throw when tool parsing fails (#14012)
Currently, when a model generates output that looks like a tool call
but is invalid, an exception is thrown and not handled, causing the CLI
or llama-server to bail. Instead, handle the chat parser exception and
simply return the generated text in such cases.
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
b5664
2025-06-14 17:25:15 +01:00
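The fallback described above, as a self-contained C++ sketch; parse_tool_calls here is a stand-in that always throws, simulating the chat parser on malformed output (the actual llama.cpp function names differ):
```cpp
#include <iostream>
#include <stdexcept>
#include <string>

struct chat_msg { std::string content; };

// Stand-in for the real parser: throws when the output is not a valid tool call.
chat_msg parse_tool_calls(const std::string & /*output*/) {
    throw std::runtime_error("malformed tool call");
}

// The fix: catch the parser exception and return the generated text as-is
// instead of letting it propagate and take down the CLI or server.
chat_msg parse_or_passthrough(const std::string & output) {
    try {
        return parse_tool_calls(output);
    } catch (const std::exception &) {
        return chat_msg{output};
    }
}

int main() {
    std::cout << parse_or_passthrough("<tool_call>{not json").content << "\n";
}
```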
Aman Gupta
2e42be42bd
compare-llama-bench: add option to plot (#14169)
* compare llama-bench: add option to plot
* Address review comments: convert case + add type hints
* Add matplotlib to requirements
* fix tests
* Improve comment and fix assert condition for test
* Add back default test_name, add --plot_log_scale
* use log_scale regardless of x_values
2025-06-14 10:34:20 +02:00
Georgi Gerganov
fb85a288d7
vocab : fix build (#14175)
ggml-ci
b5662
2025-06-13 20:03:05 +03:00
Svetlozar Georgiev
40643edb86
sycl: fix docker image (#14144)
2025-06-13 18:32:56 +02:00
Guy Goldenberg
3cfbbdb44e
Merge commit from fork
* vocab : prevent integer overflow during load
* Add static cast and GGML_ABORT
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-13 19:20:25 +03:00
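A sketch of the kind of guard this fix describes: validate a count read from an untrusted file before narrowing it to a 32-bit index, aborting on overflow. GGML_ABORT is stubbed locally here, and none of this is the actual loader code:
```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <limits>

// Local stand-in for ggml's GGML_ABORT macro.
#define GGML_ABORT(msg) do { std::fprintf(stderr, "%s\n", msg); std::abort(); } while (0)

// Validate before narrowing: a size read from an untrusted file must not
// silently wrap when cast down to a 32-bit index.
int32_t checked_vocab_size(uint64_t n_from_file) {
    if (n_from_file > static_cast<uint64_t>(std::numeric_limits<int32_t>::max())) {
        GGML_ABORT("vocab size overflows int32_t");
    }
    return static_cast<int32_t>(n_from_file);
}
```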
Georgi Gerganov
80709b70a2
batch : add LLAMA_BATCH_DEBUG environment variable (#14172)
* batch : add LLAMA_BATCH_DEBUG environment variable
ggml-ci
* cont : improve seq_id display
b5659
2025-06-13 18:35:00 +03:00
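A minimal sketch of an environment-variable debug switch like the one added here; the variable name comes from the commit, but the helper and its semantics are illustrative assumptions:
```cpp
#include <cstdio>
#include <cstdlib>

int batch_debug_level() {
    const char * v = std::getenv("LLAMA_BATCH_DEBUG");
    return v != nullptr ? std::atoi(v) : 0; // unset -> debug output off
}

int main() {
    if (batch_debug_level() > 0) {
        std::fprintf(stderr, "batch debug: printing batch contents\n");
    }
}
```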
ddpasa
26ff3685bf
docs : Update multimodal.md (#14122)
* Update multimodal.md
* Update multimodal.md
2025-06-13 15:17:53 +02:00
Georgi Gerganov
60c666347b
batch : rework llama_batch_allocr (#14153)
* batch : rework llama_batch_allocr
ggml-ci
* cont : move validation inside class
ggml-ci
* cont : move output counting to class
ggml-ci
* cont : minor
ggml-ci
* batch : add TODOs
ggml-ci
b5657
2025-06-13 13:47:55 +03:00
Georgi Gerganov
b7cc7745e3
readme : remove survey link (#14168)
2025-06-13 11:55:44 +03:00
Christian Kastner
cc8d081879
cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)
* cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT
* cmake: Pass on LLAMA_BUILD_* to GGML_BUILD_*
b5655
2025-06-13 10:38:52 +02:00
Đinh Trọng Huy
d714dadb57
pooling : make cls_b and cls_out_b optional (#14165)
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
b5654
2025-06-13 11:34:08 +03:00
Georgi Gerganov
ffad043973
server : fix SWA condition for full context reprocess (#14163)
ggml-ci
b5653
2025-06-13 11:18:25 +03:00
Anton Mitkov
0889eba570
sycl: Adding additional cpy dbg print output (#14034)
b5652
2025-06-13 08:51:39 +01:00