Georgi Gerganov
a5771c9eea
ops : update BLAS ( #14914 )
2025-07-28 10:01:03 +02:00
Georgi Gerganov
c35f9eaf09
ops : update Metal ( #14912 )
2025-07-28 08:22:56 +03:00
Ruben Ortlam
bf78f5439e
vulkan: add ops docs ( #14900 )
2025-07-27 15:33:08 +02:00
Akarshan Biswas
bbfc849274
SYCL: add ops doc ( #14901 )
2025-07-27 17:52:58 +05:30
Aman Gupta
446595b9b3
Docs: add instructions for adding backends ( #14889 )
2025-07-27 09:36:43 +08:00
Aaron Teo
c7f3169cd5
ggml-cpu : disable GGML_NNPA by default due to instability ( #14880 )
...
* docs: update s390x document for sentencepiece
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit e086c5e3a7)
* docs: update huggingface links + reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 8410b085ea)
* ggml-cpu: disable ggml-nnpa compile flag by default
fixes #14877
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 412f4c7c88)
* docs: update s390x build docs to reflect nnpa disable
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit c1eeae1d0c)
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-07-25 19:09:03 +02:00
wooksong
e7fecba934
docs : update HOWTO-add-model.md for ModelBase and new model classes ( #14874 )
...
This patch updates the example in docs/development/HOWTO-add-model.md to
reflect recent changes after `TextModel` and `MmprojModel` were introduced.
It replaces the outdated `Model` base class with `TextModel` or `MmprojModel`
and updates the registration example accordingly.
Signed-off-by: Wook Song <wook16.song@samsung.com >
2025-07-25 16:25:05 +02:00
R0CKSTAR
3f4fc97f1d
musa: upgrade musa sdk to rc4.2.0 ( #14498 )
...
* musa: apply mublas API changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: update musa version to 4.2.0
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore MUSA graph settings in CMakeLists.txt
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: disable mudnnMemcpyAsync by default
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: switch back to non-mudnn images
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* minor changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore rc in docker image tag
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
2025-07-24 20:05:37 +01:00
Pouya
39cffdf188
docs: add libcurl-dev install hint for Linux distros ( #14801 )
...
* docs: add libcurl-dev install hint for Linux distros
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com >
* Update docs/build.md
---------
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-07-24 11:26:44 +02:00
rspOverflow
b526ad2668
Documentation: Further revisions to the Vulkan section in build.md ( #14785 )
...
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.
* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md
2025-07-20 18:55:32 +02:00
rspOverflow
f0d4d176df
Documentation: Update build.md's Vulkan section ( #14736 )
...
* Documentation: Rewrote and updated the "Without docker" portion of the Vulkan backend build documentation.
* Documentation: Reorganize build.md's Vulkan section.
2025-07-19 12:18:36 +02:00
Reese Levine
21c021745d
ggml: Add initial WebGPU backend ( #14521 )
...
* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults
* Initialize webgpu device
* Making progress on setting up the backend
* Finish more boilerplate/utility functions
* Organize file and work on alloc buffer
* Add webgpu_context to prepare for actually running some shaders
* Work on memset and add shader loading
* Work on memset polyfill
* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it
* Implement get_tensor and buffer_clear
* Finish rest of setup
* Start work on compute graph
* Basic mat mul working
* Work on emscripten build
* Basic WebGPU backend instructions
* Use EMSCRIPTEN flag
* Work on passing ci, implement 4d tensor multiplication
* Pass thread safety test
* Implement permuting for mul_mat and cpy
* minor cleanups
* Address feedback
* Remove division by type size in cpy op
* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends
* Fix name
* Fix macos dawn prefix path
2025-07-16 18:18:51 +03:00
Aman Gupta
11ee0fea2a
Docs: script to auto-generate ggml operations docs ( #14598 )
...
* Docs: script to auto-generate ggml operations docs
* Review: formatting changes + change github action
* Use built-in types instead of typing
* docs : add BLAS and Metal ops
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-07-10 23:29:01 +08:00
Xuan-Son Nguyen
08382869a2
model : add SmolLM3 ( #14581 )
...
* Init - first pass.
* Model -> ModelBase.
* fix errors in conversion.
* Update the graph.
* up.
* up.
* wip
* cgraph ok
* rm redundant code
---------
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com >
2025-07-08 18:07:01 +02:00
Grzegorz Grasza
1b2aaf28ac
Add Vulkan images to docker.md ( #14472 )
...
Right now it's not easy to find those.
2025-07-01 15:44:11 +02:00
Aaron Teo
bf5bcd0b85
docs: update s390x documentation + add faq ( #14389 )
...
* docs: update s390x documentation + add faq
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: add s390x z17 build q&a
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-06-26 12:41:41 +02:00
Aaron Teo
60ef23d6c1
ggml-cpu: enable IBM NNPA Vector Intrinsics ( #14317 )
...
* ggml-cpu: add nnpa compile flag
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 4a9f60c201)
* ggml-cpu: add fp16->fp32 nnpa first
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 8d4a7987f9)
* ggml-cpu: add fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 0ff0d65162)
* ggml-cpu: better variable names
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 2f58bbcbb8)
* docs: update s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 01b929491b)
* ggml-cpu: add debugging prints to see if dlf16 is correct
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix print vs printf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix float placeholder
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: ensure fp16 and fp32 load and stores are called
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fp16 load ensured to hit
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove sigint from fp16 store
for some reason, the function is not getting a hit when debugged with
gdb. we will need to investigate further
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: nnpa switch to vec_xst test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to vec_xst for 4 element loops also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: rework noop
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove noop, general code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: clarify variable naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add breakpoint for debugging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: test fix for conversion failure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: disable fp32->fp16 nnpa conversions for now
there are some conversion failures in nnpa that require the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to elif macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix typo
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix compiler types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: change to typedef vector types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add 4 element loops for fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: clarified vector naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bring back fp32->fp16 store nnpa
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add nnpa macro check in ggml-impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add missing __func__
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: diagnose why __NNPA__ macro is not being defined
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: import vecintrin.h to fix compiler errors
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: update macro tests
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 157f856c34.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to importing ggml-cpu-impl instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix macro declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: test more macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add debug prints
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bruteforce macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add ggml-impl.h to cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to private macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 157f856c34)
* ggml-cpu: move things around
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bring back compile macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to quotes for import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add compiler error macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add s390x detection in ggml-src
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bring back compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: undo cmakelists work
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 18d79e1a30.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove typedefs.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove typedef from cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add ggml-impl.h future notes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add todo comment for future reference
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: clarify naming of dlf16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove unnecessary target compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update broken huggingface link for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix duplicate func names during compile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: fix duplicate func names during compile"
This reverts commit fbb733451f.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu"
This reverts commit bd288e8fa5.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: refactor fp16<->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix missing simd-mappings.h import in quants.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix missing simd-mappings.h within repack
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix amx mmq missing simd-mappings.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: attempt at fixing loongarch failing build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move nnpa together with other fp16<->fp32 simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix wrong refactor of ggml-base
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: remove dependency on ggml-cpu from ggml-base
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove mistaken fallback macro
fallback logic was already implemented but i was too sleepy to realise
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures"
This reverts commit 32a3533564.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml: move ggml_table_f32_f16 to ggml-cpu"
This reverts commit 9e40d984ad.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 9e40d984ad)
* ggml: move ggml_table_f32_f16 to ggml-cpu.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: extern c ggml_table_f32_f16 + chore docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h
we rely on the variable declaration in ggml-cpu.c instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h"
This reverts commit f71b21d2f7.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bring back ggml_table_f32_f16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: bring back ggml_table_f32_f16"
This reverts commit 2dce119178.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* fix ggml time initialization
* fix f32_f16 table init
* remove extra line
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
Co-authored-by: slaren <slarengh@gmail.com >
2025-06-25 23:49:04 +02:00
Anton Mitkov
2bf9d539dd
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices ( #13973 )
2025-06-25 18:09:55 +02:00
David Chiu
d860dd99a4
docs : fix the link to llama.h ( #14293 )
2025-06-20 19:43:35 +02:00
Aaron Teo
8d94713654
docs: add s390x build documentation ( #14264 )
...
* docs: add s390x-specific build docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: add s390x model conversion steps
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x build indent
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update hyperlinks for s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update llama.h docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x add accelerator and perf optimizations
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x indent blocks
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: revert block indentation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: add support information for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: remove indentation for accelerator section s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: remove redundant words s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: reword for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x reword simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: fix trailing whitespace for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-06-18 18:10:26 +01:00
Pepijn de Vos
00ba772610
docs : remove WIP since PR has been merged ( #13912 )
2025-06-15 08:06:37 +02:00
ddpasa
26ff3685bf
docs : Update multimodal.md ( #14122 )
...
* Update multimodal.md
* Update multimodal.md
2025-06-13 15:17:53 +02:00
Xinpeng Dou
e21d2d4ae2
CANN: Simplify the environment variable setting ( #13104 )
...
* Simplify the environment variable setting to specify the memory pool type.
* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options (a parsing sketch follows this entry).
* update
* fix CI
* update
* delete whitespace
* fix according to review
* update CANN.md
* update CANN.md
2025-06-09 19:47:39 +08:00
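A minimal standalone sketch of the case-insensitive, truthy-value parsing described in the entry above. The helper name and the exact set of accepted values are assumptions for illustration; this is not the CANN backend's actual code.

```cpp
// Sketch only: parse a truthy environment variable the way this commit
// describes ("yes", "enable", "1" or "on", case-insensitive). The helper
// name is hypothetical and this is not the actual CANN backend code.
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

static bool env_flag_enabled(const char * name) {
    const char * raw = std::getenv(name);
    if (raw == nullptr) {
        return false; // unset -> treated as disabled
    }
    std::string val(raw);
    std::transform(val.begin(), val.end(), val.begin(),
                   [](unsigned char c) { return (char) std::tolower(c); });
    return val == "yes" || val == "enable" || val == "1" || val == "on";
}

int main() {
    // e.g. GGML_CANN_ASYNC_MODE=On and GGML_CANN_ASYNC_MODE=enable would both enable it
    return env_flag_enabled("GGML_CANN_ASYNC_MODE") ? 0 : 1;
}
```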
Xuan-Son Nguyen
ea1431b0fa
docs : add "Quick start" section for new users ( #13862 )
...
* docs : add "Quick start" section for non-technical users
* rm flox
* Update README.md
2025-06-03 13:09:36 +02:00
Jiří Podivín
b3a89c3d9e
docs : Note about necessity of having libcurl installed for standard build. ( #13945 )
...
Signed-off-by: Jiri Podivin <jpodivin@gmail.com >
2025-05-31 18:58:35 +02:00
Xuan-Son Nguyen
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) ( #13784 )
...
* mtmd : allow multiple modalities at the same time
* refactor mtmd tokenizer
* fix compile
* ok, missing SinusoidsPositionEmbedding
* first working version
* fix style
* more strict validate of n_embd
* refactor if..else to switch
* fix regression
* add test for 3B
* update docs
* fix tokenizing with add_special
* add more tests
* fix test case "huge"
* rm redundant code
* set_position_mrope_1d rm n_tokens
2025-05-27 14:06:10 +02:00
bandoti
72b090da2c
docs: remove link for llama-cli function calling ( #13810 )
2025-05-27 08:52:40 -03:00
Bizhao Shi
2d38b6e400
CANN: Add the basic supports of Flash Attention kernel ( #13627 )
...
* cann: add the basic FA support
* cann: update the readme
* cann: update the FlashAttention with PSEShift
* cann: update the input parameters in FA
* cann: update the alibi with max_bias
* cann: add the constraints of softcap
* cann: update the docs CANN.md
* cann: update the docs CANN.md
* cann: fix typo of CANN.md
* cann: add some comments and update the CANN.md
* cann: update the CANN.md
* cann: update the inner precise for fusedInferAttention
* cann: update the constraints of flash_attn_ext on ggml-cann.cpp
* cann: clean the whitespace
* cann: clean the whitespace
* cann: add a new endline
2025-05-26 10:20:18 +08:00
Xuan-Son Nguyen
40aaa8a403
mtmd : add support for Qwen2-Audio and SeaLLM-Audio ( #13760 )
...
* mtmd : add Qwen2-Audio support
* small clean up
* update discussion link
* clarify mtmd_get_output_embd
* clarification in multimodal.md
* fix ultravox bug
* ggml_cont
2025-05-25 14:06:32 +02:00
ddpasa
a08c1d2845
docs : add Moondream2 pre-quantized link ( #13745 )
...
* Multimodal: Added Moondream2 model and fixed ggml.org link
* Apply suggestions from code review
---------
Co-authored-by: name <none@none.com >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-05-25 14:04:49 +02:00
Olivier Chafik
f5cd27b71d
server : streaming of tool calls and thoughts when --jinja is on ( #12379 )
...
* add common_json w/ support for truncated json healing (a sketch of the healing idea follows this entry)
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com >
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com >
2025-05-25 01:48:08 +01:00
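A minimal sketch of the "truncated json healing" idea mentioned in this entry, assuming the simple strategy of appending the closers needed for any unterminated strings and containers. It is illustrative only and is not llama.cpp's common_json implementation; cases such as a key with no value yet are not handled.

```cpp
// Sketch of healing a truncated JSON prefix (e.g. from a partially streamed
// tool call) by closing unterminated strings, arrays and objects. Not the
// actual common_json code; dangling keys and trailing commas are not handled.
#include <iostream>
#include <string>
#include <vector>

static std::string heal_truncated_json(const std::string & prefix) {
    std::vector<char> closers;  // stack of pending ']' / '}'
    bool in_string = false;
    bool escaped   = false;

    for (char c : prefix) {
        if (in_string) {
            if (escaped)        { escaped = false; }
            else if (c == '\\') { escaped = true;  }
            else if (c == '"')  { in_string = false; }
            continue;
        }
        if (c == '"')      { in_string = true; }
        else if (c == '{') { closers.push_back('}'); }
        else if (c == '[') { closers.push_back(']'); }
        else if (c == '}' || c == ']') {
            if (!closers.empty()) { closers.pop_back(); }
        }
    }

    std::string healed = prefix;
    if (in_string) { healed += '"'; }
    while (!closers.empty()) {
        healed += closers.back();
        closers.pop_back();
    }
    return healed;
}

int main() {
    // a partially streamed tool-call argument object
    std::cout << heal_truncated_json(R"({"name": "get_weather", "arguments": {"city": "Par)") << "\n";
    // prints: {"name": "get_weather", "arguments": {"city": "Par"}}
    return 0;
}
```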
Xuan-Son Nguyen
797990c4bc
mtmd : add ultravox audio input ( #13623 )
...
* convert ok, load ok
* warmup ok
* test
* still does not work?
* fix padding
* temporary give up
* fix merge conflict
* build_ultravox()
* rm test
* fix merge conflict
* add necessary mtmd APIs
* first working version (only 4s of audio)
* will this monster compile?
* fix compile
* please compile
* fPIC
* fix windows
* various fixes
* clean up audio_helpers
* fix conversion
* add some debug stuff
* long audio input ok
* adapt the api
* add --audio arg
* final touch UX
* add miniaudio to readme
* fix typo
* refactor kv metadata
* mtmd_default_marker()
2025-05-22 20:42:48 +02:00
R0CKSTAR
33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy ( #13647 )
...
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de >
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
Co-authored-by: Johannes Gäßler <johannesg@5d6.de >
2025-05-21 09:58:49 +08:00
Xinpeng Dou
f0adb80bf7
CANN: Update CANN model support ( #13162 )
...
* Update CANN model support status
* Update of model support
* update
* update
* update
* fix format of CANN.md
* fix format of CANN.md
* fix format of CANN.md
2025-05-20 11:43:43 +08:00
Alberto Cabrera Pérez
725f23f1f3
sycl : backend documentation review ( #13544 )
...
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
2025-05-19 14:38:20 +01:00
Xuan-Son Nguyen
92ecdcc06a
mtmd : add vision support for llama 4 ( #13282 )
...
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
2025-05-19 13:04:14 +02:00
Łukasz Ślusarczyk
9c404ed54c
sycl: use oneDNN for matrices multiplication ( #12972 )
2025-05-15 16:53:41 +02:00
ddpasa
21ca987fba
docs: Update link to ggml-org in multimodal.md ( #13513 )
...
* Update multimodal.md
Minor change to include the huggingface link
* Update docs/multimodal.md
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-05-14 09:59:12 +02:00
Thomas Germer
62d4250e52
docs : Fix typo in InternVL3 model name ( #13440 )
2025-05-10 22:26:46 +02:00
Xuan-Son Nguyen
3b24d26c22
server : update docs ( #13432 )
2025-05-10 18:44:49 +02:00
Xuan-Son Nguyen
053367d149
mtmd : support InternVL 2.5 and 3 ( #13422 )
...
* convert : internvl support
* InternVL3-1B working
* fix regression
* rm mobilevlm from test
* fix conversion
* add test for internvl
* add to list of pre-quant
* restore boi/eoi check
* add clarify comment for norm eps
2025-05-10 16:26:42 +02:00
Xuan-Son Nguyen
33eff40240
server : vision support via libmtmd ( #12898 )
...
* server : (experimental) vision support via libmtmd
* mtmd : add more api around mtmd_image_tokens
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-define ID (fixed)
* abstract out the batch management
* small fix
* refactor logic adding tokens to batch
* implement hashing image
* use FNV hash, now hash bitmap instead of file data (see the FNV sketch after this entry)
* allow decoding image embedding to be split into batches
* rm whitespace
* disable some features when mtmd is on
* fix --no-mmproj-offload
* mtmd_context_params no timings
* refactor server_inp to server_tokens
* fix the failing test case
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* improve server_input struct
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
* fix detokenize
* add const to various places
* add warning about breaking changes
* add c api
* helper: use mtmd_image_tokens_get_n_pos
* fix ctx_shift
* fix name shadowing
* more strict condition
* support remote image_url
* remote image_url log
* add CI test
* do not log base64
* add "has_multimodal" to /props
* remove dangling image
* speculative: use slot.cache_tokens.insert
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* rm can_be_detokenized
* on prompt processing done, assert cache_tokens.size
* handle_completions_impl returns void
* adapt the new web ui
* update docs and hot topics
* rm assert
* small fix (2)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-05-09 19:29:37 +02:00
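A minimal sketch of the FNV hashing mentioned in this entry: a generic 64-bit FNV-1a over raw (decoded) bitmap bytes rather than the encoded file contents. The function is illustrative and is not the server's actual helper.

```cpp
// Sketch only: generic 64-bit FNV-1a hash over raw pixel bytes, illustrating
// the "hash bitmap instead of file data" bullet above. Not llama.cpp code.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

static uint64_t fnv1a_64(const uint8_t * data, size_t n) {
    uint64_t hash = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    for (size_t i = 0; i < n; ++i) {
        hash ^= data[i];
        hash *= 0x100000001b3ULL;           // FNV-1a 64-bit prime
    }
    return hash;
}

int main() {
    // hash decoded pixel data (here a tiny RGB buffer) instead of the encoded file
    std::vector<uint8_t> pixels = {255, 0, 0, 0, 255, 0, 0, 0, 255};
    std::printf("bitmap hash: %016llx\n",
                (unsigned long long) fnv1a_64(pixels.data(), pixels.size()));
    return 0;
}
```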
Xuan-Son Nguyen
9b61acf060
mtmd : rename llava directory to mtmd ( #13311 )
...
* mv llava to mtmd
* change ref everywhere
2025-05-05 16:02:55 +02:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory ( #13249 )
...
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
2025-05-02 20:27:13 +02:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B ( #13065 )
...
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
2025-04-23 20:21:59 +02:00
Xuan-Son Nguyen
243453533e
llava : update documentations ( #13055 )
...
* llava : update documentations
* fix typo
2025-04-22 10:37:00 +02:00
David Huang
84778e9770
CUDA/HIP: Share the same unified memory allocation logic. ( #12934 )
...
Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs (a minimal sketch of the runtime check follows this entry).
2025-04-15 11:20:38 +02:00
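A minimal sketch of how a runtime environment-variable check replacing a compile-time flag could look. How the value is interpreted here is an assumption, and the real ggml-cuda/HIP allocation path is not reproduced.

```cpp
// Sketch only: runtime check of the environment variable named in the commit,
// replacing a compile-time #ifdef. The value handling is an assumption and
// the allocation branches are placeholders, not the real ggml-cuda code.
#include <cstdio>
#include <cstdlib>

static bool unified_memory_requested() {
    const char * v = std::getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY");
    return v != nullptr && *v != '\0';  // assumed: any non-empty value enables it
}

int main() {
    if (unified_memory_requested()) {
        std::printf("would allocate with unified/managed memory (e.g. cudaMallocManaged)\n");
    } else {
        std::printf("would allocate with plain device memory (e.g. cudaMalloc)\n");
    }
    return 0;
}
```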
Romain Biessy
8ed71242f4
sycl: update documentation to use -no-cnv ( #12845 )
2025-04-09 11:22:04 +02:00
Nicolò Scipione
94148ba330
sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution ( #12625 )
2025-04-04 16:00:46 +02:00
lhez
7d7b1bafa7
opencl: update doc for OpenCL ( #12702 )
...
* opencl: add OpenCL to build.md
* opencl: remove fixed issue/TODO
* opencl: add link to OPENCL.md
* opencl: update doc - refine tools requirement for Windows 11 arm64
2025-04-03 22:18:17 -07:00