lhez
ce111d39d6
opencl: add fused `rms_norm_mul` ( #14841 )
...
* opencl: add fused `rms_norm` + `mul`
* opencl: improve workgroup size for `rms_norm_mul`
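For context, here is a minimal sketch of what a fused `rms_norm` + `mul` computes, written in plain Python rather than OpenCL. It is not the kernel from this commit: the function names and the `eps` default are assumptions chosen for illustration. The unfused path normalizes a row and then multiplies it by a weight in a second pass, while the fused path applies both in one pass.

```python
import math

def rms_norm(row, eps=1e-6):
    # Scale the row by 1 / sqrt(mean(x^2) + eps); eps default is an assumption.
    mean_sq = sum(v * v for v in row) / len(row)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in row]

def mul(a, b):
    # Element-wise product of two equally sized rows.
    return [x * y for x, y in zip(a, b)]

def rms_norm_mul(row, weight, eps=1e-6):
    # Fused variant: normalize and multiply in a single pass,
    # without materializing the intermediate normalized row.
    mean_sq = sum(v * v for v in row) / len(row)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(row, weight)]
```

The fused result should match `mul(rms_norm(row), weight)` up to floating-point rounding, which is the property a backend test would check.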
b5992
2025-07-25 17:12:13 +02:00
wooksong
e7fecba934
docs : update HOWTO‑add‑model.md for ModelBase and new model classes ( #14874 )
...
This patch updates the example in docs/development/HOWTO-add-model.md to
reflect recent changes after `TextModel` and `MmprojModel` were introduced.
It replaces the outdated `Model` base class with `TextModel` or `MmprojModel`
and updates the registration example accordingly.
Signed-off-by: Wook Song <wook16.song@samsung.com >
2025-07-25 16:25:05 +02:00
Oliver Simons
e2b7621e7c
ggml : remove invalid portPos specifiers from dot files ( #14838 )
...
Neither "g" nor "x" are valid portPos specifiers per the official
[graphviz documents](https://graphviz.org/docs/attr-types/portPos/):
> If a compass point is used, it must have the form "n","ne","e","se","s","sw","w","nw","c","_".
I verified locally that Graphviz falls back to the default portPos
specifier when an invalid portPos is specified. As a consequence, we can
remove the associated code.
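As an illustration only, the rule quoted above can be expressed as a small membership check. The set and function names are hypothetical and not part of ggml or Graphviz; the sketch covers only the compass-point part of a portPos.

```python
# Compass points quoted above from the Graphviz portPos documentation.
VALID_COMPASS_POINTS = {"n", "ne", "e", "se", "s", "sw", "w", "nw", "c", "_"}

def is_valid_compass_point(spec: str) -> bool:
    """Return True if spec is one of the documented compass points."""
    return spec in VALID_COMPASS_POINTS
```

Under this check, both `"g"` and `"x"` are rejected, which matches the claim in the commit message.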
b5990
2025-07-25 14:29:57 +03:00
Georgi Gerganov
c1dbea752a
context : restore preemptive sched reset when LLAMA_SET_ROWS=0 ( #14870 )
...
ggml-ci
b5989
2025-07-25 14:28:06 +03:00
kiwi
749e0d27f0
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip ( #14503 )
...
* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip
* Update export-lora.cpp
* Update clip.cpp
* Update export-lora.cpp
* format: use space to replace tab
b5988
2025-07-25 13:08:04 +02:00
Chris Rohlf
64bf1c3744
rpc : check for null buffers in get/set/copy tensor endpoints ( #14868 )
b5987
2025-07-25 12:17:02 +02:00
Diego Devesa
c12bbde372
sched : fix multiple evaluations of the same graph with pipeline parallelism ( #14855 )
...
ggml-ci
b5986
2025-07-25 11:07:26 +03:00
R0CKSTAR
3f4fc97f1d
musa: upgrade musa sdk to rc4.2.0 ( #14498 )
...
* musa: apply mublas API changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: update musa version to 4.2.0
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore MUSA graph settings in CMakeLists.txt
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: disable mudnnMemcpyAsync by default
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: switch back to non-mudnn images
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* minor changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore rc in docker image tag
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
b5985
2025-07-24 20:05:37 +01:00
Georgi Gerganov
2df255da3c
sync : ggml
...
ggml-ci
b5984
2025-07-24 20:27:23 +03:00
Kai Pastor
60f816a79d
cmake : fix usage issues (ggml/1257)
...
* CMake config: Create target only once
Fix error on repeated find_package(ggml).
For simplicity, check only for the top-level ggml::ggml.
* CMake config: Add CUDA link libs
* CMake config: Add OpenCL link libs
* CMake config: Use canonical find_dependency
Use set and append to control link lib variables.
Apply more $<LINK_ONLY...>.
* CMake config: Wire OpenMP dependency
2025-07-24 20:27:23 +03:00
Daniel Bevenius
5592f278b6
ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
...
This commit removes the inclusion of `<cstdlib>`.
The motivation for this change is that this source file does not seem to
use any functions from this header and the comment about `qsort` is a
little misleading/confusing.
2025-07-24 20:27:23 +03:00
Georgi Gerganov
e4868d16d2
context : perform output reorder lazily upon access after sync ( #14853 )
...
* context : perform output reorder lazily upon access after sync
ggml-ci
* cont : add TODO
b5981
2025-07-24 16:31:48 +03:00
Xuan-Son Nguyen
820de57d4f
chat : fix kimi-k2 chat template ( #14852 )
b5980
2025-07-24 13:59:56 +02:00
Alberto Cabrera Pérez
cb4a63aad6
sycl: fixed semantics of block offset calculation ( #14814 )
b5979
2025-07-24 11:09:57 +01:00
yummy
86f5623d90
llama : fix MiniCPM inference after Granite Four changes ( #14850 )
...
MiniCPM models use the llm_build_granite constructor which was changed
in the Granite Four PR to use hparams.rope_finetuned instead of a
use_rope parameter. MiniCPM models need rope enabled by default.
This fixes inference, which previously produced gibberish instead of correct responses.
b5978
2025-07-24 11:50:51 +02:00
Pouya
39cffdf188
docs: add libcurl-dev install hint for Linux distros ( #14801 )
...
* docs: add libcurl-dev install hint for Linux distros
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com >
* Update docs/build.md
---------
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-07-24 11:26:44 +02:00
Georgi Gerganov
065908cb09
metal : fix fusion across different encoders ( #14849 )
...
* metal : fix fusion across different encoders
ggml-ci
* cont : add assertion
ggml-ci
b5976
2025-07-24 10:24:05 +03:00
Donghyeon Jeong
4ec6291a24
sycl: fix undefined variable in work group size check ( #14843 )
b5975
2025-07-24 12:50:41 +08:00
jacekpoplawski
a12363bbf0
convert : text-only support for GLM-4.1V-9B-Thinking ( #14823 )
...
* use language_model part only, ignore visual layers
* fix rope_dim calculation
2025-07-23 23:23:57 +02:00
Johannes Gäßler
a86f52b285
CUDA: fix overflow in FA, tune performance ( #14840 )
b5973
2025-07-23 21:43:25 +02:00
Johannes Gäßler
b284197df4
CUDA: fix compilation with GGML_CUDA_F16 ( #14837 )
b5972
2025-07-23 18:22:30 +02:00
Sigbjørn Skjæret
221c0e0c58
ci : correct label refactor->refactoring ( #14832 )
2025-07-23 14:27:54 +02:00
Johannes Gäßler
07a19e27a2
CUDA: fix quantized KV cache + multiple sequences ( #14822 )
...
* CUDA: fix quantized KV cache + multiple sequences
* Update ggml/src/ggml-cuda/fattn-common.cuh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
b5970
2025-07-23 14:08:09 +03:00
Georgi Gerganov
18f3b5ff9e
tests : add non-cont K,V FA tests
...
ggml-ci
2025-07-23 14:08:09 +03:00
l3utterfly
7233358d29
memory : handle saving/loading null layers in recurrent memory ( #14675 )
...
* Update llama-memory-recurrent.cpp
handle saving/loading null layers in recurrent memory
* fixed styling issues and updated comments
* fix styling issue
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
b5968
2025-07-23 11:16:41 +03:00
lixing-star
6c88b3bb25
ggml: fix loongarch quantize_row_q8_1 error ( #14827 )
b5967
2025-07-23 09:39:51 +03:00
chen fan
14c28dfc50
CANN: weight format to NZ for Ascend310P3 ( #14407 )
...
* weight format to nz for 310p
* remove quant weight format to nz
* clean code
* fix
* make the conditions for converting weights to NZ format consistent
* clean code
b5966
2025-07-23 11:58:00 +08:00
Aman Gupta
8c988fa41d
CUDA: add fused rms norm ( #14800 )
b5965
2025-07-23 09:25:42 +08:00
Csaba Kecskemeti
acd6cb1c41
ggml : model card yaml tab->2xspace ( #14819 )
2025-07-22 19:29:43 +03:00
Jeff Bolz
84712b6043
vulkan: fix rms_norm_mul to handle broadcasting dim0 ( #14817 )
b5963
2025-07-22 17:35:21 +02:00
Molly Sophia
d4d1522b20
llama : add model type detection for rwkv7 7B&14B ( #14816 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com >
b5962
2025-07-22 23:01:29 +08:00
Ed Addario
d1aa0cc5d1
imatrix: add option to display importance score statistics for a given imatrix file ( #12718 )
...
* Add --show-statistics option
* Add --show-statistics logic
* Add tensor name parsing
* Tidy output format
* Fix typo in title
* Improve tensor influence ranking
* Add better statistics
* Change statistics' sort order
* Add Cosine Similarity
* Add header search path
* Change header search path to private
* Add weighted statistics per layer
* Update report title
* Refactor compute_statistics out of main
* Refactor compute_cossim out of load_imatrix
* Refactor compute_statistics out of load_imatrix
* Move imatrix statistics calculation into its own functions
* Add checks and validations
* Remove unnecessary include directory
* Rename labels
* Add m_stats getter and refactor compute_statistics out of load_imatrix
* Refactor variable names
* Minor cosmetic change
* Retrigger checks (empty commit)
* Rerun checks (empty commit)
* Fix unnecessary type promotion
Co-authored-by: compilade <git@compilade.net >
* Reverting change to improve code readability
* Rerun checks (empty commit)
* Rerun checks (empty commit)
* Rerun checks - third time's the Charm 🤞 (empty commit)
* Minor cosmetic change
* Update README
* Fix typo
* Update README
* Rerun checks (empty commit)
* Re-implement changes on top of #9400
* Update README.md
* Update README
* Update README.md
Co-authored-by: compilade <git@compilade.net >
* Update README.md
Co-authored-by: compilade <git@compilade.net >
* Update README.md
* Remove duplicate option in print_usage()
* Update README.md
* Update README.md
Co-authored-by: compilade <git@compilade.net >
* Update README.md
Co-authored-by: compilade <git@compilade.net >
* Remove input check
* Remove commented out code
---------
Co-authored-by: compilade <git@compilade.net >
b5961
2025-07-22 14:33:37 +02:00
stduhpf
c8ade30036
Mtmd: add a way to select device for vision encoder ( #14236 )
...
* Mtmd: add a way to select device for vision encoder
* simplify
* format
* Warn user if manual device selection failed
* initialize backend to nullptr
b5960
2025-07-22 12:51:03 +02:00
Sigbjørn Skjæret
e28c0b80c2
cuda : implement bf16 cpy ops and enable bf16 cont ( #14763 )
...
* implement bf16 cpy ops and enable bf16 cont
* deduplicate copy functions
* deduplicate checks
b5959
2025-07-22 12:33:10 +02:00
lhez
8e6f8bc875
opencl: remove unreachable `return` ( #14806 )
b5958
2025-07-22 08:53:30 +02:00
Molly Sophia
adef81781a
server : allow setting `--reverse-prompt` arg ( #14799 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com >
b5957
2025-07-22 09:24:22 +08:00
R0CKSTAR
48b86c4fdb
cuda: remove linking to cublasLt ( #14790 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
b5956
2025-07-22 07:45:26 +08:00
Sigbjørn Skjæret
38d3af1b73
opencl: fix `im2col` when `KW!=KH` ( #14803 )
2025-07-21 13:55:10 -07:00
rmatif
6c9ee3b17e
opencl: add conv2d kernel ( #14403 )
...
* add conv2d kernel
* fix trailing whitespace
* whitespace fix
* handle f16 input and f16 kernel, more opt
* resolve conflicts
* use enqueue_ndrange_kernel
b5954
2025-07-21 10:03:19 -07:00
Romain Biessy
cd465d823c
sycl: Fix im2col ( #14797 )
b5953
2025-07-21 18:39:29 +02:00
Charles Xu
922042601b
kleidiai: add support for get_rows ( #14676 )
...
* kleidiai: add support for get_rows
* apply fixes based on code review
* apply more fixes based on code review
b5952
2025-07-21 16:49:52 +03:00
Radoslav Gerganov
2ba1333b35
docs : fix backends table in README.md ( #14796 )
2025-07-21 14:03:49 +02:00
Jeff Bolz
c2e058f1b4
vulkan/cuda: Fix im2col when KW!=KH ( #14789 )
...
The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match.
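The decomposition described above can be checked with a short sketch. This is plain Python, not the Vulkan or CUDA kernel code, and the function names are hypothetical. It assumes `tid = ow + ky*OW + kx*OW*KH`, as stated in the commit message, and shows that the inverse mapping only round-trips when `ksize` is `OW*KH`.

```python
def compose(ow, ky, kx, OW, KH):
    # Flatten (ow, ky, kx) into a single thread index, as in the commit message.
    return ow + ky * OW + kx * OW * KH

def decompose(tid, OW, ksize):
    # Recover (ow, ky, kx) from tid; correct only if ksize == OW * KH.
    kx = tid // ksize
    ky = (tid - kx * ksize) // OW
    ow = tid % OW
    return ow, ky, kx

def roundtrips(OW, KH, KW, ksize):
    # Check every (ow, ky, kx) in the kernel window for a given ksize.
    for kx in range(KW):
        for ky in range(KH):
            for ow in range(OW):
                tid = compose(ow, ky, kx, OW, KH)
                if decompose(tid, OW, ksize) != (ow, ky, kx):
                    return False
    return True
```

With `OW=4, KH=3, KW=2`, a `ksize` of `OW*KH` recovers every index, while a `ksize` of `OW*KW` does not. When `KW == KH` the two choices coincide, which is presumably why the mismatch only shows up for non-square kernels.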
b5950
2025-07-21 13:35:40 +02:00
Molly Sophia
c82d48ec23
llama : fix --reverse-prompt
crashing issue ( #14794 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com >
b5949
2025-07-21 17:38:36 +08:00
IsaacDynamo
b4efd77f8a
server : add parse_special option to /tokenize endpoint ( #14783 )
2025-07-21 10:24:51 +03:00
Aman Gupta
2be60cbc27
docs : fix link for tools/perplexity in README.md ( #14780 )
2025-07-20 20:13:47 +02:00
rspOverflow
b526ad2668
Documentation: Further revisions to the Vulkan section in build.md ( #14785 )
...
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.
* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md
2025-07-20 18:55:32 +02:00
Aman Gupta
938b785764
Clang-format: local files first + fix BinPacking ( #14779 )
2025-07-20 19:42:34 +08:00
0cc4m
36c153248f
Contrib: add 0cc4m as codeowner for Vulkan backend ( #14775 )
2025-07-19 23:47:21 +03:00
Ervin Áron Tasnádi
a979ca22db
ggml: adds CONV_2D op and direct GEMM Vulkan implementation ( #14316 )
...
* ggml/ggml-vulkan/test-backend-ops: adds CONV_2D for Vulkan
* ggml-vulkan: adds f32 scalar shader to compute 2D convolution directly
with gemm (no need for im2col),
* test-backend-ops: adds test_case_ref to check the validity/performance of ops
against reference implementations having different graphs, adds tests
* Performance fixes: minimized branch divergence, uses collectives to
eliminate redundant calculation, macros removed.
* Kernel shared memory size check
* Updates test-backend-ops to support graphs for performance
measurement.
* Apple/Win32 compile errors fixed
* Subgroup size used to determine tile size -> fixes llvmpipe errors.
* Collectives disabled by default.
* Intel support is disabled as the performance is poor.
* Conv2d enabled for Intel with disabled collectives, disabled for Apple
* test-backend-ops modifications are reverted
* Trailing spaces and missing override fixed.
* Triggering pipeline relaunch.
* Code formatted with .clang-format.
b5943
2025-07-19 21:59:08 +02:00