412f4c7c88
ggml-cpu: disable ggml-nnpa compile flag by default
...
fixes #14877
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-07-25 21:26:58 +08:00
2177ccdc41
ggml : remove invalid portPos specifiers from dot files ( #14838 )
...
Neither "g" nor "x" is a valid portPos specifier per the official
[graphviz documentation](https://graphviz.org/docs/attr-types/portPos/):
> If a compass point is used, it must have the form "n","ne","e","se","s","sw","w","nw","c","_".
I verified locally that Graphviz falls back to the default portPos when an
invalid portPos is specified. As a consequence, we can remove the associated
code.
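For illustration, the compass-point rule quoted above can be checked with a few lines of C++. This is a minimal sketch and not code from the ggml tree: the helper name `is_valid_compass_point` and the table `kCompassPoints` are hypothetical, and only the list of accepted values comes from the quoted Graphviz documentation.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Compass points as listed in the Graphviz portPos documentation quoted above.
constexpr std::array<std::string_view, 10> kCompassPoints = {
    "n", "ne", "e", "se", "s", "sw", "w", "nw", "c", "_"};

// Hypothetical helper (not part of ggml): returns true if `s` is one of the
// documented compass points.
bool is_valid_compass_point(std::string_view s) {
    return std::find(kCompassPoints.begin(), kCompassPoints.end(), s) !=
           kCompassPoints.end();
}
```

Under this check, "g" and "x" are rejected, which matches the commit's rationale for dropping them from the generated dot files.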
2025-07-25 21:24:51 +08:00
a6357ac39e
context : restore preemptive sched reset when LLAMA_SET_ROWS=0 ( #14870 )
...
ggml-ci
2025-07-25 21:24:51 +08:00
092c1bd385
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip ( #14503 )
...
* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip
* Update export-lora.cpp
* Update clip.cpp
* Update export-lora.cpp
* format: use space to replace tab
2025-07-25 21:24:51 +08:00
328ed53601
rpc : check for null buffers in get/set/copy tensor endpoints ( #14868 )
2025-07-25 21:24:51 +08:00
a12209588e
sched : fix multiple evaluations of the same graph with pipeline parallelism ( #14855 )
...
ggml-ci
2025-07-25 21:24:51 +08:00
caaebfe425
musa: upgrade musa sdk to rc4.2.0 ( #14498 )
...
* musa: apply mublas API changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: update musa version to 4.2.0
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore MUSA graph settings in CMakeLists.txt
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: disable mudnnMemcpyAsync by default
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: switch back to non-mudnn images
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* minor changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore rc in docker image tag
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
2025-07-25 21:24:51 +08:00
45c2cc370c
sync : ggml
...
ggml-ci
2025-07-25 21:24:51 +08:00
7902541d2e
cmake : fix usage issues (ggml/1257)
...
* CMake config: Create target only once
Fix error on repeated find_package(ggml).
For simplicity, check only for the top-level ggml::ggml.
* CMake config: Add CUDA link libs
* CMake config: Add OpenCL link libs
* CMake config: Use canonical find_dependency
Use set and append to control link lib variables.
Apply more $<LINK_ONLY...>.
* CMake config: Wire OpenMP dependency
2025-07-25 21:24:51 +08:00
4601f396e6
ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
...
This commit removes the inclusion of `<cstdlib>`.
The motivation for this change is that this source file does not seem to
use any functions from this header and the comment about `qsort` is a
little misleading/confusing.
2025-07-25 21:24:51 +08:00
7c5ca60b12
context : perform output reorder lazily upon access after sync ( #14853 )
...
* context : perform output reorder lazily upon access after sync
ggml-ci
* cont : add TODO
2025-07-25 21:24:51 +08:00
c1d4ffc553
chat : fix kimi-k2 chat template ( #14852 )
2025-07-25 21:24:51 +08:00
07a49304ad
sycl: fixed semantics of block offset calculation ( #14814 )
2025-07-25 21:24:50 +08:00
6286ad25d1
llama : fix MiniCPM inference after Granite Four changes ( #14850 )
...
MiniCPM models use the llm_build_granite constructor which was changed
in the Granite Four PR to use hparams.rope_finetuned instead of a
use_rope parameter. MiniCPM models need rope enabled by default.
Fixes inference from gibberish to correct responses.
2025-07-25 21:24:50 +08:00
63b420bf9a
docs: add libcurl-dev install hint for Linux distros ( #14801 )
...
* docs: add libcurl-dev install hint for Linux distros
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com >
* Update docs/build.md
---------
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-07-25 21:24:50 +08:00
e84b9110f7
metal : fix fusion across different encoders ( #14849 )
...
* metal : fix fusion across different encoders
ggml-ci
* cont : add assertion
ggml-ci
2025-07-25 21:24:50 +08:00
7234b891ad
sycl: fix undefined variable in work group size check ( #14843 )
2025-07-25 21:24:50 +08:00
bd060d6036
convert : text-only support for GLM-4.1V-9B-Thinking ( #14823 )
...
* use language_model part only, ignore visual layers
* fix rope_dim calculation
2025-07-25 21:24:50 +08:00
5ad021f924
CUDA: fix overflow in FA, tune performance ( #14840 )
2025-07-25 21:24:50 +08:00
9db975e327
CUDA: fix compilation with GGML_CUDA_F16 ( #14837 )
2025-07-25 21:24:50 +08:00
a3ddddbe02
ci : correct label refactor->refactoring ( #14832 )
2025-07-25 21:24:50 +08:00
7473a0d07c
CUDA: fix quantized KV cache + multiple sequences ( #14822 )
...
* CUDA: fix quantized KV cache + multiple sequences
* Update ggml/src/ggml-cuda/fattn-common.cuh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-07-25 21:24:50 +08:00
90916df84b
tests : add non-cont K,V FA tests
...
ggml-ci
2025-07-25 21:24:50 +08:00
e0f261585b
memory : handle saving/loading null layers in recurrent memory ( #14675 )
...
* Update llama-memory-recurrent.cpp
handle saving/loading null layers in recurrent memory
* fixed styling issues and updated comments
* fix styling issue
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
2025-07-25 21:24:50 +08:00
bd3c22a666
ggml: fix loongarch quantize_row_q8_1 error ( #14827 )
2025-07-25 21:24:50 +08:00
ef6198b5a5
CANN: weight format to NZ for Ascend310P3 ( #14407 )
...
* weight format to nz for 310p
* remove quant weight format to nz
* clean code
* fix
* make the conditions for converting weights to NZ format consistent
* clean code
2025-07-25 21:24:50 +08:00
1e55890e40
CUDA: add fused rms norm ( #14800 )
2025-07-25 21:24:50 +08:00
9b5125679c
ggml : model card yaml tab->2xspace ( #14819 )
2025-07-25 21:24:50 +08:00
44d4801a25
vulkan: fix rms_norm_mul to handle broadcasting dim0 ( #14817 )
2025-07-25 21:24:50 +08:00
10a676558d
llama : add model type detection for rwkv7 7B&14B ( #14816 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com >
2025-07-25 21:24:50 +08:00
45fc00e2c0
imatrix: add option to display importance score statistics for a given imatrix file ( #12718 )
...
* Add --show-statistics option
* Add --show-statistics logic
* Add tensor name parsing
* Tidy output format
* Fix typo in title
* Improve tensor influence ranking
* Add better statistics
* Change statistics' sort order
* Add Cosine Similarity
* Add header search path
* Change header search path to private
* Add weighted statistics per layer
* Update report title
* Refactor compute_statistics out of main
* Refactor compute_cossim out of load_imatrix
* Refactor compute_statistics out of load_imatrix
* Move imatrix statistics calculation into its own functions
* Add checks and validations
* Remove unnecessary include directory
* Rename labels
* Add m_stats getter and refactor compute_statistics out of load_imatrix
* Refactor variable names
* Minor cosmetic change
* Retrigger checks (empty commit)
* Rerun checks (empty commit)
* Fix unnecessary type promotion
Co-authored-by: compilade <git@compilade.net >
* Reverting change to improve code readability
* Rerun checks (empty commit)
* Rerun checks (empty commit)
* Rerun checks - third time's the Charm 🤞 (empty commit)
* Minor cosmetic change
* Update README
* Fix typo
* Update README
* Rerun checks (empty commit)
* Re-implement changes on top of #9400
* Update README.md
* Update README
* Update README.md
Co-authored-by: compilade <git@compilade.net >
* Update README.md
Co-authored-by: compilade <git@compilade.net >
* Update README.md
* Remove duplicate option in print_usage()
* Update README.md
* Update README.md
Co-authored-by: compilade <git@compilade.net >
* Update README.md
Co-authored-by: compilade <git@compilade.net >
* Remove input check
* Remove commented out code
---------
Co-authored-by: compilade <git@compilade.net >
2025-07-25 21:24:50 +08:00
888b75ba61
Mtmd: add a way to select device for vision encoder ( #14236 )
...
* Mtmd: add a way to select device for vision encoder
* simplify
* format
* Warn user if manual device selection failed
* initialize backend to nullptr
2025-07-25 21:24:50 +08:00
4c94f27ab7
cuda : implement bf16 cpy ops and enable bf16 cont ( #14763 )
...
* implement bf16 cpy ops and enable bf16 cont
* deduplicate copy functions
* deduplicate checks
2025-07-25 21:24:50 +08:00
1e54562db3
opencl: remove unreachable return ( #14806 )
2025-07-25 21:24:50 +08:00
0dd3cd5540
server : allow setting --reverse-prompt arg ( #14799 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com >
2025-07-25 21:24:50 +08:00
9e500e2355
cuda: remove linking to cublasLt ( #14790 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
2025-07-25 21:24:50 +08:00
e77f241b84
opencl: fix im2col when KW!=KH ( #14803 )
2025-07-25 21:24:50 +08:00
120add9ef4
opencl: add conv2d kernel ( #14403 )
...
* add conv2d kernel
* fix trailing whitespace
* whitespace fix
* handle f16 input and f16 kernel, more opt
* resolve conflicts
* use enqueue_ndrange_kernel
2025-07-25 21:24:50 +08:00
f04095bde9
sycl: Fix im2col ( #14797 )
2025-07-25 21:24:50 +08:00
549f9eb1b5
kleidiai: add support for get_rows ( #14676 )
...
* kleidiai: add support for get_rows
* apply fixes based on code review
* apply more fixes based on code review
2025-07-25 21:24:50 +08:00
ae77ded2c2
docs : fix backends table in README.md ( #14796 )
2025-07-25 21:24:50 +08:00
a2cdf559c2
vulkan/cuda: Fix im2col when KW!=KH ( #14789 )
...
The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match.
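The decomposition described above can be sketched as follows. This is an illustrative host-side C++ version rather than the actual Vulkan or CUDA kernel code: the struct `Im2colIndex` and the function `decompose_tid` are hypothetical names, and the only fact taken from the commit message is the layout `tid = ow + ky*OW + kx*OW*KH`.

```cpp
#include <cstdint>

// Hypothetical result type: the three indices packed into the flat thread id.
struct Im2colIndex {
    int64_t ow;
    int64_t ky;
    int64_t kx;
};

// Inverts tid = ow + ky*OW + kx*OW*KH, assuming 0 <= ow < OW and 0 <= ky < KH.
Im2colIndex decompose_tid(int64_t tid, int64_t OW, int64_t KH) {
    const int64_t ksize = OW * KH;          // stride of kx in the flat index
    const int64_t kx    = tid / ksize;      // slowest-varying index
    const int64_t rest  = tid - kx * ksize; // ow + ky*OW
    const int64_t ky    = rest / OW;
    const int64_t ow    = rest % OW;        // fastest-varying index
    return {ow, ky, kx};
}
```

For example, with OW = 5 and KH = 3, the indices (ow, ky, kx) = (2, 1, 4) pack to tid = 2 + 1*5 + 4*15 = 67, and the function recovers the same triple. A stride built from a different kernel dimension would only agree with this layout when KW == KH, which is consistent with the bug appearing for non-square kernels.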
2025-07-25 21:24:49 +08:00
8410b085ea
docs: update huggingface links + reword
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-07-21 18:31:18 +08:00
e086c5e3a7
docs: update s390x document for sentencepiece
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-07-21 18:21:39 +08:00
c82d48ec23
llama : fix --reverse-prompt crashing issue ( #14794 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com >
b5949
2025-07-21 17:38:36 +08:00
b4efd77f8a
server : add parse_special option to /tokenize endpoint ( #14783 )
2025-07-21 10:24:51 +03:00
2be60cbc27
docs : fix link for tools/perplexity in README.md ( #14780 )
2025-07-20 20:13:47 +02:00
b526ad2668
Documentation: Further revisions to the Vulkan section in build.md ( #14785 )
...
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.
* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md
2025-07-20 18:55:32 +02:00
938b785764
Clang-format: local files first + fix BinPacking ( #14779 )
2025-07-20 19:42:34 +08:00
36c153248f
Contrib: add 0cc4m as codeowner for Vulkan backend ( #14775 )
2025-07-19 23:47:21 +03:00