Commit Graph

5993 Commits

SHA1 Message Date
412f4c7c88 ggml-cpu: disable ggml-nnpa compile flag by default
fixes #14877

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-25 21:26:58 +08:00
2177ccdc41 ggml : remove invalid portPos specifiers from dot files (#14838)
Neither "g" nor "x" are valid portPos specifiers per the official
[graphviz documents](https://graphviz.org/docs/attr-types/portPos/):

> If a compass point is used, it must have the form "n","ne","e","se","s","sw","w","nw","c","_".

I tested locally for it to fall back to default portPos specifier if an
invalid portPos is specified. As a consequence, we can remove associated
code.
2025-07-25 21:24:51 +08:00
a6357ac39e context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (#14870)
ggml-ci
2025-07-25 21:24:51 +08:00
092c1bd385 mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503)
* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip

* Update export-lora.cpp

* Update clip.cpp

* Update export-lora.cpp

* format: replace tabs with spaces
2025-07-25 21:24:51 +08:00
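A short sketch of the class of bug referenced above, under the assumption that it stems from assigning 64-bit tensor sizes to 32-bit ints; this is not the actual export-lora/clip code.

```cpp
// Hypothetical illustration of 32-bit narrowing: a 64-bit element count
// assigned to a 32-bit int silently truncates once tensors grow past 2^31-1.
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t n_elements = 3LL * 1024 * 1024 * 1024;  // > INT32_MAX

    const int32_t narrowed = (int32_t) n_elements;  // truncates on large tensors
    const int64_t widened  = n_elements;            // keep the wide type end-to-end

    printf("narrowed: %d\n",   narrowed);
    printf("widened : %lld\n", (long long) widened);
    return 0;
}
```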
328ed53601 rpc : check for null buffers in get/set/copy tensor endpoints (#14868) 2025-07-25 21:24:51 +08:00
a12209588e sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855)
ggml-ci
2025-07-25 21:24:51 +08:00
caaebfe425 musa: upgrade musa sdk to rc4.2.0 (#14498)
* musa: apply mublas API changes

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: update musa version to 4.2.0

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: restore MUSA graph settings in CMakeLists.txt

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: disable mudnnMemcpyAsync by default

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: switch back to non-mudnn images

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* minor changes

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: restore rc in docker image tag

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-25 21:24:51 +08:00
45c2cc370c sync : ggml
ggml-ci
2025-07-25 21:24:51 +08:00
7902541d2e cmake : fix usage issues (ggml/1257)
* CMake config: Create target only once

Fix error on repeated find_package(ggml).
For simplicity, check only for the top-level ggml::ggml.

* CMake config: Add CUDA link libs

* CMake config: Add OpenCL link libs

* CMake config: Use canonical find_dependency

Use set and append to control link lib variables.
Apply more $<LINK_ONLY...>.

* CMake config: Wire OpenMP dependency
2025-07-25 21:24:51 +08:00
4601f396e6 ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
This commit removes the inclusion of `<cstdlib>`.

The motivation for this change is that this source file does not seem to
use any functions from this header and the comment about `qsort` is a
little misleading/confusing.
2025-07-25 21:24:51 +08:00
7c5ca60b12 context : perform output reorder lazily upon access after sync (#14853)
* context : perform output reorder lazily upon access after sync

ggml-ci

* cont : add TODO
2025-07-25 21:24:51 +08:00
c1d4ffc553 chat : fix kimi-k2 chat template (#14852) 2025-07-25 21:24:51 +08:00
07a49304ad sycl: fixed semantics of block offset calculation (#14814) 2025-07-25 21:24:50 +08:00
6286ad25d1 llama : fix MiniCPM inference after Granite Four changes (#14850)
MiniCPM models use the llm_build_granite constructor, which the Granite Four
PR changed to read hparams.rope_finetuned instead of taking a use_rope
parameter. MiniCPM models need rope enabled by default (see the schematic
sketch below).

This fixes inference, restoring correct responses instead of gibberish.
2025-07-25 21:24:50 +08:00
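A schematic C++ sketch of the failure mode described above, with hypothetical names standing in for the real llm_build_granite code: once the constructor derives the rope decision from an hparams field rather than an explicit flag, a model family that needs rope by default must make sure that field is set.

```cpp
// Schematic only; not the real llama.cpp builder.
struct hparams_t {
    bool rope_finetuned = false;  // stand-in for hparams.rope_finetuned
};

struct granite_builder_t {
    bool use_rope;

    // Before the Granite Four change this was an explicit parameter:
    //   granite_builder_t(bool use_rope);
    // After the change it is derived from hparams, so a model like MiniCPM
    // must set the corresponding field or rope is silently disabled.
    explicit granite_builder_t(const hparams_t & hp) : use_rope(hp.rope_finetuned) {}
};
```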
63b420bf9a docs: add libcurl-dev install hint for Linux distros (#14801)
* docs: add libcurl-dev install hint for Linux distros

Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>

* Update docs/build.md

---------

Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-07-25 21:24:50 +08:00
e84b9110f7 metal : fix fusion across different encoders (#14849)
* metal : fix fusion across different encoders

ggml-ci

* cont : add assertion

ggml-ci
2025-07-25 21:24:50 +08:00
7234b891ad sycl: fix undefined variable in work group size check (#14843) 2025-07-25 21:24:50 +08:00
bd060d6036 convert : text-only support for GLM-4.1V-9B-Thinking (#14823)
* use language_model part only, ignore visual layers

* fix rope_dim calculation
2025-07-25 21:24:50 +08:00
5ad021f924 CUDA: fix overflow in FA, tune performance (#14840) 2025-07-25 21:24:50 +08:00
9db975e327 CUDA: fix compilation with GGML_CUDA_F16 (#14837) 2025-07-25 21:24:50 +08:00
a3ddddbe02 ci : correct label refactor->refactoring (#14832) 2025-07-25 21:24:50 +08:00
7473a0d07c CUDA: fix quantized KV cache + multiple sequences (#14822)
* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-07-25 21:24:50 +08:00
90916df84b tests : add non-cont K,V FA tests
ggml-ci
2025-07-25 21:24:50 +08:00
e0f261585b memory : handle saving/loading null layers in recurrent memory (#14675)
* Update llama-memory-recurrent.cpp

handle saving/loading null layers in recurrent memory

* fixed styling issues and updated comments

* fix styling issue

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-25 21:24:50 +08:00
bd3c22a666 ggml: fix loongarch quantize_row_q8_1 error (#14827) 2025-07-25 21:24:50 +08:00
ef6198b5a5 CANN: weight format to NZ for Ascend310P3 (#14407)
* weight format to nz for 310p

* remove quant weight format to nz

* clean code

* fix

* make the conditions for converting weights to NZ format consistent

* clean code
2025-07-25 21:24:50 +08:00
1e55890e40 CUDA: add fused rms norm (#14800) 2025-07-25 21:24:50 +08:00
9b5125679c ggml : model card yaml tab->2xspace (#14819) 2025-07-25 21:24:50 +08:00
44d4801a25 vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817) 2025-07-25 21:24:50 +08:00
10a676558d llama : add model type detection for rwkv7 7B&14B (#14816)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-07-25 21:24:50 +08:00
45fc00e2c0 imatrix: add option to display importance score statistics for a given imatrix file (#12718)
* Add --show-statistics option

* Add --show-statistics logic

* Add tensor name parsing

* Tidy output format

* Fix typo in title

* Improve tensor influence ranking

* Add better statistics

* Change statistics' sort order

* Add Cosine Similarity

* Add header search path

* Change header search path to private

* Add weighted statistics per layer

* Update report title

* Refactor compute_statistics out of main

* Refactor compute_cossim out of load_imatrix

* Refactor compute_statistics out of load_imatrix

* Move imatrix statistics calculation into its own functions

* Add checks and validations

* Remove unnecessary include directory

* Rename labels

* Add m_stats getter and refactor compute_statistics out of load_imatrix

* Refactor variable names

* Minor cosmetic change

* Retrigger checks (empty commit)

* Rerun checks (empty commit)

* Fix unnecessary type promotion

Co-authored-by: compilade <git@compilade.net>

* Reverting change to improve code readability

* Rerun checks (empty commit)

* Rerun checks (empty commit)

* Rerun checks - third time's the Charm 🤞 (empty commit)

* Minor cosmetic change

* Update README

* Fix typo

* Update README

* Rerun checks (empty commit)

* Re-implement changes on top of #9400

* Update README.md

* Update README

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Update README.md

* Remove duplicate option in print_usage()

* Update README.md

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Remove input check

* Remove commented out code

---------

Co-authored-by: compilade <git@compilade.net>
2025-07-25 21:24:50 +08:00
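For reference, a self-contained sketch of the cosine-similarity measure that the statistics output reports; this is an illustrative stand-in, not the actual imatrix implementation.

```cpp
// Cosine similarity between two importance/activation vectors (illustrative).
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

static float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    const size_t n = std::min(a.size(), b.size());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (size_t i = 0; i < n; ++i) {
        dot    += (double) a[i] * b[i];
        norm_a += (double) a[i] * a[i];
        norm_b += (double) b[i] * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) {
        return 0.0f;  // treat similarity against a zero vector as 0
    }
    return (float) (dot / (std::sqrt(norm_a) * std::sqrt(norm_b)));
}
```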
888b75ba61 Mtmd: add a way to select device for vision encoder (#14236)
* Mtmd: add a way to select device for vision encoder

* simplify

* format

* Warn user if manual device selection failed

* initialize backend to nullptr
2025-07-25 21:24:50 +08:00
4c94f27ab7 cuda : implement bf16 cpy ops and enable bf16 cont (#14763)
* implement bf16 cpy ops and enable bf16 cont

* deduplicate copy functions

* deduplicate checks
2025-07-25 21:24:50 +08:00
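As background for the bf16 copy/cont ops above, a small sketch of the bf16 and f32 bit layout; it uses plain truncation rather than the round-to-nearest-even that production conversions typically apply, and it is not the CUDA kernel added in the commit.

```cpp
// bf16 keeps the top 16 bits of an IEEE-754 binary32 value, so a copy or
// contiguous op reduces to shifting bits into and out of a 16-bit container.
#include <cstdint>
#include <cstring>

static uint16_t f32_to_bf16(float x) {  // truncating conversion, no rounding
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return (uint16_t) (bits >> 16);
}

static float bf16_to_f32(uint16_t h) {
    const uint32_t bits = (uint32_t) h << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```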
1e54562db3 opencl: remove unreachable return (#14806) 2025-07-25 21:24:50 +08:00
0dd3cd5540 server : allow setting --reverse-prompt arg (#14799)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-07-25 21:24:50 +08:00
9e500e2355 cuda: remove linking to cublasLt (#14790)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-25 21:24:50 +08:00
e77f241b84 opencl: fix im2col when KW!=KH (#14803) 2025-07-25 21:24:50 +08:00
120add9ef4 opencl: add conv2d kernel (#14403)
* add conv2d kernel

* fix trailing whitespace

* whitespace fix

* handle f16 input and f16 kernel, more opt

* resolve conflicts

* use enqueue_ndrange_kernel
2025-07-25 21:24:50 +08:00
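A naive CPU reference for what a conv2d kernel computes (single channel, no padding or stride, hypothetical helper name); this is only a sketch one might use to validate a device kernel, not the OpenCL code added above.

```cpp
#include <vector>

// Direct 2D convolution (cross-correlation) reference:
// dst is (IW-KW+1) x (IH-KH+1) for an IW x IH input and KW x KH kernel.
static std::vector<float> conv2d_ref(const std::vector<float> & src, int IW, int IH,
                                     const std::vector<float> & ker, int KW, int KH) {
    const int OW = IW - KW + 1;
    const int OH = IH - KH + 1;
    std::vector<float> dst((size_t) OW * OH, 0.0f);
    for (int oy = 0; oy < OH; ++oy) {
        for (int ox = 0; ox < OW; ++ox) {
            float acc = 0.0f;
            for (int ky = 0; ky < KH; ++ky) {
                for (int kx = 0; kx < KW; ++kx) {
                    acc += src[(size_t) (oy + ky) * IW + (ox + kx)] * ker[(size_t) ky * KW + kx];
                }
            }
            dst[(size_t) oy * OW + ox] = acc;
        }
    }
    return dst;
}
```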
f04095bde9 sycl: Fix im2col (#14797) 2025-07-25 21:24:50 +08:00
549f9eb1b5 kleidiai: add support for get_rows (#14676)
* kleidiai: add support for get_rows

* apply fixes based on code review

* apply more fixes based on code review
2025-07-25 21:24:50 +08:00
ae77ded2c2 docs : fix backends table in README.md (#14796) 2025-07-25 21:24:50 +08:00
a2cdf559c2 vulkan/cuda: Fix im2col when KW!=KH (#14789)
The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match (see the index sketch below).
2025-07-25 21:24:49 +08:00
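A small sketch of the index decomposition quoted in the message, assuming tid = ow + ky*OW + kx*OW*KH; the helper names are hypothetical and this is not the fixed kernel itself.

```cpp
#include <cassert>

struct im2col_idx { int ow, ky, kx; };

// Recover (ow, ky, kx) from tid = ow + ky*OW + kx*OW*KH; when KW != KH it is
// easy to mix up OW and KH here, which is why "ksize" had to be changed to match.
static im2col_idx decompose(int tid, int OW, int KH) {
    im2col_idx idx;
    idx.ow = tid % OW;
    idx.ky = (tid / OW) % KH;
    idx.kx =  tid / (OW * KH);
    return idx;
}

int main() {
    const int OW = 5, KH = 3, KW = 2;  // KW != KH on purpose
    for (int tid = 0; tid < OW * KH * KW; ++tid) {
        const im2col_idx i = decompose(tid, OW, KH);
        assert(tid == i.ow + i.ky * OW + i.kx * OW * KH);  // round-trips exactly
    }
    return 0;
}
```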
8410b085ea docs: update huggingface links + reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 18:31:18 +08:00
e086c5e3a7 docs: update s390x document for sentencepiece
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 18:21:39 +08:00
c82d48ec23 llama : fix --reverse-prompt crashing issue (#14794)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
tag: b5949
2025-07-21 17:38:36 +08:00
b4efd77f8a server : add parse_special option to /tokenize endpoint (#14783) 2025-07-21 10:24:51 +03:00
2be60cbc27 docs : fix link for tools/perplexity in README.md (#14780) 2025-07-20 20:13:47 +02:00
b526ad2668 Documentation: Further revisions to the Vulkan section in build.md (#14785)
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.

* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md
2025-07-20 18:55:32 +02:00
938b785764 Clang-format: local files first + fix BinPacking (#14779) 2025-07-20 19:42:34 +08:00
36c153248f Contrib: add 0cc4m as codeowner for Vulkan backend (#14775) 2025-07-19 23:47:21 +03:00