Commit Graph

5988 Commits

Author SHA1 Message Date
Diego Devesa
a12209588e sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855)
ggml-ci
2025-07-25 21:24:51 +08:00
R0CKSTAR
caaebfe425 musa: upgrade musa sdk to rc4.2.0 (#14498)
* musa: apply mublas API changes

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: update musa version to 4.2.0

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: restore MUSA graph settings in CMakeLists.txt

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: disable mudnnMemcpyAsync by default

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: switch back to non-mudnn images

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* minor changes

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: restore rc in docker image tag

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-25 21:24:51 +08:00
Georgi Gerganov
45c2cc370c sync : ggml
ggml-ci
2025-07-25 21:24:51 +08:00
Kai Pastor
7902541d2e cmake : fix usage issues (ggml/1257)
* CMake config: Create target only once

Fix error on repeated find_package(ggml).
For simplicity, check only for the top-level ggml::ggml.

* CMake config: Add CUDA link libs

* CMake config: Add OpenCL link libs

* CMake config: Use canonical find_dependency

Use set and append to control link lib variables.
Apply more $<LINK_ONLY...>.

* CMake config: Wire OpenMP dependency
2025-07-25 21:24:51 +08:00
Daniel Bevenius
4601f396e6 ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
This commit removes the inclusion of `<cstdlib>`.

The motivation for this change is that this source file does not seem to
use any functions from this header, and the comment about `qsort` is a
little misleading/confusing.
2025-07-25 21:24:51 +08:00
Georgi Gerganov
7c5ca60b12 context : perform output reorder lazily upon access after sync (#14853)
* context : perform output reorder lazily upon access after sync

ggml-ci

* cont : add TODO
2025-07-25 21:24:51 +08:00
Xuan-Son Nguyen
c1d4ffc553 chat : fix kimi-k2 chat template (#14852) 2025-07-25 21:24:51 +08:00
Alberto Cabrera Pérez
07a49304ad sycl: fixed semantics of block offset calculation (#14814) 2025-07-25 21:24:50 +08:00
yummy
6286ad25d1 llama : fix MiniCPM inference after Granite Four changes (#14850)
MiniCPM models use the llm_build_granite constructor, which the Granite
Four PR changed to read hparams.rope_finetuned instead of taking a
use_rope parameter. MiniCPM models need RoPE enabled by default.

This fixes inference, producing correct responses instead of gibberish.
2025-07-25 21:24:50 +08:00
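The change described in the commit above amounts to gating RoPE on a model hyperparameter instead of a constructor argument. A minimal C++ sketch of that pattern, with simplified, hypothetical types (the real llama_hparams and llm_build_granite code differ):

```cpp
#include <cstdio>

// Illustrative only: a simplified hparams struct standing in for llama.cpp's
// real llama_hparams. The point is that the RoPE decision now comes from a
// model hyperparameter instead of a hard-coded constructor argument.
struct hparams_t {
    bool rope_finetuned = true; // MiniCPM-style models expect RoPE on by default
};

// Before (conceptually): the builder took a bool use_rope and callers passed a literal.
// After (conceptually): the builder reads the flag from hparams, so models reusing
// the Granite graph builder must have the flag set correctly in their hparams.
static void build_layer(const hparams_t & hp) {
    if (hp.rope_finetuned) {
        std::printf("applying RoPE to Q/K\n");
    } else {
        std::printf("skipping RoPE\n");
    }
}

int main() {
    hparams_t minicpm; // defaulting the flag to true restores MiniCPM output
    build_layer(minicpm);
}
```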
Pouya
63b420bf9a docs: add libcurl-dev install hint for Linux distros (#14801)
* docs: add libcurl-dev install hint for Linux distros

Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>

* Update docs/build.md

---------

Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-07-25 21:24:50 +08:00
Georgi Gerganov
e84b9110f7 metal : fix fusion across different encoders (#14849)
* metal : fix fusion across different encoders

ggml-ci

* cont : add assertion

ggml-ci
2025-07-25 21:24:50 +08:00
Donghyeon Jeong
7234b891ad sycl: fix undefined variable in work group size check (#14843) 2025-07-25 21:24:50 +08:00
jacekpoplawski
bd060d6036 convert : text-only support for GLM-4.1V-9B-Thinking (#14823)
* use language_model part only, ignore visual layers

* fix rope_dim calculation
2025-07-25 21:24:50 +08:00
Johannes Gäßler
5ad021f924 CUDA: fix overflow in FA, tune performance (#14840) 2025-07-25 21:24:50 +08:00
Johannes Gäßler
9db975e327 CUDA: fix compilation with GGML_CUDA_F16 (#14837) 2025-07-25 21:24:50 +08:00
Sigbjørn Skjæret
a3ddddbe02 ci : correct label refactor->refactoring (#14832) 2025-07-25 21:24:50 +08:00
Johannes Gäßler
7473a0d07c CUDA: fix quantized KV cache + multiple sequences (#14822)
* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-07-25 21:24:50 +08:00
Georgi Gerganov
90916df84b tests : add non-cont K,V FA tests
ggml-ci
2025-07-25 21:24:50 +08:00
l3utterfly
e0f261585b memory : handle saving/loading null layers in recurrent memory (#14675)
* Update llama-memory-recurrent.cpp

handle saving/loading null layers in recurrent memory

* fixed styling issues and updated comments

* fix styling issue

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-25 21:24:50 +08:00
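The commit above is about round-tripping layers that may be null. A minimal sketch of one way to serialize such a list with an explicit presence flag per layer; the types and helpers are hypothetical, not the llama-memory-recurrent API:

```cpp
#include <cstdint>
#include <istream>
#include <optional>
#include <ostream>
#include <vector>

// Illustrative only: std::optional stands in for a recurrent-memory layer
// state that may be null for some layers.
using layer_state = std::vector<float>;

void save_layers(std::ostream & out, const std::vector<std::optional<layer_state>> & layers) {
    for (const auto & l : layers) {
        const uint8_t present = l.has_value() ? 1 : 0; // explicit presence flag
        out.write(reinterpret_cast<const char *>(&present), 1);
        if (present) {
            const uint64_t n = l->size();
            out.write(reinterpret_cast<const char *>(&n), sizeof(n));
            out.write(reinterpret_cast<const char *>(l->data()), n * sizeof(float));
        }
    }
}

void load_layers(std::istream & in, std::vector<std::optional<layer_state>> & layers) {
    for (auto & l : layers) {
        uint8_t present = 0;
        in.read(reinterpret_cast<char *>(&present), 1);
        if (present) {
            uint64_t n = 0;
            in.read(reinterpret_cast<char *>(&n), sizeof(n));
            l.emplace(n);
            in.read(reinterpret_cast<char *>(l->data()), n * sizeof(float));
        } else {
            l.reset(); // a null layer round-trips as "absent", not as an empty tensor
        }
    }
}
```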
lixing-star
bd3c22a666 ggml: fix loongarch quantize_row_q8_1 error (#14827) 2025-07-25 21:24:50 +08:00
chen fan
ef6198b5a5 CANN: weight format to NZ for Ascend310P3 (#14407)
* weight format to nz for 310p

* remove quant weight format to nz

* clean code

* fix

* make the conditions for converting weights to NZ format consistent

* clean code
2025-07-25 21:24:50 +08:00
Aman Gupta
1e55890e40 CUDA: add fused rms norm (#14800) 2025-07-25 21:24:50 +08:00
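For reference, a fused RMS-norm-plus-multiply computes y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i] in a single pass instead of two separate ops. A scalar C++ sketch of that math, purely illustrative of what the fused CUDA kernel produces:

```cpp
#include <cmath>
#include <vector>

// Illustrative scalar reference of RMS norm fused with the multiply by the
// norm weights: y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i].
std::vector<float> rms_norm_mul(const std::vector<float> & x,
                                const std::vector<float> & w,
                                float eps = 1e-6f) {
    if (x.empty()) {
        return {};
    }
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += double(v) * v;
    }
    const float scale = 1.0f / std::sqrt(float(sum_sq / x.size()) + eps);

    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i]; // one pass: normalize and apply weights together
    }
    return y;
}
```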
Csaba Kecskemeti
9b5125679c ggml : model card yaml tab->2xspace (#14819) 2025-07-25 21:24:50 +08:00
Jeff Bolz
44d4801a25 vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817) 2025-07-25 21:24:50 +08:00
Molly Sophia
10a676558d llama : add model type detection for rwkv7 7B&14B (#14816)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-07-25 21:24:50 +08:00
Ed Addario
45fc00e2c0 imatrix: add option to display importance score statistics for a given imatrix file (#12718)
* Add --show-statistics option

* Add --show-statistics logic

* Add tensor name parsing

* Tidy output format

* Fix typo in title

* Improve tensor influence ranking

* Add better statistics

* Change statistics' sort order

* Add Cosine Similarity

* Add header search path

* Change header search path to private

* Add weighted statistics per layer

* Update report title

* Refactor compute_statistics out of main

* Refactor compute_cossim out of load_imatrix

* Refactor compute_statistics out of load_imatrix

* Move imatrix statistics calculation into its own functions

* Add checks and validations

* Remove unnecessary include directory

* Rename labels

* Add m_stats getter and refactor compute_statistics out of load_imatrix

* Refactor variable names

* Minor cosmetic change

* Retrigger checks (empty commit)

* Rerun checks (empty commit)

* Fix unnecessary type promotion

Co-authored-by: compilade <git@compilade.net>

* Reverting change to improve code readability

* Rerun checks (empty commit)

* Rerun checks (empty commit)

* Rerun checks - third time's the Charm 🤞 (empty commit)

* Minor cosmetic change

* Update README

* Fix typo

* Update README

* Rerun checks (empty commit)

* Re-implement changes on top of #9400

* Update README.md

* Update README

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Update README.md

* Remove duplicate option in print_usage()

* Update README.md

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Remove input check

* Remove commented out code

---------

Co-authored-by: compilade <git@compilade.net>
2025-07-25 21:24:50 +08:00
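Cosine similarity is one of the statistics listed above. A minimal, illustrative sketch of how such a score can be computed between two importance-score vectors (not the actual imatrix code):

```cpp
#include <cmath>
#include <vector>

// Illustrative only: cosine similarity between two importance-score vectors,
// one of the per-tensor statistics a --show-statistics style report can print.
float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t i = 0; i < a.size() && i < b.size(); ++i) {
        dot += double(a[i]) * b[i];
        na  += double(a[i]) * a[i];
        nb  += double(b[i]) * b[i];
    }
    if (na == 0.0 || nb == 0.0) {
        return 0.0f; // degenerate vectors: define similarity as 0
    }
    return float(dot / (std::sqrt(na) * std::sqrt(nb)));
}
```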
stduhpf
888b75ba61 Mtmd: add a way to select device for vision encoder (#14236)
* Mtmd: add a way to select device for vision encoder

* simplify

* format

* Warn user if manual device selection failed

* initialize backend to nullptr
2025-07-25 21:24:50 +08:00
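The pattern described above is: look up the requested device by name, warn and fall back when it is not found, and keep the selection null until a device is actually chosen. A hedged sketch of that flow with hypothetical types, not the ggml backend API:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Illustrative only: device_info and pick_device are hypothetical names used
// to show the selection/fallback logic, not real ggml or mtmd symbols.
struct device_info { std::string name; };

const device_info * pick_device(const std::vector<device_info> & devices,
                                const std::string & requested) {
    if (!requested.empty()) {
        for (const auto & d : devices) {
            if (d.name == requested) {
                return &d; // manual selection succeeded
            }
        }
        std::fprintf(stderr, "warning: device '%s' not found, falling back to default\n",
                     requested.c_str());
    }
    // start from "no device" (nullptr) and only return one when it exists
    return devices.empty() ? nullptr : &devices.front();
}
```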
Sigbjørn Skjæret
4c94f27ab7 cuda : implement bf16 cpy ops and enable bf16 cont (#14763)
* implement bf16 cpy ops and enable bf16 cont

* deduplicate copy functions

* deduplicate checks
2025-07-25 21:24:50 +08:00
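Per element, a bf16 copy/cont op performs an fp32→bf16 conversion: bf16 keeps the top 16 bits of the float, rounded to nearest-even. A host-side C++ illustration of that conversion, assuming standard bf16 semantics; it is not the actual CUDA kernel:

```cpp
#include <cstdint>
#include <cstring>

// Illustrative host-side conversion: round-to-nearest-even is done by adding
// 0x7FFF plus the lsb of the kept half before truncating to the top 16 bits.
static uint16_t fp32_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {        // NaN: truncate but keep it NaN
        return uint16_t((bits >> 16) | 0x0040u);
    }
    const uint32_t lsb = (bits >> 16) & 1u;
    return uint16_t((bits + 0x7FFFu + lsb) >> 16);
}

static float bf16_to_fp32(uint16_t h) {
    const uint32_t bits = uint32_t(h) << 16;         // the reverse direction is exact
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```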
lhez
1e54562db3 opencl: remove unreachable return (#14806) 2025-07-25 21:24:50 +08:00
Molly Sophia
0dd3cd5540 server : allow setting --reverse-prompt arg (#14799)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-07-25 21:24:50 +08:00
R0CKSTAR
9e500e2355 cuda: remove linking to cublasLt (#14790)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-25 21:24:50 +08:00
Sigbjørn Skjæret
e77f241b84 opencl: fix im2col when KW!=KH (#14803) 2025-07-25 21:24:50 +08:00
rmatif
120add9ef4 opencl: add conv2d kernel (#14403)
* add conv2d kernel

* fix trailing whitespace

* whitespace fix

* handle f16 input and f16 kernel, more opt

* resolve conflicts

* use enqueue_ndrange_kernel
2025-07-25 21:24:50 +08:00
Romain Biessy
f04095bde9 sycl: Fix im2col (#14797) 2025-07-25 21:24:50 +08:00
Charles Xu
549f9eb1b5 kleidiai: add support for get_rows (#14676)
* kleidiai: add support for get_rows

* apply fixes based on code review

* apply more fixes based on code review
2025-07-25 21:24:50 +08:00
Radoslav Gerganov
ae77ded2c2 docs : fix backends table in README.md (#14796) 2025-07-25 21:24:50 +08:00
Jeff Bolz
a2cdf559c2 vulkan/cuda: Fix im2col when KW!=KH (#14789)
The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match.
2025-07-25 21:24:49 +08:00
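The fix hinges on the index decomposition quoted above: with tid = ow + ky*OW + kx*OW*KH, the block of indices covering one kx value has OW*KH elements, so a size constant based on KW is only correct when the kernel is square. A small C++ sketch that verifies the decomposition (illustrative, not the shader code):

```cpp
#include <cassert>

// Illustrative only: recovering (ow, ky, kx) from a linear thread id laid out
// as tid = ow + ky*OW + kx*OW*KH. The stride between consecutive kx values is
// OW*KH, so using OW*KW only happens to work when KW == KH.
struct im2col_idx { int ow, ky, kx; };

im2col_idx decompose(int tid, int OW, int KH) {
    im2col_idx r;
    r.ow = tid % OW;
    r.ky = (tid / OW) % KH;
    r.kx = tid / (OW * KH);
    return r;
}

int main() {
    const int OW = 5, KH = 3, KW = 2; // non-square kernel: KW != KH
    for (int kx = 0; kx < KW; ++kx)
        for (int ky = 0; ky < KH; ++ky)
            for (int ow = 0; ow < OW; ++ow) {
                const int tid = ow + ky*OW + kx*OW*KH;
                const im2col_idx r = decompose(tid, OW, KH);
                assert(r.ow == ow && r.ky == ky && r.kx == kx);
            }
}
```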
Aaron Teo
8410b085ea docs: update huggingface links + reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 18:31:18 +08:00
Aaron Teo
e086c5e3a7 docs: update s390x document for sentencepiece
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 18:21:39 +08:00
Molly Sophia
c82d48ec23 llama : fix --reverse-prompt crashing issue (#14794)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
b5949
2025-07-21 17:38:36 +08:00
IsaacDynamo
b4efd77f8a server : add parse_special option to /tokenize endpoint (#14783) 2025-07-21 10:24:51 +03:00
Aman Gupta
2be60cbc27 docs : fix link for tools/perplexity in README.md (#14780) 2025-07-20 20:13:47 +02:00
rspOverflow
b526ad2668 Documentation: Further revisions to the Vulkan section in build.md (#14785)
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.

* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md
2025-07-20 18:55:32 +02:00
Aman Gupta
938b785764 Clang-format: local files first + fix BinPacking (#14779) 2025-07-20 19:42:34 +08:00
0cc4m
36c153248f Contrib: add 0cc4m as codeowner for Vulkan backend (#14775) 2025-07-19 23:47:21 +03:00
Ervin Áron Tasnádi
a979ca22db ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316)
* ggml/ggml-vulkan/test-backend-ops: adds CONV_2D for Vulkan

* ggml-vulkan: adds f32 scalar shader to compute 2D convolution directly with gemm (no need for im2col)

* test-backend-ops: adds test_case_ref to check the validity/performance of ops
against reference implementations having different graphs, adds tests

* Performance fixes: minimized branch divergence, uses collectives to eliminate redundant calculation, macros removed.

* Kernel shared memory size check

* Updates test-backend-ops to support graphs for performance
  measurement.

* Apple/Win32 compile errors fixed

* Subgroup size used to determine tile size -> fixes llvmpipe errors.

* Collectives disabled by default.

* Intel support is disabled as the performance is poor.

* Conv2d enabled for Intel with disabled collectives, disabled for Apple

* test-backend-ops modifications are reverted

* Trailing spaces and missing override fixed.

* Triggering pipeline relaunch.

* Code formatted with .clang-format.
b5943
2025-07-19 21:59:08 +02:00
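The "direct GEMM" idea mentioned above is to treat conv2d as a matrix product whose im2col operand is never materialized; its elements are gathered from the input on the fly. A scalar C++ reference of that formulation (stride 1, no padding, assumed NCHW-style layout), purely illustrative of what the tiled Vulkan shader computes:

```cpp
#include <vector>

// Illustrative scalar reference: 2D convolution expressed as the GEMM
// C[OC x (OH*OW)] = A[OC x (IC*KH*KW)] * B[(IC*KH*KW) x (OH*OW)],
// where B (the im2col matrix) is never built -- each element is read
// straight from the input tensor as it is needed.
void conv2d_direct(const std::vector<float> & in,  int IC, int IH, int IW,
                   const std::vector<float> & w,   int OC, int KH, int KW,
                   std::vector<float> & out) {
    const int OH = IH - KH + 1;
    const int OW = IW - KW + 1;
    out.assign(size_t(OC) * OH * OW, 0.0f);

    for (int oc = 0; oc < OC; ++oc)
    for (int oy = 0; oy < OH; ++oy)
    for (int ox = 0; ox < OW; ++ox) {
        float acc = 0.0f;
        // (ic, ky, kx) iterate over the shared GEMM dimension IC*KH*KW
        for (int ic = 0; ic < IC; ++ic)
        for (int ky = 0; ky < KH; ++ky)
        for (int kx = 0; kx < KW; ++kx) {
            const float a = w [(size_t(oc)*IC + ic)*KH*KW + ky*KW + kx];
            const float b = in[(size_t(ic)*IH + (oy + ky))*IW + (ox + kx)]; // implicit im2col element
            acc += a * b;
        }
        out[(size_t(oc)*OH + oy)*OW + ox] = acc;
    }
}
```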
compilade
90083283ec imatrix : use GGUF to store importance matrices (#9400)
* imatrix : allow processing multiple chunks per batch

* perplexity : simplify filling the batch

* imatrix : fix segfault when using a single chunk per batch

* imatrix : use GGUF to store imatrix data

* imatrix : fix conversion problems

* imatrix : use FMA and sort tensor names

* py : add requirements for legacy imatrix convert script

* perplexity : revert changes

* py : include imatrix converter requirements in toplevel requirements

* imatrix : avoid using designated initializers in C++

* imatrix : remove unused n_entries

* imatrix : allow loading mis-ordered tensors

Sums and counts tensors no longer need to be consecutive.

* imatrix : more sanity checks when loading multiple imatrix files

* imatrix : use ggml_format_name instead of std::string concatenation

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

* quantize : use unused imatrix chunk_size with LLAMA_TRACE

* common : use GGUF for imatrix output by default

* imatrix : two-way conversion between old format and GGUF

* convert : remove imatrix to gguf python script

* imatrix : use the function name in more error messages

* imatrix : don't use FMA explicitly

This should make comparisons between the formats easier
because this matches the behavior of the previous version.

* imatrix : avoid returning from void function save_imatrix

* imatrix : support 3d tensors with MUL_MAT

* quantize : fix dataset name loading from gguf imatrix

* common : move string_remove_suffix from quantize and imatrix

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* imatrix : add warning when legacy format is written

* imatrix : warn when writing partial data, to help guess dataset coverage

Also make the legacy format store partial data
by using neutral values for missing data.
This matches what is done at read-time for the new format,
and so should get the same quality in case the old format is still used.

* imatrix : avoid loading model to convert or combine imatrix

* imatrix : avoid using imatrix.dat in README

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
b5942
2025-07-19 12:51:22 -04:00
Peter0x44
d4b91ea7b2 vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (#13274) (#14707) b5941 2025-07-19 17:58:03 +02:00
0cc4m
83f5872404 Vulkan: Fix fprintf format-security warning (#14770) b5940 2025-07-19 17:47:53 +02:00
rspOverflow
f0d4d176df Documentation: Update build.md's Vulkan section (#14736)
* Documentation: Rewrote and updated the "Without docker" portion of the Vulkan backend build documentation.

* Documentation: Reorganize build.md's Vulkan section.
2025-07-19 12:18:36 +02:00