Commit Graph

5993 Commits

SHA1 Message Date
412f4c7c88 ggml-cpu: disable ggml-nnpa compile flag by default
fixes #14877

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-25 21:26:58 +08:00
2177ccdc41 ggml : remove invalid portPos specifiers from dot files (#14838)
Neither "g" nor "x" are valid portPos specifiers per the official
[graphviz documents](https://graphviz.org/docs/attr-types/portPos/):

> If a compass point is used, it must have the form "n","ne","e","se","s","sw","w","nw","c","_".

I tested locally for it to fall back to default portPos specifier if an
invalid portPos is specified. As a consequence, we can remove associated
code.
2025-07-25 21:24:51 +08:00
a6357ac39e context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (#14870)
ggml-ci
2025-07-25 21:24:51 +08:00
092c1bd385 mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503)
* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip

* Update export-lora.cpp

* Update clip.cpp

* Update export-lora.cpp

* format: replace tabs with spaces
2025-07-25 21:24:51 +08:00
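A short sketch of the class of bug referenced above, under the assumption that it stems from assigning 64-bit tensor sizes to 32-bit ints; this is not the actual export-lora/clip code.

```cpp
// Hypothetical illustration of 32-bit narrowing: a 64-bit element count
// assigned to a 32-bit int silently truncates once tensors grow past 2^31-1.
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t n_elements = 3LL * 1024 * 1024 * 1024;  // > INT32_MAX

    const int32_t narrowed = (int32_t) n_elements;  // truncates on large tensors
    const int64_t widened  = n_elements;            // keep the wide type end-to-end

    printf("narrowed: %d\n",   narrowed);
    printf("widened : %lld\n", (long long) widened);
    return 0;
}
```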
328ed53601 rpc : check for null buffers in get/set/copy tensor endpoints (#14868) 2025-07-25 21:24:51 +08:00
a12209588e sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855)
ggml-ci
2025-07-25 21:24:51 +08:00
caaebfe425 musa: upgrade musa sdk to rc4.2.0 (#14498)
* musa: apply mublas API changes

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: update musa version to 4.2.0

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: restore MUSA graph settings in CMakeLists.txt

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: disable mudnnMemcpyAsync by default

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: switch back to non-mudnn images

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* minor changes

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: restore rc in docker image tag

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-25 21:24:51 +08:00
45c2cc370c sync : ggml
ggml-ci
2025-07-25 21:24:51 +08:00
7902541d2e cmake : fix usage issues (ggml/1257)
* CMake config: Create target only once

Fix error on repeated find_package(ggml).
For simplicity, check only for the top-level ggml::ggml.

* CMake config: Add CUDA link libs

* CMake config: Add OpenCL link libs

* CMake config: Use canonical find_dependency

Use set and append to control link lib variables.
Apply more $<LINK_ONLY...>.

* CMake config: Wire OpenMP dependency
2025-07-25 21:24:51 +08:00
4601f396e6 ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
This commit removes the inclusion of `<cstdlib>`.

The motivation for this change is that this source file does not seem to
use any functions from this header and the comment about `qsort` is a
little misleading/confusing.
2025-07-25 21:24:51 +08:00
7c5ca60b12 context : perform output reorder lazily upon access after sync (#14853)
* context : perform output reorder lazily upon access after sync

ggml-ci

* cont : add TODO
2025-07-25 21:24:51 +08:00
c1d4ffc553 chat : fix kimi-k2 chat template (#14852) 2025-07-25 21:24:51 +08:00
07a49304ad sycl: fixed semantics of block offset calculation (#14814) 2025-07-25 21:24:50 +08:00
6286ad25d1 llama : fix MiniCPM inference after Granite Four changes (#14850)
MiniCPM models use the llm_build_granite constructor, which the Granite Four
PR changed to read hparams.rope_finetuned instead of taking a use_rope
parameter. MiniCPM models need rope enabled by default (see the schematic
sketch below).

This fixes inference, restoring correct responses instead of gibberish.
2025-07-25 21:24:50 +08:00
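A schematic C++ sketch of the failure mode described above, with hypothetical names standing in for the real llm_build_granite code: once the constructor derives the rope decision from an hparams field rather than an explicit flag, a model family that needs rope by default must make sure that field is set.

```cpp
// Schematic only; not the real llama.cpp builder.
struct hparams_t {
    bool rope_finetuned = false;  // stand-in for hparams.rope_finetuned
};

struct granite_builder_t {
    bool use_rope;

    // Before the Granite Four change this was an explicit parameter:
    //   granite_builder_t(bool use_rope);
    // After the change it is derived from hparams, so a model like MiniCPM
    // must set the corresponding field or rope is silently disabled.
    explicit granite_builder_t(const hparams_t & hp) : use_rope(hp.rope_finetuned) {}
};
```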
63b420bf9a docs: add libcurl-dev install hint for Linux distros (#14801)
* docs: add libcurl-dev install hint for Linux distros

Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>

* Update docs/build.md

---------

Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-07-25 21:24:50 +08:00
e84b9110f7 metal : fix fusion across different encoders (#14849)
* metal : fix fusion across different encoders

ggml-ci

* cont : add assertion

ggml-ci
2025-07-25 21:24:50 +08:00
7234b891ad sycl: fix undefined variable in work group size check (#14843) 2025-07-25 21:24:50 +08:00
bd060d6036 convert : text-only support for GLM-4.1V-9B-Thinking (#14823)
* use language_model part only, ignore visual layers

* fix rope_dim calculation
2025-07-25 21:24:50 +08:00
5ad021f924 CUDA: fix overflow in FA, tune performance (#14840) 2025-07-25 21:24:50 +08:00
9db975e327 CUDA: fix compilation with GGML_CUDA_F16 (#14837) 2025-07-25 21:24:50 +08:00
a3ddddbe02 ci : correct label refactor->refactoring (#14832) 2025-07-25 21:24:50 +08:00
7473a0d07c CUDA: fix quantized KV cache + multiple sequences (#14822)
* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-07-25 21:24:50 +08:00
90916df84b tests : add non-cont K,V FA tests
ggml-ci
2025-07-25 21:24:50 +08:00
e0f261585b memory : handle saving/loading null layers in recurrent memory (#14675)
* Update llama-memory-recurrent.cpp

handle saving/loading null layers in recurrent memory

* fixed styling issues and updated comments

* fix styling issue

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-25 21:24:50 +08:00
bd3c22a666 ggml: fix loongarch quantize_row_q8_1 error (#14827) 2025-07-25 21:24:50 +08:00
ef6198b5a5 CANN: weight format to NZ for Ascend310P3 (#14407)
* weight format to nz for 310p

* remove quant weight format to nz

* clean code

* fix

* make the conditions for converting weights to NZ format consistent

* clean code
2025-07-25 21:24:50 +08:00
1e55890e40 CUDA: add fused rms norm (#14800) 2025-07-25 21:24:50 +08:00
9b5125679c ggml : model card yaml tab->2xspace (#14819) 2025-07-25 21:24:50 +08:00
44d4801a25 vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817) 2025-07-25 21:24:50 +08:00
10a676558d llama : add model type detection for rwkv7 7B&14B (#14816)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-07-25 21:24:50 +08:00
45fc00e2c0 imatrix: add option to display importance score statistics for a given imatrix file (#12718)
* Add --show-statistics option

* Add --show-statistics logic

* Add tensor name parsing

* Tidy output format

* Fix typo in title

* Improve tensor influence ranking

* Add better statistics

* Change statistics' sort order

* Add Cosine Similarity

* Add header search path

* Change header search path to private

* Add weighted statistics per layer

* Update report title

* Refactor compute_statistics out of main

* Refactor compute_cossim out of load_imatrix

* Refactor compute_statistics out of load_imatrix

* Move imatrix statistics calculation into its own functions

* Add checks and validations

* Remove unnecessary include directory

* Rename labels

* Add m_stats getter and refactor compute_statistics out of load_imatrix

* Refactor variable names

* Minor cosmetic change

* Retrigger checks (empty commit)

* Rerun checks (empty commit)

* Fix unnecessary type promotion

Co-authored-by: compilade <git@compilade.net>

* Reverting change to improve code readability

* Rerun checks (empty commit)

* Rerun checks (empty commit)

* Rerun checks - third time's the Charm 🤞 (empty commit)

* Minor cosmetic change

* Update README

* Fix typo

* Update README

* Rerun checks (empty commit)

* Re-implement changes on top of #9400

* Update README.md

* Update README

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Update README.md

* Remove duplicate option in print_usage()

* Update README.md

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Update README.md

Co-authored-by: compilade <git@compilade.net>

* Remove input check

* Remove commented out code

---------

Co-authored-by: compilade <git@compilade.net>
2025-07-25 21:24:50 +08:00
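For reference, a self-contained sketch of the cosine-similarity measure that the statistics output reports; this is an illustrative stand-in, not the actual imatrix implementation.

```cpp
// Cosine similarity between two importance/activation vectors (illustrative).
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

static float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    const size_t n = std::min(a.size(), b.size());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (size_t i = 0; i < n; ++i) {
        dot    += (double) a[i] * b[i];
        norm_a += (double) a[i] * a[i];
        norm_b += (double) b[i] * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) {
        return 0.0f;  // treat similarity against a zero vector as 0
    }
    return (float) (dot / (std::sqrt(norm_a) * std::sqrt(norm_b)));
}
```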
888b75ba61 Mtmd: add a way to select device for vision encoder (#14236)
* Mtmd: add a way to select device for vision encoder

* simplify

* format

* Warn user if manual device selection failed

* initialize backend to nullptr
2025-07-25 21:24:50 +08:00
4c94f27ab7 cuda : implement bf16 cpy ops and enable bf16 cont (#14763)
* implement bf16 cpy ops and enable bf16 cont

* deduplicate copy functions

* deduplicate checks
2025-07-25 21:24:50 +08:00
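As background for the bf16 copy/cont ops above, a small sketch of the bf16 and f32 bit layout; it uses plain truncation rather than the round-to-nearest-even that production conversions typically apply, and it is not the CUDA kernel added in the commit.

```cpp
// bf16 keeps the top 16 bits of an IEEE-754 binary32 value, so a copy or
// contiguous op reduces to shifting bits into and out of a 16-bit container.
#include <cstdint>
#include <cstring>

static uint16_t f32_to_bf16(float x) {  // truncating conversion, no rounding
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return (uint16_t) (bits >> 16);
}

static float bf16_to_f32(uint16_t h) {
    const uint32_t bits = (uint32_t) h << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```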
1e54562db3 opencl: remove unreachable return (#14806) 2025-07-25 21:24:50 +08:00
0dd3cd5540 server : allow setting --reverse-prompt arg (#14799)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-07-25 21:24:50 +08:00
9e500e2355 cuda: remove linking to cublasLt (#14790)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-25 21:24:50 +08:00
e77f241b84 opencl: fix im2col when KW!=KH (#14803) 2025-07-25 21:24:50 +08:00
120add9ef4 opencl: add conv2d kernel (#14403)
* add conv2d kernel

* fix trailing whitespace

* whitespace fix

* handle f16 input and f16 kernel, more opt

* resolve conflicts

* use enqueue_ndrange_kernel
2025-07-25 21:24:50 +08:00
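A naive CPU reference for what a conv2d kernel computes (single channel, no padding or stride, hypothetical helper name); this is only a sketch one might use to validate a device kernel, not the OpenCL code added above.

```cpp
#include <vector>

// Direct 2D convolution (cross-correlation) reference:
// dst is (IW-KW+1) x (IH-KH+1) for an IW x IH input and KW x KH kernel.
static std::vector<float> conv2d_ref(const std::vector<float> & src, int IW, int IH,
                                     const std::vector<float> & ker, int KW, int KH) {
    const int OW = IW - KW + 1;
    const int OH = IH - KH + 1;
    std::vector<float> dst((size_t) OW * OH, 0.0f);
    for (int oy = 0; oy < OH; ++oy) {
        for (int ox = 0; ox < OW; ++ox) {
            float acc = 0.0f;
            for (int ky = 0; ky < KH; ++ky) {
                for (int kx = 0; kx < KW; ++kx) {
                    acc += src[(size_t) (oy + ky) * IW + (ox + kx)] * ker[(size_t) ky * KW + kx];
                }
            }
            dst[(size_t) oy * OW + ox] = acc;
        }
    }
    return dst;
}
```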
f04095bde9 sycl: Fix im2col (#14797) 2025-07-25 21:24:50 +08:00
549f9eb1b5 kleidiai: add support for get_rows (#14676)
* kleidiai: add support for get_rows

* apply fixes based on code review

* apply more fixes based on code review
2025-07-25 21:24:50 +08:00
ae77ded2c2 docs : fix backends table in README.md (#14796) 2025-07-25 21:24:50 +08:00
a2cdf559c2 vulkan/cuda: Fix im2col when KW!=KH (#14789)
The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match (see the index sketch below).
2025-07-25 21:24:49 +08:00
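A small sketch of the index decomposition quoted in the message, assuming tid = ow + ky*OW + kx*OW*KH; the helper names are hypothetical and this is not the fixed kernel itself.

```cpp
#include <cassert>

struct im2col_idx { int ow, ky, kx; };

// Recover (ow, ky, kx) from tid = ow + ky*OW + kx*OW*KH; when KW != KH it is
// easy to mix up OW and KH here, which is why "ksize" had to be changed to match.
static im2col_idx decompose(int tid, int OW, int KH) {
    im2col_idx idx;
    idx.ow = tid % OW;
    idx.ky = (tid / OW) % KH;
    idx.kx =  tid / (OW * KH);
    return idx;
}

int main() {
    const int OW = 5, KH = 3, KW = 2;  // KW != KH on purpose
    for (int tid = 0; tid < OW * KH * KW; ++tid) {
        const im2col_idx i = decompose(tid, OW, KH);
        assert(tid == i.ow + i.ky * OW + i.kx * OW * KH);  // round-trips exactly
    }
    return 0;
}
```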
8410b085ea docs: update huggingface links + reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 18:31:18 +08:00
e086c5e3a7 docs: update s390x document for sentencepiece
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 18:21:39 +08:00
c82d48ec23 llama : fix --reverse-prompt crashing issue (#14794)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
tag: b5949
2025-07-21 17:38:36 +08:00
b4efd77f8a server : add parse_special option to /tokenize endpoint (#14783) 2025-07-21 10:24:51 +03:00
2be60cbc27 docs : fix link for tools/perplexity in README.md (#14780) 2025-07-20 20:13:47 +02:00
b526ad2668 Documentation: Further revisions to the Vulkan section in build.md (#14785)
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.

* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md
2025-07-20 18:55:32 +02:00
938b785764 Clang-format: local files first + fix BinPacking (#14779) 2025-07-20 19:42:34 +08:00
36c153248f Contrib: add 0cc4m as codeowner for Vulkan backend (#14775) 2025-07-19 23:47:21 +03:00