llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-31 06:34:56 -04:00

Author	SHA1	Message	Date
Aaron Teo	48df977079	Revert "ggml-cpu: move s390x typedef to own header file" This reverts commit `157f856c34`. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 19:03:09 +08:00
Aaron Teo	157f856c34	ggml-cpu: move s390x typedef to own header file Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 19:00:20 +08:00
Aaron Teo	e7910fc975	ggml-cpu: update macro tests Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 18:43:43 +08:00
Aaron Teo	8129838037	ggml-cpu: import vecintrin.h to fix compiler errors Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 18:42:02 +08:00
Aaron Teo	4ad6efa37b	ggml-cpu: diagnose why __NNPA__ macro is not being defined Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 18:33:08 +08:00
Aaron Teo	0e571dd3d8	ggml-cpu: add missing __func__ Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 18:10:26 +08:00
Aaron Teo	1547ea230c	ggml-cpu: add nnpa macro check in ggml-impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 18:09:37 +08:00
Aaron Teo	f1b1d98e8d	ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:51:55 +08:00
Aaron Teo	8ef51b9055	ggml-cpu: bring back fp32->fp16 store nnpa Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:49:36 +08:00
Aaron Teo	987d1690e4	ggml-cpu: clarified vector naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:39:35 +08:00
Aaron Teo	4621a23c14	ggml-cpu: add 4 element loops for fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:32:20 +08:00
Aaron Teo	373fa28e4c	ggml-cpu: change to typedef vector types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:26:20 +08:00
Aaron Teo	7413dabc8c	ggml-cpu: fix compiler types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:23:18 +08:00
Aaron Teo	e12e9fe704	ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:20:20 +08:00
Aaron Teo	54811fc128	ggml-cpu: fix typo Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:13:57 +08:00
Aaron Teo	433d587426	ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:12:22 +08:00
Aaron Teo	946c78ebde	ggml-cpu: switch to elif macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:06:18 +08:00
Aaron Teo	27131e5f34	ggml-cpu: disable fp32->fp16 nnpa conversions for now there are some conversion failures in nnpa that requires the eyes of an ibm stsm. will create a separate pr to introduce the fp32->fp16 change. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:58:43 +08:00
Aaron Teo	4f017d718a	ggml-cpu: test fix for conversion failure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:55:16 +08:00
Aaron Teo	5424d9e757	ggml-cpu: add breakpoint for debugging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:51:05 +08:00
Aaron Teo	bb9345ca8a	ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:50:05 +08:00
Aaron Teo	e0f8fb930b	ggml-cpu: clarify variable naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:43:41 +08:00
Aaron Teo	27b4c3f338	ggml-cpu: remove noop, general code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:41:39 +08:00
Aaron Teo	8312adc980	ggml-cpu: rework noop Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:24:32 +08:00
Aaron Teo	6d507bbeb0	ggml-cpu: switch to vec_xst for 4 element loops also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:23:23 +08:00
Aaron Teo	f9f6c7e897	ggml-cpu: nnpa switch to vec_xst test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:16:35 +08:00
Aaron Teo	6a25fd8531	ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:10:44 +08:00
Aaron Teo	ebc1d19f62	ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:01:55 +08:00
Aaron Teo	9330454cb8	ggml-cpu: remove sigint from fp16 store for some reason, the function is not getting a hit when debugged with gdb. we will need to investigate further Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 15:06:31 +08:00
Aaron Teo	575ea9f6c6	ggml-cpu: fp16 load ensured to hit Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 15:00:46 +08:00
Aaron Teo	8f3a5af6c0	ggml-cpu: ensure fp16 and fp32 load and stores are called Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:57:25 +08:00
Aaron Teo	94f10ca189	ggml-cpu: fix float placeholder Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:53:15 +08:00
Aaron Teo	d9cc63a94a	ggml-cpu: fix print vs printf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:51:38 +08:00
Aaron Teo	48b820d05f	ggml-cpu: add debugging prints to see if dlf16 is correct Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:50:33 +08:00
Aaron Teo	ffe296457e	ggml-cpu: better variable names Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `2f58bbcbb8`)	2025-06-21 14:47:46 +08:00
Aaron Teo	ebf9f34a38	ggml-cpu: add fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `0ff0d65162`)	2025-06-21 14:47:23 +08:00
Aaron Teo	45a4cf651c	ggml-cpu: add fp16->fp32 nnpa first Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `8d4a7987f9`)	2025-06-21 14:47:12 +08:00
Aaron Teo	5801806f70	ggml-cpu: add nnpa compile flag Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `4a9f60c201`)	2025-06-21 14:46:41 +08:00
Markus Tavenrath	bb16041cae	Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) * Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled. * remove #ifdef for debug utils and add queue marker.	2025-06-21 08:17:12 +02:00
Georgi Gerganov	67ae5312e2	metal : fix thread-safety (#14300) ggml-ci	2025-06-21 08:04:18 +03:00
Acly	b7147673f2	Add `ggml_roll` (ggml/1274) * ggml : add ggml_roll * use set/get_op_params & std::min	2025-06-20 21:02:47 +03:00
Aman Gupta	c959f462a0	CUDA: add conv_2d_transpose (#14287) * CUDA: add conv_2d_transpose * remove direct include of cuda_fp16 * Review: add brackets for readability, remove ggml_set_param and add asserts	2025-06-20 22:48:24 +08:00
Nicolò Scipione	8308f98c7f	sycl: add usage of enqueue_functions extension (#14244) * Add header and namespace to use enqueue_functions extension * Convert submit and parallel_for to use new extension in convert.cpp * Convert submit and parallel_for to use extension in ggml-sycl.cpp * Convert submit and parallel_for to use extension in gla.cpp * Convert submit and parallel_for in mmq.cpp * Convert submit and parallel_for in mmvq.cpp * Convert submit and parallel_for in remaining files * Convert all simple parallel_for to nd_launch from enqueue_functions extension * Wrapping extension in general function Create a general function that enable the enqueue_functions extension if it is enable in the compiler, otherwise call the general SYCL function to launch kernels. --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-06-20 15:07:21 +02:00
Christian Kastner	6369be0735	Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) * Add PowerPC feature detection and scoring * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC * ggml-cpu: Delay some initializations until function is called When using GGML_BACKEND_DL=ON, these initializations might use instructions that are not supported by the current CPU. --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-06-20 14:17:32 +02:00
Diego Devesa	e28c1b93fd	cuda : synchronize graph capture and cublas handle destruction (#14288) Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread	2025-06-20 13:57:36 +02:00
Georgi Gerganov	d27b3ca175	ggml : fix repack work size for mul_mat_id (#14292) ggml-ci	2025-06-20 11:19:15 +03:00
Charles Xu	9230dbe2c7	ggml: Update KleidiAI to v1.9.0 (#14277)	2025-06-20 10:51:01 +03:00
Aman Gupta	9eaa51e7f0	CUDA: add conv_2d_dw (#14265) * CUDA: add conv_2d_dw * better naming * simplify using template * Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const	2025-06-20 09:50:24 +08:00
Diego Devesa	8f71d0f3e8	ggml-cpu : remove unnecesary arm feature detection (#14281) Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.	2025-06-19 21:24:14 +02:00
fanyang	456af35eb7	build : suppress gcc15 compile warnings (#14261) * Change _contains_any() substrs to std::string_view and fix the find comparison logic.	2025-06-19 14:49:48 +02:00


1002 Commits