llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-30 06:03:37 -04:00

Author	SHA1	Message	Date
Aaron Teo	8ef51b9055	ggml-cpu: bring back fp32->fp16 store nnpa Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:49:36 +08:00
Aaron Teo	987d1690e4	ggml-cpu: clarified vector naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:39:35 +08:00
Aaron Teo	4621a23c14	ggml-cpu: add 4 element loops for fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:32:20 +08:00
Aaron Teo	373fa28e4c	ggml-cpu: change to typedef vector types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:26:20 +08:00
Aaron Teo	7413dabc8c	ggml-cpu: fix compiler types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:23:18 +08:00
Aaron Teo	e12e9fe704	ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:20:20 +08:00
Aaron Teo	54811fc128	ggml-cpu: fix typo Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:13:57 +08:00
Aaron Teo	433d587426	ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:12:22 +08:00
Aaron Teo	946c78ebde	ggml-cpu: switch to elif macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 17:06:18 +08:00
Aaron Teo	27131e5f34	ggml-cpu: disable fp32->fp16 nnpa conversions for now there are some conversion failures in nnpa that requires the eyes of an ibm stsm. will create a separate pr to introduce the fp32->fp16 change. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:58:43 +08:00
Aaron Teo	4f017d718a	ggml-cpu: test fix for conversion failure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:55:16 +08:00
Aaron Teo	5424d9e757	ggml-cpu: add breakpoint for debugging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:51:05 +08:00
Aaron Teo	bb9345ca8a	ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:50:05 +08:00
Aaron Teo	e0f8fb930b	ggml-cpu: clarify variable naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:43:41 +08:00
Aaron Teo	27b4c3f338	ggml-cpu: remove noop, general code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:41:39 +08:00
Aaron Teo	8312adc980	ggml-cpu: rework noop Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:24:32 +08:00
Aaron Teo	6d507bbeb0	ggml-cpu: switch to vec_xst for 4 element loops also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:23:23 +08:00
Aaron Teo	f9f6c7e897	ggml-cpu: nnpa switch to vec_xst test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:16:35 +08:00
Aaron Teo	6a25fd8531	ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:10:44 +08:00
Aaron Teo	ebc1d19f62	ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:01:55 +08:00
Aaron Teo	9330454cb8	ggml-cpu: remove sigint from fp16 store for some reason, the function is not getting a hit when debugged with gdb. we will need to investigate further Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 15:06:31 +08:00
Aaron Teo	575ea9f6c6	ggml-cpu: fp16 load ensured to hit Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 15:00:46 +08:00
Aaron Teo	8f3a5af6c0	ggml-cpu: ensure fp16 and fp32 load and stores are called Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:57:25 +08:00
Aaron Teo	94f10ca189	ggml-cpu: fix float placeholder Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:53:15 +08:00
Aaron Teo	d9cc63a94a	ggml-cpu: fix print vs printf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:51:38 +08:00
Aaron Teo	48b820d05f	ggml-cpu: add debugging prints to see if dlf16 is correct Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:50:33 +08:00
Aaron Teo	ffe296457e	ggml-cpu: better variable names Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `2f58bbcbb8`)	2025-06-21 14:47:46 +08:00
Aaron Teo	ebf9f34a38	ggml-cpu: add fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `0ff0d65162`)	2025-06-21 14:47:23 +08:00
Aaron Teo	45a4cf651c	ggml-cpu: add fp16->fp32 nnpa first Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `8d4a7987f9`)	2025-06-21 14:47:12 +08:00
Aaron Teo	5801806f70	ggml-cpu: add nnpa compile flag Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `4a9f60c201`)	2025-06-21 14:46:41 +08:00
Markus Tavenrath	bb16041cae	Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) * Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled. * remove #ifdef for debug utils and add queue marker.	2025-06-21 08:17:12 +02:00
Georgi Gerganov	67ae5312e2	metal : fix thread-safety (#14300) ggml-ci	2025-06-21 08:04:18 +03:00
Acly	b7147673f2	Add `ggml_roll` (ggml/1274) * ggml : add ggml_roll * use set/get_op_params & std::min	2025-06-20 21:02:47 +03:00
Aman Gupta	c959f462a0	CUDA: add conv_2d_transpose (#14287) * CUDA: add conv_2d_transpose * remove direct include of cuda_fp16 * Review: add brackets for readability, remove ggml_set_param and add asserts	2025-06-20 22:48:24 +08:00
Nicolò Scipione	8308f98c7f	sycl: add usage of enqueue_functions extension (#14244) * Add header and namespace to use enqueue_functions extension * Convert submit and parallel_for to use new extension in convert.cpp * Convert submit and parallel_for to use extension in ggml-sycl.cpp * Convert submit and parallel_for to use extension in gla.cpp * Convert submit and parallel_for in mmq.cpp * Convert submit and parallel_for in mmvq.cpp * Convert submit and parallel_for in remaining files * Convert all simple parallel_for to nd_launch from enqueue_functions extension * Wrapping extension in general function Create a general function that enable the enqueue_functions extension if it is enable in the compiler, otherwise call the general SYCL function to launch kernels. --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-06-20 15:07:21 +02:00
Christian Kastner	6369be0735	Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) * Add PowerPC feature detection and scoring * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC * ggml-cpu: Delay some initializations until function is called When using GGML_BACKEND_DL=ON, these initializations might use instructions that are not supported by the current CPU. --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-06-20 14:17:32 +02:00
Diego Devesa	e28c1b93fd	cuda : synchronize graph capture and cublas handle destruction (#14288) Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread	2025-06-20 13:57:36 +02:00
Georgi Gerganov	d27b3ca175	ggml : fix repack work size for mul_mat_id (#14292) ggml-ci	2025-06-20 11:19:15 +03:00
Charles Xu	9230dbe2c7	ggml: Update KleidiAI to v1.9.0 (#14277)	2025-06-20 10:51:01 +03:00
Aman Gupta	9eaa51e7f0	CUDA: add conv_2d_dw (#14265) * CUDA: add conv_2d_dw * better naming * simplify using template * Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const	2025-06-20 09:50:24 +08:00
Diego Devesa	8f71d0f3e8	ggml-cpu : remove unnecesary arm feature detection (#14281) Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.	2025-06-19 21:24:14 +02:00
fanyang	456af35eb7	build : suppress gcc15 compile warnings (#14261) * Change _contains_any() substrs to std::string_view and fix the find comparison logic.	2025-06-19 14:49:48 +02:00
Anton Mitkov	600e3e9b50	sycl: Cleanup codepaths in Get Rows in sycl backend (#14215) Addresses unused reorder path	2025-06-19 11:40:21 +01:00
Aaron Teo	faed5a5f5d	llamafile : support s390x SIMD instruction set (#14273)	2025-06-19 11:48:54 +02:00
0cc4m	10bb545c5b	Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (#14249)	2025-06-19 09:15:42 +02:00
Georgi Gerganov	ed3290ab34	metal : add mean kernel (#14267) * metal : add mean kernel ggml-ci * cont : dedup implementation ggml-ci	2025-06-19 08:05:21 +03:00
Aaron Teo	50d2227953	ggml-cpu: reduce asm calls for hsum (#14037) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-18 18:10:08 +01:00
Aaron Teo	6231c5cd6d	ggml-cpu: fix uncaught underscore terminators (#14023) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-18 18:06:49 +01:00
Charles Xu	ef035803eb	ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258)	2025-06-18 12:40:07 +01:00
Daniel Bevenius	dd8e59f443	ggml : disable warnings for tests when using MSVC (ggml/1273) * ggml : disable warnings for tests when using MSVC This commit disables warnings for tests on windows when using MSVC. The motivation for this is that this brings the build output more inline with what Linux/MacOS systems produce. There is still one warning generated for the tests which is: ```console Building Custom Rule C:/ggml/tests/CMakeLists.txt cl : command line warning D9025: overriding '/DNDEBUG' with '/UNDEBUG' [C:\ggml\build\tests\test-arange.vcxproj] test-arange.cpp test-arange.vcxproj -> C:\ggml\build\bin\Release\test-arange.exe ``` * ggml : fix typo in tests disable list	2025-06-18 09:59:21 +03:00


994 Commits