Aaron Teo
8ef51b9055
ggml-cpu: bring back fp32->fp16 store nnpa
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:49:36 +08:00
Aaron Teo
987d1690e4
ggml-cpu: clarified vector naming
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:39:35 +08:00
Aaron Teo
4621a23c14
ggml-cpu: add 4 element loops for fp32->fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:32:20 +08:00
Aaron Teo
373fa28e4c
ggml-cpu: change to typedef vector types
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:26:20 +08:00
Aaron Teo
7413dabc8c
ggml-cpu: fix compiler types
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:23:18 +08:00
Aaron Teo
e12e9fe704
ggml-cpu: reattempt fp32->fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:20:20 +08:00
Aaron Teo
54811fc128
ggml-cpu: fix typo
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:13:57 +08:00
Aaron Teo
433d587426
ggml-cpu: reattempt fp32->fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:12:22 +08:00
Aaron Teo
946c78ebde
ggml-cpu: switch to elif macro
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:06:18 +08:00
Aaron Teo
27131e5f34
ggml-cpu: disable fp32->fp16 nnpa conversions for now
...
there are some conversion failures in nnpa that require the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:58:43 +08:00
Aaron Teo
4f017d718a
ggml-cpu: test fix for conversion failure
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:55:16 +08:00
Aaron Teo
5424d9e757
ggml-cpu: add breakpoint for debugging
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:51:05 +08:00
Aaron Teo
bb9345ca8a
ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:50:05 +08:00
Aaron Teo
e0f8fb930b
ggml-cpu: clarify variable naming
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:43:41 +08:00
Aaron Teo
27b4c3f338
ggml-cpu: remove noop, general code cleanup
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:41:39 +08:00
Aaron Teo
8312adc980
ggml-cpu: rework noop
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:24:32 +08:00
Aaron Teo
6d507bbeb0
ggml-cpu: switch to vec_xst for 4 element loops also
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:23:23 +08:00
Aaron Teo
f9f6c7e897
ggml-cpu: nnpa switch to vec_xst test
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:16:35 +08:00
Aaron Teo
6a25fd8531
ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:10:44 +08:00
Aaron Teo
ebc1d19f62
ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:01:55 +08:00
Aaron Teo
9330454cb8
ggml-cpu: remove sigint from fp16 store
...
for some reason, the function is not getting a hit when debugged with
gdb. we will need to investigate further
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 15:06:31 +08:00
Aaron Teo
575ea9f6c6
ggml-cpu: fp16 load ensured to hit
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 15:00:46 +08:00
Aaron Teo
8f3a5af6c0
ggml-cpu: ensure fp16 and fp32 load and stores are called
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:57:25 +08:00
Aaron Teo
94f10ca189
ggml-cpu: fix float placeholder
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:53:15 +08:00
Aaron Teo
d9cc63a94a
ggml-cpu: fix print vs printf
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:51:38 +08:00
Aaron Teo
48b820d05f
ggml-cpu: add debugging prints to see if dlf16 is correct
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:50:33 +08:00
Aaron Teo
ffe296457e
ggml-cpu: better variable names
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 2f58bbcbb8)
2025-06-21 14:47:46 +08:00
Aaron Teo
ebf9f34a38
ggml-cpu: add fp32->fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0ff0d65162)
2025-06-21 14:47:23 +08:00
Aaron Teo
45a4cf651c
ggml-cpu: add fp16->fp32 nnpa first
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8d4a7987f9)
2025-06-21 14:47:12 +08:00
Aaron Teo
5801806f70
ggml-cpu: add nnpa compile flag
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 4a9f60c201)
2025-06-21 14:46:41 +08:00
Markus Tavenrath
bb16041cae
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792)
...
* Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled.
* remove #ifdef for debug utils and add queue marker.
2025-06-21 08:17:12 +02:00
Georgi Gerganov
67ae5312e2
metal : fix thread-safety (#14300)
...
ggml-ci
2025-06-21 08:04:18 +03:00
Acly
b7147673f2
Add ggml_roll (ggml/1274)
...
* ggml : add ggml_roll
* use set/get_op_params & std::min
2025-06-20 21:02:47 +03:00
Aman Gupta
c959f462a0
CUDA: add conv_2d_transpose (#14287)
...
* CUDA: add conv_2d_transpose
* remove direct include of cuda_fp16
* Review: add brackets for readability, remove ggml_set_param and add asserts
2025-06-20 22:48:24 +08:00
Nicolò Scipione
8308f98c7f
sycl: add usage of enqueue_functions extension (#14244)
...
* Add header and namespace to use enqueue_functions extension
* Convert submit and parallel_for to use new extension in convert.cpp
* Convert submit and parallel_for to use extension in ggml-sycl.cpp
* Convert submit and parallel_for to use extension in gla.cpp
* Convert submit and parallel_for in mmq.cpp
* Convert submit and parallel_for in mmvq.cpp
* Convert submit and parallel_for in remaining files
* Convert all simple parallel_for to nd_launch from enqueue_functions
extension
* Wrapping extension in general function
Create a general function that enables the enqueue_functions extension if
it is enabled in the compiler, and otherwise calls the general SYCL function
to launch kernels.
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-06-20 15:07:21 +02:00
Christian Kastner
6369be0735
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)
...
* Add PowerPC feature detection and scoring
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC
* ggml-cpu: Delay some initializations until function is called
When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-06-20 14:17:32 +02:00
Diego Devesa
e28c1b93fd
cuda : synchronize graph capture and cublas handle destruction (#14288)
...
Works around an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread.
2025-06-20 13:57:36 +02:00
Georgi Gerganov
d27b3ca175
ggml : fix repack work size for mul_mat_id (#14292)
...
ggml-ci
2025-06-20 11:19:15 +03:00
Charles Xu
9230dbe2c7
ggml: Update KleidiAI to v1.9.0 (#14277)
2025-06-20 10:51:01 +03:00
Aman Gupta
9eaa51e7f0
CUDA: add conv_2d_dw (#14265)
...
* CUDA: add conv_2d_dw
* better naming
* simplify using template
* Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const
2025-06-20 09:50:24 +08:00
Diego Devesa
8f71d0f3e8
ggml-cpu : remove unnecessary arm feature detection (#14281)
...
Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.
2025-06-19 21:24:14 +02:00
fanyang
456af35eb7
build : suppress gcc15 compile warnings (#14261)
...
* Change _contains_any() substrs to std::string_view and fix the find comparison logic.
2025-06-19 14:49:48 +02:00
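The `_contains_any()` change in 456af35eb7 can be illustrated with a minimal sketch. This is not the project's code: the name `contains_any`, its signature, and the container type are assumptions made for illustration. It shows the general pattern of taking substrings as `std::string_view` and comparing the result of `find` against `npos` rather than treating it as a boolean.

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper (illustrative only, not the actual ggml/llama.cpp code).
// Returns true if `str` contains at least one of `substrs`.
static bool contains_any(std::string_view str,
                         const std::vector<std::string_view> & substrs) {
    for (std::string_view s : substrs) {
        // find() returns the match position, or npos when there is no match.
        // A match at position 0 converts to false and npos converts to true,
        // so the result must be compared against npos explicitly.
        if (str.find(s) != std::string_view::npos) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, `contains_any("cpu-backend", {"cpu"})` is true even though the match starts at index 0, which is the case an implicit boolean test of `find` would get wrong.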
Anton Mitkov
600e3e9b50
sycl: Cleanup codepaths in Get Rows in sycl backend (#14215)
...
Addresses unused reorder path
2025-06-19 11:40:21 +01:00
Aaron Teo
faed5a5f5d
llamafile : support s390x SIMD instruction set (#14273)
2025-06-19 11:48:54 +02:00
0cc4m
10bb545c5b
Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (#14249)
2025-06-19 09:15:42 +02:00
Georgi Gerganov
ed3290ab34
metal : add mean kernel (#14267)
...
* metal : add mean kernel
ggml-ci
* cont : dedup implementation
ggml-ci
2025-06-19 08:05:21 +03:00
Aaron Teo
50d2227953
ggml-cpu: reduce asm calls for hsum (#14037)
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-18 18:10:08 +01:00
Aaron Teo
6231c5cd6d
ggml-cpu: fix uncaught underscore terminators (#14023)
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-18 18:06:49 +01:00
Charles Xu
ef035803eb
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258)
2025-06-18 12:40:07 +01:00
Daniel Bevenius
dd8e59f443
ggml : disable warnings for tests when using MSVC (ggml/1273)
...
* ggml : disable warnings for tests when using MSVC
This commit disables warnings for tests on Windows when using MSVC.
The motivation for this is that it brings the build output more
in line with what Linux/macOS systems produce.
There is still one warning generated for the tests which is:
```console
Building Custom Rule C:/ggml/tests/CMakeLists.txt
cl : command line warning D9025: overriding '/DNDEBUG' with '/UNDEBUG'
[C:\ggml\build\tests\test-arange.vcxproj]
test-arange.cpp
test-arange.vcxproj -> C:\ggml\build\bin\Release\test-arange.exe
```
* ggml : fix typo in tests disable list
2025-06-18 09:59:21 +03:00