f1b1d98e8d
ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:51:55 +08:00
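The NNPA series below (this entry down through "ggml-cpu: add nnpa compile flag") enables the s390x NNP-assist facility for scalar fp16<->fp32 conversion. As a rough illustration of the approach, here is a minimal sketch assuming GCC's z16 vector builtins (vec_convert_from_fp16/vec_convert_to_fp16 wrapping the VCNF/VCFN instructions, vec_extend_to_fp32_hi and vec_round_from_fp32 wrapping VCLFNH/VCRNF); it is illustrative, not necessarily the exact code these commits landed:

    #include <vecintrin.h> // s390x vector builtins; needs -march=z16 (arch14) with NNPA

    // typedef'd vector types, as in the "change to typedef vector types" commit below
    typedef __vector unsigned short uint16x8_t;
    typedef __vector float          float32x4_t;

    // fp16 -> fp32: VCNF converts IEEE fp16 to DLFloat16, VCLFNH widens the
    // high four halves to fp32; element 0 carries the scalar result
    static inline float fp16_to_fp32_nnpa(unsigned short h) {
        uint16x8_t v_h  = vec_splats(h);
        uint16x8_t v_hd = vec_convert_from_fp16(v_h, 0);
        return vec_extend_to_fp32_hi(v_hd, 0)[0];
    }

    // fp32 -> fp16: VCRNF rounds two fp32 vectors down to DLFloat16,
    // then VCFN narrows DLFloat16 to IEEE fp16
    static inline unsigned short fp32_to_fp16_nnpa(float f) {
        float32x4_t v_f    = vec_splats(f);
        float32x4_t v_zero = vec_splats(0.0f);
        uint16x8_t  v_fd   = vec_round_from_fp32(v_f, v_zero, 0);
        uint16x8_t  v_h    = vec_convert_to_fp16(v_fd, 0);
        return vec_extract(v_h, 0);
    }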
8ef51b9055
ggml-cpu: bring back fp32->fp16 store nnpa
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:49:36 +08:00
987d1690e4
ggml-cpu: clarified vector naming
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:39:35 +08:00
4621a23c14
ggml-cpu: add 4 element loops for fp32->fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:32:20 +08:00
373fa28e4c
ggml-cpu: change to typedef vector types
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:26:20 +08:00
7413dabc8c
ggml-cpu: fix compiler types
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:23:18 +08:00
e12e9fe704
ggml-cpu: reattempt fp32->fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:20:20 +08:00
54811fc128
ggml-cpu: fix typo
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:13:57 +08:00
433d587426
ggml-cpu: reattempt fp32->fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:12:22 +08:00
946c78ebde
ggml-cpu: switch to elif macro
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 17:06:18 +08:00
27131e5f34
ggml-cpu: disable fp32->fp16 nnpa conversions for now
...
there are some conversion failures in nnpa that require the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:58:43 +08:00
4f017d718a
ggml-cpu: test fix for conversion failure
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:55:16 +08:00
5424d9e757
ggml-cpu: add breakpoint for debugging
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:51:05 +08:00
bb9345ca8a
ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:50:05 +08:00
e0f8fb930b
ggml-cpu: clarify variable naming
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:43:41 +08:00
27b4c3f338
ggml-cpu: remove noop, general code cleanup
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:41:39 +08:00
8312adc980
ggml-cpu: rework noop
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:24:32 +08:00
6d507bbeb0
ggml-cpu: switch to vec_xst for 4 element loops also
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:23:23 +08:00
f9f6c7e897
ggml-cpu: nnpa switch to vec_xst test
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:16:35 +08:00
6a25fd8531
ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:10:44 +08:00
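Together with the vec_xst switches just above, this eight-element activation amounts to a loop shaped like the sketch below (function and variable names are assumed; it reuses the scalar helper from the sketch near the top of this series):

    // hedged sketch of an 8-elements-per-iteration fp16 -> fp32 loop with
    // unaligned vector load/store (vec_xl / vec_xst)
    static void fp16_to_fp32_row_sketch(const unsigned short * x, float * y, int n) {
        int i = 0;
        for (; i + 7 < n; i += 8) {
            uint16x8_t  v_h  = vec_xl(0, x + i);               // load 8 fp16 values
            uint16x8_t  v_hd = vec_convert_from_fp16(v_h, 0);  // IEEE fp16 -> DLFloat16
            float32x4_t v_hi = vec_extend_to_fp32_hi(v_hd, 0); // elements 0..3
            float32x4_t v_lo = vec_extend_to_fp32_lo(v_hd, 0); // elements 4..7
            vec_xst(v_hi, 0, y + i);
            vec_xst(v_lo, 0, y + i + 4);
        }
        for (; i < n; ++i) {
            y[i] = fp16_to_fp32_nnpa(x[i]); // scalar tail
        }
    }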
ebc1d19f62
ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:01:55 +08:00
9330454cb8
ggml-cpu: remove sigint from fp16 store
...
for some reason, the function is not being hit when debugged with
gdb. we will need to investigate further.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 15:06:31 +08:00
575ea9f6c6
ggml-cpu: fp16 load ensured to hit
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 15:00:46 +08:00
8f3a5af6c0
ggml-cpu: ensure fp16 and fp32 load and stores are called
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:57:25 +08:00
94f10ca189
ggml-cpu: fix float placeholder
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:53:15 +08:00
d9cc63a94a
ggml-cpu: fix print vs printf
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:51:38 +08:00
48b820d05f
ggml-cpu: add debugging prints to see if dlf16 is correct
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:50:33 +08:00
0394a006c5
docs: update s390x docs
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 01b929491b)
2025-06-21 14:48:46 +08:00
ffe296457e
ggml-cpu: better variable names
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 2f58bbcbb8)
2025-06-21 14:47:46 +08:00
ebf9f34a38
ggml-cpu: add fp32->fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0ff0d65162)
2025-06-21 14:47:23 +08:00
45a4cf651c
ggml-cpu: add fp16->fp32 nnpa first
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8d4a7987f9)
2025-06-21 14:47:12 +08:00
5801806f70
ggml-cpu: add nnpa compile flag
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 4a9f60c201)
2025-06-21 14:46:41 +08:00
bb16041cae
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792)
...
* Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1, compute pipelines are labeled.
* remove #ifdef for debug utils and add queue marker.
b5731
2025-06-21 08:17:12 +02:00
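For reference, attaching a label to a Vulkan object through VK_EXT_debug_utils follows the pattern sketched below; the helper and the way the entry point is fetched are illustrative, not necessarily how the PR wires it up:

    #include <vulkan/vulkan.h>

    // attach a human-readable name to a compute pipeline; the entry point is
    // an extension function, so it must be fetched at runtime
    static void label_pipeline(VkInstance instance, VkDevice device,
                               VkPipeline pipeline, const char * name) {
        auto set_name = (PFN_vkSetDebugUtilsObjectNameEXT)
            vkGetInstanceProcAddr(instance, "vkSetDebugUtilsObjectNameEXT");
        if (set_name == nullptr) {
            return; // VK_EXT_debug_utils not enabled on this instance
        }
        VkDebugUtilsObjectNameInfoEXT info = {};
        info.sType        = VK_STRUCTURE_TYPE_DEBUG_UTILS_OBJECT_NAME_INFO_EXT;
        info.objectType   = VK_OBJECT_TYPE_PIPELINE;
        info.objectHandle = (uint64_t) pipeline;
        info.pObjectName  = name; // e.g. the shader's name, visible in debuggers
        set_name(device, &info);
    }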
58cba76a9a
gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312)
2025-06-21 07:33:21 +02:00
67ae5312e2
metal : fix thread-safety (#14300)
...
ggml-ci
b5729
2025-06-21 08:04:18 +03:00
692e3cdd0a
memory : rename interface to llama_memory_context_i (#14296)
...
* memory : rename interface to llama_memory_context_i
ggml-ci
* cont : fix comments
* cont : use "mctx" for referencing a memory context
ggml-ci
b5728
2025-06-21 08:03:46 +03:00
b23fa0b3f4
convert : fix Llama 4 conversion (#14311)
2025-06-21 06:32:01 +02:00
06cbedfca1
sync : ggml
...
ggml-ci
b5726
2025-06-20 21:02:47 +03:00
b7147673f2
Add ggml_roll (ggml/1274)
...
* ggml : add ggml_roll
* use set/get_op_params & std::min
2025-06-20 21:02:47 +03:00
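A hypothetical usage sketch follows; the per-axis shift signature is inferred from the commit title and should be checked against ggml.h:

    // hypothetical call: roll tensor a by one position along its first axis,
    // with elements wrapping around (signature assumed, not verified)
    struct ggml_tensor * rolled = ggml_roll(ctx, a, 1, 0, 0, 0);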
d860dd99a4
docs : fix the link to llama.h (#14293)
2025-06-20 19:43:35 +02:00
c959f462a0
CUDA: add conv_2d_transpose (#14287)
...
* CUDA: add conv_2d_transpose
* remove direct include of cuda_fp16
* Review: add brackets for readability, remove ggml_set_param and add asserts
b5723
2025-06-20 22:48:24 +08:00
22015b2092
lint : remove trailing whitespace (#14304)
b5722
2025-06-20 16:37:44 +02:00
dd6e6d0b6a
vocab : prevent tokenizer overflow (#14301)
...
* vocab : prevent stack overflow in tokenize
* vocab : return error instead of aborting on oversized token count
* vocab : INT32_MIN from llama_tokenize on overflow
b5721
2025-06-20 07:13:06 -07:00
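The bullets above describe the new error path; a hedged sketch of the idea (simplified, not the actual llama-vocab code):

    #include <cstdint>
    #include <vector>

    // sketch: the API already returns the negated required size when the output
    // buffer is too small; the fix adds INT32_MIN as a sentinel for token counts
    // that cannot be represented in int32_t at all
    static int32_t tokenize_sketch(const std::vector<int32_t> & tokens,
                                   int32_t * out, int32_t n_max) {
        if (tokens.size() > (size_t) INT32_MAX) {
            return INT32_MIN; // overflow: report an error instead of aborting
        }
        const int32_t n = (int32_t) tokens.size();
        if (n > n_max) {
            return -n; // buffer too small: caller may retry with |result| slots
        }
        for (int32_t i = 0; i < n; ++i) {
            out[i] = tokens[i];
        }
        return n;
    }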
8308f98c7f
sycl: add usage of enqueue_functions extension (#14244)
...
* Add header and namespace to use enqueue_functions extension
* Convert submit and parallel_for to use new extension in convert.cpp
* Convert submit and parallel_for to use extension in ggml-sycl.cpp
* Convert submit and parallel_for to use extension in gla.cpp
* Convert submit and parallel_for in mmq.cpp
* Convert submit and parallel_for in mmvq.cpp
* Convert submit and parallel_for in remaining files
* Convert all simple parallel_for to nd_launch from enqueue_functions extension
* Wrapping extension in general function
Create a general function that enables the enqueue_functions extension if
it is enabled in the compiler, otherwise calls the general SYCL function
to launch kernels.
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
b5720
2025-06-20 15:07:21 +02:00
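The final bullet describes a compile-time wrapper; a sketch of that pattern, with the helper name assumed rather than taken from the PR:

    #include <sycl/sycl.hpp>
    #include <utility>

    // route kernel launches through the enqueue_functions extension when the
    // compiler provides it, otherwise fall back to the standard SYCL API
    template <int Dims, typename Kernel>
    void launch_kernel(sycl::queue & q, const sycl::nd_range<Dims> & range, Kernel && k) {
    #ifdef SYCL_EXT_ONEAPI_ENQUEUE_FUNCTIONS
        namespace syclex = sycl::ext::oneapi::experimental;
        syclex::nd_launch(q, range, std::forward<Kernel>(k));
    #else
        q.parallel_for(range, std::forward<Kernel>(k));
    #endif
    }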
6369be0735
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)
...
* Add PowerPC feature detection and scoring
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC
* ggml-cpu: Delay some initializations until function is called
When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
b5719
2025-06-20 14:17:32 +02:00
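Variant selection of this kind typically scores CPU features at runtime so the backend loader can pick the best build; a hedged sketch using the ELF auxiliary vector (weights and helper name are illustrative, not the PR's actual code):

    #include <sys/auxv.h>

    #ifndef PPC_FEATURE2_ARCH_3_00
    #define PPC_FEATURE2_ARCH_3_00 0x00800000 // POWER9 (Linux asm/cputable.h)
    #endif
    #ifndef PPC_FEATURE2_ARCH_3_1
    #define PPC_FEATURE2_ARCH_3_1  0x00040000 // POWER10
    #endif

    static int ppc_variant_score() {
        const unsigned long hwcap2 = getauxval(AT_HWCAP2);
        int score = 1;                                   // baseline variant
        if (hwcap2 & PPC_FEATURE2_ARCH_3_00) score += 1; // prefer POWER9 build
        if (hwcap2 & PPC_FEATURE2_ARCH_3_1)  score += 2; // prefer POWER10 build
        return score; // the loader picks the highest-scoring variant
    }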
88fc854b4b
llama : improve sep token handling (#14272)
b5718
2025-06-20 14:04:09 +02:00
e28c1b93fd
cuda : synchronize graph capture and cublas handle destruction (#14288)
...
Works around an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread.
b5717
2025-06-20 13:57:36 +02:00
d27b3ca175
ggml : fix repack work size for mul_mat_id (#14292)
...
ggml-ci
b5716
2025-06-20 11:19:15 +03:00
9230dbe2c7
ggml: Update KleidiAI to v1.9.0 (#14277)
b5715
2025-06-20 10:51:01 +03:00
812939a9e9
model : more uniform output id handling (#14275)
...
* model : more uniform output id handling
ggml-ci
* cont : revert n_outputs < n_tokens optimization
ggml-ci
* cont : fix out_ids initialization
ggml-ci
b5714
2025-06-20 10:50:27 +03:00