Aaron Teo
ff27f80a74
ggml: initial IBM zDNN backend ( #14975 )
...
* ggml-zdnn: initial backend impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: temp change z17 to arch15
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: fix build bugs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: tensor->extra logging check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: add layout name mapping, ztensor information
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: separate logging into its own line
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: add shape comparison
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: add ggml_tensor shape log
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: fix incorrect shape logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add output buffer check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: run compute and store into tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add more loggers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update set_tensor logging to check only for matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: last working matmul version
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add comments to prevent accidentally deleting lines
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: support op out_prod
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update op out_prod to use tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rewrite the backend implementation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bugfix new impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix compiler warnings and bugfixes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: test ztensor finding in init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: implement at least 1 op to test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: assign tensor->extra to buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add check for view tensors to prevent init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rework init_tensor to create new buffers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch to std vector instead of array
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch buffers back and set to arbitrary number
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: impl init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update supports_op matmul matrix
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: impl matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix compiler error missing type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing data transform call
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add bias init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: tighten memory usage, change string allocation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add bias ztensor and data free
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add bias data transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add more debug info for extra buffer transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: activate bias transform in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move weights transform into mulmat
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add more safeguards in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix sequencing of transforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bugfix transform ztensor vs origtensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: figure out why sigtrap is happening
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move everything back to local declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move bias data to local also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bring back working matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rewrite into mre
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing vector import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing vector import in header
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt to fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix invalid ztensor buffer release
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add logging to debug free buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: remove free_buffer debug info
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add parmblkformat detections
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add nnpa installed detection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add zdnn_init call for static libs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at fixing invalid buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch to using deque to fix pointer deref problem
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add weights logging to check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt to use unique ptr
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add tensor to pre_tfm_desc logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add inputs logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable op_none initialisation for testing
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing return from init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: load ztensors in cgraph exec
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: work on moving output ztensor as well
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable logging and breakpoints for full test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at manually changing the layout
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at using default nhwc format instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable global load ztensor for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix erroneous output load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add guards to prevent loading ztensor if transformed
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bring load ztensor back to init routine
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix ztensor deallocation abort
stabilise ggml <-> zdnn api
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: clean up matmul selection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: clean up project structure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update documentation, prepare for upstream
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* chore: add codeowners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable batched matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at fixing tensor views during matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: deny all view tensors directly
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix pr comments
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update ops docs for zdnn
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: redo test-backend-ops for ops.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix typo in build-s390x.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* codeowners: remove taronaeo for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "codeowners: remove taronaeo for now"
This reverts commit 411ea4ed78.
* ggml-zdnn: remove unused ggml_zdnn macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-08-15 21:11:22 +08:00
Sigbjørn Skjæret
d3248d9b65
ci : fix ios-xcode-build ( #15324 )
...
* fix ios-xcode-build
* use xcode-select with fixed version
* switch to macos-15 to get xcode 16.4
2025-08-15 14:02:39 +02:00
Diego Devesa
7aeee88cfe
ci : move ccache action to ggml-org fork ( #15328 )
2025-08-15 12:27:02 +02:00
uvos
29c8fbe4e0
HIP: bump requirement to rocm 6.1 ( #15296 )
2025-08-13 20:44:30 +02:00
Ali Tariq
648ebcdb73
ci : Added CI with RISC-V RVV1.0 Hardware ( #14439 )
...
* Changed the CI file to hw
* Changed the CI file to hw
* Added to sudoers for apt
* Removed the clone command and used checkout
* Added libcurl
* Added gcc-14
* Checking gcc --version
* added gcc-14 symlink
* added CC and C++ variables
* Added the gguf weight
* Changed the weights path
* Added system specification
* Removed white spaces
* ci: Replace Jenkins riscv native build Cloud-V pipeline with GitHub Actions workflow
Removed the legacy .devops/cloud-v-pipeline Jenkins CI configuration and introduced .github/workflows/build-riscv-native.yml for native RISC-V builds using GitHub Actions.
* removed trailing whitespaces
---------
Co-authored-by: Akif Ejaz <akifejaz40@gmail.com >
2025-08-13 13:14:44 +03:00
Sigbjørn Skjæret
07aa869a91
ci : add more python requirements to copilot-setup-steps ( #15289 )
...
* ci : add flake8 and pyright to copilot-setup-steps.yml
* add tools/server/tests/requirements.txt
2025-08-13 11:30:45 +02:00
Sigbjørn Skjæret
bc5182272c
ci : add copilot-setup-steps.yml ( #15214 )
2025-08-13 09:07:13 +02:00
Reese Levine
5fd160bbd9
ggml: Add basic SET_ROWS support in WebGPU ( #15137 )
...
* Begin work on set_rows
* Work on set rows
* Add error buffers for reporting unsupported SET_ROWS indices
* Remove extra comments
2025-08-06 15:14:40 -07:00
Reese Levine
9515c6131a
ggml: WebGPU disable SET_ROWS for now ( #15078 )
...
* Add parameter buffer pool, batching of submissions, refactor command building/submission
* Add header for linux builds
* Free staged parameter buffers at once
* Format with clang-format
* Fix thread-safe implementation
* Use device implicit synchronization
* Update workflow to use custom release
* Remove testing branch workflow
* Disable set_rows until it's implemented
* Fix potential issue around empty queue submission
* Try synchronous submission
* Try waiting on all futures explicitly
* Add debug
* Add more debug messages
* Work on getting ssh access for debugging
* Debug on failure
* Disable other tests
* Remove extra if
* Try more locking
* maybe passes?
* test
* Some cleanups
* Restore build file
* Remove extra testing branch ci
2025-08-05 16:26:38 -07:00
Reese Levine
587d0118f5
ggml: WebGPU backend host improvements and style fixing ( #14978 )
...
* Add parameter buffer pool, batching of submissions, refactor command building/submission
* Add header for linux builds
* Free staged parameter buffers at once
* Format with clang-format
* Fix thread-safe implementation
* Use device implicit synchronization
* Update workflow to use custom release
* Remove testing branch workflow
2025-08-04 08:52:43 -07:00
Sigbjørn Skjæret
2bf3fbf0b5
ci : check that pre-tokenizer hashes are up-to-date ( #15032 )
...
* torch is not required for convert_hf_to_gguf_update
* add --check-missing parameter
* check that pre-tokenizer hashes are up-to-date
2025-08-02 14:39:01 +02:00
R0CKSTAR
3f4fc97f1d
musa: upgrade musa sdk to rc4.2.0 ( #14498 )
...
* musa: apply mublas API changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: update musa version to 4.2.0
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore MUSA graph settings in CMakeLists.txt
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: disable mudnnMemcpyAsync by default
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: switch back to non-mudnn images
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* minor changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore rc in docker image tag
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
2025-07-24 20:05:37 +01:00
Sigbjørn Skjæret
221c0e0c58
ci : correct label refactor->refactoring ( #14832 )
2025-07-23 14:27:54 +02:00
Sigbjørn Skjæret
1ba45d4982
ci : disable failing vulkan crossbuilds ( #14723 )
2025-07-16 20:52:08 -03:00
Reese Levine
21c021745d
ggml: Add initial WebGPU backend ( #14521 )
...
* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults
* Initialize webgpu device
* Making progress on setting up the backend
* Finish more boilerplate/utility functions
* Organize file and work on alloc buffer
* Add webgpu_context to prepare for actually running some shaders
* Work on memset and add shader loading
* Work on memset polyfill
* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it
* Implement get_tensor and buffer_clear
* Finish rest of setup
* Start work on compute graph
* Basic mat mul working
* Work on emscripten build
* Basic WebGPU backend instructions
* Use EMSCRIPTEN flag
* Work on passing ci, implement 4d tensor multiplication
* Pass thread safety test
* Implement permuting for mul_mat and cpy
* minor cleanups
* Address feedback
* Remove division by type size in cpy op
* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends
* Fix name
* Fix macos dawn prefix path
2025-07-16 18:18:51 +03:00
Aman Gupta
11ee0fea2a
Docs: script to auto-generate ggml operations docs ( #14598 )
...
* Docs: script to auto-generate ggml operations docs
* Review: formatting changes + change github action
* Use built-in types instead of typing
* docs : add BLAS and Metal ops
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-07-10 23:29:01 +08:00
Jeff Bolz
53903ae6fa
vulkan: increase timeout for CI ( #14574 )
2025-07-08 09:38:31 +02:00
Georgi Gerganov
d4cdd9c1c3
ggml : remove kompute backend ( #14501 )
...
ggml-ci
2025-07-03 07:48:32 +03:00
Rotem Dan
f3ed38d793
Set RPATH to "@loader_path" / "$ORIGIN" to ensure executables and dynamic libraries search for dependencies in their origin directory. ( #14309 )
2025-07-02 18:37:16 +02:00
Sigbjørn Skjæret
611ba4b264
ci : add OpenCL to labeler workflow ( #14496 )
2025-07-02 09:02:51 +02:00
Eric Zhang
85841e121d
github : add OpenCL backend to issue templates ( #14492 )
2025-07-02 08:41:35 +03:00
Georgi Gerganov
de56944147
ci : disable fast-math for Metal GHA CI ( #14478 )
...
* ci : disable fast-math for Metal GHA CI
ggml-ci
* cont : remove -g flag
ggml-ci
2025-07-01 18:04:08 +03:00
Sigbjørn Skjæret
6609507a91
ci : fix windows build and release ( #14431 )
2025-06-28 09:57:07 +02:00
bandoti
ce82bd0117
ci: add workflow for relocatable cmake package ( #14346 )
2025-06-23 15:30:51 -03:00
Jeff Bolz
bf2a99e3cb
vulkan: update windows SDK in release.yml ( #14344 )
2025-06-23 15:44:48 +02:00
Jeff Bolz
3a9457df96
vulkan: update windows SDK in CI ( #14334 )
2025-06-23 10:19:24 +02:00
Diego Devesa
6adc3c3ebc
llama : add thread safety test ( #14035 )
...
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0 GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-06-16 08:11:43 -07:00
bandoti
0dbcabde8c
cmake: clean up external project logic for vulkan-shaders-gen ( #14179 )
...
* Remove install step for vulkan-shaders-gen
* Add install step to normalize msvc with make
* Regenerate modified shaders at build-time
2025-06-16 10:32:13 -03:00
Jeff Bolz
652b70e667
vulkan: force device 0 in CI ( #14106 )
2025-06-10 10:53:47 -05:00
Diego Devesa
7f4fbe5183
llama : allow building all tests on windows when not using shared libs ( #13980 )
...
* llama : allow building all tests on windows when not using shared libraries
* add static windows build to ci
* tests : enable debug logs for test-chat
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-06-09 20:03:09 +02:00
Yuanhao Ji
056eb74534
CANN: Enable labeler for Ascend NPU ( #13914 )
2025-06-09 11:20:06 +08:00
吴小白
5787b5da57
ci: add LoongArch cross-compile build ( #13944 )
2025-06-07 10:39:11 -03:00
Diego Devesa
2589ad3704
ci : remove cuda 11.7 releases, switch runner to windows 2022 ( #13997 )
2025-06-04 15:37:40 +02:00
Diego Devesa
482548716f
releases : use dl backend for linux release, remove arm64 linux release ( #13996 )
2025-06-04 13:15:54 +02:00
bandoti
d98f2a35fc
ci: disable LLAMA_CURL for Linux cross-builds ( #13871 )
2025-05-28 15:46:47 -03:00
Diego Devesa
a2d02d5793
releases : bundle llvm omp library in windows release ( #13763 )
2025-05-25 00:55:16 +02:00
Diego Devesa
17fc817b58
releases : enable openmp in windows cpu backend build ( #13756 )
2025-05-24 22:27:03 +02:00
Diego Devesa
b775345d78
ci : enable winget package updates ( #13734 )
2025-05-23 23:14:00 +03:00
Diego Devesa
a70a8a69c2
ci : add winget package updater ( #13732 )
2025-05-23 22:09:38 +02:00
Diego Devesa
3079e9ac8e
release : fix windows hip release ( #13707 )
...
* release : fix windows hip release
* make single hip release with multiple targets
2025-05-23 00:21:37 +02:00
Diego Devesa
d643bb2c79
releases : build CPU backend separately (windows) ( #13642 )
2025-05-21 22:09:57 +02:00
R0CKSTAR
33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy ( #13647 )
...
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de >
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
Co-authored-by: Johannes Gäßler <johannesg@5d6.de >
2025-05-21 09:58:49 +08:00
Alberto Cabrera Pérez
f71f40a284
ci : upgraded oneAPI version in SYCL workflows and dockerfile ( #13532 )
2025-05-19 11:46:09 +01:00
Diego Devesa
415e40a357
releases : use arm version of curl for arm releases ( #13592 )
2025-05-16 19:36:51 +02:00
Sigbjørn Skjæret
7c07ac244d
ci : add ppc64el to build-linux-cross ( #13575 )
2025-05-16 14:54:23 +02:00
Thammachart Chinvarapon
b064a51a4e
ci: free_disk_space flag enabled for intel variant ( #13426 )
...
before cleanup: 20G
after cleanup: 44G
after all built and pushed: 24G
https://github.com/Thammachart/llama.cpp/actions/runs/14945093573/job/41987371245
2025-05-10 16:34:48 +02:00
Jeff Bolz
dc1d2adfc0
vulkan: scalar flash attention implementation ( #13324 )
...
* vulkan: scalar flash attention implementation
* vulkan: always use fp32 for scalar flash attention
* vulkan: use vector loads in scalar flash attention shader
* vulkan: remove PV matrix, helps with register usage
* vulkan: reduce register usage in scalar FA, but perf may be slightly worse
* vulkan: load each Q value once. optimize O reduction. more tuning
* vulkan: support q4_0/q8_0 KV in scalar FA
* CI: increase timeout to accommodate newly-supported tests
* vulkan: for scalar FA, select between 1 and 8 rows
* vulkan: avoid using Float16 capability in scalar FA
2025-05-10 08:07:07 +02:00
Diego Devesa
15e03282bb
ci : limit write permission to only the release step + fixes ( #13392 )
...
* ci : limit write permission to only the release step
* fix win cuda file name
* fix license file copy on multi-config generators
2025-05-08 23:45:22 +02:00
Diego Devesa
70a6991edf
ci : move release workflow to a separate file ( #13362 )
2025-05-08 13:15:28 +02:00
Diego Devesa
814f795e06
docker : disable arm64 and intel images ( #13356 )
2025-05-07 16:36:33 +02:00