704bb7a71c
SYCL: Initial set_rows kernel implementation ( #14562 )
* SYCL: Initial set_rows kernel implementation
* Revert max_threads to 256
* Refactor set_rows and address review comments
* Deduplicate conversion function
* Remove guard before kernel launch and refactor
* Fix and add back SFINAE
2025-07-10 09:29:38 +01:00
17512a94d6
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs ( #12858 )
* sycl : Implemented reorder Q4_0 mmvq
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
* sycl : Fixed mmvq being called when reorder is disabled
* sycl : Improved comments in the quants header
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
* Use static_assert
* safe_div -> ceil_div
* Clarify qi comment
* change the reorder tensor from init to execute OP
* dbg
* Undo changes to test-backend-ops
* Refactor changes on top of q4_0 reorder fix
* Missing Reverts
* Refactored opt_for_reorder logic to simplify code path
* Explicit inlining and unroll
* Renamed mul_mat_algo enum for consistency
---------
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
Co-authored-by: romain.biessy <romain.biessy@codeplay.com>
2025-05-09 16:34:08 +01:00
8d66005763
SYCL: Refactor and enable FP16 in binary broadcast OPs ( #12975 )
* SYCL: refactor move to a separate file
* Fix binbcast
* Remove duplicates
* fix include formatting
* fix typo
2025-04-18 15:57:56 +02:00
7dfad387e3
llama: Add support for RWKV v7 architecture ( #12412 )
* ggml: Add op l2_norm
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* ggml: Add op rwkv_wkv7
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: Add support for RWKV7 and ARWKV7 models
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: fix inference with RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: add more (a)rwkv7 variants in size
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Apply code-format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* fix MUSA build
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: fix shape error with rwkv using llama-parallel
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-03-18 07:27:50 +08:00
ece9745bb8
SYCL: Move CPY kernels to a separate file and add few missing kernels ( #12133 )
* SYCL: refactor and move cpy kernels to a separate file
* Add few missing cpy kernels
* refactor and add debug logs
2025-03-03 11:07:22 +01:00
f446c2cf6a
SYCL: Add gated linear attention kernel ( #11175 )
* SYCL: Add Gated Linear attention kernel
* gla.hpp: add a space at the end of file
* gla: Put the barrier inside the main logic loop
2025-01-15 11:20:17 +08:00
3bcd40b3c5
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration ( #10133 )
* rwkv6: rename to wkv6
* rwkv6: support avx2 avx512 armv8 armv9
* rwkv6: update cuda file name
* rwkv6: rename params
* wkv on sycl
* sycl: add some ops
* sycl: Enhance OP support judgment
* wkv6: drop armv9 and transfer to GGML style
ggml-ci
* sync : ggml
* update the function to use appropriate types
* fix define error
* Update ggml/src/ggml-cpu.c
* add appropriate asserts
* move element-wise functions outside
* put the declaration outside the loop
* rewrite to be more inline with the common pattern for distributing threads
* use recommended way GGML_TENSOR_LOCALS
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-11-07 15:19:10 +08:00
4f8d19ff17
[SYCL] Fix SYCL im2col and convert Overflow with Large Dims ( #9052 )
* sycl: fix im2col overflow and sync with cuda
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: fix convert overflow
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: fix convert and dequantize
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: fix ib in dmmv
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: refine convert
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: move downsample global_range into common
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* test: add im2col and convert test cases
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* test: make new cases only in sycl
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* test: comment new test_cases for only local testing
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
---------
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-08-20 23:06:51 +08:00
c887d8b017
[SYCL] Add TIMESTEP_EMBEDDING OP ( #8707 )
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-07-30 14:56:51 +08:00
0832de7236
[SYCL] add conv support ( #8688 )
2024-07-29 10:50:27 +08:00
16bdfa42ac
[SYCL] add concat through dim 1/2 ( #8483 )
* add concat through dim 1/2
2024-07-15 19:32:15 +08:00
a9554e20b6
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU ( #8266 )
* fix group_norm ut
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp
2024-07-05 13:06:13 +08:00
d08c20edde
[SYCL] Fix the sub group size of Intel ( #8106 )
* use warp_size macro for all sycl kernels
* fix mask of permute_sub_group_by_xor
* fix rms_norm with correct warp number
* fix rms_norm_f32/group_norm_f32
* move norm to norm.cpp file
* fix quantize bug
* fix mmvq's batch size
2024-07-02 10:16:00 +08:00
197fe6c1d7
[SYCL] Update SYCL-Rope op and Refactor ( #8157 )
* align with rope.cu and move sycl-op to a single file
2024-07-01 19:39:06 +08:00
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00