llama.cpp/ggml-sycl at 228f34c9ceefa3ea4f4d6933edd858121e8106cb - llama.cpp - Cat's Mantra

tqcq/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-08-14 20:29:41 -04:00

Akarshan Biswas 228f34c9ce SYCL: Implement few same quantized type copy kernels (#13739)

* SYCL: Implement few same quantized type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.
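The memcpy path described above can be illustrated with a minimal host-side sketch. It is not the backend's code: `TensorView`, `is_contiguous`, and `copy_tensor` are hypothetical names, the tensor is reduced to two dimensions, and `std::memcpy` stands in for whatever device copy the SYCL backend actually issues.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified tensor descriptor (not ggml's real tensor struct).
struct TensorView {
    int       type;       // element type id; only compared for equality here
    size_t    elem_size;  // bytes per element
    int64_t   ne[2];      // number of elements per dimension
    size_t    nb[2];      // stride in bytes per dimension
    uint8_t * data;       // start of the buffer
};

// A view is contiguous when its strides describe a densely packed layout.
inline bool is_contiguous(const TensorView & t) {
    return t.nb[0] == t.elem_size &&
           t.nb[1] == t.nb[0] * static_cast<size_t>(t.ne[0]);
}

inline size_t n_bytes(const TensorView & t) {
    return t.elem_size * static_cast<size_t>(t.ne[0]) * static_cast<size_t>(t.ne[1]);
}

// Copies src into dst. Returns true when the single-memcpy fast path was taken.
inline bool copy_tensor(const TensorView & src, TensorView & dst) {
    assert(src.type == dst.type);
    assert(src.ne[0] == dst.ne[0] && src.ne[1] == dst.ne[1]);

    if (is_contiguous(src) && is_contiguous(dst)) {
        // Same type and dense layout on both sides: one bulk copy is enough.
        std::memcpy(dst.data, src.data, n_bytes(src));
        return true;
    }

    // Fallback: walk the strides and copy one element at a time.
    for (int64_t i1 = 0; i1 < src.ne[1]; ++i1) {
        for (int64_t i0 = 0; i0 < src.ne[0]; ++i0) {
            std::memcpy(dst.data + i1 * dst.nb[1] + i0 * dst.nb[0],
                        src.data + i1 * src.nb[1] + i0 * src.nb[0],
                        src.elem_size);
        }
    }
    return false;
}
```

The point of the fast path is that a same-type contiguous copy needs no per-element kernel at all, which is presumably why the device support check can accept such operations regardless of the element type.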

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.
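As a rough sketch of the refactor described above, a single template can replace one copy function per block type. The signature below is an assumption, not taken from the source file, and `DemoBlockQ8` / `DemoBlockQ4` are hypothetical stand-ins for the real quantized block structs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical block layouts for illustration: a scale followed by packed quants.
struct DemoBlockQ8 {
    uint16_t d;       // scale, stored as raw half-precision bits
    int8_t   qs[32];  // 32 quantized values
};

struct DemoBlockQ4 {
    uint16_t d;       // scale, stored as raw half-precision bits
    uint8_t  qs[16];  // 32 quantized values packed as 4-bit pairs
};

// One generic copy for any same-type block. The name echoes cpy_blck_q_q from
// the commit message; the parameter list here is an assumption.
template <typename Block>
inline void cpy_blck_q_q_sketch(const char * src, char * dst) {
    static_assert(std::is_trivially_copyable<Block>::value,
                  "block types must be plain data to be copied byte-wise");
    // Source and destination hold the same block type, so no requantization
    // is needed: the block is copied verbatim.
    std::memcpy(dst, src, sizeof(Block));
}
```

A caller would instantiate it per type, for example `cpy_blck_q_q_sketch<DemoBlockQ8>(src, dst)`, so adding another block type requires no new copy function.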

* Exclude BF16 support for COPY tensors for now

ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
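The rounding step described above can be sketched on the host as follows. Only the name `ceil_div` comes from the commit message; its signature, `LaunchDims`, `launch_dims`, and `simulate_coverage` are hypothetical, and the loop merely simulates work-items rather than launching a SYCL kernel.

```cpp
#include <cassert>
#include <cstdint>

// Integer division rounded up, for positive operands.
constexpr int64_t ceil_div(int64_t n, int64_t d) {
    return (n + d - 1) / d;
}

// Hypothetical launch description: number of work-groups and total work-items.
struct LaunchDims {
    int64_t num_groups;
    int64_t global_size;  // always a multiple of the block size
};

// Rounding up guarantees the last, partially filled block is still launched.
constexpr LaunchDims launch_dims(int64_t n_elements, int64_t block_size) {
    return LaunchDims{ceil_div(n_elements, block_size),
                      ceil_div(n_elements, block_size) * block_size};
}

// Host-side stand-in for the kernel: counts how many elements get processed.
inline int64_t simulate_coverage(int64_t n_elements, int64_t block_size) {
    assert(block_size > 0);
    const LaunchDims dims = launch_dims(n_elements, block_size);
    int64_t covered = 0;
    for (int64_t gid = 0; gid < dims.global_size; ++gid) {
        if (gid >= n_elements) {
            continue;  // bounds guard: surplus work-items in the last block do nothing
        }
        ++covered;
    }
    return covered;
}
```

With truncating division, 1000 elements and a block size of 256 would launch only 3 groups and leave 232 elements uncopied; rounding up launches 4 groups, and the bounds guard keeps the 24 surplus work-items from touching memory.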

2025-06-07 18:58:20 +05:30

..

SYCL: Rename oneMKL to oneMath (#12192)

2025-04-01 16:24:29 +08:00

backend.hpp

sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)

2025-05-09 16:34:08 +01:00

binbcast.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

binbcast.hpp

SYCL: Refactor and enable FP16 in binary broadcast OPs (#12975)

2025-04-18 15:57:56 +02:00

CMakeLists.txt

cmake : Fix broken CMake error messages (ggml/1252)

2025-06-01 13:43:57 +03:00

common.cpp

SYCL: Remove misleading ggml_sycl_op_flatten function (#12387)

2025-03-31 11:25:24 +02:00

common.hpp

SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611)

2025-05-26 21:10:36 +05:30

concat.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

concat.hpp

SYCL: Refactor ggml_sycl_compute_forward (#11121)

2025-01-10 08:13:03 +08:00

conv.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

conv.hpp

SYCL: Refactor ggml_sycl_compute_forward (#11121)

2025-01-10 08:13:03 +08:00

convert.cpp

sycl: reordered Q4_K MMVQ (#13109)

2025-05-15 17:35:44 +02:00

convert.hpp

sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343)

2025-05-08 10:08:01 +01:00

cpy.cpp

SYCL: Implement few same quantized type copy kernels (#13739)

2025-06-07 18:58:20 +05:30

cpy.hpp

SYCL: Move CPY kernels to a separate file and add few missing kernels (#12133)

2025-03-03 11:07:22 +01:00

dequantize.hpp

sycl: reordered Q4_K MMVQ (#13109)

2025-05-15 17:35:44 +02:00

dmmv.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

dmmv.hpp

…

element_wise.cpp

SYCL: add gelu_erf kernel (#13749)

2025-05-27 20:52:59 +05:30

element_wise.hpp

SYCL: add gelu_erf kernel (#13749)

2025-05-27 20:52:59 +05:30

gemm.hpp

sycl: use oneDNN for matrices multiplication (#12972)

2025-05-15 16:53:41 +02:00

getrows.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

getrows.hpp

SYCL: Remove misleading ggml_sycl_op_flatten function (#12387)

2025-03-31 11:25:24 +02:00

ggml-sycl.cpp

SYCL: Implement few same quantized type copy kernels (#13739)

2025-06-07 18:58:20 +05:30

gla.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

gla.hpp

SYCL: Add gated linear attention kernel (#11175)

2025-01-15 11:20:17 +08:00

im2col.cpp

SYCL: Fix im2col (#12910)

2025-04-14 14:23:53 +02:00

im2col.hpp

SYCL: Remove misleading ggml_sycl_op_flatten function (#12387)

2025-03-31 11:25:24 +02:00

mmq.cpp

fixed compilation warnings in ggml-sycl (#12424)

2025-03-18 08:51:25 +08:00

mmq.hpp

…

mmvq.cpp

sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826)

2025-06-02 10:12:20 +01:00

mmvq.hpp

…

norm.cpp

SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611)

2025-05-26 21:10:36 +05:30

norm.hpp

SYCL: Remove misleading ggml_sycl_op_flatten function (#12387)

2025-03-31 11:25:24 +02:00

outprod.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

outprod.hpp

SYCL: Refactor ggml_sycl_compute_forward (#11121)

2025-01-10 08:13:03 +08:00

presets.hpp

Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133)

2024-11-07 15:19:10 +08:00

quants.hpp

sycl: reordered Q4_K MMVQ (#13109)

2025-05-15 17:35:44 +02:00

rope.cpp

SYCL: Add mrope kernel (#13755)

2025-05-30 19:40:57 +05:30

rope.hpp

SYCL: Add non-contiguous support in ROPE (#12993)

2025-04-21 19:13:30 +05:30

softmax.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

softmax.hpp

SYCL : SOFTMAX F16 mask support and other fixes (#11261)

2025-01-28 09:56:58 +00:00

sycl_hw.cpp

[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)

2025-02-24 22:33:23 +08:00

sycl_hw.hpp

[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)

2025-02-24 22:33:23 +08:00

tsembd.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

tsembd.hpp

SYCL: Refactor ggml_sycl_compute_forward (#11121)

2025-01-10 08:13:03 +08:00

vecdotq.hpp

sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826)

2025-06-02 10:12:20 +01:00

wkv.cpp

sycl: Add more debug prints (#13640)

2025-05-26 10:28:53 +02:00

wkv.hpp

llama: Add support for RWKV v7 architecture (#12412)

2025-03-18 07:27:50 +08:00