llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-07-31 06:34:56 -04:00

Files

Johannes Gäßler d50f8897a7 CUDA: stream-k decomposition for MMQ (#8018)

* CUDA: stream-k decomposition for MMQ

* fix undefined memory reads for small matrices

2024-06-20 14:39:21 +02:00

template-instances

…

acc.cu

…

acc.cuh

…

arange.cu

…

arange.cuh

…

argsort.cu

…

argsort.cuh

…

binbcast.cu

…

binbcast.cuh

…

clamp.cu

…

clamp.cuh

…

common.cuh

CUDA: stream-k decomposition for MMQ (#8018)

2024-06-20 14:39:21 +02:00

concat.cu

…

concat.cuh

…

convert.cu

…

convert.cuh

…

cpy.cu

…

cpy.cuh

…

dequantize.cuh

…

diagmask.cu

…

diagmask.cuh

…

dmmv.cu

…

dmmv.cuh

…

fattn-common.cuh

…

fattn-tile-f16.cu

…

fattn-tile-f16.cuh

…

fattn-tile-f32.cu

CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)

2024-06-01 15:47:04 +02:00

fattn-tile-f32.cuh

…

fattn-vec-f16.cuh

…

fattn-vec-f32.cuh

…

fattn-wmma-f16.cuh

CUDA: use tensor cores for MMQ (#7676)

2024-06-10 11:45:13 +02:00

fattn.cu

…

fattn.cuh

…

getrows.cu

…

getrows.cuh

…

im2col.cu

…

im2col.cuh

…

mma.cuh

…

mmq.cu

CUDA: stream-k decomposition for MMQ (#8018)

2024-06-20 14:39:21 +02:00

mmq.cuh

CUDA: stream-k decomposition for MMQ (#8018)

2024-06-20 14:39:21 +02:00

mmvq.cu

…

mmvq.cuh

…

norm.cu

…

norm.cuh

…

pad.cu

…

pad.cuh

…

pool2d.cu

…

pool2d.cuh

…

quantize.cu

…

quantize.cuh

…

rope.cu

…

rope.cuh

…

scale.cu

…

scale.cuh

…

softmax.cu

…

softmax.cuh

…

sumrows.cu

…

sumrows.cuh

…

tsembd.cu

…

tsembd.cuh

…

unary.cu

…

unary.cuh

…

upscale.cu

…

upscale.cuh

…

vecdotq.cuh

…