llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-08-14 20:29:41 -04:00

Files

uvos 10f2e81809 CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177)

refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

2025-03-11 20:16:03 +01:00

cmake

cmake: Fix ggml backend dependencies and installation (#11818)

2025-02-27 09:42:48 +02:00

include

ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154)

2025-03-06 02:26:10 +01:00

src

CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177)

2025-03-11 20:16:03 +01:00

.gitignore

…

CMakeLists.txt

opencl: use OpenCL C standard supported by the device (#12221)

2025-03-10 09:57:00 -07:00