llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-29 21:54:07 -04:00

Files

Johannes Gäßler 07a19e27a2 CUDA: fix quantized KV cache + multiple sequences (#14822)

* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-07-23 14:08:09 +03:00

ggml-blas

cmake : Fix broken CMake error messages (ggml/1252)

2025-06-01 13:43:57 +03:00

ggml-cann

CANN: weight format to NZ for Ascend310P3 (#14407)

2025-07-23 11:58:00 +08:00

ggml-cpu

ggml: fix loongarch quantize_row_q8_1 error (#14827)

2025-07-23 09:39:51 +03:00

ggml-cuda

CUDA: fix quantized KV cache + multiple sequences (#14822)

2025-07-23 14:08:09 +03:00

ggml-hip

HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202)

2025-06-16 13:47:38 +02:00

ggml-metal

metal : fuse add, mul + add tests (#14596)

2025-07-18 20:37:26 +03:00

ggml-musa

musa: enable fp16 mma (all) and cublas on qy2 (#13842)

2025-06-26 12:11:59 +08:00

ggml-opencl

opencl: remove unreachable return (#14806)

2025-07-22 08:53:30 +02:00

ggml-rpc

rpc : nicer error messages for RPC server crash (#14076)

2025-06-10 09:41:01 +03:00

ggml-sycl

sycl: Fix im2col (#14797)

2025-07-21 18:39:29 +02:00

ggml-vulkan

vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817)

2025-07-22 17:35:21 +02:00

ggml-webgpu

ggml: Add initial WebGPU backend (#14521)

2025-07-16 18:18:51 +03:00

CMakeLists.txt

ggml: Add initial WebGPU backend (#14521)

2025-07-16 18:18:51 +03:00

ggml-alloc.c

metal : fuse add, mul + add tests (#14596)

2025-07-18 20:37:26 +03:00

ggml-backend-impl.h

ggml : upgrade init_tensor API to return a ggml_status (#11854)

2025-02-28 14:41:47 +01:00

ggml-backend-reg.cpp

ggml: Add initial WebGPU backend (#14521)

2025-07-16 18:18:51 +03:00

ggml-backend.cpp

metal : fuse add, mul + add tests (#14596)

2025-07-18 20:37:26 +03:00

ggml-common.h

ggml-cpu : split arch-specific implementations (#13892)

2025-06-09 16:47:13 +02:00

ggml-impl.h

metal : fuse add, mul + add tests (#14596)

2025-07-18 20:37:26 +03:00

ggml-opt.cpp

mnist: fix segmentation fault (ggml/1227)

2025-05-19 13:29:56 +03:00

ggml-quants.c

ggml-quants : rename best_mad to best_error (ggml/1283)

2025-07-01 11:06:39 +03:00

ggml-quants.h

…

ggml-threading.cpp

…

ggml-threading.h

…

ggml.c

ggml : add ggml_scale_bias (#14417)

2025-07-09 18:16:12 +02:00

ggml.cpp

ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)

2025-06-01 13:43:57 +03:00

gguf.cpp

ggml : prevent integer overflow in gguf tensor size calculation (#14595)

2025-07-09 14:33:53 +02:00