llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-28 21:23:55 -04:00

Files

Oliver Simons 021cc28bef cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (#14741)

* Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs

Gemma3n uses matrix-matrix addition as part of its input processing,
which wrongly triggers CUDA_GRAPH disablement on NVGPUs even when a
batch size of 1 is used.

* Exclude `project_per_layer_input` by matching node names

This ensures that the behavior of all other graphs, which don't exhibit
this pattern, is left unchanged.

* Revert unnecessary formatting changes
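
The heuristic described in this commit message can be illustrated with a minimal sketch. It is not the actual ggml/llama.cpp code: the `Node` type, the `can_use_cuda_graph` function, and the node name `"per_layer_proj"` are hypothetical, chosen only to show the idea of treating a matrix-matrix addition as a sign of batch size > 1 while exempting nodes matched by name.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a compute-graph node; NOT the ggml tensor type.
enum class Op { Add, MulMat, Other };

struct Node {
    Op          op;
    std::string name;       // name assigned by the graph builder (assumed)
    int64_t     src1_rows;  // row count of the second operand (assumed field)
};

// Returns true if the graph may be executed as a CUDA graph.
// Assumed heuristic: an Add whose second operand has more than one row is
// taken as evidence of batch size > 1, unless its name is explicitly excluded.
bool can_use_cuda_graph(const std::vector<Node> & nodes,
                        const std::vector<std::string> & excluded_names) {
    for (const Node & n : nodes) {
        if (n.op != Op::Add || n.src1_rows <= 1) {
            continue;  // not a matrix-matrix addition
        }
        bool excluded = false;
        for (const std::string & name : excluded_names) {
            if (n.name == name) {
                excluded = true;
                break;
            }
        }
        if (!excluded) {
            return false;  // unrecognized matrix-matrix add: disable CUDA graph
        }
    }
    return true;
}
```

Matching on an explicit name list keeps the change narrow: a graph that contains no excluded node is evaluated exactly as it would be with an empty list.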

2025-07-18 04:35:32 -07:00

cmake

ggml-cpu : rework weak alias on apple targets (#14146)

2025-06-16 13:54:15 +08:00

include

ggml: Add initial WebGPU backend (#14521)

2025-07-16 18:18:51 +03:00

src

cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (#14741)

2025-07-18 04:35:32 -07:00

.gitignore

…

CMakeLists.txt

ggml: Add initial WebGPU backend (#14521)

2025-07-16 18:18:51 +03:00