llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-09-06 07:11:25 -04:00)

Files

Aman Gupta 55a1c5a5fd CUDA: add softmax broadcast (#14475)

* CUDA: add softmax broadcast

* Pass by const ref

* Review: Use blockDims for indexing, remove designated initializers

* Add TODO for noncontiguous input/output

2025-07-02 15:48:33 +03:00

cmake

ggml-cpu : rework weak alias on apple targets (#14146)

2025-06-16 13:54:15 +08:00

include

ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (#14435)

2025-07-02 15:48:33 +03:00

src

CUDA: add softmax broadcast (#14475)

2025-07-02 15:48:33 +03:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317)

2025-06-25 23:49:04 +02:00