llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-08-15 12:42:40 -04:00

Files

Sigbjørn Skjæret 36ca8b3628 CUDA: don't convert BF16 weights to FP32 (ggml/1174)

* add bf16 support

* use convert_from_bf16_cuda instead of convert_unary_cuda for f32

* revert 7ec5085

* move functionality into convert_unary with constexpr

2025-04-07 18:44:17 +03:00

cmake

scripts : update sync + fix cmake merge

2025-03-27 10:09:29 +02:00

include

metal : improve FA + improve MoE (#12612)

2025-03-28 20:21:59 +02:00

src

CUDA: don't convert BF16 weights to FP32 (ggml/1174)

2025-04-07 18:44:17 +03:00

.gitignore

…

CMakeLists.txt

ggml : add logging for native build options/vars (whisper/2935)

2025-03-30 08:33:31 +03:00