llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-08-30 20:02:18 -04:00)

Files

Johannes Gäßler e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check

2024-08-24 21:34:59 +02:00

.gitignore

…

CMakeLists.txt

…

get-model.cpp

…

get-model.h

…

run-json-schema-to-grammar.mjs

…

test-autorelease.cpp

…

test-backend-ops.cpp

CPU/CUDA: Gemma 2 FlashAttention support (#8542)

2024-08-24 21:34:59 +02:00

test-c.c

…

test-chat-template.cpp

…

test-double-float.cpp

…

test-grad0.cpp

…

test-grammar-integration.cpp

…

test-grammar-parser.cpp

…

test-json-schema-to-grammar.cpp

…

test-llama-grammar.cpp

…

test-lora-conversion-inference.sh

…

test-model-load-cancel.cpp

…

test-opt.cpp

…

test-quantize-fns.cpp

…

test-quantize-perf.cpp

…

test-rope.cpp

…

test-sampling.cpp

…

test-tokenizer-0.cpp

…

test-tokenizer-0.py

…

test-tokenizer-0.sh

…

test-tokenizer-1-bpe.cpp

…

test-tokenizer-1-spm.cpp

…

test-tokenizer-random.py

…