tqcq/llama.cpp
mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-09-26 01:00:15 -04:00
llama.cpp/tests (tag b2697)
Latest commit: 0d56246f4b by slaren (2024-04-18 15:18:48 +02:00)

ggml : group all experts in a single ggml_mul_mat_id (#6505)

* ggml : group all experts in a single ggml_mul_mat_id
cuda : improve mmid row copy

* cuda : fix bin bcast with non-cont src0

* test-backend-ops : only run all mul mat tests for base types

* llama : disable moe offloading with SYCL

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
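The commit above batches the per-expert matrix multiplications of a mixture-of-experts layer into a single `ggml_mul_mat_id` operation. A minimal conceptual sketch of that grouping idea (this is an illustration only, not the actual ggml API; all names and types here are hypothetical):

```cpp
// Hypothetical sketch of the idea behind grouping all experts in one
// mul_mat_id: instead of launching one matrix multiplication per expert,
// input rows are grouped by their routed expert id and all products are
// computed in a single pass over the grouped rows.
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Mat = std::vector<std::vector<float>>; // row-major matrix

// For each input row x[i], multiply it by the weight matrix of the
// expert selected by ids[i]; rows sharing an expert are processed together.
std::vector<std::vector<float>> mul_mat_id(const std::vector<Mat>& experts,
                                           const std::vector<int>& ids,
                                           const Mat& x) {
    std::vector<std::vector<float>> out(x.size());

    // group input-row indices by expert id
    std::map<int, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        groups[ids[i]].push_back(i);
    }

    // one pass per expert over its group of rows (a real backend would
    // turn each group into one batched matmul on contiguous memory)
    for (const auto& [e, rows] : groups) {
        const Mat& w = experts[e];
        for (std::size_t i : rows) {
            std::vector<float> y;
            for (const auto& wr : w) {
                float s = 0.0f;
                for (std::size_t k = 0; k < wr.size(); ++k) {
                    s += wr[k] * x[i][k];
                }
                y.push_back(s);
            }
            out[i] = y;
        }
    }
    return out;
}
```

Grouping rows by expert is what lets a backend (e.g. CUDA, as in the "improve mmid row copy" bullet) copy each expert's rows into a contiguous buffer and issue one large multiplication instead of many small ones.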
.gitignore
CMakeLists.txt
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs
test-autorelease.cpp
test-backend-ops.cpp
test-c.c
test-chat-template.cpp
test-double-float.cpp
test-grad0.cpp
test-grammar-integration.cpp
test-grammar-parser.cpp
test-json-schema-to-grammar.cpp (JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555), 2024-04-12)
test-llama-grammar.cpp
test-model-load-cancel.cpp
test-opt.cpp
test-quantize-fns.cpp
test-quantize-perf.cpp
test-rope.cpp
test-sampling.cpp
test-tokenizer-0-falcon.cpp
test-tokenizer-0-falcon.py
test-tokenizer-0-llama.cpp
test-tokenizer-0-llama.py
test-tokenizer-1-bpe.cpp
test-tokenizer-1-llama.cpp