llama.cpp/examples at 2bf8d0f7c4cc1235755ad06961ca761e458c5e55

tqcq/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-08-14 12:19:48 -04:00

slaren 2bf8d0f7c4 backend : offload large batches to GPU (#6083)

* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

2024-03-18 11:03:04 +01:00

…

…

…

…

…

…

convert-llama2c-to-ggml

…

embedding : add EOS token if not present (#899)

2024-03-14 15:14:14 +02:00

…

…

…

…

backend : offload large batches to GPU (#6083)

2024-03-18 11:03:04 +01:00

…

…

…

…

…

…

…

…

common: llama_load_model_from_url using --model-url (#6098)

2024-03-17 19:12:37 +01:00

…

…

…

…

…

…

save-load-state

…

common: llama_load_model_from_url using --model-url (#6098)

2024-03-17 19:12:37 +01:00

…

…

…

…

train-text-from-scratch

…

alpaca.sh

…

base-translate.sh

…

chat-13B.bat

…

chat-13B.sh

…

chat-persistent.sh

…

chat-vicuna.sh

…

chat.sh

…

CMakeLists.txt

…

gpt4all.sh

…

json-schema-to-grammar.py

…

llama2-13b.sh

…

llama2.sh

…

llama.vim

…

llm.vim

…

make-ggml.py

…

Miku.sh

…

pydantic_models_to_grammar.py

…

pydantic-models-to-grammar-examples.py

…

reason-act.sh

…

server-embd.py

…

server-llama2-13B.sh

…