llama.cpp/examples
Georgi Gerganov ad19812cda perplexity : faster HellaSwag via batching (#5017)
* perplexity : faster HellaSwag

ggml-ci

* perplexity : clean-up

ggml-ci

* perplexity : no need for decode_helper

ggml-ci

* perplexity : add comments

* perplexity : option to specify max batched tasks via `n_parallel`

* perplexity : remove HellaSwag restriction for n_batch
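The core idea behind the speed-up named in the commit title is to evaluate several independent HellaSwag tasks per decode call instead of one at a time, bounded by `n_parallel`. A minimal illustrative sketch of that grouping, using hypothetical names (`batch_tasks`, `tasks`) rather than the actual llama.cpp API:

```python
from math import ceil

def batch_tasks(tasks, n_parallel):
    """Group independent evaluation tasks into batches of up to
    n_parallel, so each batch needs one decode call instead of
    one call per task. Purely illustrative; not llama.cpp code."""
    for i in range(0, len(tasks), n_parallel):
        yield tasks[i:i + n_parallel]

# 10 hypothetical HellaSwag tasks, up to 4 evaluated per decode call
tasks = [f"task-{i}" for i in range(10)]
batches = list(batch_tasks(tasks, n_parallel=4))
assert len(batches) == ceil(len(tasks) / 4)  # 3 batched calls, not 10
```

With batching, the number of decode calls drops from `len(tasks)` to `ceil(len(tasks) / n_parallel)`, which is where the wall-clock win comes from when the backend can process a batch in roughly the time of a single task.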
2024-01-18 15:33:01 +02:00