llama.cpp/examples at d3f1f0acfbf8aaafd2ce494309a10a7cc92042c8

tqcq/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-08-14 20:29:41 -04:00

Benson Wong 5d01670266 server : include speculative decoding stats when timings_per_token is enabled (#12603)

* Include speculative decoding stats when timings_per_token is true

New fields added to the `timings` object:

  - draft_n           : number of draft tokens generated
  - draft_accepted_n  : number of draft tokens accepted
  - draft_accept_ratio: ratio of accepted/generated

* Remove redundant draft_accept_ratio var

* add draft acceptance rate to server console output

2025-03-28 10:05:44 +02:00
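
The fields named in the commit message above could be consumed roughly as follows. This is a minimal sketch, not the server's own code: it assumes the `timings` object arrives as a plain dictionary parsed from a JSON response, and the helper name `draft_accept_ratio` and the sample values are hypothetical. Only the keys `draft_n` and `draft_accepted_n` come from the commit message. Because the message also mentions removing a redundant `draft_accept_ratio` variable, the sketch derives the ratio from the two counts instead of reading it directly.

```python
from typing import Optional


def draft_accept_ratio(timings: dict) -> Optional[float]:
    """Return accepted/generated draft tokens, or None if no drafts were generated."""
    draft_n = timings.get("draft_n", 0)            # number of draft tokens generated
    accepted = timings.get("draft_accepted_n", 0)  # number of draft tokens accepted
    if draft_n <= 0:
        # No speculative decoding took place, so the ratio is undefined.
        return None
    return accepted / draft_n


# Hypothetical `timings` object; the values are made up for illustration.
sample_timings = {"draft_n": 40, "draft_accepted_n": 30}
ratio = draft_accept_ratio(sample_timings)
print(f"draft acceptance: {ratio:.2%}")
```

Returning `None` when no draft tokens were generated avoids a division by zero and lets the caller tell "no speculation" apart from "zero tokens accepted".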

…

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

convert-llama2c-to-ggml

…

cvector-generator

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

deprecation-warning

…

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

…

common : refactor '-o' option (#12278)

2025-03-10 13:34:13 +02:00

…

…

…

…

…

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

…

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566)

2025-03-26 15:06:04 +01:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

docs : bring llama-cli conversation/template docs up-to-date (#12426)

2025-03-17 21:14:32 +01:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

…

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

rpc : update README for cache usage (#12620)

2025-03-28 09:44:13 +02:00

run: de-duplicate fmt and format functions and optimize (#11596)

2025-03-25 18:46:11 +01:00

save-load-state

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

server : include speculative decoding stats when timings_per_token is enabled (#12603)

2025-03-28 10:05:44 +02:00

llama : add llama_vocab, functions -> methods, naming (#11110)

2025-01-12 11:32:42 +02:00

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

simple-cmake-pkg

…

speculative : fix seg fault in certain cases (#12454)

2025-03-18 19:35:11 +02:00

speculative-simple

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

2025-03-13 12:35:44 +02:00

…

…

llama-tts : avoid crashes related to bad model file paths (#12482)

2025-03-21 11:12:45 +02:00

chat-13B.bat

…

chat-13B.sh

…

chat-persistent.sh

…

chat-vicuna.sh

…

chat.sh

…

CMakeLists.txt

…

convert_legacy_llama.py

…

json_schema_pydantic_example.py

…

json_schema_to_grammar.py

…

llama.vim

…

llm.vim

…

Miku.sh

…

pydantic_models_to_grammar_examples.py

…

pydantic_models_to_grammar.py

…

reason-act.sh

…

regex_to_grammar.py

…

server_embd.py

…

server-llama2-13B.sh

…

ts-type-to-grammar.sh

…