llama.cpp/tools
Georgi Gerganov 3600cc2886 llama : use n_swa + n_ubatch cells for SWA cache (#13833)
* llama : use n_swa + n_ubatch cells for SWA cache

ggml-ci

* llama : add warning about multi-sequence SWA contexts
2025-05-31 15:57:44 +03:00