llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-08-15 04:33:06 -04:00

Files

Georgi Gerganov 3637576288 server : disable speculative decoding for SWA models (#13970)

* server : use swa-full fo draft context

ggml-ci

* server : disable speculative decoding for SWA models

2025-06-02 21:34:40 +03:00

batched-bench

batched-bench : fix pp batch contents (#13492)

2025-05-13 18:01:53 +03:00

cvector-generator

llama : move end-user examples to tools directory (#13249)

2025-05-02 20:27:13 +02:00

export-lora

llama : move end-user examples to tools directory (#13249)

2025-05-02 20:27:13 +02:00

gguf-split

llama : move end-user examples to tools directory (#13249)

2025-05-02 20:27:13 +02:00

imatrix

imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)

2025-05-09 11:53:58 +02:00

llama-bench

threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)

2025-05-31 15:39:19 -07:00

main

llama : do not crash if there is no CPU backend (#13395)

2025-05-09 13:02:07 +02:00

mtmd

mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)

2025-06-02 16:29:28 +02:00

perplexity

context : remove logits_all flag (#13284)

2025-05-08 14:26:50 +03:00

quantize

quantize : improve tensor-type pattern matching (#13033)

2025-05-13 19:12:31 +02:00

rpc

rpc : Fix build on OpenBSD (#13541)

2025-05-25 15:35:53 +03:00

run

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

server

server : disable speculative decoding for SWA models (#13970)

2025-06-02 21:34:40 +03:00

tokenize

llama : move end-user examples to tools directory (#13249)

2025-05-02 20:27:13 +02:00

tts

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

CMakeLists.txt

mtmd : rename llava directory to mtmd (#13311)

2025-05-05 16:02:55 +02:00