ac35e50c16
Update tools/llama-bench/llama-bench.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-05-31 15:38:37 -07:00
199a838422
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling
We talked about adding LOW priority for GGML threads in the original threadpool PR.
It can be useful in some cases to avoid contention.
Recent Windows ARM64 releases have started parking (offlining) CPU cores
more aggressively, which results in suboptimal performance with n_threads > 4.
To deal with that, we now disable Power Throttling for our threads at NORMAL
and higher priorities.
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-05-30 17:15:38 -07:00
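A minimal sketch of the Power Throttling opt-out this commit describes, using the documented Win32 API (SetThreadInformation with THREAD_POWER_THROTTLING_STATE); it illustrates the mechanism, not llama.cpp's exact code:

    // Opt the current thread out of Windows Power Throttling (EcoQoS) so the
    // scheduler keeps it on performance cores instead of parking/throttling it.
    #include <windows.h>

    static bool disable_power_throttling(void) {
        THREAD_POWER_THROTTLING_STATE state = {};
        state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
        state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED; // control the execution-speed policy
        state.StateMask   = 0;                                       // 0 = do not throttle this thread
        return SetThreadInformation(GetCurrentThread(), ThreadPowerThrottling,
                                    &state, sizeof(state)) != 0;
    }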
e298d2fbd0
kv-cache : add SWA support (#13194)
* kv-cache : prepare for SWA
ggml-ci
* kv-cache : initial iSWA implementation
ggml-ci
* kv-cache : rework error recovery logic
ggml-ci
* models : fix Phi-3 SWA parameters
ggml-ci
* model : adjust Granite to rope factor changes
ggml-ci
* server : check if context can do shifts
ggml-ci
* iswa : for now, always enable shifts (experiment)
ggml-ci
* kv-cache : simplify SWA logic
ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch
ggml-ci
* llama : update docs about llama_decode
ggml-ci
* kv-cache : update warning logs when no space for the batch is available
ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
ggml-ci
* llama : add param to control SWA cache size
ggml-ci
* minor : clean-up
ggml-ci
2025-05-20 08:05:46 +03:00
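The partial-SWA checks in this commit series hinge on knowing the oldest position still present in the cache, which the new llama_kv_self_seq_pos_min() exposes. A hedged sketch of how a caller might use it; the helper name and the negative "empty" convention are illustrative assumptions, not llama.cpp code:

    #include "llama.h"

    // With SWA, entries older than roughly (pos_max - swa_window) may have
    // been pruned, so rewinding a sequence is only safe if the target
    // position is still cached.
    static bool seq_can_rewind_to(llama_context * ctx, llama_seq_id seq, llama_pos pos) {
        const llama_pos pos_min = llama_kv_self_seq_pos_min(ctx, seq); // oldest cached pos
        return pos_min >= 0 && pos >= pos_min; // assumes pos_min < 0 means empty sequence
    }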
6c8b91500e
llama-bench : fix -ot with dl backends (#13563)
2025-05-15 15:46:55 +02:00
b2838049cc
bench : handle decode errors (#13548)
ggml-ci
2025-05-15 05:57:02 +03:00
cf0a43bb64
llama-bench : add defrag-thold, check for invalid ranges (#13487)
2025-05-13 00:31:37 +02:00
22cdab343b
llama-bench : accept ranges for integer parameters (#13410)
2025-05-12 13:08:22 +02:00
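A minimal sketch of the range expansion these two llama-bench changes describe, assuming a first-last+step syntax (e.g. "128-512+128" expanding to 128, 256, 384, 512); the function name and exact syntax are illustrative, not llama-bench's actual parser:

    #include <cstdio>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Expand "first-last+step" (step optional, default 1) into a value list;
    // reject backwards or zero-step ranges, per the "check for invalid
    // ranges" fix above.
    static std::vector<int> expand_range(const std::string & s) {
        int first = 0, last = 0, step = 1;
        if (std::sscanf(s.c_str(), "%d-%d+%d", &first, &last, &step) >= 2) {
            if (step <= 0 || last < first) {
                throw std::invalid_argument("invalid range: " + s);
            }
            std::vector<int> out;
            for (int v = first; v <= last; v += step) {
                out.push_back(v);
            }
            return out;
        }
        return { std::stoi(s) }; // plain integer, no range
    }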
7f323a589f
Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386)
2025-05-11 14:18:39 +02:00
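A hedged sketch of the scheduling trade-off behind --no-op-offload: by default, an op whose weights reside in host memory (e.g. MoE expert tensors kept on the CPU via -ot) may still be offloaded to the GPU for large batches, copying the weights over per op, and for very large MoE models that copy can dominate prompt processing. The decision logic, names, and threshold below are illustrative assumptions, not ggml's code:

    // Decide whether to offload a compute op to the GPU; --no-op-offload
    // forces host-resident weights to be computed in place.
    static bool should_offload_op(bool weights_on_host, int n_batch_tokens, bool no_op_offload) {
        if (!weights_on_host) {
            return true;              // weights already live on the GPU
        }
        if (no_op_offload) {
            return false;             // user opted out (e.g. MoE + -ot)
        }
        return n_batch_tokens >= 32;  // hypothetical amortization threshold
    }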
1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00