4c9fdfbe15
ubatch : new splitting logic (#14217)
ggml-ci
2025-06-20 10:14:14 +03:00
d3e64b9f49
llama : rework embeddings logic (#14208)
* llama : rework embeddings logic
ggml-ci
* cont : fix rerank
ggml-ci
* cont : engrish [no ci]
* cont : fix rerank
ggml-ci
* server : support both embeddings and completions with single model
ggml-ci
* cont : avoid embeddings_org
ggml-ci
2025-06-16 14:14:00 +03:00
c3ee46fab4
batch : remove logits_all flag (#14141)
ggml-ci
2025-06-12 11:49:26 +03:00
7ae2932116
kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)
2025-06-11 12:52:45 +03:00
745aa5319b
llama : deprecate llama_kv_self_ API (#14030)
* llama : deprecate llama_kv_self_ API
ggml-ci
* llama : allow llama_memory_(nullptr)
ggml-ci
* memory : add flag for optional data clear in llama_memory_clear
ggml-ci
2025-06-06 14:11:15 +03:00
7f37b6cf1e
memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API
ggml-ci
* context : fix casts
ggml-ci
2025-06-05 15:29:22 +03:00
3e63a58ef7
kv-cache : refactor the update/defrag mechanism (#13988)
* kv-cache : refactor update mechanism
ggml-ci
* memory : improve status handling
* defrag : reset head + add comments
ggml-ci
* cont : minor fixes
ggml-ci
2025-06-04 18:58:20 +03:00
0fc16b42e8
kv-cache : split implementation in separate sources (#13920)
ggml-ci
2025-06-01 11:39:27 +03:00