Georgi Gerganov
225e7a1438
llama : add high-throughput mode (#14363)
* kv-cache : prepare K/V buffers for separation
ggml-ci
* batched-bench : fix oob write
ggml-ci
* llama : add "virtual sequences"
ggml-ci
* llama : use "stream" vs "virtual sequence"
ggml-ci
* graph : fix stream splitting when KV cache is not used
ggml-ci
* kv-cache : add multi-stream save/load support
ggml-ci
* llama : add "--attn-streams" flag
ggml-ci
* kv-cache : fix handling when find_slot fails
ggml-ci
* kv-cache : restore find_slot impl
ggml-ci
* kv-cache : add comments
* kv-cache : add bounds checks for sequence id
ggml-ci
* cont : add n_seq_max to batch allocr
ggml-ci
* kv-cache : perform stream copies lazily after llama_synchronize
ggml-ci
* kv-cache : avoid throwing exceptions across the C boundary
ggml-ci
* CUDA: 4D FlashAttention support (#14628)
* CUDA: 4D FlashAttention support
* CUDA: fix WMMA FA kernel
* llama : rename attn_streams -> kv_unified
ggml-ci
* common : rename kv_split -> kv_unified
ggml-ci
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-07-16 16:35:42 +03:00
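The split-versus-unified choice behind `kv_unified` can be pictured as a mapping from sequence ids to K/V streams. The sketch below makes three assumptions. With a unified cache, every sequence maps to one shared stream. Otherwise, each sequence id gets its own stream, up to `n_seq_max`. Out-of-range sequence ids are rejected, in the spirit of the "bounds checks for sequence id" item. `KvLayout`, `n_streams` and `stream_for_seq` are hypothetical names and do not reflect the actual llama.cpp code.

```python
from dataclasses import dataclass

# Hypothetical sketch: KvLayout, n_streams and stream_for_seq are illustrative
# names only, not part of the llama.cpp API.


@dataclass(frozen=True)
class KvLayout:
    unified: bool   # True: all sequences share a single K/V stream
    n_seq_max: int  # maximum number of parallel sequences

    def n_streams(self) -> int:
        """Number of separate K/V buffers (streams) the cache would allocate."""
        return 1 if self.unified else self.n_seq_max

    def stream_for_seq(self, seq_id: int) -> int:
        """Map a sequence id to its stream, rejecting out-of-range ids."""
        if not 0 <= seq_id < self.n_seq_max:
            raise ValueError(f"invalid seq_id {seq_id}, n_seq_max = {self.n_seq_max}")
        return 0 if self.unified else seq_id
```

For example, `KvLayout(unified=False, n_seq_max=4)` yields four streams, with sequence 2 on stream 2. `KvLayout(unified=True, n_seq_max=4)` yields a single stream shared by all sequences.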
..
2025-04-01 23:44:05 +02:00
2025-06-06 14:11:15 +03:00
2025-01-12 11:32:42 +02:00
2024-12-04 23:19:20 +01:00
2025-07-16 20:03:51 +08:00
2025-07-16 16:35:42 +03:00
2025-07-05 07:18:09 +03:00
2024-11-29 21:54:58 +01:00
2025-01-07 18:01:58 +01:00
2025-01-07 18:01:58 +01:00
2025-06-16 14:14:00 +03:00
2025-06-30 10:17:18 +02:00
2025-06-06 14:11:15 +03:00
2025-06-06 14:11:15 +03:00
2025-06-06 14:11:15 +03:00
2025-06-06 14:11:15 +03:00
2025-07-16 16:35:42 +03:00
2025-06-06 14:11:15 +03:00
2025-06-06 14:11:15 +03:00
2025-06-06 14:11:15 +03:00
2025-05-19 13:25:41 +03:00
2025-07-02 14:12:07 +03:00
2025-02-15 16:40:57 +02:00
2025-06-06 14:11:15 +03:00
2025-06-06 14:11:15 +03:00
2025-06-30 10:17:18 +02:00
2025-05-26 16:55:24 +02:00
2025-06-30 10:17:18 +02:00
2025-06-30 10:17:18 +02:00
2025-06-30 10:17:18 +02:00
2025-06-30 10:17:18 +02:00
2025-07-16 20:03:51 +08:00
2024-11-13 21:10:38 +11:00
2024-07-07 15:04:39 -04:00
2025-04-26 10:10:20 +02:00
2025-02-15 16:40:57 +02:00
2023-08-30 09:50:55 +03:00
2025-06-30 10:17:18 +02:00
2025-05-02 20:27:13 +02:00
2024-07-14 19:51:21 -04:00
2025-06-30 10:17:18 +02:00
2024-07-05 07:53:33 +03:00
2025-04-08 19:54:51 +03:00
2025-06-30 10:17:18 +02:00
2025-06-30 10:17:18 +02:00