Default Branch

bf5bcd0b85 · docs: update s390x documentation + add faq (#14389) · Updated 2025-06-26 10:41:41 +00:00

Branches

Each entry lists the branch head commit, its last update, and the commits behind/ahead of the default branch.

c257a8871c · cont : fix defrag erasing cells that didn't move · Updated 2025-06-09 17:45:56 +00:00 · 145 behind, 3 ahead

ca407742c5 · profiler: initial support for profiling graph ops · Updated 2025-06-05 21:38:13 +00:00 · 181 behind, 1 ahead

3862d954bb · rope · Updated 2025-06-01 19:46:15 +00:00 · 181 behind, 10 ahead

ac35e50c16 · Update tools/llama-bench/llama-bench.cpp · Updated 2025-05-31 22:38:37 +00:00 · 210 behind, 3 ahead

d3a2eb592d · disable on windows · Updated 2025-05-31 21:17:18 +00:00 · 201 behind, 12 ahead

9065ca71a2 · tests : sampling tests use min_keep == 0 · Updated 2025-05-27 08:30:41 +00:00 · 256 behind, 3 ahead

108d484ab2 · tts : fix n_ubatch + make WavTokenizer cache-less · Updated 2025-05-22 18:58:10 +00:00 · 300 behind, 1 ahead

b06a954bbc · llama_encode : only force non-causal attention for enc-dec models · Updated 2025-05-19 17:43:59 +00:00 · 333 behind, 1 ahead

8282d74692 · bench : handle decode errors · Updated 2025-05-14 19:36:29 +00:00 · 371 behind, 1 ahead

237acc7cd5 · server : update readme + return json for "meta" field · Updated 2025-05-14 12:30:12 +00:00 · 380 behind, 2 ahead

78d70223c3 · metal : use FA-vec kernel up to batch size 20 · Updated 2025-05-13 07:38:06 +00:00 · 397 behind, 3 ahead

5c32fc3d13 · Break down main function in llama-server · Updated 2025-05-10 12:31:48 +00:00 · 423 behind, 1 ahead

1cba73458b · small note about -hf --mmproj · Updated 2025-05-09 21:42:54 +00:00 · 426 behind, 2 ahead

6107303ab0 · llama : remove logits_all flag + reorder llama_context_params · Updated 2025-05-08 10:01:41 +00:00 · 450 behind, 2 ahead

8681d3ddb3 · Revert "fix build on windows" · Updated 2025-05-06 11:41:55 +00:00 · 469 behind, 3 ahead

16843dba33 · metal : pad mm results · Updated 2025-05-04 06:13:52 +00:00 · 486 behind, 1 ahead

15dea7bbdf · opt : remove print [no ci] · Updated 2025-05-02 18:25:29 +00:00 · 490 behind, 4 ahead

65202d2985 · sync : ggml · Updated 2025-05-01 06:59:02 +00:00 · 521 behind, 3 ahead

b710758323 · readme : update hot topics · Updated 2025-04-28 08:04:28 +00:00 · 555 behind, 1 ahead

37ae6a281a · Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402 as has_qwen2vl_merger migration was incomplete · Updated 2025-04-27 10:36:57 +00:00 · 562 behind, 1 ahead