33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy ( #13647 )
...
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de >
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
Co-authored-by: Johannes Gäßler <johannesg@5d6.de >
b5439
2025-05-21 09:58:49 +08:00
fb1cab201c
vulkan: fix warnings ( #13626 )
...
* small fixes
* remove ifdef
b5438
2025-05-20 21:35:16 +00:00
b7a17463ec
mtmd-helper : bug fix to token batching in mtmd ( #13650 )
...
* Update mtmd-helper.cpp
* Update tools/mtmd/mtmd-helper.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
b5437
2025-05-20 18:55:30 +02:00
be0239693c
model : fix llama4 graph ( #13663 )
...
ggml-ci
b5436
2025-05-20 19:21:04 +03:00
a4090d1174
llama : remove llama_kv_cache_view API + remove deprecated ( #13653 )
...
ggml-ci
b5435
2025-05-20 16:13:16 +03:00
b69f1647f9
CUDA: skip fully masked-out KV in FA vec kernel ( #13584 )
...
* CUDA: skip fully masked-out KV in FA vec kernel
b5434
2025-05-20 14:45:07 +02:00
759e37b0d8
tests : avoid github urls due to throttling ( #13654 )
2025-05-20 12:03:17 +02:00
4245e622e0
sycl: disable reorder for sycl mulmat ( #13536 )
b5432
2025-05-20 11:34:15 +02:00
c9c64dee57
Set GLM4 blk.*.attn_output.weight, kqv_out-* matmul to GGML_PREC_F32 to fix infinity values in output ( #13639 )
b5431
2025-05-20 10:11:56 +02:00
c00a2634be
metal : fix typo in FA kernel comments ( #13651 )
b5430
2025-05-20 10:41:40 +03:00
e298d2fbd0
kv-cache : add SWA support ( #13194 )
...
* kv-cache : prepare for SWA
ggml-ci
* kv-cache : initial iSWA implementation
ggml-ci
* kv-cache : rework error recovery logic
ggml-ci
* models : fix Phi-3 SWA parameters
ggml-ci
* model : adjust Granite to rope factor changes
ggml-ci
* server : check if context can do shifts
ggml-ci
* iswa : for now, always enable shifts (experiment)
ggml-ci
* kv-cache : simplify SWA logic
ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch
ggml-ci
* llama : update docs about llama_decode
ggml-ci
* kv-cache : update warning logs when no space for the batch is available
ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
ggml-ci
* llama : add param to control SWA cache size
ggml-ci
* minor : clean-up
ggml-ci
b5429
2025-05-20 08:05:46 +03:00
f0adb80bf7
CANN: Update CANN model support ( #13162 )
...
* Update CANN model support status
* Update of model support
* update
* update
* update
* fix format of CANN.md
* fix format of CANN.md
* fix format of CANN.md
2025-05-20 11:43:43 +08:00
f7c9429c85
sycl : Overcoming workaround for mmap() allocation on Windows ( #13482 )
...
* Remove mmap workaround on windows
After some testing I found that mmap is supported on windows and for
many GPUs on Linux. Therefore I remove the workaround for windows since
it is not necessary.
* Update llama-bench README
SYCL backend introduced a workaround that allows execution of
llama-bench also without specifying `--mmp 0` flag
b5427
2025-05-20 08:54:43 +08:00
1dfbf2cf3a
common : add load_progress_callback ( #13617 )
b5426
2025-05-19 21:17:36 +02:00
8960efd0a6
Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence ( #13607 )
b5425
2025-05-19 17:54:08 +02:00
725f23f1f3
sycl : backend documentation review ( #13544 )
...
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
2025-05-19 14:38:20 +01:00
92ecdcc06a
mtmd : add vision support for llama 4 ( #13282 )
...
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still preceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
b5423
2025-05-19 13:04:14 +02:00
f71f40a284
ci : upgraded oneAPI version in SYCL workflows and dockerfile ( #13532 )
b5422
2025-05-19 11:46:09 +01:00
d30cb5a7fa
sync : ggml
...
ggml-ci
b5421
2025-05-19 13:29:56 +03:00
6c35981a64
mnist: fix segmentation fault (ggml/1227)
2025-05-19 13:29:56 +03:00
8b5e19aea6
ggml : fix apple OS check in ggml_print_backtrace (ggml/1229)
2025-05-19 13:29:56 +03:00
60aea028b5
ggml : Fix missing backtrace on Linux (ggml/1228)
...
* Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1
* Fixed lldb attach
* Simplify by having the child do ggml_print_backtrace_symbols
2025-05-19 13:29:56 +03:00
9c55e5c5c2
fix: check model pointer validity before use ( #13631 )
b5417
2025-05-19 13:25:41 +03:00
33d7aed4a8
CANN: Support MOE Model MUL_MAT_ID ( #13042 )
...
Signed-off-by: noemotiovon <757486878@qq.com >
b5416
2025-05-19 14:21:17 +08:00
6a2bc8bfb7
server : added --no-prefill-assistant flag ( #13608 )
...
* added no-prefill-assistant flag
* reworded documentation comment
* updated server README.md
b5415
2025-05-17 23:59:48 +02:00
e3a7cf6c5b
cmake: use the current build config for vulkan-shaders-gen ( #13595 )
...
* fix: use the current build config for `vulkan-shaders-gen`
* fix: only pass a valid build type to `--config`
b5414
2025-05-17 15:26:43 -03:00
518329b2d4
parallel : add option for non-shared and larger prompts ( #13598 )
...
* parallel : add option for non-shared and larger prompts
* parallel : update readme [no ci]
* cont : add note about base models [no ci]
* parallel : better var name
ggml-ci
2025-05-17 12:58:55 +03:00
2f5a4e1e09
vulkan: move common FA code to flash_attn_base.comp ( #13556 )
...
* vulkan: move common FA code to flash_attn_base.comp
* vulkan: move common FA index/stride setup code to flash_attn_base.comp
* build fix
b5412
2025-05-17 09:14:55 +02:00
4f41ee11d6
vulkan: use scalar FA rather than coopmat2 when N==1 ( #13554 )
b5411
2025-05-17 08:35:47 +02:00
3e0be1cace
llguidance : official v0.7.20 release (no actual changes) [noci] ( #13594 )
b5410
2025-05-16 22:56:28 +02:00
6aa892ec2a
server : do not return error out of context (with ctx shift disabled) ( #13577 )
b5409
2025-05-16 21:50:00 +02:00
aea9f8b4e7
webui : improve accessibility for visually impaired people ( #13551 )
...
* webui : improve accessibility for visually impaired people
* add a11y for extra contents
* fix some labels being read twice
* add skip to main content
2025-05-16 21:49:01 +02:00
06c1e4abc1
readme : add list of dependencies and their license ( #13591 )
2025-05-16 20:04:18 +02:00
415e40a357
releases : use arm version of curl for arm releases ( #13592 )
b5406
2025-05-16 19:36:51 +02:00
654a67794f
metal : add FA-vec kernel for head size 64 ( #13583 )
...
ggml-ci
b5405
2025-05-16 20:32:58 +03:00
5364ae4ba5
llama : print hint when loading a model when no backends are loaded ( #13589 )
b5404
2025-05-16 16:38:07 +02:00
7c07ac244d
ci : add ppc64el to build-linux-cross ( #13575 )
2025-05-16 14:54:23 +02:00
0a338ed013
sycl : fixed compilation warnings ( #13582 )
b5402
2025-05-16 18:15:29 +08:00
bc098c3cf0
minja: sync (qwen3) ( #13573 )
...
* minja: sync f06140fa52
- https://github.com/google/minja/pull/67 (@grf53)
- https://github.com/google/minja/pull/66 (@taha-yassine)
- https://github.com/google/minja/pull/63 (@grf53)
- https://github.com/google/minja/pull/58
---------
Co-authored-by: ochafik <ochafik@google.com >
b5401
2025-05-15 23:29:10 +01:00
c6a2c9e741
gguf : use ggml log system ( #13571 )
...
* gguf : use ggml log system
* llama : remove unnecessary new lines in exception messages
b5400
2025-05-15 19:13:11 +02:00
07ad2b6db3
gguf-py : fix disconnect-before-connect in editor-gui ( #13569 )
...
The bug caused a crash upon load with venvs created with
--system-site-packages to use
python3-pyside6.qtwidgets=python3-pyside6.qtwidgets=6.6.2-4
from Kubuntu 24.10.
2025-05-15 18:47:10 +02:00
c531edfa34
convert : fix conversion for llama 4 ( #13567 )
2025-05-15 17:40:07 +02:00
02cdd2d8b0
sycl: simplify bin_bcast_kernel ( #13383 )
2025-05-15 17:39:52 +02:00
64bb51cf90
sycl: reordered Q4_K MMVQ ( #13109 )
2025-05-15 17:35:44 +02:00
9c404ed54c
sycl: use oneDNN for matrices multiplication ( #12972 )
b5395
2025-05-15 16:53:41 +02:00
6c8b91500e
llama-bench : fix -ot with dl backends ( #13563 )
b5394
2025-05-15 15:46:55 +02:00
3cc1f1f1d2
webui : handle PDF input (as text or image) + convert pasted long content to file ( #13562 )
...
* webui : handle PDF input (as text or image)
* handle the case where pdf image + server without mtmd
* fix bug missing pages
2025-05-15 14:24:50 +02:00
c753d7bed0
server : proper error handling for missing elements in messages array (OpenAI compatible backend) ( #13540 )
b5392
2025-05-15 08:40:58 +02:00
b2838049cc
bench : handle decode errors ( #13548 )
...
ggml-ci
b5391
2025-05-15 05:57:02 +03:00
aa48e373f2
server
: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802 )
...
* Inject date_string in llama 3.x + fix for functionary v2
https://github.com/ggml-org/llama.cpp/issues/12729
* move/fix detection of functionary v3.1 before llama 3.x, fix & test their non-tool mode
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation
---------
Co-authored-by: ochafik <ochafik@google.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
b5390
2025-05-15 02:39:51 +01:00