66c92061f5
tests : remove json.hpp from a test ( #13880 )
ggml-ci
b5532
2025-05-29 12:17:16 +03:00
5ca82fc1d7
convert : workaround for AutoConfig dummy labels ( #13881 )
2025-05-29 10:00:57 +02:00
6385b843a8
llama : add RobertaForSequenceClassification reranker support ( #13875 )
b5530
2025-05-29 08:15:01 +02:00
1b8fb8152d
ggml: aarch64: Implement SVE F32 kernels for vector functions ( #13843 )
* F32-Mamba-SVE
* F32-Mamba-SVE
* Resolve test errors-1
* Resolve test errors-2
* F32-vec-SVE
* F32-vec-SVE
* F32-vec-SVE
b5529
2025-05-29 09:01:33 +03:00
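Background for this entry: SVE kernels replace fixed-width NEON loops with predicated, vector-length-agnostic ones. Below is a minimal sketch of that pattern for an F32 axpy; it illustrates the technique only and is not code from the PR (`axpy_f32_sve` is a hypothetical name).

```cpp
#include <arm_sve.h>  // requires compiling with -march=armv8-a+sve
#include <cstdint>

// Illustrative SVE pattern: y[i] += s * x[i].
// svwhilelt_b32 builds a predicate for the ragged tail, so the loop works
// for any hardware vector length and any n without a scalar epilogue.
static void axpy_f32_sve(int64_t n, float s, const float * x, float * y) {
    for (int64_t i = 0; i < n; i += svcntw()) {     // svcntw() = f32 lanes per vector
        const svbool_t pg = svwhilelt_b32(i, n);    // active lanes: i .. min(i+VL, n)
        const svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t       vy = svld1_f32(pg, y + i);
        vy = svmla_n_f32_x(pg, vy, vx, s);          // vy += vx * s
        svst1_f32(pg, y + i, vy);
    }
}
```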
53ae30640e
gguf-py : fix SafetensorRemote return on undefined size (< 0) ( #13841 )
2025-05-28 23:50:20 +02:00
763d06edb7
llama : fix KV shift for qwen2vl ( #13870 )
* llama : fix KV shift for qwen2vl
* add ref to the PR
b5527
2025-05-28 22:35:31 +02:00
10961339b2
mtmd : move helpers to dedicated library ( ⚠️ breaking change) ( #13866 )
* mtmd : move helpers to dedicated library
* fix server build
* rm leftover cmakelist code
b5526
2025-05-28 22:35:22 +02:00
d98f2a35fc
ci: disable LLAMA_CURL for Linux cross-builds ( #13871 )
2025-05-28 15:46:47 -03:00
e0e3aa231d
llama : add support for BertForSequenceClassification reranker ( #13858 )
* convert: add support for BertForSequenceClassification
* add support for reranking using BertForSequenceClassification
* merge checks of eos and sep
* fix lint
---------
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
b5524
2025-05-28 19:01:58 +02:00
aa6dff05be
convert: small addition to support LlamaModel ( #13838 )
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-05-28 16:34:18 +02:00
c962ae3382
server: effectively remove 'image_url'/'input_audio' JSON objects from 'llama_params' in multimodal model mode ( #13853 )
[fix]: effectively remove 'image_url'/'input_audio' from 'llama_params' in multimodal model mode
b5522
2025-05-28 16:33:54 +02:00
a3938fb53d
convert : fix qwen omni conversion ( #13859 )
* convert : fix qwen omni conversion
* fix typo
2025-05-28 16:12:35 +02:00
f7873fc698
tests : change umlaut test ( #11600 )
2025-05-28 15:49:28 +02:00
a68247439b
CUDA: fix FA tg at long context for CC >= 8.9 ( #13852 )
b5519
2025-05-28 13:33:37 +02:00
26b79b6cb3
convert : fix tensor naming conflict for llama 4 vision ( #13836 )
* convert : fix tensor naming conflict for llama 4 vision
* add comment
2025-05-28 10:05:54 +02:00
1e8659e65a
CANN: Add SOC TYPE printing in cmake configuration ( #13837 )
b5517
2025-05-28 11:54:20 +08:00
a3c30846e4
opencl: add new ops - argsort, div, sub, add_rows, sigmoid, group_norm ( #13787 )
* opencl: add `argsort`
* opencl: add `div`
* opencl: add `add_rows`
* opencl: add `sub`
* opencl: add `sigmoid`, both `f16` and `f32`
* opencl: add `group_norm`
b5516
2025-05-27 12:56:08 -07:00
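As a reminder of what one of these ops computes, here is a scalar reference for group_norm semantics; this is illustrative only (the PR adds OpenCL kernels), and the layout parameterization here is an assumption.

```cpp
#include <cmath>
#include <cstddef>

// Scalar reference for GROUP_NORM (assumed layout: C channels of HW elements
// each, channels split into n_groups contiguous groups). Each group is
// normalized to zero mean / unit variance.
void group_norm_ref(const float * x, float * y, int C, int HW, int n_groups, float eps) {
    const int ch_per_group = C / n_groups;
    for (int g = 0; g < n_groups; ++g) {
        const float * gx = x + (size_t) g * ch_per_group * HW;
        float       * gy = y + (size_t) g * ch_per_group * HW;
        const size_t n = (size_t) ch_per_group * HW;

        double sum = 0.0, sum2 = 0.0;
        for (size_t i = 0; i < n; ++i) { sum += gx[i]; sum2 += gx[i] * gx[i]; }
        const double mean = sum / n;
        const double var  = sum2 / n - mean * mean;
        const float  inv  = 1.0f / std::sqrt((float) var + eps);

        for (size_t i = 0; i < n; ++i) gy[i] = (gx[i] - (float) mean) * inv;
    }
}
```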
1701d4c54f
opencl: mark mul_mat f32f32 as supporting non-contiguous tensors ( #13790 )
b5515
2025-05-27 12:53:14 -07:00
bef8176387
vulkan: use timestamp queries for GGML_VULKAN_PERF ( #13817 )
Also change it to be controlled by an env var rather than a cmake flag
b5514
2025-05-27 18:39:07 +02:00
34b7c0439e
cmake : add llama-cparams.cpp to build ( #13832 )
b5513
2025-05-27 19:08:44 +03:00
f3101a8cc6
SYCL: add gelu_erf kernel ( #13749 )
* SYCL: add gelu_erf kernel
* refactor code
Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>
* Use scope_op_debug_print
---------
Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>
b5512
2025-05-27 20:52:59 +05:30
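gelu_erf is the exact (non-tanh-approximated) GELU. A scalar reference of what the kernel computes, not the SYCL code itself:

```cpp
#include <cmath>

// Exact GELU via the error function: gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))).
// The more common GELU op in ggml typically uses a tanh approximation instead.
static inline float gelu_erf_ref(float x) {
    const float SQRT1_2 = 0.70710678118654752440f; // 1/sqrt(2)
    return 0.5f * x * (1.0f + std::erf(x * SQRT1_2));
}
```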
1c49c70d07
sync : ggml
2025-05-27 18:05:33 +03:00
a8ea03d8ad
ggml : add ggml_repeat_4d ( #13824 )
b5510
2025-05-27 15:53:55 +02:00
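Unlike ggml_repeat, which takes a template tensor to define the target shape, this variant presumably takes the target shape directly. A hedged usage sketch, assuming the signature mirrors ggml_new_tensor_4d (check ggml.h for the actual one):

```cpp
#include "ggml.h"

// Assumed signature (by analogy with ggml_new_tensor_4d):
//   ggml_repeat_4d(ctx, a, ne0, ne1, ne2, ne3)
// repeats `a` until the result has the explicit 4-D shape ne0..ne3.
void repeat_4d_sketch(void) {
    struct ggml_init_params params = { /*mem_size=*/ 16*1024*1024, /*mem_buffer=*/ NULL, /*no_alloc=*/ false };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 3);
    // broadcast a 2x3 tensor to 2x3x4x1 without building a dummy shape tensor:
    struct ggml_tensor * r = ggml_repeat_4d(ctx, a, 2, 3, 4, 1);
    (void) r;

    ggml_free(ctx);
}
```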
05f6ac6283
ggml : riscv: add xtheadvector support ( #13720 )
* ggml : riscv: add xtheadvector support
* ggml : clean up some macro usage
b5509
2025-05-27 16:21:36 +03:00
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) ( #13784 )
* mtmd : allow multiple modalities at the same time
* refactor mtmd tokenizer
* fix compile
* ok, missing SinusoidsPositionEmbedding
* first working version
* fix style
* stricter validation of n_embd
* refactor if..else to switch
* fix regression
* add test for 3B
* update docs
* fix tokenizing with add_special
* add more tests
* fix test case "huge"
* rm redundant code
* set_position_mrope_1d rm n_tokens
b5508
2025-05-27 14:06:10 +02:00
72b090da2c
docs: remove link for llama-cli function calling ( #13810 )
2025-05-27 08:52:40 -03:00
7fe03e7446
ggml-cpu: x86 feature detection is specific to x86 ( #13811 )
b5506
2025-05-27 13:18:39 +02:00
952f3953c1
ggml : allow CUDA graphs when using pipeline parallelism ( #13814 )
b5505
2025-05-27 13:05:18 +02:00
81713121ee
kv-cells : track min/max used cells and per-sequence positions ( #13808 )
* kv-cells : track min/max used cells and per-sequence positions
ggml-ci
* kv-cells : fix pos-modification updates for seq_pos
ggml-ci
* kv-cells : add comments
ggml-ci
b5504
2025-05-27 13:49:41 +03:00
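The point of tracking these bounds is answering range queries (e.g. for shifts or pruning) without scanning every cell. An illustrative sketch of the bookkeeping idea, not the actual llama.cpp structures:

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical illustration: keep a running [pos_min, pos_max] per sequence so
// "what range does sequence s occupy?" is O(1) on the happy path.
struct seq_pos_bounds {
    std::map<int32_t, std::pair<int32_t, int32_t>> bounds; // seq_id -> (min, max)

    void on_cell_used(int32_t seq_id, int32_t pos) {
        auto it = bounds.find(seq_id);
        if (it == bounds.end()) {
            bounds.emplace(seq_id, std::make_pair(pos, pos));
        } else {
            it->second.first  = std::min(it->second.first,  pos);
            it->second.second = std::max(it->second.second, pos);
        }
    }

    // Note: removals can invalidate cached bounds, so a real implementation
    // must rescan (or keep counts) when a min/max cell is freed.
};
```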
f9cd68398b
sampling : make sure samplers return at least 1 token ( #13822 )
* sampling : min-p should always return at least one token
ggml-ci
* sampling : same for typical sampling
* tests : sampling tests use min_keep == 0
ggml-ci
b5503
2025-05-27 12:07:52 +03:00
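The failure mode here: with min_keep == 0 and an aggressive threshold, a filter could discard every candidate. A hedged sketch of the invariant (illustrative, not llama.cpp's sampler code):

```cpp
#include <algorithm>
#include <vector>

struct candidate { int id; float p; };

// Illustrative min-p filter: drop tokens with p < min_p * p_max, but never
// return an empty set - the best token survives regardless of min_keep.
// Assumes cands is non-empty.
std::vector<candidate> min_p_filter(std::vector<candidate> cands, float min_p, size_t min_keep) {
    std::sort(cands.begin(), cands.end(),
              [](const candidate & a, const candidate & b) { return a.p > b.p; });
    const float thold = min_p * cands.front().p;
    size_t keep = 0;
    while (keep < cands.size() && (cands[keep].p >= thold || keep < min_keep)) keep++;
    keep = std::max<size_t>(keep, 1); // the invariant from this commit: at least 1 token
    cands.resize(keep);
    return cands;
}
```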
4f81b33e32
llama : validate seq id batch input ( #13809 )
* llama : validate seq id batch input
ggml-ci
* cont : fix the fix
ggml-ci
b5502
2025-05-27 09:40:59 +03:00
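Previously an out-of-range seq_id in a user-constructed llama_batch could corrupt KV-cache state; validating up front turns that into a clean error. Roughly the following check (illustrative; it only assumes the public llama_batch fields n_seq_id/seq_id):

```cpp
#include <cstdio>
#include "llama.h"

// Illustrative validation pass over a llama_batch before decoding:
// every seq_id must lie in [0, n_seq_max).
static bool batch_seq_ids_valid(const llama_batch & batch, int32_t n_seq_max) {
    for (int32_t i = 0; i < batch.n_tokens; ++i) {
        for (int32_t s = 0; s < batch.n_seq_id[i]; ++s) {
            const llama_seq_id id = batch.seq_id[i][s];
            if (id < 0 || id >= n_seq_max) {
                fprintf(stderr, "invalid seq_id %d at token %d\n", id, i);
                return false;
            }
        }
    }
    return true;
}
```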
cdf94a1802
server: --offline mode ( #13804 )
* server: --offline mode (env: LLAMA_OFFLINE)
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
b5501
2025-05-26 22:34:27 +01:00
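Offline mode keeps the server off the network (e.g. no model downloads), controllable by flag or environment. A sketch of the usual env-var pattern; this is illustrative, not the actual server code, and the "any non-zero value" semantics is an assumption:

```cpp
#include <cstdlib>
#include <cstring>

// Illustrative: --offline can also be forced via the LLAMA_OFFLINE env var,
// which helps wrappers that cannot pass extra CLI flags.
// Assumption: any value other than "0" counts as enabled.
static bool is_offline(bool cli_flag) {
    const char * env = std::getenv("LLAMA_OFFLINE");
    return cli_flag || (env != nullptr && std::strcmp(env, "0") != 0);
}
```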
a26c4cc11e
scripts : add option to compare commits in Debug ( #13806 )
* scripts : add option to compare commits in Debug
* cont : reuse existing CMAKE_OPTS
2025-05-26 22:24:01 +03:00
4265a87b59
cuda : avoid cuGetErrorString ( #13791 )
ggml-ci
b5499
2025-05-26 22:14:52 +03:00
6f180b915c
SYCL: Add non-contiguous support in RMS_NORM and NORM kernels ( #13611 )
* SYCL: Add non-contiguous input support to norm kernel
* refactor and add RMS_NORM non-contiguous input support
ggml-ci
* restore subgroup reduction for multi-subgroup thread blocks in norm kernels
* Swap grid dims of nsamples and nrows
ggml-ci
* Revert "Swap grid dims of nsamples and nrows"
This reverts commit 43be2d657fec7f7fba54e2cd154106bc0fc45adf.
* revert changes that were not required
ggml-ci
* address review comments: make it more SYCL-like
* Use a common function to calculate offset
* remove wrap around logic for handling broadcasts
* remove static from calculate_offset fn and use ceil_div
b5498
2025-05-26 21:10:36 +05:30
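Non-contiguous support means the kernel must index through byte strides (nb) instead of assuming packed rows. A scalar reference of RMS norm over one strided row, matching ggml's no-mean-subtraction semantics (illustrative; the actual change is in the SYCL kernels):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// RMS norm of one row accessed through byte stride nb0:
//   y[i] = x[i] / sqrt(mean(x^2) + eps)
// With nb0 == sizeof(float) this degenerates to the contiguous case.
static void rms_norm_row_strided(const char * x, float * y, int64_t ne0, size_t nb0, float eps) {
    double sum2 = 0.0;
    for (int64_t i = 0; i < ne0; ++i) {
        const float v = *(const float *)(x + i*nb0);
        sum2 += (double) v * v;
    }
    const float scale = 1.0f / std::sqrt((float)(sum2 / ne0) + eps);
    for (int64_t i = 0; i < ne0; ++i) {
        y[i] = *(const float *)(x + i*nb0) * scale;
    }
}
```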
03f582ae8f
server: fix streaming crashes ( #13786 )
* add preludes to content on partial regex match
* allow all parsers to parse non-tool-call content.
* tweak order of <|python_tag|> vs <function= parsing for the functionary v3.1 format; still not ideal but less prone to crashes
b5497
2025-05-26 16:03:57 +01:00
88c125f2ac
examples/training: Fix file name in README ( #13803 )
This patch fixes binary file names in README.md.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
2025-05-26 16:55:24 +02:00
d74e94c1b3
server : fix format of streamed tool call deltas (diff name, fix id location) ( #13800 )
* fix deltas of tool_call.function.name
* fix tool_call.id (was in tool_call.function.id!) + add function type
* add tool_call.type
* populate empty tool_call.function.arguments on first delta
b5495
2025-05-26 14:56:49 +01:00
f13847cfb5
server: fix regression on streamed non-chat completion w/ stops ( #13785 )
* more forgiving message diffs: partial stop words aren't erased, full stops are
* Add (slow) server test for completion + stream + stop
b5494
2025-05-26 14:16:37 +01:00
79c137f776
examples : allow extracting embeddings from decoder contexts ( #13797 )
ggml-ci
b5493
2025-05-26 14:03:54 +03:00
22229314fc
llama : clarify deprecation message ( #13794 )
b5492
2025-05-26 12:57:50 +03:00
9012eb9b45
sycl: Add more debug prints ( #13640 )
2025-05-26 10:28:53 +02:00
fef693dc6b
vulkan: mark IM2COL as supporting non-contig ( #13783 )
b5490
2025-05-26 06:02:07 +02:00
2d38b6e400
CANN: Add the basic supports of Flash Attention kernel ( #13627 )
* cann: add the basic FA support
* cann: update the readme
* cann: update the FlashAttention with PSEShift
* cann: update the input parameters in FA
* cann: update the alibi with max_bias
* cann: add the constraints of softcap
* cann: update the docs CANN.md
* cann: update the docs CANN.md
* cann: fix typo of CANN.md
* cann: add some comments and update the CANN.md
* cann: update the CANN.md
* cann: update the inner precise for fusedInferAttention
* cann: update the constraints of flash_attn_ext on ggml-cann.cpp
* cann: clean the whitespace
* cann: clean the whitespace
* cann: add a new endline
b5489
2025-05-26 10:20:18 +08:00
e121edc432
server : add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false) ( #13771 )
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
b5488
2025-05-26 00:30:51 +01:00
2f099b510f
webui : bump max upload file size to 500MB ( #13779 )
2025-05-25 18:02:18 +01:00
aa50ba462f
tests : improve UGM tokenizer test coverage ( #13773 )
b5486
2025-05-25 16:22:29 +02:00
de2ef53a4b
kv-cache : rework kv_cell ( #13706 )
* kv-cache : rework kv_cell
ggml-ci
* kv-cells : use "shift" instead of "delta" consistently
ggml-ci
* llama : add llama_max_parallel_sequences()
ggml-ci
* kv-cells : update comments [no ci]
* context : fail upon construction if sequences exceed max value
ggml-ci
* kv-cells : get_pos() -> pos_get() + comments
ggml-ci
* kv-cells : fix tracking of "used" cells
ggml-ci
2025-05-25 16:34:36 +03:00
c508256db2
rpc : Fix build on OpenBSD ( #13541 )
b5484
2025-05-25 15:35:53 +03:00
40aaa8a403
mtmd : add support for Qwen2-Audio and SeaLLM-Audio ( #13760 )
* mtmd : add Qwen2-Audio support
* small clean up
* update discussion link
* clarify mtmd_get_output_embd
* clarification in multimodal.md
* fix ultravox bug
* ggml_cont
b5483
2025-05-25 14:06:32 +02:00