Commit Graph

5274 Commits

Author SHA1 Message Date
6def5cd729 metal : add missing args for nb references in ssm_scan_f32_group 2025-05-01 19:10:20 -04:00
cf4f0a4123 metal : fix confusion between ; and , 2025-05-01 18:55:34 -04:00
35d06fac5a Merge branch 'master' into compilade/mamba2 2025-05-01 18:06:09 -04:00
b1dd4d08e8 sync : ggml
ggml-ci
b5246
2025-05-01 20:15:34 +03:00
99881f77d8 whisper : add check that target name exists (whisper/3103)
This commit adds a check to makes sure that the target exists before
trying to add compile options to ignore warnings when using MSVC.

The motivation for this is currently the build is broken depending on
the cmake options provided. With this fix it should be possible to build
even if the targets are not actually available.

Refs: https://github.com/ggml-org/whisper.cpp/pull/3090#issuecomment-2842760104
2025-05-01 20:15:34 +03:00
b5769d92b4 ggml : suppress Windows compiler warnings (whisper/3075)
* whisper: suppress Windows compiler warnings

This commit disables compiler warnings on window using MSVC.

The motivation for these changes is that some compilers generate
warnings for these conversion, for example Windows MSVC, and
there are quite a few of them. This makes it a little difficult to
spot new warnings that may be introduced and also can be difficult
for users/embedders of ggml where these warnings are hard to separate
from their own warnings.

* squash! whisper: suppress Windows compiler warnings

Move ggml related warnings into ggml. This commit also fixes the
indentation and adds a missing whitespace to the if statement.
2025-05-01 20:15:34 +03:00
8936784f7a mtmd : add **vision** support for Mistral Small 3.1 (#13231)
* convert ok

* load ok, missing patch merger

* ah sheet it works

* update llava/readme

* add test

* fix test
b5243
2025-05-01 17:05:42 +02:00
13c9a3319b arg : remove CURLINFO_EFFECTIVE_METHOD (#13228) b5242 2025-05-01 10:23:25 +02:00
a70183eb00 llama-model : fix the reported size class for nomic-embed-text-v2-moe (#13223) b5241 2025-05-01 10:09:41 +03:00
8d33d740c3 sync : ggml 2025-05-01 10:00:39 +03:00
4254bb4951 ggml : fix ggml_gallocr_ptr type (ggml/1205) b5239 2025-05-01 09:58:44 +03:00
9998540149 cuda : fix unused variable compile warning (whisper/0)
ggml-ci
2025-05-01 09:58:44 +03:00
e1e8e0991f CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199) b5237 2025-04-30 23:12:59 +02:00
6f67cf1f48 arg : -hf do not fail if url mismatch (#13219)
* arg : -hf do not fail if url mismatch

* do not return if cannot parse metadata json
b5236
2025-04-30 21:29:15 +01:00
16a457facd fix typo: n_ctx_pre_seq -> n_ctx_per_seq (#13221) b5235 2025-04-30 21:28:43 +01:00
3e168bede4 convert : improve model arch handling (#13122)
* convert : improve model arch handling

* use AutoConfig

* rm trust_remote_code

* Update convert_hf_to_gguf.py

* fix self.block_count for vision

* fix NomicBertModel
2025-04-30 16:56:24 +02:00
ceda28ef8e llava : remove duplicate include (#13207) b5233 2025-04-30 15:25:20 +02:00
3b127c7385 common : add -jf / --json-schema-file flag (#12011) b5232 2025-04-30 14:52:35 +02:00
e5007a5edf vulkan: use uint array index to avoid glslang bug (#13193) b5231 2025-04-30 14:38:37 +02:00
416313773b ggml : fix ppc64le build (#13176)
Build fails with compilation error on power pc.
This patch fixes the same.

Tested with unit tests run via
 --build <build_dir> && cd <build_dir> && make test

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
b5230
2025-04-30 13:17:08 +02:00
07c2e2f76c convert : correct typo image_mean --> image_std (#13208) 2025-04-30 13:06:15 +02:00
44cd8d91ff feat(ggml-cpu): enable z17 compile (#13182)
z17 compilation requires GCC 15.1.0 and onwards

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
b5228
2025-04-30 10:47:35 +01:00
5933e6fdc9 arg : allow using -hf offline (#13202)
* arg : allow using -hf offline

* add more comments in code [no ci]
2025-04-30 10:46:32 +02:00
da84c04d8f docker : do not build tests (#13204)
* docker : do not build tests

* include "ggml-cpu.h"
b5226
2025-04-30 10:44:07 +02:00
a0f7016d17 rpc : fix cache directory initialization (#13188)
Signed-off-by: xiaofei <hbuxiaofei@gmail.com>
b5225
2025-04-30 09:29:22 +03:00
19e899ce21 scripts: n_depth for compare-llama-bench [no ci] (#13201) 2025-04-29 23:32:04 +02:00
e2e1ddb93a server : Prefilling assistant message in openai compatible API (#13174)
* Prefilling assistant message in openai compatible API

* fixed indentation

* fixed code convention

* simplify method usage

* no more than one assistant message at end of messages

* merge checks into prefill code

* Update examples/server/utils.hpp

---------

Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
b5223
2025-04-29 20:33:10 +02:00
d9d398f84f sampling : when top-k <= 0 -> noop (#13173)
ggml-ci
b5222
2025-04-29 20:22:57 +03:00
5a63980117 llama-bench: fixed size of fields to correctly map to values (#13183) b5221 2025-04-29 17:24:36 +02:00
cdf76586b2 CUDA: fix non-cont. inputs for batched mat mul (#13155) b5220 2025-04-29 16:00:27 +02:00
7d3af70b08 llama : llm_type order by size (#13177) b5219 2025-04-29 13:25:53 +02:00
00e3e5a194 mtmd : add qwen2vl and qwen2.5vl (#13141)
* llava : add clip_n_output_tokens, deprecate clip_n_patches

* mtmd : add qwen2vl and qwen2.5vl

* decode_embd_batch::set_position_...

* working version

* deprecate llama-qwen2vl-cli

* correct order W, H of clip_embd_nbytes_by_img

* edit existing line in hot topics
b5218
2025-04-29 11:47:04 +02:00
e98b3692be llama : set qwen3 model type sizes (#13175) b5217 2025-04-29 11:00:31 +02:00
b6ce7430b7 llama-graph : fix text position for mrope (#13159)
* llama-graph : fix text position for mrope

* fix typo

* explicitly set 4th dim in the loop
b5216
2025-04-29 09:45:49 +03:00
AT
5f5e39e1ba model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466)
* Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture

- Adds MoE-based embedding model supporting multilingual embeddings.
- Selects architecture variant based on hyperparameter detection (MoE layers).
- Removes unnecessary subclass initialization checks for clarity.

https://www.nomic.ai/blog/posts/nomic-embed-text-v2

Co-authored-by: Jared Van Bortel <jared@nomic.ai>

* fix tokenizer

* don't rename this tensor

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
b5215
2025-04-28 22:52:15 +03:00
eaea325324 clip : fix model size display (#13153) b5214 2025-04-28 21:23:19 +02:00
43ddab6eee fix(rpc): Improve input validation and error handling (#13069)
* fix(rpc): Improve input validation and error handling

The `rpc-server` was vulnerable to Denial of Service attacks via
several RPC commands (`SET_TENSOR`, `GRAPH_COMPUTE`, etc.). Malformed
messages could trigger failed assertions (e.g., invalid `ggml_type`)
or out-of-bounds reads/writes leading to `GGML_ABORT` calls,
crashing the server process.

This PR introduces robust input validation and replaces `abort()`
calls with graceful error handling:

- **Type Validation:** `deserialize_tensor` now checks if the
  `tensor->type` is within the valid `GGML_TYPE_COUNT` range
  *before* calling `ggml_new_tensor_4d`. Returns `nullptr` on
  invalid type.
- **Bounds Checks:** Replaced `GGML_ABORT` in `set_tensor`,
  `set_tensor_hash`, and `get_tensor` handlers with error
  logging and returning `false` when data/offset parameters
  are out of buffer bounds.
- **Size Checks:** Added safe arithmetic checks (for overflow) in
  `graph_compute` when calculating required message sizes based
  on client-provided `n_nodes` and `n_tensors`. Returns early
  if the reported sizes conflict with the actual message size or
  would lead to overflow.
- **Error Propagation:**
    - `create_node` now checks for `nullptr` return values from
      `deserialize_tensor` and its recursive calls, propagating
      `nullptr` upwards on failure. Uses `find` instead of `at`
      for safer map access.
    - `copy_tensor` now checks for `nullptr` from `deserialize_tensor`
      and sets the response status to failure if deserialization
      or bounds checks fail.
    - `graph_compute` now checks for `nullptr` return from
      `create_node` and returns failure status correctly. The final
      return value now reflects the actual computation status.

These changes improve the RPC server's resilience
against malformed client requests, preventing crashes and ensuring
errors are handled more gracefully.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* refactor(rpc): address pr comments

removed comments and unnecessary returns

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* refactor(rpc): ambiguous nullptr from create_node

rpc_server::create_node could previously return nullptr if the input ID
was 0 (valid) or if an internal error (deserialization, recursion
failure) occurred (invalid). This ambiguity made error handling
difficult for the caller (`graph_compute`).

This commit clarifies the meaning of nullptr:
- `graph_compute` now checks if the input 'id' was non-zero when
  `create_node` returns nullptr, correctly identifying failures
  versus intentional null links.
- `create_node` avoids recursive calls for zero IDs and propagates
  nullptr unambiguously on failure during recursion.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* refactor(rpc): initial zero check in create_node

The caller (`graph_compute`) already checks `id != 0` when handling
a `nullptr` return from `create_node`, correctly distinguishing
intentional null links from actual errors. This makes the initial
`if (id == 0)` check redundant.

Also removes the log message when a tensor ID is not found in the
provided map which was added in this branch.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* fix(rpc): Handle get_alloc_size failure in server

Check the return value of `server.get_alloc_size` in the RPC server
loop. If the call fails, return early to close the connection.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* refactor(rpc): input size validation in graph_compute

Removes detailed, step-by-step size calculations and overflow
checks in favor of simpler direct comparisons, assuming 64-bit
overflow is unlikely.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* refactor(rpc): remove extra status code setting

Removes the explicit setting of `response.result = GGML_STATUS_FAILED`
when `create_node` returns `nullptr` within `graph_compute`.
Primary signal is the `false` return value in case of failure.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* refactor(rpc): remove redundant check for tensor->type

Breaks CI on ubuntu-cpu-make. Tensor type is uint32_t, thus
the check is not needed.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

---------

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
b5213
2025-04-28 21:00:20 +03:00
1831f538f7 llama-bench: add -d depth arg (#13096)
* add depth param

* update llama-bench README and add depth param

* llama-bench: default params for depth arg for faster execution

* Update examples/llama-bench/README.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* fix buffer print ub

* use user provided args

* remove extra whitespaces

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
b5212
2025-04-28 16:50:39 +02:00
4e87962e34 mtmd : fix glm-edge redundant token count (#13139)
* mtmd : fix glm-edge redundant token count

* fix chat template

* temporary disable GLMEdge test chat tmpl
b5211
2025-04-28 16:12:56 +02:00
fb0471d175 context : do not clear output buffer on reserve (#13152)
Co-authored-by: pockers21 <liyang2@uniontech.com>
b5210
2025-04-28 16:45:40 +03:00
d2b2031e5f llama : (mrope) allow using normal 1D position for text token (#13138)
* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp
b5209
2025-04-28 14:20:56 +02:00
5fa9e63be8 clip : refactor set input for cgraph + fix qwen2.5vl input (#13136)
* clip : refactor set input for cgraph

* more strict assert

* minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere

* split qwen2 and qwen2.5 code blocks

* minor style fix
b5208
2025-04-28 12:18:59 +02:00
a4c340f974 SYCL: Add all missing unary kernels (#13074)
* SYCL: Add all missing unary kernels

ggml-ci

* decouple kernel launch range from data size using strided loop

* use ciel_div helper for num_blocks
ggml-ci

* clean auto imported header files
b5207
2025-04-28 11:33:25 +02:00
d0a417f3c7 readme : update hot topics (#13150) 2025-04-28 12:10:18 +03:00
43f2b07193 common : fix noreturn compile warning (#13151)
ggml-ci
b5205
2025-04-28 11:57:19 +03:00
e5d6c2554e llama-chat : fix typo GML --> GLM (#13143) b5204 2025-04-28 10:11:58 +02:00
f0dd6a1926 musa: fix typo in cc control (#13144)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-28 09:33:28 +02:00
69699be48a CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137) b5202 2025-04-28 09:29:26 +02:00
85f36e5e71 arg : fix unused variable (#13142) b5201 2025-04-28 08:16:59 +03:00
c0a97b762e llama-bench : Add --override-tensors arg (#12922)
* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensors parsing to match --tensor-split, appear in test matrix.

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)
b5200
2025-04-27 23:48:26 +02:00