Commit Graph

5076 Commits

Author SHA1 Message Date
34c961b181 CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we
need to avoid launching them with parameters for warp64.
b4876
2025-03-12 10:14:11 +01:00
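The guard described above can be sketched in host code as follows. This is a hedged illustration, not the actual llama.cpp code: `fattn_vec_warp_size` and `can_launch_fattn_vec` are invented names standing in for the real launch-parameter logic.

```c
#include <stdbool.h>

/* Illustrative sketch: the fattn-vec kernels only support 32-wide warps,
 * so the warp size used for their launch parameters must stay 32 even on
 * devices whose native warp (wavefront) size is 64. */
static int fattn_vec_warp_size(int device_warp_size) {
    (void)device_warp_size; /* intentionally ignored for fattn-vec */
    return 32;
}

/* Refuse to launch fattn-vec with parameters computed for 64-wide warps. */
static bool can_launch_fattn_vec(int configured_warp_size) {
    return configured_warp_size == 32;
}
```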
7841fc723e llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
* llama : Add Gemma 3 text-only support

* fix python coding style

* fix compile on ubuntu

* python: fix style

* fix ubuntu compile

* fix build on ubuntu (again)

* fix ubuntu build, finally

* clip : Experimental support for Gemma 3 vision (#12344)

* clip : Experimental support for Gemma 3 vision

* fix build

* PRId64
b4875
2025-03-12 09:30:24 +01:00
bf69cfe62f vulkan: fix bug in coopmat1 mul_mat_id (#12316)
* tests: run mul_mat_id with a larger N

* vulkan: fix bug in coopmat1 mul_mat_id
b4874
2025-03-12 06:59:19 +01:00
10f2e81809 CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177)
refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
b4873
2025-03-11 20:16:03 +01:00
ba7654380a ggml-backend : fix backend search path (#12330)
* Fix backend search path

* replace .native() with '/'

* reverted .native()
b4872
2025-03-11 14:25:17 +01:00
6ab2e4765a metal : Cache the Metal library at the device context level (#12265) b4871 2025-03-11 13:45:02 +02:00
96e1280839 clip : bring back GPU support (#12322)
* clip : bring back GPU support

* use n_gpu_layers param

* fix double free

* ggml_backend_init_by_type

* clean up
b4870
2025-03-11 09:20:16 +01:00
Eve
2c9f833d17 mat vec double buffer (#12188) b4869 2025-03-10 19:28:11 +00:00
251364549f musa: support new arch mp_31 and update doc (#12296)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b4868
2025-03-10 18:18:25 +01:00
8acdacb3ea opencl: use OpenCL C standard supported by the device (#12221)
This patch nudges llama.cpp a bit to be supported on PoCL, which
doesn't support OpenCL C 2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
b4867
2025-03-10 09:57:00 -07:00
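The version-selection logic described above can be sketched as follows. In the real backend the strings come from the device via `clGetDeviceInfo`; this hedged sketch only shows picking the highest version out of already-retrieved strings, and both helper names are hypothetical.

```c
#include <stdio.h>

/* Parse a version string of the form "OpenCL C <major>.<minor>" into a
 * single comparable number (major * 100 + minor). Returns -1 on failure. */
static int parse_opencl_c_version(const char *s) {
    int major = 0, minor = 0;
    if (sscanf(s, "OpenCL C %d.%d", &major, &minor) != 2) {
        return -1;
    }
    return major * 100 + minor;
}

/* Pick the highest supported version out of n reported version strings. */
static int highest_opencl_c_version(const char **versions, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        int v = parse_opencl_c_version(versions[i]);
        if (v > best) {
            best = v;
        }
    }
    return best;
}
```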
89b2b56e86 readme: added Sidekick to available UIs (#12311) 2025-03-10 16:13:09 +02:00
e128a1bf5b tests : fix test-quantize-fns to init the CPU backend (#12306)
ggml-ci
b4865
2025-03-10 14:07:15 +02:00
6ef79a67ca common : refactor '-o' option (#12278)
As discussed in PR 'llama-tts : add -o option' (#12042):

* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.

* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
b4864
2025-03-10 13:34:13 +02:00
4e39a3c332 server: extract <think> tags from qwq outputs (#12297)
* extract <think> tags from qwq outputs

* const for all static regexes in chat.cpp
b4863
2025-03-10 10:59:03 +00:00
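A minimal sketch of the tag extraction above, in plain C rather than the static regexes used in `chat.cpp`; `extract_think` is a hypothetical helper for illustration, not the server's actual API.

```c
#include <string.h>
#include <stdlib.h>

/* Copy the text between the first "<think>" and "</think>" pair into a
 * freshly allocated string. Returns NULL if no complete pair exists; the
 * caller owns (and must free) the result. */
static char *extract_think(const char *s) {
    const char *open = strstr(s, "<think>");
    if (!open) return NULL;
    const char *body  = open + strlen("<think>");
    const char *close = strstr(body, "</think>");
    if (!close) return NULL;
    size_t n = (size_t)(close - body);
    char *out = malloc(n + 1);
    if (!out) return NULL;
    memcpy(out, body, n);
    out[n] = '\0';
    return out;
}
```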
be421fc429 tool-call: ensure there's always a non-empty tool call id (#12292) 2025-03-10 09:45:29 +00:00
87c2630546 allow missing content in message if tool_calls provided (#12293) b4861 2025-03-10 09:45:07 +00:00
2b3a25c212 sampler: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291)
* Fix typo in lazy grammar handling (fixes trigger tokens)

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4860
2025-03-10 09:44:42 +00:00
8352cdc87b llava : fix bug in minicpm-v code (#11513)
* fix bug in minicpm-v code

* update readme of minicpm-v
b4859
2025-03-10 10:33:24 +02:00
1e2f78a004 server : add speculative decoding presets for FIM (#12287) 2025-03-09 19:08:20 +02:00
0fd7ca7a21 authors : update (#12271) 2025-03-08 18:26:00 +02:00
6fefc05a7a ggml-backend : make path_str compatible with C++20 (#12269) b4856 2025-03-08 17:02:39 +01:00
7ab364390f server : infill gen ends on new line (#12254) b4855 2025-03-07 20:54:30 +02:00
7c7f3b7f43 ggml : skip intermediate .air file when compiling .metallib (#12247)
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.

The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
b4854
2025-03-07 14:15:27 +01:00
102ac1891d sync : ggml
ggml-ci
b4853
2025-03-07 14:49:44 +02:00
d6ae2fa061 ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118)
* ggml_compute_forward_concat() for arbitrary tensor type

* Check that tensors' type match

* ggml-cpu.c: check type of source tensors

* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()

* ggml.c: check concatenated tensor type

* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c

..., as it was moved to ggml.c.
2025-03-07 14:49:44 +02:00
68d0027f3d ggml-cpu: faster AVX2 variant for IQ1_M (#12216) b4851 2025-03-07 13:54:22 +02:00
ea002810a2 ci : fix save-load test invocations (#12245) 2025-03-07 12:19:31 +02:00
8fad3c7a7c server : Log original chat template parsing error (#12233) b4849 2025-03-07 11:15:33 +01:00
7cf64f6bee sync: minja - support QwQ-32B (#12235)
8a76f7815e
b4848
2025-03-07 09:33:37 +00:00
5e2d57b2b2 metal : simplify kernel arguments using a struct (#3229) (#12194)
* metal : refactor im2col parameters into a struct

* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets

* metal : refactor sum_rows parameters into a struct

* metal : refactor soft_max parameters into a struct

* metal : refactor diag_mask_inf parameters into a struct

* metal : refactor ssm_conv parameters into a struct

* metal : refactor ssm_scan parameters into a struct

* metal : refactor get_rows parameters into a struct

* metal : refactor group_norm parameters into a struct

* metal : refactor conv_transpose_1d parameters into a struct

* metal : refactor upscale parameters into a struct

* metal : refactor pad parameters into a struct

* metal : refactor pad_reflect_1d parameters into a struct

* metal : refactor arange parameters into a struct

* metal : refactor timestep_embedding parameters into a struct

* metal : refactor argsort parameters into a struct

* metal : refactor leaky_relu parameters into a struct

* metal : refactor pool_2d parameters into a struct

* metal : fix trailing whitespace

---------

Co-authored-by: alexju <alexju@tencent.com>
b4847
2025-03-07 08:35:57 +01:00
f1648e91cf HIP: fix rocWMMA build flags under Windows (#12230) b4846 2025-03-07 08:06:08 +01:00
d6c95b0740 metal : fix default.metallib build (#12224)
This commit updates the custom command to build the default.metallib
file to use the correct path to ../ggml-common.h by using the variable
METALLIB_COMMON.

The motivation for this change is that currently when building and
specifying GGML_METAL_EMBED_LIBRARY=OFF the following error is
generated:
```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'.  Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```

With the above change the build could progress, but there was a
follow-on error about not being able to find the ggml-common.h file in
ggml-metal.metal, where it was included as a relative path:
```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
         ^~~~~~~~~~~~~~~~~~
         "ggml-common.h"
1 error generated.
```
Removing the relative path then allowed the build to complete
successfully.
2025-03-07 06:23:16 +01:00
d76a86d967 opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops (#12217)
* opencl: support noncontiguous `norm`

* opencl: support noncontiguous `rms_norm`

* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`
2025-03-07 00:20:35 +00:00
776f9e59cc cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (#12094)
Signed-off-by: Ray Lee <hburaylee@gmail.com>
Co-authored-by: Ray Lee <hburaylee@gmail.com>
2025-03-06 22:58:25 +00:00
3d652bfddf readme : update bindings (#12229) 2025-03-06 21:15:13 +02:00
5220a16d18 CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (#12222) 2025-03-06 18:45:09 +01:00
3ffbbd5ce1 HIP: rocWMMA documentation and enabling in workflow builds (#12179)
* Enable rocWMMA for Windows CI build

* Enable for Ubuntu

* GGML_HIP_ROCWMMA_FATTN documentation work
2025-03-06 14:14:11 +01:00
42994048a3 update function-calling.md w/ template override for functionary-small-v3.2 (#12214) 2025-03-06 09:03:31 +00:00
e9b2f84f14 llava: add big-endian conversion for image encoder (#12218)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-03-06 09:33:21 +01:00
e721c05c93 HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (#12209)
This avoids conflicts with the internal CUDA/HIP runtimes' memory management behavior.
b4837
2025-03-06 08:20:52 +01:00
57b6abf85a android : fix KV cache log message condition (#12212) b4836 2025-03-06 08:22:49 +02:00
94bb63e4f0 opencl : fix buffer alignment (#12197)
Fix the following error:

```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```

which occurs when `ggml_backend_opencl_context::alignment` is larger
than `cl_ptr_base` (hard-coded to `0x1000`).

Also, fix `ggml_backend_opencl_context::alignment`, which was set from
`CL_DEVICE_MEM_BASE_ADDR_ALIGN` and treated as bytes, although the
value is reported in bits.
b4835
2025-03-06 02:33:40 +01:00
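The bits-vs-bytes pitfall above boils down to a one-line conversion; a hedged sketch follows, where the helper name is invented and the input value would in practice come from `clGetDeviceInfo` with `CL_DEVICE_MEM_BASE_ADDR_ALIGN`.

```c
/* CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in *bits*, but the
 * allocator needs an alignment in *bytes* — divide by 8 before use.
 * Forgetting this makes the alignment 8x too large, which is how the
 * "not enough space in the buffer" error above was triggered. */
static unsigned int mem_base_addr_align_bytes(unsigned int align_bits) {
    return align_bits / 8u;
}
```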
f79243992c opencl : fix ulong kernel args were set from int variables (#12174)
... which left garbage bits in the upper half of the kernel args. This
caused segmentation faults when running PoCL.
b4834
2025-03-06 02:31:14 +01:00
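The bug pattern above — handing `clSetKernelArg` a 4-byte `int` where the kernel expects an 8-byte `ulong`, so the upper half of the argument is garbage — is fixed by widening to a 64-bit value first and passing the size of that value. The sketch below is illustrative; `widen_kernel_offset` is an invented name, not an actual kernel argument helper.

```c
#include <stdint.h>

/* Wrong:  clSetKernelArg(k, i, sizeof(int), &offset)  with an int variable
 * supplies only 4 of the 8 bytes a ulong argument reads.
 * Right:  widen first, then pass sizeof the 64-bit value:
 *         uint64_t off64 = widen_kernel_offset(offset);
 *         clSetKernelArg(k, i, sizeof(off64), &off64);                   */
static uint64_t widen_kernel_offset(int offset) {
    /* go through uint32_t so the widening is explicit and well-defined */
    return (uint64_t)(uint32_t)offset;
}
```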
ed4ce0dda2 opencl : fix profile-related errors (#12095)
Co-authored-by: ubuntu <ubuntu@localhost.localdomain>
b4833
2025-03-06 02:30:05 +01:00
07d1572347 ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154)
* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions

* cmake: Add GGML_BMI2 build option

* ggml: enable BMI2 on relevant CPU variants

* ggml-cpu: include BMI2 in backend score

* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features

* ggml-cpu: add __BMI2__ define when using MSVC
b4832
2025-03-06 02:26:10 +01:00
5e43f104cc SYCL: Disable f16 Unary OPs as not supported by the kernels (#12201) b4831 2025-03-05 16:58:23 +01:00
16e4b22c5e ggml : fix GGMLMetalClass ODR (#12200)
This might happen if ggml is loaded from two separate libraries, since each of them will expose the class. This is more of a guard, since we want to use Metal only as an embedded library and don't care about the other case.
b4830
2025-03-05 17:16:01 +02:00
074c4fd39d ci : add fetch-depth to xcframework upload (#12195)
This commit adds the fetch-depth: 0 option to the checkout action in the
build.yml workflow file (0 meaning that it fetches the complete
history). The default value when not specified is 1, which only fetches
the latest commit.

This is necessary to ensure that `git rev-list --count HEAD` counts the
total number of commits in the history. Currently, because the default is
being used, the name of the xcframework artifact is always
llama-b1-xcframework.
b4829
2025-03-05 14:16:40 +01:00
669912d9a5 tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain llama json output regardless of function name if matches at beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
fa31c438e0 ci : fix xcframework artifact tag (#12191)
This commit adds the name parameter to the upload-artifact action to
ensure that the artifact is uploaded with the correct name.

The motivation for this is that currently the uploaded xcframework
is named llama-b1-xcframework.zip. With this change the name of this
artifact should contain the build number, like the other artifacts.
b4827
2025-03-05 10:22:29 +01:00