Xuan Son Nguyen
624a683c6f
fix compile
2025-03-14 22:30:29 +01:00
Xuan Son Nguyen
116b9a1662
rename to init_from_text
2025-03-14 22:17:07 +01:00
Xuan Son Nguyen
eaffba0f2e
llama_batch_ext_ptr::from_text/embd
2025-03-14 17:12:03 +01:00
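The llama_batch_ext_ptr::from_text/embd helpers above suggest a RAII wrapper over the C batch handle. A purely hypothetical C++ sketch of that shape follows; the real names and signatures live in the xsn/private_batch_api branch and may differ:

    #include <memory>

    struct llama_batch_ext;                        // opaque C handle (assumed)
    void llama_batch_ext_free(llama_batch_ext *);  // C-side destructor (assumed)

    // Deleter so a unique_ptr always pairs an init_from_text/embd with a free.
    struct llama_batch_ext_deleter {
        void operator()(llama_batch_ext * b) const { llama_batch_ext_free(b); }
    };

    using llama_batch_ext_uptr = std::unique_ptr<llama_batch_ext, llama_batch_ext_deleter>;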
Xuan Son Nguyen
8e7714fa77
fix compile
2025-03-14 11:28:15 +01:00
Xuan Son Nguyen
a363251fac
qwen2vl: use llama_batch_ext_set_pos
2025-03-14 11:25:36 +01:00
Xuan Son Nguyen
ba79369615
fix llama_batch_ext_init_from_embd
2025-03-14 11:17:22 +01:00
Xuan Son Nguyen
07d84fa3c2
fix missing n_past in various places
...
this is actually a revert of cda0e4b648
2025-03-14 10:47:08 +01:00
Xuan Son Nguyen
32940369d3
fix gemma3-cli
2025-03-14 10:33:28 +01:00
Xuan Son Nguyen
5e6a6d4e1c
fix llama-run n_past
2025-03-14 10:32:43 +01:00
Xuan Son Nguyen
bfdddbc150
bring back mistakenly deleted llama_batch_init/free
2025-03-14 00:22:28 +01:00
Xuan Son Nguyen
54566ad95d
correct comment
2025-03-14 00:21:06 +01:00
Xuan Son Nguyen
04f8641815
rm redundant llama_batch_ext_set_output_last
2025-03-13 23:14:16 +01:00
Xuan Son Nguyen
c3dd79007b
fix llama_batch_ext_init_from_text
2025-03-13 23:09:27 +01:00
Xuan Son Nguyen
65f0184517
compile ok
2025-03-13 22:56:35 +01:00
Xuan Son Nguyen
9fb2d81eab
fix common_batch missing seq_id
2025-03-13 22:38:04 +01:00
Xuan Son Nguyen
47086fa82d
apply to the rest
2025-03-13 22:36:27 +01:00
Xuan Son Nguyen
4aabf4e8f4
return output ID from llama_batch_ext_add/set
2025-03-13 17:47:07 +01:00
Xuan Son Nguyen
86973cb14a
fix merge errors
2025-03-13 17:32:36 +01:00
Xuan Son Nguyen
17f954c8e2
Merge branch 'master' into xsn/private_batch_api
2025-03-13 15:55:18 +01:00
Xuan-Son Nguyen
be7c303410
arg : no n_predict = -2 for examples except for main and infill ( #12364 )
b4882
2025-03-13 12:34:54 +01:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context ( #12181 )
...
* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci
2025-03-13 12:35:44 +02:00
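Two of the bullets above ("move interface implementation to source file + factory" and "rework inputs to use only unique_ptr") describe an ownership pattern that a minimal, hypothetical sketch can illustrate; none of these names are the actual llama.cpp symbols:

    #include <memory>
    #include <vector>

    // Hypothetical graph input: the graph owns each input outright.
    struct graph_input {
        virtual ~graph_input() = default;
        virtual void set_input() = 0; // copy batch data into backend tensors
    };

    struct graph {
        std::vector<std::unique_ptr<graph_input>> inputs;

        // unique_ptr ownership: no shared state, no separate attn-input abstraction
        graph_input * add_input(std::unique_ptr<graph_input> inp) {
            inputs.push_back(std::move(inp));
            return inputs.back().get();
        }
    };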
Ishaan Gandhi
2048b5913d
server : fix crash when using verbose output with input tokens that are not in printable range ( #12178 ) ( #12338 )
...
* Fix DOS index bug
* Remove new APIs
* remove extra line
* Remove from API
* Add extra newline
* Update examples/server/server.cpp
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
b4880
2025-03-13 11:10:05 +01:00
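The crash above involved verbose output handling token text containing bytes outside the printable range. An illustrative helper (not the server's actual code) that makes such text safe to log:

    #include <cstdio>
    #include <string>

    // Escape bytes outside printable ASCII so verbose logging never prints
    // or indexes raw control bytes from tokenized input.
    static std::string escape_non_printable(const std::string & s) {
        std::string out;
        for (unsigned char c : s) {
            if (c >= 0x20 && c < 0x7f) {
                out += (char) c;
            } else {
                char buf[8];
                std::snprintf(buf, sizeof(buf), "\\x%02x", c);
                out += buf;
            }
        }
        return out;
    }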
Oscar Barenys
f08f4b3187
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support ( #12301 )
b4879
2025-03-12 20:06:58 +01:00
Daniel Bevenius
80a02aa858
llama.swiftui : fix xcframework dir in README [no ci] ( #12353 )
...
This commit fixes the path to the xcframework in the README file, which I
had forgotten to change after renaming the build directory.
2025-03-12 13:45:32 +01:00
Alberto Cabrera Pérez
363f8c5d67
sycl : variable sg_size support for mmvq kernels ( #12336 )
b4877
2025-03-12 09:57:32 +00:00
uvos
34c961b181
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 ( #12315 )
...
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we
need to avoid launching them with parameters for warp64.
b4876
2025-03-12 10:14:11 +01:00
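A sketch of the host-side guard this describes, using the real CUDA runtime query but a hypothetical helper name (HIP exposes the same property, and warp size is 64 on most AMD GCN/CDNA devices):

    #include <cuda_runtime.h>

    // Hypothetical helper: the fattn-vec kernels currently assume 32-wide
    // warps, so on warp-64 devices we must not launch them with warp64
    // parameters and should pick another path instead.
    bool can_use_fattn_vec(int device) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, device);
        return prop.warpSize == 32;
    }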
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) ( #12343 )
...
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344 )
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
b4875
2025-03-12 09:30:24 +01:00
Jeff Bolz
bf69cfe62f
vulkan: fix bug in coopmat1 mul_mat_id ( #12316 )
...
* tests: run mul_mat_id with a larger N
* vulkan: fix bug in coopmat1 mul_mat_id
b4874
2025-03-12 06:59:19 +01:00
uvos
10f2e81809
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. ( #12177 )
...
refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
b4873
2025-03-11 20:16:03 +01:00
jklincn
ba7654380a
ggml-backend : fix backend search path ( #12330 )
...
* Fix backend search path
* replace .native() with '/'
* reverted .native()
b4872
2025-03-11 14:25:17 +01:00
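The bullets mention replacing .native() with '/'. A minimal sketch of that portable form, assuming a hypothetical backend_file helper:

    #include <filesystem>
    #include <string>

    namespace fs = std::filesystem;

    // Joining with the overloaded '/' operator stays portable; .native()
    // returns a platform-specific string type (std::wstring on Windows),
    // which is easy to misuse when building a search path.
    fs::path backend_file(const fs::path & search_dir, const std::string & name) {
        return search_dir / name; // e.g. search_dir / "ggml-cuda.dll"
    }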
BB-fat
6ab2e4765a
metal : Cache the Metal library at the device context level ( #12265 )
b4871
2025-03-11 13:45:02 +02:00
Xuan-Son Nguyen
96e1280839
clip : bring back GPU support ( #12322 )
...
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
b4870
2025-03-11 09:20:16 +01:00
Eve
2c9f833d17
mat vec double buffer ( #12188 )
b4869
2025-03-10 19:28:11 +00:00
R0CKSTAR
251364549f
musa: support new arch mp_31 and update doc ( #12296 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b4868
2025-03-10 18:18:25 +01:00
Henry Linjamäki
8acdacb3ea
opencl: use OpenCL C standard supported by the device ( #12221 )
...
This patch nudges llama.cpp a bit so it can run on PoCL, which
doesn't support OpenCL C 2.0. The issue is solved by querying the
device for its supported OpenCL C versions and using the highest one
available.
b4867
2025-03-10 09:57:00 -07:00
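A minimal sketch of the query the message describes, using the standard OpenCL API (this shows only the version lookup, not the patch's full selection logic):

    #include <CL/cl.h>
    #include <cstdio>

    // Ask the device which OpenCL C version it supports instead of assuming 2.0.
    void print_opencl_c_version(cl_device_id dev) {
        char buf[128] = {0};
        clGetDeviceInfo(dev, CL_DEVICE_OPENCL_C_VERSION, sizeof(buf), buf, NULL);
        std::printf("%s\n", buf); // e.g. "OpenCL C 1.2"
    }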
John Bean
89b2b56e86
readme: added Sidekick to available UIs ( #12311 )
2025-03-10 16:13:09 +02:00
Georgi Gerganov
e128a1bf5b
tests : fix test-quantize-fns to init the CPU backend ( #12306 )
...
ggml-ci
b4865
2025-03-10 14:07:15 +02:00
marcoStocchi
6ef79a67ca
common : refactor '-o' option ( #12278 )
...
As discussed in PR 'llama-tts : add -o option' (#12042 ):
* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.
* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
b4864
2025-03-10 13:34:13 +02:00
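A hypothetical sketch of the shape described above (not the actual common_params definition): the shared struct keeps a single out_file, and each example program sets its own default in its main():

    #include <string>

    struct params {                // stand-in for common_params
        std::string out_file;      // filled by the '-o' option, empty otherwise
    };

    int main() {
        params p; // ... parse '-o' into p.out_file ...
        if (p.out_file.empty()) {
            p.out_file = "imatrix.dat"; // per-program default, e.g. in imatrix
        }
        return 0;
    }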
Olivier Chafik
4e39a3c332
server : extract <think> tags from qwq outputs ( #12297 )
...
* extract <think> tags from qwq outputs
* const for all static regexes in chat.cpp
b4863
2025-03-10 10:59:03 +00:00
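An illustrative sketch of the extraction (not the chat.cpp implementation), with the regex declared static const as the second bullet describes:

    #include <regex>
    #include <string>

    static const std::regex think_regex("<think>([\\s\\S]*?)</think>");

    // Return the reasoning block, or an empty string if the output has none.
    std::string extract_think(const std::string & output) {
        std::smatch m;
        if (std::regex_search(output, m, think_regex)) {
            return m[1].str();
        }
        return "";
    }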
Olivier Chafik
be421fc429
tool-call : ensure there's always a non-empty tool call id ( #12292 )
2025-03-10 09:45:29 +00:00
Olivier Chafik
87c2630546
allow missing content in message if tool_calls provided ( #12293 )
b4861
2025-03-10 09:45:07 +00:00
Olivier Chafik
2b3a25c212
sampler : fixes trigger tokens + lazy grammars (fix typo cast from token to string) ( #12291 )
...
* Fix typo in lazy grammar handling (fixes trigger tokens)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4860
2025-03-10 09:44:42 +00:00
tc-mb
8352cdc87b
llava : fix bug in minicpm-v code ( #11513 )
...
* fix bug in minicpm-v code
* update readme of minicpm-v
b4859
2025-03-10 10:33:24 +02:00
Georgi Gerganov
1e2f78a004
server : add speculative decoding presets for FIM ( #12287 )
2025-03-09 19:08:20 +02:00
Georgi Gerganov
0fd7ca7a21
authors : update ( #12271 )
2025-03-08 18:26:00 +02:00
Jason C.H
6fefc05a7a
ggml-backend : make path_str compatible with C++20 ( #12269 )
b4856
2025-03-08 17:02:39 +01:00
Georgi Gerganov
7ab364390f
server : infill gen ends on new line ( #12254 )
b4855
2025-03-07 20:54:30 +02:00
Daniel Bevenius
7c7f3b7f43
ggml : skip intermediate .air file when compiling .metallib ( #12247 )
...
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.
The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
b4854
2025-03-07 14:15:27 +01:00
Georgi Gerganov
102ac1891d
sync : ggml
...
ggml-ci
b4853
2025-03-07 14:49:44 +02:00
vmobilis
d6ae2fa061
ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118)
...
* ggml_compute_forward_concat() for arbitrary tensor type
* Check that tensors' type match
* ggml-cpu.c: check type of source tensors
* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()
* ggml.c: check concatenated tensor type
* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c, as it was moved to ggml.c.
2025-03-07 14:49:44 +02:00
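A simplified sketch of the check the last two bullets describe, assuming the public ggml_concat API; the real validation in ggml.c also covers shapes:

    #include "ggml.h"

    // ggml.c now asserts the sources match in type, so ggml-cpu.c no longer
    // needs its own check; a caller-side version of the same invariant:
    struct ggml_tensor * concat_checked(struct ggml_context * ctx,
                                        struct ggml_tensor  * a,
                                        struct ggml_tensor  * b, int dim) {
        GGML_ASSERT(a->type == b->type); // concatenated tensors must match in type
        return ggml_concat(ctx, a, b, dim);
    }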