Diego Devesa
15e03282bb
ci : limit write permission to only the release step + fixes ( #13392 )
...
* ci : limit write permission to only the release step
* fix win cuda file name
* fix license file copy on multi-config generators
2025-05-08 23:45:22 +02:00
Xuan-Son Nguyen
8c83449cb7
server : (webui) revamp the input area, plus many small UI improvements ( #13365 )
...
* rework the input area
* process selected file
* change all icons to heroicons
* fix thought process collapse
* move conversation more menu to sidebar
* sun icon --> moon icon
* rm default system message
* stricter upload file check, only allow image if server has mtmd
* build it
* add renaming
* better autoscroll
* build
* add conversation group
* fix scroll
* extra context first, then user input in the end
* fix <hr> tag
* clean up a bit
* build
* add mb-3 for <pre>
* throttle adjustTextareaHeight to make it less laggy
* (nits) missing padding in sidebar
* rm stray console log
2025-05-08 15:37:29 +02:00
Georgi Gerganov
51fb96b1ff
context : remove logits_all flag ( #13284 )
...
* context : remove logits_all flag
ggml-ci
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci
2025-05-08 14:26:50 +03:00
Ycros
39e73ae0d6
common : Add a warning when we can't match samplers from a string or char. ( #13330 )
2025-05-07 11:23:28 +03:00
Georgi Gerganov
4773d7a02f
examples : remove infill ( #13283 )
...
ggml-ci
2025-05-07 10:28:02 +03:00
oobabooga
233461f812
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) ( #13264 )
...
* sampling: add Top-nσ sampler to `llama-server` and sampler ordering
* revert: sampler ordering
* revert: VS' crappy auto-formatting
* revert: VS' crappy auto-formatting pt.2
* revert: my crappy eye sight...
* sampling: add XTC to Top-nσ sampler chain
* sampling: add Dyna. Temp. to Top-nσ sampler chain
* sampling: actually remove Top-nσ from sampler(oops)
* Integrate top_n_sigma into main sampler chain
* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA
* Formatting
* Lint
* Exit early in the sampler if nsigma < 0
---------
Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com >
2025-05-05 22:12:19 +02:00
Xuan-Son Nguyen
9b61acf060
mtmd : rename llava directory to mtmd ( #13311 )
...
* mv llava to mtmd
* change ref everywhere
2025-05-05 16:02:55 +02:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory ( #13249 )
...
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
2025-05-02 20:27:13 +02:00
Georgi Gerganov
fab647e884
server : add cache reuse card link to help ( #13230 )
...
* server : add cache reuse card link to help
* args : use short url
2025-05-02 09:48:31 +03:00
Diego Devesa
d7a14c42a1
build : fix build info on windows ( #13239 )
...
* build : fix build info on windows
* fix cuda host compiler msg
2025-05-01 21:48:08 +02:00
Xuan-Son Nguyen
13c9a3319b
arg : remove CURLINFO_EFFECTIVE_METHOD ( #13228 )
2025-05-01 10:23:25 +02:00
Xuan-Son Nguyen
6f67cf1f48
arg : -hf do not fail if url mismatch ( #13219 )
...
* arg : -hf do not fail if url mismatch
* do not return if cannot parse metadata json
2025-04-30 21:29:15 +01:00
Olivier Chafik
3b127c7385
common : add -jf / --json-schema-file flag ( #12011 )
2025-04-30 14:52:35 +02:00
Xuan-Son Nguyen
5933e6fdc9
arg : allow using -hf offline ( #13202 )
...
* arg : allow using -hf offline
* add more comments in code [no ci]
2025-04-30 10:46:32 +02:00
Georgi Gerganov
43f2b07193
common : fix noreturn compile warning ( #13151 )
...
ggml-ci
2025-04-28 11:57:19 +03:00
Xuan-Son Nguyen
85f36e5e71
arg : fix unused variable ( #13142 )
2025-04-28 08:16:59 +03:00
Xuan-Son Nguyen
2d451c8059
common : add common_remote_get_content ( #13123 )
...
* common : add common_remote_get_content
* support max size and timeout
* add tests
2025-04-26 22:58:12 +02:00
frob
d5fe4e81bd
grammar : handle maxItems == 0 in JSON schema ( #13117 )
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com >
2025-04-26 10:10:20 +02:00
Georgi Gerganov
13b4548877
cmake : do not include ./src as public for libllama ( #13062 )
...
* cmake : do not include ./src as public for libllama
ggml-ci
* cmake : rework tests
ggml-ci
* llguidance : remove unicode include
ggml-ci
* cmake : make c++17 private
ggml-ci
2025-04-24 16:00:10 +03:00
Xuan-Son Nguyen
7c727fbe39
arg : add --no-mmproj-offload ( #13093 )
...
* arg : add --no-mmproj-offload
* Update common/arg.cpp
2025-04-24 14:04:14 +02:00
Xuan-Son Nguyen
80982e815e
arg : clean up handling --mmproj with -hf ( #13082 )
...
* arg : clean up handling --mmproj with -hf
* rm change about no_mmproj
* Revert "rm change about no_mmproj"
This reverts commit 2cac8e0efb
.
* handle no_mmproj explicitly
* skip download mmproj on examples not using it
2025-04-24 12:14:13 +02:00
Xuan-Son Nguyen
243453533e
llava : update documentations ( #13055 )
...
* llava : update documentations
* fix typo
2025-04-22 10:37:00 +02:00
Xuan-Son Nguyen
84a9bf2fc2
mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli
( #13012 )
...
* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`
* support for minicpmv
* remove cpp files of llava and minicpmv
* update hot topics
* mtmd : add not supported msg for qwen2vl
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-04-21 15:32:58 +02:00
Prajwal B Mehendarkar
bc091a4dc5
common : Define cache directory on AIX ( #12915 )
2025-04-12 17:33:39 +02:00
Olivier Chafik
b6930ebc42
tool-call
: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900 )
...
* `tool-call`: don't call common_chat_params_init_hermes_2_pro when there aren't tools (or when there's a schema)
* test all chat formats w/o tools
2025-04-11 21:47:52 +02:00
yuri@FreeBSD
68b08f36d0
common : Define cache directory on FreeBSD ( #12892 )
2025-04-11 21:45:44 +02:00
tastelikefeet
b2034c2b55
contrib: support modelscope community ( #12664 )
...
* support download from modelscope
* support login
* remove comments
* add arguments
* fix code
* fix win32
* test passed
* fix readme
* revert readme
* change to MODEL_ENDPOINT
* revert tail line
* fix readme
* refactor model endpoint
* remove blank line
* fix header
* fix as comments
* update comment
* update readme
---------
Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com>
2025-04-11 14:01:56 +02:00
Prajwal B Mehendarkar
1d343b4069
arg : Including limits file on AIX ( #12822 )
2025-04-08 14:30:59 +02:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default ( #12761 )
...
* cmake : enable curl by default
* no curl if no examples
* fix build
* fix build-linux-cross
* add windows-setup-curl
* fix
* shell
* fix path
* fix windows-latest-cmake*
* run: include_directories
* LLAMA_RUN_EXTRA_LIBS
* sycl: no llama_curl
* no test-arg-parser on windows
* clarification
* try riscv64 / arm64
* windows: include libcurl inside release binary
* add msg
* fix mac / ios / android build
* will this fix xcode?
* try clearing the cache
* add bunch of licenses
* revert clear cache
* fix xcode
* fix xcode (2)
* fix typo
2025-04-07 13:35:19 +02:00
Sergey Fedorov
f1e3eb4249
common : fix includes in arg.cpp and gemma3-cli.cpp ( #12766 )
...
* arg.cpp: add a missing include
* gemma3-cli.cpp: fix cinttypes include
2025-04-05 17:46:00 +02:00
エシュナヴァリシア
c6ff5d2a8d
common: custom hf endpoint support ( #12769 )
...
* common: custom hf endpoint support
Add support for custom huggingface endpoints via HF_ENDPOINT environment variable
You can now specify a custom huggingface endpoint using the HF_ENDPOINT environment variable when using the --hf-repo flag, which works similarly to huggingface-cli's endpoint configuration.
Example usage:
HF_ENDPOINT=https://hf-mirror.com/ ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is"
The trailing slash in the URL is optional:
HF_ENDPOINT=https://hf-mirror.com ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is"
* Update common/arg.cpp
readability Improvement
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
* Apply suggestions from code review
---------
Co-authored-by: ベアトリーチェ <148695646+MakiSonomura@users.noreply.github.com >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-04-05 15:31:42 +02:00
Olivier Chafik
7a84777f42
sync: minja ( #12739 )
...
* sync: minja
https://github.com/google/minja/pull/57
* fix json include
2025-04-04 21:16:39 +01:00
R0CKSTAR
5f696e88e0
sync : minja (inclusionAI/Ling) and update tests ( #12699 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
2025-04-03 13:51:35 +02:00
Diego Devesa
e0e912f49b
llama : add option to override model tensor buffers ( #11397 )
...
* llama : add option to override tensor buffers
* ggml : fix possible underflow in ggml_nbytes
2025-04-02 14:52:01 +02:00
Xuan-Son Nguyen
42eb248f46
common : remove json.hpp from common.cpp ( #12697 )
...
* common : remove json.hpp from common.cpp
* fix comment
2025-04-02 09:58:34 +02:00
Xuan-Son Nguyen
267c1399f1
common : refactor downloading system, handle mmproj with -hf option ( #12694 )
...
* (wip) refactor downloading system [no ci]
* fix all examples
* fix mmproj with -hf
* gemma3: update readme
* only handle mmproj in llava example
* fix multi-shard download
* windows: fix problem with std::min and std::max
* fix 2
2025-04-01 23:44:05 +02:00
R0CKSTAR
a6f32f0b34
Fix clang warning in gguf_check_reserved_keys ( #12686 )
...
* Fix clang warning in gguf_check_reserved_keys
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* Fix typo
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
2025-04-01 13:12:53 +02:00
Johannes Gäßler
dd373dd3bf
llama: fix error on bad grammar ( #12628 )
2025-03-28 18:08:52 +01:00
Piotr
2099a9d5db
server : Support listening on a unix socket ( #12613 )
...
* server : Bump cpp-httplib to include AF_UNIX windows support
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com >
* server : Allow running the server example on a unix socket
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com >
---------
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com >
2025-03-27 23:41:04 +01:00
Michał Moskal
2447ad8a98
upgrade to llguidance 0.7.10 ( #12576 )
2025-03-26 11:06:09 -07:00
marcoStocchi
f4c3dd5daa
llama-tts : add '-o' option ( #12398 )
...
* added -o option to specify an output file name
* llama-tts returns ENOENT in case of file write error
note : PR #12042 is closed as superseded with this one.
2025-03-15 17:23:11 +01:00
Sigbjørn Skjæret
774973b8f3
main : add -sysf / --system-prompt-file ( #12249 ) ( #12250 )
...
* add system_prompt_file
* add -sysf / --system-prompt-file
* remove system_prompt_file
2025-03-14 16:57:05 +01:00
fairydreaming
8fcb563613
Load all MoE experts during warmup ( #11571 )
...
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup
* common : use new API to enable warmup mode during model warmup
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com >
2025-03-14 13:47:05 +01:00
Xuan-Son Nguyen
be7c303410
arg : no n_predict = -2 for examples except for main and infill ( #12364 )
2025-03-13 12:34:54 +01:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context ( #12181 )
...
* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci
2025-03-13 12:35:44 +02:00
marcoStocchi
6ef79a67ca
common : refactor '-o' option ( #12278 )
...
As discussed in PR 'llama-tts : add -o option' (#12042 ):
* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.
* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
2025-03-10 13:34:13 +02:00
Olivier Chafik
4e39a3c332
server
: extract <think> tags from qwq outputs (#12297 )
...
* extract <think> tags from qwq outputs
* const for all static regexes in chat.cpp
2025-03-10 10:59:03 +00:00
Olivier Chafik
87c2630546
allow missing content in message if tool_calls provided ( #12293 )
2025-03-10 09:45:07 +00:00
Georgi Gerganov
1e2f78a004
server : add speculative decoding presets for FIM ( #12287 )
2025-03-09 19:08:20 +02:00
Olivier Chafik
7cf64f6bee
sync: minja - support QwQ-32B ( #12235 )
...
8a76f7815e
2025-03-07 09:33:37 +00:00