llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-28 13:20:27 -04:00

Author	SHA1	Message	Date
Diego Devesa	15e03282bb	ci : limit write permission to only the release step + fixes (#13392 ) * ci : limit write permission to only the release step * fix win cuda file name * fix license file copy on multi-config generators	2025-05-08 23:45:22 +02:00
Xuan-Son Nguyen	8c83449cb7	server : (webui) revamp the input area, plus many small UI improvements (#13365 ) * rework the input area * process selected file * change all icons to heroicons * fix thought process collapse * move conversation more menu to sidebar * sun icon --> moon icon * rm default system message * stricter upload file check, only allow image if server has mtmd * build it * add renaming * better autoscroll * build * add conversation group * fix scroll * extra context first, then user input in the end * fix <hr> tag * clean up a bit * build * add mb-3 for <pre> * throttle adjustTextareaHeight to make it less laggy * (nits) missing padding in sidebar * rm stray console log	2025-05-08 15:37:29 +02:00
Georgi Gerganov	51fb96b1ff	context : remove logits_all flag (#13284 ) * context : remove logits_all flag ggml-ci * llama : remove logits_all flag + reorder llama_context_params ggml-ci	2025-05-08 14:26:50 +03:00
Ycros	39e73ae0d6	common : Add a warning when we can't match samplers from a string or char. (#13330 )	2025-05-07 11:23:28 +03:00
Georgi Gerganov	4773d7a02f	examples : remove infill (#13283 ) ggml-ci	2025-05-07 10:28:02 +03:00
oobabooga	233461f812	sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264 ) * sampling: add Top-nσ sampler to `llama-server` and sampler ordering * revert: sampler ordering * revert: VS' crappy auto-formatting * revert: VS' crappy auto-formatting pt.2 * revert: my crappy eye sight... * sampling: add XTC to Top-nσ sampler chain * sampling: add Dyna. Temp. to Top-nσ sampler chain * sampling: actually remove Top-nσ from sampler(oops) * Integrate top_n_sigma into main sampler chain * Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA * Formatting * Lint * Exit early in the sampler if nsigma < 0 --------- Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>	2025-05-05 22:12:19 +02:00
Xuan-Son Nguyen	9b61acf060	mtmd : rename llava directory to mtmd (#13311 ) * mv llava to mtmd * change ref everywhere	2025-05-05 16:02:55 +02:00
Diego Devesa	1d36b3670b	llama : move end-user examples to tools directory (#13249 ) * llama : move end-user examples to tools directory --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-02 20:27:13 +02:00
Georgi Gerganov	fab647e884	server : add cache reuse card link to help (#13230 ) * server : add cache reuse card link to help * args : use short url	2025-05-02 09:48:31 +03:00
Diego Devesa	d7a14c42a1	build : fix build info on windows (#13239 ) * build : fix build info on windows * fix cuda host compiler msg	2025-05-01 21:48:08 +02:00
Xuan-Son Nguyen	13c9a3319b	arg : remove CURLINFO_EFFECTIVE_METHOD (#13228 )	2025-05-01 10:23:25 +02:00
Xuan-Son Nguyen	6f67cf1f48	arg : -hf do not fail if url mismatch (#13219 ) * arg : -hf do not fail if url mismatch * do not return if cannot parse metadata json	2025-04-30 21:29:15 +01:00
Olivier Chafik	3b127c7385	common : add -jf / --json-schema-file flag (#12011 )	2025-04-30 14:52:35 +02:00
Xuan-Son Nguyen	5933e6fdc9	arg : allow using -hf offline (#13202 ) * arg : allow using -hf offline * add more comments in code [no ci]	2025-04-30 10:46:32 +02:00
Georgi Gerganov	43f2b07193	common : fix noreturn compile warning (#13151 ) ggml-ci	2025-04-28 11:57:19 +03:00
Xuan-Son Nguyen	85f36e5e71	arg : fix unused variable (#13142 )	2025-04-28 08:16:59 +03:00
Xuan-Son Nguyen	2d451c8059	common : add common_remote_get_content (#13123 ) * common : add common_remote_get_content * support max size and timeout * add tests	2025-04-26 22:58:12 +02:00
frob	d5fe4e81bd	grammar : handle maxItems == 0 in JSON schema (#13117 ) Co-authored-by: Richard Lyons <frob@cloudstaff.com>	2025-04-26 10:10:20 +02:00
Georgi Gerganov	13b4548877	cmake : do not include ./src as public for libllama (#13062 ) * cmake : do not include ./src as public for libllama ggml-ci * cmake : rework tests ggml-ci * llguidance : remove unicode include ggml-ci * cmake : make c++17 private ggml-ci	2025-04-24 16:00:10 +03:00
Xuan-Son Nguyen	7c727fbe39	arg : add --no-mmproj-offload (#13093 ) * arg : add --no-mmproj-offload * Update common/arg.cpp	2025-04-24 14:04:14 +02:00
Xuan-Son Nguyen	80982e815e	arg : clean up handling --mmproj with -hf (#13082 ) * arg : clean up handling --mmproj with -hf * rm change about no_mmproj * Revert "rm change about no_mmproj" This reverts commit `2cac8e0efb`. * handle no_mmproj explicitly * skip download mmproj on examples not using it	2025-04-24 12:14:13 +02:00
Xuan-Son Nguyen	243453533e	llava : update documentations (#13055 ) * llava : update documentations * fix typo	2025-04-22 10:37:00 +02:00
Xuan-Son Nguyen	84a9bf2fc2	mtmd : merge llava, gemma3 and minicpmv CLI into single `llama-mtmd-cli` (#13012 ) * mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli` * support for minicpmv * remove cpp files of llava and minicpmv * update hot topics * mtmd : add not supported msg for qwen2vl * Update examples/llava/mtmd.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-04-21 15:32:58 +02:00
Prajwal B Mehendarkar	bc091a4dc5	common : Define cache directory on AIX (#12915 )	2025-04-12 17:33:39 +02:00
Olivier Chafik	b6930ebc42	`tool-call`: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900 ) * `tool-call`: don't call common_chat_params_init_hermes_2_pro when there aren't tools (or when there's a schema) * test all chat formats w/o tools	2025-04-11 21:47:52 +02:00
yuri@FreeBSD	68b08f36d0	common : Define cache directory on FreeBSD (#12892 )	2025-04-11 21:45:44 +02:00
tastelikefeet	b2034c2b55	contrib: support modelscope community (#12664 ) * support download from modelscope * support login * remove comments * add arguments * fix code * fix win32 * test passed * fix readme * revert readme * change to MODEL_ENDPOINT * revert tail line * fix readme * refactor model endpoint * remove blank line * fix header * fix as comments * update comment * update readme --------- Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com>	2025-04-11 14:01:56 +02:00
Prajwal B Mehendarkar	1d343b4069	arg : Including limits file on AIX (#12822 )	2025-04-08 14:30:59 +02:00
Xuan-Son Nguyen	bd3f59f812	cmake : enable curl by default (#12761 ) * cmake : enable curl by default * no curl if no examples * fix build * fix build-linux-cross * add windows-setup-curl * fix * shell * fix path * fix windows-latest-cmake* * run: include_directories * LLAMA_RUN_EXTRA_LIBS * sycl: no llama_curl * no test-arg-parser on windows * clarification * try riscv64 / arm64 * windows: include libcurl inside release binary * add msg * fix mac / ios / android build * will this fix xcode? * try clearing the cache * add bunch of licenses * revert clear cache * fix xcode * fix xcode (2) * fix typo	2025-04-07 13:35:19 +02:00
Sergey Fedorov	f1e3eb4249	common : fix includes in arg.cpp and gemma3-cli.cpp (#12766 ) * arg.cpp: add a missing include * gemma3-cli.cpp: fix cinttypes include	2025-04-05 17:46:00 +02:00
エシュナヴァリシア	c6ff5d2a8d	common: custom hf endpoint support (#12769 ) * common: custom hf endpoint support Add support for custom huggingface endpoints via HF_ENDPOINT environment variable You can now specify a custom huggingface endpoint using the HF_ENDPOINT environment variable when using the --hf-repo flag, which works similarly to huggingface-cli's endpoint configuration. Example usage: HF_ENDPOINT=https://hf-mirror.com/ ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is" The trailing slash in the URL is optional: HF_ENDPOINT=https://hf-mirror.com ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is" * Update common/arg.cpp readability Improvement Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Apply suggestions from code review --------- Co-authored-by: ベアトリーチェ <148695646+MakiSonomura@users.noreply.github.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-04-05 15:31:42 +02:00
Olivier Chafik	7a84777f42	sync: minja (#12739 ) * sync: minja https://github.com/google/minja/pull/57 * fix json include	2025-04-04 21:16:39 +01:00
R0CKSTAR	5f696e88e0	sync : minja (inclusionAI/Ling) and update tests (#12699 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-03 13:51:35 +02:00
Diego Devesa	e0e912f49b	llama : add option to override model tensor buffers (#11397 ) * llama : add option to override tensor buffers * ggml : fix possible underflow in ggml_nbytes	2025-04-02 14:52:01 +02:00
Xuan-Son Nguyen	42eb248f46	common : remove json.hpp from common.cpp (#12697 ) * common : remove json.hpp from common.cpp * fix comment	2025-04-02 09:58:34 +02:00
Xuan-Son Nguyen	267c1399f1	common : refactor downloading system, handle mmproj with -hf option (#12694 ) * (wip) refactor downloading system [no ci] * fix all examples * fix mmproj with -hf * gemma3: update readme * only handle mmproj in llava example * fix multi-shard download * windows: fix problem with std::min and std::max * fix 2	2025-04-01 23:44:05 +02:00
R0CKSTAR	a6f32f0b34	Fix clang warning in gguf_check_reserved_keys (#12686 ) * Fix clang warning in gguf_check_reserved_keys Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Fix typo Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-01 13:12:53 +02:00
Johannes Gäßler	dd373dd3bf	llama: fix error on bad grammar (#12628 )	2025-03-28 18:08:52 +01:00
Piotr	2099a9d5db	server : Support listening on a unix socket (#12613 ) * server : Bump cpp-httplib to include AF_UNIX windows support Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> * server : Allow running the server example on a unix socket Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> --------- Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-03-27 23:41:04 +01:00
Michał Moskal	2447ad8a98	upgrade to llguidance 0.7.10 (#12576 )	2025-03-26 11:06:09 -07:00
marcoStocchi	f4c3dd5daa	llama-tts : add '-o' option (#12398 ) * added -o option to specify an output file name * llama-tts returns ENOENT in case of file write error note : PR #12042 is closed as superseded with this one.	2025-03-15 17:23:11 +01:00
Sigbjørn Skjæret	774973b8f3	main : add -sysf / --system-prompt-file (#12249 ) (#12250 ) * add system_prompt_file * add -sysf / --system-prompt-file * remove system_prompt_file	2025-03-14 16:57:05 +01:00
fairydreaming	8fcb563613	Load all MoE experts during warmup (#11571 ) * llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup * common : use new API to enable warmup mode during model warmup --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2025-03-14 13:47:05 +01:00
Xuan-Son Nguyen	be7c303410	arg : no n_predict = -2 for examples except for main and infill (#12364 )	2025-03-13 12:34:54 +01:00
Georgi Gerganov	e0dbec0bc6	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 ) * llama : refactor llama_context, llama_kv_cache, llm_build_context ggml-ci * graph : don't mutate the KV cache during defrag ggml-ci * context : reduce virtuals + remove test function ggml-ci * context : move interface implementation to source file + factory ggml-ci * graph : move KV cache build functions to llama_context impl ggml-ci * graph : remove model reference from build_pooling ggml-ci * graph : remove llama_model reference ggml-ci * kv_cache : provide rope factors ggml-ci * graph : rework inputs to use only unique_ptr, remove attn input abstraction ggml-ci * context : remove llama_context_i abstraction ggml-ci * context : clean-up ggml-ci * graph : clean-up ggml-ci * llama : remove redundant keywords (struct, enum) ggml-ci * model : adapt gemma3 ggml-ci * graph : restore same attention ops as on master ggml-ci * llama : remove TODO + fix indent ggml-ci	2025-03-13 12:35:44 +02:00
marcoStocchi	6ef79a67ca	common : refactor '-o' option (#12278 ) As discussed in PR 'llama-tts : add -o option' (#12042): * common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option. * cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.	2025-03-10 13:34:13 +02:00
Olivier Chafik	4e39a3c332	`server`: extract <think> tags from qwq outputs (#12297 ) * extract <think> tags from qwq outputs * const for all static regexes in chat.cpp	2025-03-10 10:59:03 +00:00
Olivier Chafik	87c2630546	allow missing content in message if tool_calls provided (#12293 )	2025-03-10 09:45:07 +00:00
Georgi Gerganov	1e2f78a004	server : add speculative decoding presets for FIM (#12287 )	2025-03-09 19:08:20 +02:00
Olivier Chafik	7cf64f6bee	sync: minja - support QwQ-32B (#12235 ) `8a76f7815e`	2025-03-07 09:33:37 +00:00

504 Commits