llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-28 13:20:27 -04:00

Author	SHA1	Message	Date
Radoslav Gerganov	ae77ded2c2	docs : fix backends table in README.md (#14796 )	2025-07-25 21:24:50 +08:00
Aman Gupta	2be60cbc27	docs : fix link for tools/perplexity in README.md (#14780 )	2025-07-20 20:13:47 +02:00
Reese Levine	21c021745d	ggml: Add initial WebGPU backend (#14521 ) * Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults * Initialize webgpu device * Making progress on setting up the backend * Finish more boilerplate/utility functions * Organize file and work on alloc buffer * Add webgpu_context to prepare for actually running some shaders * Work on memset and add shader loading * Work on memset polyfill * Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it * Implement get_tensor and buffer_clear * Finish rest of setup * Start work on compute graph * Basic mat mul working * Work on emscripten build * Basic WebGPU backend instructions * Use EMSCRIPTEN flag * Work on passing ci, implement 4d tensor multiplication * Pass thread safety test * Implement permuting for mul_mat and cpy * minor cleanups * Address feedback * Remove division by type size in cpy op * Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends * Fix name * Fix macos dawn prefix path	2025-07-16 18:18:51 +03:00
Tarek Dakhran	67eade1bf9	docs : add LFM2 to models section (#14650 ) * readme : add LFM2 to models section * fix copy paste...	2025-07-12 19:07:08 +02:00
Georgi Gerganov	aaa088d87f	readme : add hot PRs (#14636 ) * readme : add hot PRs * cont * readme : update title * readme : hot PRs links * cont	2025-07-11 16:07:55 +03:00
Georgi Gerganov	b7cc7745e3	readme : remove survey link (#14168 )	2025-06-13 11:55:44 +03:00
Georgi Gerganov	a681b4ba83	readme : remove project status link (#14149 )	2025-06-12 14:43:09 +03:00
Olexandr88	d01d112abb	readme : add badge (#13938 )	2025-06-05 10:50:55 +03:00
Xuan-Son Nguyen	ea1431b0fa	docs : add "Quick start" section for new users (#13862 ) * docs : add "Quick start" section for non-technical users * rm flox * Update README.md	2025-06-03 13:09:36 +02:00
ddh0	8726392d3d	readme : update bindings (#13950 )	2025-06-01 11:44:30 +03:00
Xuan-Son Nguyen	797990c4bc	mtmd : add ultravox audio input (#13623 ) * convert ok, load ok * warmup ok * test * still does not work? * fix padding * temporary give up * fix merge conflict * build_ultravox() * rm test * fix merge conflict * add necessary mtmd APIs * first working version (only 4s of audio) * will this monster compile? * fix compile * please compile * fPIC * fix windows * various fixes * clean up audio_helpers * fix conversion * add some debug stuff * long audio input ok * adapt the api * add --audio arg * final touch UX * add miniaudio to readme * fix typo * refactor kv metadata * mtmd_default_marker()	2025-05-22 20:42:48 +02:00
R0CKSTAR	33983057d0	musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647 ) * musa: fix build warning (unused parameter) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: upgrade MUSA SDK version to rc4.0.1 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/cpy.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-05-21 09:58:49 +08:00
Xuan-Son Nguyen	06c1e4abc1	readme : add list of dependencies and their license (#13591 )	2025-05-16 20:04:18 +02:00
Xuan-Son Nguyen	33eff40240	server : vision support via libmtmd (#12898 ) * server : (experimental) vision support via libmtmd * mtmd : add more api around mtmd_image_tokens * mtmd : add more api around mtmd_image_tokens * mtmd : ability to calc image hash * shared_ptr for mtmd_image_tokens * move hash to user-define ID (fixed) * abstract out the batch management * small fix * refactor logic adding tokens to batch * implement hashing image * use FNV hash, now hash bitmap instead of file data * allow decoding image embedding to be split into batches * rm whitespace * disable some features when mtmd is on * fix --no-mmproj-offload * mtmd_context_params no timings * refactor server_inp to server_tokens * fix the failing test case * init * wip * working version * add mtmd::bitmaps * add test target * rm redundant define * test: mtmd_input_chunks_free * rm outdated comment * fix merging issue * explicitly create mtmd::input_chunks * mtmd_input_chunk_copy * add clone() * improve server_input struct * clip : fix confused naming ffn_up and ffn_down * rm ffn_i/o/g naming * rename n_embd, n_ff * small fix * no check n_ff * fix detokenize * add const to various places * add warning about breaking changes * add c api * helper: use mtmd_image_tokens_get_n_pos * fix ctx_shift * fix name shadowing * more strict condition * support remote image_url * remote image_url log * add CI test * do not log base64 * add "has_multimodal" to /props * remove dangling image * speculative: use slot.cache_tokens.insert * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * rm can_be_detokenized * on prmpt processing done, assert cache_tokens.size * handle_completions_impl returns void * adapt the new web ui * update docs and hot topics * rm assert * small fix (2) --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-09 19:29:37 +02:00
Diego Devesa	1d36b3670b	llama : move end-user examples to tools directory (#13249 ) * llama : move end-user examples to tools directory --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-02 20:27:13 +02:00
Xuan-Son Nguyen	00e3e5a194	mtmd : add qwen2vl and qwen2.5vl (#13141 ) * llava : add clip_n_output_tokens, deprecate clip_n_patches * mtmd : add qwen2vl and qwen2.5vl * decode_embd_batch::set_position_... * working version * deprecate llama-qwen2vl-cli * correct order W, H of clip_embd_nbytes_by_img * edit existing line in hot topics	2025-04-29 11:47:04 +02:00
Georgi Gerganov	d0a417f3c7	readme : update hot topics (#13150 )	2025-04-28 12:10:18 +03:00
Xuan-Son Nguyen	84a9bf2fc2	mtmd : merge llava, gemma3 and minicpmv CLI into single `llama-mtmd-cli` (#13012 ) * mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli` * support for minicpmv * remove cpp files of llava and minicpmv * update hot topics * mtmd : add not supported msg for qwen2vl * Update examples/llava/mtmd.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-04-21 15:32:58 +02:00
tastelikefeet	b2034c2b55	contrib: support modelscope community (#12664 ) * support download from modelscope * support login * remove comments * add arguments * fix code * fix win32 * test passed * fix readme * revert readme * change to MODEL_ENDPOINT * revert tail line * fix readme * refactor model endpoint * remove blank line * fix header * fix as comments * update comment * update readme --------- Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com>	2025-04-11 14:01:56 +02:00
Yuxuan Zhang	06bb53ad9b	llama-model : add Glm4Model implementation for GLM-4-0414 (#12867 ) * GLM-4-0414 * use original one * Using with tensor map * fix bug * change order * change order * format with flask8	2025-04-11 12:10:10 +02:00
Georgi Gerganov	47277d6d1d	readme : add rpc backend (#12842 )	2025-04-09 10:54:42 +03:00
Daniel Bevenius	348888e0dc	docs : add XCFramework section to README.md [no ci] (#12746 ) This commit adds a new section to the README.md file, detailing the usage of the XCFramework. The motivation for this is that it might not be immediately clear to users how to use the XCFramework in their projects and hopefully this will help.	2025-04-04 10:24:12 +02:00
Sigbjørn Skjæret	2c3f8b850a	llama : support BailingMoE (Ling) (#12634 )	2025-03-30 22:21:03 +02:00
Juyoung Suk	b3de7cac73	llama : add Trillion 7B model support (#12556 ) * Support Trillion 7B * Update llama.h * Update llama.h * Update llama-vocab.cpp for Trillion * Update llama-vocab.cpp	2025-03-30 20:38:33 +02:00
John Bean	89b2b56e86	readme: added Sidekick to available UIs (#12311 )	2025-03-10 16:13:09 +02:00
Lucas Moura Belo	3d652bfddf	readme : update bindings (#12229 )	2025-03-06 21:15:13 +02:00
Olivier Chafik	669912d9a5	`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034 ) * sampler: turn lazy grammar trigger words to regexes * add scripts/tool_bench.sh & .py * constrain llama json output regardless of function name if matches at beginning * update relaxed newline space rule in grammar tests * support add_generation_prompt query parameter (useful for /apply_template) * Update src/llama-grammar.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-05 13:05:13 +00:00
Georgi Gerganov	20a9b8f5e1	readme : fix roadmap link (#12185 )	2025-03-04 18:42:44 +02:00
Kante Yin	53e4db1012	readme : update infra list (#9096 ) Signed-off-by: kerthcet <kerthcet@gmail.com>	2025-02-26 09:49:36 +02:00
Georgi Gerganov	c2cd24fbfd	readme : add notice about new package registry (#11890 ) * readme : add notice about new package registry * cont : fix whitespace	2025-02-15 20:29:56 +02:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886 ) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
Georgi Gerganov	04045bb842	readme : minor	2025-02-14 00:16:56 +02:00
Daniel Bevenius	c48f630d1c	llama : add --completion-bash option (#11846 ) This commit adds a new option `--completion-bash` to the llama.cpp which outputs a source-able bash completion script. The motivation for this change is to provide a more user-friendly experience for users who use the command-line interface of llama.cpp. This is currently only basic and all options are displayed for all llama executables but this can be improved in the future if needed. Example usage: ```console $ build/bin/llama-cli --completion-bash > ~/.llama-completion.bash $ source ~/.llama-completion.bash $ ./build/bin/llama-server --m<TAB> --main-gpu --mirostat --mirostat-lr --model --multiline-input --min-p --mirostat-ent --mlock --model-url ```	2025-02-13 14:46:59 +01:00
lhez	4078c77f98	docs: add OpenCL (#11697 )	2025-02-11 15:04:13 -07:00
Matvey Soloviev	c3db0480bb	readme : add link to Autopen under UIs (#11684 ) Autopen (https://github.com/blackhole89/autopen) is a graphical text editor that uses llama.cpp to tokenize the buffer on the fly, score the buffer, visualise token logits and allow you to switch back and forth between different possible completions at any point. It hopefully meets the criteria for inclusion, as the dependency on llama.cpp is stated prominently.	2025-02-06 01:55:25 +01:00
Shelby Jenkins	106045e7bb	readme : add llm_client Rust crate to readme bindings (#11628 ) [This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite awhile, so I figured now is fair to add it. It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform - with the goal being the easiest user experience possible. It also integrates model presets and choosing the largest quant given the target's available VRAM. So a user just has to specify one of the presets (I manually add the most popular models), and it will download from hugging face. So, it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.	2025-02-04 13:20:55 +02:00
piDack	0cec062a63	llama : add support for GLM-Edge and GLM-Edge-V series models (#10573 ) * add glm edge chat model * use config partial_rotary_factor as rope ratio * support for glm edge model * vision model support * remove debug info * fix format * llava.cpp trailing whitespace * remove unused AutoTokenizer * Update src/llama.cpp for not contain <\|end\|> or </s> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * add edge template * fix chat template * fix confict * fix confict * fix ci err * fix format err * fix template err * 9b hf chat support * format * format clip.cpp * fix format * Apply suggestions from code review * Apply suggestions from code review * Update examples/llava/clip.cpp * fix format * minor : style --------- Co-authored-by: liyuhang <yuhang.li@zhipuai.cn> Co-authored-by: piDack <pcdack@hotmail.co> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: liyuhang <yuhang.li@aminer.cn> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-02 09:48:46 +02:00
Olivier Chafik	8b576b6c55	Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639 ) --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-30 19:13:58 +00:00
Guspan Tanadi	7919256c57	readme : reference examples relative links (#11505 )	2025-01-30 06:58:02 +01:00
Georgi Gerganov	2cc9b8c32c	readme : update hot topics	2025-01-26 14:30:15 +02:00
Georgi Gerganov	16d3df7ab0	readme : add plugin links (#11355 )	2025-01-22 19:44:26 +02:00
musoles	7a689c415e	README : added kalavai to infrastructure list (#11216 )	2025-01-17 01:10:49 +01:00
Xuan Son Nguyen	84a44815f7	cli : auto activate conversation mode if chat template is available (#11214 ) * cli : auto activate conversation mode if chat template is detected * add warn on bad template * update readme (writing with the help of chatgpt) * update readme (2) * do not activate -cnv for non-instruct models	2025-01-13 20:18:12 +01:00
Molly Sophia	ee7136c6d1	llama: add support for QRWKV6 model architecture (#11001 ) llama: add support for QRWKV6 model architecture (#11001) * WIP: Add support for RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV: Some graph simplification Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Add support for RWKV6Qwen2 with cpu and cuda GLA Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix some typos Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * code format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix wkv test & add gla test Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix cuda warning Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update README.md Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update ggml/src/ggml-cuda/gla.cu Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix fused lerp weights loading with RWKV6 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * better sanity check skipping for QRWKV6 in llama-quant thanks @compilade Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: compilade <git@compilade.net> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: compilade <git@compilade.net>	2025-01-10 09:58:08 +08:00
Pierrick Hymbert	f8feb4b01a	model: Add support for PhiMoE arch (#11003 ) * model: support phimoe * python linter * doc: minor Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com> * doc: minor Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com> * doc: add phimoe as supported model ggml-ci --------- Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>	2025-01-09 11:21:41 +01:00
Benson Wong	a45433ba20	readme : add llama-swap to infrastructure section (#11032 ) * list llama-swap under tools in README * readme: add llama-swap to Infrastructure	2025-01-02 09:14:54 +02:00
Eric Curtin	7909e8588d	llama-run : improve progress bar (#10821 ) Set default width to whatever the terminal is. Also fixed a small bug around default n_gpu_layers value. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2024-12-19 03:58:00 +01:00
redbeard	6b064c92b4	docs: Fix HIP (née hipBLAS) in README (#10880 ) Related to #10524 / `be0e350c` references to hipBLAS have been removed across the repository. This fixes the link from the repositories `README.md`. Signed-off-by: Brian 'redbeard' Harrington <redbeard@dead-city.org>	2024-12-18 10:35:00 +02:00
Ruan	4f51968aca	readme : update typos (#10863 )	2024-12-17 11:47:20 +02:00
Valentin Mamedov	a0974156f3	llama : add Deepseek MoE v1 & GigaChat models (#10827 ) * Add deepseek v1 arch & gigachat template * improve template code * add readme * delete comments * remove comment * fix format * lint llama.cpp * fix order of deepseek and deepseek2, move gigachat temlate to the end of func * fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need * remove comments * move deepseek above deepseek2 * change placement of gigachat chat template	2024-12-15 19:02:46 +02:00

1 2 3 4 5 ...

459 Commits