llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-06-29 12:35:16 +00:00

Author	SHA1	Message	Date
Johannes Gäßler	df984e0147	llama: refactor llama_decode_impl (#11381 ) b4565	2025-01-27 12:07:12 +01:00
Ihar Hrachyshka	acd38efee3	metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441 ) This fixes segmentation fault error when running tests when no metal devices are available (for example, when not linked with Core Graphics framework or otherwise). b4564	2025-01-27 09:41:59 +02:00
Xuan Son Nguyen	caf773f249	docker : fix ARM build and Vulkan build (#11434 ) * ci : do not fail-fast for docker * build arm64/amd64 separately * fix pip * no fast fail * vulkan: try jammy	2025-01-26 22:45:32 +01:00
Georgi Gerganov	178a7eb952	metal : use residency sets (#11427 ) * metal : use residency sets ggml-ci * metal : restore commandBufferWithUnretainedReferences calls [no ci] * metal : release descriptors ggml-ci * metal : check env GGML_METAL_NO_RESIDENCY ggml-ci * metal : fix build + clean-up ggml-ci b4562	2025-01-26 20:06:16 +02:00
Nuno	6f53d8a6b4	docker: add missing vulkan library to base layer and update to 24.04 (#11422 ) Signed-off-by: rare-magma <rare-magma@posteo.eu>	2025-01-26 18:22:43 +01:00
bandoti	19f65187cb	cmake: add ggml find package (#11369 ) * Add initial ggml cmake package * Add build numbers to ggml find-package * Expand variables with GGML_ prefix * Guard against adding to cache variable twice * Add git to msys2 workflow * Handle ggml-cpu-* variants * Link ggml/ggml-base libraries to their targets * Replace main-cmake-pkg with simple-cmake-pkg * Interface features require c_std_90 * Fix typo * Removed unnecessary bracket from status message * Update examples/simple-cmake-pkg/README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/simple-cmake-pkg/README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b4560	2025-01-26 12:07:48 -04:00
Frank Mai	1d8ee06000	rpc: fix register position (#11424 ) Signed-off-by: thxCode <thxcode0824@gmail.com> b4559	2025-01-26 16:20:34 +01:00
Georgi Gerganov	2cc9b8c32c	readme : update hot topics	2025-01-26 14:30:15 +02:00
Jeff Bolz	f35726c2fb	build: apply MSVC /bigobj option to c/cpp files only (#11423 ) b4557	2025-01-26 03:10:03 +01:00
Jeff Bolz	4a75d19376	vulkan: compile shaders on-demand (#11406 ) Reduce first-run startup time and memory consumption. Should fix #11339.	2025-01-25 22:29:57 +01:00
uvos	26771a1491	Hip: disable VMM on hip as it seams that it dosent work in some configurations (#11420 )	2025-01-25 21:01:12 +01:00
Jeff Bolz	ca6baf76c1	build: add /bigobj to MSVC build (#11407 )	2025-01-25 11:26:37 -06:00
Diego Devesa	6e264a905b	docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419 )	2025-01-25 17:22:41 +01:00
Xuan Son Nguyen	49b0e3cec4	server : fix cleaning up stream task (#11418 ) * server : fix cleaning up stream task * one more spot b4552	2025-01-25 16:36:44 +01:00
Diego Devesa	20a758155b	docker : fix CPU ARM build (#11403 ) * docker : fix CPU ARM build * add CURL to other builds	2025-01-25 15:22:29 +01:00
Georgi Gerganov	00c24acb2a	ci : fix line breaks on windows builds (#11409 ) * ci : fix line breaks on windows builds * cont : another try * ci : fix powershell line breaks b4550	2025-01-25 13:36:48 +02:00
jiahao su	466ea66f33	CANN: Add Ascend CANN build ci (#10217 ) * CANN: Add Ascend CANN build ci * Update build.yml * Modify cann image version * Update build.yml * Change to run on x86 system * Update build.yml * Update build.yml * Modify format error * Update build.yml * Add 'Ascend NPU' label restrictions * Exclude non PR event Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org> * Update build.yml --------- Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org> b4549	2025-01-25 00:26:01 +01:00
uvos	5f0db9522f	hip : Add hipGraph and VMM support to ROCM (#11362 ) * Add hipGraph support * Enable VMM on rocm b4548	2025-01-25 00:02:23 +01:00
Johannes Gäßler	c5d9effb49	CUDA: fix FP16 cuBLAS GEMM (#11396 ) b4547	2025-01-24 21:02:43 +01:00
uvos	9fbadaef4f	rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356 ) b4546	2025-01-24 17:50:49 +01:00
Georgi Gerganov	9755129c27	release : pack /lib in the packages (#11392 ) * release : pack /lib and /include in the packages * cmake : put libs in /bin * TMP : push artifacts * Revert "TMP : push artifacts" This reverts commit `4decf2c4df`. * ci : fix HIP cmake compiler options to be on first line * ci : restore the original HIP commands * ci : change ubuntu build from latest to 20.04 * ci : try to fix macos build rpaths * ci : remove obsolete MacOS build * TMP : push artifacts * ci : change back to ubuntu latest * ci : macos set build rpath to "@loader_path" * ci : fix typo * ci : change ubuntu package to 22.04 * Revert "TMP : push artifacts" This reverts commit `537b09e70f`. b4545	2025-01-24 18:41:30 +02:00
Jafar Uruç	a07c2c8a52	docs : Update readme to build targets for local docker build (#11368 )	2025-01-24 14:30:13 +01:00
Johannes Gäßler	8137b4bb2b	CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380 ) b4543	2025-01-24 12:38:31 +01:00
Bernhard M. Wiedemann	1af6945eb0	cmake : avoid -march=native when reproducible build is wanted (#11366 ) See https://reproducible-builds.org/ for why this is good and https://reproducible-builds.org/specs/source-date-epoch/ for the definition of this variable. Without this patch, compiling on different machines produced different binaries, which made verification of results difficult. Fixes: #11317 This patch was done while working on reproducible builds for openSUSE. b4542	2025-01-24 13:21:35 +02:00
Eric Curtin	01f37edf1a	Update llama-run README.md (#11386 ) For consistency Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-24 09:39:24 +00:00
stduhpf	c07e87f38b	server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364 ) * webui : put DeepSeek R1 CoT in a collapsible <details> element * webui: refactor split * webui: don't use regex to split cot and response * webui: format+qol * webui: no loading icon if the model isn't generating * ui fix, add configs * add jsdoc types * only filter </think> for assistant msg * build * update build --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-24 09:02:38 +01:00
Jeff Bolz	564804b79b	tests: fix some mul_mat test gaps (#11375 ) Now that we have batched mat-vec mul Vulkan shaders for up to n==8, these tests weren't actually exercising the mat-mat mul path. Test n==9 as well. Also, change to use all_types. b4539	2025-01-23 14:51:24 -06:00
Eric Curtin	05f63cc9ee	Update documentation (#11373 ) To show -n, -ngl, --ngl is acceptable. Signed-off-by: Eric Curtin <ecurtin@redhat.com> b4538	2025-01-23 20:04:31 +00:00
Eric Curtin	f7fb43cd0b	Add -ngl (#11372 ) Most other llama.cpp cli tools accept -ngl with a single dash. Signed-off-by: Eric Curtin <ecurtin@redhat.com> b4537	2025-01-23 16:16:18 +00:00
Xuan Son Nguyen	5845661640	server : add more clean up when cancel_tasks is called (#11340 ) * server : add more clean up when cancel_tasks is called * fix recv_with_timeout * std::remove_if * fix std::remove_if b4536	2025-01-23 13:56:05 +01:00
Eric Curtin	f211d1dc10	Treat hf.co/ prefix the same as hf:// (#11350 ) ollama uses hf.co/ to specify huggingface prefix, like RamaLama uses hf:// Treat them similarly. Signed-off-by: Eric Curtin <ecurtin@redhat.com> b4535	2025-01-23 10:38:20 +00:00
amd-dwang	955a6c2d91	Vulkan-run-test: fix mmq_wg_denoms (#11343 ) There should be a copy-and-paste error here. mmq_wg_denoms should be used together with warptile_mmq, instead of wg_denoms. b4534	2025-01-23 08:14:28 +01:00
Jeff Bolz	1971adf55e	vulkan: sort shaders for more deterministic binary (#11315 ) Fixes #11306. b4533	2025-01-23 08:07:50 +01:00
Jeff Bolz	5245729e33	vulkan: fix diag_mask_inf (#11323 ) With robustbufferaccess disabled, this shader was showing OOB stores. There is a bounds check in the code, but the workgroup dimensions were reversed vs CUDA and it was running the wrong number of threads. So fix the workgroup dimensions and disable robustness for this pipeline. b4532	2025-01-23 08:01:17 +01:00
Diego Devesa	6152129d05	main : update README documentation for batch size (#11353 ) * main : update README documentation for batch size * fix formatting * minor	2025-01-22 19:22:20 +01:00
Georgi Gerganov	16d3df7ab0	readme : add plugin links (#11355 )	2025-01-22 19:44:26 +02:00
Diego Devesa	12c2bdf2de	server : fix draft context not being released (#11354 ) b4529	2025-01-22 17:44:40 +01:00
Olivier Chafik	c64d2becb1	`minja`: sync at `0f5f7f2b37` (#11352 ) b4528	2025-01-22 16:16:27 +00:00
Jiří Podivín	96f4053934	Adding logprobs to /v1/completions (#11344 ) Signed-off-by: Jiri Podivin <jpodivin@redhat.com> b4527	2025-01-22 12:51:32 +01:00
Olivier Chafik	a94f3b2727	`common`: utils to split / join / repeat strings (from json converter) (#11342 ) * Factor string_join, string_split, string_repeat into common * json: refactor to surface a versatile builder * Update common.cpp b4526	2025-01-22 09:51:44 +00:00
tc-mb	3e3357fd77	llava : support Minicpm-omni (#11289 ) * init * add readme * update readme * no use make * update readme * update fix code * fix editorconfig-checker * no change convert py * use clip_image_u8_free b4525	2025-01-22 09:35:48 +02:00
Olivier Chafik	6171c9d258	Add Jinja template support (#11016 ) * Copy minja from `58f0ca6dd7` * Add --jinja and --chat-template-file flags * Add missing <optional> include * Avoid print in get_hf_chat_template.py * No designated initializers yet * Try and work around msvc++ non-macro max resolution quirk * Update test_chat_completion.py * Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template * Refactor test-chat-template * Test templates w/ minja * Fix deprecation * Add --jinja to llama-run * Update common_chat_format_example to use minja template wrapper * Test chat_template in e2e test * Update utils.py * Update test_chat_completion.py * Update run.cpp * Update arg.cpp * Refactor common_chat_* functions to accept minja template + use_jinja option * Attempt to fix linkage of LLAMA_CHATML_TEMPLATE * Revert LLAMA_CHATML_TEMPLATE refactor * Normalize newlines in test-chat-templates for windows tests * Forward decl minja::chat_template to avoid eager json dep * Flush stdout in chat template before potential crash * Fix copy elision warning * Rm unused optional include * Add missing optional include to server.cpp * Disable jinja test that has a cryptic windows failure * minja: fix vigogne (https://github.com/google/minja/pull/22) * Apply suggestions from code review Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Finish suggested renamings * Move chat_templates inside server_context + remove mutex * Update --chat-template-file w/ recent change to --chat-template * Refactor chat template validation * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) * Warn against missing eos / bos tokens when jinja template references them * rename: common_chat_template[s] * reinstate assert on chat_templates.template_default * Update minja to `b8437df626` * Update minja to https://github.com/google/minja/pull/25 * Update minja from https://github.com/google/minja/pull/27 * rm unused optional header --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b4524	2025-01-21 13:18:51 +00:00
Xuan Son Nguyen	e28245f35f	export-lora : fix tok_embd tensor (#11330 ) b4523	2025-01-21 14:07:12 +01:00
Radoslav Gerganov	6da5bec81c	rpc : better caching of the base buffer pointer (#11331 ) There is no need to use map, just store the base pointer in the buffer context. b4522	2025-01-21 15:06:41 +02:00
Eric Curtin	2e2f8f093c	linenoise.cpp refactoring (#11301 ) More RAII mainly Signed-off-by: Eric Curtin <ecurtin@redhat.com> b4521	2025-01-21 09:32:35 +00:00
Georgi Gerganov	2139667ec4	metal : fix out-of-bounds write (#11314 ) ggml-ci b4520	2025-01-21 08:48:13 +02:00
Georgi Gerganov	80d0d6b4b7	common : add -hfd option for the draft model (#11318 ) * common : add -hfd option for the draft model * cont : fix env var * cont : more fixes b4519	2025-01-20 22:29:43 +02:00
Jeff Bolz	aea8ddd516	vulkan: fix coopmat2 validation failures (#11284 ) mul mat and flash attention shaders were loading f32 types directly into A/B matrices, which happens to work but is technically invalid usage. For FA, we can load it as an Accumulator matrix and convert and this is not in the inner loop and is cheap enough. For mul mat, it's more efficient to do this conversion in a separate pass and have the input(s) be f16. coopmat2 requires SPIR-V 1.6 (related using to LocalSizeId). LocalSizeId requires maintenance4 be enabled, and SPIR-V 1.6 requires Vulkan 1.3. b4518	2025-01-20 10:38:32 -06:00
Georgi Gerganov	9f7add1cde	examples : fix add_special conditions (#11311 )	2025-01-20 16:36:08 +02:00
Christopher Nielsen	90d987b105	mmap: add include for cerrno (#11296 ) ggml-ci Co-authored-by: Xuan Son Nguyen <son@huggingface.co> b4516	2025-01-20 16:02:43 +02:00

4565 Commits