llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-19 00:57:41 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	b8b173274d	server : remove old commented code [no ci]	2025-03-20 18:20:54 +02:00
Georgi Gerganov	8a23b4a54a	server : avoid common_batch ggml-ci	2025-03-20 16:52:24 +02:00
Georgi Gerganov	76fd7d6f5b	perplexity : avoid common_batch ggml-ci	2025-03-20 12:28:39 +02:00
Georgi Gerganov	8b80d68338	embedding : avoid common_batch ggml-ci	2025-03-19 14:29:04 +02:00
Georgi Gerganov	6f54ee660c	retrieval : avoid common_batch ggml-ci	2025-03-19 13:50:15 +02:00
Xuan Son Nguyen	32c2c41d5e	android : fix permission	2025-03-19 10:49:30 +01:00
Georgi Gerganov	96ca6e8d23	swift : adapt to new API	2025-03-19 10:48:42 +02:00
Georgi Gerganov	b0db7fc2c6	android : adapt to new API	2025-03-19 10:16:55 +02:00
Georgi Gerganov	7a3c178d78	speculative : adapt to new llama API ggml-ci	2025-03-18 22:05:44 +02:00
Xuan Son Nguyen	dc4bb64290	Merge branch 'master' into xsn/private_batch_api	2025-03-18 15:45:22 +01:00
Georgi Gerganov	810e0af3f5	server : fix warmup draft cache type (#12446) ggml-ci	2025-03-18 12:05:42 +02:00
Sigbjørn Skjæret	60c902926c	docs : bring llama-cli conversation/template docs up-to-date (#12426)	2025-03-17 21:14:32 +01:00
Xuan-Son Nguyen	eab5606d7b	Apply suggestions from code review	2025-03-17 12:17:14 +01:00
Xuan-Son Nguyen	de788e071b	Update examples/tts/tts.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-17 12:05:23 +01:00
marcoStocchi	f4c3dd5daa	llama-tts : add '-o' option (#12398) * added -o option to specify an output file name * llama-tts returns ENOENT in case of file write error note : PR #12042 is closed as superseded with this one.	2025-03-15 17:23:11 +01:00
Xuan Son Nguyen	116b9a1662	rename to init_from_text	2025-03-14 22:17:07 +01:00
Eric Curtin	9f2250ba72	Add CLI arg to llama-run to adjust the number of threads used (#12370) We default to 4, sometimes we want to manually adjust this Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-03-14 16:41:20 +00:00
Xuan Son Nguyen	eaffba0f2e	llama_batch_ext_ptr::from_text/embd	2025-03-14 17:12:03 +01:00
Xuan Son Nguyen	a363251fac	qwen2vl: use llama_batch_ext_set_pos	2025-03-14 11:25:36 +01:00
Victor	add2a3aa5a	server: fix "--grammar-file" parameter (#12285)	2025-03-14 11:21:17 +01:00
Xuan Son Nguyen	ba79369615	fix llama_batch_ext_init_from_embd	2025-03-14 11:17:22 +01:00
Xuan Son Nguyen	07d84fa3c2	fix missing n_past in various places this is actually a revert of `cda0e4b648`	2025-03-14 10:47:08 +01:00
Xuan Son Nguyen	32940369d3	fix gemma3-cli	2025-03-14 10:33:28 +01:00
Xuan Son Nguyen	5e6a6d4e1c	fix llama-run n_past	2025-03-14 10:32:43 +01:00
Xuan Son Nguyen	04f8641815	rm redundant llama_batch_ext_set_output_last	2025-03-13 23:14:16 +01:00
Xuan Son Nguyen	c3dd79007b	fix llama_batch_ext_init_from_text	2025-03-13 23:09:27 +01:00
Xuan Son Nguyen	65f0184517	compile ok	2025-03-13 22:56:35 +01:00
Xuan Son Nguyen	47086fa82d	apply to the rest	2025-03-13 22:36:27 +01:00
Xuan Son Nguyen	4aabf4e8f4	return output ID from llama_batch_ext_add/set	2025-03-13 17:47:07 +01:00
Xuan Son Nguyen	17f954c8e2	Merge branch 'master' into xsn/private_batch_api	2025-03-13 15:55:18 +01:00
Georgi Gerganov	e0dbec0bc6	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) * llama : refactor llama_context, llama_kv_cache, llm_build_context ggml-ci * graph : don't mutate the KV cache during defrag ggml-ci * context : reduce virtuals + remove test function ggml-ci * context : move interface implementation to source file + factory ggml-ci * graph : move KV cache build functions to llama_context impl ggml-ci * graph : remove model reference from build_pooling ggml-ci * graph : remove llama_model reference ggml-ci * kv_cache : provide rope factors ggml-ci * graph : rework inputs to use only unique_ptr, remove attn input abstraction ggml-ci * context : remove llama_context_i abstraction ggml-ci * context : clean-up ggml-ci * graph : clean-up ggml-ci * llama : remove redundant keywords (struct, enum) ggml-ci * model : adapt gemma3 ggml-ci * graph : restore same attention ops as on master ggml-ci * llama : remove TODO + fix indent ggml-ci	2025-03-13 12:35:44 +02:00
Ishaan Gandhi	2048b5913d	server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338) * Fix DOS index bug * Remove new APIs * remove extra line * Remove from API * Add extra newline * Update examples/server/server.cpp --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-03-13 11:10:05 +01:00
Daniel Bevenius	80a02aa858	llama.swiftui : fix xcframework dir in README [no ci] (#12353) This commit fixes the path to the xcframework in the README file which I had forgotten to change after renaming the build directory.	2025-03-12 13:45:32 +01:00
Xuan-Son Nguyen	7841fc723e	llama : Add Gemma 3 support (+ experimental vision capability) (#12343) * llama : Add Gemma 3 text-only support * fix python coding style * fix compile on ubuntu * python: fix style * fix ubuntu compile * fix build on ubuntu (again) * fix ubuntu build, finally * clip : Experimental support for Gemma 3 vision (#12344) * clip : Experimental support for Gemma 3 vision * fix build * PRId64	2025-03-12 09:30:24 +01:00
Xuan-Son Nguyen	96e1280839	clip : bring back GPU support (#12322) * clip : bring back GPU support * use n_gpu_layers param * fix double free * ggml_backend_init_by_type * clean up	2025-03-11 09:20:16 +01:00
marcoStocchi	6ef79a67ca	common : refactor '-o' option (#12278) As discussed in PR 'llama-tts : add -o option' (#12042): * common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option. * cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.	2025-03-10 13:34:13 +02:00
Olivier Chafik	be421fc429	`tool-call`: ensure there's always a non-empty tool call id (#12292)	2025-03-10 09:45:29 +00:00
Olivier Chafik	2b3a25c212	`sampler`: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291) * Fix typo in lazy grammar handling (fixes trigger tokens) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-10 09:44:42 +00:00
tc-mb	8352cdc87b	llava : fix bug in minicpm-v code (#11513) * fix bug in minicpm-v code * update readme of minicpm-v	2025-03-10 10:33:24 +02:00
Georgi Gerganov	7ab364390f	server : infill gen ends on new line (#12254)	2025-03-07 20:54:30 +02:00
Sigbjørn Skjæret	8fad3c7a7c	server : Log original chat template parsing error (#12233)	2025-03-07 11:15:33 +01:00
Aaron Teo	e9b2f84f14	llava: add big-endian conversion for image encoder (#12218) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-03-06 09:33:21 +01:00
Han Yin	57b6abf85a	android : fix KV cache log message condition (#12212)	2025-03-06 08:22:49 +02:00
Olivier Chafik	669912d9a5	`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) * sampler: turn lazy grammar trigger words to regexes * add scripts/tool_bench.sh & .py * constrain llama json output regardless of function name if matches at beginning * update relaxed newline space rule in grammar tests * support add_generation_prompt query parameter (useful for /apply_template) * Update src/llama-grammar.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-05 13:05:13 +00:00
Clauszy	06a92a193a	server : fix cache reuse logic (#12161) The first kv shift offsets the positions of all tokens after head_c. When using llama_kv_cache_seq_rm next, using head_c will remove the valid tokens because their positions have already been offset.	2025-03-05 09:25:45 +02:00
Daniel Bevenius	a057897ad4	llama : add xcframework build script (#11996) * llama : add xcframework build script This commit adds a script to build an XCFramework for Apple ios, macos, visionos, and tvos platforms. The generated XCFramework can then be added to a project and used in the same way as a regular framework. The llama.swiftui example project has been updated to use the XCFramework and can be started using the following command: ```console $ open examples/llama.swiftui/llama.swiftui.xcodeproj/ ``` Refs: https://github.com/ggml-org/llama.cpp/issues/10747 * examples : remove llama.cpp (source dir ref) from project.pbxproj This commit removes the reference to llama.cpp from the project.pbxproj file since Package.swift has been removed. * ci : updated build.yml to use build-xcframework.sh * ci : add xcframework build to github releases This commit adds the ability to create a GitHub release with the xcframework build artifact. * scripts : add apple app validation scripts This commit adds scripts that can validate the iOS, macOS, tvOS, and VisionOS applications. The scripts create a simple test app project, copy the llama.xcframework to the test project, build and archive the app, create an IPA from the archive, and validate the IPA using altool. The motivation for this is to provide some basic validation and hopefully avoid having to manually validate apps in Xcode. * llama : remove Package.swift This commit removes the Package.swift file, as we are now building an XCFramework for the project. * llama : remove Sources and spm-headers directories * llama : use TargetConditionals.h for visionOS/tvOS	2025-03-05 06:30:31 +01:00
mgroeber9110	5bbe6a9fe9	ggml : portability fixes for VS 2017 (#12150) * Add include files for std::min/max and std::toupper/tolower * win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined * Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode * win32: only use __restrict in MSVC if C11/C17 support is not enabled --------- Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>	2025-03-04 18:53:26 +02:00
Sigbjørn Skjæret	56d7a9f812	main: allow preloading conversation with -p and add -st / --single-turn (#12145) * Add chat template formatting to -no-cnv * only enable prompt formatting if explicitly enabled * add -st / --single-turn * add --single-turn and -p in conversation mode * fix -sys + -p * reword warning * small readability change and fix (long) outdated example usage * only activate single turn in conversation mode	2025-03-04 12:19:39 -04:00
Olivier Chafik	1a24c4621f	`server`: fix deadly typo in response_format.json_schema.schema handling (#12168)	2025-03-04 08:24:07 +02:00
dm4	c43af9276b	tts: add speaker file support (#12048) * tts: add speaker file support Signed-off-by: dm4 <sunrisedm4@gmail.com> * tts: handle outetts-0.3 * tts : add new line in error message --------- Signed-off-by: dm4 <sunrisedm4@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-03 15:09:29 +02:00

1430 Commits