Xuan Son Nguyen
624a683c6f
fix compile
2025-03-14 22:30:29 +01:00
Xuan Son Nguyen
116b9a1662
rename to init_from_text
2025-03-14 22:17:07 +01:00
Xuan Son Nguyen
eaffba0f2e
llama_batch_ext_ptr::from_text/embd
2025-03-14 17:12:03 +01:00
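The llama_batch_ext_ptr::from_text/embd helpers above suggest a RAII wrapper over the C batch handle. A purely hypothetical C++ sketch of that shape follows; the real names and signatures live in the xsn/private_batch_api branch and may differ:

    #include <memory>

    struct llama_batch_ext;                        // opaque C handle (assumed)
    void llama_batch_ext_free(llama_batch_ext *);  // C-side destructor (assumed)

    // Deleter so a unique_ptr always pairs an init_from_text/embd with a free.
    struct llama_batch_ext_deleter {
        void operator()(llama_batch_ext * b) const { llama_batch_ext_free(b); }
    };

    using llama_batch_ext_uptr = std::unique_ptr<llama_batch_ext, llama_batch_ext_deleter>;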
Xuan Son Nguyen
8e7714fa77
fix compile
2025-03-14 11:28:15 +01:00
Xuan Son Nguyen
a363251fac
qwen2vl: use llama_batch_ext_set_pos
2025-03-14 11:25:36 +01:00
Xuan Son Nguyen
ba79369615
fix llama_batch_ext_init_from_embd
2025-03-14 11:17:22 +01:00
Xuan Son Nguyen
07d84fa3c2
fix missing n_past in various places
...
this is actually a revert of cda0e4b648
2025-03-14 10:47:08 +01:00
Xuan Son Nguyen
32940369d3
fix gemma3-cli
2025-03-14 10:33:28 +01:00
Xuan Son Nguyen
5e6a6d4e1c
fix llama-run n_past
2025-03-14 10:32:43 +01:00
Xuan Son Nguyen
bfdddbc150
bring back mistakenly deleted llama_batch_init/free
2025-03-14 00:22:28 +01:00
Xuan Son Nguyen
54566ad95d
correct comment
2025-03-14 00:21:06 +01:00
Xuan Son Nguyen
04f8641815
rm redundant llama_batch_ext_set_output_last
2025-03-13 23:14:16 +01:00
Xuan Son Nguyen
c3dd79007b
fix llama_batch_ext_init_from_text
2025-03-13 23:09:27 +01:00
Xuan Son Nguyen
65f0184517
compile ok
2025-03-13 22:56:35 +01:00
Xuan Son Nguyen
9fb2d81eab
fix common_batch missing seq_id
2025-03-13 22:38:04 +01:00
Xuan Son Nguyen
47086fa82d
apply to the rest
2025-03-13 22:36:27 +01:00
Xuan Son Nguyen
4aabf4e8f4
return output ID from llama_batch_ext_add/set
2025-03-13 17:47:07 +01:00
Xuan Son Nguyen
86973cb14a
fix merge errors
2025-03-13 17:32:36 +01:00
Xuan Son Nguyen
17f954c8e2
Merge branch 'master' into xsn/private_batch_api
2025-03-13 15:55:18 +01:00
Xuan-Son Nguyen
be7c303410
arg : no n_predict = -2 for examples except for main and infill ( #12364 )
b4882
2025-03-13 12:34:54 +01:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context ( #12181 )
...
* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci
2025-03-13 12:35:44 +02:00
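Two of the bullets above ("move interface implementation to source file + factory" and "rework inputs to use only unique_ptr") describe an ownership pattern that a minimal, hypothetical sketch can illustrate; none of these names are the actual llama.cpp symbols:

    #include <memory>
    #include <vector>

    // Hypothetical graph input: the graph owns each input outright.
    struct graph_input {
        virtual ~graph_input() = default;
        virtual void set_input() = 0; // copy batch data into backend tensors
    };

    struct graph {
        std::vector<std::unique_ptr<graph_input>> inputs;

        // unique_ptr ownership: no shared state, no separate attn-input abstraction
        graph_input * add_input(std::unique_ptr<graph_input> inp) {
            inputs.push_back(std::move(inp));
            return inputs.back().get();
        }
    };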
Ishaan Gandhi
2048b5913d
server : fix crash when using verbose output with input tokens that are not in printable range ( #12178 ) ( #12338 )
...
* Fix DOS index bug
* Remove new APIs
* remove extra line
* Remove from API
* Add extra newline
* Update examples/server/server.cpp
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
b4880
2025-03-13 11:10:05 +01:00
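The crash above involved verbose output handling token text containing bytes outside the printable range. An illustrative helper (not the server's actual code) that makes such text safe to log:

    #include <cstdio>
    #include <string>

    // Escape bytes outside printable ASCII so verbose logging never prints
    // or indexes raw control bytes from tokenized input.
    static std::string escape_non_printable(const std::string & s) {
        std::string out;
        for (unsigned char c : s) {
            if (c >= 0x20 && c < 0x7f) {
                out += (char) c;
            } else {
                char buf[8];
                std::snprintf(buf, sizeof(buf), "\\x%02x", c);
                out += buf;
            }
        }
        return out;
    }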
Oscar Barenys
f08f4b3187
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support ( #12301 )
b4879
2025-03-12 20:06:58 +01:00
Daniel Bevenius
80a02aa858
llama.swiftui : fix xcframework dir in README [no ci] ( #12353 )
...
This commit fixes the path to the xcframework in the README file, which I
had forgotten to change after renaming the build directory.
2025-03-12 13:45:32 +01:00
Alberto Cabrera Pérez
363f8c5d67
sycl : variable sg_size support for mmvq kernels ( #12336 )
b4877
2025-03-12 09:57:32 +00:00
uvos
34c961b181
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 ( #12315 )
...
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we
need to avoid launching them with parameters for warp64.
b4876
2025-03-12 10:14:11 +01:00
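A sketch of the host-side guard this describes, using the real CUDA runtime query but a hypothetical helper name (HIP exposes the same property, and warp size is 64 on most AMD GCN/CDNA devices):

    #include <cuda_runtime.h>

    // Hypothetical helper: the fattn-vec kernels currently assume 32-wide
    // warps, so on warp-64 devices we must not launch them with warp64
    // parameters and should pick another path instead.
    bool can_use_fattn_vec(int device) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, device);
        return prop.warpSize == 32;
    }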
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) ( #12343 )
...
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344 )
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
b4875
2025-03-12 09:30:24 +01:00
Jeff Bolz
bf69cfe62f
vulkan: fix bug in coopmat1 mul_mat_id ( #12316 )
...
* tests: run mul_mat_id with a larger N
* vulkan: fix bug in coopmat1 mul_mat_id
b4874
2025-03-12 06:59:19 +01:00
uvos
10f2e81809
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. ( #12177 )
...
refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
b4873
2025-03-11 20:16:03 +01:00
jklincn
ba7654380a
ggml-backend : fix backend search path ( #12330 )
...
* Fix backend search path
* replace .native() with '/'
* reverted .native()
b4872
2025-03-11 14:25:17 +01:00
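The bullets mention replacing .native() with '/'. A minimal sketch of that portable form, assuming a hypothetical backend_file helper:

    #include <filesystem>
    #include <string>

    namespace fs = std::filesystem;

    // Joining with the overloaded '/' operator stays portable; .native()
    // returns a platform-specific string type (std::wstring on Windows),
    // which is easy to misuse when building a search path.
    fs::path backend_file(const fs::path & search_dir, const std::string & name) {
        return search_dir / name; // e.g. search_dir / "ggml-cuda.dll"
    }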
BB-fat
6ab2e4765a
metal : Cache the Metal library at the device context level ( #12265 )
b4871
2025-03-11 13:45:02 +02:00
Xuan-Son Nguyen
96e1280839
clip : bring back GPU support ( #12322 )
...
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
b4870
2025-03-11 09:20:16 +01:00
Eve
2c9f833d17
mat vec double buffer ( #12188 )
b4869
2025-03-10 19:28:11 +00:00
R0CKSTAR
251364549f
musa: support new arch mp_31 and update doc ( #12296 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b4868
2025-03-10 18:18:25 +01:00
Henry Linjamäki
8acdacb3ea
opencl: use OpenCL C standard supported by the device ( #12221 )
...
This patch nudges llama.cpp a bit so it can run on PoCL, which
doesn't support OpenCL C 2.0. The issue is solved by querying the
device for its supported OpenCL C versions and using the highest one
available.
b4867
2025-03-10 09:57:00 -07:00
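A minimal sketch of the query the message describes, using the standard OpenCL API (this shows only the version lookup, not the patch's full selection logic):

    #include <CL/cl.h>
    #include <cstdio>

    // Ask the device which OpenCL C version it supports instead of assuming 2.0.
    void print_opencl_c_version(cl_device_id dev) {
        char buf[128] = {0};
        clGetDeviceInfo(dev, CL_DEVICE_OPENCL_C_VERSION, sizeof(buf), buf, NULL);
        std::printf("%s\n", buf); // e.g. "OpenCL C 1.2"
    }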
John Bean
89b2b56e86
readme: added Sidekick to available UIs ( #12311 )
2025-03-10 16:13:09 +02:00
Georgi Gerganov
e128a1bf5b
tests : fix test-quantize-fns to init the CPU backend ( #12306 )
...
ggml-ci
b4865
2025-03-10 14:07:15 +02:00
marcoStocchi
6ef79a67ca
common : refactor '-o' option ( #12278 )
...
As discussed in PR 'llama-tts : add -o option' (#12042 ):
* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.
* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
b4864
2025-03-10 13:34:13 +02:00
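A hypothetical sketch of the shape described above (not the actual common_params definition): the shared struct keeps a single out_file, and each example program sets its own default in its main():

    #include <string>

    struct params {                // stand-in for common_params
        std::string out_file;      // filled by the '-o' option, empty otherwise
    };

    int main() {
        params p; // ... parse '-o' into p.out_file ...
        if (p.out_file.empty()) {
            p.out_file = "imatrix.dat"; // per-program default, e.g. in imatrix
        }
        return 0;
    }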
Olivier Chafik
4e39a3c332
server : extract <think> tags from qwq outputs ( #12297 )
...
* extract <think> tags from qwq outputs
* const for all static regexes in chat.cpp
b4863
2025-03-10 10:59:03 +00:00
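An illustrative sketch of the extraction (not the chat.cpp implementation), with the regex declared static const as the second bullet describes:

    #include <regex>
    #include <string>

    static const std::regex think_regex("<think>([\\s\\S]*?)</think>");

    // Return the reasoning block, or an empty string if the output has none.
    std::string extract_think(const std::string & output) {
        std::smatch m;
        if (std::regex_search(output, m, think_regex)) {
            return m[1].str();
        }
        return "";
    }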
Olivier Chafik
be421fc429
tool-call : ensure there's always a non-empty tool call id ( #12292 )
2025-03-10 09:45:29 +00:00
Olivier Chafik
87c2630546
allow missing content in message if tool_calls provided ( #12293 )
b4861
2025-03-10 09:45:07 +00:00
Olivier Chafik
2b3a25c212
sampler : fixes trigger tokens + lazy grammars (fix typo cast from token to string) ( #12291 )
...
* Fix typo in lazy grammar handling (fixes trigger tokens)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4860
2025-03-10 09:44:42 +00:00
tc-mb
8352cdc87b
llava : fix bug in minicpm-v code ( #11513 )
...
* fix bug in minicpm-v code
* update readme of minicpm-v
b4859
2025-03-10 10:33:24 +02:00
Georgi Gerganov
1e2f78a004
server : add speculative decoding presets for FIM ( #12287 )
2025-03-09 19:08:20 +02:00
Georgi Gerganov
0fd7ca7a21
authors : update ( #12271 )
2025-03-08 18:26:00 +02:00
Jason C.H
6fefc05a7a
ggml-backend : make path_str compatible with C++20 ( #12269 )
b4856
2025-03-08 17:02:39 +01:00
Georgi Gerganov
7ab364390f
server : infill gen ends on new line ( #12254 )
b4855
2025-03-07 20:54:30 +02:00
Daniel Bevenius
7c7f3b7f43
ggml : skip intermediate .air file when compiling .metallib ( #12247 )
...
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.
The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
b4854
2025-03-07 14:15:27 +01:00
Georgi Gerganov
102ac1891d
sync : ggml
...
ggml-ci
b4853
2025-03-07 14:49:44 +02:00
vmobilis
d6ae2fa061
ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118)
...
* ggml_compute_forward_concat() for arbitrary tensor type
* Check that tensors' type match
* ggml-cpu.c: check type of source tensors
* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()
* ggml.c: check concatenated tensor type
* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c, as it was moved to ggml.c.
2025-03-07 14:49:44 +02:00
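A simplified sketch of the check the last two bullets describe, assuming the public ggml_concat API; the real validation in ggml.c also covers shapes:

    #include "ggml.h"

    // ggml.c now asserts the sources match in type, so ggml-cpu.c no longer
    // needs its own check; a caller-side version of the same invariant:
    struct ggml_tensor * concat_checked(struct ggml_context * ctx,
                                        struct ggml_tensor  * a,
                                        struct ggml_tensor  * b, int dim) {
        GGML_ASSERT(a->type == b->type); // concatenated tensors must match in type
        return ggml_concat(ctx, a, b, dim);
    }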