Commit Graph

4841 Commits

SHA1 Message Date
952feedfca context : disable encoder embd tensor for now
ggml-ci
2025-02-27 15:07:10 +02:00
4efe989886 context : pass embeddings tensor from encoder to decoder
ggml-ci
2025-02-25 16:11:17 +02:00
e2b3294f2c context : fix enc-dec state save/load
ggml-ci
2025-02-25 12:14:34 +02:00
e5bc5f8e02 context : enc-dec is now working
ggml-ci
2025-02-25 12:10:34 +02:00
be58e30017 enc-dec : compose wip
ggml-ci
2025-02-24 18:12:24 +02:00
9cd78f11a1 context : explicit llama_context_i abstract interface
ggml-ci
2025-02-24 13:38:11 +02:00
4a1054b552 context : reuse built_attn_mha
ggml-ci
2025-02-24 11:29:52 +02:00
a5a85a3bc0 context : fix recurrent reserve
ggml-ci
2025-02-24 08:59:12 +02:00
0699a44c83 context : remove redundant virtual, protected -> private
ggml-ci
2025-02-23 20:02:11 +02:00
6378112cb5 graph : remove the build_kv_... API from llama_graph_i
ggml-ci
2025-02-23 19:39:22 +02:00
372fa3a894 cont : enc should work now, next is dec
ggml-ci
2025-02-23 12:20:23 +02:00
f5e80208c5 wip enc-dec 2025-02-21 19:17:47 +02:00
c4c0a4d13c Merge branch 'master' into gg/llama-kv-cache 2025-02-21 19:14:07 +02:00
51f311e057 llama : skip loading unused tensors (#12004)
* llama : assign unknown/unused tensors to host buffer type

ggml-ci

* llama : skip unused tensors

ggml-ci
b4753
2025-02-21 18:33:18 +02:00
3753b30d65 context : fix n_outputs init
ggml-ci
2025-02-21 15:53:26 +02:00
f588a70da3 context : wrap input tensors in struct
ggml-ci
2025-02-21 15:09:28 +02:00
ebf1bdf97b context : add logs
ggml-ci
2025-02-21 14:35:23 +02:00
586d5fe6eb doc: update contributing guidelines [no ci] (#11969) 2025-02-21 12:51:25 +01:00
548c230dff graph : remove worst_case from the API
ggml-ci
2025-02-21 13:29:25 +02:00
ecc8e3aeff CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984)
* CUDA: correct the lowest Maxwell supported by CUDA 12

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
b4751
2025-02-21 12:21:05 +01:00
2645a7d9a9 context : add save/load for recurrent context
ggml-ci
2025-02-21 10:28:42 +02:00
0b3863ff95 MUSA: support ARM64 and enable dp4a etc. (#11843)
* MUSA: support ARM64 and enable __dp4a etc.

* fix cross entropy loss op for musa

* update

* add cc info log for musa

* add comment for the MUSA .cc calculation block

---------

Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>
2025-02-21 09:46:23 +02:00
ee02ad02c5 clip : fix visual encoders with no CLS (#11982)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
b4749
2025-02-21 08:11:03 +02:00
08011c2ca1 context : add llama_kv_cache_recurrent prototype
ggml-ci
2025-02-20 20:55:13 +02:00
c392e5094d server (webui): Fix Premature Submission During IME Conversion (#11971)
* fix skip ime composing

* fix npm rebuild

* fix warn

---------

Co-authored-by: momonga <115213907+mmnga@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-02-20 19:43:22 +01:00
ad870c49f4 context : fix causal input for cache-less case
ggml-ci
2025-02-20 20:01:02 +02:00
b1554be1d7 context : add cache-less llama_context
ggml-ci
2025-02-20 18:30:04 +02:00
c5d91a7400 ggml-cpu: Add CPU backend support for KleidiAI library (#11390)
* ggml-cpu: Add CPU backend support for KleidiAI library

* Add environment variable GGML_KLEIDIAI_SME

* Add support for multithread LHS conversion

* Switch kernel selection order to dotprod and i8mm

* updates for review comments

* More updates for review comments

* Reorganize and rename KleidiAI files

* Move ggml-cpu-traits.h to source file

* Update cmake for SME build and add alignment for SME

* Stop appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
b4747
2025-02-20 15:06:51 +02:00
072280ea6b Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-20 14:26:43 +02:00
4806498bf1 ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917)
* Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file

* Improved formatting of code in ggml-cpu-quants.c file

* style : minor fixes

* style : less whitespace

* style : ptr spacing

---------

Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4746
2025-02-20 12:08:32 +02:00
0d559580a0 run : add --chat-template-file (#11961)
Relates to: https://github.com/ggml-org/llama.cpp/issues/11178

Added a --chat-template-file CLI option to llama-run. If specified, the file
is read and its content is passed to common_chat_templates_from_model to
override the model's chat template (see the usage sketch after this entry).

Signed-off-by: Michael Engel <mengel@redhat.com>
b4745
2025-02-20 10:35:11 +02:00
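A minimal usage sketch for the option described above (the template file name and model reference are illustrative placeholders, not taken from the commit):

```console
$ llama-run --chat-template-file ./my-template.jinja granite-code
```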
d04e7163c8 doc: add links to ggml examples [no ci] (#11958) 2025-02-19 20:45:17 +01:00
f95b04a21c model : fix order kvq -> qkv
ggml-ci
2025-02-19 18:52:20 +02:00
2eacb4c1bf graph : simplify attention api
ggml-ci
2025-02-19 18:43:49 +02:00
e17e4b72d1 context : add llama_context_recurrent
ggml-ci
2025-02-19 16:07:27 +02:00
5f11a5502a kv-cache : remove llama_kv_cache_i 2025-02-19 14:36:27 +02:00
d07c621393 common : add llama.vim preset for Qwen2.5 Coder (#11945)
This commit adds a preset for llama.vim to use the default Qwen 2.5
Coder models.

The motivation for this change is to make it easier to start a server
suitable to be used with the llama.vim plugin. For example, the server
can be started with a command like the following:
```console
$ llama-server --fim-qwen-1.5b-default
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10932
b4743
2025-02-19 12:29:52 +01:00
abd4d0bc4f speculative : update default params (#11954)
* speculative : update default params

* speculative : do not discard the last drafted token
b4742
2025-02-19 13:29:42 +02:00
9626d9351a llama : fix indentation in llama-grammar [no ci] (#11943)
This commit adjusts the indentation for the functions `parse_sequence`
and `parse_rule` in src/llama-grammar.cpp.

The motivation is consistency and improved readability.
2025-02-19 06:16:23 +01:00
b58934c183 server : (webui) Enable communication with parent html (if webui is in iframe) (#11940)
* Webui: Enable communication with parent html (if webui is in iframe):
- Listens for "setText" command from parent with "text" and "context" fields. "text" is set in inputMsg, "context" is used as hidden context on the following requests to the llama.cpp server
- On pressing the Escape button, sends the command "escapePressed" to the parent

Example handling from the parent html side:
- Send command "setText" from parent html to webui in iframe:
const iframe = document.getElementById('askAiIframe');
if (iframe) {
	iframe.contentWindow.postMessage({ command: 'setText', text: text, context: context }, '*');
}

- Listen for Escape key from webui on parent html:
// Listen for escape key event in the iframe
window.addEventListener('keydown', (event) => {
	if (event.key === 'Escape') {
		// Process case when Escape is pressed inside webui
	}
});

* Move the extraContext from storage to app.context.

* Fix formatting.

* add Message.extra

* format + build

* MessageExtraContext

* build

* fix display

* rm console.log

---------

Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-02-18 23:01:44 +01:00
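For reference, a rough sketch of the webui side of the iframe protocol described in the entry above; the state variables and exact handling here are illustrative assumptions, not the actual implementation:

```ts
// Illustrative stand-ins for the webui state (the real code keeps the extra context in app.context).
let inputMsg = '';
let extraContext = '';

// Handle the "setText" command posted by the parent page.
window.addEventListener('message', (event: MessageEvent) => {
  const data = event.data ?? {};
  if (data.command === 'setText') {
    inputMsg = data.text ?? '';        // shown in the input box
    extraContext = data.context ?? ''; // sent as hidden context with later requests to the llama.cpp server
  }
});

// Notify the parent page when Escape is pressed inside the webui.
window.addEventListener('keydown', (event: KeyboardEvent) => {
  if (event.key === 'Escape') {
    window.parent.postMessage({ command: 'escapePressed' }, '*');
  }
});
```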
f5cedbcaaa kv-cache : prepare for abstraction
ggml-ci
2025-02-18 21:28:58 +02:00
63e489c025 tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)
* tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init returns a unique_ptr to an opaque type

* addressed clang-tidy lints in [test-]chat.*

* rm minja deps from util & common & move it to common/minja/

* add name & tool_call_id to common_chat_msg

* add common_chat_tool

* added json <-> tools, msgs conversions to chat.h

* fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens)

* fix deepseek r1 slow test (no longer <think> opening w/ new template)

* allow empty tools w/ auto + grammar

* fix & test server grammar & json_schema params w/ & w/o --jinja
b4739
2025-02-18 18:03:23 +00:00
63ac128563 server : add TEI API format for /rerank endpoint (#11942)
* server : add TEI API format for /rerank endpoint

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix

* also gitignore examples/server/*.gz.hpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4738
2025-02-18 14:21:41 +01:00
2bffc2d514 model : pass llama_graph_i as ptr
ggml-ci
2025-02-18 14:57:26 +02:00
9e50456e19 context : minor simplify
ggml-ci
2025-02-18 14:53:02 +02:00
befe14f06f llama : reorder encode/decode in sources 2025-02-18 14:47:53 +02:00
bc6f187e9c cont : use returned tensors from the graph build
ggml-ci
2025-02-18 14:24:17 +02:00
172f61690c cont : return important tensors
ggml-ci
2025-02-18 13:48:43 +02:00
c23590319a graph : add llama_graph_result
ggml-ci
2025-02-18 13:48:21 +02:00
5137da7b8c scripts: corrected encoding when getting chat template (#11866) (#11907)
Signed-off-by: MoonRide303 <moonride303@gmail.com>
2025-02-18 10:30:16 +01:00