llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-28 13:20:27 -04:00

Author	SHA1	Message	Date
Michelle Tan	89d604f2c8	server: Fix `has_next_line` in JSON response (#10818 ) * Update server JSON response. * Add unit test to check `has_new_line` JSON response * Remove `has_new_line` unit test changes. * Address code review comment: type check for `has_new_line` in unit test	2024-12-14 23:29:45 +01:00
cduk	56eea0781c	Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default (#10771 ) Signed-off-by: Charles Darke <s.cduk@toodevious.com> Co-authored-by: Charles Darke <s.cduk@toodevious.com>	2024-12-13 23:21:49 +01:00
Xuan Son Nguyen	adffa6ffd5	common : improve -ctv -ctk CLI arguments (#10806 ) * common : improve ctv ctk cli argument * regenerate docs * even better approach * use std::vector	2024-12-12 22:53:05 +01:00
CentricStorm	5555c0c1f6	docs: update server streaming mode documentation (#9519 ) Provide more documentation for streaming mode.	2024-12-11 23:40:40 +01:00
Xuan Son Nguyen	235f6e14bf	server : (UI) add tok/s, get rid of completion.js (#10786 ) * get rid of completion.js * extract chat bubble to a component * add tok/s info * sync * fix BASE_URL * only extract timings when it's enabled * fix auto scroll	2024-12-11 20:52:14 +01:00
kallewoof	484d2f31ae	bug-fix: snprintf prints NULL in place of the last character (#10419 ) * bug-fix: snprintf prints NULL in place of the last character We need to give snprintf enough space to print the last character and the null character, thus we allocate one extra byte and then ignore it when converting to std::string. * add comment about extra null-term byte requirement	2024-12-11 14:48:04 +01:00
CentricStorm	4b4d92b098	docs: fix server documentation formatting (#10776 )	2024-12-11 11:47:43 +01:00
Yüg	a86ad841f1	server : add flag to disable the web-ui (#10762 ) (#10751 ) Co-authored-by: eugenio.segala <esegala@deloitte.co.uk>	2024-12-10 18:22:34 +01:00
Xuan Son Nguyen	ce8784bdb1	server : fix format_infill (#10724 ) * server : fix format_infill * fix * rename * update test * use another model * update test * update test * test_invalid_input_extra_req	2024-12-08 23:04:29 +01:00
Xuan Son Nguyen	e52522b869	server : bring back info of final chunk in stream mode (#10722 ) * server : bring back into to final chunk in stream mode * clarify a bit * traling space	2024-12-08 20:38:51 +01:00
Xuan Son Nguyen	3573fa8e7b	server : (refactor) no more json in server_task input (#10691 ) * server : (refactor) no more json in server_task input * add test for slots endpoint * add tests for /props and /slots * remove task inf_type * fix CI by adding safe_json_to_str * add "model_path" to /props * update readme	2024-12-07 20:21:09 +01:00
Georgi Gerganov	ce4a7b8493	server : various fixes (#10704 ) * server : various fixes ggml-ci * server : show curent seed in slot_params ggml-ci * fix /slots endpoint * Update examples/server/server.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : reflect endpoint response changes in the readme ggml-ci --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-12-07 18:02:05 +02:00
Georgi Gerganov	c2a16c0bdb	server : fix free of spec context and batch (#10651 ) ggml-ci	2024-12-07 11:52:44 +02:00
Xuan Son Nguyen	6c5bc0625f	server : (refactoring) do not rely on JSON internally (#10643 ) * server : (refactoring) reduce usage of json internally * move all response types to struct * wip [no ci] * many fixes * add virtual function * fix index * minor style fix * add std::move * refactor handle_completions_generic * add virtual functions * remove server.hpp * clarify server_sent_event RFC specs * apply review comments * fix model_alias and completion_probabilities * small clean up * remove virtual for to_json_oai_compat() * naming oai_compat --> oaicompat * fix unwanted recursive call * update docs	2024-12-06 11:14:32 +01:00
Plamen Minev	7736837d62	fix(server) : not show alert when DONE is received (#10674 )	2024-12-05 22:36:41 +01:00
Georgi Gerganov	1da7b76569	server : fix speculative decoding with context shift (#10641 ) * server : fix speculative decoding with context shift ggml-ci * server : take into account speculative limits ggml-ci * server : add tests	2024-12-04 22:38:20 +02:00
Xuan Son Nguyen	91c36c269b	server : (web ui) Various improvements, now use vite as bundler (#10599 ) * hide buttons in dropdown menu * use npm as deps manager and vite as bundler * fix build * fix build (2) * fix responsive on mobile * fix more problems on mobile * sync build * (test) add CI step for verifying build * fix ci * force rebuild .hpp files * cmake: clean up generated files pre build	2024-12-03 19:38:44 +01:00
Nikolaos Pothitos	82bca2257b	readme : add option, update default value, fix formatting (#10271 ) * readme : document --no-display-prompt * readme : update default prompt context size * readme : remove unnecessary indentation Indenting a line with four spaces makes Markdown treat that section as plain text. * readme : indent commands under bullets * readme : indent commands in lettered list	2024-12-03 12:50:08 +02:00
Georgi Gerganov	70b98fadbc	server : fix default draft model parameters (#10586 ) * server : force F16 KV cache for the draft model ggml-ci * server : fix draft params ggml-ci * server : various params fixes ggml-ci	2024-12-03 11:20:00 +02:00
Xuan Son Nguyen	642330ac7c	llama : add enum for built-in chat templates (#10623 ) * llama : add enum for supported chat templates * use "built-in" instead of "supported" * arg: print list of built-in templates * fix test * update server README	2024-12-02 22:10:19 +01:00
Georgi Gerganov	8648c52101	make : deprecate (#10514 ) * make : deprecate ggml-ci * ci : disable Makefile builds ggml-ci * docs : remove make references [no ci] * ci : disable swift build ggml-ci * docs : remove obsolete make references, scripts, examples ggml-ci * basic fix for compare-commits.sh * update build.md * more build.md updates * more build.md updates * more build.md updates * Update Makefile Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-12-02 21:22:53 +02:00
haopeng	64ed2091b2	server: Add "tokens per second" information in the backend (#10548 ) * add cmake rvv support * add timings * remove space * update readme * fix * fix code * remove empty line * add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-12-02 14:45:54 +01:00
alek3y	86dc11c5bc	server : bind to any port when specified (#10590 )	2024-12-01 13:33:12 +02:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570 ) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
Xuan Son Nguyen	b782e5c7d4	server : add more test cases (#10569 ) * server : add split model test * add test speculative * add invalid cases	2024-11-29 21:48:56 +01:00
Xuan Son Nguyen	6c59567689	server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568 ) * server : (tests) don't use thread for capturing stdout/stderr * test: bump openai to 1.55.2 * bump openai to 1.55.3	2024-11-28 19:17:49 +01:00
Xuan Son Nguyen	9f912511bc	common : fix duplicated file name with hf_repo and hf_file (#10550 )	2024-11-27 22:30:52 +01:00
Xuan Son Nguyen	45abe0f74e	server : replace behave with pytest (#10416 ) * server : replace behave with pytest * fix test on windows * misc * add more tests * more tests * styling * log less, fix embd test * added all sequential tests * fix coding style * fix save slot test * add parallel completion test * fix parallel test * remove feature files * update test docs * no cache_prompt for some tests * add test_cache_vs_nocache_prompt	2024-11-26 16:20:18 +01:00
Georgi Gerganov	84e1c33cde	server : fix parallel speculative decoding (#10513 ) ggml-ci	2024-11-26 13:36:40 +02:00
Georgi Gerganov	47f931c8f9	server : enable cache_prompt by default (#10501 ) ggml-ci	2024-11-25 21:50:07 +02:00
Diego Devesa	10bce0450f	llama : accept a list of devices to use to offload a model (#10497 ) * llama : accept a list of devices to use to offload a model * accept `--dev none` to completely disable offloading * fix dev list with dl backends * rename env parameter to LLAMA_ARG_DEVICE for consistency	2024-11-25 19:30:06 +01:00
brucepro	a9a678a6b2	Add download chat feature to server chat (#10481 ) * Add download chat feature to server chat Add a download feature next to the delete chat feature in the server vue chat interface. * code style --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-11-25 17:11:55 +01:00
Georgi Gerganov	9ca2e67762	server : add speculative decoding support (#10455 ) * server : add speculative decoding support ggml-ci * server : add helper function slot.can_speculate() ggml-ci	2024-11-25 16:31:38 +02:00
Georgi Gerganov	d9d54e498d	speculative : refactor and add a simpler example (#10362 ) * speculative : refactor and add a simpler example ggml-ci * speculative : clean-up and add comments and TODOs [no ci] * speculative : manage context in common_speculative ggml-ci * speculative : simplify ggml-ci * speculative : simplify (cont) ggml-ci * speculative : add --draft-min CLI arg * speculative : minor fixup * make : build fixes * speculative : do not redraft previous drafts ggml-ci * speculative : fix the draft sampling ggml-ci * speculative : fix compile warning * common : refactor args ggml-ci * common : change defaults [no ci] * common : final touches ggml-ci	2024-11-25 09:58:41 +02:00
Johannes Gäßler	4e54be0ec6	llama/ex: remove --logdir argument (#10339 )	2024-11-16 23:00:41 +01:00
MaggotHATE	bcdb7a2386	server: (web UI) Add samplers sequence customization (#10255 ) * Samplers sequence: simplified and input field. * Removed unused function * Modify and use `settings-modal-short-input` * rename "name" --> "label" --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-11-16 14:26:54 +01:00
Xuan Son Nguyen	9901068ac7	server : (web UI) add copy button for code block, fix api key (#10242 ) * server : (web ui) add copy btn for code blocks * fix problem with api key * use settings-modal-short-input component * always show copy btn for code snippet	2024-11-15 10:48:49 +01:00
Alexey Parfenov	ff7fb670d0	server : add missing docs (#10269 )	2024-11-13 13:16:30 +02:00
Jhen-Jie Hong	0e712a5acb	server : fix incorrect res in validate_model_chat_template (#10272 ) * server : fix validate_model_chat_template * server : fix chat res	2024-11-13 13:15:23 +02:00
Georgi Gerganov	b141e5f6ef	server : enable KV cache defrag by default (#10233 ) ggml-ci	2024-11-11 08:38:43 +02:00
MaggotHATE	505f33274d	server : (web UI) Add back sampler settings (#10239 ) * Add back samplers to server * Added tooltips with basic information * Fixed stretching of input fields. * use component for settings input, move help msg to tooltips --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-11-10 15:42:25 -04:00
Xuan Son Nguyen	76c6e7f105	server : minor UI fix (#10207 )	2024-11-07 18:44:38 -04:00
Xuan Son Nguyen	a71d81cf8c	server : revamp chat UI with vuejs and daisyui (#10175 ) * server : simple chat UI with vuejs and daisyui * move old files to legacy folder * embed deps into binary * basic markdown support * add conversation history, save to localStorage * fix bg-base classes * save theme preferences * fix tests * regenerate, edit, copy buttons * small fixes * docs: how to use legacy ui * better error handling * make CORS preflight more explicit * add GET method for CORS * fix tests * clean up a bit * better auto scroll * small fixes * use collapse-arrow * fix closeAndSaveConfigDialog * small fix * remove console.log * fix style for <pre> element * lighter bubble color (less distract when reading)	2024-11-07 17:31:10 -04:00
Georgi Gerganov	b11f9ba9b8	server : remove hack for extra parallel slot (#10187 ) ggml-ci	2024-11-06 13:29:01 +02:00
Xuan Son Nguyen	9e0ecfb697	server : clarify /slots endpoint, add is_processing (#10162 ) * server : clarify /slots endpoint, add is_processing * fix tests	2024-11-04 16:33:29 +01:00
sasha0552	42cadc74bd	server : fix slot selection by lru (#10126 ) * server : fix slot selection by lru, migrate lcs to `size_t` * minor debug log fix	2024-11-02 18:34:56 +02:00
Georgi Gerganov	45950415ed	server : fix endpoint checks (#10135 ) ggml-ci	2024-11-02 18:34:00 +02:00
sasha0552	d865d1478c	server : fix smart selection of available slot (#10120 ) * Fix smart selection of available slot * minor fix * replace vectors of tokens with shorthands	2024-11-01 14:33:14 +01:00
Kevin Gibbons	0a683e8088	server : include scheme when printing URL (#10106 )	2024-10-31 14:02:35 +01:00
Georgi Gerganov	8d8ff71536	llama : remove Tail-Free sampling (#10071 ) ggml-ci	2024-10-29 10:42:05 +02:00

1 2 3 4 5 ...

573 Commits