Commit Graph

42 Commits

SHA1 Message Date
c86ba036e6 Enable ANSI colors on Windows 10+ (#311)
* Enable ANSI colors on Windows 10+

On older versions the function will silently fail without any ill effects

* Do not call SetConsoleMode if the mode is already set

* Update main.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-21 18:14:46 +02:00
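The mechanism behind this commit, sketched below (a minimal sketch of the standard Win32 approach, not necessarily the PR's exact code): enable ENABLE_VIRTUAL_TERMINAL_PROCESSING on the stdout console handle, and skip the SetConsoleMode call when the flag is already set.

```cpp
#if defined(_WIN32)
#include <windows.h>

// Sketch: turn on ANSI escape handling for stdout on Windows 10+.
// On older Windows the SetConsoleMode call fails silently, which is harmless.
static void enable_ansi_colors() {
    HANDLE h_out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (h_out == INVALID_HANDLE_VALUE) {
        return;
    }
    DWORD mode = 0;
    if (!GetConsoleMode(h_out, &mode)) {
        return;
    }
    if (mode & ENABLE_VIRTUAL_TERMINAL_PROCESSING) {
        return; // already enabled - do not call SetConsoleMode again
    }
    SetConsoleMode(h_out, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING);
}
#endif
```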
d5f56a5e5a Check for reverse prompt by characters instead of tokens (#292) (#330)
* Check for reverse prompt by characters instead of tokens (#292)

* Update main.cpp

Wording.

* Cleanup.

* Remove unnecessary use of std::stringstream.

---------

Co-authored-by: Johnman <tjohnman@github>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-21 18:04:43 +02:00
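The idea here, as a hedged sketch (function and variable names are illustrative, not the PR's): detokenize the recent output into a string and test whether it ends with the reverse prompt, since comparing token-id sequences can miss matches that straddle token boundaries.

```cpp
#include <string>

// Sketch: suffix match on generated text rather than on token ids.
static bool ends_with_antiprompt(const std::string & last_output,
                                 const std::string & antiprompt) {
    if (last_output.size() < antiprompt.size()) {
        return false;
    }
    return last_output.compare(last_output.size() - antiprompt.size(),
                               antiprompt.size(), antiprompt) == 0;
}
```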
3bfa3b43b7 Fix convert script, warnings, alpaca instructions, default params 2023-03-21 17:59:16 +02:00
975d2cebf9 cmdline option for a custom number of model parts (--n_parts N) (#348)
* cmdline option for a custom number of model parts (--n_parts N)

* Update main.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-21 17:42:43 +02:00
eb34620aec Add tokenizer test + revert to C++11 (#355)
* Add test-tokenizer-0 to do a few tokenizations - feel free to expand
* Added option to convert-pth-to-ggml.py script to dump just the vocabulary
* Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests)
* Added utility to load vocabulary file from previous point (temporary implementation)
* Avoid using std::string_view and drop back to C++11 (hope I didn't break something)
* Rename gpt_vocab -> llama_vocab
* All CMake binaries go into ./bin/ now
2023-03-21 17:29:41 +02:00
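A minimal sketch of what such a tokenizer regression test can look like (illustrative only; the real test-tokenizer-0 loads models/ggml-vocab.bin and calls the project's own tokenizer):

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Assumed stand-in for the project's tokenizer, provided elsewhere.
std::vector<int> llama_tokenize(const std::string & text);

// Sketch: compare a tokenization against known-good token ids.
static bool check_tokenization(const std::string & text,
                               const std::vector<int> & expected) {
    const std::vector<int> got = llama_tokenize(text);
    if (got != expected) {
        fprintf(stderr, "tokenizer mismatch for '%s'\n", text.c_str());
        return false;
    }
    return true;
}
```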
a791a68b61 move file magic/version to header, print expected version (#319) 2023-03-20 19:26:01 +00:00
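A sketch of the header check this commit describes (constants and names are illustrative): read a magic value and a version field from the start of the model file, and print the expected values on mismatch so users know to re-convert.

```cpp
#include <cstdint>
#include <cstdio>

// Sketch: validate the model file header before reading tensors.
static bool check_header(FILE * f, uint32_t expected_magic, uint32_t expected_version) {
    uint32_t magic   = 0;
    uint32_t version = 0;
    if (fread(&magic,   sizeof(magic),   1, f) != 1 ||
        fread(&version, sizeof(version), 1, f) != 1) {
        return false;
    }
    if (magic != expected_magic || version != expected_version) {
        fprintf(stderr, "bad model file: magic/version %08x/%u, expected %08x/%u\n",
                magic, version, expected_magic, expected_version);
        return false;
    }
    return true;
}
```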
074bea2eb1 sentencepiece bpe compatible tokenizer (#252)
* potential out of bounds read

* fix quantize

* style

* Update convert-pth-to-ggml.py

* mild cleanup

* don't need the space-prefixing here right now since main.cpp already does it

* new file magic + version header field

* readme notice

* missing newlines

Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-03-20 03:17:23 -07:00
da5303c1ea bugfix: default should not be interactive (#304) 2023-03-19 23:44:20 +02:00
5c19c70ba6 fix coloring of last n_batch of prompt, and refactor line input (#221)
* fix coloring of last `n_batch` of prompt, and refactor line input
* forgot the newline that needs to be sent to the model
* (per #283) try to force flush of color reset in SIGINT handler
2023-03-19 19:44:30 +00:00
24568371ae Support for multiple reverse prompts. (#299)
Co-authored-by: Johnman <>
Co-authored-by: Johnman <tjohnman@github>
2023-03-19 21:33:06 +02:00
ad5fd5b60c Make prompt randomization optional. (#300)
Co-authored-by: Johnman <>
2023-03-19 20:36:19 +02:00
368d0c8a9e Respect the maximum number of tokens in interactive. (#298)
Co-authored-by: Johnman <johnman@github>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19 20:31:17 +02:00
50fae10d03 Add --ignore-eos parameter (#181)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19 20:22:48 +02:00
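One common way to implement such a flag, sketched here under assumed names (logits, eos_id); the PR's exact mechanism may differ: mask the end-of-stream logit so sampling can never select it.

```cpp
#include <cmath>
#include <vector>

// Sketch: with --ignore-eos, force the EOS logit to -inf before sampling.
static void apply_ignore_eos(std::vector<float> & logits, int eos_id) {
    logits[eos_id] = -INFINITY;
}
```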
084e2f0ec0 interactive mode: print '\n' in sigint_handler; this flushes stdout and thus ensures the color reset. (#283) 2023-03-19 20:10:00 +02:00
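The trick this commit relies on, sketched (illustrative, not the exact handler in main.cpp): stdout is typically line-buffered, so a trailing '\n' in the handler flushes the ANSI reset sequence to the terminal before the process exits.

```cpp
#include <cstdio>
#include <cstdlib>

// Sketch: reset terminal color, then newline to force a flush of stdout.
static void sigint_handler(int /*signo*/) {
    printf("\033[0m\n");
    exit(130); // conventional exit status for SIGINT
}
```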
0b366e7357 Command line switch to use F16 for memory_k and memory_v (refactor of #154) (#294)
* Use F16 for memory_k and memory_v

* add command line switch to use f16 instead of f32 for memory k+v

---------

Co-authored-by: Ty Everett <ty@tyweb.us>
2023-03-19 19:57:00 +02:00
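A sketch of the switch (assumed surrounding names; uses the real ggml tensor API): pick the KV-cache tensor type from the command-line flag. F16 halves the memory used by memory_k and memory_v at a small precision cost.

```cpp
#include "ggml.h"

// Sketch: allocate the KV cache as F16 or F32 depending on a flag.
static void alloc_kv_cache(struct ggml_context * ctx, bool memory_f16, int64_t n_elements) {
    const enum ggml_type wtype = memory_f16 ? GGML_TYPE_F16 : GGML_TYPE_F32;
    struct ggml_tensor * memory_k = ggml_new_tensor_1d(ctx, wtype, n_elements);
    struct ggml_tensor * memory_v = ggml_new_tensor_1d(ctx, wtype, n_elements);
    (void) memory_k;
    (void) memory_v;
}
```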
c494ed5b94 Fix off-by-one bug (#115) 2023-03-19 19:46:32 +02:00
70f01cb863 Drop trailing new line from file prompts (#80) 2023-03-19 19:05:04 +02:00
9e1707218a Add "--instruct" argument for usage with Alpaca (#240)
Also start adding prompts in "./prompts"
2023-03-19 18:37:02 +02:00
d7def1a752 Warn user if a context size greater than 2048 tokens is specified (#274)
LLaMA doesn't support context sizes greater than 2048 tokens, and going above that produces terrible results.
2023-03-18 20:10:47 -04:00
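A sketch of the guard (wording close to, but not guaranteed identical to, the commit's): warn rather than abort when the requested context exceeds what the model was trained with.

```cpp
#include <cstdio>

// Sketch: warn when n_ctx is larger than LLaMA's 2048-token training context.
static void warn_on_large_ctx(int n_ctx) {
    if (n_ctx > 2048) {
        fprintf(stderr, "warning: model does not support context sizes greater "
                        "than 2048 tokens (%d specified); expect poor results\n", n_ctx);
    }
}
```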
d3f202d57b Remove unused code since n_vocab is model.hparams.n_vocab (#262) 2023-03-18 13:51:49 +00:00
e03e359730 fixed warning with std::ignore about unused function result (#151) 2023-03-18 11:44:09 +00:00
c9f670a177 Implement non-greedy tokenizer that tries to maximize token lengths (#242)
* Implement non-greedy tokenizer that tries to maximize token lengths

* Insert single space in front of the prompt

- this is to match original llama tokenizer behavior

---------

Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
2023-03-17 21:05:58 +01:00
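A hedged sketch of the maximal-length matching described above (a simplified illustration, not the commit's algorithm verbatim): at each position, prefer the longest vocabulary entry that matches, instead of accepting the first match found.

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch: longest-match tokenization over a text->id vocabulary.
static std::vector<int> tokenize_longest_match(const std::map<std::string, int> & vocab,
                                               const std::string & text) {
    std::vector<int> out;
    size_t pos = 0;
    while (pos < text.size()) {
        int    best_id  = -1;
        size_t best_len = 0;
        // try the longest candidate first; stop at the first (longest) hit
        for (size_t len = text.size() - pos; len > 0; --len) {
            std::map<std::string, int>::const_iterator it = vocab.find(text.substr(pos, len));
            if (it != vocab.end()) {
                best_id  = it->second;
                best_len = len;
                break;
            }
        }
        if (best_id == -1) {
            ++pos; // simplified: skip bytes with no vocabulary match
            continue;
        }
        out.push_back(best_id);
        pos += best_len;
    }
    return out;
}
```

The single leading space mentioned in the commit matches the original LLaMA tokenizer, which expects space-prefixed input.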
6eac39ba95 Add RMS norm and use it (#187)
* add ggml_rms_norm

* update op num
2023-03-16 00:41:38 +02:00
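For reference, a plain C++ sketch of RMS normalization (not the ggml kernel itself): unlike LayerNorm, it does not subtract the mean; it only rescales by the reciprocal root-mean-square. The epsilon value is illustrative.

```cpp
#include <cmath>
#include <vector>

// Sketch: x <- x / sqrt(mean(x^2) + eps)
static void rms_norm(std::vector<float> & x, float eps) {
    double sum = 0.0;
    for (size_t i = 0; i < x.size(); ++i) {
        sum += (double) x[i] * x[i];
    }
    const float scale = 1.0f / sqrtf((float)(sum / x.size()) + eps);
    for (size_t i = 0; i < x.size(); ++i) {
        x[i] *= scale;
    }
}
```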
2d15d6c9a9 add SIGINT support for _WIN32 environments (#120)
* add SIGINT support for _WIN32 environments

* perhaps more consistent
2023-03-15 21:56:24 +02:00
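A sketch of cross-platform SIGINT wiring (the repo's exact preprocessor guards may differ): POSIX sigaction on unix-like systems, plain signal() on Windows, where the C runtime supports SIGINT for console Ctrl-C.

```cpp
#include <signal.h>

void sigint_handler(int signo); // e.g. a handler like the one sketched above

// Sketch: install the Ctrl-C handler on both platform families.
static void install_sigint_handler() {
#if defined(__unix__) || defined(__APPLE__)
    struct sigaction sa = {};
    sa.sa_handler = sigint_handler;
    sigaction(SIGINT, &sa, NULL);
#elif defined(_WIN32)
    signal(SIGINT, sigint_handler);
#endif
}
```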
2d64715ad4 added ctx_size parameter (#148)
* added ctx_size parameter

* added it in more places

* Apply suggestions from code review

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-15 21:42:40 +02:00
16b2c61a22 fixed color reset on exit (#149)
* fixed color reset on exit

* added sigint handler for ansi_color_reset

* Update main.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-15 21:39:38 +02:00
4497ad819c Print system information 2023-03-13 19:15:08 +02:00
671d5cac15 Use fprintf for diagnostic output (#48)
keep printf only for printing model output

one can now use ./main ... 2>/dev/null to suppress any diagnostic output
2023-03-13 18:39:56 +02:00
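The convention this commit establishes, as a tiny sketch: route diagnostics through stderr and reserve stdout for model text, so the shell can separate the two streams.

```cpp
#include <cstdio>

// Sketch: diagnostics -> stderr, generated tokens -> stdout.
static void log_diag(const char * msg) { fprintf(stderr, "%s\n", msg); }
static void emit_token(const char * s) { fputs(s, stdout); }
```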
63fd76fbb0 Reduce model loading time (#43)
* Use buffering

* Use vector

* Minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13 18:33:43 +02:00
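A hedged sketch of the buffering idea (not the PR's exact code): give the stream one large buffer and read tensor data in big chunks, instead of issuing many small reads.

```cpp
#include <cstdio>
#include <vector>

// Sketch: open a model file with a 1 MiB stream buffer (size illustrative).
static FILE * open_buffered(const char * path, std::vector<char> & buf) {
    FILE * f = fopen(path, "rb");
    if (f == NULL) {
        return NULL;
    }
    buf.resize(1 << 20);
    setvbuf(f, buf.data(), _IOFBF, buf.size());
    return f;
}
```

Note the buffer must stay alive as long as the stream is open, which is why the caller owns it here.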
2a20f48efa Fix UTF-8 handling (including colors) (#79) 2023-03-13 18:24:18 +02:00
a169bb889c Gate signal support on being on a unixoid system. (#74) 2023-03-13 04:08:01 +01:00
460c482540 Fix token count accounting 2023-03-13 01:04:41 +01:00
404fac0d62 Fix color getting reset before prompt output done (#65)
(cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)
2023-03-13 00:07:34 +02:00
96ea727f47 Add interactive mode (#61)
* Initial work on interactive mode.

* Improve interactive mode. Make rev. prompt optional.

* Update README to explain interactive mode.

* Fix OS X build
2023-03-12 23:13:28 +02:00
02f0c6fe7f Add back top_k (#56)
* Add back top_k

* Update utils.cpp

* Update utils.h

---------

Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 22:23:15 +02:00
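A sketch of the top-k filtering this commit restores (simplified; the repo's sampler differs in detail): keep only the k highest logits as candidates before sampling. Written to stay within C++11, per the revert in #355 above.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sketch: return the k largest (logit, token-id) pairs, highest first.
static std::vector<std::pair<float, int> > top_k(const std::vector<float> & logits, int k) {
    std::vector<std::pair<float, int> > items;
    items.reserve(logits.size());
    for (int i = 0; i < (int) logits.size(); ++i) {
        items.push_back(std::make_pair(logits[i], i));
    }
    if (k > (int) items.size()) {
        k = (int) items.size();
    }
    std::partial_sort(items.begin(), items.begin() + k, items.end(),
                      [](const std::pair<float, int> & a, const std::pair<float, int> & b) {
                          return a.first > b.first;
                      });
    items.resize(k);
    return items; // softmax + sample over these instead of the full vocabulary
}
```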
eb062bb012 Windows fixes (#31)
* Apply fixes suggested to build on windows

Issue: https://github.com/ggerganov/llama.cpp/issues/22

* Remove unsupported VLAs

* MSVC: Remove features that are only available on MSVC C++20.

* Fix zero initialization of the other fields.

* Change the use of vector for stack allocations.
2023-03-12 22:15:00 +02:00
129c7d1ea8 Add repetition penalty (#20)
* Adding repeat penalization

* Update utils.h

* Update utils.cpp

* Numeric fix

Should probably still scale by temp even if penalized

* Update comments, more proper application

I see that numbers can go negative, so this applies a fix from a referenced commit

* Minor formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 11:27:42 +02:00
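The scheme described above, sketched (the sign-aware handling matches the "numbers can go negative" fix noted in the commit; names and the example constant are illustrative): for tokens in the recent window, divide positive logits by the penalty and multiply negative ones, so repeats are discouraged on both sides of zero.

```cpp
#include <vector>

// Sketch: CTRL-style repetition penalty over the last n tokens.
static void apply_repeat_penalty(std::vector<float> & logits,
                                 const std::vector<int> & last_tokens,
                                 float penalty /* e.g. 1.3 */) {
    for (size_t i = 0; i < last_tokens.size(); ++i) {
        const int id = last_tokens[i];
        if (logits[id] > 0.0f) {
            logits[id] /= penalty;
        } else {
            logits[id] *= penalty;
        }
    }
}
```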
7d9ed7b25f Bump memory buffer 2023-03-11 12:45:01 +02:00
007a8f6f45 Support all LLaMA models + change Q4_0 quantization storage 2023-03-11 11:28:30 +02:00
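For context, a sketch of a Q4_0-style block layout (illustrative; the exact struct and field order in ggml have changed over time): 32 weights per block, one f32 scale, and the quantized values packed two 4-bit nibbles per byte.

```cpp
#include <stdint.h>

#define QK4_0 32

// Sketch: one quantization block of 32 weights.
typedef struct {
    float   d;             // per-block scale
    uint8_t qs[QK4_0 / 2]; // 32 x 4-bit quants, two per byte
} block_q4_0;

// Each weight is reconstructed roughly as w = (q - 8) * d,
// where q is a nibble in [0, 15].
```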
70bc0b8b15 Fix a bug in the rope calculation 2023-03-10 23:46:57 +02:00
319cdb3e1f Final touches 2023-03-10 21:50:46 +02:00
26c0846629 Initial release 2023-03-10 20:56:40 +02:00