llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-06-30 04:45:17 +00:00

Author	SHA1	Message	Date
comex	563cdc391d	Support calling mlock() on loaded model data on Linux and macOS (#453) * Support calling mlock() on loaded model data on Linux and macOS This is enabled by a new --mlock command line option. Using mlock() disables swapping and memory compression for the model data. Doing so can be useful on systems where the model takes up a large fraction of system RAM. In my experience, macOS is quite eager to start compressing llama.cpp's memory, which then makes it halt for a few seconds while it decompresses, even with a model that uses "only" 25GB out of 32GB. Of course, this comes at the cost of forcing the system to swap or compress other processes' memory instead, so it needs to be used with care and shouldn't be enabled by default. In theory it should be possible to support this on Windows as well using VirtualLock(), but I'm not much of a Windows user. * Update llama.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-24 17:19:05 +02:00
Luciano	8d4a855c24	Add embedding mode with arg flag. Currently working (#282) * working but ugly * add arg flag, not working on embedding mode * typo * Working! Thanks to @nullhook * make params argument instead of hardcoded boolean. remove useless time check * start doing the instructions but not finished. This probably doesnt compile * Embeddings extraction support --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-24 17:05:13 +02:00
Georgi Gerganov	3cd8dde0d1	Revert "Fix memory allocation issues and seg faults" This reverts commit `4870e455b3`. Will provide the correct fix later	2023-03-24 06:22:28 +02:00
Georgi Gerganov	4870e455b3	Fix memory allocation issues and seg faults	2023-03-24 00:11:53 +02:00
Georgi Gerganov	483bab2e3d	Avoid the transposed X branch in the Z = X * Y matrix multiplication (#439) Should make results reproducible for different number of threads and batch sizes	2023-03-23 23:22:01 +02:00
Yusuf Kağan Hanoğlu	d5850c53ca	Add missing header for memcpy (#386) fixed: memcpy is not defined	2023-03-22 10:55:45 +02:00
Georgi Gerganov	928480ef5b	Init llama_context_params properly from CLI (#370)	2023-03-22 07:45:14 +02:00
Georgi Gerganov	f5a77a629b	Introduce C-style API (#370) * Major refactoring - introduce C-style API * Clean up * Add <cassert> * Add <iterator> * Add <algorithm> .... * Fix timing reporting and accumulation * Measure eval time only for single-token calls * Change llama_tokenize return meaning	2023-03-22 07:32:36 +02:00


208 Commits
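
Note on commit 563cdc391d: the message describes pinning loaded model data in RAM with mlock() so the OS neither swaps nor compresses it. The following is a minimal sketch of that idea in C, not the code from the commit. The helper names `lock_model_data` and `unlock_model_data` are hypothetical, and the sketch assumes a POSIX system (Linux or macOS) where `mlock` and `munlock` are declared in `<sys/mman.h>`. Locking can fail, for example when the request exceeds the process's locked-memory limit, so the helper reports the failure and lets the caller continue without the lock.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h> /* mlock, munlock (POSIX; Linux and macOS) */

/* Hypothetical helper, not the function added by the commit.
 * Asks the kernel to keep the pages covering [addr, addr + len) resident.
 * Returns 0 on success, -1 on failure (a warning is printed to stderr). */
int lock_model_data(const void *addr, size_t len) {
    if (mlock(addr, len) != 0) {
        /* Typical causes: locked-memory limit exceeded or missing privilege. */
        fprintf(stderr, "warning: mlock failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Hypothetical counterpart: releases the lock so the pages may be
 * swapped or compressed again. Returns 0 on success, -1 on failure. */
int unlock_model_data(const void *addr, size_t len) {
    return munlock(addr, len) == 0 ? 0 : -1;
}
```

A caller would pass the buffer that holds the model weights and its size in bytes, and treat a -1 result as a non-fatal warning. As the commit message points out, locking a large buffer pushes memory pressure onto other processes, which is why it sits behind an opt-in flag.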