Commit Graph

3324 Commits

Author SHA1 Message Date
213701b51a Detokenizer fixes (#8039)
* Add llama_detokenize():
  - Update header files location
  - UNKNOWN and CONTROL are 'special pieces'
  - Remove space after UNKNOWN and CONTROL
  - Refactor llama_token_to_piece()
  - Add flag: clean_up_tokenization_spaces
  - Symmetric params for llama_tokenize() and llama_detokenize()

* Update and fix tokenizer tests:
  - Using llama_detokenize()
  - Treat an unexpected vocab type as a test failure instead of an error
    - Useful when automating tests:
    - If you don't know the vocab type in advance
    - Differentiate other loading errors
  - Skip Unicode surrogates and undefined codepoints
  - Gracefully exit threads
    - Using exit() is throwing random exceptions
  - Clean old known problematic codepoints
  - Minor: confusing hexadecimal codepoint

* Update bruteforce random tests
  - Add detokenizer checks
  - New generator: ascii_lr_strip
  - New generator: apostrophe
  - Add more vocabs files
  - Detokenize special tokens.
  - Replace errors with '\uFFFD' when detokenizing to 'utf-8'
  - More edge cases
  - Better detokenization results check

* Fix add_space_prefix, set false by default
* Better leading space removal
* Do not remove space when decoding special tokens
* Bugfix: custom regexes split undefined Unicode codepoints
* 'viking' detokenizer clean spaces
b3324
2024-07-05 19:01:35 +02:00
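As context for the detokenizer entry above: llama_detokenize() is described as the inverse of llama_tokenize() with symmetric parameters. Below is a minimal round-trip sketch; the parameter names and the negative-return-on-small-buffer convention are assumptions mirroring llama_tokenize(), not copied from llama.h.
```cpp
// Round-trip sketch (tokenize, then detokenize) against the C API described
// in this commit. Buffer sizes and flag names are illustrative assumptions.
#include <string>
#include <vector>
#include "llama.h"

static std::string round_trip(const llama_model * model, const std::string & text) {
    std::vector<llama_token> tokens(text.size() + 2); // generous upper bound
    const int32_t n_tokens = llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                                            tokens.data(), (int32_t) tokens.size(),
                                            /*add_special=*/true, /*parse_special=*/false);
    if (n_tokens < 0) {
        return {}; // buffer too small (should not happen with the bound above)
    }
    tokens.resize(n_tokens);

    std::vector<char> buf(text.size() * 2 + 16); // room for the reconstructed text
    const int32_t n_chars = llama_detokenize(model, tokens.data(), n_tokens,
                                             buf.data(), (int32_t) buf.size(),
                                             /*remove_special=*/false,
                                             /*unparse_special=*/false);
    if (n_chars < 0) {
        return {};
    }
    return std::string(buf.data(), n_chars);
}
```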
be20e7f49d Reorganize documentation pages (#8325)
* re-organize docs

* add link among docs

* add link to build docs

* fix style

* de-duplicate sections
2024-07-05 18:08:32 +02:00
7ed03b8974 llama : fix compile warning (#8304) b3322 2024-07-05 17:32:09 +03:00
1d894a790e cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281) 2024-07-05 17:29:35 +03:00
1f3e1b66e2 Enabled more data types for oneMKL gemm_batch (#8236) 2024-07-05 13:23:25 +01:00
148ec970b6 convert : remove AWQ remnants (#8320) 2024-07-05 10:15:36 +03:00
2cccbaa008 llama : minor indentation during tensor loading (#8304)
* llama : minor indentation during tensor loading

ggml-ci

* llama : use int for layer iterators [no ci]
2024-07-05 10:15:24 +03:00
8e558309dc CUDA: MMQ support for iq4_nl, iq4_xs (#8278) b3317 2024-07-05 09:06:31 +02:00
0a423800ff CUDA: revert part of the RDNA1 optimizations (#8309)
The change to launch_bounds was causing a small performance drop of 25 t/s in perplexity runs
b3316
2024-07-05 09:06:09 +02:00
d12f781074 llama : streamline embeddings from "non-embedding" models (#8087) b3315 2024-07-05 10:05:56 +03:00
bcefa03bc0 CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311) b3314 2024-07-05 09:05:34 +02:00
5a7447c569 readme : fix minor typos [no ci] (#8314) 2024-07-05 09:58:41 +03:00
61ecafa390 passkey : add short intro to README.md [no-ci] (#8317)
* passkey : add short intro to README.md [no-ci]

This commit adds a short introduction to the README.md file in the
examples/passkey directory.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* Update examples/passkey/README.md

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-05 09:14:24 +03:00
aa5898dc53 llama : prefer n_ over num_ prefix (#8308) b3311 2024-07-05 09:10:03 +03:00
6c05752c50 contributing : update guidelines (#8316) 2024-07-05 09:09:47 +03:00
a9554e20b6 [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)
* fix group_norm ut

* split softmax

* fix softmax

* add concat support condition

* revert debug code

* move QK_WARP_SIZE to presets.hpp
b3309
2024-07-05 13:06:13 +08:00
e235b267a2 py : switch to snake_case (#8305)
* py : switch to snake_case

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* cont : fix link

* gguf-py : use snake_case in scripts entrypoint export

* py : rename requirements for convert_legacy_llama.py

Needed for scripts/check-requirements.sh

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-05 07:53:33 +03:00
f09b7cb609 Replace get_work_group_size() with a local cache for performance (#8286)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
b3307
2024-07-05 10:32:29 +08:00
a38b884c6c cli: add EOT when user hits Ctrl+C (#8296)
* main: add need_insert_eot

* do not format system prompt if it is empty
b3306
2024-07-04 20:55:03 +02:00
d7fd29fff1 llama : add OpenELM support (#7359)
* Initial OpenELM support (270M only so far)

* Fill out missing entries in llama_model_type_name

* fixup! Initial OpenELM support (270M only so far)

Fix formatting

* llama : support all OpenELM models

* llama : add variable GQA and variable FFN sizes

Some metadata keys can now also be arrays to support setting
their value per-layer for models like OpenELM.

* llama : minor spacing changes

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* llama : use std::array for per-layer hparams

* llama : fix save/load state

* llama : do not print hparams for vocab-only models

* llama : handle n_head == 0

* llama : use const ref for print_f and fix division by zero

* llama : fix t5 uses of n_head and n_ff

* llama : minor comment

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3305
2024-07-04 20:14:21 +03:00
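As a reading aid for the per-layer hparams change above: metadata keys such as head counts can now be arrays, with values held in fixed-size std::array fields. A rough sketch with illustrative names and sizes (not llama.cpp's actual definitions):
```cpp
// Illustrative per-layer hyperparameters for variable GQA / FFN sizes,
// as described for OpenELM. Names and the layer cap are assumptions.
#include <array>
#include <cstdint>

constexpr uint32_t MAX_LAYERS = 512; // illustrative cap

struct per_layer_hparams {
    uint32_t n_layer = 0;
    std::array<uint32_t, MAX_LAYERS> n_head_arr = {}; // attention heads per layer
    std::array<uint32_t, MAX_LAYERS> n_ff_arr   = {}; // feed-forward size per layer

    uint32_t n_head(uint32_t il) const { return n_head_arr[il]; }
    uint32_t n_ff  (uint32_t il) const { return n_ff_arr[il];   }
};
```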
6f63d646c1 tokenize : add --show-count (token) option (#8299)
This commit adds a new option to the tokenize example, --show-count.
When this is set, the total number of tokens is printed to stdout.

This was added as an option because there might be scripts that use the
output from this program, so it is better not to print this information
by default.

The motivation is that it can be useful to find out how many tokens a
file contains, for example when trying to determine prompt input file
sizes for testing.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
b3304
2024-07-04 19:38:58 +03:00
51d2ebadbb build: Export hf-to-gguf as snakecase b3303 2024-07-04 15:39:13 +00:00
1e920018d3 doc: Add context for why we add an explicit pytorch source 2024-07-04 15:39:13 +00:00
01a5f06550 chore: Remove rebase artifacts 2024-07-04 15:39:13 +00:00
07786a61a2 chore: Fixup requirements and build 2024-07-04 15:39:13 +00:00
de14e2ea2b chore: ignore all __pycache__ 2024-07-04 15:39:13 +00:00
821922916f fix: Update script paths in CI scripts 2024-07-04 15:39:13 +00:00
b1c3f26e5e fix: Actually include scripts in build
Not namespaced though :(
2024-07-04 15:39:13 +00:00
b0a46993df build(python): Package scripts with PEP 517 compliance 2024-07-04 15:39:13 +00:00
807b0c49ff Inference support for T5 and FLAN-T5 model families (#5763)
* llama : add inference support and model types for T5 and FLAN-T5 model families

* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()

* common, llama-cli, llama-batched : add support for encoder-decoder models

* convert-hf : handle shared token embeddings tensors in T5Model

* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)

* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model

* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3295
2024-07-04 15:46:11 +02:00
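As context for the encoder-decoder API above, here is a control-flow sketch of how the three new functions might fit together; batch construction and sampling are elided, and only the calls named in the commit are assumed.
```cpp
// Control-flow sketch for the encoder-decoder path (not the actual
// llama-cli implementation). Only llama_encode(), llama_model_has_encoder()
// and llama_model_decoder_start_token() are taken from the commit notes.
#include "llama.h"

static void run_encoder_decoder(llama_context * ctx, const llama_model * model,
                                llama_batch prompt_batch) {
    if (llama_model_has_encoder(model)) {
        // 1. Run the encoder once over the full prompt.
        if (llama_encode(ctx, prompt_batch) != 0) {
            return; // encoding failed
        }
        // 2. Seed the decoder with the model's decoder start token.
        const llama_token dec_start = llama_model_decoder_start_token(model);
        // 3. From here, decoding proceeds as usual: feed dec_start (and each
        //    sampled token after it) to llama_decode() in a loop.
        (void) dec_start;
    } else {
        // Decoder-only models keep the existing llama_decode()-only path.
    }
}
```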
f8c4c0738d tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231)
This commit adds the compile definition `_CRT_SECURE_NO_WARNINGS`
to the root cmake project.

The motivation for this is that currently the following warnings are
displayed when compiling the tests and common cmake subprojects:
```console
test-llama-grammar.cpp
C:\llama.cpp\src\.\llama.cpp(1406,77): warning C4996: 'strerror':
This function or variable may be unsafe. Consider using strerror_s
instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See
online help for details.
[C:\llama.cpp\build\tests\test-llama-grammar.vcxproj]
...
```

This compile definition is currently set for the `src` subproject,
and this change moves it into the root cmake project so that it is
applied to all cmake subprojects.
b3294
2024-07-04 13:53:42 +03:00
402d6feffa llama : suppress unref var in Windows MSVC (#8150)
* llama : suppress unref var in Windows MSVC

This commit suppresses two warnings that are currently generated for
src/llama.cpp when building on Windows with MSVC:

```console
C:\llama.cpp\src\llama.cpp(14349,45): warning C4101: 'ex':
unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
C:\llama.cpp\src\llama.cpp(19285,44): warning C4101: 'e':
unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
```

* Update src/llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3293
2024-07-04 13:50:57 +03:00
20fc3804bf convert : fix gemma v1 tokenizer convert (#8248)
ggml-ci
b3292
2024-07-04 10:41:03 +03:00
f619024764 [SYCL] Remove unneeded semicolons (#8280) b3291 2024-07-04 09:07:19 +08:00
d23287f122 Define and optimize RDNA1 (#8085) b3290 2024-07-04 01:02:58 +02:00
5f2d4e60e2 ppl : fix n_seq_max for perplexity (#8277)
* ppl : fix n_seq_max for perplexity

* use 1 seq for kl_divergence
b3289
2024-07-03 20:33:31 +03:00
916248af1f fix phi 3 conversion (#8262) 2024-07-03 16:01:54 +02:00
f8d6a23804 fix typo (#8267)
Co-authored-by: Judd <foldl@boxvest.com>
b3287
2024-07-03 14:40:16 +02:00
fadde67135 Dequant improvements rebase (#8255)
* Single load for half2

* Store scales in local mem

* Vec load quantized values
b3286
2024-07-03 09:55:34 +08:00
a27152b602 fix: add missing short command line argument -mli for multiline-input (#8261) b3285 2024-07-02 22:56:46 +02:00
3e2618bc7b Add a step to the clean target that removes legacy binary names, reducing upgrade / migration confusion arising from #7809. (#8257) b3284 2024-07-02 13:19:56 -04:00
07a3fc0608 Remove multiple newlines at the end of files that were breaking the editorconfig step of CI. (#8258) b3283 2024-07-02 12:18:10 -04:00
968967376d Add JAIS model(s) (#8118)
* Add `JAIS` model(s)

* cleanup

* address review comments

* remove hack

* un-hardcode max-alibi-bias

* minor tweaks

---------

Co-authored-by: fmz <quic_fzaghlou@quic.com>
b3282
2024-07-02 16:36:00 +02:00
023b8807e1 convert-hf : print output file name when completed (#8181)
* convert-hf : print output file name when completed

This commit adds the output file name to the log message when the
conversion is completed.

The motivation for this change is that when the `--outfile` option is not
specified, it might not be obvious where the output file is written.

With this change the output of running the script will be something like
the following:
```console
INFO:hf-to-gguf:Model successfully exported to models/gemma-2-9b-it.gguf.
```

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Updates the output to support printing the directory if the output is
split into multiple files. Also, the output file name is now retrieved
from the model_instance object.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Use parent attribute of Path object and string interpolation.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Use os.sep instead of hardcoding the path separator.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-02 09:40:49 +03:00
0e0590adab cuda : update supports_op for matrix multiplication (#8245) b3280 2024-07-02 09:39:38 +03:00
a9f3b10215 [SYCL] Fix win build conflict of math library (#8230)
* fix win build conflict of math library

* fix the condition: !(win32 & SYCL)

* revert warp_size=16
b3279
2024-07-02 12:50:07 +08:00
d08c20edde [SYCL] Fix the sub group size of Intel (#8106)
* use warp_size macro for all sycl kernels

* fix mask of permute_sub_group_by_xor

* fix rms_norm with correct warp number

* fix rms_norm_f32/group_norm_f32

* move norm to norm.cpp file

* fix quantize bug

* fix mmvq's batch size
b3278
2024-07-02 10:16:00 +08:00
5fac350b9c Fix gemma2 tokenizer convert (#8244)
* fix gemma2 tokenizer convert

* remove scores

* improve code, fix new line issue
2024-07-02 01:07:23 +02:00
cb5fad4c6c CUDA: refactor and optimize IQ MMVQ (#8215)
* CUDA: refactor and optimize IQ MMVQ

* uint -> uint32_t

* __dp4a -> ggml_cuda_dp4a

* remove MIN_CC_DP4A checks

* change default

* try CI fix
b3276
2024-07-01 20:39:06 +02:00
dae57a1ebc readme: add Paddler to the list of projects (#8239) 2024-07-01 20:13:22 +03:00