Commit Graph

1505 Commits

Author SHA1 Message Date
d96ca7ded7 server : fix crash when prompt exceeds context size (#3996) b1505 2023-11-10 23:48:21 -06:00
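A note on the fix above: the crash is avoided by truncating oversized prompts rather than letting them overflow the context window. A minimal Python sketch in the spirit of #3996 (the server's actual truncation logic differs in detail; n_keep is the server parameter for how many leading prompt tokens to preserve):

```python
def truncate_prompt(tokens: list[int], n_ctx: int, n_keep: int) -> list[int]:
    # Hedged sketch of the idea behind #3996: if the prompt is longer than
    # the context window, keep the first n_keep tokens plus as much of the
    # tail as still fits, instead of crashing on an oversized prompt.
    if len(tokens) <= n_ctx:
        return tokens
    n_keep = min(n_keep, n_ctx)
    n_tail = n_ctx - n_keep
    return tokens[:n_keep] + (tokens[-n_tail:] if n_tail else [])
```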
34b0a08207 gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)
* gguf-py: Refactor and add file reading support

* Replay changes from #3871

Credit to @cebtenzzre for that pull

* Various type annotation fixes.

* sort imports with isort (again)

* Fix missing return statement in add_tensor

* style cleanup with flake8

* fix NamedTuple and Enum usage

* Fix an issue with state init in GGUFReader

Move examples to an examples/ directory

Clean up examples

Add an example of modifying keys in a GGUF file

Update documentation with info on examples

Try to support people importing gguf/gguf.py directly

* Damagage is not a word.

* Clean up gguf-py/examples/modify_gguf.py whitespace

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/examples/modify_gguf.py formatting

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/gguf/gguf_reader.py type hint

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Make examples executable, formatting changes

* Add more information to GGUFReader and examples comments

* Include a gguf Python package version bump

* Add convert-gguf-endian.py script

* cleanup

* gguf-py : bump minor version

* Reorganize scripts

* Make GGUFReader endian detection less arbitrary

* Add JSON dumping support to gguf-dump.py

Which I kind of regret now

* A few gguf-dump.py cleanups

* Murder accidental tuple in gguf-py/scripts/gguf-dump.py

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* cleanup

* constants : remove unneeded type annotations

* fix python 3.8 compat

* Set up gguf- scripts in pyproject.toml

* And include scripts/__init__.py, derp

* convert.py: We can't currently support Q8_0 on big endian.

* gguf-py: SpecialVocab: Always try available sources for special token ids

gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json

gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata

* cleanup

* Promote add_X_token to GGUF metadata for BOS and EOS

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-11 08:04:50 +03:00
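A minimal sketch of the reading API introduced by #3981, assuming the gguf package is installed (`pip install gguf`) and exposes `GGUFReader` with `fields` and `tensors` attributes as its bundled examples describe:

```python
from gguf import GGUFReader

# Hedged sketch of reading an existing GGUF file with the refactored
# gguf-py package (#3981); attribute names follow its examples directory.
reader = GGUFReader("model.gguf")  # hypothetical path

for name, field in reader.fields.items():  # metadata key-value pairs
    print(f"{name}: types={field.types}")

for tensor in reader.tensors:  # tensor directory
    print(f"{tensor.name}: shape={tensor.shape}, type={tensor.tensor_type.name}")
```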
4a4fd3eefa server : allow continue edit on completion mode (#3950)
* server : allow continue edit on completion mode

* server : handle abort case in runCompletion

* server : style improvement
b1503
2023-11-10 16:49:33 -06:00
df9d1293de Unbreak persimmon after #3837 (#4010) b1502 2023-11-10 14:24:54 +01:00
a75fa576ab scripts: Generalize convert scripts (#3838)
* Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py
2023-11-09 11:09:29 +01:00
57ad015dc3 server : add min_p param (#3877)
* Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841

* Use spaces instead of tabs

* Update index.html.hpp after running deps.sh

* Fix test - fix line ending
b1500
2023-11-08 20:00:34 -06:00
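For context on min_p (introduced in #3841): it keeps only tokens whose probability is at least min_p times that of the most likely token. A small illustrative sketch, not the server's actual implementation:

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    # Illustrative min_p sampling (#3841): discard tokens whose probability
    # falls below min_p * max(probs), then renormalize the survivors.
    threshold = min_p * probs.max()
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()
```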
875fb42871 ggml-alloc : fix backend assignments of views (#3982) b1499 2023-11-08 13:15:14 +01:00
0a7c980b6f gguf : track writer state, free unneeded tensors, cleanup (#3871) 2023-11-07 12:43:04 -05:00
413503d4b9 make : do not add linker flags when compiling static llava lib (#3977) b1497 2023-11-07 20:25:32 +03:00
e9c1cecb9d ggml : fix backward rope after YaRN (#3974)
* fix backward process of rope

the rope backward process was broken after the YaRN RoPE (#2268) implementation, due to missing changes in the backward functions.

the code for the backward process is nearly identical to the forward process:
the only difference is the sign of the sin-values.

to avoid future regressions, remove the near-duplicate backward functions and reuse the forward code:

for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`.
the sin-values will be negated when forward is false.

* fix finetune rope call to use correct default attn_factor of 1.0f

* remove unused `ggml_rope_xpos_back`

it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.

* fix comments explaining the sine sign in ggml_forward_rope

* add missing function arguments in declaration

* fix function argument type in declaration
b1496
2023-11-07 10:04:51 +02:00
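The fix above shares one rotation routine between passes, negating the sin term for the backward direction. A toy numpy sketch of that pattern (not the ggml code itself):

```python
import numpy as np

def rope_rotate_pair(x0, x1, theta, forward=True):
    # Toy sketch of the #3974 approach: forward and backward RoPE share
    # the same code path, with the sin term negated when forward=False.
    sin_t = np.sin(theta) if forward else -np.sin(theta)
    cos_t = np.cos(theta)
    return x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t

# The backward rotation undoes the forward one:
a, b = rope_rotate_pair(1.0, 0.0, 0.3)
x0, x1 = rope_rotate_pair(a, b, 0.3, forward=False)  # back to (1.0, 0.0)
```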
54b4df8886 Use params when loading models in llava-cli (#3976)
llava-cli was loading models with default params and ignoring settings
from the cli. This switches to a generic function to load the params
from the cli options.
b1495
2023-11-07 10:43:59 +03:00
46876d2a2c cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
* prototyping the idea of supporting running on CPU for a GGML_USE_CUBLAS=on build

* doc: add comments to ggml_cublas_loaded()

* fix defined(...)
b1494
2023-11-07 08:49:08 +02:00
381efbf480 llava : expose as a shared library for downstream projects (#3613)
* wip llava python bindings compatibility

* add external llava API

* add base64 in-prompt image support

* wip refactor image loading

* refactor image load out of llava init

* cleanup

* further cleanup; move llava-cli into its own file and rename

* move base64.hpp into common/

* collapse clip and llava libraries

* move llava into its own subdir

* wip

* fix bug where base64 string was not removed from the prompt

* get libllava to output in the right place

* expose llava methods in libllama.dylib

* cleanup memory usage around clip_image_*

* cleanup and refactor *again*

* update headerdoc

* build with cmake, not tested (WIP)

* Editorconfig

* Editorconfig

* Build with make

* Build with make

* Fix cyclical deps on Windows

* attempt to fix build on Windows

* attempt to fix build on Windows

* Upd TODOs

* attempt to fix build on Windows+CUDA

* Revert changes in cmake

* Fix according to review comments

* Support building as a shared library

* address review comments

---------

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
b1493
2023-11-07 00:36:23 +03:00
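One of the features listed above is base64 in-prompt image support. A hedged sketch of how a client might splice an image into a prompt as a data URI; the `<image>` placeholder and the exact `<img src=...>` marker that llava recognizes are illustrative assumptions, not the library's confirmed syntax:

```python
import base64

def embed_image(prompt: str, image_path: str) -> str:
    # Hedged sketch of base64 in-prompt image support (#3613). The
    # "<image>" placeholder and the data-URI marker are assumptions,
    # not the exact format llava parses.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return prompt.replace("<image>", f'<img src="data:image/jpeg;base64,{b64}">')
```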
2833a6f63c ggml-cuda : fix f16 mul mat (#3961)
* ggml-cuda : fix f16 mul mat

ggml-ci

* silence common.cpp warning (bonus)
b1492
2023-11-05 18:45:16 +01:00
d9ccce2e33 Allow common process_escapes to handle \x sequences (#3928)
* Allow common process_escapes to handle \x sequences

* Fix edge case when second hex digit is NUL
b1491
2023-11-05 10:06:06 -07:00
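A Python re-implementation sketch of the \x escape handling described above, including the edge case fixed in the PR where the escape sequence is cut short at the end of the string (the NUL-terminator analog in C):

```python
def process_escapes(s: str) -> str:
    # Illustrative \xNN escape handling in the spirit of #3928. The key
    # detail is bounds-checking both hex digits so a truncated sequence
    # at the end of the string is copied literally instead of misread.
    hex_digits = "0123456789abcdefABCDEF"
    out, i = [], 0
    while i < len(s):
        seq = s[i + 2 : i + 4]
        if (s[i] == "\\" and s[i + 1 : i + 2] == "x"
                and len(seq) == 2 and seq[0] in hex_digits and seq[1] in hex_digits):
            out.append(chr(int(seq, 16)))
            i += 4
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```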
bb60fd0bf6 server : fix typo for --alias shortcut from -m to -a (#3958) 2023-11-05 18:15:27 +02:00
132d25b8a6 cuda : fix disabling device with --tensor-split 1,0 (#3951)
Co-authored-by: slaren <slarengh@gmail.com>
b1489
2023-11-05 10:08:57 -05:00
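For reference, --tensor-split assigns each device a fraction of the model's rows, and a share of 0 must leave that device with no rows at all (the case fixed above). A rough Python sketch of proportional row assignment, not the CUDA backend's actual code:

```python
def split_rows(nrows: int, tensor_split: list[float]) -> list[tuple[int, int]]:
    # Rough sketch of --tensor-split semantics (cf. #3951): divide nrows
    # across devices proportionally; a 0 share yields an empty range.
    total = sum(tensor_split)
    ranges, start, acc = [], 0, 0.0
    for share in tensor_split:
        acc += share / total
        end = round(acc * nrows)
        ranges.append((start, end))
        start = end
    return ranges

print(split_rows(100, [1.0, 0.0]))  # [(0, 100), (100, 100)] -> device 1 gets nothing
```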
3d48f42efc llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)
as done in https://github.com/ggerganov/llama.cpp/pull/3827
b1488
2023-11-05 14:40:08 +02:00
Eve
c41ea36eaa cmake : MSVC instruction detection (fixed up #809) (#3923)
* Add detection code for avx

* Only check hardware when option is ON

* Modify per code review suggestions

* Building locally will detect the CPU

* Fixes CMake style to use lowercase like everywhere else

* cleanup

* fix merge

* linux/gcc version for testing

* msvc combines avx2 and fma into /arch:AVX2 so check for both

* cleanup

* msvc only version

* style

* Update FindSIMD.cmake

---------

Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
b1487
2023-11-05 10:03:09 +02:00
Eve
a7fac013cf ci : use intel sde when ci cpu doesn't support avx512 (#3949) b1486 2023-11-05 09:46:44 +02:00
48ade94538 cuda : revert CUDA pool stuff (#3944)
* Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"

This reverts commit 629f917cd6.

* Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"

This reverts commit d6069051de.

ggml-ci
b1485
2023-11-05 09:12:13 +02:00
f28af0d81a gguf-py: Support 01.AI Yi models (#3943) 2023-11-04 16:20:34 -06:00
d9b33fe95b metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) b1483 2023-11-03 21:18:18 +02:00
5ba3746171 ggml-metal: fix yarn rope (#3937) 2023-11-03 14:00:31 -04:00
abb77e7319 ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) b1481 2023-11-03 12:13:09 +01:00
8f961abdc4 speculative : change default p_accept to 0.5 + CLI args (#3919)
ggml-ci
2023-11-03 09:41:56 +02:00
05816027d6 common : YAYF (yet another YARN fix) (#3925)
ggml-ci
2023-11-03 09:24:00 +02:00
3fdbe6b66b llama : change yarn_ext_factor placeholder to -1 (#3922) 2023-11-03 08:31:58 +02:00
629f917cd6 cuda : add ROCM aliases for CUDA pool stuff (#3918) b1477 2023-11-02 21:58:22 +02:00
51b2fc11f7 cmake : fix relative path to git submodule index (#3915) b1476 2023-11-02 21:40:31 +02:00
224e7d5b14 readme : add notice about #3912 2023-11-02 20:44:12 +02:00
c7743fe1c1 cuda : fix const ptrs warning causing ROCm build issues (#3913) b1474 2023-11-02 20:32:11 +02:00
d6069051de cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)
* Using cuda memory pools for async alloc/dealloc.

* If the CUDA device doesn't support memory pools, then use the old implementation.

* Removed redundant cublasSetStream

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
b1473
2023-11-02 19:10:39 +02:00
4ff1046d75 gguf : print error for GGUFv1 files (#3908) b1472 2023-11-02 16:22:30 +02:00
21958bb393 cmake : disable LLAMA_NATIVE by default (#3906) b1471 2023-11-02 14:10:33 +02:00
2756c4fbff gguf : remove special-case code for GGUFv1 (#3901)
ggml-ci
b1470
2023-11-02 11:20:21 +02:00
1efae9b7dc llm : prevent 1-D tensors from being GPU split (#3697) b1469 2023-11-02 09:54:44 +02:00
b12fa0d1c1 build : link against build info instead of compiling against it (#3879)
* cmake : fix build when .git does not exist

* cmake : simplify BUILD_INFO target

* cmake : add missing dependencies on BUILD_INFO

* build : link against build info instead of compiling against it

* zig : make build info a .cpp source instead of a header

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>

* cmake : revert change to CMP0115

---------

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
b1468
2023-11-02 08:50:16 +02:00
4d719a6d4e cuda : check if this fixes Pascal card regression (#3882) b1467 2023-11-02 08:35:10 +02:00
183b3fac6c metal : fix build errors and kernel sig after #2268 (#3898) b1466 2023-11-02 08:33:37 +02:00
2fffa0d61f cuda : fix RoPE after #2268 (#3897) b1465 2023-11-02 07:49:44 +02:00
0eb332a10f llama : fix llama_context_default_params after #2268 (#3893) b1464 2023-11-01 19:29:14 -04:00
d02e98cde0 ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)
* ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel

* fix warnings
b1463
2023-11-01 23:10:09 +01:00
898aeca90a llama : implement YaRN RoPE scaling (#2268)
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Jeffrey Quesnelle <jquesnelle@gmail.com>
b1462
2023-11-01 18:04:33 -04:00
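YaRN (#2268) blends interpolated and extrapolated rotary frequencies using a linear ramp per dimension pair and rescales attention magnitude by a log factor of the context stretch. An illustrative sketch of those two pieces, following the shape of the ggml implementation but not reproducing it verbatim:

```python
import math

def yarn_ramp(low: float, high: float, i: int) -> float:
    # Linear ramp in [0, 1] deciding, per dimension pair i, how much
    # interpolation vs. extrapolation to apply (illustrative, after #2268).
    if low == high:
        high += 0.001  # avoid division by zero
    y = (i / 2 - low) / (high - low)
    return 1.0 - min(1.0, max(0.0, y))

def yarn_attn_factor(scale: float) -> float:
    # YaRN's attention magnitude correction: grows logarithmically with
    # the context-stretch factor; no correction when not stretching.
    return 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0
```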
c43c2da8af llm : fix llm_build_kqv taking unused tensor (benign, #3837) b1461 2023-11-01 23:08:30 +02:00
523e49b111 llm : fix falcon norm after refactoring (#3837) b1460 2023-11-01 23:00:50 +02:00
e16b9fa4ba metal : multi-simd softmax (#3710)
ggml-ci
b1459
2023-11-01 21:25:00 +02:00
ff8f9a88da common : minor (#3715) b1458 2023-11-01 21:15:55 +02:00
50337961a6 llm : add llm_build_context (#3881)
* llm : add llm_build_context

* llm : deduce norm eps based on type + explicit max_alibi_bias, clamp_kqv

* llm : restore the non-graph llm_build_ functional API

ggml-ci

* llm : cleanup + comments
b1457
2023-11-01 20:11:02 +02:00
0e40806c1c common : allow caller to handle help/argument exceptions (#3715)
* Allow caller to handle help/argument exceptions

* Prepend newline to usage output

* Add new gpt_params_parse_ex function to hide arg-parse impl

* Fix issue blocking success case

* exit instead of returning false

* Update common/common.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update common/common.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b1456
2023-11-01 19:42:01 +02:00
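The same idea in Python terms: a parser that raises instead of exiting, so the caller decides how to handle help output or bad arguments. A sketch of the pattern from #3715, assuming Python 3.9+ for exit_on_error=False; the argument names are placeholders:

```python
import argparse

class HelpRequested(Exception):
    pass

def parse_params_ex(argv: list[str]) -> argparse.Namespace:
    # Sketch of the caller-handles-exceptions idea from #3715: raise on
    # --help and on bad values instead of exiting inside the parser.
    parser = argparse.ArgumentParser(add_help=False, exit_on_error=False)
    parser.add_argument("-m", "--model", default="model.gguf")  # placeholder arg
    parser.add_argument("-h", "--help", action="store_true")
    args = parser.parse_args(argv)  # invalid values raise argparse.ArgumentError
    if args.help:
        raise HelpRequested("\n" + parser.format_help())  # newline prepended
    return args

try:
    params = parse_params_ex(["-h"])
except HelpRequested as e:
    print(e)  # the caller, not the parser, decides what to do with help
```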