Commit Graph

3324 Commits

Author SHA1 Message Date
91f8ad167d Server: version bump for httplib and json (#6169)
* server: version bump for httplib and json

* fix build

* bring back content_length
b2474
2024-03-20 13:30:36 +01:00
6b7e76d28c gitignore : ignore curl-related files 2024-03-20 14:17:34 +02:00
bc0baab2ea server : allow to override -ngl in tests (#6170) 2024-03-20 14:14:32 +02:00
d795988d9e Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)"
This reverts commit f8c4e745e1.
b2471
2024-03-20 13:29:49 +02:00
f8c4e745e1 llava : add a MobileVLM_V2-1.7B backup (#6152)
* Add MobileVLM_V2 backup

* Update MobileVLM-README.md

* Update examples/llava/MobileVLM-README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update examples/llava/convert-image-encoder-to-gguf.py

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* clip : fix whitespace

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 13:20:37 +02:00
47cc7a7bf9 Server: Handle n_keep parameter in the request (#6174) 2024-03-20 12:02:34 +01:00
bd60d82d0c server tests : more pythonic process management; fix bare except: (#6146)
* server tests : remove seemingly redundant newlines in print()

* server tests : use built-in subprocess features, not os.kill and psutil

* server tests : do not catch e.g. SystemExit; use print_exc

* server tests: handle TimeoutExpired exception

* server tests: fix connect on dual-stack systems

* server: tests: add new regex for tokens generated on Windows, following the repeat penalties default change in #6127

* server: tests: remove the Windows hack, since we now get the correct socket family

* server: tests: add new tokens regex following the repeat penalties default change in #6127

---------

Co-authored-by: Pierrick HYMBERT <pierrick.hymbert@gmail.com>
b2468
2024-03-20 06:33:49 +01:00
6c0b287748 update readme sycl for new update (#6151)
* update readme sycl for new update

* Update README-sycl.md

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>

* update per review comments

* update w64devkit link

* update the verify device id section

* Update README-sycl.md

Co-authored-by: Meng, Hengyu <airdldl@163.com>

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-03-20 11:21:41 +08:00
d26e8b669d increase igpu cluster limit (#6159) b2466 2024-03-20 08:28:49 +05:30
d8b009a945 Remove unneeded header file. (#6158) b2465 2024-03-19 17:16:09 +01:00
d0d5de42e5 gguf-split: split and merge gguf per batch of tensors (#6135)
* gguf-split: split and merge gguf files per tensor

* gguf-split: build with make toolchain

* gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set the general.split_count KV in all splits

* split : minor style + fix compile warnings

* gguf-split: remove --upload, which is not implemented

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-19 12:05:44 +01:00
b80cf3b2d1 common : disable repeat penalties by default (#6127) b2463 2024-03-19 10:21:54 +02:00
970a48060a ci : exempt some labels from being tagged as stale (#6140) b2462 2024-03-19 10:06:54 +02:00
4c28b82529 common : print usage on '-h' and '--help' (#6145) b2461 2024-03-19 07:59:36 +02:00
2d15886bb0 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)
  → 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)
b2460
2024-03-18 18:51:30 +00:00
d199ca79f2 mpt : implement backwards compatibility with duped output tensor (#6139) b2459 2024-03-18 12:49:02 -04:00
104f5e0fc1 clip : fix memory leak (#6138) b2458 2024-03-18 17:40:22 +02:00
5e1b7f94a0 backend : set max split inputs to GGML_MAX_SRC (#6137) b2457 2024-03-18 16:33:44 +01:00
ac9ee6a4ad ci : disable stale issue messages (#6126) b2456 2024-03-18 13:45:38 +02:00
4f6d1337ca ci : temporary disable sanitizer builds (#6128) b2455 2024-03-18 13:45:27 +02:00
2bf8d0f7c4 backend : offload large batches to GPU (#6083)
* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
b2454
2024-03-18 11:03:04 +01:00
496bc79bc2 common : tidy-up argument parsing (#6105)
* Tidy-up argument parsing.

* Missing ref.

* common : minor

* common : add static classifier

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b2453
2024-03-18 10:27:44 +02:00
9b03719ad7 convert : add support for CamembertModel architecture (#6119)
Adding support for the CamembertModel architecture used by:
https://huggingface.co/dangvantuan/sentence-camembert-large
2024-03-18 10:17:00 +02:00
3a6efdd03c convert : use f32 outtype for bf16 tensors (#6106)
The old behaviour is to use f16, but converting bf16 to f16 is not lossless: bf16 keeps f32's 8-bit exponent, so values outside f16's much narrower range overflow or underflow.
Change the outtype to f32 so the default conversion is lossless.
2024-03-18 10:04:41 +02:00
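A minimal sketch (not from the commit above; assumes PyTorch is available for bfloat16 support, with placeholder values) showing why the bf16 to f16 path is lossy while bf16 to f32 is not:

    # Illustration with placeholder values: bf16 shares f32's 8-bit exponent,
    # so values that fit comfortably in bf16 can fall outside f16's range.
    import torch

    x = torch.tensor([3.0e38, 1.0e-30], dtype=torch.bfloat16)
    print(x.to(torch.float16))   # 3e38 overflows to inf, 1e-30 underflows to 0 (f16 max is ~65504)
    print(x.to(torch.float32))   # every bf16 value is represented exactly in f32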
d01b3c4c32 common: llama_load_model_from_url using --model-url (#6098)
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b2450
2024-03-17 19:12:37 +01:00
cd776c37c9 ci : close all stale issues at once (#6115) b2449 2024-03-17 18:51:57 +01:00
dc0f612548 ggml : fix error when finding the transfer queue family index (#6094)
Co-authored-by: GainLee <ligen@meizu.com>
b2448
2024-03-17 18:12:22 +01:00
c47cf414ef ggml : add AVX512F SIMD (#6088) b2447 2024-03-16 17:52:02 +02:00
b5f4ae09c3 gritlm : add initial README.md (#6086)
* gritlm: add initial README.md to examples/gritlm

This commit adds a suggestion for an initial README.md for the gritlm
example.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Use the `scripts/hf.sh` script to download the model file.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Fix editorconfig-checker error in examples/gritlm/README.md.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-03-16 17:46:29 +02:00
dfbfdd60f9 readme : add wllama as a wasm binding (#6100) 2024-03-16 17:42:08 +02:00
15961ec04d common : refactor nested if causing error C1061 on MSVC (#6101)
* Refactor nested if causing error C1061 on MSVC.

* Revert back and remove the else branches.

* Add flag to track found arguments.
b2444
2024-03-16 17:39:15 +02:00
a56d09a440 ci : close inactive issue with workflow (#6053)
* issues: ci - close inactive issue with workflow

* ci: close issue, change workflow schedule time
2024-03-16 14:20:53 +02:00
d84c48505f llama : fix Baichuan2 13B (#6092) 2024-03-15 23:14:16 +02:00
877b4d0c62 llama : add support for control vectors (#5970)
* control vector api and implementation

* control-vectors : minor code style updates

* disable control vector when data == nullptr

use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-15 22:43:02 +02:00
12247f4c69 llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
	https://huggingface.co/CohereForAI/c4ai-command-r-v01

Based on the llama2 model with a few changes:

1) New hyperparameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel. There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used

Find GGUF files here:
	https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF

To convert model to GGUF format yourself:

1) Download Command-R Hugging Face safetensors:
	git lfs install
	git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01

2) Run:
	python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
b2440
2024-03-15 22:41:22 +02:00
4e9a7f7f7f llava : change API to pure C style for Rust FFI bindgen (#6079)
Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>
b2439
2024-03-15 16:31:05 +02:00
3020327f6c cuda : disable unused cudaLaunchHostFunc code (#6078) b2438 2024-03-15 14:24:03 +02:00
46acb36767 fix error when setting the main GPU (#6073) b2437 2024-03-15 18:53:53 +08:00
131b058409 make : ggml-metal.o depends on ggml.h b2436 2024-03-15 11:38:40 +02:00
753e36f650 [SYCL] Fix non-intel device selection (#6042)
* Fix non-intel device selection

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
b2435
2024-03-15 14:56:20 +05:30
7ce2c77f88 gguf : add support for I64 and F64 arrays (#6062)
* gguf : add support for I64 and F64 arrays

GGML currently does not support I64 or F64 arrays, and they are not often
used in machine learning. However, if the need arises in the future, it is
convenient to add them now, so that the types sit next to the other types
I8, I16, I32 in the enums, and so that their type numbers are reserved.

Furthermore, with this addition the GGUF format becomes very usable for
most computational applications of NumPy (being compatible with the most
common NumPy dtypes: i8, i16, i32, i64, f32, f64), providing a faster and
more versatile alternative to the `npz` format, and a simpler alternative
to the `hdf5` format.

The change in this PR seems small and does not significantly increase the
maintenance burden. I tested this from Python using GGUFWriter/Reader and
`gguf-dump`, as well as from C; everything seems to work.

* Fix compiler warnings
b2434
2024-03-15 10:46:51 +02:00
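A minimal sketch (not from the commit above; assumes the gguf-py GGUFWriter/GGUFReader API mentioned in it, with placeholder file, architecture, and tensor names) of the i64/f64 round-trip that the change enables:

    # Hypothetical round-trip of int64/float64 tensors via gguf-py; the file
    # name, architecture string and tensor names are placeholders.
    import numpy as np
    from gguf import GGUFWriter, GGUFReader

    writer = GGUFWriter("example.gguf", "example")
    writer.add_tensor("ids_i64", np.arange(8, dtype=np.int64))
    writer.add_tensor("vals_f64", np.linspace(0.0, 1.0, 8, dtype=np.float64))
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.write_tensors_to_file()
    writer.close()

    reader = GGUFReader("example.gguf")
    for t in reader.tensors:
        print(t.name, t.data.dtype, t.shape)   # dtypes come back as int64 / float64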
aab606a11f llama : add Orion chat template (#6066) b2433 2024-03-15 10:44:57 +02:00
b0bc9f4a9d llama-bench : use random tokens to improve accuracy with mixtral (#6069) b2432 2024-03-15 10:22:24 +02:00
4755afd1cb llama : fix integer overflow during quantization (#6063) b2431 2024-03-14 22:58:41 +02:00
6e0438da3c gguf : fix resource leaks (#6061)
There are several places where a gguf context is allocated. A call to gguf_free
is missing in some error paths. Also, on Linux, llama-bench was missing an
fclose.
b2430
2024-03-14 20:29:32 +02:00
727107707a gguf-py : bump version to 0.8.0 (#6060) 2024-03-14 19:57:31 +02:00
69ff61397d llama : support models without vocabulary (#5798)
* additional methods to read model and ctx parameters

* vocab size as part of the model metadata

* models without vocabulary, convert.py part

* models without vocabulary, llama.cpp part

* PR clean up

* converter script fixes

* llama_vocab_type update (renamed the new key)

* pr review fixes

* revert function renaming

* one more NoVocab assert
b2428
2024-03-14 18:21:56 +02:00
044ec4b2a5 embedding : add EOS token if not present (#899) b2427 2024-03-14 15:14:14 +02:00
77178eedc8 gguf-py : fix dtype check (#6045) 2024-03-14 13:32:14 +02:00
15a333260a readme : improve readme for Llava-1.6 example (#6044)
Co-authored-by: Jian Liao <jianliao@adobe.com>
2024-03-14 13:18:23 +02:00