Commit Graph

3734 Commits

SHA1 Message Date
d7c042d1ae ggml : make n_threads_cur atomic_int 2024-09-11 21:12:11 +03:00
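A minimal sketch (not the actual ggml code) of why this commit matters: making the current thread count atomic lets one thread resize the pool while workers read it without a data race. The name n_threads_cur is borrowed from the commit title; everything else here is illustrative.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    std::atomic<int> n_threads_cur{4};

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&, i] {
            // each worker reads the current count; a plain int here would be
            // undefined behavior if another thread writes it concurrently
            if (i < n_threads_cur.load(std::memory_order_relaxed)) {
                std::printf("worker %d active\n", i);
            }
        });
    }

    // the main thread can shrink the pool for the next graph without locking
    n_threads_cur.store(2, std::memory_order_relaxed);

    for (auto & t : workers) t.join();
}
```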
1b28061400 llama : skip token bounds check when evaluating embeddings (#9437) b3733 2024-09-11 17:52:13 +02:00
8db003a19d py : support converting local models (#7547)
* Added support for converting local models to convert-hf-to-gguf-update.py

* Description fixed

* shutil added to imports
2024-09-11 15:29:51 +03:00
0996c5597f llava : correct args for minicpmv-cli (#9429) b3731 2024-09-11 12:59:13 +02:00
5bb2c5dbd2 files : remove accidentally added lora_test submodule (#9430) 2024-09-11 13:02:09 +03:00
67155ab7f5 feat: Implements retry logic for downloading models using --model-url flag (#9255)
* feat: Implements retry logic for downloading models using --model-url flag

* Update common/common.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Update common/common.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* apply comments

* implements a retry function to avoid duplication

* fix editorconfig

* change function name

---------

Co-authored-by: farbod <farbod.bjary82@gmail.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
b3729
2024-09-11 11:22:37 +02:00
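A hypothetical sketch of a retry helper in the spirit of this commit; the actual implementation lives in common/common.cpp and differs in details. The function name with_retries and the fake download lambda are stand-ins, not the real API.

```cpp
#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>

// retries op() up to max_attempts times, sleeping between attempts
static bool with_retries(const std::function<bool()> & op, int max_attempts,
                         std::chrono::milliseconds delay) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (op()) {
            return true;
        }
        std::fprintf(stderr, "attempt %d/%d failed, retrying...\n", attempt, max_attempts);
        std::this_thread::sleep_for(delay);
    }
    return false;
}

int main() {
    int calls = 0;
    // stand-in for the actual HTTP download triggered by --model-url
    auto fake_download = [&] { return ++calls >= 3; };

    const bool ok = with_retries(fake_download, 5, std::chrono::milliseconds(100));
    std::printf("download %s after %d call(s)\n", ok ? "succeeded" : "failed", calls);
}
```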
5af118efda CUDA: fix --split-mode row race condition (#9413) b3728 2024-09-11 10:22:40 +02:00
d2b496bff4 batched-bench : remove unused code (#9305) b3727 2024-09-11 10:03:54 +03:00
b34e023480 musa: remove Clang builtins mapping (#9421)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b3726
2024-09-11 03:46:55 +02:00
51b6038636 sycl : update support conditions (#9394)
* sycl : update support condition for im2col

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* Added a TODO as a reminder to support FP32 im2col

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
b3725
2024-09-11 08:53:42 +08:00
cb9c933eb2 flake.lock: Update (#9360)
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
  → 'github:hercules-ci/flake-parts/567b938d64d4b4112ee253b9274472dc3a346eb6?narHash=sha256-%2Bebgonl3NbiKD2UD0x4BszCZQ6sTfL4xioaM49o5B3Y%3D' (2024-09-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'a5d394176e.tar.gz?narHash=sha256-uFf2QeW7eAHlYXuDktm9c25OxOyCoUOQmh5SZ9amE5Q%3D' (2024-08-01)
  → '356624c120.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)
  → 'github:NixOS/nixpkgs/574d1eac1c200690e27b8eb4e24887f8df7ac27c?narHash=sha256-v3rIhsJBOMLR8e/RNWxr828tB%2BWywYIoajrZKFM%2B0Gg%3D' (2024-09-06)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-09-10 15:46:59 -07:00
6cd4e03444 arg : bring back missing ifdef (#9411)
* arg : bring back missing ifdef

* replace with llama_supports_gpu_offload
b3723
2024-09-10 22:41:29 +02:00
8d300bd35f enable --special arg for llama-server (#9419)
Co-authored-by: matteo serva <matteo.serva@gmail.com>
b3722
2024-09-10 22:40:59 +02:00
49006c67b4 llama : move random seed generation to the samplers (#9398)
* llama_sampler_penalties : clamp penalty_last_n to zero
b3721
2024-09-10 18:04:25 +02:00
00ba2ff781 metal : fix compile warning with GGML_METAL_NDEBUG (#0) b3720 2024-09-10 10:17:43 +03:00
83008b7cfe llama : update llm_build_copy_mask_state comment [no ci] (#9385)
This commit updates a comment in the copy_mask_state function that seems
to contain a typo or to be outdated, changing the variable n_rs to n_kv.

I believe this change is correct: what the comment wants to convey is to
copy the states that are not going to be used in the upcoming processing,
i.e. the token states from n_seqs up to the number of possible token
states n_kv.
2024-09-10 10:03:21 +03:00
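An illustrative sketch only, using hypothetical plain arrays rather than the real llama.cpp state structures: the idea described above is to preserve the token states that will not be consumed by the upcoming batch, i.e. the range [n_seqs, n_kv).

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n_kv   = 8; // number of possible token states
    const int n_seqs = 3; // states consumed by the upcoming processing

    std::vector<int> states = {10, 11, 12, 13, 14, 15, 16, 17};
    std::vector<int> saved;

    for (int i = n_seqs; i < n_kv; ++i) {
        saved.push_back(states[i]); // preserve the unused tail
    }

    std::printf("preserved %zu of %d states\n", saved.size(), n_kv);
}
```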
0b4ac75772 RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
b3718
2024-09-10 10:02:30 +03:00
fb3f249815 make : do not run llama-gen-docs when building (#9399) b3717 2024-09-10 09:23:33 +03:00
bfe76d4a17 common : move arg parser code to arg.cpp (#9388)
* common : move arg parser to arg.cpp

* better categorize args

* add cmake

* missing climits

* missing cstdarg

* common : more explicit includes

* fix build

* refactor gpt_params_parse

* update server readme

* fix test

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3716
2024-09-09 23:36:09 +02:00
293bebe077 rpc : fix segfault with nkvo (#9389)
* rpc : fix nkvo

* rpc : buf_size must not be static

ref: #9337

---------

Co-authored-by: slaren <slarengh@gmail.com>
b3715
2024-09-09 18:40:10 +03:00
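A toy illustration of the "buf_size must not be static" pitfall fixed above: a static local is initialized exactly once, so later calls with different inputs silently reuse the first value. The real fix in the RPC code is analogous but not identical.

```cpp
#include <cstddef>
#include <cstdio>

static size_t buf_size_wrong(size_t n) {
    static const size_t buf_size = n * 2; // frozen at the first call!
    return buf_size;
}

static size_t buf_size_right(size_t n) {
    const size_t buf_size = n * 2; // recomputed every call
    return buf_size;
}

int main() {
    size_t w1 = buf_size_wrong(4);
    size_t w2 = buf_size_wrong(100); // still 8: the static kept its first value
    std::printf("wrong: %zu then %zu\n", w1, w2);

    size_t r1 = buf_size_right(4);
    size_t r2 = buf_size_right(100); // correctly 200
    std::printf("right: %zu then %zu\n", r1, r2);
}
```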
5fac4d5764 ggml : vector length agnostic SVE support (#9290)
* Implemented vector-length-agnostic SVE using a switch case for 512-bit, 256-bit and 128-bit vector lengths

* Removed whitespace

* ggml : style changes + fix 512-bit nb loop check

- fix local scope in switch cases
- consistent predicate names
- empty lines when necessary
- opening braces, spaces
- const-correctness
- add asserts

* Update ggml/src/ggml-quants.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3714
2024-09-09 18:37:18 +03:00
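A sketch of the vector-length-agnostic dispatch shape this commit describes, assuming an SVE-enabled AArch64 toolchain (compile with e.g. -march=armv8-a+sve). The real kernels in ggml-quants.c are far more involved; only the switch-on-vector-length structure is shown.

```cpp
#include <arm_sve.h>
#include <cstdio>

void kernel_dispatch() {
    switch (svcntb() * 8) { // svcntb() = SVE vector length in bytes
        case 512:
            std::printf("using the 512-bit code path\n");
            break;
        case 256:
            std::printf("using the 256-bit code path\n");
            break;
        case 128:
            std::printf("using the 128-bit code path\n");
            break;
        default:
            std::printf("fallback path for other vector lengths\n");
            break;
    }
}

int main() { kernel_dispatch(); }
```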
5fb5e24811 llama : minor sampling refactor (2) (#9386) b3713 2024-09-09 17:10:46 +02:00
38ca6f644b readme : update hot topics 2024-09-09 15:51:37 +03:00
8e6e2fbe14 CUDA: fix variable name conflict for Windows build (#9382) b3711 2024-09-09 14:22:53 +02:00
5ed087573e readme : add LLMUnity to UI projects (#9381)
* add LLMUnity to UI projects

* add newline to examples/rpc/README.md to fix editorconfig-checker unit test
2024-09-09 14:21:38 +03:00
54f376d0b9 rpc : update README [no ci] (#9320)
Update README with instructions on how to offload model layers to both
local and remote devices
2024-09-09 11:04:39 +03:00
b2e89a3274 Arm AArch64: Documentation updates (#9321)
* Arm AArch64: Documentation updates

* Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels

* Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats

* Add newline to the end of docs/build.md
2024-09-09 10:02:45 +03:00
daa9623ab0 Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118)
* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.

* fix compile issues

* Fix issues where the last submit wasn't executed or handled properly.

* remove trailing whitespace

* Repair GGML_VULKAN_CHECK_RESULTS

* Increase the submit counter only if actual work has been submitted, and increase the submit count to 100.

* Fix nodes that were not checked when GGML_VULKAN_CHECK_RESULTS is enabled.
b3707
2024-09-08 21:43:48 +02:00
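A schematic model (plain C++, deliberately no real Vulkan calls) of the submission policy the commit above describes: flush a command buffer every submit_threshold recorded nodes so the GPU can start executing while later nodes are still being encoded. The threshold of 100 comes from the commit message; the rest is invented for illustration.

```cpp
#include <cstdio>

int main() {
    const int n_nodes          = 250;
    const int submit_threshold = 100; // value mentioned in the commit

    int recorded = 0; // nodes encoded since the last submit
    int submits  = 0;

    for (int node = 0; node < n_nodes; ++node) {
        bool did_work = (node % 5 != 0); // pretend some nodes record no commands
        if (did_work) {
            recorded++; // count only when actual work has been recorded
        }
        if (recorded >= submit_threshold) {
            submits++; // hand the chunk to the GPU early
            recorded = 0;
        }
    }
    if (recorded > 0) {
        submits++; // make sure the last submit is not dropped
    }
    std::printf("%d submits for %d nodes\n", submits, n_nodes);
}
```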
e079bffb66 cuda : fix FA Q src index (1 -> 0) (#9374) b3706 2024-09-08 22:01:02 +03:00
3f7ccfd649 common : bring back missing args, add env var duplication check (#9375)
* common : bring back missing args

* move duplication check to test-arg-parser

* add check for duplicated env var

* correct default values
b3705
2024-09-08 18:08:55 +02:00
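A hypothetical sketch of the kind of duplication check a test can run over the argument table, as this commit adds for environment variables: every registered name must be unique. The env_vars list here is a stand-in for the real definitions in arg.cpp.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

int main() {
    // stand-in for the real arg definitions in arg.cpp
    std::vector<std::string> env_vars = {
        "LLAMA_ARG_MODEL", "LLAMA_ARG_THREADS", "LLAMA_ARG_CTX_SIZE",
    };

    std::set<std::string> seen;
    for (const auto & ev : env_vars) {
        // insert() reports false in .second if the name was already present
        assert(seen.insert(ev).second && "duplicated env var");
    }
    return 0;
}
```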
a249843d89 common : restore --n-gpu-layers (#9371) b3704 2024-09-08 16:44:42 +02:00
19f4a7b296 llama : refactor samplers internal implementation (#9370) b3703 2024-09-08 15:52:07 +02:00
2a358fb0c4 [SYCL] add check malloc result on device (#9346)
* add check malloc result on device

* update for review comments, check all malloc_device() results

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
b3702
2024-09-08 19:05:29 +08:00
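A sketch of the pattern this commit enforces, assuming a SYCL toolchain (e.g. DPC++): every sycl::malloc_device() result is checked before use instead of being dereferenced blindly. The buffer size and error handling are illustrative.

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    sycl::queue q;

    float * buf = sycl::malloc_device<float>(1024, q);
    if (buf == nullptr) {
        // malloc_device can fail, e.g. when device memory is exhausted
        std::fprintf(stderr, "malloc_device failed: out of device memory\n");
        return 1;
    }

    // ... use buf in kernels ...

    sycl::free(buf, q);
    return 0;
}
```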
eae597182c llama : sanitize tokens in the upper bound (#9359) b3701 2024-09-08 12:41:51 +02:00
00b02bb249 imatrix : fix arg parser for imatrix (#9366)
* imatrix : fix arg parser

* beautify printing first arg
b3700
2024-09-08 12:12:17 +02:00
a876861455 metal : update support condition for im2col + fix warning (#0) b3699 2024-09-08 11:05:55 +03:00
385decbd63 sync : ggml 2024-09-08 11:05:55 +03:00
60a3107ccd scripts : option to increase git patch context 2024-09-08 11:05:55 +03:00
406c1a32a1 vulkan: add dryrun support to sin and cos ops (ggml/947)
sin and cos failed test-backend-ops because they
tried to dereference a context pointer that is null
on dry runs.

This commit prevents that segfault.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-08 11:05:55 +03:00
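A generic illustration (not the actual Vulkan backend code) of the guard this commit adds: on a dry run the per-op context may be null, so it must not be dereferenced. The op_context struct and run_sin_op function are hypothetical.

```cpp
#include <cstdio>

struct op_context { int pipeline_id; };

void run_sin_op(const op_context * ctx, bool dryrun) {
    if (dryrun || ctx == nullptr) {
        // only note that the op would execute; touching ctx here was the bug
        std::printf("dry run: skipping execution\n");
        return;
    }
    std::printf("executing with pipeline %d\n", ctx->pipeline_id);
}

int main() {
    run_sin_op(nullptr, /*dryrun=*/true);  // previously segfaulted
    op_context ctx{42};
    run_sin_op(&ctx, /*dryrun=*/false);
}
```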
9cb9260861 vulkan: correctly report support for OP_CONT (ggml/946)
test-backend-ops fails because ggml_cont aborts
when invoked with an unsupported type.

This commit makes the ggml_cont tests pass.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-08 11:05:55 +03:00
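A simplified model of the fix above: the backend honestly reports which types the CONT op supports, so the test harness can skip unsupported cases instead of hitting an abort inside the op. The enum and the particular supported set here are assumptions for illustration.

```cpp
#include <cstdio>

enum ggml_type_sketch { TYPE_F32, TYPE_F16, TYPE_BF16 };

static bool supports_cont(ggml_type_sketch t) {
    switch (t) {
        case TYPE_F32:
        case TYPE_F16:
            return true;
        default:
            return false; // report honestly instead of aborting later
    }
}

int main() {
    for (ggml_type_sketch t : {TYPE_F32, TYPE_F16, TYPE_BF16}) {
        std::printf("type %d supported: %s\n", (int) t, supports_cont(t) ? "yes" : "no");
    }
}
```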
202084d31d tests: add gradient tests for all backends (ggml/932)
* tests: add gradient checking to test-backend-ops

* remove old comment

* reorder includes

* adjust SIN/COS parameters

* add documentation, use supports_op if possible
2024-09-08 11:05:55 +03:00
dbbebcab33 ggml: fix ggml_graph_cpy undefined behavior (ggml/943) 2024-09-08 11:05:55 +03:00
ba1cf846ed cann : fix doxy (ggml/0) 2024-09-08 11:05:55 +03:00
d2d3200b38 cann : add Ascend NPU support (whisper/2336)
* enable Ascend NPU in src/whisper.cpp

* sync test-backend-ops with llama.cpp
2024-09-08 11:05:55 +03:00
51d964a4ef cuda : mark BF16 CONT as unsupported 2024-09-08 11:05:55 +03:00
efe6a83e30 ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)
* ggml_cont: fix issue with transposed tensors when one dimension is 1

When using multiple threads, it is not enough
to check that the tensors are contiguous for
ggml_compute_forward_dup_same_cont to work correctly;
the tensors' strides also need to match.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Add ggml_cont tests

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Remove dead code

It isn't possible to reach this code because
all these functions are invoked by ggml_compute_forward_dup
if and only if src0->type != dst->type.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Make ggml_compute_forward_dup_same_cont work with contiguous tensors

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

---------

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-08 11:05:55 +03:00
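A toy illustration of the reasoning in the commit above: two tensors can both look contiguous while their strides still differ (e.g. when one dimension is 1), so a same-layout fast path must compare strides too. The struct and field names are simplified stand-ins for the real ggml_tensor fields.

```cpp
#include <cstdio>
#include <cstring>

struct tensor_sketch {
    int    ne[2]; // elements per dimension
    size_t nb[2]; // strides in bytes
};

static bool same_layout(const tensor_sketch & a, const tensor_sketch & b) {
    // shape equal AND strides equal -- contiguity alone is not enough
    return std::memcmp(a.ne, b.ne, sizeof(a.ne)) == 0 &&
           std::memcmp(a.nb, b.nb, sizeof(a.nb)) == 0;
}

int main() {
    tensor_sketch src = {{4, 1}, {4, 16}}; // view with a size-1 dimension
    tensor_sketch dst = {{4, 1}, {4,  4}}; // same shape, different stride

    std::printf("fast path allowed: %s\n", same_layout(src, dst) ? "yes" : "no");
}
```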
fbb7fcffbc llama : set attrs of mislabelled EOT/EOM tokens (#9348) b3688 2024-09-08 08:51:00 +03:00
a5b5d9a101 llama.android : fix build (#9350) b3687 2024-09-08 00:33:50 +03:00
f12295b8a9 llama : fix empty ring buffer push (#9358) b3686 2024-09-08 00:33:33 +03:00
faf69d4237 llama : sanitize invalid tokens (#9357)
* common : do not add null tokens during warmup

ggml-ci

* llama : check that the input tokens are valid

ggml-ci

* tests : fix batch size of bert model

ggml-ci
b3685
2024-09-08 00:33:13 +03:00
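A hedged sketch of the validation idea in this commit: reject any token id outside [0, n_vocab) before evaluation instead of indexing out of bounds. The function name and vocabulary size are illustrative, not the actual llama.cpp API.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

static bool tokens_valid(const std::vector<int32_t> & tokens, int32_t n_vocab) {
    for (int32_t tok : tokens) {
        if (tok < 0 || tok >= n_vocab) {
            std::fprintf(stderr, "invalid token id: %d\n", tok);
            return false;
        }
    }
    return true;
}

int main() {
    const int32_t n_vocab = 32000; // illustrative vocabulary size
    std::printf("ok:  %d\n", tokens_valid({1, 2, 31999}, n_vocab));
    std::printf("bad: %d\n", tokens_valid({1, -5, 99999}, n_vocab));
}
```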