Commit Graph

3285 Commits

SHA1 Message Date
a27152b602 fix: add missing short command line argument -mli for multiline-input (#8261) b3285 2024-07-02 22:56:46 +02:00
3e2618bc7b Add step to the clean target that removes legacy binary names, reducing upgrade / migration confusion arising from #7809. (#8257) b3284 2024-07-02 13:19:56 -04:00
07a3fc0608 Removes multiple newlines at the end of files that are breaking the editorconfig step of CI. (#8258) b3283 2024-07-02 12:18:10 -04:00
968967376d Add JAIS model(s) (#8118)
* Add `JAIS` model(s)

* cleanup

* address review comments

* remove hack

* un-hardcode max-alibi-bias

* minor tweaks

---------

Co-authored-by: fmz <quic_fzaghlou@quic.com>
b3282 2024-07-02 16:36:00 +02:00
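On the "un-hardcode max-alibi-bias" bullet: JAIS uses ALiBi position encoding, where each head h of H applies a linear distance penalty to the attention scores. In the usual formulation (a sketch; the exact slope schedule used by ggml may differ), the previously hardcoded constant is the maximum-bias exponent b, which the original ALiBi paper fixes at 8:

```math
\mathrm{score}_h(i,j) \mathrel{+}= -m_h\,(i-j), \qquad m_h = 2^{-\,b\,h/H}
```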
023b8807e1 convert-hf : print output file name when completed (#8181)
* convert-hf : print output file name when completed

This commit adds the output file name to the log message when the
conversion is completed.

The motivation for this change is that when the `--outfile` option is
not specified it might not be obvious where the output file is written.

With this change the output of running the script will be something like
the following:
```console
INFO:hf-to-gguf:Model successfully exported to models/gemma-2-9b-it.gguf.
```

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Updates the output to support printing the directory if the output is
split into multiple files. The output file name is now also retrieved
from the model_instance object.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Use parent attribute of Path object and string interpolation.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Use os.sep instead of hardcoding the path separator.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-02 09:40:49 +03:00
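A sketch of the described behavior. The change itself lives in the Python conversion script; this C++ analogue (hypothetical names, C++ chosen for consistency with the other sketches in this log) only illustrates the reported logic of printing the containing directory for split outputs and the full file path otherwise:

```cpp
#include <cstdio>
#include <filesystem>
#include <string>

// Hypothetical illustration of the logged message: report the parent
// directory when the model was split into multiple files, otherwise
// report the single output file path.
static void print_export_message(const std::filesystem::path & out_file, bool is_split) {
    const std::string shown = is_split ? out_file.parent_path().string() : out_file.string();
    std::printf("Model successfully exported to %s\n", shown.c_str());
}
```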
0e0590adab cuda : update supports_op for matrix multiplication (#8245) b3280 2024-07-02 09:39:38 +03:00
a9f3b10215 [SYCL] Fix win build conflict of math library (#8230)
* fix win build conflict of math library

* fix the condition: !(win32 & SYCL)

* revert warp_size=16
b3279 2024-07-02 12:50:07 +08:00
d08c20edde [SYCL] Fix the sub group size of Intel (#8106)
* use warp_size macro for all sycl kernels

* fix mask of permute_sub_group_by_xor

* fix rms_norm with correct warp number

* fix rms_norm_f32/group_norm_f32

* move norm to norm.cpp file

* fix quantize bug

* fix mmvq's batch size
b3278 2024-07-02 10:16:00 +08:00
5fac350b9c Fix gemma2 tokenizer convert (#8244)
* fix gemma2 tokenizer convert

* remove scores

* improve code, fix new line issue
2024-07-02 01:07:23 +02:00
cb5fad4c6c CUDA: refactor and optimize IQ MMVQ (#8215)
* CUDA: refactor and optimize IQ MMVQ

* uint -> uint32_t

* __dp4a -> ggml_cuda_dp4a

* remove MIN_CC_DP4A checks

* change default

* try CI fix
b3276 2024-07-01 20:39:06 +02:00
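On the `__dp4a -> ggml_cuda_dp4a` bullet: calls now go through a ggml wrapper rather than the raw NVIDIA intrinsic, which lets non-NVIDIA or older targets substitute an equivalent. A portable reference for the operation's semantics (a sketch, not the repo's implementation):

```cpp
#include <cstdint>

// Reference semantics of dp4a: interpret a and b as four signed 8-bit
// lanes each, multiply lane-wise, and accumulate the products into c.
static int32_t dp4a_ref(int32_t a, int32_t b, int32_t c) {
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = (int8_t) (a >> (8 * i));
        const int8_t bi = (int8_t) (b >> (8 * i));
        c += ai * bi;
    }
    return c;
}
```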
dae57a1ebc readme: add Paddler to the list of projects (#8239) 2024-07-01 20:13:22 +03:00
49122a873f gemma2: add sliding window mask (#8227)
* gemma2: add sliding window mask

* fix data_swa uninitialized

* better naming

* add co-author

Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com>

* replace list with single tensor

* update

* llama : minor styling

* convert : add sanity check for query_pre_attn_scalar

* fix small typo in README

---------

Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3274 2024-07-01 18:48:34 +02:00
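For context on the sliding-window mask: each position attends only to the previous `window` tokens (Gemma 2 interleaves such local layers with global-attention layers). A minimal sketch of the masking rule, with hypothetical names:

```cpp
#include <cmath>
#include <vector>

// Build a causal sliding-window attention mask: position i may attend to
// position j only if j <= i and i - j < window. Masked entries get -inf
// so they vanish after softmax.
static std::vector<float> make_swa_mask(int n_tokens, int window) {
    std::vector<float> mask(n_tokens * n_tokens, -INFINITY);
    for (int i = 0; i < n_tokens; ++i) {
        for (int j = 0; j <= i; ++j) {
            if (i - j < window) {
                mask[i * n_tokens + j] = 0.0f;
            }
        }
    }
    return mask;
}
```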
0ddeff1023 readme : update tool list (#8209)
* Added gppm to Tool list in README

* Update README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3273 2024-07-01 15:48:16 +03:00
3840b6f593 nix : enable curl (#8043)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 14:47:04 +03:00
257f8e41e2 nix : remove OpenCL remnants (#8235)
* nix : remove OpenCL remnants

* minor : remove parentheses
2024-07-01 14:46:18 +03:00
694c59cb42 Document BERT support. (#8205)
* Update README.md

document BERT support

* Update README.md
2024-07-01 13:40:58 +02:00
197fe6c1d7 [SYCL] Update SYCL-Rope op and Refactor (#8157)
* align with rope.cu and move sycl-op to a single file
b3269 2024-07-01 19:39:06 +08:00
d0a7145ba9 flake.lock: Update (#8218) b3268 2024-06-30 16:09:34 -07:00
9ef0780062 Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203)
* preserve new line llama_chat_format_single

* disable chat template if in-prefix/suffix is set

* remove redundant change
b3267 2024-06-30 20:27:13 +02:00
1c5eba6f8e llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)
* Add attention and final logit softcapping.

* fix

* Add custom add_ functions

* Disable flash attention for Gemma2

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add default value for attention and final logit softcap value

* Add custom kq scaling from Gemma2Attention

* Remove custom pre attention scaling and use computed value instead.

---------

Co-authored-by: slaren <slarengh@gmail.com>
b3266 2024-06-29 23:44:08 -04:00
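For context: soft-capping bounds values smoothly to (-cap, cap) via a scaled tanh rather than a hard clamp, keeping the operation differentiable everywhere; per the commit, separate default cap values exist for attention scores and for the final logits. A minimal sketch:

```cpp
#include <cmath>

// Soft-cap: smoothly bound x to the open interval (-cap, cap).
// Applied element-wise, e.g. to attention scores before softmax or to
// the output logits; unlike a hard clamp it has no gradient cliff.
static float soft_cap(float x, float cap) {
    return cap * std::tanh(x / cap);
}
```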
72272b83a3 fix code typo in llama-cli (#8198) b3265 2024-06-29 00:14:20 +02:00
8748d8ac6f json: attempt to skip slow tests when running under emulator (#8189) b3264 2024-06-28 18:02:05 +01:00
26a39bbd6b Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal (#8172)
* tmp_contains

* minicpm chat template

* add DeepSeek Lite template

* change deepseek-lite to deepseek2

* correct code comment

* correct code from master branch
b3263 2024-06-28 15:11:44 +02:00
38373cfbab Add SPM infill support (#8016)
* add --spm-infill option

* support --spm-infill

* support --spm-infill
b3262 2024-06-28 12:53:43 +02:00
b851b3fba0 cmake : allow user to override default options (#8178) b3261 2024-06-28 12:37:45 +02:00
139cc621e9 json: restore default additionalProperties to false, fix some pattern escapes (#8180)
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset

* json: revert default of additionalProperties to false

* Update README.md
b3260 2024-06-28 09:26:45 +01:00
e57dc62057 llama: Add support for Gemma2ForCausalLM (#8156)
* Inference support for Gemma 2 model family

* Update convert-hf-to-gguf.py, constants, and tensor mappings

* cleanup

* format fix

* Fix special token vocab bug

* Don't add space prefix

* fix deleted lines

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add model type names

* Add control vector

* Fix model type identification

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
b3259 2024-06-27 21:00:43 -07:00
a27aa50ab7 Add missing items in makefile (#8177) b3258 2024-06-28 02:19:11 +02:00
cb0b06a8a6 json: update grammars/README w/ examples & note about additionalProperties (#8132)
* json: update grammars/README

* mention broken prefixItems

* add mention to llama-gbnf-validator

* json: explicit type: object for nested items object in cli example
2024-06-27 22:08:42 +01:00
558f44bf83 CI: fix release build (Ubuntu+Mac) (#8170)
* CI: fix release build (Ubuntu)

PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.

* CI: fix release build (Mac)

---------

Co-authored-by: loonerin <loonerin@users.noreply.github.com>
b3256 2024-06-27 21:01:23 +02:00
8172ee9da9 cmake : fix deprecated option names not working (#8171)
* cmake : fix deprecated option names not working

* remove LLAMA_OPENMP
2024-06-27 20:04:39 +02:00
16791b8f0b Add chatml fallback for cpp llama_chat_apply_template (#8160)
* add chatml fallback for cpp `llama_chat_apply_template`

* remove redundant code
b3254 2024-06-27 18:14:19 +02:00
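A usage sketch of the fallback described above, assuming the C API signature at this point in the tree (it has changed in later revisions); the helper names are hypothetical:

```cpp
#include <string>
#include <vector>
#include "llama.h"

// Apply a chat template; tmpl == nullptr means "use the model's own
// template". A negative return value signals an unsupported template.
static std::string apply_chat_template(const llama_model * model, const char * tmpl,
                                       const std::vector<llama_chat_message> & msgs) {
    std::vector<char> buf(4096);
    int32_t n = llama_chat_apply_template(model, tmpl, msgs.data(), msgs.size(),
                                          /*add_ass=*/true, buf.data(), buf.size());
    if (n < 0) {
        return {}; // template not supported
    }
    if ((size_t) n > buf.size()) {
        buf.resize(n); // return value is the required length: retry
        n = llama_chat_apply_template(model, tmpl, msgs.data(), msgs.size(),
                                      /*add_ass=*/true, buf.data(), buf.size());
    }
    return std::string(buf.data(), n);
}

// Fallback logic described by the commit: try the model's own template,
// then retry with the built-in "chatml" template if it is unsupported.
static std::string format_with_fallback(const llama_model * model,
                                        const std::vector<llama_chat_message> & msgs) {
    std::string out = apply_chat_template(model, nullptr, msgs);
    if (out.empty()) {
        out = apply_chat_template(model, "chatml", msgs);
    }
    return out;
}
```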
ab3679112d flake.lock: Update (#8071)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/e9ee548d90ff586a6471b4ae80ae9cfcbceb3420?narHash=sha256-4Zu0RYRcAY/VWuu6awwq4opuiD//ahpc2aFHg2CWqFY%3D' (2024-06-13)
  → 'github:NixOS/nixpkgs/d603719ec6e294f034936c0d0dc06f689d91b6c3?narHash=sha256-k3JqJrkdoYwE3fHE6xGDY676AYmyh4U2Zw%2B0Bwe5DLU%3D' (2024-06-20)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Philip Taron <philip.taron@gmail.com>
2024-06-27 08:37:29 -07:00
97877eb10b Control vector loading fixes (#8137)
* Fixed leak in llama_control_vector_load_one() and allowed llama_control_vector_load() to grow

* refactored `llama_control_vector_load_one()`

* allow multiple directions for same layer in same file

* llama_control_vector_load_one() and llama_control_vector_load() now break on error

* removed unnecessary ggml_free() call
b3252 2024-06-27 16:48:07 +02:00
387952651a Delete examples/llama.android/llama/CMakeLists.txt (#8165)
* Delete examples/llama.android/llama/CMakeLists.txt

https://github.com/ggerganov/llama.cpp/pull/8145#issuecomment-2194534244

This file is not being used for building on Android. `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is being used instead.

* Update CMakeLists.txt

Pick local llama.cpp files instead of fetching content from git
2024-06-27 16:39:29 +02:00
6030c61281 Add Qwen2MoE 57B-A14B model identifier (#8158)
* Add Qwen2MoE 57B-A14B

* Add Qwen2MoE 57B-A14B
b3250 2024-06-27 16:27:41 +02:00
85a267daaa CUDA: fix MMQ stream-k for --split-mode row (#8167) b3249 2024-06-27 16:26:05 +02:00
f675b20a3b Added support for Viking pre-tokenizer (#8135)
Co-authored-by: kustaaya <kustaaya@protonmail.com>
b3248 2024-06-27 10:58:54 +02:00
911e35bb8b llama : fix CodeLlama FIM token checks (#8144)
* account for space prefix character

* use find instead
2024-06-27 10:46:41 +03:00
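On "account for space prefix character" / "use find instead": SPM-style tokenizers can render a special token's text with a leading space, so an exact equality comparison misses it. A hedged sketch with hypothetical token text:

```cpp
#include <string>

// Token text may carry a space prefix (e.g. " <PRE>" instead of "<PRE>")
// depending on the tokenizer, so search for the marker rather than
// comparing for equality.
static bool is_fim_prefix_token(const std::string & token_text) {
    return token_text.find("<PRE>") != std::string::npos;
}
```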
ac146628e4 Fix llama-android.cpp for error - "common/common.h not found" (#8145)
- The path to the common.h header file in llama-android.cpp seems to be wrong. Fixing the path so the Android build doesn't fail with the error "There is no file common/common.h"
b3246 2024-06-27 03:57:57 +02:00
9b31a40c6d clip : suppress unused variable warnings (#8105)
* clip : suppress unused variable warnings

This commit suppresses unused variable warnings for the variable `e` in
the catch blocks.

The motivation for this change is to suppress the warnings that are
generated on Windows when using the MSVC compiler. The warnings are
not displayed when using GCC because GCC will mark all catch parameters
as used.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! clip : suppress unused variable warnings

Remove e (/*e*/) instead of using GGML_UNUSED.

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
b3245 2024-06-27 01:50:09 +02:00
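The fix in context: MSVC warns about a named-but-unused catch parameter, while GCC marks catch parameters as used; dropping the name (leaving it as a comment) silences the warning. A minimal sketch:

```cpp
#include <stdexcept>

static bool try_parse(const char * s) {
    try {
        if (!s) {
            throw std::invalid_argument("null input");
        }
        return true;
    } catch (const std::exception & /*e*/) { // unnamed: avoids MSVC's unused-variable warning
        return false;
    }
}
```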
c70d117c37 scripts : fix filename sync 2024-06-26 23:25:22 +03:00
ae5d0f4b89 ci : publish new docker images only when the files change (#8142) b3243 2024-06-26 21:59:28 +02:00
31ec3993f6 ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140) b3242 2024-06-26 21:34:14 +02:00
c7ab7b612c make : fix missing -O3 (#8143) b3241 2024-06-26 21:20:22 +03:00
f2d48fffde sync : ggml b3240 2024-06-26 19:39:19 +03:00
4713bf3093 authors : regen 2024-06-26 19:36:44 +03:00
0e814dfc42 devops : remove clblast + LLAMA_CUDA -> GGML_CUDA (#8139)
ggml-ci
2024-06-26 19:32:07 +03:00
a95631ee97 readme : update API notes 2024-06-26 19:26:13 +03:00
f3f65429c4 llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]

* files : relocate [no ci]

* ci : disable kompute build [no ci]

* cmake : fixes [no ci]

* server : fix mingw build

ggml-ci

* cmake : minor [no ci]

* cmake : link math library [no ci]

* cmake : build normal ggml library (not object library) [no ci]

* cmake : fix kompute build

ggml-ci

* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE

ggml-ci

* move public backend headers to the public include directory (#8122)

* move public backend headers to the public include directory

* nix test

* spm : fix metal header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* scripts : fix sync paths [no ci]

* scripts : sync ggml-blas.h [no ci]

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00