Commit Graph

3155 Commits

SHA1 Message Date
b8cb44e812 more llama-cli(.exe) 2024-06-10 16:08:06 +01:00
051633ed2d update dockerfile refs 2024-06-10 16:05:11 +01:00
1cc651446d rename(make): llama-baby-llama 2024-06-10 16:03:18 +01:00
0fcf2c328e rename dockerfile w/ llama-cli 2024-06-10 15:44:49 +01:00
0bb2a3f233 fix some missing -cli suffixes 2024-06-10 15:42:20 +01:00
daeaeb1222 Merge remote-tracking branch 'origin/master' into bins 2024-06-10 15:38:41 +01:00
5265c15d4c rename llama|main -> llama-cli; consistent RPM bin prefixes 2024-06-10 15:34:14 +01:00
fd5ea0f897 ci : try win-2019 on server windows test (#7854) 2024-06-10 15:18:41 +03:00
c28a83902c examples : remove --instruct remnants (#7846) 2024-06-10 15:00:15 +03:00
d9da0e4986 server : improve "prompt" handling (#7847) 2024-06-10 14:59:55 +03:00
1f0dabda8d CUDA: use tensor cores for MMQ (#7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
2024-06-10 11:45:13 +02:00
af4ae502dd use the correct SYCL context for host USM allocations (#7777)
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
10ceba354a flake.lock: Update (#7838)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
  → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
e95beeb1fc imatrix : handle partial entries (#7833) 2024-06-09 20:19:35 +03:00
57bf62ce7c docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700)
This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate a PR's complexity level, when to add [no ci], and how to format PR titles and descriptions.

Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
3e2ee44315 server: do not remove whitespace at the start of a completion chunk (#7830) 2024-06-09 20:50:35 +10:00
42b53d192f CUDA: revise q8_1 data layout for mul_mat_q (#7824) 2024-06-09 09:42:25 +02:00
2decf57bc6 convert-hf : set the model name based on cli arg, if present (#7693)
The `--model-name` argument was added a while ago but did not do anything.
This commit fixes that and enables the feature.
2024-06-09 16:39:25 +10:00
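The fix described above can be sketched roughly as follows: prefer the value from `--model-name` when the user supplies one, otherwise fall back to a name derived elsewhere. This is an illustrative Python sketch, not the actual convert-hf code; the `metadata_name` fallback parameter is a hypothetical stand-in.

```python
import argparse

# Hypothetical sketch: honor --model-name when given, else fall back to a
# name derived from the model's own metadata (stand-in parameter here).
parser = argparse.ArgumentParser()
parser.add_argument("--model-name", default=None)

def resolve_model_name(argv: list[str], metadata_name: str) -> str:
    args = parser.parse_args(argv)
    return args.model_name if args.model_name is not None else metadata_name
```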
5795b94182 convert-hf : match model part name prefix and suffix (#7687)
In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply reused the part-count logic to get the part names.

But this doesn't always work correctly, such as when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.

This commit matches both the prefix and the suffix of the model part names, which should fix the problem without breaking any previously-supported upstream models. According to a report by @teleprint-me there is still some persistent issue, but this should do in the meantime.
2024-06-09 12:47:25 +10:00
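The prefix-and-suffix matching described above can be illustrated with a small sketch. This is not the actual convert-hf implementation; the regular expression and function name are assumptions chosen to show the idea of anchoring both ends of the filename so extras like consolidated.safetensors are skipped.

```python
import re

def model_part_names(files: list[str]) -> list[str]:
    # Keep only files shaped like model-00001-of-00005.safetensors (or the
    # single-file model.safetensors), matching BOTH the expected prefix and
    # suffix, so unrelated extras such as consolidated.safetensors are skipped.
    part = re.compile(r"^model(-\d{5}-of-\d{5})?\.safetensors$")
    return sorted(f for f in files if part.match(f))
```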
ed9f252118 gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)
The main changes of this PR are to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.

In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.

GGUFWriter also no longer requires the output file name until it actually writes to the file, and it no longer needs to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
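The decoupling described above can be illustrated with a toy model: metadata is staged in memory via a single add_key_value entry point, and the output path is only needed at write time. This sketch is not the real gguf-py GGUFWriter; the class and enum below are simplified stand-ins.

```python
from enum import Enum

class GGUFValueType(Enum):  # minimal stand-in for the real enum in gguf-py
    UINT32 = 4
    STRING = 8

class WriterSketch:
    """Toy model of the decoupling: add_key_value stages metadata in
    memory, and no output path is needed until write time."""
    def __init__(self) -> None:
        self.kv_data: dict[str, tuple[GGUFValueType, object]] = {}

    def add_key_value(self, key: str, val: object, vtype: GGUFValueType) -> None:
        # one entry point replaces the old separate add_key / add_val pair
        self.kv_data[key] = (vtype, val)

    def write_kv_data(self, path: str) -> None:
        # the data layout is computed lazily, only here, right before writing
        with open(path, "w") as f:
            for key, (vtype, val) in self.kv_data.items():
                f.write(f"{key} ({vtype.name}) = {val!r}\n")
```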
fe1e3917cf Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
d4d915d351 url: save -mu downloads to new cache location (#7826)
* url: save -mu download to new cache location

* url: fs_get_cache_file_path util

* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
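A cache-path helper like the fs_get_cache_file_path mentioned above might look roughly like this sketch: derive a stable, filesystem-safe filename from the download URL. The sha256-prefix naming scheme here is an assumption for illustration, not the scheme used upstream (the real helper lives in the C++ common code).

```python
import hashlib
from pathlib import Path

def fs_get_cache_file_path(url: str, cache_dir: str) -> Path:
    # Derive a stable, filesystem-safe name from the URL; the sha256-prefix
    # scheme here is illustrative, not the exact one used upstream.
    digest = hashlib.sha256(url.encode()).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.gguf"
```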
347f30803f rename Dockerfiles 2024-06-08 15:10:32 +01:00
78eae7f3ba gitignore /llama-* 2024-06-08 14:29:35 +01:00
efaa441233 fix llama-lookup-* Makefile rules 2024-06-08 14:26:11 +01:00
b0eb3b88e9 rm bin files 2024-06-08 14:16:32 +01:00
eef922e02e sort cmake example subdirs 2024-06-08 14:09:28 +01:00
b648243496 add/fix gbnf-validator subfolder to cmake 2024-06-08 14:07:56 +01:00
81222f02db prefix more cmake targets w/ llama- 2024-06-08 14:05:34 +01:00
10650b692d rename {main->llama}-cmake-pkg binary 2024-06-08 13:57:06 +01:00
78bca8cb07 fix main refs 2024-06-08 13:52:03 +01:00
ab5efbb3b6 Prefix all example bins w/ llama- 2024-06-08 13:42:01 +01:00
23d0df5bd5 main: target name -> llama-cli 2024-06-08 12:50:35 +01:00
fe93cc96cc Merge remote-tracking branch 'origin/master' into bins 2024-06-08 12:04:52 +01:00
7a16ce7db2 server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
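The Longest Common Prefix selection above can be sketched in a few lines: pick the slot whose cached tokens share the longest prefix with the incoming prompt, so the most KV-cache work can be reused. A minimal sketch, not the server's actual C++ implementation:

```python
def common_prefix_len(a: list[int], b: list[int]) -> int:
    # number of leading tokens the two sequences share
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_slot(slot_caches: list[list[int]], prompt: list[int]) -> int:
    # choose the slot whose cached tokens share the longest prefix with
    # the incoming prompt, maximizing reusable KV-cache entries
    return max(range(len(slot_caches)),
               key=lambda i: common_prefix_len(slot_caches[i], prompt))
```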
da799b4189 vulkan : reuse parent extra for views (#7806)
* vulkan : reuse parent extra for views

* Fix validation error when multiple compute contexts are used in a graph

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
c00fad71e5 gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
27615f5ab2 cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
0dba58269f Update server-llm.sh 2024-06-07 11:52:40 +01:00
7027b27d76 server: update cache_prompt documentation [no ci] (#7745) 2024-06-07 11:15:49 +02:00
af8f0169da Update .gitignore 2024-06-07 10:14:03 +01:00
7fbe6006c9 update straggling refs 2024-06-07 09:42:21 +01:00
99df4cc091 rm accidentally checked in bins 2024-06-07 09:40:09 +01:00
a5cabd7649 server : do not get prompt in infill mode (#7286)
* avoid getting the prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
d5c938cd77 [SYCL] fix softmax r2r result wrong issue (#7811) 2024-06-07 14:28:26 +08:00
c9ee7118d5 check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values

* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
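The nan/inf check above amounts to failing fast before bad statistics can corrupt the quantized weights. A minimal Python sketch of the idea (the real checks live in the C++ imatrix and quantize tools; the function name here is hypothetical):

```python
import math

def validate_imatrix(name: str, values: list[float]) -> None:
    # Fail fast if the importance matrix carries nan/inf, instead of
    # letting bad statistics silently corrupt the quantized weights.
    for i, v in enumerate(values):
        if math.isnan(v) or math.isinf(v):
            raise ValueError(f"{name}: invalid value {v!r} at index {i}")
```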
fbd83131f5 Merge remote-tracking branch 'origin/master' into bins 2024-06-07 00:51:31 +01:00
a0a7f2b031 Update build.yml 2024-06-07 00:38:05 +01:00
8695baebc0 update more names 2024-06-07 00:21:01 +01:00
ee459f40f6 server : fix --threads-http arg (#7801) 2024-06-06 19:19:59 +03:00