Commit Graph

308 Commits

Author SHA1 Message Date
Rinne
089b1c93ba readme : add C#/.NET bindings repo (#1409) 2023-05-12 08:39:40 +03:00
Georgi Gerganov
b9fd7eee57 ggml : remove bit shuffling (#1405)
* ggml : remove Q4_0 bit shufling (ARM NEON)

* ggml : remove Q4_1 bit shuffling (ARM NEON + reference)

* ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)

* ggml : remove Q4_2 bit shuffling (WIP, BROKEN)

* ggml : remove Q5_0 bit shuffling (ARM NEON)

* ggml : 2x faster scalar implementations

* ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)

* ggml : simplify scalar dot

* ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit

* ggml : fix Q4_1 quantization

* ggml : update cuBLAS + normalize variable names

* ggml : remove Q4_2 mode

* ggml : minor formatting

* ggml : fix Q5_0 quantization

* scripts : add script for measuring the time per token

* AVX implementations (#1370)

* ggml : uniform 5th bit extraction

* llama : produce error upon loading old model files

* llama : fix model magic/version write

* ggml : speed-up Q5_0 + Q5_1 at 4 threads

* ggml : preserve old Q4 and Q5 formats

* ggml : simplify Q8_1 - no need for low / high sums anymore

* ggml : fix Q8_0 and Q8_1 rounding

* Revert "AVX implementations (#1370)"

This reverts commit 948d124837.

* ggml : fix AVX2 implementation

* sha : update hashes for 7B and 13B

* readme : update timings + remove warning banner

* llama : update v2 PR number to 1405

* ggml : fix WASM comments

* ggml : back to original bit order

* readme : add note that Q4 and Q5 have been changed

* llama : fix return for unknown version

---------

Co-authored-by: Stephan Walter <stephan@walter.name>
2023-05-12 00:23:08 +03:00
Georgi Gerganov
56551bc11f readme : add notice about upcoming breaking change 2023-05-08 22:52:18 +03:00
AlpinDale
fe60904eef readme : add TOC and Pygmalion instructions (#1359) 2023-05-08 19:33:30 +03:00
Georgi Gerganov
f9a6364912 llama : require first token to be BOS (#1303)
* llama : require first token to be BOS

* scripts : add ppl-run-all.sh

* perplexity : add BOS for each chunk

* readme : update perplexity values after BOS fix

* perplexity : add clarifying comments
2023-05-08 17:41:54 +03:00
Johannes Gäßler
1f48b0abcf Documented CUDA reproducibility, added warning (#1346) 2023-05-08 02:42:01 +02:00
DaniAndTheWeb
173d0e6419 makefile: automatic Arch Linux detection (#1332)
This commit is a port of a detection method used in koboldcpp's Makefile in order to automatically set the -lcblas option on Arch Linux
2023-05-05 23:57:14 +02:00
Pavol Rusnak
921dcee00a readme: add missing info (#1324) 2023-05-05 16:43:36 +02:00
44670
360cfe5bec readme : add OpenBuddy link (#1321) 2023-05-04 19:33:31 +03:00
Georgi Gerganov
bca9ad938a minor : fix whitespaces (#1302) 2023-05-03 20:09:42 +03:00
KASR
b0c71c7b6d scripts : platform independent script to verify sha256 checksums (#1203)
* python script to verify the checksum of the llama models

Added Python script for verifying SHA256 checksums of files in a directory, which can run on multiple platforms. Improved the formatting of the output results for better readability.

* Update README.md

update to the readme for improved readability and to explain the usage of the python checksum verification script

* update the verification script

I've extended the script based on suggestions by @prusnak

The script now checks the available RAM, is there is enough to check the file at once it will do so. If not the file is read in chunks.

* minor improvment

small change so that the available ram is checked and not the total ram

* remove the part of the code that reads the file at once if enough ram is available

based on suggestions from @prusnak i removed the part of the code that checks whether the user had enough ram to read the entire model at once. the file is now always read in chunks.

* Update verify-checksum-models.py

quick fix to pass the git check
2023-05-03 18:31:28 +03:00
Stephan Walter
36d19a603b Remove Q4_3 which is no better than Q5 (#1218) 2023-04-28 23:10:43 +00:00
Georgi Gerganov
7f15c5c477 readme : update hot topics 2023-04-28 21:32:52 +03:00
Folko-Ven
78ec543733 Correcting link to w64devkit (#1214)
Correcting link to w64devkit (change seeto to skeeto).
2023-04-28 16:22:48 +02:00
Georgi Gerganov
f9be42add0 readme : add quantization info 2023-04-26 23:24:42 +03:00
DaniAndTheWeb
ea3ad7eb60 Updating build instructions to include BLAS support (#1183)
* Updated build information

First update to the build instructions to include BLAS.

* Update README.md

* Update information about BLAS

* Better BLAS explanation

Adding a clearer BLAS explanation and adding a link to download the CUDA toolkit.

* Better BLAS explanation

* BLAS for Mac

Specifying that BLAS is already supported on Macs using the Accelerate Framework.

* Clarify the effect of BLAS

* Windows Make instructions

Added the instructions to build with Make on Windows

* Fixing typo

* Fix trailing whitespace
2023-04-26 22:03:03 +02:00
Pavol Rusnak
859fee6dfb quantize : use map to assign quantization type from string (#1191)
instead of `int` (while `int` option still being supported)

This allows the following usage:

`./quantize ggml-model-f16.bin ggml-model-q4_0.bin q4_0`

instead of:

`./quantize ggml-model-f16.bin ggml-model-q4_0.bin 2`
2023-04-26 18:43:27 +02:00
mgroeber9110
9b0a4d4214 examples/main README improvements and some light refactoring (#1131) 2023-04-24 15:45:32 +00:00
Pavol Rusnak
c6524f46eb readme : update gpt4all instructions (#980) 2023-04-23 10:21:26 +02:00
CRD716
834695fe3a Minor: Readme fixed grammar, spelling, and misc updates (#1071) 2023-04-19 19:52:14 +00:00
Georgi Gerganov
7cd5c4a3e9 readme : add warning about Q4_2 and Q4_3 2023-04-19 19:07:54 +03:00
Georgi Gerganov
7faa7460f0 readme : update hot topics about new LoRA functionality 2023-04-18 20:10:26 +03:00
Atsushi Tatsuma
e9298af389 readme : add Ruby bindings (#1029) 2023-04-17 22:34:35 +03:00
comex
723dac55fa py : new conversion script (#545)
Current status: Working, except for the latest GPTQ-for-LLaMa format
  that includes `g_idx`.  This turns out to require changes to GGML, so
  for now it only works if you use the `--outtype` option to dequantize it
  back to f16 (which is pointless except for debugging).

  I also included some cleanup for the C++ code.

  This script is meant to replace all the existing conversion scripts
  (including the ones that convert from older GGML formats), while also
  adding support for some new formats.  Specifically, I've tested with:

  - [x] `LLaMA` (original)
  - [x] `llama-65b-4bit`
  - [x] `alpaca-native`
  - [x] `alpaca-native-4bit`
  - [x] LLaMA converted to 'transformers' format using
        `convert_llama_weights_to_hf.py`
  - [x] `alpaca-native` quantized with `--true-sequential --act-order
        --groupsize 128` (dequantized only)
  - [x] same as above plus `--save_safetensors`
  - [x] GPT4All
  - [x] stock unversioned ggml
  - [x] ggmh

  There's enough overlap in the logic needed to handle these different
  cases that it seemed best to move to a single script.

  I haven't tried this with Alpaca-LoRA because I don't know where to find
  it.

  Useful features:

  - Uses multiple threads for a speedup in some cases (though the Python
    GIL limits the gain, and sometimes it's disk-bound anyway).

  - Combines split models into a single file (both the intra-tensor split
    of the original and the inter-tensor split of 'transformers' format
    files).  Single files are more convenient to work with and more
    friendly to future changes to use memory mapping on the C++ side.  To
    accomplish this without increasing memory requirements, it has some
    custom loading code which avoids loading whole input files into memory
    at once.

  - Because of the custom loading code, it no longer depends in PyTorch,
    which might make installing dependencies slightly easier or faster...
    although it still depends on NumPy and sentencepiece, so I don't know
    if there's any meaningful difference.  In any case, I also added a
    requirements.txt file to lock the dependency versions in case of any
    future breaking changes.

  - Type annotations checked with mypy.

  - Some attempts to be extra user-friendly:

      - The script tries to be forgiving with arguments, e.g. you can
        specify either the model file itself or the directory containing
        it.

      - The script doesn't depend on config.json / params.json, just in
        case the user downloaded files individually and doesn't have those
        handy.  But you still need tokenizer.model and, for Alpaca,
        added_tokens.json.

      - The script tries to give a helpful error message if
        added_tokens.json is missing.
2023-04-14 10:03:03 +03:00
CRD716
ec29272175 readme : remove python 3.10 warning (#929) 2023-04-13 16:59:53 +03:00
Genkagaku.GPT
7e941b95eb readme : llama node binding (#911)
* chore: add nodejs binding

* chore: add nodejs binding
2023-04-13 16:54:27 +03:00
Judd
4579af95e8 zig : update build.zig (#872)
* update

* update readme

* minimize the changes.

---------

Co-authored-by: zjli2019 <zhengji.li@ingchips.com>
2023-04-13 16:43:22 +03:00
Georgi Gerganov
f76cb3a34d readme : change "GPU support" link to discussion 2023-04-12 14:48:57 +03:00
Georgi Gerganov
782438070f readme : update hot topics with link to "GPU support" issue 2023-04-12 14:31:12 +03:00
Nicolai Weitkemper
4dbbd40750 readme: link to sha256sums file (#902)
This is to emphasize that these do not need to be obtained from elsewhere.
2023-04-12 08:46:20 +02:00
Pavol Rusnak
8b679987cd Fix whitespace, add .editorconfig, add GitHub workflow (#883) 2023-04-11 19:45:44 +00:00
qouoq
a0caa34b16 Add BAIR's Koala to supported models (#877) 2023-04-10 22:41:53 +02:00
Pavol Rusnak
d2beca95dc Make docker instructions more explicit (#785) 2023-04-06 08:56:58 +02:00
Georgi Gerganov
3416298929 Update README.md 2023-04-05 19:54:30 +03:00
Georgi Gerganov
8d10406d6e readme : change logo + add bindings + add uis + add wiki 2023-04-05 18:56:20 +03:00
Adithya Balaji
594cc95fab readme : update with CMake and windows example (#748)
* README: Update with CMake and windows example

* README: update with code-review for cmake build
2023-04-05 17:36:12 +03:00
Thatcher Chamberlin
d8d4e865cd Add a missing step to the gpt4all instructions (#690)
`migrate-ggml-2023-03-30-pr613.py` is needed to get gpt4all running.
2023-04-02 12:48:57 +02:00
rimoliga
d0a7f742e7 readme: replace termux links with homepage, play store is deprecated (#680) 2023-04-01 16:57:30 +02:00
Pavol Rusnak
9733104be5 drop quantize.py (now that models are using a single file) 2023-03-31 01:07:32 +02:00
Georgi Gerganov
3df890aef4 readme : update supported models 2023-03-30 22:31:54 +03:00
Georgi Gerganov
b467702b87 readme : fix typos 2023-03-29 19:38:31 +03:00
Georgi Gerganov
516d88e75c readme : add GPT4All instructions (close #588) 2023-03-29 19:37:20 +03:00
Stephan Walter
b391579db9 Update README and comments for standalone perplexity tool (#525) 2023-03-26 16:14:01 +03:00
Georgi Gerganov
348d6926ee Add logo to README.md 2023-03-26 10:20:49 +03:00
Georgi Gerganov
55ad42af84 Move chat scripts into "./examples" 2023-03-25 20:37:09 +02:00
Georgi Gerganov
4a7129acd2 Remove obsolete information from README 2023-03-25 16:30:32 +02:00
Gary Mulder
f4f5362edb Update README.md (#444)
Added explicit **bolded** instructions clarifying that people need to request access to models from Facebook and never through through this repo.
2023-03-24 15:23:09 +00:00
Georgi Gerganov
b6b268d441 Add link to Roadmap discussion 2023-03-24 09:13:35 +02:00
Stephan Walter
a50e39c6fe Revert "Delete SHA256SUMS for now" (#429)
* Revert "Delete SHA256SUMS for now (#416)"

This reverts commit 8eea5ae0e5.

* Remove ggml files until they can be verified
* Remove alpaca json
* Add also model/tokenizer.model to SHA256SUMS + update README

---------

Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-03-23 15:15:48 +01:00
Gary Mulder
8a3e5ef801 Move model section from issue template to README.md (#421)
* Update custom.md

* Removed Model section as it is better placed in README.md

* Updates to README.md model section

* Inserted text that was removed from  issue template about obtaining models from FB and links to papers describing the various models

* Removed IPF down links for the Alpaca 7B models as these look to be in the old data format and probably shouldn't be directly linked to, anyway

* Updated the perplexity section to point at Perplexity scores #406 discussion
2023-03-23 11:30:40 +00:00