llama : add Command R Plus support (#6491)

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-06-27 12:05:03 +00:00

* Add Command R Plus GGUF

* Add Command R Plus GGUF

* Loading works up to LayerNorm2D

* Export new tensors in 1D so they are not quantized.

* Fix embedding layer based on Noeda's example

* Whitespace

* Add line

* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)

* dranger003: Fix block index overflow in CUDA dequantizing.

* Reverted blocked multiplication code as it still has issues and could affect other Llama arches

* export norms as f32

* fix overflow issues during quant and other cleanup

* Type convention

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* dranger003: Fix more int overflow during quant.

---------

Co-authored-by: S <seast@Ss-Mac-Studio.local>
Co-authored-by: S <s@example.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Author: Carolinabanana
Date: 2024-04-09 09:16:13 +01:00
Committed by: GitHub
Parent: e11a8999b5
Commit: 5dc9dd7152

16 changed files with 358 additions and 318 deletions

ggml-quants.c: 310 changed lines

File diff suppressed because it is too large.
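
The diff for this file is not shown, so the following is only an illustrative sketch of the kind of integer overflow the commit message mentions ("Fix more int overflow during quant"), not the actual change. It assumes a hypothetical block size `QK` and hypothetical helper names (`num_elements`, `exceeds_int32`, `num_blocks`). The idea is that a large tensor's element count may not fit in a 32-bit `int`, so the index arithmetic is carried out in `int64_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block size: number of values packed into one quantization block. */
enum { QK = 32 };

/* Total element count of a 2D tensor, computed in 64-bit arithmetic
 * so that the product cannot wrap for realistic tensor shapes. */
int64_t num_elements(int64_t nrows, int64_t n_per_row) {
    return nrows * n_per_row;
}

/* Returns 1 if the element count would not fit in a 32-bit signed int,
 * i.e. an `int` counter or index over the elements would overflow. */
int exceeds_int32(int64_t nrows, int64_t n_per_row) {
    return num_elements(nrows, n_per_row) > INT32_MAX;
}

/* Number of quantization blocks, kept in 64 bits as well.
 * Assumes each row is a whole number of blocks. */
int64_t num_blocks(int64_t nrows, int64_t n_per_row) {
    assert(n_per_row % QK == 0);
    return num_elements(nrows, n_per_row) / QK;
}
```

For example, a hypothetical 256000 x 12288 tensor has 3,145,728,000 elements, which is more than `INT32_MAX` (2,147,483,647), so `exceeds_int32(256000, 12288)` returns 1 while `num_blocks(256000, 12288)` still yields 98,304,000.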
