llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-26 11:13:53 -04:00

Files

Georgi Gerganov 9cb317f77e ggml : full ALiBi support (#7192)

* ggml : full ALiBi support

* ggml : update ggml_soft_max_ext() CUDA, SYCL

* ggml : ggml_flash_attn_ext() support ALiBi (CPU)

* ggml : ggml_flash_attn_ext() support ALiBi (Metal)

* ggml : fix warning

* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)

ggml-ci

* ggml : fix assert message

* vulkan : add dev notes

* ggml : require mask when using ALiBi

ggml-ci

* convert : fix convert for refact models

2024-05-11 10:32:41 +03:00

__init__.py

gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)

2023-11-11 08:04:50 +03:00

constants.py

convert-hf : save memory with lazy evaluation (#7075)

2024-05-08 18:16:38 -04:00

gguf_reader.py

convert-hf : save memory with lazy evaluation (#7075)

2024-05-08 18:16:38 -04:00

gguf_writer.py

convert-hf : save memory with lazy evaluation (#7075)