* k_cache: be able to use Q5_0
* k_cache: be able to use Q5_1 on CUDA
* k_cache: be able to use Q5_0 on Metal
* k_cache: be able to use Q5_1 on Metal
* k_cache: be able to use IQ4_NL - just CUDA for now
* k_cache: be able to use IQ4_NL on Metal
* k_cache: add newly added supported types to llama-bench and CUDA supports_op

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
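
For context, a minimal sketch of how one of the newly supported K-cache types might be selected through the public llama.cpp API. This is not taken from the commit itself; it assumes the `type_k` field of `llama_context_params` and the usual `llama.h` entry points, so verify the names against the current header.

```cpp
// Sketch: request a quantized K cache (Q5_0 here; Q5_1 / IQ4_NL are analogous).
// Assumes llama.h as of roughly the time of this change.
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    // "model.gguf" is a placeholder path.
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (!model) return 1;

    llama_context_params cparams = llama_context_default_params();
    cparams.type_k = GGML_TYPE_Q5_0;   // one of the K-cache types added here

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (!ctx) return 1;

    // ... evaluate tokens as usual; the K cache is stored in Q5_0 ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

On the command line, the same types should be selectable in `llama-bench` via its K-cache type flag (`-ctk`, e.g. `-ctk q5_0`), which is what the llama-bench part of this change wires up.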