diff --git a/README.md b/README.md
index 576332bc5..91401fa98 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,30 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ----
 
+## Quick start
+
+Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:
+
+- Install `llama.cpp` using [brew, nix or winget](docs/install.md)
+- Run with Docker - see our [Docker documentation](docs/docker.md)
+- Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
+- Build from source by cloning this repository - check out [our build guide](docs/build.md)
+
+Once installed, you'll need a model to work with. Head to the [Obtaining and quantizing models](#obtaining-and-quantizing-models) section to learn more.
+
+Example commands:
+
+```sh
+# Use a local model file
+llama-cli -m my_model.gguf
+
+# Or download and run a model directly from Hugging Face
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+
+# Launch OpenAI-compatible API server
+llama-server -hf ggml-org/gemma-3-1b-it-GGUF
+```
+
 ## Description
 
 The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide
@@ -230,6 +254,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
+
 ## Supported backends
 
 | Backend | Target devices |
@@ -246,16 +271,6 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 | [OpenCL](docs/backend/OPENCL.md) | Adreno GPU |
 | [RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc) | All |
 
-## Building the project
-
-The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
-The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:
-
-- Clone this repository and build locally, see [how to build](docs/build.md)
-- On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
-- Use a Docker image, see [documentation for Docker](docs/docker.md)
-- Download pre-built binaries from [releases](https://github.com/ggml-org/llama.cpp/releases)
-
 ## Obtaining and quantizing models
 
 The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`:
@@ -263,7 +278,11 @@ The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](htt
 - [Trending](https://huggingface.co/models?library=gguf&sort=trending)
 - [LLaMA](https://huggingface.co/models?sort=trending&search=llama+gguf)
 
-You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`.
+You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
+
+```sh
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+```
 
 By default, the CLI would download from Hugging Face, you can switch to other options with the environment variable `MODEL_ENDPOINT`.
 For example, you may opt to downloading model checkpoints from ModelScope or other model sharing communities by setting the environment variable, e.g. `MODEL_ENDPOINT=https://www.modelscope.cn/`.
diff --git a/docs/build.md b/docs/build.md
index 32717a793..680b0d839 100644
--- a/docs/build.md
+++ b/docs/build.md
@@ -1,5 +1,9 @@
 # Build llama.cpp locally
 
+The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
+
+The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server.
+
 **To get the Code:**
 
 ```bash
diff --git a/docs/install.md b/docs/install.md
index 4971c1828..7200bf9b7 100644
--- a/docs/install.md
+++ b/docs/install.md
@@ -1,28 +1,42 @@
 # Install pre-built version of llama.cpp
 
-## Homebrew
+| Install via | Windows | Mac | Linux |
+|-------------|---------|-----|-------|
+| Winget | ✅ | | |
+| Homebrew | | ✅ | ✅ |
+| MacPorts | | ✅ | |
+| Nix | | ✅ | ✅ |
 
-On Mac and Linux, the homebrew package manager can be used via
+## Winget (Windows)
+
+```sh
+winget install llama.cpp
+```
+
+The package is automatically updated with new `llama.cpp` releases. More info: https://github.com/ggml-org/llama.cpp/issues/8188
+
+## Homebrew (Mac and Linux)
 
 ```sh
 brew install llama.cpp
 ```
+
 The formula is automatically updated with new `llama.cpp` releases. More info: https://github.com/ggml-org/llama.cpp/discussions/7668
 
-## MacPorts
+## MacPorts (Mac)
 
 ```sh
 sudo port install llama.cpp
 ```
-see also: https://ports.macports.org/port/llama.cpp/details/
 
-## Nix
+See also: https://ports.macports.org/port/llama.cpp/details/
 
-On Mac and Linux, the Nix package manager can be used via
+## Nix (Mac and Linux)
 
 ```sh
 nix profile install nixpkgs#llama-cpp
 ```
+
 For flake enabled installs.
 
 Or
@@ -34,13 +48,3 @@ nix-env --file '<nixpkgs>' --install --attr llama-cpp
 
 For non-flake enabled installs. This expression is automatically updated within the [nixpkgs repo](https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/by-name/ll/llama-cpp/package.nix#L164).
-
-## Flox
-
-On Mac and Linux, Flox can be used to install llama.cpp within a Flox environment via
-
-```sh
-flox install llama-cpp
-```
-
-Flox follows the nixpkgs build of llama.cpp.
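
The Quick start hunk above launches an OpenAI-compatible API server with `llama-server`. As a minimal sketch of querying it, assuming the server's default address `http://localhost:8080` and its standard `/v1/chat/completions` route:

```sh
# Start the server (command taken from the Quick start example above)
llama-server -hf ggml-org/gemma-3-1b-it-GGUF

# In another shell, send an OpenAI-style chat completion request
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```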
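
The README hunk above also explains that the `-hf` downloader can be redirected to other hosts through the `MODEL_ENDPOINT` environment variable. A minimal sketch, reusing the endpoint URL and repo id already shown in the README; whether that exact repo is mirrored on ModelScope is an assumption made only for illustration:

```sh
# Sketch: point the -hf downloader at an alternative endpoint instead of Hugging Face.
# Assumes the repo id from the README example is also available on ModelScope.
MODEL_ENDPOINT=https://www.modelscope.cn/ llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```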