Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-06-27 03:55:20 +00:00
docs : add "Quick start" section for new users (#13862)
* docs : add "Quick start" section for non-technical users
* rm flox
* Update README.md
README.md
@@ -28,6 +28,30 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ----
 
+## Quick start
+
+Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:
+
+- Install `llama.cpp` using [brew, nix or winget](docs/install.md)
+- Run with Docker - see our [Docker documentation](docs/docker.md)
+- Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
+- Build from source by cloning this repository - check out [our build guide](docs/build.md)
+
+Once installed, you'll need a model to work with. Head to the [Obtaining and quantizing models](#obtaining-and-quantizing-models) section to learn more.
+
+Example command:
+
+```sh
+# Use a local model file
+llama-cli -m my_model.gguf
+
+# Or download and run a model directly from Hugging Face
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+
+# Launch OpenAI-compatible API server
+llama-server -hf ggml-org/gemma-3-1b-it-GGUF
+```
+
 ## Description
 
 The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide
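The last command in the new Quick start block starts `llama-server`, which exposes an OpenAI-compatible HTTP API. As a quick illustration (not part of the diff, and assuming the server is left on its default address of `127.0.0.1:8080`), a chat request can be sent with `curl`:

```sh
# Ask the running llama-server instance a question via the
# OpenAI-compatible chat completions endpoint (default: 127.0.0.1:8080)
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```

The response follows the OpenAI chat-completions JSON format, so existing OpenAI client libraries can be pointed at this endpoint as well.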
@@ -230,6 +254,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 
 </details>
 
+
 ## Supported backends
 
 | Backend | Target devices |
@@ -246,16 +271,6 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 | [OpenCL](docs/backend/OPENCL.md) | Adreno GPU |
 | [RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc) | All |
 
-## Building the project
-
-The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
-
-The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:
-
-- Clone this repository and build locally, see [how to build](docs/build.md)
-- On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
-- Use a Docker image, see [documentation for Docker](docs/docker.md)
-- Download pre-built binaries from [releases](https://github.com/ggml-org/llama.cpp/releases)
 
 ## Obtaining and quantizing models
 
 The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`:
@@ -263,7 +278,11 @@ The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](htt
 - [Trending](https://huggingface.co/models?library=gguf&sort=trending)
 - [LLaMA](https://huggingface.co/models?sort=trending&search=llama+gguf)
 
-You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`.
+You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
+
+```sh
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+```
 
 By default, the CLI downloads from Hugging Face; you can switch to other endpoints with the environment variable `MODEL_ENDPOINT`. For example, you may opt to download model checkpoints from ModelScope or another model-sharing community by setting it, e.g. `MODEL_ENDPOINT=https://www.modelscope.cn/`.
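To make the endpoint switch concrete, here is a minimal sketch (not part of the diff; `<user>/<model>` is the same placeholder used above, and the repository must actually exist on the chosen endpoint):

```sh
# Download via ModelScope instead of Hugging Face for this one invocation
MODEL_ENDPOINT=https://www.modelscope.cn/ llama-cli -hf <user>/<model>
```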
docs/build.md

@@ -1,5 +1,9 @@
 # Build llama.cpp locally
 
+The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
+
+The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server.
+
 **To get the Code:**
 
 ```bash
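The hunk's trailing context ends at the opening `bash` code fence, so the command it introduces is cut off here. For orientation only, a sketch of the usual clone-and-build sequence follows; the repository URL is taken from the mirror note at the top of this page, and the CMake invocation reflects the project's standard build flow rather than being quoted from the diff:

```sh
# Get the code
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Standard CMake build; release binaries land in build/bin
cmake -B build
cmake --build build --config Release
```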
docs/install.md

@@ -1,28 +1,42 @@
 # Install pre-built version of llama.cpp
 
-## Homebrew
-
-On Mac and Linux, the homebrew package manager can be used via
+| Install via | Windows | Mac | Linux |
+|-------------|---------|-----|-------|
+| Winget | ✅ | | |
+| Homebrew | | ✅ | ✅ |
+| MacPorts | | ✅ | |
+| Nix | | ✅ | ✅ |
+
+## Winget (Windows)
+
+```sh
+winget install llama.cpp
+```
+
+The package is automatically updated with new `llama.cpp` releases. More info: https://github.com/ggml-org/llama.cpp/issues/8188
+
+## Homebrew (Mac and Linux)
 
 ```sh
 brew install llama.cpp
 ```
 
 The formula is automatically updated with new `llama.cpp` releases. More info: https://github.com/ggml-org/llama.cpp/discussions/7668
 
-## MacPorts
+## MacPorts (Mac)
 
 ```sh
 sudo port install llama.cpp
 ```
-see also: https://ports.macports.org/port/llama.cpp/details/
-
-## Nix
-
-On Mac and Linux, the Nix package manager can be used via
+
+See also: https://ports.macports.org/port/llama.cpp/details/
+
+## Nix (Mac and Linux)
 
 ```sh
 nix profile install nixpkgs#llama-cpp
 ```
 
 For flake enabled installs.
 
 Or
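Whichever package manager is used, a quick sanity check (a suggestion, not part of these docs) is to confirm the installed binaries run and report their build; this assumes the `--version` flag shared by the llama.cpp command-line tools:

```sh
# Verify the install and print version/build information
llama-cli --version
llama-server --version
```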
@@ -34,13 +48,3 @@ nix-env --file '<nixpkgs>' --install --attr llama-cpp
 For non-flake enabled installs.
 
 This expression is automatically updated within the [nixpkgs repo](https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/by-name/ll/llama-cpp/package.nix#L164).
-
-## Flox
-
-On Mac and Linux, Flox can be used to install llama.cpp within a Flox environment via
-
-```sh
-flox install llama-cpp
-```
-
-Flox follows the nixpkgs build of llama.cpp.