Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-06-27 03:55:20 +00:00
docs : add "Quick start" section for new users (#13862)
* docs : add "Quick start" section for non-technical users
* rm flox
* Update README.md
README.md
@@ -28,6 +28,30 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ----
 
+## Quick start
+
+Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:
+
+- Install `llama.cpp` using [brew, nix or winget](docs/install.md)
+- Run with Docker - see our [Docker documentation](docs/docker.md)
+- Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
+- Build from source by cloning this repository - check out [our build guide](docs/build.md)
+
+Once installed, you'll need a model to work with. Head to the [Obtaining and quantizing models](#obtaining-and-quantizing-models) section to learn more.
+
+Example command:
+
+```sh
+# Use a local model file
+llama-cli -m my_model.gguf
+
+# Or download and run a model directly from Hugging Face
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+
+# Launch OpenAI-compatible API server
+llama-server -hf ggml-org/gemma-3-1b-it-GGUF
+```
+
 ## Description
 
 The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide
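The last command in the new Quick start block starts `llama-server`, which exposes an OpenAI-compatible HTTP API. As a quick illustration (not part of the diff, and assuming the server is left on its default address of `127.0.0.1:8080`), a chat request can be sent with `curl`:

```sh
# Ask the running llama-server instance a question via the
# OpenAI-compatible chat completions endpoint (default: 127.0.0.1:8080)
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```

The response follows the OpenAI chat-completions JSON format, so existing OpenAI client libraries can be pointed at this endpoint as well.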
@@ -230,6 +254,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 
 </details>
 
+
 ## Supported backends
 
 | Backend | Target devices |
@@ -246,16 +271,6 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 | [OpenCL](docs/backend/OPENCL.md) | Adreno GPU |
 | [RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc) | All |
 
-## Building the project
-
-The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
-
-The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:
-
-- Clone this repository and build locally, see [how to build](docs/build.md)
-- On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
-- Use a Docker image, see [documentation for Docker](docs/docker.md)
-- Download pre-built binaries from [releases](https://github.com/ggml-org/llama.cpp/releases)
 
 ## Obtaining and quantizing models
 
 The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`:
@@ -263,7 +278,11 @@ The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](htt
 - [Trending](https://huggingface.co/models?library=gguf&sort=trending)
 - [LLaMA](https://huggingface.co/models?sort=trending&search=llama+gguf)
 
-You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`.
+You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
+
+```sh
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+```
 
 By default, the CLI downloads from Hugging Face; you can switch to other endpoints with the environment variable `MODEL_ENDPOINT`. For example, you may opt to download model checkpoints from ModelScope or another model-sharing community by setting it, e.g. `MODEL_ENDPOINT=https://www.modelscope.cn/`.
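To make the endpoint switch concrete, here is a minimal sketch (not part of the diff; `<user>/<model>` is the same placeholder used above, and the repository must actually exist on the chosen endpoint):

```sh
# Download via ModelScope instead of Hugging Face for this one invocation
MODEL_ENDPOINT=https://www.modelscope.cn/ llama-cli -hf <user>/<model>
```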
docs/build.md

@@ -1,5 +1,9 @@
 # Build llama.cpp locally
 
+The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
+
+The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server.
+
 **To get the Code:**
 
 ```bash
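The hunk's trailing context ends at the opening `bash` code fence, so the command it introduces is cut off here. For orientation only, a sketch of the usual clone-and-build sequence follows; the repository URL is taken from the mirror note at the top of this page, and the CMake invocation reflects the project's standard build flow rather than being quoted from the diff:

```sh
# Get the code
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Standard CMake build; release binaries land in build/bin
cmake -B build
cmake --build build --config Release
```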
docs/install.md

@@ -1,28 +1,42 @@
 # Install pre-built version of llama.cpp
 
-## Homebrew
-
-On Mac and Linux, the homebrew package manager can be used via
+| Install via | Windows | Mac | Linux |
+|-------------|---------|-----|-------|
+| Winget | ✅ | | |
+| Homebrew | | ✅ | ✅ |
+| MacPorts | | ✅ | |
+| Nix | | ✅ | ✅ |
+
+## Winget (Windows)
+
+```sh
+winget install llama.cpp
+```
+
+The package is automatically updated with new `llama.cpp` releases. More info: https://github.com/ggml-org/llama.cpp/issues/8188
+
+## Homebrew (Mac and Linux)
 
 ```sh
 brew install llama.cpp
 ```
 
 The formula is automatically updated with new `llama.cpp` releases. More info: https://github.com/ggml-org/llama.cpp/discussions/7668
 
-## MacPorts
+## MacPorts (Mac)
 
 ```sh
 sudo port install llama.cpp
 ```
-see also: https://ports.macports.org/port/llama.cpp/details/
-
-## Nix
-
-On Mac and Linux, the Nix package manager can be used via
+
+See also: https://ports.macports.org/port/llama.cpp/details/
+
+## Nix (Mac and Linux)
 
 ```sh
 nix profile install nixpkgs#llama-cpp
 ```
 
 For flake enabled installs.
 
 Or
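Whichever package manager is used, a quick sanity check (a suggestion, not part of these docs) is to confirm the installed binaries run and report their build; this assumes the `--version` flag shared by the llama.cpp command-line tools:

```sh
# Verify the install and print version/build information
llama-cli --version
llama-server --version
```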
@@ -34,13 +48,3 @@ nix-env --file '<nixpkgs>' --install --attr llama-cpp
 For non-flake enabled installs.
 
 This expression is automatically updated within the [nixpkgs repo](https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/by-name/ll/llama-cpp/package.nix#L164).
-
-## Flox
-
-On Mac and Linux, Flox can be used to install llama.cpp within a Flox environment via
-
-```sh
-flox install llama-cpp
-```
-
-Flox follows the nixpkgs build of llama.cpp.