mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2025-07-22 02:38:03 +00:00
scripts: synthetic prompt mode for server-bench.py (#14695)
@@ -7,7 +7,7 @@ Set of LLM REST APIs and a simple web front end to interact with llama.cpp.
 **Features:**
 * LLM inference of F16 and quantized models on GPU and CPU
 * [OpenAI API](https://github.com/openai/openai-openapi) compatible chat completions and embeddings routes
-* Reranking endoint (https://github.com/ggml-org/llama.cpp/pull/9510)
+* Reranking endpoint (https://github.com/ggml-org/llama.cpp/pull/9510)
 * Parallel decoding with multi-user support
 * Continuous batching
 * Multimodal ([documentation](../../docs/multimodal.md)) / with OpenAI-compatible API support
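The feature list above mentions OpenAI-API-compatible chat completions routes. A minimal sketch of building such a request payload for a locally running server follows; the server URL, port, and model name are assumptions for illustration, not values taken from this commit:

```python
import json

# Hypothetical local server address; adjust to your own deployment.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "local-model") -> dict:
    """Build an OpenAI-style chat completion payload.

    The field names follow the OpenAI chat completions schema
    (model, messages with role/content pairs).
    """
    return {
        "model": model,  # assumed placeholder name for a local model
        "messages": [
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
    }

payload = build_chat_request("Hello!")
body = json.dumps(payload)
# To send: POST `body` to SERVER_URL with the header
# Content-Type: application/json (e.g. via curl or urllib.request).
print(len(payload["messages"]))
```

This only constructs and serializes the payload; actually sending it requires a running server exposing the compatible route.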