parallel : increase the variability of the prompt lengths (#13927)
ggml-ci
````diff
@@ -4,7 +4,7 @@ Simplified simulation of serving incoming requests in parallel
 
 ## Example
 
-Generate 128 client requests (`-ns 128`), simulating 8 concurrent clients (`-np 8`). The system prompt is shared (`-pps`), meaning that it is computed once at the start. The client requests consist of 10 junk questions (`-j 10`) followed by the actual question.
+Generate 128 client requests (`-ns 128`), simulating 8 concurrent clients (`-np 8`). The system prompt is shared (`-pps`), meaning that it is computed once at the start. The client requests consist of up to 10 junk questions (`--junk 10`) followed by the actual question.
 
 ```bash
 llama-parallel -m model.gguf -np 8 -ns 128 --top-k 1 -pps --junk 10 -c 16384
````
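The remaining flags in the command are standard llama.cpp options: `--top-k 1` restricts sampling to the single most likely token, and `-c 16384` sets the context size. As a minimal sketch of how the effect of this change could be observed (assuming, per the commit title and the new "up to 10" wording, that the junk count is now drawn per request rather than fixed), a junk-padded run can be compared against a junk-free baseline; `model.gguf` is a placeholder path:

```bash
# Baseline: no junk questions, so client prompts have roughly uniform lengths
llama-parallel -m model.gguf -np 8 -ns 128 --top-k 1 -pps -c 16384

# After this change, --junk 10 pads each request with up to 10 junk questions,
# so prompt lengths vary from request to request
llama-parallel -m model.gguf -np 8 -ns 128 --top-k 1 -pps --junk 10 -c 16384
```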