mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2025-08-17 13:40:55 -04:00
server: tests: add truncated prompt tests, better kv cache size (#5933)
* server: tests: add truncated prompt tests, better size * server, tests : update regex --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This commit is contained in:
@@ -6,8 +6,8 @@ Feature: Parallel
|
||||
Given a server listening on localhost:8080
|
||||
And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
|
||||
And 42 as server seed
|
||||
And 512 as batch size
|
||||
And 64 KV cache size
|
||||
And 128 as batch size
|
||||
And 256 KV cache size
|
||||
And 2 slots
|
||||
And continuous batching
|
||||
Then the server is starting
|
||||
@@ -76,6 +76,7 @@ Feature: Parallel
|
||||
| disabled | 128 |
|
||||
| enabled | 64 |
|
||||
|
||||
|
||||
Scenario: Multi users with total number of tokens to predict exceeds the KV Cache size #3969
|
||||
Given a prompt:
|
||||
"""
|
||||
|
Reference in New Issue
Block a user