llama : sanitize invalid tokens (#9357)

* common : do not add null tokens during warmup

ggml-ci

* llama : check that the input tokens are valid

ggml-ci

* tests : fix batch size of bert model

ggml-ci
2025-08-19 22:36:13 -04:00 · 2024-09-08 00:33:13 +03:00
parent e536426ded
commit faf69d4237
3 changed files with 26 additions and 4 deletions
--- a/examples/server/tests/features/embeddings.feature
+++ b/examples/server/tests/features/embeddings.feature
@@ -9,8 +9,11 @@ Feature: llama.cpp server
    And   a model alias bert-bge-small
    And   42 as server seed
    And   2 slots
-    And   1024 as batch size
-    And   1024 as ubatch size
+    # the bert-bge-small model has context size of 512
+    # since the generated prompts are as big as the batch size, we need to set the batch size to 512
+    # ref: https://huggingface.co/BAAI/bge-small-en-v1.5/blob/5c38ec7c405ec4b44b94cc5a9bb96e735b38267a/config.json#L20
+    And   512 as batch size
+    And   512 as ubatch size
    And   2048 KV cache size
    And   embeddings extraction
    Then  the server is starting
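
Note on "check that the input tokens are valid": the hunk above covers only the test change, so the validation code itself is not shown here. As a rough illustration of what such a check could look like, the sketch below scans a token list for ids outside the range `[0, n_vocab)`. The helper name `find_invalid_token` and its signature are hypothetical and are not taken from llama.cpp; the actual implementation in this commit may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; NOT the llama.cpp implementation.
// Returns the index of the first token id outside [0, n_vocab),
// or -1 when every token id is valid.
static int find_invalid_token(const std::vector<int32_t> & tokens, int32_t n_vocab) {
    assert(n_vocab > 0); // a vocabulary must contain at least one token
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i] < 0 || tokens[i] >= n_vocab) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A caller could run this before decoding a batch and reject the request when the result is not -1, rather than letting an out-of-range id index past the end of the embedding table.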