server: continue to update other slots on embedding concurrent request (#5699)

* server: #5655 - continue to update other slots on embedding concurrent request.
* server: tests: add multi users embeddings as fixed
* server: tests: adding OAI compatible embedding concurrent endpoint
* server: tests: adding OAI compatible embedding with multiple inputs
2025-08-28 11:08:19 -04:00 · 2024-02-24 19:16:04 +01:00
parent 4c4cb30736
commit 9e359a4f47
5 changed files with 168 additions and 78 deletions
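
Note: the "multiple inputs" scenario added in this commit exercises an OpenAI-style embeddings request whose `input` field is a list of prompts instead of a single string. The sketch below shows roughly what such a request body and the matching response check could look like. It is not the step implementation from this commit: the helper names are hypothetical, and the response shape (`data` items carrying `index` and `embedding`) is assumed from the public OpenAI embeddings format.

```python
import json


def build_embeddings_payload(prompts, model="tinyllama-2"):
    """Build an OpenAI-style embeddings request body (hypothetical helper)."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    # The OpenAI embeddings format accepts `input` as a string or a list of strings.
    return {"model": model, "input": list(prompts)}


def extract_embeddings(response):
    """Return the embedding vectors ordered by each item's `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embeddings_are_generated(response, expected_count):
    """Check that one non-empty vector came back per input prompt."""
    vectors = extract_embeddings(response)
    return len(vectors) == expected_count and all(len(v) > 0 for v in vectors)


prompts = [
    "In which country Paris is located ?",
    "Is Madrid the capital of Spain ?",
]
payload = build_embeddings_payload(prompts)
print(json.dumps(payload, indent=2))

# Hand-written stand-in for a server response (assumed shape, made-up values).
fake_response = {
    "data": [
        {"index": 1, "embedding": [0.3, 0.4]},
        {"index": 0, "embedding": [0.1, 0.2]},
    ]
}
print(embeddings_are_generated(fake_response, len(prompts)))
```

Sorting by `index` keeps each vector aligned with its prompt even if the server returns the items out of order.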
--- a/examples/server/tests/features/server.feature
+++ b/examples/server/tests/features/server.feature
@@ -60,6 +60,19 @@ Feature: llama.cpp server
    """
    Then embeddings are generated

+  Scenario: OAI Embeddings compatibility with multiple inputs
+    Given a model tinyllama-2
+    Given a prompt:
+      """
+      In which country Paris is located ?
+      """
+    And a prompt:
+      """
+      Is Madrid the capital of Spain ?
+      """
+    When an OAI compatible embeddings computation request for multiple inputs
+    Then embeddings are generated
+

  Scenario: Tokenize / Detokenize
    When tokenizing: