server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (#5937)

Closes #5850
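
For reviewers who want to check the new metrics by hand, here is a minimal sketch of reading one sample out of a Prometheus text exposition body such as the one `/metrics` returns. The helper name `parse_metric`, the sample payload, and its values are assumptions made for illustration; they are not taken from this commit or from the server's test suite. The parser is deliberately simplified and assumes label values contain no whitespace.

```python
# Hypothetical payload in Prometheus text exposition format.
# The metric names come from the commit title; the values are made up.
SAMPLE = """\
# TYPE llamacpp:tokens_predicted counter
llamacpp:tokens_predicted 8
# TYPE llamacpp:prompt_seconds_total counter
llamacpp:prompt_seconds_total 0.125
"""


def parse_metric(text: str, name: str) -> float:
    """Return the value of the first sample named `name`.

    Simplified: skips comment lines, ignores any label set, and
    assumes label values contain no whitespace.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        parts = line.split()
        if len(parts) < 2:
            continue  # not a "name value" sample line
        sample_name = parts[0].split("{", 1)[0]  # drop "{label=...}" if present
        if sample_name == name:
            # Keep the value as a float so fractional seconds are not truncated.
            return float(parts[1])
    raise KeyError(name)


if __name__ == "__main__":
    print(parse_metric(SAMPLE, "llamacpp:tokens_predicted"))
    print(parse_metric(SAMPLE, "llamacpp:prompt_seconds_total"))
```

Returning a float rather than an int matters for the `*_seconds_total` metrics, whose values are usually fractional.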
2025-08-20 06:36:48 -04:00 · 2024-03-08 12:25:04 +01:00
parent e457fb3540
commit 76e868821a
3 changed files with 46 additions and 13 deletions
--- a/examples/server/tests/features/server.feature
+++ b/examples/server/tests/features/server.feature
@@ -29,6 +29,7 @@ Feature: llama.cpp server
    And   a completion request with no api error
    Then  <n_predicted> tokens are predicted matching <re_content>
    And   prometheus metrics are exposed
+    And   metric llamacpp:tokens_predicted is <n_predicted>

    Examples: Prompts
      | prompt                           | n_predict | re_content                       | n_predicted |