mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2025-08-30 11:59:59 -04:00
Add tracking for high-watermark cache usage and expose it through the /metrics endpoint.

Use case: track the peak cache usage under a realistic workload to better understand memory requirements, so that the cache size or the model/cache quantization can be adjusted accordingly.
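To illustrate the idea, here is a minimal sketch of high-watermark tracking and how the value might be rendered for a Prometheus-style /metrics response. It is not the code from this change: the struct `cache_usage_tracker`, the function `format_metric`, and the metric name `kv_cache_tokens_max` are all hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical tracker for illustration only; not an actual llama.cpp type.
struct cache_usage_tracker {
    uint64_t tokens_current = 0; // cache cells in use at the last update
    uint64_t tokens_max     = 0; // high watermark: largest usage seen so far

    // Record the current usage and raise the watermark if it was exceeded.
    void update(uint64_t used) {
        tokens_current = used;
        tokens_max     = std::max(tokens_max, used);
        assert(tokens_max >= tokens_current); // the watermark never drops below current usage
    }
};

// Render the watermark in Prometheus text exposition format.
// The metric name is an assumption made for this sketch.
std::string format_metric(const cache_usage_tracker & t) {
    return "# TYPE kv_cache_tokens_max gauge\n"
           "kv_cache_tokens_max " + std::to_string(t.tokens_max) + "\n";
}
```

Calling `update` after each decode step keeps the watermark current at negligible cost. The value only ever grows, so reading it after a representative workload gives the largest cache usage observed during that run.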