tqcq/llama.cpp
mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-05 02:23:54 +00:00
d0a417f3c7a5a22ef05b3b76d91dbe1d3362bf0c
llama.cpp/examples/server/tests/unit
Georgi Gerganov a19b5cef16 llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)
* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
2025-04-08 19:54:51 +03:00
test_basic.py
…
test_chat_completion.py
server: fix deadly typo in response_format.json_schema.schema handling (#12168)
2025-03-04 08:24:07 +02:00
test_completion.py
server : Fixed wrong function name in llamacpp server unit test (#11473)
2025-01-29 00:03:42 +01:00
test_ctx_shift.py
…
test_embedding.py
llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)
2025-04-08 19:54:51 +03:00
test_infill.py
server : fix extra BOS in infill endpoint (#11106)
2025-01-06 15:36:08 +02:00
test_lora.py
server : allow using LoRA adapters per-request (#10994)
2025-01-02 15:05:18 +01:00
test_rerank.py
server : add TEI API format for /rerank endpoint (#11942)
2025-02-18 14:21:41 +01:00
test_security.py
…
test_slot_save.py
…
test_speculative.py
server : allow using LoRA adapters per-request (#10994)
2025-01-02 15:05:18 +01:00
test_tokenize.py
…
test_tool_call.py
tool-call: ensure there's always a non-empty tool call id (#12292)
2025-03-10 09:45:29 +00:00