llama.cpp/examples/server/tests/unit
(mirror of https://github.com/ggml-org/llama.cpp.git, at commit da84c04d8fa43ff92b172feb8130c74d062f956a)
Latest commit a19b5cef16 by Georgi Gerganov (2025-04-08 19:54:51 +03:00):
llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

* ggml : FA supports F32 V
* graph : cast KV to F16 when the KV cache is not used
* server : add test that exercises embeddings with FA enabled
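The new test in test_embedding.py exercises the embeddings path with flash attention enabled. As a rough client-side illustration of that path, here is a minimal sketch assuming a llama-server instance started with --embeddings and flash attention (both are standard llama-server options, but the host, port, prompt, and assertion below are illustrative, not taken from the actual test):

```python
# Minimal sketch of querying the embeddings path the new test exercises.
# Assumes a server started roughly like:
#   llama-server -m model.gguf --embeddings -fa --port 8080
# Host, port, prompt, and the assertion are illustrative only.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/embeddings",
    json={"input": "I believe the meaning of life is"},
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]

# A sanity check in the spirit of a unit test: a non-empty float vector.
assert len(embedding) > 0 and all(isinstance(x, float) for x in embedding)
print(f"embedding dimension: {len(embedding)}")
```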
File                     Last commit                                                                        Date
test_basic.py            …
test_chat_completion.py  server: fix deadly typo in response_format.json_schema.schema handling (#12168)   2025-03-04 08:24:07 +02:00
test_completion.py       server : Fixed wrong function name in llamacpp server unit test (#11473)           2025-01-29 00:03:42 +01:00
test_ctx_shift.py        …
test_embedding.py        llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)                2025-04-08 19:54:51 +03:00
test_infill.py           server : fix extra BOS in infill endpoint (#11106)                                 2025-01-06 15:36:08 +02:00
test_lora.py             server : allow using LoRA adapters per-request (#10994)                            2025-01-02 15:05:18 +01:00
test_rerank.py           server : add TEI API format for /rerank endpoint (#11942)                          2025-02-18 14:21:41 +01:00
test_security.py         …
test_slot_save.py        …
test_speculative.py      server : allow using LoRA adapters per-request (#10994)                            2025-01-02 15:05:18 +01:00
test_tokenize.py         …
test_tool_call.py        tool-call: ensure there's always a non-empty tool call id (#12292)                 2025-03-10 09:45:29 +00:00
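Among the listed tests, test_rerank.py covers the /rerank endpoint, including the TEI-style request format added in #11942. The sketch below shows what such a request might look like, assuming a server running with a reranking-capable model; the host, port, payload strings, and the response shape follow the TEI convention as commonly documented, and are assumptions rather than details confirmed by this listing:

```python
# Hedged sketch of a TEI-style rerank request against llama-server's
# /rerank endpoint (support added in #11942). Assumes the server was
# started with a reranking model; host and port are illustrative.
import requests

payload = {
    "query": "What are pandas?",
    # TEI uses a "texts" field for the candidate documents.
    "texts": [
        "Pandas are bears native to south central China.",
        "The capital of France is Paris.",
    ],
}
resp = requests.post("http://127.0.0.1:8080/rerank", json=payload, timeout=30)
resp.raise_for_status()

# In the TEI convention the response is a list of {"index", "score"}
# objects; the exact shape here is assumed, not verified against the commit.
for item in sorted(resp.json(), key=lambda r: r["score"], reverse=True):
    print(item["index"], item["score"])
```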