* server : refactor slot input data, move tokenizer to HTTP thread (see the sketch after this list)
* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
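
To make the bullets above concrete, here is a minimal C++ sketch of the core idea: the HTTP handler thread tokenizes the prompt and hands the slot a ready-made `llama_tokens` vector, with the empty-prompt check performed up front. This is not the actual server code; apart from the `llama_tokens` alias (`std::vector<llama_token>`), every name below (`server_task`, `tokenize_prompt`, `make_task`) is a hypothetical stand-in.

```cpp
// Hedged sketch of the refactor, not the real llama.cpp server code.
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using llama_token  = std::int32_t;             // token id type, as in llama.h
using llama_tokens = std::vector<llama_token>; // the alias adopted everywhere

// What a slot now receives: tokens, never a raw prompt string.
struct server_task {
    llama_tokens prompt_tokens;
};

// Stand-in for a real tokenizer call (llama_tokenize in the actual code);
// here it just maps each byte to a toy "token" so the sketch runs.
static llama_tokens tokenize_prompt(const std::string & prompt) {
    llama_tokens out;
    for (unsigned char c : prompt) {
        out.push_back(static_cast<llama_token>(c));
    }
    return out;
}

// Runs on the HTTP thread: tokenize and validate before the task ever
// reaches a slot, so an empty prompt is rejected early instead of entering
// the generation loop (the "move prompt_tokens.empty() check" bullet).
static server_task make_task(const std::string & prompt) {
    server_task task;
    task.prompt_tokens = tokenize_prompt(prompt);
    if (task.prompt_tokens.empty()) {
        throw std::invalid_argument("empty prompt");
    }
    return task;
}

int main() {
    const server_task task = make_task("hello");
    return task.prompt_tokens.size() == 5 ? 0 : 1;
}
```

In the real server, the tokenizer stand-in would presumably call `llama_tokenize`, and the resulting task would be queued for an inference slot rather than returned; the point of the sketch is only the thread boundary, with string handling on the HTTP side and `llama_tokens` everywhere after it.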