Xuan Son Nguyen
958367bf53
server : refactor slot input data, move tokenizer to HTTP thread (#10023)
* server : refactor slot input data, move tokenizer to HTTP thread
* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
2024-10-24 21:51:22 +02:00