* perplexity: give more information about constraints on failure
This checks whether the failure is caused by an insufficient -np value or an insufficient context size, and gives a hint as to how much is needed for each.
* log formatting
* log error and return instead of storing max_seq_exceeded int
* check if s0 is zero for -np check
This commit updates comments and error messages to use "decode" instead
of "eval" in perplexity.cpp.
The motivation for this is that `llama_eval` was renamed to
`llama_decode` a while ago, but the comments and error messages
still referred to "eval". This change ensures consistency and clarity.
* llama : deprecate llama_kv_self_ API
ggml-ci
* llama : allow llama_memory_(nullptr)
ggml-ci
* memory : add flag for optional data clear in llama_memory_clear
ggml-ci