context : perform output reorder lazily upon access after sync (#14853)

* context : perform output reorder lazily upon access after sync

ggml-ci

* cont : add TODO
Georgi Gerganov
2025-07-24 16:31:48 +03:00
committed by GitHub
parent 820de57d4f
commit e4868d16d2
3 changed files with 47 additions and 13 deletions
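
The commit title describes deferring the output reorder until the results are first read. As a rough illustration of that idea only, the sketch below records row swaps without moving data and applies them on the first access. It is not the code from this commit: `LazyOutputs`, `defer_swap`, `reorder`, and `row` are hypothetical names invented for the example, and the real context internals may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration only: none of these names come from llama.cpp.
struct LazyOutputs {
    std::size_t n_cols = 0;                                 // values per output row
    std::vector<float> rows;                                // row-major storage, initially in compute order
    std::vector<std::pair<std::size_t, std::size_t>> swaps; // pending row swaps, not yet applied

    // Record that rows i and j must be exchanged, without touching the data.
    void defer_swap(std::size_t i, std::size_t j) {
        swaps.emplace_back(i, j);
    }

    // Apply the pending swaps in the order they were recorded, then clear
    // the list so that later calls do nothing.
    void reorder() {
        for (const auto & s : swaps) {
            if (s.first == s.second) {
                continue; // swapping a row with itself is a no-op
            }
            float * a = rows.data() + s.first  * n_cols;
            float * b = rows.data() + s.second * n_cols;
            std::swap_ranges(a, a + n_cols, b);
        }
        swaps.clear();
    }

    // Accessor: trigger the deferred reorder, then return a pointer to row i.
    const float * row(std::size_t i) {
        reorder();
        assert((i + 1) * n_cols <= rows.size());
        return rows.data() + i * n_cols;
    }
};
```

In this sketch a caller that never reads the outputs never pays for the reorder, and a caller that reads them many times pays for it once, because `reorder()` empties the pending list after the first access.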

@@ -956,6 +956,7 @@ extern "C" {
// in the order they have appeared in the batch.
// Rows: number of tokens for which llama_batch.logits[i] != 0
// Cols: n_vocab
// TODO: deprecate in favor of llama_get_logits_ith() (ref: https://github.com/ggml-org/llama.cpp/pull/14853#issuecomment-3113143522)
LLAMA_API float * llama_get_logits(struct llama_context * ctx);
// Logits for the ith token. For positive indices, Equivalent to:
@@ -970,6 +971,7 @@ extern "C" {
// in the order they have appeared in the batch.
// shape: [n_outputs*n_embd]
// Otherwise, returns NULL.
// TODO: deprecate in favor of llama_get_embeddings_ith() (ref: https://github.com/ggml-org/llama.cpp/pull/14853#issuecomment-3113143522)
LLAMA_API float * llama_get_embeddings(struct llama_context * ctx);
// Get the embeddings for the ith token. For positive indices, Equivalent to: