mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-19 09:08:04 +00:00

Files

Minsoo Cheong 586e7bc561 sampling : deduplicated code for probability distribution access (#6240)

* sampling: remove duplicated code for probability distribution access

* free original_logits

* fix original_logits allocation

* fixes based on review @cebtenzzre

* change function name to `llama_sampling_prepare`

2024-03-24 10:54:07 +02:00

CMakeLists.txt

build : link against build info instead of compiling against it (#3879)

2023-11-02 08:50:16 +02:00

README.md

speculative : implement stochastic speculative sampling (#5625)

2024-03-04 20:24:00 +02:00

speculative.cpp

sampling : deduplicated code for probability distribution access (#6240)

2024-03-24 10:54:07 +02:00

README.md

llama.cpp/examples/speculative

Demonstration of speculative decoding and tree-based speculative decoding techniques.

More info:

https://github.com/ggerganov/llama.cpp/pull/2926
https://github.com/ggerganov/llama.cpp/pull/3624
https://github.com/ggerganov/llama.cpp/pull/5625
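
For orientation, one round of the simplest (greedy, single-sequence) form of speculative decoding can be sketched as follows. This is a minimal illustration, not the code in speculative.cpp: `Token`, `NextTokenFn`, and `speculative_step` are hypothetical names, the two models are stand-in callables rather than llama.cpp contexts, and the tree-based and stochastic variants discussed in the pull requests above are not covered. A small draft model proposes `n_draft` tokens, the target model checks them in order, and the longest matching prefix is accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Token = int;

// Hypothetical model interface: given a context, return the greedy next token.
using NextTokenFn = std::function<Token(const std::vector<Token> &)>;

// One round of greedy speculative decoding (illustrative sketch only).
// Appends the verified tokens to ctx and returns how many draft tokens
// were accepted. The result matches plain greedy decoding with the target.
std::size_t speculative_step(std::vector<Token> & ctx,
                             const NextTokenFn & draft,
                             const NextTokenFn & target,
                             std::size_t n_draft) {
    assert(n_draft > 0);

    // 1. The draft model proposes n_draft tokens autoregressively.
    std::vector<Token> proposal;
    std::vector<Token> draft_ctx = ctx;
    for (std::size_t i = 0; i < n_draft; ++i) {
        const Token t = draft(draft_ctx);
        proposal.push_back(t);
        draft_ctx.push_back(t);
    }

    // 2. The target model verifies the proposal position by position.
    //    A real implementation would score all positions in one batched
    //    forward pass; here the target is simply called once per position.
    std::size_t n_accept = 0;
    for (const Token t : proposal) {
        const Token expected = target(ctx);
        ctx.push_back(expected);   // the target's token is always kept
        if (expected != t) {
            return n_accept;       // mismatch: keep the correction and stop
        }
        ++n_accept;
    }

    // 3. Every draft token matched, so the target contributes one more token.
    ctx.push_back(target(ctx));
    return n_accept;
}
```

Each round therefore extends the context by the number of accepted tokens plus one, so the speedup depends on how often the draft model agrees with the target.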