# llama.cpp/examples/speculative
Demonstration of speculative decoding and tree-based speculative decoding techniques: a small, fast draft model proposes several tokens ahead, and the larger target model verifies them in a single forward pass, keeping only the tokens it accepts.
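
To make the idea concrete, below is a minimal, self-contained C++ sketch of the greedy draft-and-verify loop that speculative decoding is built on. It is an illustration under stated assumptions, not the example's implementation: the toy "models" (`draft_next`, `target_next`) and the constants are hypothetical placeholders, not llama.cpp APIs. In the real example the draft and target are two separate models, and the verification step is a single batched decode of the target model.

```cpp
#include <cstdio>
#include <vector>

using token = int;

// Hypothetical "target" model: slow but high quality (a toy rule here).
static token target_next(const std::vector<token> & ctx) {
    return (token) (ctx.size() % 7);
}

// Hypothetical "draft" model: fast, but occasionally disagrees with the target.
static token draft_next(const std::vector<token> & ctx) {
    const token t = (token) (ctx.size() % 7);
    return t == 6 ? 0 : t;   // wrong whenever the target would emit 6
}

int main() {
    std::vector<token> ctx = {1, 2, 3};   // prompt tokens
    const int n_draft = 4;                // tokens drafted per step
    const int n_gen   = 16;               // total tokens to generate
    const size_t n_prompt = ctx.size();

    while (ctx.size() < n_prompt + n_gen) {
        // 1. draft phase: cheaply propose n_draft tokens with the small model
        std::vector<token> draft = ctx;
        for (int i = 0; i < n_draft; ++i) {
            draft.push_back(draft_next(draft));
        }

        // 2. verify phase: the target re-predicts each drafted position and
        //    accepts the longest prefix it agrees with; on the first mismatch
        //    (or after a fully accepted draft) it contributes its own token.
        //    In practice this verification is one batched forward pass of the
        //    target model over all drafted tokens, which is where the speedup
        //    over plain token-by-token decoding comes from.
        int n_accept  = 0;
        bool mismatch = false;
        for (size_t i = ctx.size(); i < draft.size() && !mismatch; ++i) {
            const token t = target_next(std::vector<token>(draft.begin(), draft.begin() + i));
            ctx.push_back(t);
            if (t == draft[i]) {
                ++n_accept;
            } else {
                mismatch = true;
            }
        }
        if (!mismatch) {
            ctx.push_back(target_next(ctx)); // bonus token from the target's last position
        }
        printf("accepted %d/%d drafted tokens, context size = %zu\n", n_accept, n_draft, ctx.size());
    }
    return 0;
}
```

Tree-based speculation generalizes this loop: instead of a single drafted sequence, the draft model proposes a small tree of candidate continuations, and the target model verifies the alternative branches together in one batch.
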
More info: