mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-08-08 18:04:54 -04:00

Files

Georgi Gerganov fcca0a7004 refact : fix convert script + zero out KV cache to avoid nans (#3523)

* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements

2023-10-09 14:32:17 +03:00

CMakeLists.txt

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

parallel.cpp

refact : fix convert script + zero out KV cache to avoid nans (#3523)

2023-10-09 14:32:17 +03:00

README.md

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

README.md

llama.cpp/example/parallel

Simplified simulation of serving incoming requests in parallel.
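As a rough illustration of the idea only (this is not the code in `parallel.cpp` and does not use the llama.cpp API), the sketch below simulates a fixed number of client slots sharing one batched decode step. Each step takes one token from every active slot, and a slot that finishes its request immediately picks up the next queued one instead of waiting for the others. All names here (`Request`, `Slot`, `simulate_parallel_serving`) are hypothetical, and token generation is replaced by a simple counter.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical request: how many tokens the simulated client wants generated.
struct Request {
    int id;
    int n_tokens;  // assumed >= 1; a value of 0 is treated as 1
};

// One client slot. In a real server a slot would own its own sequence in a
// shared KV cache; here it only tracks how many tokens are left.
struct Slot {
    int request_id = -1;  // -1 means the slot is idle
    int remaining  = 0;   // tokens still to generate for the current request
};

struct SimResult {
    int steps = 0;                  // number of batched decode steps
    int tokens_generated = 0;       // total tokens across all requests
    std::vector<int> finish_order;  // request ids in completion order
};

// Serves `queue` with `n_slots` parallel slots. Each loop iteration stands in
// for a single batched decode call containing one token per active slot.
SimResult simulate_parallel_serving(std::deque<Request> queue, std::size_t n_slots) {
    assert(n_slots > 0 && "need at least one slot");
    std::vector<Slot> slots(n_slots);
    SimResult result;

    // Hand queued requests to any idle slots.
    auto refill = [&]() {
        for (Slot & s : slots) {
            if (s.request_id < 0 && !queue.empty()) {
                s.request_id = queue.front().id;
                s.remaining  = queue.front().n_tokens;
                queue.pop_front();
            }
        }
    };

    refill();
    for (;;) {
        int batch_size = 0;
        for (Slot & s : slots) {
            if (s.request_id < 0) continue;
            ++batch_size;  // this slot contributes one token to the batch
            --s.remaining;
            if (s.remaining <= 0) {  // request done: free the slot
                result.finish_order.push_back(s.request_id);
                s.request_id = -1;
            }
        }
        if (batch_size == 0) break;  // no active slots and nothing queued
        ++result.steps;
        result.tokens_generated += batch_size;
        refill();  // finished slots pick up new requests right away
    }
    assert(queue.empty());  // every request was served
    return result;
}
```

For example, three requests needing 3, 1 and 2 tokens finish in 3 batched steps with 2 slots (the short request frees its slot after the first step), versus 6 steps with a single slot.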