llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-26 11:13:53 -04:00

Author	SHA1	Message	Date
Johannes Gäßler	133d99c599	CUDA: deduplicate FlashAttention code (#7352 )	2024-05-18 12:36:25 +02:00
Johannes Gäßler	0fc1e820a9	CUDA: faster large batch FA without tensor cores (#7314 )	2024-05-17 18:54:52 +02:00