llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-26 11:13:53 -04:00

Author	SHA1	Message	Date
Georgi Gerganov	9b3d833189	cuda : fix compile warning (#7454)	2024-05-22 12:36:37 +03:00
Johannes Gäßler	95fb0aefab	CUDA: remove incorrect precision check (#7454)	2024-05-22 10:24:29 +02:00
Johannes Gäßler	133d99c599	CUDA: deduplicate FlashAttention code (#7352)	2024-05-18 12:36:25 +02:00
Johannes Gäßler	0fc1e820a9	CUDA: faster large batch FA without tensor cores (#7314)	2024-05-17 18:54:52 +02:00