tqcq/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-08-17 13:40:55 -04:00

CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590)

Johannes Gäßler, 2023-08-13 00:24:45 +02:00, committed by GitHub

parent b19edd54d5

commit f64d44a9b9

2 changed files with 587 additions and 391 deletions

ggml-cuda.cu (976 changes)

File diff suppressed because it is too large.