Convert vector to f16 for dequantize mul mat vec (#1913)

* Convert vector to f16 for dmmv
* compile option
* Added compilation option description to README
* Changed cmake CUDA_ARCHITECTURES from "OFF" to "native"
2025-08-20 06:36:48 -04:00 · 2023-06-19 10:23:56 +02:00
parent b24c3049d9
commit 16b9cd1939
5 changed files with 158 additions and 68 deletions
--- a/llama.cpp
+++ b/llama.cpp
@@ -1620,7 +1620,7 @@ static bool llama_eval_internal(
                    model.layers[il].w1,
                    cur);
            offload_func(cur);
-            ggml_set_name(cur, "result_w2");
+            ggml_set_name(cur, "result_w1");

            // SILU activation
            cur = ggml_silu(ctx0, cur);