llama : accept a list of devices to use to offload a model (#10497)

* llama : accept a list of devices to use to offload a model
* accept `--dev none` to completely disable offloading
* fix dev list with dl backends
* rename env parameter to LLAMA_ARG_DEVICE for consistency
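
For illustration only, here is a minimal sketch of how a comma-separated device argument with a special `none` value could be interpreted. It is not the parsing code from this commit: `device_selection` and `parse_device_arg` are hypothetical names, the device names in the usage note are examples, and the sketch only splits strings, whereas real code would also have to resolve each name against the available backends.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type (an assumption for this sketch, not from the commit).
struct device_selection {
    bool offload_disabled = false;   // true when the value was exactly "none"
    std::vector<std::string> names;  // requested device names, in the order given
};

// Hypothetical helper: interprets the value of a device option such as
// "CUDA0,CUDA1" or "none". Only string handling is shown here.
device_selection parse_device_arg(const std::string & value) {
    device_selection sel;

    // "none" on its own means: do not offload to any device.
    if (value == "none") {
        sel.offload_disabled = true;
        return sel;
    }

    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) {
            throw std::invalid_argument("empty device name in list");
        }
        if (item == "none") {
            throw std::invalid_argument("'none' cannot be combined with other devices");
        }
        sel.names.push_back(item);
    }

    // An empty value yields no items at all.
    if (sel.names.empty()) {
        throw std::invalid_argument("no devices specified");
    }
    return sel;
}
```

With this sketch, `parse_device_arg("none")` sets `offload_disabled` and leaves `names` empty, while `parse_device_arg("CUDA0,CUDA1")` yields two names in the order given. The same helper could be applied to a value read from the `LLAMA_ARG_DEVICE` environment variable.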
2025-08-13 11:57:43 -04:00 · 2024-11-25 19:30:06 +01:00
parent 1f922254f0
commit 10bce0450f
9 changed files with 104 additions and 27 deletions
--- a/examples/speculative-simple/speculative-simple.cpp
+++ b/examples/speculative-simple/speculative-simple.cpp
@@ -46,6 +46,7 @@ int main(int argc, char ** argv) {
    ctx_tgt   = llama_init_tgt.context;

    // load the draft model
+    params.devices      = params.speculative.devices;
    params.model        = params.speculative.model;
    params.n_ctx        = params.speculative.n_ctx;
    params.n_batch      = params.speculative.n_ctx > 0 ? params.speculative.n_ctx : params.n_batch;