mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2025-08-08 09:57:45 -04:00
Load all MoE experts during warmup (#11571)
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup * common : use new API to enable warmup mode during model warmup --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
This commit is contained in:
@@ -29,6 +29,7 @@ struct llama_cparams {
|
||||
bool offload_kqv;
|
||||
bool flash_attn;
|
||||
bool no_perf;
|
||||
bool warmup;
|
||||
|
||||
enum llama_pooling_type pooling_type;
|
||||
|
||||
|
Reference in New Issue
Block a user