llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (last synced 2025-07-22 10:48:12 +00:00)

Files

hipudding 7a395f67a7 CANN: Add support for async operator submission (#12864)

Submit operators using asynchronous threads to improve performance.

Use the environment variable GGML_CANN_ASYNC_MODE to control whether
asynchronous submission is enabled. It is disabled by default.

Testing shows a 10%–20% performance improvement in scenarios with
small parameter sizes, especially in quantized models.

2025-04-17 20:34:16 +08:00
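The commit message above says only that operators are submitted from asynchronous threads and that GGML_CANN_ASYNC_MODE switches this on (off by default). As a rough illustration of that general pattern, here is a minimal C++ sketch of an environment-gated submission queue with a single worker thread. `OpSubmitter` and `async_mode_from_env` are hypothetical names, not the ggml-cann API, and treating any set value of the variable as "enabled" is an assumption; the real backend's parsing and queue design may differ.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper (assumption): any value set for GGML_CANN_ASYNC_MODE
// counts as enabled; unset means disabled, matching the documented default.
inline bool async_mode_from_env() {
    return std::getenv("GGML_CANN_ASYNC_MODE") != nullptr;
}

// Illustrative single-worker submission queue, not the ggml-cann implementation.
class OpSubmitter {
public:
    explicit OpSubmitter(bool async) : async_(async) {
        if (async_) worker_ = std::thread([this] { run(); });
    }

    ~OpSubmitter() {
        if (!async_) return;
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();  // the worker drains any queued ops before exiting
    }

    OpSubmitter(const OpSubmitter&) = delete;
    OpSubmitter& operator=(const OpSubmitter&) = delete;

    // Sync mode runs the op on the caller's thread; async mode only enqueues it.
    void submit(std::function<void()> op) {
        if (!async_) {
            op();
            return;
        }
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(op));
            ++pending_;
        }
        cv_.notify_all();
    }

    // Block until every submitted op has finished (stand-in for a stream sync).
    void wait() {
        if (!async_) return;
        std::unique_lock<std::mutex> lock(mu_);
        done_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> op;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                op = std::move(tasks_.front());
                tasks_.pop();
            }
            op();  // run outside the lock so callers can keep enqueueing
            {
                std::lock_guard<std::mutex> lock(mu_);
                --pending_;
            }
            done_cv_.notify_all();
        }
    }

    bool async_;
    bool stop_ = false;
    std::size_t pending_ = 0;
    std::queue<std::function<void()>> tasks_;
    std::mutex mu_;
    std::condition_variable cv_;       // signals new work or shutdown
    std::condition_variable done_cv_;  // signals that pending_ changed
    std::thread worker_;
};
```

A caller would construct `OpSubmitter s(async_mode_from_env());`, call `s.submit(...)` for each operator, and `s.wait()` before reading results. Using a single worker keeps operators in submission order, which matters when later operators consume the outputs of earlier ones.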

cmake

scripts : update sync + fix cmake merge

2025-03-27 10:09:29 +02:00

include

ggml : add bilinear upscale support (ggml/1185)

2025-04-11 00:17:47 +03:00

src

CANN: Add support for async operator submission (#12864)

2025-04-17 20:34:16 +08:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

CUDA/HIP: Share the same unified memory allocation logic. (#12934)

2025-04-15 11:20:38 +02:00