threadpool : skip polling for unused threads (#9461)

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-06-26 19:55:04 +00:00

* threadpool: skip polling for unused threads

Currently all threads do N polling rounds even if only 1 thread is active (n_threads_cur == 1).
This commit adds a check to skip the polling for unused threads (ith >= n_threads_cur).

n_threads_cur is now an atomic_int to explicitly tell thread sanitizer that it is written
from one thread and read from other threads (not a race condition).
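A minimal sketch of the skip check described above, assuming a simplified pool. The struct and function names (sketch_pool, sketch_poll_for_work) and the n_graph counter are hypothetical stand-ins for illustration, not the actual ggml code:

```c
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical, simplified stand-in for the threadpool state.
struct sketch_pool {
    atomic_int n_threads_cur; // written by the main thread, read by workers
    atomic_int n_graph;       // assumed counter, bumped when new work is published
};

// Returns true if new work was observed, false if this thread should stop polling.
// ith is the worker index, n_rounds the polling budget, last_graph the last counter seen.
static bool sketch_poll_for_work(struct sketch_pool *tp, int ith, int n_rounds, int last_graph) {
    // Unused threads skip the polling rounds entirely.
    if (ith >= atomic_load_explicit(&tp->n_threads_cur, memory_order_relaxed)) {
        return false;
    }
    for (int i = 0; i < n_rounds; i++) {
        if (atomic_load_explicit(&tp->n_graph, memory_order_relaxed) != last_graph) {
            return true;
        }
    }
    return false;
}
```

With n_threads_cur == 1, only ith == 0 spends its polling rounds; every other worker returns immediately and can fall through to whatever blocking wait the pool uses.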

* threadpool: further simplify and improve ggml_barrier

Avoid using strict memory order while polling, yet make sure that all threads go through
a full memory barrier (memory fence) on ggml_barrier entrance and exit.
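A rough sketch of a spin barrier in this style, written with C11 atomics. The names (sketch_pool, sketch_barrier, n_barrier, n_barrier_passed) are assumptions chosen for illustration; this is not the ggml_barrier source:

```c
#include <stdatomic.h>

// Hypothetical barrier state.
struct sketch_pool {
    atomic_int n_threads;        // threads taking part in the barrier
    atomic_int n_barrier;        // arrivals in the current round
    atomic_int n_barrier_passed; // number of completed rounds
};

static void sketch_barrier(struct sketch_pool *tp) {
    int n_threads = atomic_load_explicit(&tp->n_threads, memory_order_relaxed);
    if (n_threads == 1) {
        return; // nothing to synchronize with
    }

    int n_passed = atomic_load_explicit(&tp->n_barrier_passed, memory_order_relaxed);

    // Entrance: a seq_cst read-modify-write serves as the full fence.
    int arrived = atomic_fetch_add_explicit(&tp->n_barrier, 1, memory_order_seq_cst);

    if (arrived == n_threads - 1) {
        // Last thread to arrive: reset the counter and release the others.
        atomic_store_explicit(&tp->n_barrier, 0, memory_order_relaxed);
        atomic_fetch_add_explicit(&tp->n_barrier_passed, 1, memory_order_seq_cst);
        return;
    }

    // Polling uses relaxed loads to keep the spin loop cheap.
    while (atomic_load_explicit(&tp->n_barrier_passed, memory_order_relaxed) == n_passed) {
        // spin; a real implementation would likely add a CPU relax hint here
    }

    // Exit: full fence after the relaxed polling.
    atomic_thread_fence(memory_order_seq_cst);
}
```

The strict ordering is paid once on the way in and once on the way out, while the loop that may run many iterations only issues relaxed loads.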

* threads: add simple barrier test

This test does lots of small, parallel matmul ops where the barriers in between dominate the overhead.

* threadpool: improve thread sync for new-graphs

This uses the same tricks as ggml_barrier. All the polling is done with relaxed memory order
to keep it efficient. Once the new graph is detected, we do a full fence using a read-modify-write
with strict memory order.
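As an illustration of this pattern, here is a small sketch assuming new work is signalled by bumping a shared counter. The function names and the counter are hypothetical, not taken from the ggml source:

```c
#include <stdatomic.h>
#include <stdbool.h>

// Publisher side (assumed): bump the counter with a seq_cst read-modify-write
// so earlier writes to the graph state are ordered before the signal.
static void sketch_publish_graph(atomic_int *n_graph) {
    atomic_fetch_add_explicit(n_graph, 1, memory_order_seq_cst);
}

// Worker side: cheap relaxed check first, strict ordering only on a hit.
// Returns true and updates *last_graph when a new graph is detected.
static bool sketch_sync_new_graph(atomic_int *n_graph, int *last_graph) {
    int cur = atomic_load_explicit(n_graph, memory_order_relaxed);
    if (cur == *last_graph) {
        return false; // nothing new, no fence paid
    }
    // Dummy read-modify-write (adds 0) with seq_cst ordering, used as the full fence.
    atomic_fetch_add_explicit(n_graph, 0, memory_order_seq_cst);
    *last_graph = cur;
    return true;
}
```

A worker would call sketch_sync_new_graph in its polling loop and only start reading the graph state after it returns true.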

* threadpool: improve abort handling

Do not use threadpool->ec (exit code) to decide whether to exit the compute loop.
threadpool->ec is not atomic which makes thread-sanitizer rightfully unhappy about it.

Instead, introduce an atomic threadpool->abort flag for this. This is consistent with
how we handle threadpool->stop or pause.

While at it, add an explicit atomic_load for n_threads_cur for consistency.
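The idea can be sketched as follows, assuming a compute loop in which thread 0 consults an abort callback after each node. All names here (sketch_pool, sketch_compute, sketch_abort_countdown) are hypothetical and the loop is heavily simplified:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_status { SKETCH_OK = 0, SKETCH_ABORTED = 1 };

// Hypothetical pool state: the loop decision uses the atomic flag only.
struct sketch_pool {
    atomic_bool abort;     // read by every worker before each node
    enum sketch_status ec; // plain field, written by thread 0, read after the workers are done
};

typedef bool (*sketch_abort_cb)(void *data);

// Returns the number of nodes this thread processed before stopping.
static int sketch_compute(struct sketch_pool *tp, int ith, int n_nodes,
                          sketch_abort_cb cb, void *cb_data) {
    int done = 0;
    for (int node = 0; node < n_nodes; node++) {
        if (atomic_load_explicit(&tp->abort, memory_order_relaxed)) {
            break; // exit decision is based on the atomic flag, not on ec
        }
        // ... this thread's share of the node would be computed here ...
        done++;
        if (ith == 0 && cb != NULL && cb(cb_data)) {
            atomic_store_explicit(&tp->abort, true, memory_order_relaxed);
            tp->ec = SKETCH_ABORTED;
        }
    }
    return done;
}

// Example callback: request an abort once the countdown reaches zero.
static bool sketch_abort_countdown(void *data) {
    int *remaining = data;
    return --(*remaining) <= 0;
}
```

Because workers never read ec inside the loop, the non-atomic field is no longer accessed concurrently, which is the kind of access thread-sanitizer flags.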

* test-barrier: release threadpool before releasing the context

Fixes a use-after-free detected by the gcc thread-sanitizer on x86-64.
For some reason the llvm sanitizer does not detect this issue.


Max Krasnyansky

2024-09-17 01:19:46 -07:00

committed by GitHub

parent 503147a9f9

commit 0226613853

3 changed files with 170 additions and 50 deletions

									
tests/CMakeLists.txt

@@ -119,6 +119,7 @@ llama_target_and_test(test-grammar-parser.cpp)
 llama_target_and_test(test-llama-grammar.cpp)
 llama_target_and_test(test-grammar-integration.cpp)
 llama_target_and_test(test-grad0.cpp)
+llama_target_and_test(test-barrier.cpp)
 # llama_target_and_test(test-opt.cpp) # SLOW
 llama_target_and_test(test-backend-ops.cpp)
