# Benchmarks

Hardware: Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz
Software: Windows 10, MSVC 2017, MinGW GCC 7.2.0
Time unit: milliseconds (unless explicitly specified)

## EventQueue enqueue and process -- single threading
| Iterations | Queue size | Event count | Event Types | Listener count | Time of single threading | Time of multi threading |
|------------|------------|-------------|-------------|----------------|--------------------------|-------------------------|
| 100k       | 100        | 10M         | 100         | 100            | 401                      | 1146                    |
| 100k       | 1000       | 100M        | 100         | 100            | 4012                     | 11467                   |
| 100k       | 1000       | 100M        | 1000        | 1000           | 4102                     | 11600                   |
Given an `eventpp::EventQueue` whose `Policies` is either single threading or multi threading, the benchmark adds `Listener count` listeners to the queue; each listener is an empty lambda. Then the benchmark starts timing. It loops `Iterations` times; in each loop, the benchmark enqueues `Queue size` events, then processes the event queue. There are `Event Types` kinds of event types. `Event count` equals `Iterations * Queue size`. The EventQueue is processed in one thread; the single/multi threading in the table refers to the threading policy used.
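The sketch below is not the benchmark's actual code; it only illustrates, with assumed constants and an assumed `void ()` listener prototype, how one loop iteration of the single threading case can be set up with `eventpp::EventQueue`.

```c++
// Minimal sketch of one benchmark iteration (single threading policy).
// The constants and the void () prototype are illustrative assumptions.
#include "eventpp/eventqueue.h"

struct SingleThreadingPolicies
{
	using Threading = eventpp::SingleThreading;
};

int main()
{
	constexpr int listenerCount = 100;
	constexpr int queueSize = 100;
	constexpr int eventTypeCount = 100;

	// The event type is int; listeners take no arguments.
	eventpp::EventQueue<int, void (), SingleThreadingPolicies> queue;

	// Add `Listener count` empty listeners, spread over the event types.
	for(int i = 0; i < listenerCount; ++i) {
		queue.appendListener(i % eventTypeCount, []() {});
	}

	// One loop iteration: enqueue `Queue size` events, then process the whole queue.
	for(int i = 0; i < queueSize; ++i) {
		queue.enqueue(i % eventTypeCount);
	}
	queue.process();

	return 0;
}
```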
## EventQueue enqueue and process -- multiple threading

| Mutex      | Enqueue threads | Process threads | Event count | Event Types | Listener count | Time |
|------------|-----------------|-----------------|-------------|-------------|----------------|------|
| std::mutex | 1               | 1               | 10M         | 100         | 100            | 2283 |
| SpinLock   | 1               | 1               | 10M         | 100         | 100            | 1692 |
| std::mutex | 1               | 3               | 10M         | 100         | 100            | 3446 |
| SpinLock   | 1               | 3               | 10M         | 100         | 100            | 3025 |
| std::mutex | 2               | 2               | 10M         | 100         | 100            | 4000 |
| SpinLock   | 2               | 2               | 10M         | 100         | 100            | 3076 |
| std::mutex | 4               | 4               | 10M         | 100         | 100            | 1971 |
| SpinLock   | 4               | 4               | 10M         | 100         | 100            | 1755 |
| std::mutex | 16              | 16              | 10M         | 100         | 100            | 928  |
| SpinLock   | 16              | 16              | 10M         | 100         | 100            | 2082 |
There are `Enqueue threads` threads enqueuing events to the queue and `Process threads` threads processing the events. The total event count is `Event count`. `Mutex` is the mutex type used to protect the data.

The multi threading version is slower than the previous single threading version because the mutex locking costs time. When there are few threads (around the number of CPU cores, which is 4 here), `eventpp::SpinLock` performs better than `std::mutex`. But when there are many more threads than CPU cores (here, 16 enqueue threads and 16 process threads), `eventpp::SpinLock` performs worse than `std::mutex`.

## CallbackList append/remove callbacks

The benchmark loops 100K times; in each loop it appends 1000 empty callbacks to a CallbackList, then removes all those 1000 callbacks. So there are 100M append/remove operations in total. The total benchmarked time is about 21000 milliseconds, which means roughly 5000 append/remove operations per millisecond. One loop iteration looks like the sketch below.
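The following is not the benchmark's actual code; it is a minimal sketch of one append/remove loop iteration, assuming eventpp is available.

```c++
// Minimal sketch of one append/remove loop iteration; the callback count
// matches the description above, the rest is an illustrative assumption.
#include "eventpp/callbacklist.h"

#include <vector>

int main()
{
	constexpr int callbackCount = 1000;

	eventpp::CallbackList<void ()> callbackList;
	std::vector<decltype(callbackList)::Handle> handles;
	handles.reserve(callbackCount);

	// Append 1000 empty callbacks, keeping the handles returned by append...
	for(int i = 0; i < callbackCount; ++i) {
		handles.push_back(callbackList.append([]() {}));
	}

	// ...then remove all of them through the saved handles.
	for(const auto & handle : handles) {
		callbackList.remove(handle);
	}

	return 0;
}
```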
## CallbackList invoking VS native function invoking

Iterations: 100,000,000

| Function                               | Compiler  | Native invoking | CallbackList single threading | CallbackList multi threading |
|----------------------------------------|-----------|-----------------|-------------------------------|------------------------------|
| Inline global function                 | MSVC 2017 | 217             | 1501                          | 6921                         |
|                                        | GCC 7.2   | 187             | 1489                          | 4463                         |
| Non-inline global function             | MSVC 2017 | 241             | 1526                          | 6544                         |
|                                        | GCC 7.2   | 233             | 1488                          | 4787                         |
| Function object                        | MSVC 2017 | 194             | 1498                          | 6433                         |
|                                        | GCC 7.2   | 212             | 1485                          | 4951                         |
| Member virtual function                | MSVC 2017 | 207             | 1533                          | 6558                         |
|                                        | GCC 7.2   | 212             | 1485                          | 4489                         |
| Member non-virtual function            | MSVC 2017 | 214             | 1533                          | 6390                         |
|                                        | GCC 7.2   | 211             | 1486                          | 4872                         |
| Member non-inline virtual function     | MSVC 2017 | 206             | 1522                          | 6578                         |
|                                        | GCC 7.2   | 182             | 1666                          | 4593                         |
| Member non-inline non-virtual function | MSVC 2017 | 206             | 1491                          | 6992                         |
|                                        | GCC 7.2   | 205             | 1486                          | 4490                         |
| All functions                          | MSVC 2017 | 1374            | 10951                         | 29973                        |
|                                        | GCC 7.2   | 1223            | 9770                          | 22958                        |
Testing functions

```c++
#if defined(_MSC_VER)
#define NON_INLINE __declspec(noinline)
#else // gcc
#define NON_INLINE __attribute__((noinline))
#endif

volatile int globalValue = 0;

void globalFunction(int a, const int b)
{
	globalValue += a + b;
}

NON_INLINE void nonInlineGlobalFunction(int a, const int b)
{
	globalValue += a + b;
}

struct FunctionObject
{
	void operator() (int a, const int b)
	{
		globalValue += a + b;
	}

	virtual void virFunc(int a, const int b)
	{
		globalValue += a + b;
	}

	void nonVirFunc(int a, const int b)
	{
		globalValue += a + b;
	}

	NON_INLINE virtual void nonInlineVirFunc(int a, const int b)
	{
		globalValue += a + b;
	}

	NON_INLINE void nonInlineNonVirFunc(int a, const int b)
	{
		globalValue += a + b;
	}
};

#undef NON_INLINE
```
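The benchmark's measuring code is not reproduced here. The sketch below only illustrates, for the global function case and assuming the testing functions above are in scope, how native invoking and CallbackList invoking can be compared; the function name `runInvokingComparison` and the iteration count are assumptions, not part of the benchmark.

```c++
// Minimal sketch comparing native invoking with CallbackList invoking for
// globalFunction; time each loop separately to get the two kinds of numbers above.
#include "eventpp/callbacklist.h"

void runInvokingComparison()
{
	constexpr int iterations = 1000 * 1000; // illustrative, not 100,000,000

	eventpp::CallbackList<void (int, int)> callbackList;
	callbackList.append(&globalFunction);

	// Native invoking: call the function directly.
	for(int i = 0; i < iterations; ++i) {
		globalFunction(i, i);
	}

	// CallbackList invoking: call the same function through the callback list.
	for(int i = 0; i < iterations; ++i) {
		callbackList(i, i);
	}
}
```

For the member function rows, the callable would be bound to a `FunctionObject` instance before being appended to the callback list.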