Benchmarks

Hardware: HP laptop, Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz, 16 GB RAM
Software: Windows 10, MinGW GCC 11.3.0, MSVC 2022
Time unit: milliseconds (unless explicitly specified)

Unless otherwise specified, the compiler is GCC.
The benchmark hardware was mid-range to low-end at the time of benchmarking (December 2023).

EventQueue enqueue and process -- single threading

Iterations	Queue size	Event count	Event Types	Listener count	Time of single threading	Time of multi threading
100k	100	10M	100	100	289	939
100k	1000	100M	100	100	2822	9328
100k	1000	100M	1000	1000	2923	9502

Given eventpp::EventQueue<size_t, void (size_t), Policies>, where Policies is either single threading or multi threading, the benchmark adds Listener count listeners to the queue; each listener is an empty lambda. Then the benchmark starts timing and loops Iterations times. In each loop, the benchmark enqueues Queue size events, then processes the event queue.
There are Event types distinct event types. Event count is Iterations * Queue size.
The EventQueue is always processed in one thread. The single/multi threading columns in the table refer to the threading policy used, not to the number of threads.
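The shape of the measurement loop described above can be sketched as follows. This is a minimal illustration, not the actual benchmark source: ToyEventQueue is a hypothetical stand-in written only for this example (it is not eventpp::EventQueue), and the round-robin assignment of listeners and events to event types is an assumption, since the text does not say how they are distributed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event queue; NOT eventpp::EventQueue.
// It only mirrors the enqueue/process shape that the benchmark exercises.
struct ToyEventQueue
{
    std::vector<std::vector<std::function<void (std::size_t)>>> listeners;
    std::deque<std::size_t> pending;

    explicit ToyEventQueue(std::size_t eventTypeCount) : listeners(eventTypeCount) {}

    void appendListener(std::size_t event, std::function<void (std::size_t)> callback)
    {
        listeners[event].push_back(std::move(callback));
    }

    void enqueue(std::size_t event)
    {
        pending.push_back(event);
    }

    // Dispatches every pending event and returns how many were processed.
    std::size_t process()
    {
        std::size_t processed = 0;
        while(! pending.empty()) {
            const std::size_t event = pending.front();
            pending.pop_front();
            for(auto & callback : listeners[event]) {
                callback(event);
            }
            ++processed;
        }
        return processed;
    }
};

struct BenchmarkResult
{
    std::size_t eventCount;
    long long milliseconds;
};

BenchmarkResult runBenchmark(
    std::size_t iterations, std::size_t queueSize,
    std::size_t eventTypeCount, std::size_t listenerCount)
{
    assert(eventTypeCount > 0);
    ToyEventQueue queue(eventTypeCount);

    // Listeners are added before timing starts; each one is an empty lambda.
    // Assumption: listeners are spread round-robin over the event types.
    for(std::size_t i = 0; i < listenerCount; ++i) {
        queue.appendListener(i % eventTypeCount, [](std::size_t) {});
    }

    std::size_t eventCount = 0;
    const auto start = std::chrono::steady_clock::now();
    for(std::size_t iteration = 0; iteration < iterations; ++iteration) {
        // Put Queue size events, then process the whole queue.
        for(std::size_t i = 0; i < queueSize; ++i) {
            queue.enqueue(i % eventTypeCount);
        }
        eventCount += queue.process();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;

    return BenchmarkResult {
        eventCount,
        static_cast<long long>(
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count())
    };
}
```

Calling runBenchmark(100000, 100, 100, 100) would mirror the parameters of the first table row and process Iterations * Queue size = 10M events. The timings of this toy queue are not comparable to the figures in the table.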

EventQueue enqueue and process -- multiple threading

Mutex	Enqueue threads	Process threads	Event count	Event Types	Listener count	Time
std::mutex	1	1	10M	100	100	1824
SpinLock	1	1	10M	100	100	1303
std::mutex	1	3	10M	100	100	2989
SpinLock	1	3	10M	100	100	3186
std::mutex	2	2	10M	100	100	3151
SpinLock	2	2	10M	100	100	3049
std::mutex	4	4	10M	100	100	1657
SpinLock	4	4	10M	100	100	1659
std::mutex	16	16	10M	100	100	708
SpinLock	16	16	10M	100	100	1891

Enqueue threads threads enqueue events to the queue, and Process threads threads process the events. The total event count is Event count. Mutex is the mutex type used to protect the data.
The multi threading version is slower than the previous single threading version, because locking the mutex costs time.
When the number of threads is small (around the number of CPU cores, which is 4 here), eventpp::SpinLock is generally on par with or faster than std::mutex; the row with 1 enqueue thread and 3 process threads is an exception. But when there are many more threads than CPU cores (here, 16 enqueue threads and 16 process threads), eventpp::SpinLock performs clearly worse than std::mutex.
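To illustrate what a spin lock does differently from std::mutex, here is a minimal sketch of a generic test-and-set spin lock. It is an assumption-laden illustration, not the eventpp::SpinLock source; ToySpinLock and countWithLock are hypothetical names introduced for this example. One common explanation for the 16-thread result, which the benchmark itself does not state, is that a spinning thread keeps its CPU core busy while it waits, so with more threads than cores the waiters can starve the thread that holds the lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Generic test-and-set spin lock; an illustration only,
// not the implementation of eventpp::SpinLock.
class ToySpinLock
{
public:
    void lock()
    {
        // Busy-wait: the thread stays on its CPU core until the flag is free.
        while(flag.test_and_set(std::memory_order_acquire)) {
        }
    }

    void unlock()
    {
        flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
};

// Runs threadCount threads that each increment a shared counter
// incrementsPerThread times while holding a LockType lock.
template <typename LockType>
long long countWithLock(int threadCount, int incrementsPerThread)
{
    LockType lock;
    long long counter = 0;
    std::vector<std::thread> threads;

    for(int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&lock, &counter, incrementsPerThread]() {
            for(int i = 0; i < incrementsPerThread; ++i) {
                std::lock_guard<LockType> guard(lock);
                ++counter;
            }
        });
    }
    for(auto & thread : threads) {
        thread.join();
    }
    return counter;
}
```

Because ToySpinLock provides lock() and unlock(), it works with std::lock_guard just like std::mutex, so countWithLock<ToySpinLock>(4, 10000) and countWithLock<std::mutex>(4, 10000) exercise the same code path with different lock types.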

CallbackList append/remove callbacks

The benchmark loops 100K times. In each loop it appends 1000 empty callbacks to a CallbackList, then removes all 1000 callbacks. So there are 100M append/remove operations in total.
The total benchmarked time is about 16000 milliseconds. That is, roughly 6000 append/remove operations complete per millisecond (100M / 16000 = 6250).
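The throughput figure follows directly from the numbers quoted above. As a small worked example (the function names are hypothetical and exist only for this illustration):

```cpp
#include <cassert>

// Operations completed per millisecond, given a total operation count
// and the elapsed time in milliseconds.
double operationsPerMillisecond(double operationCount, double elapsedMilliseconds)
{
    assert(elapsedMilliseconds > 0.0);
    return operationCount / elapsedMilliseconds;
}

// Uses the figures from the text: 100K loops, 1000 callbacks per loop,
// and a total time of about 16000 ms.
double callbackListThroughput()
{
    const double operationCount = 100000.0 * 1000.0; // 100M append/remove operations
    return operationsPerMillisecond(operationCount, 16000.0);
}
```

With these inputs the result works out to 6250 operations per millisecond. Since the measured time is itself approximate, the figure is best read as an order of magnitude.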

CallbackList invoking VS native function invoking

Iterations: 100,000,000

Function	Compiler	Native invoking	CallbackList single threading	CallbackList multi threading
Inline global function	MSVC	139	1267	3058
Inline global function	GCC	141	1149	2563
Non-inline global function	MSVC	143	1273	3047
Non-inline global function	GCC	132	1218	2583
Function object	MSVC	139	1198	2993
Function object	GCC	141	1107	2633
Member virtual function	MSVC	159	1221	3076
Member virtual function	GCC	140	1231	2691
Member non-virtual function	MSVC	140	1266	3054
Member non-virtual function	GCC	140	1193	2701
Member non-inline virtual function	MSVC	158	1223	3103
Member non-inline virtual function	GCC	133	1231	2676
Member non-inline non-virtual function	MSVC	134	1266	3028
Member non-inline non-virtual function	GCC	134	1205	2652
All functions	MSVC	91	903	2214
All functions	GCC	89	858	1852
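
The totals in the table can be converted to an average cost per invocation by dividing by the iteration count. The helper below is a hypothetical illustration of that conversion, not part of the benchmark code.

```cpp
#include <cassert>

// Average cost of one invocation in nanoseconds, given the total time
// in milliseconds and the number of iterations.
double nanosecondsPerCall(double totalMilliseconds, double iterations)
{
    assert(iterations > 0.0);
    return totalMilliseconds * 1000000.0 / iterations;
}
```

For the "Inline global function" row with GCC and 100,000,000 iterations, this gives about 1.41 ns per native call (141 ms), about 11.49 ns per CallbackList call with single threading (1149 ms), and about 25.63 ns with multi threading (2563 ms).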

Testing functions

#if defined(_MSC_VER)
#define NON_INLINE __declspec(noinline)
#else
// gcc
#define NON_INLINE __attribute__((noinline))
#endif

volatile int globalValue = 0;

void globalFunction(int a, const int b)
{
    globalValue += a + b;
}

NON_INLINE void nonInlineGlobalFunction(int a, const int b)
{
    globalValue += a + b;
}

struct FunctionObject
{
    void operator() (int a, const int b)
    {
        globalValue += a + b;
    }

    virtual void virFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    void nonVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    NON_INLINE virtual void nonInlineVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    NON_INLINE void nonInlineNonVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }
};

#undef NON_INLINE
