Benchmarks

Hardware: Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz
Software: Windows 10, MSVC 2017, MinGW GCC 7.2.0
Time unit: milliseconds (unless explicitly specified)

EventQueue enqueue and process -- single threading

Iterations	Queue size	Event count	Event Types	Listener count	Time of single threading	Time of multi threading
100k	100	10M	100	100	401	1146
100k	1000	100M	100	100	4012	11467
100k	1000	100M	1000	1000	4102	11600

Given eventpp::EventQueue<size_t, void (size_t), Policies>, where Policies is either single threading or multi threading, the benchmark adds Listener count listeners to the queue; each listener is an empty lambda. The benchmark then starts timing and loops Iterations times. In each loop, it enqueues Queue size events and then processes the event queue.
There are Event types distinct event types. Event count is Iterations * Queue size.
The EventQueue is processed in one thread. The single/multi threading columns in the table refer to the policies used, not to the number of threads.
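The loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark source: MiniQueue is a hypothetical stand-in written with the standard library instead of eventpp::EventQueue, and the way listeners and events are spread across event types (index modulo Event types) is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in queue for illustration; this is NOT eventpp's implementation.
class MiniQueue
{
public:
    using Listener = std::function<void (std::size_t)>;

    void appendListener(std::size_t event, Listener listener)
    {
        listeners[event].push_back(std::move(listener));
    }

    void enqueue(std::size_t event)
    {
        pending.push_back(event);
    }

    // Dispatches every pending event to the listeners registered for it,
    // clears the queue, and returns the number of listener calls made.
    std::size_t process()
    {
        std::size_t calls = 0;
        for(std::size_t event : pending) {
            auto it = listeners.find(event);
            if(it == listeners.end()) {
                continue;
            }
            for(auto & listener : it->second) {
                listener(event);
                ++calls;
            }
        }
        pending.clear();
        return calls;
    }

private:
    std::unordered_map<std::size_t, std::vector<Listener>> listeners;
    std::vector<std::size_t> pending;
};

struct BenchResult
{
    std::size_t eventCount;
    std::size_t listenerCalls;
    long long milliseconds;
};

BenchResult runQueueBenchmark(std::size_t iterations, std::size_t queueSize,
    std::size_t eventTypes, std::size_t listenerCount)
{
    assert(eventTypes > 0);

    MiniQueue queue;
    // Listeners are added before timing starts (assumed: spread evenly over event types).
    for(std::size_t i = 0; i < listenerCount; ++i) {
        queue.appendListener(i % eventTypes, [](std::size_t) {});
    }

    BenchResult result{ 0, 0, 0 };
    const auto start = std::chrono::steady_clock::now();
    for(std::size_t iteration = 0; iteration < iterations; ++iteration) {
        // Fill the queue with Queue size events, then process them all.
        for(std::size_t i = 0; i < queueSize; ++i) {
            queue.enqueue(i % eventTypes);
            ++result.eventCount;
        }
        result.listenerCalls += queue.process();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    result.milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
    return result;
}
```

Calling runQueueBenchmark(100000, 100, 100, 100) mirrors the parameters of the first table row; the timings of this stand-in are not comparable to the eventpp numbers in the table.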

EventQueue enqueue and process -- multiple threading

Mutex	Enqueue threads	Process threads	Event count	Event Types	Listener count	Time
std::mutex	1	1	10M	100	100	2283
SpinLock	1	1	10M	100	100	1692
std::mutex	1	3	10M	100	100	3446
SpinLock	1	3	10M	100	100	3025
std::mutex	2	2	10M	100	100	4000
SpinLock	2	2	10M	100	100	3076
std::mutex	4	4	10M	100	100	1971
SpinLock	4	4	10M	100	100	1755
std::mutex	16	16	10M	100	100	928
SpinLock	16	16	10M	100	100	2082

Enqueue threads threads enqueue events to the queue, and Process threads threads process the events. The total number of events is Event count. Mutex is the mutex type used to protect the data.
The multi threading version is slower than the previous single threading version because locking the mutex costs time.
When the number of threads is small (around the number of CPU cores, which is 4 here), eventpp::SpinLock performs better than std::mutex. When there are many more threads than CPU cores (here, 16 enqueue threads and 16 process threads), eventpp::SpinLock performs worse than std::mutex.
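To make the comparison concrete, here is a minimal sketch of a spin lock that can be swapped with std::mutex under the same workload. SimpleSpinLock and countWithLock are hypothetical names for illustration; this is not the source of eventpp::SpinLock and not the benchmark code. A spin lock busy-waits instead of putting the thread to sleep, which is a plausible reason it does well with few threads and poorly when threads heavily outnumber cores.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spin lock built on std::atomic_flag.
// It satisfies the BasicLockable requirements, so std::lock_guard can use it.
class SimpleSpinLock
{
public:
    void lock()
    {
        // Busy-wait until the flag was previously clear.
        while(flag.test_and_set(std::memory_order_acquire)) {
        }
    }

    void unlock()
    {
        flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
};

// Runs threadCount threads that each increment a shared counter
// incrementsPerThread times while holding a lock of type Mutex.
template <typename Mutex>
std::size_t countWithLock(std::size_t threadCount, std::size_t incrementsPerThread)
{
    assert(threadCount > 0);

    Mutex mutex;
    std::size_t counter = 0;
    std::vector<std::thread> threads;
    for(std::size_t t = 0; t < threadCount; ++t) {
        threads.emplace_back([&mutex, &counter, incrementsPerThread]() {
            for(std::size_t i = 0; i < incrementsPerThread; ++i) {
                std::lock_guard<Mutex> guard(mutex);
                ++counter;
            }
        });
    }
    for(auto & thread : threads) {
        thread.join();
    }
    return counter;
}
```

Timing countWithLock<SimpleSpinLock>(n, k) against countWithLock<std::mutex>(n, k) for different values of n is one way to explore the trend in the table on other hardware.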

CallbackList append/remove callbacks

The benchmark loops 100K times. In each loop it appends 1000 empty callbacks to a CallbackList, then removes all 1000 callbacks, for a total of 100M append/remove operations.
The total benchmarked time is about 21000 milliseconds, that is, roughly 4800 append/remove operations per millisecond (100M / 21000).
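The shape of this benchmark and the throughput arithmetic can be sketched as below. This is an assumption-laden illustration: std::list of std::function stands in for CallbackList, and runAppendRemove and operationsPerMillisecond are hypothetical helper names, not part of eventpp.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

struct AppendRemoveResult
{
    std::size_t operations;   // number of append/remove pairs performed
    std::size_t remaining;    // callbacks left in the list at the end
    long long milliseconds;
};

// Stand-in for the CallbackList benchmark: append N empty callbacks, then remove them all.
AppendRemoveResult runAppendRemove(std::size_t iterations, std::size_t callbacksPerIteration)
{
    using List = std::list<std::function<void ()>>;
    List callbacks;
    std::vector<List::iterator> handles;
    handles.reserve(callbacksPerIteration);

    AppendRemoveResult result{ 0, 0, 0 };
    const auto start = std::chrono::steady_clock::now();
    for(std::size_t iteration = 0; iteration < iterations; ++iteration) {
        handles.clear();
        for(std::size_t i = 0; i < callbacksPerIteration; ++i) {
            handles.push_back(callbacks.insert(callbacks.end(), []() {}));
        }
        for(List::iterator handle : handles) {
            callbacks.erase(handle);
        }
        result.operations += callbacksPerIteration;
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    result.milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
    result.remaining = callbacks.size();
    return result;
}

// Throughput as whole operations per millisecond (integer division).
std::size_t operationsPerMillisecond(std::size_t operations, std::size_t milliseconds)
{
    assert(milliseconds > 0);
    return operations / milliseconds;
}
```

With the figures reported above, operationsPerMillisecond(100000000, 21000) gives the per-millisecond throughput using integer division.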

CallbackList invoking VS native function invoking

Iterations: 100,000,000

Function	Compiler	Native invoking	CallbackList single threading	CallbackList multi threading
Inline global function	MSVC 2017	217	1501	6921
Inline global function	GCC 7.2	187	1489	4463
Non-inline global function	MSVC 2017	241	1526	6544
Non-inline global function	GCC 7.2	233	1488	4787
Function object	MSVC 2017	194	1498	6433
Function object	GCC 7.2	212	1485	4951
Member virtual function	MSVC 2017	207	1533	6558
Member virtual function	GCC 7.2	212	1485	4489
Member non-virtual function	MSVC 2017	214	1533	6390
Member non-virtual function	GCC 7.2	211	1486	4872
Member non-inline virtual function	MSVC 2017	206	1522	6578
Member non-inline virtual function	GCC 7.2	182	1666	4593
Member non-inline non-virtual function	MSVC 2017	206	1491	6992
Member non-inline non-virtual function	GCC 7.2	205	1486	4490
All functions	MSVC 2017	1374	10951	29973
All functions	GCC 7.2	1223	9770	22958

Testing functions

#if defined(_MSC_VER)
#define NON_INLINE __declspec(noinline)
#else
// gcc
#define NON_INLINE __attribute__((noinline))
#endif

volatile int globalValue = 0;

void globalFunction(int a, const int b)
{
    globalValue += a + b;
}

NON_INLINE void nonInlineGlobalFunction(int a, const int b)
{
    globalValue += a + b;
}

struct FunctionObject
{
    void operator() (int a, const int b)
    {
        globalValue += a + b;
    }

    virtual void virFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    void nonVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    NON_INLINE virtual void nonInlineVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    NON_INLINE void nonInlineNonVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }
};

#undef NON_INLINE
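
A timing harness for the comparison in the table could look roughly like the sketch below. It is not the benchmark's actual harness: a std::vector of std::function stands in for CallbackList, and sinkValue, sampleFunction, timeMilliseconds and compareInvoking are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Plays the same role as globalValue above: volatile, so the calls are not optimized away.
volatile int sinkValue = 0;

void sampleFunction(int a, const int b)
{
    sinkValue = sinkValue + a + b;
}

// Runs work() once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
long long timeMilliseconds(F && work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}

struct InvokeTimes
{
    long long nativeMs;
    long long listMs;
};

// Times direct calls against calls made through a list of type-erased callbacks.
InvokeTimes compareInvoking(int iterations)
{
    assert(iterations >= 0);

    // Stand-in for a CallbackList holding a single callback.
    std::vector<std::function<void (int, int)>> callbackList;
    callbackList.push_back(&sampleFunction);

    InvokeTimes times{ 0, 0 };
    times.nativeMs = timeMilliseconds([iterations]() {
        for(int i = 0; i < iterations; ++i) {
            sampleFunction(1, 2);
        }
    });
    times.listMs = timeMilliseconds([iterations, &callbackList]() {
        for(int i = 0; i < iterations; ++i) {
            for(auto & callback : callbackList) {
                callback(1, 2);
            }
        }
    });
    return times;
}
```

Constant arguments keep the accumulated sum within the range of int even at 100,000,000 iterations. The ratio of listMs to nativeMs gives a rough idea of the dispatch overhead on a given machine, though this stand-in will not reproduce the CallbackList figures in the table.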
