mirror of
https://github.com/wqking/eventpp.git
synced 2024-12-26 15:52:40 +08:00
350 lines
6.9 KiB
Markdown
350 lines
6.9 KiB
Markdown
# Benchmarks
|
|
|
|
Hardware: HP laptop, Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz, 16 GB RAM
|
|
Software: Windows 10, MinGW GCC 11.3.0, MSVC 2022
|
|
Time unit: milliseconds (unless explicitly specified)
|
|
|
|
Unless it's specified, the default compiler is GCC.
|
|
The hardware used for benchmark is pretty medium to low end at the time of benchmarking (December 2023).
|
|
|
|
## EventQueue enqueue and process -- single threading
|
|
|
|
<table>
|
|
<tr>
|
|
<th>Iterations</th>
|
|
<th>Queue size</th>
|
|
<th>Event count</th>
|
|
<th>Event Types</th>
|
|
<th>Listener count</th>
|
|
<th>Time of single threading</th>
|
|
<th>Time of multi threading</th>
|
|
</tr>
|
|
<tr>
|
|
<td>100k</td>
|
|
<td>100</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>289</td>
|
|
<td>939</td>
|
|
</tr>
|
|
<tr>
|
|
<td>100k</td>
|
|
<td>1000</td>
|
|
<td>100M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>2822</td>
|
|
<td>9328</td>
|
|
</tr>
|
|
<tr>
|
|
<td>100k</td>
|
|
<td>1000</td>
|
|
<td>100M</td>
|
|
<td>1000</td>
|
|
<td>1000</td>
|
|
<td>2923</td>
|
|
<td>9502</td>
|
|
</tr>
|
|
<table>
|
|
|
|
Given `eventpp::EventQueue<size_t, void (size_t), Policies>`, which `Policies` is either single threading or multi threading, the benchmark adds `Listener count` listeners to the queue, each listener is an empty lambda. Then the benchmark starts timing. It loops `Iterations` times. In each loop, the benchmark puts `Queue size` events, then process the event queue.
|
|
There are `Event types` kinds of event type. `Event count` is `Iterations * Queue size`.
|
|
The EventQueue is processed in one thread. The Single/Multi threading in the table means the policies used.
|
|
|
|
## EventQueue enqueue and process -- multiple threading
|
|
|
|
<table>
|
|
<tr>
|
|
<th>Mutex</th>
|
|
<th>Enqueue threads</th>
|
|
<th>Process threads</th>
|
|
<th>Event count</th>
|
|
<th>Event Types</th>
|
|
<th>Listener count</th>
|
|
<th>Time</th>
|
|
</tr>
|
|
<tr>
|
|
<td>std::mutex</td>
|
|
<td>1</td>
|
|
<td>1</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>1824</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SpinLock</td>
|
|
<td>1</td>
|
|
<td>1</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>1303</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>std::mutex</td>
|
|
<td>1</td>
|
|
<td>3</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>2989</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SpinLock</td>
|
|
<td>1</td>
|
|
<td>3</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>3186</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>std::mutex</td>
|
|
<td>2</td>
|
|
<td>2</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>3151</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SpinLock</td>
|
|
<td>2</td>
|
|
<td>2</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>3049</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>std::mutex</td>
|
|
<td>4</td>
|
|
<td>4</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>1657</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SpinLock</td>
|
|
<td>4</td>
|
|
<td>4</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>1659</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>std::mutex</td>
|
|
<td>16</td>
|
|
<td>16</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>708</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SpinLock</td>
|
|
<td>16</td>
|
|
<td>16</td>
|
|
<td>10M</td>
|
|
<td>100</td>
|
|
<td>100</td>
|
|
<td>1891</td>
|
|
</tr>
|
|
</table>
|
|
|
|
There are `Enqueue threads` threads enqueuing events to the queue, and `Process threads` threads processing the events. The total event count is `Event count`. `Mutex` is the mutex type used to protect the data.
|
|
The multi threading version shows slower than previous single threading version, since the mutex locks cost time.
|
|
When there are fewer threads (about around the number of CPU cores which is 4 here), `eventpp::SpinLock` has better performance than `std::mutex`. But there are much more threads than CPU cores (here is 16 enqueue threads and 16 process threads), `eventpp::SpinLock` has worse performance than `std::mutex`.
|
|
|
|
## CallbackList append/remove callbacks
|
|
|
|
The benchmark loops 100K times, in each loop it appends 1000 empty callbacks to a CallbackList, then remove all that 1000 callbacks. So there are totally 100M append/remove operations.
|
|
The total benchmarked time is about 16000 milliseconds. That's to say in 1 milliseconds there can be 6000 append/remove operations.
|
|
|
|
## CallbackList invoking VS native function invoking
|
|
|
|
Iterations: 100,000,000
|
|
|
|
<table>
|
|
<tr>
|
|
<th>Function</th>
|
|
<th>Compiler</th>
|
|
<th>Native invoking</th>
|
|
<th>CallbackList single threading</th>
|
|
<th>CallbackList multi threading</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td rowspan="2">Inline global function</td>
|
|
<td>MSVC</td>
|
|
<td>139</td>
|
|
<td>1267</td>
|
|
<td>3058</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GCC</td>
|
|
<td>141</td>
|
|
<td>1149</td>
|
|
<td>2563</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td rowspan="2">Non-inline global function</td>
|
|
<td>MSVC</td>
|
|
<td>143</td>
|
|
<td>1273</td>
|
|
<td>3047</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GCC</td>
|
|
<td>132</td>
|
|
<td>1218</td>
|
|
<td>2583</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td rowspan="2">Function object</td>
|
|
<td>MSVC</td>
|
|
<td>139</td>
|
|
<td>1198</td>
|
|
<td>2993</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GCC</td>
|
|
<td>141</td>
|
|
<td>1107</td>
|
|
<td>2633</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td rowspan="2">Member virtual function</td>
|
|
<td>MSVC</td>
|
|
<td>159</td>
|
|
<td>1221</td>
|
|
<td>3076</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GCC</td>
|
|
<td>140</td>
|
|
<td>1231</td>
|
|
<td>2691</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td rowspan="2">Member non-virtual function</td>
|
|
<td>MSVC</td>
|
|
<td>140</td>
|
|
<td>1266</td>
|
|
<td>3054</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GCC</td>
|
|
<td>140</td>
|
|
<td>1193</td>
|
|
<td>2701</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td rowspan="2">Member non-inline virtual function</td>
|
|
<td>MSVC</td>
|
|
<td>158</td>
|
|
<td>1223</td>
|
|
<td>3103</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GCC</td>
|
|
<td>133</td>
|
|
<td>1231</td>
|
|
<td>2676</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td rowspan="2">Member non-inline non-virtual function</td>
|
|
<td>MSVC</td>
|
|
<td>134</td>
|
|
<td>1266</td>
|
|
<td>3028</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GCC</td>
|
|
<td>134</td>
|
|
<td>1205</td>
|
|
<td>2652</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td rowspan="2">All functions</td>
|
|
<td>MSVC</td>
|
|
<td>91</td>
|
|
<td>903</td>
|
|
<td>2214</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GCC</td>
|
|
<td>89</td>
|
|
<td>858</td>
|
|
<td>1852</td>
|
|
</tr>
|
|
|
|
</table>
|
|
|
|
Testing functions
|
|
```c++
|
|
#if defined(_MSC_VER)
|
|
#define NON_INLINE __declspec(noinline)
|
|
#else
|
|
// gcc
|
|
#define NON_INLINE __attribute__((noinline))
|
|
#endif
|
|
|
|
volatile int globalValue = 0;
|
|
|
|
void globalFunction(int a, const int b)
|
|
{
|
|
globalValue += a + b;
|
|
}
|
|
|
|
NON_INLINE void nonInlineGlobalFunction(int a, const int b)
|
|
{
|
|
globalValue += a + b;
|
|
}
|
|
|
|
struct FunctionObject
|
|
{
|
|
void operator() (int a, const int b)
|
|
{
|
|
globalValue += a + b;
|
|
}
|
|
|
|
virtual void virFunc(int a, const int b)
|
|
{
|
|
globalValue += a + b;
|
|
}
|
|
|
|
void nonVirFunc(int a, const int b)
|
|
{
|
|
globalValue += a + b;
|
|
}
|
|
|
|
NON_INLINE virtual void nonInlineVirFunc(int a, const int b)
|
|
{
|
|
globalValue += a + b;
|
|
}
|
|
|
|
NON_INLINE void nonInlineNonVirFunc(int a, const int b)
|
|
{
|
|
globalValue += a + b;
|
|
}
|
|
};
|
|
|
|
#undef NON_INLINE
|
|
```
|