leveldb/port
costan 5c39524f36 Replace SSE-optimized CRC32C in POSIX port with external library.
Maintaining a hardware-accelerated CRC32C implementation tailored for
all modern platforms deserves a repository of its own. We extracted the
implementation here into https://github.com/google/crc32c and improved
it in that repository. This CL removes the SSE-optimized implementation
from this codebase, and adds the ability to use the google/crc32c
library, if it is present on the system.
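
The optional dependency described above can be sketched as follows. This is a minimal illustration, not the code from this change: it assumes the build system defines a `HAVE_CRC32C` macro when the google/crc32c library is found, and the names `PortableCrc32c` and `Crc32c` are hypothetical. The fallback here is a simple bitwise loop rather than the table-driven code a real fallback would likely use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption of this sketch: the build system sets HAVE_CRC32C to 1 when
// the google/crc32c library is available, and leaves it undefined otherwise.
#if HAVE_CRC32C
#include <crc32c/crc32c.h>
#endif

// Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
// Slow but dependency-free; it stands in for a software fallback.
inline uint32_t PortableCrc32c(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < size; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the shifted-out bit is 1.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Uses the external library when it was detected at build time, and the
// portable implementation otherwise. Both paths return the same value.
inline uint32_t Crc32c(const uint8_t* data, size_t size) {
#if HAVE_CRC32C
  return crc32c::Crc32c(data, size);
#else
  return PortableCrc32c(data, size);
#endif
}
```

Callers see one function either way, so only throughput differs between builds with and without the library.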

The benchmarks below show the performance impact of the change. In
summary, open source builds that use the google/crc32c library can
expect a 3x improvement in CRC32C throughput, whereas builds that do not
use the library will see a 50% drop in CRC32C throughput. This
translates into much smaller changes in overall leveldb performance.

Baseline, MacBookPro13,3 with Core i7 6920HQ:
LevelDB:    version 1.20
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
------------------------------------------------
fillseq      :       3.064 micros/op;   36.1 MB/s
fillsync     :      57.861 micros/op;    1.9 MB/s (1000 ops)
fillrandom   :       3.887 micros/op;   28.5 MB/s
overwrite    :       4.140 micros/op;   26.7 MB/s
readrandom   :       7.433 micros/op; (1000000 of 1000000 found)
readrandom   :       6.825 micros/op; (1000000 of 1000000 found)
readseq      :       0.244 micros/op;  453.4 MB/s
readreverse  :       0.387 micros/op;  285.8 MB/s
compact      :  449707.000 micros/op;
readrandom   :       4.196 micros/op; (1000000 of 1000000 found)
readseq      :       0.228 micros/op;  485.8 MB/s
readreverse  :       0.320 micros/op;  345.2 MB/s
fill100K     :     562.556 micros/op;  169.6 MB/s (1000 ops)
crc32c       :       0.768 micros/op; 5085.0 MB/s (4K per op)
snappycomp   :       4.220 micros/op;  925.7 MB/s (output: 55.1%)
snappyuncomp :       0.635 micros/op; 6155.7 MB/s
acquireload  :      13.054 micros/op; (each op is 1000 loads)

New with crc32c, MacBookPro13,3 with Core i7 6920HQ:
LevelDB:    version 1.20
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
------------------------------------------------
fillseq      :       2.820 micros/op;   39.2 MB/s
fillsync     :      51.988 micros/op;    2.1 MB/s (1000 ops)
fillrandom   :       3.747 micros/op;   29.5 MB/s
overwrite    :       4.047 micros/op;   27.3 MB/s
readrandom   :       7.287 micros/op; (1000000 of 1000000 found)
readrandom   :       6.927 micros/op; (1000000 of 1000000 found)
readseq      :       0.253 micros/op;  437.5 MB/s
readreverse  :       0.411 micros/op;  269.2 MB/s
compact      :  440405.000 micros/op;
readrandom   :       4.159 micros/op; (1000000 of 1000000 found)
readseq      :       0.230 micros/op;  481.1 MB/s
readreverse  :       0.320 micros/op;  345.9 MB/s
fill100K     :     558.222 micros/op;  170.9 MB/s (1000 ops)
crc32c       :       0.214 micros/op; 18263.5 MB/s (4K per op)
snappycomp   :       4.471 micros/op;  873.7 MB/s (output: 55.1%)
snappyuncomp :       0.833 micros/op; 4688.5 MB/s
acquireload  :      13.289 micros/op; (each op is 1000 loads)

New without crc32c, MacBookPro13,3 with Core i7 6920HQ:
LevelDB:    version 1.20
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
------------------------------------------------
fillseq      :       3.094 micros/op;   35.8 MB/s
fillsync     :      52.160 micros/op;    2.1 MB/s (1000 ops)
fillrandom   :       4.090 micros/op;   27.0 MB/s
overwrite    :       4.006 micros/op;   27.6 MB/s
readrandom   :       6.584 micros/op; (1000000 of 1000000 found)
readrandom   :       6.676 micros/op; (1000000 of 1000000 found)
readseq      :       0.280 micros/op;  395.2 MB/s
readreverse  :       0.391 micros/op;  283.2 MB/s
compact      :  433911.000 micros/op;
readrandom   :       4.261 micros/op; (1000000 of 1000000 found)
readseq      :       0.251 micros/op;  440.5 MB/s
readreverse  :       0.356 micros/op;  310.9 MB/s
fill100K     :     584.023 micros/op;  163.3 MB/s (1000 ops)
crc32c       :       1.384 micros/op; 2822.3 MB/s (4K per op)
snappycomp   :       4.763 micros/op;  820.1 MB/s (output: 55.1%)
snappyuncomp :       0.766 micros/op; 5098.6 MB/s
acquireload  :      12.931 micros/op; (each op is 1000 loads)
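
As a rough cross-check of the summary, the crc32c rows of the three runs above can be compared directly. The sketch below only divides the reported MB/s figures; the constant and function names are hypothetical. With these inputs the ratio is about 3.6x with the library and about 0.56x without it (a drop of roughly 44%, which the summary rounds to 50%). By comparison, fillseq moves only from 36.1 MB/s to 39.2 MB/s and 35.8 MB/s respectively.

```cpp
#include <cassert>

// CRC32C throughput in MB/s, copied from the "crc32c" rows of the
// benchmark runs above.
constexpr double kBaselineMBps = 5085.0;        // Baseline
constexpr double kWithLibraryMBps = 18263.5;    // New with crc32c
constexpr double kWithoutLibraryMBps = 2822.3;  // New without crc32c

// Ratio of a new throughput to the baseline throughput.
// A value above 1.0 is a speedup; below 1.0 is a slowdown.
inline double ThroughputRatio(double new_mbps, double baseline_mbps) {
  assert(baseline_mbps > 0.0);
  return new_mbps / baseline_mbps;
}
```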

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=171667771
2017-10-10 11:46:40 -07:00
Name                  Last commit                                                          Date
win                   reverting disastrous MOE commit, returning to r21                    2011-04-19 23:11:15 +00:00
atomic_pointer.h      Use __APPLE__ instead of OS_MACOS. The former is compiler-provided.  2017-08-24 15:00:45 -07:00
port_example.h        Implement support for Intel crc32 instruction (SSE 4.2)              2017-02-28 14:08:46 -08:00
port_posix.cc         Including atomic_pointer.h in port_posix                             2015-12-09 10:35:07 -08:00
port_posix.h          Replace SSE-optimized CRC32C in POSIX port with external library.    2017-10-10 11:46:40 -07:00
port.h                Remove static initializer; fix endian-ness detection; fix build on   2012-05-30 09:45:46 -07:00
README                reverting disastrous MOE commit, returning to r21                    2011-04-19 23:11:15 +00:00
thread_annotations.h  Release 1.18                                                         2014-09-16 14:19:52 -07:00

This directory contains interfaces and implementations that isolate the
rest of the package from platform details.

Code in the rest of the package includes "port.h" from this directory.
"port.h" in turn includes a platform specific "port_<platform>.h" file
that provides the platform specific implementation.

See port_posix.h for an example of what must be provided in a platform
specific header file.
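
The dispatch described above, where "port.h" selects one platform specific header, can be sketched in a single file. This is an illustration under stated assumptions, not the contents of the real port.h: the macros `SKETCH_PLATFORM_POSIX` and `SKETCH_PLATFORM_OTHER`, the namespace `sketch_port`, and both functions are hypothetical, and the inline namespace bodies stand in for the `#include` of a separate platform header.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical platform macros. A real build system would define exactly
// one; this sketch defaults to POSIX so that it compiles on its own.
#if !defined(SKETCH_PLATFORM_POSIX) && !defined(SKETCH_PLATFORM_OTHER)
#define SKETCH_PLATFORM_POSIX 1
#endif

// --- Stands in for "port.h": pick one platform implementation. ---
#if defined(SKETCH_PLATFORM_POSIX)
// In a real tree this branch would include a platform specific header.
namespace sketch_port {
inline const char* PlatformName() { return "posix"; }
}  // namespace sketch_port
#elif defined(SKETCH_PLATFORM_OTHER)
namespace sketch_port {
inline const char* PlatformName() { return "other"; }
}  // namespace sketch_port
#endif

// --- Platform-independent code: uses only the sketch_port interface. ---
inline bool RunningOnPosixPort() {
  const char* name = sketch_port::PlatformName();
  assert(name != nullptr);
  return std::strcmp(name, "posix") == 0;
}
```

Because the rest of the code names only `sketch_port::PlatformName()`, adding a platform means adding one branch and one implementation, with no changes to callers.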