96 Commits

Author SHA1 Message Date
Victor Costan
a7528a5d2b Clean up util/coding.{h,cc}.
1) Inline EncodeFixed{32,64}(). They emit single machine instructions on 64-bit processors.
2) Remove size narrowing compiler warnings from DecodeFixed{32,64}().
3) Add comments explaining the current state of optimizations in compilers we care about.
4) Change C-style includes, like <stdint.h>, to C++ style, like <cstdint>.
5) memcpy -> std::memcpy.

The optimization comments are based on https://godbolt.org/z/RdIqS1. The missed optimization opportunities in clang have been reported as https://bugs.llvm.org/show_bug.cgi?id=41761

The change does not have significant impact on benchmarks. Results below.

LevelDB:    version 1.22
Date:       Mon May  6 10:42:18 2019
CPU:        72 * Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
CPUCache:   25344 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)

With change
------------------------------------------------
fillseq      :       2.327 micros/op;   47.5 MB/s
fillsync     :    4185.526 micros/op;    0.0 MB/s (1000 ops)
fillrandom   :       3.662 micros/op;   30.2 MB/s
overwrite    :       4.261 micros/op;   26.0 MB/s
readrandom   :       4.239 micros/op; (1000000 of 1000000 found)
readrandom   :       3.649 micros/op; (1000000 of 1000000 found)
readseq      :       0.174 micros/op;  636.7 MB/s
readreverse  :       0.271 micros/op;  408.7 MB/s
compact      :  570495.000 micros/op;
readrandom   :       2.735 micros/op; (1000000 of 1000000 found)
readseq      :       0.118 micros/op;  937.3 MB/s
readreverse  :       0.190 micros/op;  583.7 MB/s
fill100K     :     860.164 micros/op;  110.9 MB/s (1000 ops)
crc32c       :       1.131 micros/op; 3455.2 MB/s (4K per op)
snappycomp   :       3.034 micros/op; 1287.5 MB/s (output: 55.1%)
snappyuncomp :       0.544 micros/op; 7176.0 MB/s

Baseline
------------------------------------------------
fillseq      :       2.365 micros/op;   46.8 MB/s
fillsync     :    4240.165 micros/op;    0.0 MB/s (1000 ops)
fillrandom   :       3.244 micros/op;   34.1 MB/s
overwrite    :       4.153 micros/op;   26.6 MB/s
readrandom   :       4.698 micros/op; (1000000 of 1000000 found)
readrandom   :       4.065 micros/op; (1000000 of 1000000 found)
readseq      :       0.192 micros/op;  576.3 MB/s
readreverse  :       0.286 micros/op;  386.7 MB/s
compact      :  635979.000 micros/op;
readrandom   :       3.264 micros/op; (1000000 of 1000000 found)
readseq      :       0.169 micros/op;  652.8 MB/s
readreverse  :       0.213 micros/op;  519.5 MB/s
fill100K     :    1055.367 micros/op;   90.4 MB/s (1000 ops)
crc32c       :       1.353 micros/op; 2887.3 MB/s (4K per op)
snappycomp   :       3.036 micros/op; 1286.7 MB/s (output: 55.1%)
snappyuncomp :       0.540 micros/op; 7238.6 MB/s
PiperOrigin-RevId: 246856811
2019-05-06 11:23:02 -07:00
Victor Costan
24424a1ef2 Style cleanup.
1) Convert iterator-based for loops to C++11 foreach loops.
2) Convert "void operator=" to "T& operator=".
3) Switch from copy operators from private to public deleted.
4) Switch from empty ctors / dtors to "= default" where appropriate.

PiperOrigin-RevId: 246679195
2019-05-04 17:42:20 -07:00
Chris Mumford
9bd23c7676 Correct class/structure declaration order.
1. Correct the class/struct declaration order to be IAW
   the Google C++ style guide[1].
2. For non-copyable classes, switched from non-implemented
   private methods to explicitly deleted[2] methods.
3. Minor const and member initialization fixes.

[1] https://google.github.io/styleguide/cppguide.html#Declaration_Order
[2] http://eel.is/c++draft/dcl.fct.def.delete

PiperOrigin-RevId: 246521844
2019-05-03 09:48:57 -07:00
Chris Mumford
297e66afc1 Format all files IAW the Google C++ Style Guide.
Use clang-format to correct formatting to be in agreement with the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html). Doing this simplifies the process of accepting changes. Also fixed a few warnings flagged by clang-tidy.

PiperOrigin-RevId: 246350737
2019-05-02 19:04:50 -07:00
Chris Mumford
2f008ac19e Initialize class members to default values in constructors.
There were a few members which were identified to have been left
uninitialized in some constructors. These were very likely to
have been set before being used, otherwise the ASan tests would
have caught them, but still good practice to have them
initialized. This addresses some items reported in issue #668.

PiperOrigin-RevId: 243370145
2019-04-12 18:50:15 -07:00
Chris Mumford
ffabb1ae86 Merge pull request #665 from cheng-chang:coding
PiperOrigin-RevId: 243316002
2019-04-12 13:22:46 -07:00
Cheng Chang
cf1b5f4732 Remove unnecessary bit operation. 2019-03-22 17:32:20 +08:00
Felipe Oliveira Carvalho
7035af5fc3 Two small fixes for the Windows implementation (#661)
* Check if NOMIMMAX is defined before defining it

* Pass char* for a %s format in a snprintf call
2019-03-21 08:45:04 -07:00
costan
15e2278966 Use override consistently in leveldb::test::ErrorEnv.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=239453565
2019-03-20 13:57:32 -07:00
cmumford
ea49b27d06 Switch corruption_test to use InMemEnv.
This change switches corruption_test, which previously used direct file
I/O to corrupt table files for open databases, to use InMemEnv. Using an
Env eliminates some platform dependencies thus simplifying the tests.

Also removed EnvWindowsTestHelper::RelaxFilePermissions().  This was
only added because the Windows Env opens files for exclusive access.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=239305329
2019-03-20 13:57:03 -07:00
costan
201f77d137 Inline defaults in options.
This CL moves default values for
leveldb::{Options,ReadOptions,WriteOptions} from constructors to member
declarations, and removes now-redundant comments stating the defaults.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=239271242
2019-03-20 13:56:22 -07:00
costan
7d8e41e49b leveldb: Replace AtomicPointer with std::atomic.
This CL removes AtomicPointer from leveldb's port interface. Its usage is replaced with std::atomic<> from the C++11 standard library.

AtomicPointer was used to wrap flags, numbers, and pointers, so its instances are replaced with std::atomic<bool>, std::atomic<int>, std::atomic<size_t> and std::atomic<Node*>.

This CL does not revise the memory ordering. AtomicPointer's methods are replaced mechanically with their std::atomic equivalents, even when the underlying usage is incorrect. (Example: DBImpl::has_imm_ is written using release stores, even though it is always read using relaxed ordering.) Revising the memory ordering is left for future CLs.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=237865146
2019-03-11 13:41:25 -07:00
costan
ed76289b25 Align windows_logger with posix_logger.
Fixes GitHub issue #657.

This CL also makes the Windows CI green.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=237255887
2019-03-07 10:04:01 -08:00
cmumford
c69d33b0ec Added native support for Windows.
This change adds a native Windows port (port_windows.h) and a
Windows Env (WindowsEnv).

Note1: "small" is defined when including <Windows.h> so some
parameters were renamed to avoid conflict.

Note2: leveldb::Env defines the method: "DeleteFile" which is
also a constant defined when including <Windows.h>. The solution
was to ensure this macro is defined in env.h which forces
the function, when compiled, to be either DeleteFileA or
DeleteFileW when building for MBCS or UNICODE respectively.

This resolves #519 on GitHub.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=236364778
2019-03-01 18:00:35 -08:00
costan
296de8d5b8 leveldb: Fix PosixWritableFile::Sync() on Apple systems.
Apple doesn't follow POSIX specifications for fsync(). Instead, fsync() guarantees to flush the buffer cache to the device, which means the data will survive kernel panics, but may not survive power outages. Applications that need stronger guarantees (like databases) need to use fcntl(F_FULLFSYNC).

This CL switches PosixWritableFile::Sync() to get the stronger guarantees on Apple systems. The improved implementation follows the same principles as SQLite [1] and node.js [2].

Research for the fcntl() to fsync() fallback strategy:

Apple's released source code at https://opensource.apple.com/ shows at least three different error codes being returned when a filesystem does not support F_FULLFSYNC.

fcntl() is implemented in xnu-4903.221.2 in bsd/kern/kern_descrip.c, where it delegates to fcntl_nocancel(). The documentation for fcntl_nocancel() mentions error codes for some operations, but does not include F_FULLFSYNC. The F_FULLSYNC branch in fcntl_nocancel() calls VNOP_IOCTL(_, F_FULLSYNC, NULL, 0, _), whose return value sets the error
code.

VNOP_IOCTL() is implemented in bsd/vfs/kpi_vfs.c and calls the ioctl function in the vnode's operation vector. The per-filesystem function names follow the pattern _vnop_ioctl() for all the instances in opensource code: {hfs,msdosfs,nfs,ntfs,smbfs,webdav,zfs}_vnop_ioctl().

hfs-407.30.1, msdosfs-229.200.3, and nfs in xnu-4903.221.2 handle F_FULLFSYNC. ntfs-94.200.1 and smb-759.40.1 do not handle F_FULLFSYNC, and the default branch returns ENOSUP. webdav-380.200.1 also does not handle F_FULLFSYNC, but the default branch returns EINVAL. zfs-59 also does not handle F_FULLSYNC, and its default branch returns ENOTTY.

From a different angle, Apple's ntfs-94.200.1 includes utility code that uses fcntl(F_FULLFSYNC) and falls back to fsync() just like we do, supporting the hypothesis that there is no good way to detect lack of F_FULLFSYNC support. Also, Apple's fcntl() man page [3] does not mention a way to detect lack of F_FULLFSYNC support.

[1] https://www.sqlite.org/src/doc/trunk/src/os_unix.c
[2] https://github.com/libuv/libuv/blob/master/src/unix/fs.c
[3] https://developer.apple.com/library/archive/documentatiVon/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html
Tested:
    https://travis-ci.org/pwnall/leveldb/builds/477318498
    TAP global presubmit

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=228593729
2019-01-09 14:58:22 -08:00
cmumford
af7abf06ea Add back space to POSIX Logger.
The space in between the header and log message was mistakenly omitted
in a prior commit. Re-adding.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=228202737
2019-01-07 22:03:34 -08:00
costan
1cb3840881 Clean up env_posix.cc.
General cleanup principles:
* Use override when applicable.
* Remove static when redundant (methods and  globals in anonymous
  namespaces).
* Use const on class members where possible.
* Standardize on "status" for Status local variables.
* Renames where clarity can be improved.
* Qualify standard library names with std:: when possible, to
  distinguish from POSIX names.
* Qualify POSIX names with the global namespace (::) when possible, to
  distinguish from standard library names.

This also refactors the background thread synchronization logic so that
it's statically analyzable.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=219212089
2018-10-29 16:40:15 -07:00
costan
a7dc502e9f Rework once initialization in env_posix.cc.
C++11 guarantees thread-safe initialization of static variables inside
functions. This is a more restricted form of std::call_once or
pthread_once_t (e.g., single call site), so the compiler might be able
to generate better code [1]. Equally important, having less
platform-dependent code in env_posix.cc makes it easier to port to other
platforms.

Due to the change above, this CL introduced a new approach for storing
the singleton PosixEnv instance returned by Env::Default(). The new
approach avoids a dynamic memory allocation, which eliminates the false
positive from LeakSanitizer reported in
https://github.com/google/leveldb/issues/539 and
https://github.com/google/leveldb/issues/113

[1] https://stackoverflow.com/a/27206650/

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=214293129
2018-09-24 13:37:31 -07:00
costan
c43565dd39 C++11 cleanup for util/mutexlock.h.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=213398583
2018-09-24 13:37:01 -07:00
costan
73d5834ece Rework threading in env_posix.cc.
This commit replaces the use of pthreads in the POSIX port with std::thread
and port::Mutex + port::CondVar. This is intended to simplify porting
the env to a different platform.

The indirect use of pthreads in PosixLogger is replaced with
std:🧵:id(), based on an approach prototyped by @cmumfordx@.

The pthreads dependency in CMakeFiles is not removed, because some C++
standard library implementations must be linked against pthreads for
std::thread use. Figuring out this dependency is left for future work.

Switching away from pthreads also fixes
https://github.com/google/leveldb/issues/381

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=212478311
2018-09-11 11:02:14 -07:00
costan
05709fb43e Remove InitOnce from the port API.
This is not an API-breaking change, because it reduces the API that the
leveldb embedder must implement. The project will build just fine
against ports that still implement InitOnce.

C++11 guarantees thread-safe initialization of static variables inside
functions. This is a more restricted form of std::call_once or
pthread_once_t (e.g., single call site), so the compiler might be able
to generate better code [1]. Equally important, having less code in
port_example.h makes it easier to port to other platforms.

Due to the change above, this CL introduces a new approach for storing
the singleton BytewiseComparatorImpl instance returned by
BytewiseComparator(). The new approach avoids a dynamic memory
allocation, which eliminates the false positive from LeakSanitizer
reported in https://github.com/google/leveldb/issues/200

[1] https://stackoverflow.com/a/27206650/

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=212348004
2018-09-10 19:04:59 -07:00
costan
bb88f25115 Clean up PosixWritableFile in env_posix.cc.
This is separated from the general cleanup because of the logic changes
in SyncDirIfManifest().

General cleanup principles:
* Use override when applicable.
* Remove static when redundant (methods and  globals in anonymous
  namespaces).
* Use const on class members where possible.
* Standardize on "status" for Status local variables.
* Renames where clarity can be improved.
* Qualify standard library names with std:: when possible, to
  distinguish from POSIX names.
* Qualify POSIX names with the global namespace (::) when possible, to
  distinguish from standard library names.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=211709673
2018-09-08 02:17:01 -07:00
costan
7b945f2003 Clean up posix_logger.h.
General cleanup principles:
* Use override when applicable.
* Use const on class members where possible.
* Renames where clarity can be improved.
* Qualify standard library names with std:: when possible, to
  distinguish from POSIX names.
* Qualify POSIX names with the global namespace (::) when possible, to
  distinguish from standard library names.

This also revamps the logic for putting together a message into the
in-memory buffer before that is passed to fwrite(). While correct in
practice, the current implementation advances a char pointer past the
size of its buffer, which is technically undefined behavior.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=211472570
2018-09-04 10:38:12 -07:00
costan
03064cbbb2 Simplify Limiter in env_posix.cc.
Now that we require C++11, we can use std::atomic<int>, which has
primitives for most of the logic we need. As a bonus, the happy path for
Limiter::Acquire() and Limiter::Release() only performs one atomic
operation.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=211469518
2018-09-04 10:36:40 -07:00
costan
4de9594f6f Add move constructor to Status.
This will result in smaller code generation when Status instances are
passed around.

Benchmarks don't indicate a significant change either way.
CPU:        48 * Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
CPUCache:   30720 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)

Baseline:
fillseq      :       3.589 micros/op;   30.8 MB/s
fillsync     :    4165.299 micros/op;    0.0 MB/s (1000 ops)
fillrandom   :       5.864 micros/op;   18.9 MB/s
overwrite    :       7.830 micros/op;   14.1 MB/s
readrandom   :       5.534 micros/op; (1000000 of 1000000 found)
readrandom   :       4.292 micros/op; (1000000 of 1000000 found)
readseq      :       0.312 micros/op;  354.1 MB/s
readreverse  :       0.501 micros/op;  220.8 MB/s
compact      :  886211.000 micros/op;
readrandom   :       3.518 micros/op; (1000000 of 1000000 found)
readseq      :       0.251 micros/op;  441.2 MB/s
readreverse  :       0.456 micros/op;  242.4 MB/s
fill100K     :    1329.723 micros/op;   71.7 MB/s (1000 ops)
crc32c       :       1.976 micros/op; 1976.7 MB/s (4K per op)
snappycomp   :       4.705 micros/op;  830.2 MB/s (output: 55.1%)
snappyuncomp :       0.958 micros/op; 4079.1 MB/s
acquireload  :       0.727 micros/op; (each op is 1000 loads)

New:
fillseq      :       3.129 micros/op;   35.4 MB/s
fillsync     :    2748.099 micros/op;    0.0 MB/s (1000 ops)
fillrandom   :       5.394 micros/op;   20.5 MB/s
overwrite    :       7.253 micros/op;   15.3 MB/s
readrandom   :       5.655 micros/op; (1000000 of 1000000 found)
readrandom   :       4.425 micros/op; (1000000 of 1000000 found)
readseq      :       0.298 micros/op;  371.3 MB/s
readreverse  :       0.508 micros/op;  217.9 MB/s
compact      :  885842.000 micros/op;
readrandom   :       3.545 micros/op; (1000000 of 1000000 found)
readseq      :       0.252 micros/op;  438.2 MB/s
readreverse  :       0.425 micros/op;  260.2 MB/s
fill100K     :    1418.347 micros/op;   67.2 MB/s (1000 ops)
crc32c       :       1.987 micros/op; 1966.0 MB/s (4K per op)
snappycomp   :       4.767 micros/op;  819.4 MB/s (output: 55.1%)
snappyuncomp :       0.916 micros/op; 4264.9 MB/s
acquireload  :       0.665 micros/op; (each op is 1000 loads)

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=194002392
2018-04-23 16:22:30 -07:00
costan
d177a0263c Replace port_posix with port_stdcxx.
The porting layer implements threading primitives: atomic pointers,
condition variables, mutexes, thread-safe initialization. These are all
specified in C++11, so the reference open source port implementation can
become platform-independent.

The porting layer will remain in place to allow the use of other
implementations with more features, such as the built-in deadlock
detection in abseil's Mutex.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=193245934
2018-04-17 13:26:47 -07:00
costan
8046a51b21 Add forgotten <limits> header to util/logging.cc.
Commit a0008deb679480fd30e845d7e52421af72160c2c introduced
std::numeric_limits usage in logging.cc, but didn't #include <limits>

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=192840190
2018-04-13 16:21:07 -07:00
costan
a0008deb67 Reimplement ConsumeDecimalNumber.
The old implementation caused odd crashes on ARM, which were fixed by
changing a local variable type. The main suspect is the use of a static
local variable. This CL replaces the static local variable with
constexpr, which still ensures the compiler sees the expressions as
constants.

The CL also replaces Slice operations in the functions' inner loop with
iterator-style pointer operations, which can help the compiler generate
less code.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=192832175
2018-04-13 15:37:20 -07:00
costan
1f7dd5d5f6 Add tests for ConsumeDecimalNumber.
ConsumeDecimalNumber has fairly non-trivial logic, and a previous
version has crashed inexplicably on Android. Having some test coverage
will make it easier to tweak / simplify the function later on.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=192821751
2018-04-13 15:36:55 -07:00
costan
09217fd067 Replace NULL with nullptr in C++ files.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=192365747
2018-04-10 16:26:43 -07:00
costan
0db30413a4 leveldb: Add more thread safety annotations.
After this CL, all classes with Mutex members should be covered by annotations. Exceptions are atomic members, which shouldn't need locking, and DBImpl members that cause errors when annotated, which will be tackled separately.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=190260865
2018-03-23 12:56:14 -07:00
costan
0fa5a4f7b1 Extend thread safety annotations.
This CL makes it easier to reason about thread safety by:

1) Adding Clang thread safety annotations according to comments.
2) Expanding a couple of variable names, without adding extra lines of code.
3) Adding const in a couple of places.
4) Replacing an always-non-null const pointer with a reference.
5) Fixing style warnings in the modified files.

This CL does not annotate the DBImpl members that claim to be protected
by the instance mutex, but are accessed without the mutex being held.
Those members (and their unprotected accesses) will be addressed in
future CLs.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=189354657
2018-03-16 10:32:40 -07:00
costan
8143c12f3f Fix includes in util/testharness.h.
This CL removes unused headers included by util/testharness.h, adds
precise includes where the build breaks, and fixes style errors in the
edited files.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=189331061
2018-03-16 10:31:48 -07:00
costan
aece2068d7 Remove extern from function declarations.
External linkage is the default for function declarations in C++.

This also fixes ClangTidy errors generated by removing the "extern"
keyword as described above.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=188730416
2018-03-12 09:24:48 -07:00
costan
542590d2a8 leveldb: Include <algorithm> in util/env_test.cc.
CL 170738066 introduced std::min and std::max to env_test.cc. These
require the <algorithm> header.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=171024062
2017-10-04 10:38:33 -07:00
costan
8ae7998aab Fix FD leak in POSIX Env.
Deleting a PosixWritableFile without calling Close() leaks the file
descriptor. While the API description in include/leveldb/env.h does not
specify whether the caller is responsible for Close()ing the file before
deleting it, all other Env file implementations do release underlying
resources when destroyed, even if Close() is not called.

The leak shows up when running db_tests on Mac Travis, or on a vanilla
MacOS install.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170906843
2017-10-03 13:45:04 -07:00
costan
d9a9e02edf leveldb: Add tests for CL 170769101.
This also removes std::unique_ptr introduced in CL 170738066, because
it's C++11-only, and the open source version still supports older
versions at the moment.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170876919
2017-10-03 11:32:02 -07:00
costan
4447f9cace Remove handling for unused LRUHandle representation special case.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170876103
2017-10-03 11:32:02 -07:00
sanjay
2372ac574f Fix file writing bug in CL 170738066.
If the file already existed, we should have truncated it. This was not
detected by leveldb tests since leveldb code avoids reusing same files,
but there was code elsewhere that was directly using leveldb files and
relying on this behavior.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170769101
2017-10-03 11:32:02 -07:00
cmumford
1c75e88055 Fix use of uninitialized value in LRUHandle.
If leveldb::Options::block_cache is set to a cache of zero capacity
then it is possible for LRUHandle::next to be used without having been
set.

Conditional jump or move depends on uninitialised value(s):
  leveldb::(anonymous namespace)::LRUHandle::key() const (cache.cc:58)
  leveldb::(anonymous namespace)::LRUCache::Unref(leveldb::(anonymous namespace)::LRUHandle*) (cache.cc:234)
  leveldb::(anonymous namespace)::LRUCache::Release(leveldb::Cache::Handle*) (cache.cc:266)
  leveldb::(anonymous namespace)::ShardedLRUCache::Release(leveldb::Cache::Handle*) (cache.cc:375)
  leveldb::CacheTest::Insert(int, int, int) (cache_test.cc:59)

This bug forced a commit reversion in Chromium. For more information see
https://bugs.chromium.org/p/chromium/issues/detail?id=761398#c4

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170749054
2017-10-03 11:30:48 -07:00
sanjay
7e12c00ecf Fix issue 474: a race between the f*_unlocked() STDIO calls in
env_posix.cc and concurrent application calls to fflush(NULL).

The fix is to avoid using stdio in env_posix.cc but add our own
buffering where we need it.

Added a test to reproduce the bug.

Added a test for Env reads/writes.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170738066
2017-10-03 11:27:09 -07:00
costan
bcd9a8ea4a Use portable CRC32C from google/crc32c.
Benchmark results below. More results at
354d61ef97.

New, MacBookPro13,3 with Core i7 6920HQ:
LevelDB:    version 1.20
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
WARNING: Snappy compression is not enabled
------------------------------------------------
fillseq      :       2.952 micros/op;   37.5 MB/s
fillsync     :      43.932 micros/op;    2.5 MB/s (1000 ops)
fillrandom   :       3.856 micros/op;   28.7 MB/s
overwrite    :       4.053 micros/op;   27.3 MB/s
readrandom   :       4.234 micros/op; (1000000 of 1000000 found)
readrandom   :       3.923 micros/op; (1000000 of 1000000 found)
readseq      :       0.201 micros/op;  550.8 MB/s
readreverse  :       0.356 micros/op;  310.6 MB/s
compact      :  436800.000 micros/op;
readrandom   :       2.375 micros/op; (1000000 of 1000000 found)
readseq      :       0.151 micros/op;  734.3 MB/s
readreverse  :       0.298 micros/op;  370.7 MB/s
fill100K     :     554.075 micros/op;  172.1 MB/s (1000 ops)
crc32c       :       1.393 micros/op; 2805.0 MB/s (4K per op)
snappycomp   :    3902.000 micros/op; (snappy failure)
snappyuncomp :    3821.000 micros/op; (snappy failure)
acquireload  :      13.088 micros/op; (each op is 1000 loads)

Baseline, MacBookPro13,3 with Core i7 6920HQ:
LevelDB:    version 1.20
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
WARNING: Snappy compression is not enabled
------------------------------------------------
fillseq      :       3.000 micros/op;   36.9 MB/s
fillsync     :      46.721 micros/op;    2.4 MB/s (1000 ops)
fillrandom   :       3.922 micros/op;   28.2 MB/s
overwrite    :       4.080 micros/op;   27.1 MB/s
readrandom   :       4.409 micros/op; (1000000 of 1000000 found)
readrandom   :       3.895 micros/op; (1000000 of 1000000 found)
readseq      :       0.190 micros/op;  582.4 MB/s
readreverse  :       0.413 micros/op;  267.6 MB/s
compact      :  441076.000 micros/op;
readrandom   :       2.308 micros/op; (1000000 of 1000000 found)
readseq      :       0.170 micros/op;  651.2 MB/s
readreverse  :       0.302 micros/op;  366.2 MB/s
fill100K     :     614.289 micros/op;  155.3 MB/s (1000 ops)
crc32c       :       3.547 micros/op; 1101.2 MB/s (4K per op)
snappycomp   :    3393.000 micros/op; (snappy failure)
snappyuncomp :    3171.000 micros/op; (snappy failure)
acquireload  :      12.761 micros/op; (each op is 1000 loads)

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170100372
2017-09-26 18:50:41 -07:00
cmumford
09a3c8e741 Switched variable type from int to uint64_t in ConsumeDecimalNumber.
An Android test was occasionally crashing with a SEGV in ConsumeDecimalNumber
Switching a local variable from an int to uint64_t eliminated these crashes.
Speculating this is either a compiler, runtime library, or emulator issue.

Switching this type to uint64_t also eliminates a compiler warning
about comparing an int with a uint64_t.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=166399695
2017-08-24 15:40:54 -07:00
costan
8415f00eee leveldb: Report missing CURRENT manifest file as database corruption.
BTRFS reorders rename and write operations, so it is possible that a filesystem crash and recovery results in a situation where the file pointed to by CURRENT does not exist. DB::Open currently reports an I/O error in this case. Reporting database corruption is a better hint to the caller, which can attempt to recover the database or erase it and start over.

This issue is not merely theoretical. It was reported as having showed up in the wild at https://github.com/google/leveldb/issues/195 and at https://crbug.com/738961. Also, asides from the BTRFS case described above, incorrect data in CURRENT seems like a possible corruption case that should be handled gracefully.

The Env API changes here can be considered backwards compatible, because an implementation that returns Status::IOError instead of Status::NotFound will still get the same functionality as before.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161432630
2017-07-10 14:14:00 -07:00
costan
f3f139737c Separate Env tests from PosixEnv tests.
env_test.cc defines EnvPosixTest which tests the Env implementation returned by Env::Default(). The naming is a bit unfortunate, as the tests in env_test.cc are written against the Env contract, and therefore are applicable to any Env implementation. An instance of the confusion caused by the naming is [] which added a dependency from env_test.cc to EnvPosixTestHelper, which is closely coupled to EnvPosix.

This change disentangles EnvPosix-specific test code into a env_posix_test.cc file. The code there uses EnvPosixTestHelper and specifically targets the EnvPosix implementation. env_test.cc now implements EnvTest, and contains tests that are also applicable to other ports, which may define their own Env implementation.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=148914642
2017-03-01 13:53:23 -08:00
costan
ea175e28f8 Implement support for Intel crc32 instruction (SSE 4.2)
This change authored by vadimskipin and submitted via:

    https://github.com/google/leveldb/pull/309

Changes made to support iOS builds and other architectures
without support for SSE 4.2.

db_bench reports original crc32 speed at:

    crc32c : 3.610 micros/op; 1082.0 MB/s (4K per op)

with this change performance has increased to:

    crc32c : 0.843 micros/op; 4633.6 MB/s (4K per op)

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=148694935
2017-02-28 14:08:46 -08:00
cmumford
95cd743e5e Including <limits> for std::numeric_limits.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=146841327
2017-02-09 14:09:51 -08:00
cmumford
646c3588de Limit the number of read-only files the POSIX Env will have open.
Background compaction can create an unbounded number of
leveldb::RandomAccessFile instances. On 64-bit systems mmap is used and
file descriptors are only used beyond a certain number of mmap's.
32-bit systems to not use mmap at all. leveldb::RandomAccessFile does not
observe Options.max_open_files so compaction could exhaust the file
descriptor limit.

This change uses getrlimit to determine the maximum number of open
files and limits RandomAccessFile to approximately 20% of that value.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=143505556
2017-01-04 09:13:20 -08:00
corrado
a2fb086d07 Add option for max file size. The currend hard-coded value of 2M is inefficient in colossus.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=134391640
2016-09-28 10:52:24 -07:00
m3b
06a191b8de fix problems in LevelDB's caching code
Background:

LevelDB uses a cache (util/cache.h, util/cache.cc) of (key,value)
pairs for two purposes:
- a cache of (table, file handle) pairs
- a cache of blocks

The cache places the (key,value) pairs in a reference-counted
wrapper.  When it returns a value, it returns a reference to this
wrapper.  When the client has finished using the reference and
its enclosed (key,value), it calls Release() to decrement the
reference count.

Each (key,value) pair has an associated resource usage.  The
cache maintains the sum of the usages of the elements it holds,
and removes values as needed to keep the sum below a capacity
threshold.  It maintains an LRU list so that it will remove the
least-recently used elements first.

The max_open_files option to LevelDB sets the size of the cache
of (table, file handle) pairs.  The option is not used in any
other way.

The observed behaviour:

If LevelDB at any time used more file handles concurrently than
the cache size set via max_open_files, it attempted to reduce the
number by evicting entries from the table cache.  This could
happen most easily during compaction, and if max_open_files was
low.  Because the handles were in use, their reference count did
not drop to zero, and so the usage sum in the cache was not
modified by the evictions.  Subsequent Insert() calls returned
valid handles, but their entries were immediately evicted from
the cache, which though empty still acted as though full.  As a
result, there was effectively no caching, and the number of open
file handles rose []ly until it hit system-imposed limits and
the process died.

If one set max_open_files lower, the cache was more likely to
exhibit this beahviour, and cause the process to run out of file
descriptors.  That is, max_open_files acted in almost exactly the
opposite manner from what was intended.

The problems:

1. The cache kept all elements on its LRU list eligible for capacity
   eviction---even those with outstanding references from clients.  This was
   ineffective in reducing resource consumption because there was an
   outstanding reference, guaranteeing that the items remained.  A secondary
   issue was that there is no guarantee that these in-use items will be the
   last things reached in the LRU chain, which actually recorded
   "least-recently requested" rather than "least-recently used".

2. The sum of usages was decremented not when a (key,value) was evicted from
   the cache, but when its reference count went to zero.  Thus, when things
   were removed from the cache, either by garbage collection or via Erase(),
   the usage sum was not necessarily decreased.  This allowed the cache to act
   as though full when it was in fact not, reducing caching effectiveness, and
   leading to more resources being consumed---the opposite of what the
   evictions were intended to achieve.

3. (minor) The cache's clients insert items into it by first looking up the
   key, and inserting only if no value is found.  Although the cache has an
   internal lock, the clients use no locking to ensure atomicity of the
   Lookup/Insert pair.  (see table/table.cc:  block_cache->Insert() and
   db/table_cache.cc:  cache_->Insert()).  Thus, if two threads Insert() at
   about the same time, they can both Lookup(), find nothing, and both
   Insert().  The second Insert() would evict the first value, leaving each
   thread with a handle on its own version of the data, and with the second
   version in the cache.  It would be better if both threads ended up with a
   handle on the same (key,value) pair, which implies it must be the first item
   inserted.  This suggests that Insert() should not replace an existing value.

   This can be made safe with current usage inside LeveDB itself, but this is
   not easy to change first because Cache is a public interface, so to change
   the semantics of an existing call might break things, second because Cache
   is an abstract virtual class, so adding a new abstract virtual method may
   break other implementations, and third, the new method "insert without
   replacing" cannot be implemented in terms of the existing methods, so cannot
   be implemented with a non-abstract default.   But fortunately, the effects
   of this issue are minor, so this issue is not fixed by this change.

The changes:

The assumption in the fixes is that it is always better to cache
entries unless removal from the cache would lead to deallocation.

Cache entries now have an "in_cache" boolean indicating whether
the cache has a reference on the entry.  The only ways that this can
become false without the entry being passed to its "deleter" are via
Erase(), via Insert() when an element with a duplicate key is inserted,
or on destruction of the cache.

The cache now keeps two linked lists instead of one.  All items
in the cache are in one list or the other, and never both.  Items
still referenced by clients but erased from the cache are in
neither list.  The lists are:
- in-use:  contains the items currently referenced by clients, in no particular
  order.  (This list is used for invariant checking.  If we removed the check,
  elements that would otherwise be on this list could be left as disconnected
  singleton lists.)
- LRU:  contains the items not currently referenced by clients, in LRU order

A new internal Ref() method increments the reference count.  If
incrementing from 1 to 2 for an item in the cache, it is moved
from the LRU list to the in-use list.

The Unref() call now moves things from the in-use list to the LRU
list if the reference count falls to 1, and the item is in the
cache.  It no longer adjusts the usage sum.  The usage sum now
reflects only what is in the cache, rather than including
still-referenced items that have been evicted.

The LRU_Append() now takes a "list" parameter so that it can be
used to append either to the LRU list or the in-use list.

Lookup() is modified to use the new Ref() call, rather than
adjusting the reference count and LRU chain directly.

Insert() eviction code is also modified to adjust the usage sum and the
in_cache boolean of the evicted elements.  Some LevelDB tests assume that there
will be no caching whatsoever if the cache size is set to zero, so this is
handled as a special case.

A new private method FinishErase() is factored out
with the common code from where items are removed from the cache.

Erase() is modified to adjust the usage sum and the in_cache
boolean of the erased elements, and to use FinishErase().

Prune() is modified to use FinishErase() also, and to make use of the fact that
the lru_ list now contains only items with reference count 1.

- EvictionPolicy is modified to test that an entry with an
outstanding handle is not evicted.  This test fails with the old cache.cc.

- A new test case UseExceedsCacheSize verifies that even when the
cache is overfull of entries with outstanding handles, none are
evicted.  This test fails with the old cache.cc, and is the key
issue that causes file descriptors to run out when the cache
size is set too small.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=123247237
2016-07-06 09:15:53 -07:00