Closesgoogle/leveldb#320
During compaction it was possible that records from a block b1=(l1,u1)
would be pushed down from level i to level i+1. If there is a block
b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then
a subsequent search for k1 will yield the record l2 which has a smaller
sequence number than u1 because the sort order for records sorts
increasing by user key but decreaing by sequence number.
This change add a call to a new function AddBoundaryInputs to
SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the
criteria above and adds it to the set of files to be compacted. Whenever
AddBoundaryInputs is called it is important that the compaction fileset
in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each
call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.
SetupOtherInputs is called on both manual and automated compaction
passes. It is called for both level zero and for levels greater than 0.
The original change posted in https://github.com/google/leveldb/pull/339
has been modified to also include changed made by Chris Mumford<cmumford@google.com>
in 4b72cb14f8
1. Releasing snapshots during test cleanup to avoid
memory leak warnings.
2. Refactored test to use testutil.h to be in line
with other issue tests and to create the test
database in the correct temporary location.
3. Added copyright banner.
Otherwise, just minor formatting and limiting character
width to 80 characters.
Additionally the change was rebased on top of current master and
changes previously made to the Makefile were ported to the
CMakeLists.txt.
Testing Done:
A test program (issue320_test) was constructed that performs mutations
while snapshots are active. issue320_test fails without this bug fix
after 64k writes. It passes with this bug fix. It was run with 200M
writes and passed.
Unit tests were written for the new function that was added to the
code. Make test was run and seen to pass.
Signed-off-by: Richard Cole <richcole@amazon.com>
This change switches corruption_test, which previously used direct file
I/O to corrupt table files for open databases, to use InMemEnv. Using an
Env eliminates some platform dependencies thus simplifying the tests.
Also removed EnvWindowsTestHelper::RelaxFilePermissions(). This was
only added because the Windows Env opens files for exclusive access.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=239305329
This CL moves default values for
leveldb::{Options,ReadOptions,WriteOptions} from constructors to member
declarations, and removes now-redundant comments stating the defaults.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=239271242
This CL removes AtomicPointer from leveldb's port interface. Its usage is replaced with std::atomic<> from the C++11 standard library.
AtomicPointer was used to wrap flags, numbers, and pointers, so its instances are replaced with std::atomic<bool>, std::atomic<int>, std::atomic<size_t> and std::atomic<Node*>.
This CL does not revise the memory ordering. AtomicPointer's methods are replaced mechanically with their std::atomic equivalents, even when the underlying usage is incorrect. (Example: DBImpl::has_imm_ is written using release stores, even though it is always read using relaxed ordering.) Revising the memory ordering is left for future CLs.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=237865146
This change adds a native Windows port (port_windows.h) and a
Windows Env (WindowsEnv).
Note1: "small" is defined when including <Windows.h> so some
parameters were renamed to avoid conflict.
Note2: leveldb::Env defines the method: "DeleteFile" which is
also a constant defined when including <Windows.h>. The solution
was to ensure this macro is defined in env.h which forces
the function, when compiled, to be either DeleteFileA or
DeleteFileW when building for MBCS or UNICODE respectively.
This resolves#519 on GitHub.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=236364778
ssize_t is not standard C++. It is a POSIX extension. Therefore, it does
not belong in generic code.
This change tweaks the logic in DBIter to remove the need for signed
integers, so ssize_t can be replaced with size_t. The impacted method
and private member are renamed to better express their purpose.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=211471606
WriteBatchInternal has a method for efficiently concatenating two
WriteBatches. This commit exposes the method to the public API.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208724311
* Omit SnapshotImpl::list_ when assert() isn't on
* Make SnapshotImpl::number_ const and set it in the constructor
* Make SnapshotImpl::number_ private and access it via a getter
* Rename SnapshotImpl::number_ to SnapshotImpl::sequence_number_
* Rename SnapshotList::list_ to SnapshotList::head_
* Wrap casting from Snapshot* to SnapshotImpl* in ToSnapshotImpl()
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=194852828
After this CL, all classes with Mutex members should be covered by annotations. Exceptions are atomic members, which shouldn't need locking, and DBImpl members that cause errors when annotated, which will be tackled separately.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=190260865
This CL makes it easier to reason about thread safety by:
1) Adding Clang thread safety annotations according to comments.
2) Expanding a couple of variable names, without adding extra lines of code.
3) Adding const in a couple of places.
4) Replacing an always-non-null const pointer with a reference.
5) Fixing style warnings in the modified files.
This CL does not annotate the DBImpl members that claim to be protected
by the instance mutex, but are accessed without the mutex being held.
Those members (and their unprotected accesses) will be addressed in
future CLs.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=189354657
This CL removes unused headers included by util/testharness.h, adds
precise includes where the build breaks, and fixes style errors in the
edited files.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=189331061
External linkage is the default for function declarations in C++.
This also fixes ClangTidy errors generated by removing the "extern"
keyword as described above.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=188730416
This removes the use of the non-portable headers <sys/types.h> and <unistd.h> in c_test.c.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=188503102
When the max file size option was added in CL 134391640 the C API
was not modified to support this.
This change was contributed by GitHub user @olt and fixes issue #439.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=173466388
Deleted two unused assignments:
1. offset_in_block in Reader::SkipToInitialBlock().
2. in_fragmented_record in Reader::ReadRecord().
Reasons for the change:
1. offset_in_block is not read again after the if condition.
2. The kFullRecordType switch branch returns, so
in_fragmented_record isn't read again.
3. The kFirstType switch branch sets in_fragmented_record to
true after the if, so the write in the if is ignored.
Change contributed by @C0deAi on GitHub.
This fixes https://github.com/google/leveldb/issues/517
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=172763897
env_posix.cc and concurrent application calls to fflush(NULL).
The fix is to avoid using stdio in env_posix.cc but add our own
buffering where we need it.
Added a test to reproduce the bug.
Added a test for Env reads/writes.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170738066
12 lines above, there is an "if (!s.ok()) { return s; }" block of code.
"s" is never modified between that block and the "if" removed by this
CL, so "s.ok()" must be true.
The code most likely intended to say "if (!builder->ok())", because the
builder->Add() call above can modify the TableBuilder's status, as a
side-effect. However, this approach would have required setting "s =
builder.status()" in the "else" branch, near the "builder.Abandon()"
call. So, removing the "if" outright is simpler than following that line
of thought.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=167326229
The dead code has been in the codebase since the initial commit and is
generating a compiler warning when used in Xcode.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=164174594
BTRFS reorders rename and write operations, so it is possible that a filesystem crash and recovery results in a situation where the file pointed to by CURRENT does not exist. DB::Open currently reports an I/O error in this case. Reporting database corruption is a better hint to the caller, which can attempt to recover the database or erase it and start over.
This issue is not merely theoretical. It was reported as having showed up in the wild at https://github.com/google/leveldb/issues/195 and at https://crbug.com/738961. Also, asides from the BTRFS case described above, incorrect data in CURRENT seems like a possible corruption case that should be handled gracefully.
The Env API changes here can be considered backwards compatible, because an implementation that returns Status::IOError instead of Status::NotFound will still get the same functionality as before.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161432630
Markdown is more readable in a text editor and when hosted
on GitHub is more readable than HTML.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=148830423
Detailed description:
Suppose an input split is generated between two leveldb record blocks and the preceding block ends with null padding.
A reader that previously read at least 1 record within the first block (before encountering the padding) upon trying to read the next record, will successfully and correctly read the next logical record from the subsequent block, but will return a last record offset pointing to the padding in the first block.
When this happened in a [], it resulted in duplicate records being handled at what appeared to be different offsets that were separated by only a few bytes.
This behavior is only observed when at least 1 record was read from the first block before encountering the padding. If the initial offset for a reader was within the padding, the correct record offset would be reported, namely the offset within the second block.
The tests failed to catch this scenario/bug, because each read test only read a single record with an initial offset. This CL adds an explicit test case for this scenario, and modifies the test structure to read all remaining records in the test case after an initial offset is specified. Thus an initial offset that jumps to record #3, with 5 total records in the test file, will result in reading 2 records, and validating the offset of each of them in order to pass successfully.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=115338487
1. Object files, libraries, and compiled executables are put
into subdirectories.
2. The shared library is linked from individual object files.
This provides for greater parallelism on large desktops
while at the same time making for easier builds on small
(i.e. embedded) systems. Fixes issue #279.
3. One program, db_bench, is compiled using the shared library.
4. The source file for "leveldbutil" was renamed from
leveldb_main.cc to leveldbutil.cc. This provides for simpler
makefile rules.
5. Because all targets placed the library (libleveldb.a) at the top
level, the last platform built (desktop/device) always overwrote
any prior artifact.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=113407013
The write operations in the table happens without holding the mutex
lock, but concurrent writes are avoided using "writers_" queue.
The Arena::MemoryUsage could access the blocks when write happens.
So, the memory usage is cached in atomic word and can be loaded
from any thread safely.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=107573379
The approximate RAM usage of the database is calculated from the memory
allocated for write buffers and the block cache. This is to give an
estimate of memory usage to leveldb clients.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=104222307
Prior implementation would always try to reuse the manifest, even if reuse_logs
was false (the default). This was missed because the stock
Env::NewAppendableFile implementation returns false forcing the creation of a
new log.
(Based on a suggestion by cmumford.)
"open" benchmark on my workstation speeds up significantly since we
can now avoid three fdatasync calls and a compaction per open:
Before: ~80000 microseconds
After: ~130 microseconds
Details:
(1) Added Options::reuse_logs (currently defaults to false) to control
new behavior. The intention is to change the default to true after some
baking.
(2) Added Env::NewAppendableFile() whose default implementation returns
a not-supported error.
(3) VersionSet::Recovery attempts to reuse the MANIFEST from which
it is recovering.
(4) DBImpl recovery attempts to reuse the last log file and memtable.
(5) db_test.cc now tests a new configuration that sets reuse_logs to true.
(6) fault_injection_test also tests a reuse_logs==true config.
(7) Added a new recovery_test.