This replaces the registration server, and adds dispatch to a delegate
on crash requests.
(As you are already aware) we went around in circles on trying to come
up with a slightly-too-fancy threading design. All of them seemed to
have problems when it comes to out of order events, and orderly
shutdown, so I've gone back to something not-too-fancy.
Two named pipe instances (that clients connect to) are created. These
are used only for registration (which should take <1ms), so 2 should be
sufficient to avoid any waits. When a client registers, we duplicate
an event to it, which is used to signal when it wants a dump taken.
The server registers threadpool waits on that event, and also on the
process handle (which will be signalled when the client process exits).
These requests (in particular the taking of the dump) are serviced
on the threadpool, which avoids us needing to manage those threads,
but still allows parallelism in taking dumps. On process termination,
we use an IO Completion Port to post a message back to the main thread
to request cleanup. This complexity is necessary so that we can
unregister the threadpool waits without being on the threadpool, which
we need to do synchronously so that we can be sure that no further
callbacks will execute (and expect to have the client data around
still).
In a followup, I will readd support for DumpWithoutCrashing -- I don't
think it will be too difficult now that we have an orderly way to
clean up client records in the server.
R=cpu@chromium.org, mark@chromium.org, jschuh@chromium.org
BUG=crashpad:1,crashpad:45
Review URL: https://codereview.chromium.org/1301853002 .
MachOImageReader::GetCrashpadInfo() expects the CrashpadInfo struct to
be the only thing in a __DATA,__crashpad_info section, and enforces this
by checking that the section’s size matches the size declared in the
struct’s size_ field.
Under AddressSanitizer, a red zone follows the structure. While not
reflected in the size of the structure, it is reflected in the size of
the section, causing MachOImageReader::GetCrashpadInfo() to reject the
CrashpadInfo on the assumption that something else is present in the
section.
By specifying an alignment greater than the minimum red zone size of 32
bytes, red zone generation can be suppressed.
TEST=crashpad_snapshot_test
BUG=crashpad:44
R=glider@chromium.org, rsesek@chromium.org
Review URL: https://codereview.chromium.org/1296523003 .
Calling std::vector<>::operator[]() with an out-of-range index argument
is undefined behavior. In two cases, Crashpad used &v[0] in situations
where it was known that the address would not be used. These calls were
wrapped in conditions guarding against vector emptiness.
While s[0] is valid on an empty string, in two cases, Crashpad used
&s[0] as an argument to a system call that would be a no-op. These calls
were wrapped in similar conditions to avoid the system call.
The two uses of vector with undefined behavior were caught by the
following tests in crashpad_snapshot_test with
UndefinedBehaviorSanitizer:
[ RUN ] CrashpadInfoClientOptions.OneModule
/Users/mark/compilatorium/llvm.build/bin/../include/c++/v1/vector:1493:12:
runtime error: reference binding to null pointer of type
'crashpad::process_types::section'
[ OK ] CrashpadInfoClientOptions.OneModule (72 ms)
[ RUN ] ProcessSnapshotMinidump.Empty
/Users/mark/compilatorium/llvm.build/bin/../include/c++/v1/vector:1493:12:
runtime error: reference binding to null pointer of type
'MINIDUMP_DIRECTORY'
[ OK ] ProcessSnapshotMinidump.Empty (1 ms)
The Crashpad codebase was audited by searching for resize() calls and
analyzing how resized strings and vectors are used.
TEST=*
BUG=
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1283243004 .
Found by -fsanitize=undefined:
[ RUN ] Launchd.CFPropertyToLaunchData_FloatingPoint
../../../util/mac/launchd_test.mm:82:33: runtime error: value
1.79769e+308 is outside the range of representable values of type
'float'
[ OK ] Launchd.CFPropertyToLaunchData_FloatingPoint (2 ms)
TEST=crashpad_util_test Launchd.CFPropertyToLaunchData_FloatingPoint
BUG=
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1302843004 .
Chromium builds with a newer clang than the Crashpad buildbot, and it
reports:
../../../handler/crash_report_upload_thread.cc:148:16: error: 'ThreadMain' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
virtual void ThreadMain() {
^
../../../util/thread/thread.h:46:16: note: overridden virtual function is here
virtual void ThreadMain() = 0;
^
1 error generated.
R=scottmg@chromium.org
Review URL: https://codereview.chromium.org/1302833002 .
Under asan, there are many more instructions than without. The “nearby
PC” check is much less useful, and would likely fail.
TEST=crashpad_client_test CaptureContext.CaptureContext
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1298943003 .
While not strictly asan-related, this bug was found while running tests
under asan. Evidently, strings are pooled differently in that build
configuration.
TEST=crashpad_util_test ExceptionPorts.*
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1291573004 .
HTTPTransport.Upload33k failed on Windows due to WinHTTP timing out. The
test server, http_transport_test_server.py, writes the entire request to
a stdout pipe, to be received by crashpad_util_test. crashpad_util_test
is also the HTTP client, and it does not attempt to read from this pipe
until the HTTP transaction is complete. http_transport_test_server.py
must not write to stdout until the transaction is complete, otherwise,
there is a risk of deadlock if the pipe buffer fills up. The new
Upload33k test sends a large request, which was filling up the pipe
buffer on Windows.
This also adds an Upload33k_LengthUnknown test variant to exercise a
large POST when the length is not known ahead of time. This more closely
matches how Crashpad crash uploads are done on OS X.
TEST=crashpad_util_test HTTPTransport.*
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1286173007 .
CFStream’s CFReadStreamGetBuffer() calls the Read() callback without
initializing at_eof. The callback function is responsible for setting it
on any successful read operation. See 10.10.2 CF-1152.14/CFStream.c.
By chance, at_eof seems to always have an initial value of false on
x86_64, but true on 32-bit x86. Crashpad’s Read() callback assumed that
the initial value was always false. The discrepancy caused truncation
and possibly hangs when a 32-bit process attempted to upload a request
body larger than 32kB, the buffer size used by NSMutableURLRequest or
something between it and CFReadStream.
A new test with more than 32kB of data is added.
As discussed in:
https://groups.google.com/a/chromium.org/d/topic/crashpad-dev/Vz--qMZJRPU
TEST=crashpad_util_test HTTPTransport.Upload33k
BUG=
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1304433004 .
After 6083a2706d55, it is possible to determine the expected size of a
versioned structure such as dyld_all_image_infos. The expected size is
compared against the actual size of the structure as returned by
task_info() (TASK_DYLD_INFO).
TEST=crashpad_snapshot_test
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1272283004 .
Rather than declaring ExpectedSizeForVersion() for all process_types
types and providing a default NOTREACHED() implementation, this only
declares it for process_types that request it by stating
PROCESS_TYPE_STRUCT_VERSIONED() in their proctype definition. This also
allows the argument to have the correct type, matching the type of the
struct’s version field.
TEST=crashpad_snapshot_test
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1274663005 .
The system’s crashreporter_annotations_t structure was always present
as version 4 since Mac OS X 10.7. In OS X 10.11, it is now present as
version 5. It has also grown from 56 to 64 bytes per otool examination
of CoreFoundation’s __DATA,__crash_info section. The extra 8 bytes are
presumed to be a new field at the end of the structure, although this
is not confirmed.
The existing MachOImageAnnotationsReader.CrashAbort test only validated
that the “message” field in crashreporter_annotations_t was recovered
correctly, but
MachOImageAnnotationsReader::ReadCrashReporterClientAnnotations() also
recovers the “message2” field. A new test,
MachOImageAnnotationsReader.CrashModuleInitialization, is added to
ensure that the “messgae2” field can be recovered properly.
This change will resolve warnings such as:
[pid:tid:yyyymmdd,hhmmss.uuuuuu:WARNING
mach_o_image_annotations_reader.cc:82] unexpected crash info version 5
in
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
BUG=crashpad:40
TEST=crashpad_snapshot_test MachOImageAnnotationsReader.CrashAbort,
MachOImageAnnotationsReader.CrashModuleInitialization
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1277513003 .
OS X 10.11 introduces System Integrity Protection. One facet of that
forbids code injection into system executables. A Crashpad test checks
that information can be recovered from dyld in early-launch crashes by
requesting dyld load a nonexistent library with DYLD_INSERT_LIBRARIES.
The executable was meaningless but a system-provided executable,
/usr/bin/true, was used for convenience.
This test hung on OS X 10.11 because DYLD_INSERT_LIBRARIES was ignored
for the system executable, and no crash occurred. The test waited for a
crash that would never come.
A custom no-op executable, crashpad_snapshot_test_no_op, is provided as
an executable that does work with DYLD_INSERT_LIBRARIES.
BUG=crashpad:41
TEST=crashpad_snapshot_test MachOImageAnnotationsReader.CrashDyld
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1276553005 .
The cl_kernels bug (Apple bug 20239912) in which cl_kernels modules show
up with an __LD,__compact_unwind section inside the __TEXT segment, is
still present in Mac OS X 10.11. This results in these warnings and a
failure to load the module:
[pid:tid:yyyymmdd,hhmmss.uuuuuu:WARNING
mach_o_image_segment_reader.cc:142] section.segname incorrect in
segment __TEXT, section __LD,__compact_unwind 3/6, load command 0x19
0/6, module cl_kernels, address 0x10e964000
BUG=crashpad:42
TEST=crashpad_snapshot_test ProcessReader.*Modules
R=rsesek@chromium.org
Review URL: https://codereview.chromium.org/1276573002 .
Both an SDK check and a runtime OS version check need to guard the use
of task_dyld_info_data_t::all_image_info_format. The SDK check, which
was already present, ensures that the field and macro constants are
present in the SDK. The runtime check is also necessary. This bug was
exposed in a 10.10 SDK and 10.6 deployment target build.
TEST=crashpad_snapshot_test ProcessTypes.DyldImagesSelf
BUG=chromium:463170
R=erikchen@chromium.org, rsesek@chromium.org
Review URL: https://codereview.chromium.org/1277523002 .
Now that we have a multiprocess test harness, add a test for
ProcessReaderWin for reading from a child.
Parent test code wasn't closing handles properly; fix that.
R=rsesek@chromium.org
BUG=crashpad:1
Review URL: https://codereview.chromium.org/1160843006
This test was added in https://codereview.chromium.org/1052813002. It
was previously checking the timestamp from in-memory module traversal
vs. the disk mtime. This is flaky (of course) because it depends on
the linker writing the header and closing the file during the same time
quantum. So the bots occasionally failed with:
[ RUN ] ProcessInfo.Self
e:\b\build\slave\chromium_win_dbg\build\crashpad\util\win\process_info_test.cc(86): error: Value of: GetTimestampForModule(GetModuleHandleW(nullptr))
Actual: 1431650338
Expected: modules[0].timestamp
Which is: 1431650337
Instead, use imagehlp to pull the timestamp out of the header so that
it matches the header value that will be the in-memory timestamp.
R=cpu@chromium.orgTBR=mark@chromium.org
Review URL: https://codereview.chromium.org/1139103003
Retrieve context and save to thread context. NtQueryInformationThread
is no longer required (right now?) because to retrieve the CONTEXT, the
thread needs to be Suspend/ResumeThread'd anyway, and the return value
of SuspendThread is the previous SuspendCount.
I haven't handle the x86 case yet -- that would ideally be via
Wow64GetThreadContext (I think) but unfortunately that's Vista+, so I'll
likely need to to a bit of fiddling to get that sorted out. (It's actually
likely going to be NtQueryInformationThread again, but one thing at a
time for now.)
R=cpu@chromium.org, rsesek@chromium.orgTBR=mark@chromium.org
BUG=crashpad:1
Review URL: https://codereview.chromium.org/1133203002
The next big piece of functionality in snapshot. There's a bit more
grubbing around in the NT internals than would be nice, and it has
made me start to question the value avoiding MinidumpWriteDump. But
this seems to extract most of the data we need (I haven't pulled
the cpu context yet, but I hope that won't be too hard.)
R=mark@chromium.org
BUG=crashpad:1
Review URL: https://codereview.chromium.org/1131473005