The "file-limit" annotation will be used to confirm the theory that
certain crashes are caused by systems at or near their file descriptor
table size limits.
The annotation records the system-wide kern.num_files and kern.maxfiles
values, and the process-specific current and maximum file descriptor
limits.
The annotation will be set on crashpad_handler startup, and will be
refreshed every time an exception is handled and every time the upload
thread processes a pending report.
It’s expected that this annotation will be removed after enough data has
been collected to confirm the theory. However, the principle is useful
enough that we may want to provide this feature more generally under
bugs 19 or 21.
Bug: crashpad:180
Change-Id: I3bb78fae60e0567bc4ac2625716e0abe0ddae08c
Reviewed-on: https://chromium-review.googlesource.com/479914
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Self-monitoring revealed this CHECK was being hit in the wild:
base::debug::BreakDebugger() debugger_posix.cc:260
logging::LogMessage::~LogMessage() logging.cc:759
logging::MachLogMessage::~MachLogMessage() mach_logging.cc:45
crashpad::ExceptionHandlerServer::Run() exception_handler_server.cc:108
crashpad::HandlerMain() handler_main.cc:744
The MACH_CHECK() was:
108 MACH_CHECK(mr == MACH_MSG_SUCCESS, mr) << "MachMessageServer::Run";
Crash reports captured the full message, including the value of mr:
[0418/015158.777231:FATAL:exception_handler_server.cc(108)] Check failed: mr == MACH_MSG_SUCCESS. MachMessageServer::Run: (ipc/send) invalid destination port (0x10000003)
0x10000003 = MACH_SEND_INVALID_DEST.
This can happen when attempting to send a Mach message to a dead name.
Send (and send-once) rights become dead names when the corresponding
receive right dies. This would not normally happen for exception
requests originating in the kernel. It can happen for requests
originating from a user task: when the user task dies, the receive right
dies with it. All it takes to trigger this CHECK() in crashpad_handler
is for a Crashpad client to die (or be killed) while the handler is
processing a SimulateCrash() that the client originated.
Accept MACH_SEND_INVALID_DEST as a valid return value for
MachMessageServer::Run().
Note that MachMessageServer’s test coverage was already aware of this
behavior. MachMessageServer::Run()’s documentation is updated to reflect
it too.
Change-Id: I483c065d3c5f9a7da410ef3ad54db45ee53aa3c2
Reviewed-on: https://chromium-review.googlesource.com/479093
Commit-Queue: Mark Mentovai <mark@chromium.org>
Reviewed-by: Robert Sesek <rsesek@chromium.org>
76a67a37b1d0 adds crashpad_handler’s --monitor-self argument, which
results in a second crashpad_handler instance running out of the same
database as the initial crashpad_handler instance that it monitors. The
two handlers start at nearly the same time, and will initially be on
precisely the same schedule for periodic tasks such as scanning for new
reports to upload and pruning the database. This is an unnecessary
duplication of effort.
This adds a new --no-periodic-tasks argument to crashpad_handler. When
the first instance of crashpad_handler starts a second to monitor it, it
will use this argument, which prevents the second instance from
performing these tasks.
When --no-periodic-tasks is in effect, crashpad_handler will still be
able to upload crash reports that it knows about by virtue of having
written them itself, but it will not scan the database for other pending
reports to upload.
Bug: crashpad:143
Test: crashpad_util_test ThreadSafeVector.ThreadSafeVector
Change-Id: I7b249dd7b6d5782448d8071855818f986b98ab5a
Reviewed-on: https://chromium-review.googlesource.com/473827
Reviewed-by: Robert Sesek <rsesek@chromium.org>
With multiple crashpad_handlers running out of the same database, it was
possible for more than one to attempt to upload the same report. Nothing
ensured that the reports remained pending between the calls to
CrashReportDatabaseMac::GetPendingReports() and
CrashReportDatabaseMac::GetReportForUploading().
The Windows equivalent did not share this bug, but it would return
kBusyError. kReportNotFound is a better code.
Test: crashpad_client_test CrashReportDatabaseTest.*
Change-Id: Ieaee7f94ca8e6f2606d000bd2ba508d3cfa2fe07
Reviewed-on: https://chromium-review.googlesource.com/473928
Reviewed-by: Robert Sesek <rsesek@chromium.org>
--monitor-self-annotations allows the Crashpad-using application to push
module-level annotations in to crashpad_handler. These annotations will
appear in any crash report written for that handler by --monitor-self.
Bug: crashpad:143
Change-Id: If47395da75a90be4f4bdce0630ce95ea93f9fcf3
Reviewed-on: https://chromium-review.googlesource.com/467746
Reviewed-by: Scott Graham <scottmg@chromium.org>
https://crbug.com/678959 added “fallback” crash reporting for
crashpad_handler on Windows, in a Chrome- and Windows-specific way. This
implements a more general self-monitor mechanism that will work on
multiple platforms and in the absence of Chrome.
When starting crashpad_handler (let’s call it the “first instance”) with
--monitor-self, it will start another crashpad_handler (the “second
instance”). The second instance monitors the first one for crashes. The
second instance will be started in mostly the same way as the first
instance, except --monitor-self will not be provided to the second
instance.
Bug: crashpad:143
Change-Id: I76f3f47d1762d8ecae1814357cb672c8b7bd5e95
Reviewed-on: https://chromium-review.googlesource.com/466267
Reviewed-by: Sigurður Ásgeirsson <siggi@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
This supports the “double handler” or “double handler with low
probability” models from https://crashpad.chromium.org/bug/143.
For crashpad_handler to be become its own client, it needs access to its
own executable path to pass to CrashpadClient::StartHandler(). This was
formerly available in the test-only test::Paths::Executable(). Bring
that function’s implementation to the non-test Paths::Executable() in
util/misc, and rename test::Paths to test::TestPaths to avoid future
confusion.
test::TestPaths must still be used to access TestDataRoot(), which does
not make any sense to non-test code.
test::TestPaths::Executable() is retained for use by tests, which most
likely prefer the fatal semantics of that function. Paths::Executable()
is not fatal because for the purposes of implementing the double
handler, a failure to locate the executable path (which may happen on
some systems in deeply-nested directory hierarchies) shouldn’t cause the
initial crashpad_handler to abort, even if it does prevent a second
crashpad_handler from being started.
Bug: crashpad:143
Test: crashpad_util_test Paths.*, crashpad_test_test TestPaths.*
Change-Id: I9f75bf61839ce51e33c9f7c0d7031cebead6a156
Reviewed-on: https://chromium-review.googlesource.com/466346
Reviewed-by: Scott Graham <scottmg@chromium.org>
Commit-Queue: Mark Mentovai <mark@chromium.org>
This is like 270490ff79df, but for things run by end_to_end_test.py, and
things run for it by crash_other_program.exe.
Change-Id: Iabf3c762c50f41eb61ab31f714c646364196e745
Reviewed-on: https://chromium-review.googlesource.com/458822
Commit-Queue: Mark Mentovai <mark@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
ReadFile() attempted to continue reading after a short read. In most
cases, this is fine. However, ReadFile() would keep trying to fill a
partially-filled buffer until experiencing a 0-length read(), signaling
end-of-file. For certain weird file descriptors like terminal input, EOF
is an ephemeral condition, and attempting to read beyond EOF doesn’t
actually return 0 (EOF) provided that they remain open, it will block
waiting for more input. Consequently, ReadFile() and anything based on
ReadFile() had an undocumented and quirky interface, which was that any
short read that it returned (not an underlying short read) actually
indicated EOF.
This facet of ReadFile() was unexpected, so it’s being removed. The new
behavior is that ReadFile() will return an underlying short read. The
behavior of FileReaderInterface::Read() is updated in accordance with
this change.
Upon experiencing a short read, the caller can determine the best
action. Most callers were already prepared for this behavior. Outside of
util/file, only crashpad_database_util properly implemented EOF
detection according to previous semantics, and adapting it to new
semantics is trivial.
Callers who require an exact-length read can use the new
ReadFileExactly(), or the newly renamed LoggingReadFileExactly() or
CheckedReadFileExactly(). These functions will retry following a short
read. The renamed functions were previously called LoggingReadFile() and
CheckedReadFile(), but those names implied that they were simply
wrapping ReadFile(), which is not the case. They wrapped ReadFile() and
further, insisted on a full read. Since ReadFile()’s semantics are now
changing but these functions’ are not, they’re now even more distinct
from ReadFile(), and must be renamed to avoid confusion.
Test: *
Change-Id: I06b77e0d6ad8719bd2eb67dab93a8740542dd908
Reviewed-on: https://chromium-review.googlesource.com/456676
Reviewed-by: Robert Sesek <rsesek@chromium.org>
BUG=crashpad:165, chromium:696721
Change-Id: I85c6740955fdbdfd7f17208c095a4685e28bfacc
Reviewed-on: https://chromium-review.googlesource.com/448960
Commit-Queue: Sigurður Ásgeirsson <siggi@chromium.org>
Reviewed-by: Mark Mentovai <mark@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Use these utilities for signal handling in crashpad_handler
BUG=crashpad:30
TEST=crashpad_util_test Signals.*
Change-Id: I6c9a1de35c4a81b58d77768c4753bdba5ebea4df
Reviewed-on: https://chromium-review.googlesource.com/446917
Commit-Queue: Mark Mentovai <mark@chromium.org>
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Chromium has many build configurations. One important configuration
that’s not tested by its commit queue doesn’t use |condition| in
DLOG_IF(severity, condition) or any of the D*LOG_IF macros, resulting in
errors such as
…/handler/handler_main.cc:166:7: error: unused variable 'rv' [-Werror,-Wunused-variable]
int rv = sigaction(sig, &sa, nullptr);
^
BUG=chromium:695314
Change-Id: I09a57379e8276b5ffa7f8f81706581a802d76809
Reviewed-on: https://chromium-review.googlesource.com/446559
Reviewed-by: Robert Sesek <rsesek@chromium.org>
It could be useful to put our existing Crashpad.HandlerCrashed metrics
into context by getting a sense of handler starts, clean exits, and
other types of exits.
BUG=crashpad:100
Change-Id: I8982075158ea6d210eb2ddad678302e339a42192
Reviewed-on: https://chromium-review.googlesource.com/444124
Reviewed-by: Scott Graham <scottmg@chromium.org>
This adds zlib to Crashpad. By default in standalone Crashpad builds,
the system zlib will be used where available. A copy of Chromium’s zlib
(currently a slightly patched 1.2.11) is checked out via DEPS into
third_party for use on Windows, which does not have a system zlib.
zlib is used to produce gzip streams for HTTP upload request bodies sent
by crashpad_handler by default. The Content-Encoding: gzip header is set
for these compressed request bodies. Compression can be disabled for
upload to servers without corresponding decompression support by
starting crashpad_handler with the --no-upload-gzip option.
Most minidumps compress quite well with zlib. A size reduction of 90% is
not uncommon.
BUG=crashpad:157
TEST=crashpad_util_test GzipHTTPBodyStream.*:HTTPTransport.*
Change-Id: I99b86db3952c3685cd78f5dc858a60b54399c513
Reviewed-on: https://chromium-review.googlesource.com/438585
Reviewed-by: Robert Sesek <rsesek@chromium.org>
This installs signal handlers in the crashpad_handler process to log
these crashes via the Crashpad.HandlerCrash.ExceptionCode.Mac histogram.
This is roughly the same mechanism that’s used for Windows.
The signal handler tries fairly hard to avoid swallowing signals, so
that things appear to outside observers (including debuggers and crash
handlers) identically to how they would look if no signal handler was
present.
The signal handler uses a different mapping schema than the existing
Crashpad.ExceptionCode.Mac histogram for reasons explained in code
comments. Because the mappings should not overlap, the new values may be
added to the existing CrashpadMacExceptionCodes enum.
BUG=crashpad:100
Change-Id: I9b8bda1c59d0a180501c285cdc672840a54f5efc
Reviewed-on: https://chromium-review.googlesource.com/435451
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Previously, only the top-level exception code was reported via the
Crashpad.ExceptionCode.Mac histogram. Making this histogram work
(https://crbug.com/678720) has revealed that Chrome is triggering
EXC_RESOURCE exceptions at a rate in excess of 4x that of ordinary
crashes. These exceptions were not previously visible because they are
not uploaded unless the system treats them as fatal, which it does not
normally do absent an explicit request.
In order to learn more about the problem, this change augments the data
reported via the Crashpad.ExceptionCode.Mac histogram to report (at
least) second-level exception data. This means that we will no longer
see just EXC_RESOURCE, but potentially more useful information such as
EXC_RESOURCE / RESOURCE_TYPE_IO / FLAVOR_IO_PHYSICAL_WRITES. This also
applies to other exception types, so that the majority of crashes
currently falling into the EXC_CRASH bucket will now have additional
information decoded and will be reported as, for example, EXC_BAD_ACCESS
/ KERN_INVALID_ADDRESS, EXC_BAD_INSTRUCTION / EXC_I386_INVOP, and
EXC_CRASH / SIGABRT.
Because the old mechanism was only live (in an “it works” sense) for
several days, and the new mechanism does not overlap with histogram
values used by the old one, there’s no need to invent a new histogram
name.
BUG=chromium:684051
Change-Id: Ia0a372b4127f7b3b2e7dbbaac9304cce3b5aadfe
Reviewed-on: https://chromium-review.googlesource.com/430933
Reviewed-by: Scott Graham <scottmg@chromium.org>
I haven't been able to reproduce this locally, but we see errors in
crash dumps where the unloaded module list consists of a number of
modules with invalid names and implausible addresses. My assumption is
that RTL_UNLOAD_EVENT_TRACE isn't correct for some OS levels. Instead of
trying to finesse and test that, use RtlGetUnloadEventTraceEx() instead
of RtlGetUnloadEventTrace(), which returns an element size. (This
function is Vista+ which is why it wasn't used the first time around.)
R=mark@chromium.org
BUG=chromium:620175
Change-Id: I4d7080a03623276f9c1c038d6e7329af70e4a64c
Reviewed-on: https://chromium-review.googlesource.com/421564
Reviewed-by: Mark Mentovai <mark@chromium.org>
The strangest discovery relates to the # <h1> title in navbar.md.
Gitiles renders it small unless there’s a [home] reference, so use that.
This should only affect wrapping the site logo in the [home] link, but
it appears to control the size of the navbar title too. See
https://code.google.com/p/gitiles/issues/detail?id=130.
BUG=crashpad:138,gitiles:130
Change-Id: I11b3a79f045efa22358b3c3ef4b50ce2e6b3282e
Reviewed-on: https://chromium-review.googlesource.com/408458
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Most of the world, including the Chromium universe, seems to be
standardizing on Markdown for documentation. Markdown provides the
benefit of automatic rendering on Gitiles (Gerrit), and on GitHub
mirrors as well. Crashpad should fit in with its surroundings.
There are two quirks that I was unable to resolve.
- Markdown does not allow **emphasis** within a ```code fence```
region. In blocks showing interactive examples, the AsciiDoc
documentation used this to highlight what the user was expected to
type.
- Markdown does not have a “definition list” (<dl>). This would have
been nice in man pages for the Options and Exit Status sections.
In its place, I used unnumbered lists. This is a little ugly, but
it’s not the end of the world.
The new Markdown-formatted documentation is largely identical to the
AsciiDoc that it replaces. Minor editorial revisions were made.
References to Mac OS X now mention macOS, and tool man pages describing
tools that that access task ports now mention System Integrity
Protection (SIP).
The AppEngine-based https://crashpad.chromium.org/ app in doc/appengine
is still necessary to serve Doxygen-generated documentation. This app is
updated to redirect existing generated-HTML URLs to Gitiles’ automatic
Markdown rendering.
Scripts in doc/support are updated to adapt to this change. All AsciiDoc
support files in doc/support have been removed.
BUG=crashpad:138
Change-Id: I15ad423d5b7aa1b7aa2ed1d2cb72639eec7c81aa
Reviewed-on: https://chromium-review.googlesource.com/408256
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
The Windows 10 loader starts a few extra threads before main(). In a few
of the test cases, the tests were relying on thread ordering (generally,
the test thread being at index #1). Instead, use other signals to find
the correct thread to verify.
R=mark@chromium.org
Change-Id: Icb1f5a8fdf3a0ea6d82ab65960dbcb650965f269
Reviewed-on: https://chromium-review.googlesource.com/407073
Reviewed-by: Mark Mentovai <mark@chromium.org>
Not surprisingly,
"0x278,0x27c,0x280,0x274,0x288,0x978a70,0x978a80,0x978a90" is not a
valid directory to store metrics in.
Fortunately --metrics was processed before --initial-client-data in a
local build, otherwise this could have lurked for a long time. :(
R=mark@chromium.org
BUG=655788,656800
Change-Id: I3ac3d1b487f55ddf0172bac51f8d9efc411c3329
Reviewed-on: https://chromium-review.googlesource.com/406938
Reviewed-by: Mark Mentovai <mark@chromium.org>
Previously, StartHandler() launched the handler process, then connected
over a pipe to register for crash handling. Instead, the initial client
can create and inherit handles to the handler and pass those handle
values and other data (addresses, etc.) on the command line.
This should improve startup time as there's no need to synchronize with
the process at startup, and allows avoiding a call to CreateProcess()
directly in StartHandler(), which is important for registration for
crash reporting from DllMain().
Incidentally adds new utility functions for string/number conversion and
string splitting.
Note: API change; UseHandler() is removed for all platforms.
BUG=chromium:567850,chromium:656800
Change-Id: I1602724183cb107f805f109674c53e95841b24fd
Reviewed-on: https://chromium-review.googlesource.com/400015
Reviewed-by: Mark Mentovai <mark@chromium.org>
Three new metrics:
- counting upload success/failure;
- enum tracking the reason upload was skipped;
- enum describing how an upload got to the pending state.
R=mark@chromium.org, asvitkine@chromium.org
BUG=crashpad:100
Change-Id: I5e0cbc1ac3424e974f3a51560e5cdad484ffc038
Reviewed-on: https://chromium-review.googlesource.com/388855
Reviewed-by: Mark Mentovai <mark@chromium.org>
Includes mini_chromium DEPS roll for:
88e0a3e Add stub of sparse_histogram.h
R=mark@chromium.org
BUG=crashpad:100
Change-Id: I4c541a33be0f7f47e972af638d4765bd06682acf
Reviewed-on: https://chromium-review.googlesource.com/386385
Reviewed-by: Mark Mentovai <mark@chromium.org>
In order to allow on-demand uploads for crash reports, adding a
upload_explicitly_requested bit on 'pending' state and necessary support
for it.
BUG=chromium:620762
Change-Id: Ida38e483fe8d0e48eb5cbe95e8b8bfd96a2f8f00
Reviewed-on: https://chromium-review.googlesource.com/367328
Reviewed-by: Mark Mentovai <mark@chromium.org>
Reviewed-by: Robert Sesek <rsesek@chromium.org>
This switches the default behaviour of crashpad_handler.exe to be a
/subsystem:windows app, so that normal usage won't cause a console to be
popped up. At the same time, creates a copy of crashpad_handler.exe in
the output dir named crashpad_handler.com. The .com doesn't affect
normal operation, as the way StartHandler() uses CreateProcess()
requires a real path to a file. However, when run from a command prompt,
.com are found before .exe, so editbin the .com to be to a console app,
which will be run in preference to the exe when run as just
"crashpad_handler", as one tends to do from a command prompt when
debugging. That is:
d:\src\crashpad\crashpad\out\Debug>where crashpad_handler
d:\src\crashpad\crashpad\out\Debug\crashpad_handler.com
d:\src\crashpad\crashpad\out\Debug\crashpad_handler.exe
d:\src\crashpad\crashpad\out\Debug>crashpad_handler --help
Usage: crashpad_handler [OPTION]...
...
d:\src\crashpad\crashpad\out\Debug>crashpad_handler.exe --help
<no output>
d:\src\crashpad\crashpad\out\Debug>crashpad_handler.com --help
Usage: crashpad_handler.com [OPTION]...
...
We also use the .com file in test invocations so that output streams
will be visible.
R=mark@chromium.org
Change-Id: I1a27f88472d491b2a1d76e63c45e6415d9f679c0
Reviewed-on: https://chromium-review.googlesource.com/371578
Reviewed-by: Mark Mentovai <mark@chromium.org>
Because DumpAndCrashTargetProcess() suspends the process, the thread
suspend count is one too high for all threads other than the injection
one in the thread snapshots. Compensate for this when we detect this
type of exception.
BUG=crashpad:103
Change-Id: Ib77112fddf5324fc0e43f598604e56f77d67ff54
Reviewed-on: https://chromium-review.googlesource.com/340372
Reviewed-by: Mark Mentovai <mark@chromium.org>
Adds a new client API which allows causing an exception in another
process. This is accomplished by injecting a thread that calls
RaiseException(). A special exception code is used that indicates to the
handler that the exception arguments contain a thread id and exception
code, which are in turn used to fabricate an exception record. This is
so that the API can allow the client to "blame" a particular thread in
the target process.
The target process must also be a registered Crashpad client, as the
normal exception mechanism is used to handle the exception.
The injection of a thread is used instead of DebugBreakProcess() which
does not cause the UnhandledExceptionFilter() to be executed.
NtCreateThreadEx() is used in lieu of CreateRemoteThread() as it allows
passing of a flag which avoids calling DllMain()s. This is necessary to
allow thread creation to succeed even when the target process is
deadlocked on the loader lock.
BUG=crashpad:103
Change-Id: I797007bd2b1e3416afe3f37a6566c0cdb259b106
Reviewed-on: https://chromium-review.googlesource.com/339263
Reviewed-by: Mark Mentovai <mark@chromium.org>
Add a user-configurable cap on the amount of memory that is gathered by
dereferencing thread stacks. (SyzyAsan stores a tremendously large
number of pointers on the stack, so the dumps were ending up in the ~25M
range.)
Also reduce the range around pointers somewhat.
Change-Id: I6bce57d86bd2f6a796e1580c530909e089ec00ed
Reviewed-on: https://chromium-review.googlesource.com/338463
Reviewed-by: Mark Mentovai <mark@chromium.org>