At the moment, while performing a software single-step, gdbserver fails
to reinsert software single-step breakpoints for a LWP when
interrupted by a signal in another thread. This commit fixes this
problem by reinstalling software single-step breakpoints in
linux_process_target::resume_stopped_resumed_lwps in
gdbserver/linux-low.cc.
This bug was discovered due to a failing assert in maybe_hw_step()
in gdbserver/linux-low.cc. Looking at the backtrace revealed
that the caller was linux_process_target::resume_stopped_resumed_lwps.
I was uncertain whether the assert should still be valid when called
from that method, so I tried hoisting the assert from maybe_hw_step
to all callers except resume_stopped_resumed_lwps. But running the
new test case, described below, showed that merely eliminating the
assert for this case was NOT a good fix - a study of the log file for
the test showed that the single-step operation failed to occur.
Instead GDB (via gdbserver) stopped at the next breakpoint that was
hit.
Zhiyong Yan had proposed a fix which resinserted software single-step
breakpoints, albeit at a different location in linux-low.cc. Testing
revealed that, while running gdb.threads/pending-fork-event-detach,
the executable associated with that test would die due to a SIGTRAP
after the test program was detached. Examination of the core file(s)
showed that a breakpoint instruction had been left in program memory.
Test results were otherwise very good, so Zhiyong was definitely on
the right track!
This commit causes software single-step breakpoint(s) to be inserted
before the call to maybe_hw_step in resume_stopped_resumed_lwps. This
will cause 'has_single_step_breakpoints (thread)' to be true, so that
the assert in maybe_hw_step...
/* GDBserver must insert single-step breakpoint for software
single step. */
gdb_assert (has_single_step_breakpoints (thread));
...will no longer fail. And better still, the single-step breakpoints
are reinstalled, so that stepping will actually work, even when
interrupted.
The C code for the test case was loosely adapted from the reproducer
provided in Zhiyong's bug report for this problem. The .exp file was
copied from next-fork-other-thread.exp and then tweaked slightly. As
noted in a comment in next-fork-exec-other-thread.exp, I had to remove
"on" from the loop for non-stop as it was failing on all architectures
(including x86-64) that I tested. I have a feeling that it ought to
work, but this can be investigated separately and (re)enabled once it
works. I also increased the number of iterations for the loop running
the "next" commands. I've had some test runs which don't show the bug
until the loop counter exceeded 100 iterations. The C file for the
new test uses shorter delays than next-fork-other-thread.c though, so
it doesn't take overly long (IMO) to run this new test.
Running the new test on a Raspberry Pi w/ a 32-bit (Arm) kernel and
userland using a gdbserver build without the fix in this commit shows
the following results:
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=fork: target-non-stop=auto: non-stop=off: displaced-stepping=auto: i=12: next to other line
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=fork: target-non-stop=auto: non-stop=off: displaced-stepping=on: i=9: next to other line
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=fork: target-non-stop=auto: non-stop=off: displaced-stepping=off: i=18: next to other line
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=fork: target-non-stop=off: non-stop=off: displaced-stepping=auto: i=3: next to other line
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=fork: target-non-stop=off: non-stop=off: displaced-stepping=on: i=11: next to other line
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=fork: target-non-stop=off: non-stop=off: displaced-stepping=off: i=1: next to other line
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=auto: non-stop=off: displaced-stepping=auto: i=1: next to break here
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=auto: non-stop=off: displaced-stepping=on: i=3: next to break here
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=auto: non-stop=off: displaced-stepping=off: i=1: next to break here
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=on: non-stop=off: displaced-stepping=auto: i=47: next to other line
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=on: non-stop=off: displaced-stepping=on: i=57: next to other line
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=off: non-stop=off: displaced-stepping=auto: i=1: next to break here
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=off: non-stop=off: displaced-stepping=on: i=10: next to break here
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=off: non-stop=off: displaced-stepping=off: i=1: next to break here
=== gdb Summary ===
# of unexpected core files 12
# of expected passes 3011
# of unexpected failures 14
Each of the 12 core files were caused by the failed assertion in
maybe_hw_step in linux-low.c. These correspond to 12 of the
unexpected failures.
When the tests are run using a gdbserver build which includes the fix
in this commit, the results are significantly better, but not perfect:
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=on: non-stop=off: displaced-stepping=auto: i=143: next to other line
FAIL: gdb.threads/next-fork-exec-other-thread.exp: fork_func=vfork: target-non-stop=on: non-stop=off: displaced-stepping=on: i=25: next to other line
=== gdb Summary ===
# of expected passes 10178
# of unexpected failures 2
I think that the two remaining failures are due to some different
problem. They are also racy - I've seen runs with no failures or only
one failure, but never more than two. Also, those runs were conducted
with the loop count in next-fork-exec-other-thread.exp set to 200.
During his testing of this fix and the new test case, Luis Machado
found that this test was taking a long time and asked about ways to
speed it up. I then conducted additional tests in which I gradually
reduced the loop count, timing each one, also noting the number of
failures. With the loop count set to 30, I found that I could still
reliably reproduce the failures that Zhiyong reported (in which, with
the proper settings, core files are created). But, with the loop
count set to 30, the other failures noted above were much less likely
to show up. Anyone wishing to investigate those other failures should
set the loop count back up to 200.
Running the new test on x86-64 and aarch64, both native and
native-gdbserver shows no failures.
Also, I see no regressions when running the entire test suite for
armv7l-unknown-linux-gnueabihf (i.e. the Raspberry Pi w/ 32-bit
kernel+userland) with --target_board=native-gdbserver. Additionally,
using --target_board=native-gdbserver, I also see no regressions for
the entire test suite for x86-64 and aarch64 running Fedora 38.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387
Co-Authored-By: Zhiyong Yan <zhiyong.yan@windriver.com>
Tested-By: Zhiyong Yan <zhiyong.yan@windriver.com>
Tested-By: Luis Machado <luis.machado@arm.com>
Since commit 05c06f318f ("Linux: Access memory even if threads are
running"), GDB prefers pread64/pwrite64 to access inferior memory
instead of ptrace. That change broke reading shared libraries on
SPARC64 Linux, as reported by PR gdb/30525 ("gdb cannot read shared
libraries on SPARC64").
On SPARC64 Linux, surprisingly (to me), userspace shared libraries are
mapped at high 64-bit addresses:
(gdb) info sharedlibrary
Cannot access memory at address 0xfff80001002011e0
Cannot access memory at address 0xfff80001002011d8
Cannot access memory at address 0xfff80001002011d8
From To Syms Read Shared Object Library
0xfff80001000010a0 0xfff8000100021f80 Yes (*) /lib64/ld-linux.so.2
(*): Shared library is missing debugging information.
Those addresses are 64-bit addresses with the high bits set. When
interpreted as signed, they're negative.
The Linux kernel rejects pread64/pwrite64 if the offset argument of
type off_t (a signed type) is negative, which happens if the memory
address we're accessing has its high bit set. See
linux/fs/read_write.c sys_pread64 and sys_pwrite64 in Linux.
Thankfully, lseek does not fail in that situation. So the fix is to
use the 'lseek + read|write' path if the offset would be negative.
Fix this in both native GDB and GDBserver.
Tested on a SPARC64 GNU/Linux and x86-64 GNU/Linux.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30525
Change-Id: I79c724f918037ea67b7396fadb521bc9d1b10dc5
Tom de Vries pointed out that my earlier patch:
commit 873a185be2
Date: Fri Dec 16 07:56:57 2022 -0700
Don't use struct buffer in handle_qxfer_btrace
regressed gdb.btrace/reconnect.exp. I didn't notice this because I
did not have libipt installed.
This patch fixes the bug.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30169
Tested-By: Bruno Larsen <blarsen@redhat.com>
This patch doesn't change gdbserver behaviour, but after later changes are
made it avoids a null pointer dereference when HWCAP needs to be obtained
for a specific process while current_thread is nullptr.
Fixing linux_read_auxv, linux_get_hwcap and linux_get_hwcap2 to take a PID
parameter seems more correct than setting current_thread in one particular
code path.
Changes are propagated to allow passing the new parameter through the call
chain.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
This commit is the result of running the gdb/copyright.py script,
which automated the update of the copyright year range for all
source files managed by the GDB project to be updated to include
year 2023.
Consider the executable from test-case gdb.base/interrupt-daemon.exp.
When starting it using gdbserver:
...
$ ./build/gdbserver/gdbserver localhost:2345 \
./outputs/gdb.base/interrupt-daemon/interrupt-daemon
...
and connecting to it using gdb:
...
$ gdb -q -ex "target remote localhost:2345" \
-ex "set follow-fork-mode child" \
-ex "break daemon_main" -ex cont
...
we are setup to do the same as in the test-case: interrupt a running inferior
using ^C.
So let's try:
...
(gdb) continue
Continuing.
^C
...
After pressing ^C, nothing happens. This a known problem, filed as
PR remote/18772.
The problem is that in linux_process_target::request_interrupt, a kill is used
to send a SIGINT, but it fails. And it fails silently.
Make the failure verbose by adding a warning, such that the gdbserver output
becomes more helpful:
...
Process interrupt-daemon created; pid = 15068
Listening on port 2345
Remote debugging from host ::1, port 35148
Detaching from process 15068
Detaching from process 15085
gdbserver: Sending SIGINT to process group of pid 15068 failed: \
No such process
...
Note that the failure can easily be reproduced using the test-case and target
board native-gdbserver:
...
(gdb) continue^M
Continuing.^M
PASS: gdb.base/interrupt-daemon.exp: fg: continue
^CFAIL: gdb.base/interrupt-daemon.exp: fg: ctrl-c stops process (timeout)
...
as reported in PR server/23382.
Tested on x86_64-linux.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Just a small optimization, it's not necessary to recompute lwp at each
iteration.
While at it, change the variable type to long, as ptid_t::lwp returns a
long.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I181670ce1f90b59cb09ea4899367750be2ad9105
Gdbserver unconditionally reports support for btrace packets. Do not
report the support, if the underlying target does not say it supports
it. Otherwise GDB would query the server with btrace-related packets
unnecessarily.
Currently, every internal_error call must be passed __FILE__/__LINE__
explicitly, like:
internal_error (__FILE__, __LINE__, "foo %d", var);
The need to pass in explicit __FILE__/__LINE__ is there probably
because the function predates widespread and portable variadic macros
availability. We can use variadic macros nowadays, and in fact, we
already use them in several places, including the related
gdb_assert_not_reached.
So this patch renames the internal_error function to something else,
and then reimplements internal_error as a variadic macro that expands
__FILE__/__LINE__ itself.
The result is that we now should call internal_error like so:
internal_error ("foo %d", var);
Likewise for internal_warning.
The patch adjusts all calls sites. 99% of the adjustments were done
with a perl/sed script.
The non-mechanical changes are in gdbsupport/errors.h,
gdbsupport/gdb_assert.h, and gdb/gdbarch.py.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: Ia6f372c11550ca876829e8fd85048f4502bdcf06
Introduce a new qXfer:libraries-svr4:read annex key/value pair
lmid=<namespace identifier>
to be used together with start and prev to provide the namespace of start
and prev to gdbserver.
Unknown key/value pairs are ignored by gdbserver so no new supports check
is needed.
Introduce a new library-list-svr4 library attribute
lmid
to provide the namespace of a library entry to GDB.
This implementation uses the address of a namespace's r_debug object as
namespace identifier.
This should have incremented the minor version but since unknown XML
attributes are ignored, anyway, and since changing the version results in
a warning from GDB, the version is left at 1.0.
When listing SVR4 shared libraries, special care has to be taken about the
first library in the default namespace as that refers to the main
executable. The load map address of this main executable is provided in
an attribute of the library-list-svr4 element.
Move that code from where we enumerate libraries inside a single namespace
to where we generate the rest of the library-list-svr4 element. This
allows us to complete the library-list-svr4 element inside one function.
There should be no functional change.
In glibc, the r_debug structure contains (amongst others) the following
fields:
int r_version:
Version number for this protocol. It should be greater than 0.
If r_version is 2, struct r_debug is extended to struct r_debug_extended
with one additional field:
struct r_debug_extended *r_next;
Link to the next r_debug_extended structure. Each r_debug_extended
structure represents a different namespace. The first r_debug_extended
structure is for the default namespace.
1. Change solib_svr4_r_map argument to take the debug base.
2. Add solib_svr4_r_next to find the link map in the next namespace from
the r_next field.
3. Update svr4_current_sos_direct to get the link map in the next namespace
from the r_next field.
4. Don't check shared libraries in other namespaces when updating shared
libraries in a new namespace.
5. Update svr4_same to check the load offset in addition to the name
6. Update svr4_default_sos to also set l_addr_inferior
7. Change the flat solib_list into a per-namespace list using the
namespace's r_debug address to identify the namespace.
Add gdb.base/dlmopen.exp to test this.
To remain backwards compatible with older gdbserver, we reserve the
namespace zero for a flat list of solibs from all namespaces. Subsequent
patches will extend RSP to allow listing libraries grouped by namespace.
This fixes PR 11839.
Co-authored-by: Lu, Hongjiu <hongjiu.lu@intel.com>
For every stop, Linux GDB and GDBserver save the stopped thread's PC,
in lwp->stop_pc. This is done in save_stop_reason, in both
gdb/linux-nat.c and gdbserver/linux-low.cc. However, while we're
going through the shell after "run", in startup_inferior, we shouldn't
be reading registers, as we haven't yet determined the target's
architecture -- the shell's architecture may not even be the same as
the final inferior's.
In gdb/linux-nat.c, lwp->stop_pc is only needed when the thread has
stopped for a breakpoint, and since when going through the shell, no
breakpoint is going to hit, we could simply teach save_stop_reason to
only record the stop pc when the thread stopped for a breakpoint.
However, in gdbserver/linux-low.cc, lwp->stop_pc is used in more cases
than breakpoint hits (e.g., it's used in tracepoints & the
"while-stepping" feature).
So to avoid GDB vs GDBserver divergence, we apply the same approach to
both implementations.
We set a flag in the inferior (process in GDBserver) whenever it is
being nursed through the shell, and when that flag is set,
save_stop_reason bails out early. While going through the shell,
we'll only ever get process exits (normal or signalled), random
signals, and exec events, so nothing is lost.
Change-Id: If0f01831514d3a74d17efd102875de7d2c6401ad
Running
$ ../gdbserver/gdbserver --once --attach :1234 539436
with ASan while /proc/sys/kernel/yama/ptrace_scope is set to 1 (prevents
attaching) shows that we fail to free some platform-specific objects
tied to the process_info (process_info_private and arch_process_info):
Direct leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7f6b558b3fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x562eaf15d04a in xcalloc /home/simark/src/binutils-gdb/gdbserver/../gdb/alloc.c:100
#2 0x562eaf251548 in xcnew<process_info_private> /home/simark/src/binutils-gdb/gdbserver/../gdbsupport/poison.h:122
#3 0x562eaf22810c in linux_process_target::add_linux_process_no_mem_file(int, int) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:426
#4 0x562eaf22d33f in linux_process_target::attach(unsigned long) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:1132
#5 0x562eaf1a7222 in attach_inferior /home/simark/src/binutils-gdb/gdbserver/server.cc:308
#6 0x562eaf1c1016 in captured_main /home/simark/src/binutils-gdb/gdbserver/server.cc:3949
#7 0x562eaf1c1d60 in main /home/simark/src/binutils-gdb/gdbserver/server.cc:4084
#8 0x7f6b552f630f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)
Indirect leak of 56 byte(s) in 1 object(s) allocated from:
#0 0x7f6b558b3fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x562eaf15d04a in xcalloc /home/simark/src/binutils-gdb/gdbserver/../gdb/alloc.c:100
#2 0x562eaf2a0d79 in xcnew<arch_process_info> /home/simark/src/binutils-gdb/gdbserver/../gdbsupport/poison.h:122
#3 0x562eaf295e2c in x86_target::low_new_process() /home/simark/src/binutils-gdb/gdbserver/linux-x86-low.cc:723
#4 0x562eaf22819b in linux_process_target::add_linux_process_no_mem_file(int, int) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:428
#5 0x562eaf22d33f in linux_process_target::attach(unsigned long) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:1132
#6 0x562eaf1a7222 in attach_inferior /home/simark/src/binutils-gdb/gdbserver/server.cc:308
#7 0x562eaf1c1016 in captured_main /home/simark/src/binutils-gdb/gdbserver/server.cc:3949
#8 0x562eaf1c1d60 in main /home/simark/src/binutils-gdb/gdbserver/server.cc:4084
#9 0x7f6b552f630f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)
Those objects are deleted by linux_process_target::mourn, but that is
not called if we fail to attach, we only call remove_process. I
initially fixed this by making linux_process_target::attach call
linux_process_target::mourn on failure (before calling error). But this
isn't done anywhere else (including in GDB) so it would just be
confusing to do things differently here.
Instead, add a linux_process_target::remove_linux_process helper method
(which calls remove_process), and call that instead of remove_process in
the Linux target. Move the free-ing of the extra data from the mourn
method to that new method.
Change-Id: I277059a69d5f08087a7f3ef0b8f1792a1fcf7a85
Similarly to how the native Linux target was changed
and subsequently reworked in these commits:
05c06f318f Linux: Access memory even if threads are running
8a89ddbda2 Avoid /proc/pid/mem races (PR 28065)
... teach GDBserver to access memory even when the current thread is
running, by always accessing memory via /proc/PID/mem.
The existing comment:
/* Neither ptrace nor /proc/PID/mem allow accessing memory through a
running LWP. */
... is incorrect for /proc/PID/mem does allow that.
Actually, from GDB's perspective, GDBserver could already access
memory while threads were running, but at the expense of pausing all
threads for the duration of the memory access, via
prepare_to_access_memory. This new implementation does not require
pausing any thread, thus
linux_process_target::prepare_to_access_memory /
linux_process_target::done_accessing_memory become nops. A subsequent
patch will remove the whole prepare_to_access_memory infrastructure
completely.
The GDBserver linux-low.cc implementation is simpler than GDB's
linux-nat.c's, because GDBserver always adds the unfollowed vfork/fork
children to the process list immediately when the fork/vfork event is
seen out of ptrace. I.e., there's no need to keep the file descriptor
stored on a side map, we can store it directly in the process
structure.
Change-Id: I0abfd782ceaa4ddce8d3e5f3e2dfc5928862ef61
The test introduced by the following patch would sometimes fail in this
configuration:
FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=vfork: target-non-stop=on: non-stop=off: displaced-stepping=auto: i=14: next to for loop
The test has multiple threads constantly forking or vforking while the
main thread keep doing "next"s.
(After writing the commit message, I realized this also fixes a similar
failure in gdb.threads/forking-threads-plus-breakpoint.exp with the
native-gdbserver and native-extended-gdbserver boards.)
As stop_all_threads is called, because the main thread finished its
"next", it inevitably happens at some point that we ask the remote
target to stop a thread and wait() reports that this thread stopped with
a fork or vfork event, instead of the SIGSTOP we sent to try to stop it.
While running this test, I attached to GDBserver and stopped at
linux-low.cc:3626. We can see that the status pulled from the kernel
for 2742805 is indeed a vfork event:
(gdb) p/x w
$3 = 0x2057f
(gdb) p WIFSTOPPED(w)
$4 = true
(gdb) p WSTOPSIG(w)
$5 = 5
(gdb) p/x (w >> 8) & (PTRACE_EVENT_VFORK << 8)
$6 = 0x200
However, the statement at line 3626 overrides that:
ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
OURSTATUS becomes "stopped by a SIGTRAP". The information about the
fork or vfork is lost.
It's then all downhill from there, stop_all_threads eventually asks for
a thread list update. That thread list includes the child of that
forgotten fork or vfork, the remote target goes "oh cool, a new process,
let's attach to it!", when in fact that vfork child's destiny was to be
detached.
My reverse-engineered understanding of the code around there is that the
if/else between lines 3562 and 3583 (in the original code) makes sure
OURSTATUS is always initialized (not "ignore"). Either the details are
already in event_child->waitstatus (in the case of fork/vfork, for
example), in which case we just copy event_child->waitstatus to
ourstatus. Or, if the event is a plain "stopped by a signal" or a
syscall event, OURSTATUS is set to "stopped", but without a signal
number. Lines 3601 to 3629 (in the original code) serve to fill in that
last bit of information.
The problem is that when `w` holds the vfork status, the code wrongfully
takes this branch, because WSTOPSIG(w) returns SIGTRAP:
else if (current_thread->last_resume_kind == resume_stop
&& WSTOPSIG (w) != SIGSTOP)
The intent of this branch is, for example, when we sent SIGSTOP to try
to stop a thread, but wait() reports that it stopped with another signal
(that it must have received from somewhere else simultaneously), say
SIGWINCH. In that case, we want to report the SIGWINCH. But in our
fork/vfork case, we don't want to take this branch, as the thread didn't
really stop because it received a signal. For the non "stopped by a
signal" and non "syscall signal" cases, we would ideally skip over all
that snippet that fills in the signal or syscall number.
The fix I propose is to move this snipppet of the else branch of the
if/else above. In addition to moving the code, the last two "else if"
branches:
else if (current_thread->last_resume_kind == resume_stop
&& WSTOPSIG (w) != SIGSTOP)
{
/* A thread that has been requested to stop by GDB with vCont;t,
but, it stopped for other reasons. */
ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
}
else if (ourstatus->kind () == TARGET_WAITKIND_STOPPED)
ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
are changed into a single else:
else
ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
This is the default path we take if:
- W is not a syscall status
- W does not represent a SIGSTOP that have sent to stop the thread and
therefore want to suppress it
Change-Id: If2dc1f0537a549c293f7fa3c53efd00e3e194e79
I see some failures, at least in gdb.multi/multi-re-run.exp and
gdb.threads/interrupted-hand-call.exp. Running `stress -C $(nproc)` at
the same time as the test makes those tests relatively frequent.
Let's take gdb.multi/multi-re-run.exp as an example. The failure looks
like this, an unexpected "no resumed":
continue
Continuing.
No unwaited-for children left.
(gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=2: iter=1: continue until exit
The situation is:
- Inferior 1 is stopped somewhere, it won't really play a role here.
- Inferior 2 has 2 threads, both stopped.
- We resume inferior 2, the leader thread is expected to exit, making
the process exit.
From GDB's perspective, a failing run looks like this:
[infrun] fetch_inferior_event: enter
[infrun] scoped_disable_commit_resumed: reason=handling event
[infrun] do_target_wait: Found 2 inferiors, starting at #1
[infrun] random_pending_event_thread: None found.
[remote] wait: enter
[remote] Packet received: T0506:20dcffffff7f0000;07:20dcffffff7f0000;10:9551555555550000;thread:pae4cd.ae4cd;core:e;
[remote] wait: exit
[infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
[infrun] print_target_wait_results: 713933.713933.0 [Thread 713933.713933],
[infrun] print_target_wait_results: status->kind = STOPPED, sig = GDB_SIGNAL_TRAP
[infrun] handle_inferior_event: status->kind = STOPPED, sig = GDB_SIGNAL_TRAP
[infrun] clear_step_over_info: clearing step over info
[infrun] context_switch: Switching context from 0.0.0 to 713933.713933.0
[infrun] handle_signal_stop: stop_pc=0x555555555195
[infrun] start_step_over: enter
[infrun] start_step_over: stealing global queue of threads to step, length = 0
[infrun] operator(): step-over queue now empty
[infrun] start_step_over: exit
[infrun] process_event_stop_test: no stepping, continue
[remote] Sending packet: $Z0,555555555194,1#8e
[remote] Packet received: OK
[infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [713933.713933.0] at 0x555555555195
[remote] Sending packet: $QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;2c;4c;97;#0a
[remote] Packet received: OK
[remote] Sending packet: $vCont;c:pae4cd.-1#9f
[infrun] prepare_to_wait: prepare_to_wait
[infrun] reset: reason=handling event
[infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote
[infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote
[infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote
[infrun] fetch_inferior_event: exit
[infrun] fetch_inferior_event: enter
[infrun] scoped_disable_commit_resumed: reason=handling event
[infrun] do_target_wait: Found 2 inferiors, starting at #0
[infrun] random_pending_event_thread: None found.
[remote] wait: enter
[remote] Packet received: N
[remote] wait: exit
[infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
[infrun] print_target_wait_results: -1.0.0 [process -1],
[infrun] print_target_wait_results: status->kind = NO_RESUMED
[infrun] handle_inferior_event: status->kind = NO_RESUMED
[remote] Sending packet: $Hgp0.0#ad
[remote] Packet received: OK
[remote] Sending packet: $qXfer:threads:read::0,1000#92
[remote] Packet received: l<threads>\n<thread id="pae4cb.ae4cb" core="3" name="multi-re-run-1" handle="40c7c6f7ff7f0000"/>\n<thread id="pae4cb.ae4cc" core="2" name="multi-re-run-1" handle="40b6c6f7ff7f0000"/>\n<thread id="pae4cd.ae4ce" core="1" name="multi-re-run-2" handle="40b6c6f7ff7f0000"/>\n</threads>\n
[infrun] stop_waiting: stop_waiting
[remote] Sending packet: $qXfer:threads:read::0,1000#92
[remote] Packet received: l<threads>\n<thread id="pae4cb.ae4cb" core="3" name="multi-re-run-1" handle="40c7c6f7ff7f0000"/>\n<thread id="pae4cb.ae4cc" core="2" name="multi-re-run-1" handle="40b6c6f7ff7f0000"/>\n<thread id="pae4cd.ae4ce" core="1" name="multi-re-run-2" handle="40b6c6f7ff7f0000"/>\n</threads>\n
[infrun] infrun_async: enable=0
[infrun] reset: reason=handling event
[infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote
[infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote
[infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote
[infrun] fetch_inferior_event: exit
We can see that we resume the inferior with vCont;c, but got NO_RESUMED.
When the test passes, we get an EXITED status to indicate the process
has exited.
From GDBserver's point of view, it looks like this. The logs contain
some logging I added and that are part of this patch.
[remote] getpkt: getpkt ("vCont;c:pae4cf.-1"); [no ack sent]
[threads] resume: enter
[threads] thread_needs_step_over: Need step over [LWP 713931]? Ignoring, should remain stopped
[threads] thread_needs_step_over: Need step over [LWP 713932]? Ignoring, should remain stopped
[threads] get_pc: pc is 0x555555555195
[threads] thread_needs_step_over: Need step over [LWP 713935]? No, no breakpoint found at 0x555555555195
[threads] get_pc: pc is 0x7ffff7d35a95
[threads] thread_needs_step_over: Need step over [LWP 713936]? No, no breakpoint found at 0x7ffff7d35a95
[threads] resume: Resuming, no pending status or step over needed
[threads] resume_one_thread: resuming LWP 713935
[threads] proceed_one_lwp: lwp 713935
[threads] resume_one_lwp_throw: continue from pc 0x555555555195
[threads] resume_one_lwp_throw: Resuming lwp 713935 (continue, signal 0, stop not expected)
[threads] resume_one_lwp_throw: NOW ptid=713935.713935.0 stopped=0 resumed=0
[threads] resume_one_thread: resuming LWP 713936
[threads] proceed_one_lwp: lwp 713936
[threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95
[threads] resume_one_lwp_throw: Resuming lwp 713936 (continue, signal 0, stop not expected)
[threads] resume_one_lwp_throw: ptrace errno = 3 (No such process)
[threads] resume: exit
[threads] wait_1: enter
[threads] wait_1: [<all threads>]
[threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK
[threads] resume_stopped_resumed_lwps: resuming stopped-resumed LWP LWP 713935.713936 at 7ffff7d35a95: step=0
[threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95
[threads] resume_one_lwp_throw: Resuming lwp 713936 (continue, signal 0, stop not expected)
[threads] resume_one_lwp_throw: ptrace errno = 3 (No such process)
[threads] operator(): check_zombie_leaders: leader_pid=713931, leader_lp!=NULL=1, num_lwps=2, zombie=0
[threads] operator(): check_zombie_leaders: leader_pid=713935, leader_lp!=NULL=1, num_lwps=2, zombie=1
[threads] operator(): Thread group leader 713935 zombie (it exited, or another thread execd).
[threads] delete_lwp: deleting 713935
[threads] wait_for_event_filtered: exit (no unwaited-for LWP)
sigchld_handler
[threads] wait_1: ret = null_ptid, TARGET_WAITKIND_NO_RESUMED
[threads] wait_1: exit
What happens is:
- We resume the leader (713935) successfully.
- The leader exits.
- We resume the secondary thread (713936), we get ESRCH. This is
expected this the leader has exited.
- resume_one_lwp_throw throws, it's caught by resume_one_lwp.
- resume_one_lwp checks with check_ptrace_stopped_lwp_gone that the
failure can be explained by the LWP becoming zombie, and swallows the
error.
- Note that this means that the secondary lwp still has stopped==1.
- wait_1 is called, probably because linux_process_target::resume marks
the async pipe at the end.
- The exit event isn't ready yet, probably because the machine is under
load, so waitpid returns nothing.
- check_zombie_leaders detects that the leader is zombie and deletes
- We try to find a resumed (non-stopped) LWP to get an event from,
there's none since the leader (that was resumed) is now deleted, and
the secondary thread is still marked stopped.
wait_for_event_filtered returns -1, causing wait_1 to return
NO_RESUMED.
What I notice here is that there is some kind of race between the
availability of the process' exit notification and the call to wait_1
that results from marking the async pipe at the end of resume.
I think what we want from this wait_1 invocation is to keep waiting, as
we will eventually get thread exit notifications for both of our
threads.
The fix I came up with is to mark the secondary thread as !stopped (or
resumed) when we fail to resume it. This makes wait_1 see that there is
at least one resume lwp, so it won't return NO_RESUMED. I think this
makes sense to consider it resumed, because we are going to receive an
exit event for it. Here's the GDBserver logs with the fix applied:
[threads] resume: enter
[threads] thread_needs_step_over: Need step over [LWP 724595]? Ignoring, should remain stopped
[threads] thread_needs_step_over: Need step over [LWP 724596]? Ignoring, should remain stopped
[threads] get_pc: pc is 0x555555555195
[threads] thread_needs_step_over: Need step over [LWP 724597]? No, no breakpoint found at 0x555555555195
[threads] get_pc: pc is 0x7ffff7d35a95
[threads] thread_needs_step_over: Need step over [LWP 724598]? No, no breakpoint found at 0x7ffff7d35a95
[threads] resume: Resuming, no pending status or step over needed
[threads] resume_one_thread: resuming LWP 724597
[threads] proceed_one_lwp: lwp 724597
[threads] resume_one_lwp_throw: continue from pc 0x555555555195
[threads] resume_one_lwp_throw: Resuming lwp 724597 (continue, signal 0, stop not expected)
[threads] resume_one_lwp_throw: NOW ptid=724597.724597.0 stopped=0 resumed=0
[threads] resume_one_thread: resuming LWP 724598
[threads] proceed_one_lwp: lwp 724598
[threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95
[threads] resume_one_lwp_throw: Resuming lwp 724598 (continue, signal 0, stop not expected)
[threads] resume_one_lwp_throw: ptrace errno = 3 (No such process)
[threads] resume: exit
[threads] wait_1: enter
[threads] wait_1: [<all threads>]
sigchld_handler
[threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK
[threads] operator(): check_zombie_leaders: leader_pid=724595, leader_lp!=NULL=1, num_lwps=2, zombie=0
[threads] operator(): check_zombie_leaders: leader_pid=724597, leader_lp!=NULL=1, num_lwps=2, zombie=1
[threads] operator(): Thread group leader 724597 zombie (it exited, or another thread execd).
[threads] delete_lwp: deleting 724597
[threads] wait_for_event_filtered: sigsuspend'ing
sigchld_handler
[threads] wait_for_event_filtered: waitpid(-1, ...) returned 724598, ERRNO-OK
[threads] wait_for_event_filtered: waitpid 724598 received 0 (exited)
[threads] filter_event: 724598 exited
[threads] wait_for_event_filtered: waitpid(-1, ...) returned 724597, ERRNO-OK
[threads] wait_for_event_filtered: waitpid 724597 received 0 (exited)
[threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK
sigchld_handler
[threads] wait_1: ret = LWP 724597.724598, exited with retcode 0
[threads] wait_1: exit
Change-Id: Idf0bdb4cb0313f1b49e4864071650cc83fb3c100
Same as the previous patch, but for GDBserver.
In summary, the current zombie leader detection code in linux-low.cc
has a race -- if a multi-threaded inferior exits just before
check_zombie_leaders finds that the leader is now zombie via checking
/proc/PID/status, check_zombie_leaders deletes the leader, assuming we
won't get an event for that exit (which we won't in some scenarios,
but not in this one), which is a false-positive scenario, where the
whole process is simply exiting. Later when we see the last LWP in
our list exit, we report that LWP's exit status as exit code, even
though for the (real) parent process, the exit code that counts is the
child's leader thread's exit code.
Like for GDB, the solution here is to:
- only report whole-process exit events for the leader.
- re-add the leader back to the LWP list when we finally see it
exit.
Change-Id: Id2d7bbb51a415534e1294fff1d555b7ecaa87f1d
This fixes the indentation of
linux_process_target::check_zombie_leaders, which will help with
keeping its comments in sync with the gdb/linux-nat.c counterpart.
Change-Id: I37332343bd80423d934249e3de2d04feefad1891
Reorganize linux-low.cc:linux_process_target::filter_event such that
all the handling for events for LWPs not in the LWP list is together.
This helps make a following patch clearer. The comments and debug
messages have also been tweaked to have them synchronized with the GDB
counterpart.
Change-Id: If9019635f63a846607cfda44b454b4254a404019
Turning on debug output in gdbserver leads to an assertion failure if
gdbserver reports a non-signal event:
[threads] wait_1: LWP 3273770: extended event with waitstatus status->kind = EXECD, execd_pathname = gdb.threads/non-ldr-exc-1/non-ldr-exc-1
[threads] wait_1: Hit a non-gdbserver trap event.
../../src/gdbserver/../gdb/target/waitstatus.h:365: A problem internal to GDBserver has been detected.
sig: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind == TARGET_WAITKIND_SIGNALLED' failed.
Fix it in the obvious way, using target_waitstatus::to_string(),
resulting in, for example:
[threads] wait_1: ret = LWP 1542412.1542412, status->kind = STOPPED, sig = GDB_SIGNAL_TRAP
Change-Id: Ia4832f9b4fa39f4af67fcaf21fd4d909a285a645
On some systems, the gnulib configuration will decide to define open
and/or close as macros to replace the POSIX C functions. This
interferes with using those names in C++ class or namespace scopes.
gdbsupport/
* event-pipe.cc (event_pipe::open): Renamed to ...
(event_pipe::open_pipe): ... this.
(event_pipe::close): Renamed to ...
(event_pipe::close_pipe): ... this.
* event-pipe.h (class event_pipe): Updated.
gdb/
* inf-ptrace.h (async_file_open, async_file_close): Updated.
gdbserver/
* gdbserver/linux-low.cc (linux_process_target::async): Likewise.
The enable_btrace target method takes a ptid_t to identify the thread on
which tracing shall be enabled.
Change this to thread_info * to avoid translating back and forth between
the two. This will be used in a subsequent patch.
Add the threads_debug_printf and THREADS_SCOPED_DEBUG_ENTER_EXIT, which
use the logging infrastructure from gdbsupport/common-debug.h. Replace
all debug_print uses that are predicated by debug_threads with
threads_dethreads_debug_printf. Replace uses of the debug_enter and
debug_exit macros with THREADS_SCOPED_DEBUG_ENTER_EXIT, which serves
essentially the same purpose, but allows showing what comes between the
enter and the exit in an indented form.
Note that "threads" debug is currently used for a bit of everything in
GDBserver, not only threads related stuff. It should ideally be cleaned
up and separated logically as is done in GDB, but that's out of the
scope of this patch.
Change-Id: I2d4546464462cb4c16f7f1168c5cec5a89f2289a
This commit brings all the changes made by running gdb/copyright.py
as per GDB's Start of New Year Procedure.
For the avoidance of doubt, all changes in this commits were
performed by the script.
Replace the direct assignments to current_thread with
switch_to_thread. Use scoped_restore_current_thread when appropriate.
There is one instance remaining in linux-low.cc's wait_for_sigstop.
This will be handled in a separate patch.
Regression-tested on X86-64 Linux using the native-gdbserver and
native-extended-gdbserver board files.
While working with pending fork events, I wondered what would happen if
the user detached an inferior while a thread of that inferior had a
pending fork event. What happens with the fork child, which is
ptrace-attached by the GDB process (or by GDBserver), but not known to
the core? Sure enough, neither the core of GDB or the target detach the
child process, so GDB (or GDBserver) just stays ptrace-attached to the
process. The result is that the fork child process is stuck, while you
would expect it to be detached and run.
Make GDBserver detach of fork children it knows about. That is done in
the generic handle_detach function. Since a process_info already exists
for the child, we can simply call detach_inferior on it.
GDB-side, make the linux-nat and remote targets detach of fork children
known because of pending fork events. These pending fork events can be
stored in:
- thread_info::pending_waitstatus, if the core has consumed the event
but then saved it for later (for example, because it got the event
while stopping all threads, to present an all-stop stop on top of a
non-stop target)
- thread_info::pending_follow: if we ran to a "catch fork" and we
detach at that moment
Additionally, pending fork events can be in target-specific fields:
- For linux-nat, they can be in lwp_info::status and
lwp_info::waitstatus.
- For the remote target, they could be stored as pending stop replies,
saved in `remote_state::notif_state::pending_event`, if not
acknowledged yet, or in `remote_state::stop_reply_queue`, if
acknowledged. I followed the model of remove_new_fork_children for
this: call remote_notif_get_pending_events to process /
acknowledge any unacknowledged notification, then look through
stop_reply_queue.
Update the gdb.threads/pending-fork-event.exp test (and rename it to
gdb.threads/pending-fork-event-detach.exp) to try to detach the process
while it is stopped with a pending fork event. In order to verify that
the fork child process is correctly detached and resumes execution
outside of GDB's control, make that process create a file in the test
output directory, and make the test wait $timeout seconds for that file
to appear (it happens instantly if everything goes well).
This test catches a bug in linux-nat.c, also reported as PR 28512
("waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig()
const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind ==
TARGET_WAITKIND_SIGNALLED' failed.). When detaching a thread with a
pending event, get_detach_signal unconditionally fetches the signal
stored in the waitstatus (`tp->pending_waitstatus ().sig ()`). However,
that is only valid if the pending event is of type
TARGET_WAITKIND_STOPPED, and this is now enforced using assertions (iit
would also be valid for TARGET_WAITKIND_SIGNALLED, but that would mean
the thread does not exist anymore, so we wouldn't be detaching it). Add
a condition in get_detach_signal to access the signal number only if the
wait status is of kind TARGET_WAITKIND_STOPPED, and use GDB_SIGNAL_0
instead (since the thread was not stopped with a signal to begin with).
Add another test, gdb.threads/pending-fork-event-ns.exp, specifically to
verify that we consider events in pending stop replies in the remote
target. This test has many threads constantly forking, and we detach
from the program while the program is executing. That gives us some
chance that we detach while a fork stop reply is stored in the remote
target. To verify that we correctly detach all fork children, we ask
the parent to exit by sending it a SIGUSR1 signal and have it write a
file to the filesystem before exiting. Because the parent's main thread
joins the forking threads, and the forking threads wait for their fork
children to exit, if some fork child is not detach by GDB, the parent
will not write the file, and the test will time out. If I remove the
new remote_detach_pid calls in remote.c, the test fails eventually if I
run it in a loop.
There is a known limitation: we don't remove breakpoints from the
children before detaching it. So the children, could hit a trap
instruction after being detached and crash. I know this is wrong, and
it should be fixed, but I would like to handle that later. The current
patch doesn't fix everything, but it's a step in the right direction.
Change-Id: I6d811a56f520e3cb92d5ea563ad38976f92e93dd
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28512
This patch aims at fixing a bug where an inferior is unexpectedly
created when a fork happens at the same time as another event, and that
other event is reported to GDB first (and the fork event stays pending
in GDBserver). This happens for example when we step a thread and
another thread forks at the same time. The bug looks like (if I
reproduce the included test by hand):
(gdb) show detach-on-fork
Whether gdb will detach the child of a fork is on.
(gdb) show follow-fork-mode
Debugger response to a program call of fork or vfork is "parent".
(gdb) si
[New inferior 2]
Reading /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-while-fork-in-other-thread/step-while-fork-in-other-thread from remote target...
Reading /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-while-fork-in-other-thread/step-while-fork-in-other-thread from remote target...
Reading symbols from target:/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-while-fork-in-other-thread/step-while-fork-in-other-thread...
[New Thread 965190.965190]
[Switching to Thread 965190.965190]
Remote 'g' packet reply is too long (expected 560 bytes, got 816 bytes): ... <long series of bytes>
The sequence of events leading to the problem is:
- We are using the all-stop user-visible mode as well as the
synchronous / all-stop variant of the remote protocol
- We have two threads, thread A that we single-step and thread B that
calls fork at the same time
- GDBserver's linux_process_target::wait pulls the "single step
complete SIGTRAP" and the "fork" events from the kernel. It
arbitrarily choses one event to report, it happens to be the
single-step SIGTRAP. The fork stays pending in the thread_info.
- GDBserver send that SIGTRAP as a stop reply to GDB
- While in stop_all_threads, GDB calls update_thread_list, which ends
up querying the remote thread list using qXfer:threads:read.
- In the reply, GDBserver includes the fork child created as a result
of thread B's fork.
- GDB-side, the remote target sees the new PID, calls
remote_notice_new_inferior, which ends up unexpectedly creating a new
inferior, and things go downhill from there.
The problem here is that as long as GDB did not process the fork event,
it should pretend the fork child does not exist. Ultimately, this event
will be reported, we'll go through follow_fork, and that process will be
detached.
The remote target (GDB-side), has some code to remove from the reported
thread list the threads that are the result of forks not processed by
GDB yet. But that only works for fork events that have made their way
to the remote target (GDB-side), but haven't been consumed by the core
yet, so are still lingering as pending stop replies in the remote target
(see remove_new_fork_children in remote.c). But in our case, the fork
event hasn't made its way to the GDB-side remote target. We need to
implement the same kind of logic GDBserver-side: if there exists a
thread / inferior that is the result of a fork event GDBserver hasn't
reported yet, it should exclude that thread / inferior from the reported
thread list.
This was actually discussed a while ago, but not implemented AFAIK:
https://pi.simark.ca/gdb-patches/1ad9f5a8-d00e-9a26-b0c9-3f4066af5142@redhat.com/#thttps://sourceware.org/pipermail/gdb-patches/2016-June/133906.html
Implementation details-wise, the fix for this is all in GDBserver. The
Linux layer of GDBserver already tracks unreported fork parent / child
relationships using the lwp_info::fork_relative, in order to avoid
wildcard actions resuming fork childs unknown to GDB. This information
needs to be made available to the handle_qxfer_threads_worker function,
so it can filter the reported threads. Add a new thread_pending_parent
target function that allows the Linux target to return the parent of an
eventual fork child.
Testing-wise, the test replicates pretty-much the sequence of events
shown above. The setup of the test makes it such that the main thread
is about to fork. We stepi the other thread, so that the step completes
very quickly, in a single event. Meanwhile, the main thread is resumed,
so very likely has time to call fork. This means that the bug may not
reproduce every time (if the main thread does not have time to call
fork), but it will reproduce more often than not. The test fails
without the fix applied on the native-gdbserver and
native-extended-gdbserver boards.
At some point I suspected that which thread called fork and which thread
did the step influenced the order in which the events were reported, and
therefore the reproducibility of the bug. So I made the test try both
combinations: main thread forks while other thread steps, and vice
versa. I'm not sure this is still necessary, but I left it there
anyway. It doesn't hurt to test a few more combinations.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28288
Change-Id: I2158d5732fc7d7ca06b0eb01f88cf27bf527b990
Make target_waitstatus_to_string a "to_string" method of
target_waitstatus, a bit like we have ptid_t::to_string already. This
will save a bit of typing.
Change-Id: Id261b7a09fa9fa3c738abac131c191a6f9c13905
I wanted to write a warning that included two target_pid_to_str calls,
like this:
warning (_("Blabla %s, blabla %s"),
target_pid_to_str (ptid1),
target_pid_to_str (ptid2));
This doesn't work, because target_pid_to_str stores its result in a
static buffer, so my message would show twice the same ptid. Change
target_pid_to_str to return an std::string to avoid this. I don't think
we save much by using a static buffer, but it is more error-prone.
Change-Id: Ie3f649627686b84930529cc5c7c691ccf5d36dc2
I stumbled on a bug caused by the fact that a code path read
target_waitstatus::value::sig (expecting it to contain a gdb_signal
value) while target_waitstatus::kind was TARGET_WAITKIND_FORKED. This
meant that the active union field was in fact
target_waitstatus::value::related_pid, and contained a ptid. The read
signal value was therefore garbage, and that caused GDB to crash soon
after. Or, since that GDB was built with ubsan, this nice error
message:
/home/simark/src/binutils-gdb/gdb/linux-nat.c:1271:12: runtime error: load of value 2686365, which is not a valid value for type 'gdb_signal'
Despite being a large-ish change, I think it would be nice to make
target_waitstatus safe against that kind of bug. As already done
elsewhere (e.g. dynamic_prop), validate that the type of value read from
the union matches what is supposed to be the active field.
- Make the kind and value of target_waitstatus private.
- Make the kind initialized to TARGET_WAITKIND_IGNORE on
target_waitstatus construction. This is what most users appear to do
explicitly.
- Add setters, one for each kind. Each setter takes as a parameter the
data associated to that kind, if any. This makes it impossible to
forget to attach the associated data.
- Add getters, one for each associated data type. Each getter
validates that the data type fetched by the user matches the wait
status kind.
- Change "integer" to "exit_status", "related_pid" to "child_ptid",
just because that's more precise terminology.
- Fix all users.
That last point is semi-mechanical. There are a lot of obvious changes,
but some less obvious ones. For example, it's not possible to set the
kind at some point and the associated data later, as some users did.
But in any case, the intent of the code should not change in this patch.
This was tested on x86-64 Linux (unix, native-gdbserver and
native-extended-gdbserver boards). It was built-tested on x86-64
FreeBSD, NetBSD, MinGW and macOS. The rest of the changes to native
files was done as a best effort. If I forgot any place to update in
these files, it should be easy to fix (unless the change happens to
reveal an actual bug).
Change-Id: I0ae967df1ff6e28de78abbe3ac9b4b2ff4ad03b7
Add a constructor to initialize the waitstatus members. Initialize the
others in the class directly.
Change-Id: I10f885eb33adfae86e3c97b1e135335b540d7442
I wanted to find, and potentially modify, all the spots where the
'tid' parameter to the ptid_t constructor was used. So, I temporarily
removed this parameter and then rebuilt.
In order to make it simpler to search through the "real" (nonzero)
uses of this parameter, something I knew I'd have to do multiple
times, I removed any ", 0" from constructor calls.
Co-Authored-By: John Baldwin <jhb@FreeBSD.org>
Update gdbserver to check r_version < 1 instead of r_version != 1 so
that r_version can be bumped for a new field in the glibc debugger
interface to support multiple namespaces. Since so far, the gdbserver
only reads fields defined for r_version == 1, it is compatible with
r_version >= 1.
All future glibc debugger interface changes will be backward compatible.
If there is ever the need for backward incompatible change to the glibc
debugger interface, a new DT_XXX element will be provided to access the
new incompatible interface.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=11839
Instead of using a static buffer. This is safer, and we don't really
mind about any extra dynamic allocation here, since it's only used for
debug purposes.
gdb/ChangeLog:
* nat/linux-waitpid.c (status_to_str): Return std::string.
* nat/linux-waitpid.h (status_to_str): Likewise.
* linux-nat.c (linux_nat_post_attach_wait): Adjust.
(linux_nat_target::attach): Adjust.
(linux_handle_extended_wait): Adjust.
(wait_lwp): Adjust.
(stop_wait_callback): Adjust.
(linux_nat_filter_event): Adjust.
(linux_nat_wait_1): Adjust.
* nat/linux-waitpid.c (status_to_str): Adjust.
* nat/linux-waitpid.h (status_to_str): Adjust.
gdbserver/ChangeLog:
* linux-low.cc (linux_process_target::wait_for_event_filtered):
Adjust to status_to_str returning std::string.
Change-Id: Ia8aead70270438a5690f243e6faafff6c38ff757
Currently, in order to tell whether support for disabling address
space randomization on Linux is available, GDB checks if the
personality syscall works, at configure time. I.e., it does a run
test, instead of a compile/link test:
AC_RUN_IFELSE([PERSONALITY_TEST],
[have_personality=true],
[have_personality=false],
This is a bit bogus, because the machine the build is done on may not
(and is when you consider distro gdbs) be the machine that eventually
runs gdb. It would be better if this were a compile/link test
instead, and then at runtime, GDB coped with the personality syscall
failing. Actually, GDB already copes.
One environment where this is problematic is building GDB in a Docker
container -- by default, Docker runs the container with seccomp, with
a profile that disables the personality syscall. You can tell Docker
to use a less restricted seccomp profile, but I think we should just
fix it in GDB.
"man 2 personality" says:
This system call first appeared in Linux 1.1.20 (and thus first
in a stable kernel release with Linux 1.2.0); library support
was added in glibc 2.3.
...
ADDR_NO_RANDOMIZE (since Linux 2.6.12)
With this flag set, disable address-space-layout randomization.
glibc 2.3 was released in 2002.
Linux 2.6.12 was released in 2005.
The original patch that added the configure checks was submitted in
2008. The first version of the patch that was submitted to the list
called personality from common code:
https://sourceware.org/pipermail/gdb-patches/2008-June/058204.html
and then was moved to Linux-specific code:
https://sourceware.org/pipermail/gdb-patches/2008-June/058209.html
Since HAVE_PERSONALITY is only checked in Linux code, and
ADDR_NO_RANDOMIZE exists for over 15 years, I propose just completely
removing the configure checks.
If for some odd reason, some remotely modern system still needs a
configure check, then we can revert this commit but drop the
AC_RUN_IFELSE in favor of always doing the AC_LINK_IFELSE
cross-compile fallback.
gdb/ChangeLog:
* linux-nat.c (linux_nat_target::supports_disable_randomization):
Remove references to HAVE_PERSONALITY.
* nat/linux-personality.c: Remove references to HAVE_PERSONALITY.
(maybe_disable_address_space_randomization)
(~maybe_disable_address_space_randomizatio): Remove references to
HAVE_PERSONALITY.
* config.in, configure: Regenerate.
gdbserver/ChangeLog:
* linux-low.cc:
(linux_process_target::supports_disable_randomization): Remove
reference to HAVE_PERSONALITY.
* config.in, configure: Regenerate.
gdbsupport/ChangeLog:
* common.m4 (personality test): Remove.
Lancelot pointed out that since the refactor at:
https://sourceware.org/pipermail/gdb-patches/2015-January/120503.html
the sys/personality.h include is not needed in linux-low.cc anymore,
as it does not call personality directly itself anymore.
gdbserver/ChangeLog:
* linux-low.cc: Don't include sys/personality.h or define
ADDR_NO_RANDOMIZE.
Same as the previous patch, but for GDBserver. The return value of this
method is never used, change it to return void.
gdbserver/ChangeLog:
* linux-low.cc (linux_process_target::filter_event): Return
void.
* linux-low.h (class linux_process_target) <filter_event>:
Return void.
Change-Id: I79e5dc04d9b21b9f01c6d675fa463d1b1a703b3a
A following patch will add a new testcase that has two processes, each
with a number of threads constantly tripping a breakpoint and stepping
over it, because the breakpoint has a condition that evals false.
Then GDB detaches from one of the processes, while both processes are
running. And then the testcase sends a SIGUSR1 to the other process.
When run against gdbserver, that would occasionaly fail like this:
(gdb) PASS: gdb.threads/detach-step-over.exp: iter 1: detach
Executing on target: kill -SIGUSR1 208303 (timeout = 300)
spawn -ignore SIGHUP kill -SIGUSR1 208303
Thread 2.5 "detach-step-ove" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 208303.208305]
0x000055555555522a in thread_func (arg=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.c:54
54 counter++; /* Set breakpoint here. */
What happened was that GDBserver is doing a step-over for process A
when a detach request for process B arrives. And that generates a
spurious SIGTRAP report for process A, as seen above.
The GDBserver logs reveal what happened:
- GDB manages to detach while a step over is in progress. That reaches
linux_process_target::complete_ongoing_step_over(), which does:
/* Passing NULL_PTID as filter indicates we want all events to
be left pending. Eventually this returns when there are no
unwaited-for children left. */
ret = wait_for_event_filtered (minus_one_ptid, null_ptid, &wstat,
__WALL);
As the comment say, this leaves all events pending, _including_ the
just finished step SIGTRAP. We never discard that SIGTRAP. So
GDBserver reports the SIGTRAP to GDB. GDB can't explain the
SIGTRAP, so it reports it to the user.
The GDBserver log looks like this. The LWP of interest is 208305:
Need step over [LWP 208305]? yes, found breakpoint at 0x555555555227
proceed_all_lwps: found thread 208305 needing a step-over
Starting step-over on LWP 208305. Stopping all threads
208305 starts a step-over.
>>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
stop_all_lwps (stop-and-suspend, except=LWP 208303.208305)
Sending sigstop to lwp 208303
Sending sigstop to lwp 207755
wait_for_sigstop: pulling events
LWFE: waitpid(-1, ...) returned 207755, ERRNO-OK
LLW: waitpid 207755 received Stopped (signal) (stopped)
pc is 0x7f7e045593bf
Expected stop.
LLW: SIGSTOP caught for LWP 207755.207755 while stopping threads.
LWFE: waitpid(-1, ...) returned 208303, ERRNO-OK
LLW: waitpid 208303 received Stopped (signal) (stopped)
pc is 0x7ffff7e743bf
Expected stop.
LLW: SIGSTOP caught for LWP 208303.208303 while stopping threads.
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
LLW: exit (no unwaited-for LWP)
stop_all_lwps done, setting stopping_threads back to !stopping
<<<< exiting void linux_process_target::stop_all_lwps(int, lwp_info*)
Done stopping all threads for step-over.
pc is 0x555555555227
Writing 8b to 0x555555555227 in process 208305
Could not findsigchld_handler
fast tracepoint jump at 0x555555555227 in list (uninserting).
pending reinsert at 0x555555555227
step from pc 0x555555555227
Resuming lwp 208305 (step, signal 0, stop expected)
<<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
handling possible serial event
getpkt ("D;32b8b"); [no ack sent]
The detach request arrives.
sigchld_handler
Tracing is already off, ignoring
detach: step over in progress, finish it first
GDBserver realizes a step over for 208305 was in progress, let's it
finish.
LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
LLW: waitpid 208305 received Stopped (signal) (stopped)
pc is 0x555555555227
Expected stop.
LLW: step LWP 208303.208305, 0, 0 (discard delayed SIGSTOP)
pending reinsert at 0x555555555227
step from pc 0x555555555227
Resuming lwp 208305 (step, signal 0, stop not expected)
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
sigsuspend'ing
LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
LLW: waitpid 208305 received Trace/breakpoint trap (stopped)
pc is 0x55555555522a
CSBB: LWP 208303.208305 stopped by trace
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
LLW: exit (no unwaited-for LWP)
Finished step over.
The step-over for 208305 finishes.
Writing cc to 0x555555555227 in process 208305
Could not find fast tracepoint jump at 0x555555555227 in list (reinserting).
>>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
stop_all_lwps (stop, except=none)
wait_for_sigstop: pulling events
The detach proceeds (snipped).
...
proceed_one_lwp: lwp 208305
LWP 208305 has pending status, leaving stopped
Later on, 208305 has a pending status (the step SIGTRAP from the
step-over), so GDBserver starts the process of reporting it.
...
wait_1 ret = LWP 208303.208305, 1, 5
<<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
...
and eventually GDB receives the stop notification (T05 == SIGTRAP):
getpkt ("vStopped"); [no ack sent]
sigchld_handler
vStopped: acking 3
Writing resume reply for LWP 208303.208305:1
putpkt ("$T0506:f0ee58f7ff7f0* ;07:f0ee58f7ff7f0* ;10:2a525*"550* ;thread:p32daf.32db1;core:c;#37"); [noack mode]
From the GDB side, we see:
[infrun] fetch_inferior_event: enter
[infrun] fetch_inferior_event: fetch_inferior_event enter
[infrun] do_target_wait: Found 2 inferiors, starting at #1
[infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
[infrun] print_target_wait_results: 208303.208305.0 [Thread 208303.208305],
[infrun] print_target_wait_results: status->kind = stopped, signal = GDB_SIGNAL_TRAP
[infrun] handle_inferior_event: status->kind = stopped, signal = GDB_SIGNAL_TRAP
[infrun] start_step_over: enter
[infrun] start_step_over: stealing global queue of threads to step, length = 6
[infrun] operator(): putting back 6 threads to step in global queue
[infrun] start_step_over: exit
[infrun] handle_signal_stop: context switch
[infrun] context_switch: Switching context from process 0 to Thread 208303.208305
[infrun] handle_signal_stop: stop_pc=0x55555555522a
[infrun] handle_signal_stop: random signal (GDB_SIGNAL_TRAP)
[infrun] stop_waiting: stop_waiting
[infrun] stop_all_threads: starting
The fix is to discard the step SIGTRAP, unless GDB wanted the thread
to step.
gdbserver/ChangeLog:
* linux-low.cc (linux_process_target::complete_ongoing_step_over):
Discard step SIGTRAP, unless GDB wanted the thread to step.
This commits the result of running gdb/copyright.py as per our Start
of New Year procedure...
gdb/ChangeLog
Update copyright year range in copyright header of all GDB files.