Mirror of https://sourceware.org/git/binutils-gdb.git, synced 2025-02-23 13:21:43 +08:00

I see some failures, at least in gdb.multi/multi-re-run.exp and gdb.threads/interrupted-hand-call.exp. Running `stress -C $(nproc)` at the same time as the test makes those failures relatively frequent.

Let's take gdb.multi/multi-re-run.exp as an example. The failure looks like this, an unexpected "no resumed":

    continue
    Continuing.
    No unwaited-for children left.
    (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=2: iter=1: continue until exit

The situation is:

- Inferior 1 is stopped somewhere; it won't really play a role here.
- Inferior 2 has 2 threads, both stopped.
- We resume inferior 2; the leader thread is expected to exit, making the process exit.

From GDB's perspective, a failing run looks like this:

    [infrun] fetch_inferior_event: enter
    [infrun] scoped_disable_commit_resumed: reason=handling event
    [infrun] do_target_wait: Found 2 inferiors, starting at #1
    [infrun] random_pending_event_thread: None found.
    [remote] wait: enter
    [remote] Packet received: T0506:20dcffffff7f0000;07:20dcffffff7f0000;10:9551555555550000;thread:pae4cd.ae4cd;core:e;
    [remote] wait: exit
    [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
    [infrun] print_target_wait_results: 713933.713933.0 [Thread 713933.713933],
    [infrun] print_target_wait_results: status->kind = STOPPED, sig = GDB_SIGNAL_TRAP
    [infrun] handle_inferior_event: status->kind = STOPPED, sig = GDB_SIGNAL_TRAP
    [infrun] clear_step_over_info: clearing step over info
    [infrun] context_switch: Switching context from 0.0.0 to 713933.713933.0
    [infrun] handle_signal_stop: stop_pc=0x555555555195
    [infrun] start_step_over: enter
    [infrun] start_step_over: stealing global queue of threads to step, length = 0
    [infrun] operator(): step-over queue now empty
    [infrun] start_step_over: exit
    [infrun] process_event_stop_test: no stepping, continue
    [remote] Sending packet: $Z0,555555555194,1#8e
    [remote] Packet received: OK
    [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [713933.713933.0] at 0x555555555195
    [remote] Sending packet: $QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;2c;4c;97;#0a
    [remote] Packet received: OK
    [remote] Sending packet: $vCont;c:pae4cd.-1#9f
    [infrun] prepare_to_wait: prepare_to_wait
    [infrun] reset: reason=handling event
    [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote
    [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote
    [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote
    [infrun] fetch_inferior_event: exit
    [infrun] fetch_inferior_event: enter
    [infrun] scoped_disable_commit_resumed: reason=handling event
    [infrun] do_target_wait: Found 2 inferiors, starting at #0
    [infrun] random_pending_event_thread: None found.
    [remote] wait: enter
    [remote] Packet received: N
    [remote] wait: exit
    [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
    [infrun] print_target_wait_results: -1.0.0 [process -1],
    [infrun] print_target_wait_results: status->kind = NO_RESUMED
    [infrun] handle_inferior_event: status->kind = NO_RESUMED
    [remote] Sending packet: $Hgp0.0#ad
    [remote] Packet received: OK
    [remote] Sending packet: $qXfer:threads:read::0,1000#92
    [remote] Packet received: l<threads>\n<thread id="pae4cb.ae4cb" core="3" name="multi-re-run-1" handle="40c7c6f7ff7f0000"/>\n<thread id="pae4cb.ae4cc" core="2" name="multi-re-run-1" handle="40b6c6f7ff7f0000"/>\n<thread id="pae4cd.ae4ce" core="1" name="multi-re-run-2" handle="40b6c6f7ff7f0000"/>\n</threads>\n
    [infrun] stop_waiting: stop_waiting
    [remote] Sending packet: $qXfer:threads:read::0,1000#92
    [remote] Packet received: l<threads>\n<thread id="pae4cb.ae4cb" core="3" name="multi-re-run-1" handle="40c7c6f7ff7f0000"/>\n<thread id="pae4cb.ae4cc" core="2" name="multi-re-run-1" handle="40b6c6f7ff7f0000"/>\n<thread id="pae4cd.ae4ce" core="1" name="multi-re-run-2" handle="40b6c6f7ff7f0000"/>\n</threads>\n
    [infrun] infrun_async: enable=0
    [infrun] reset: reason=handling event
    [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote
    [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote
    [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote
    [infrun] fetch_inferior_event: exit

We can see that we resume the inferior with vCont;c, but get NO_RESUMED. When the test passes, we get an EXITED status to indicate the process has exited.

From GDBserver's point of view, it looks like this. The logs contain some logging I added that is part of this patch.

    [remote] getpkt: getpkt ("vCont;c:pae4cf.-1"); [no ack sent]
    [threads] resume: enter
    [threads] thread_needs_step_over: Need step over [LWP 713931]? Ignoring, should remain stopped
    [threads] thread_needs_step_over: Need step over [LWP 713932]? Ignoring, should remain stopped
    [threads] get_pc: pc is 0x555555555195
    [threads] thread_needs_step_over: Need step over [LWP 713935]? No, no breakpoint found at 0x555555555195
    [threads] get_pc: pc is 0x7ffff7d35a95
    [threads] thread_needs_step_over: Need step over [LWP 713936]? No, no breakpoint found at 0x7ffff7d35a95
    [threads] resume: Resuming, no pending status or step over needed
    [threads] resume_one_thread: resuming LWP 713935
    [threads] proceed_one_lwp: lwp 713935
    [threads] resume_one_lwp_throw: continue from pc 0x555555555195
    [threads] resume_one_lwp_throw: Resuming lwp 713935 (continue, signal 0, stop not expected)
    [threads] resume_one_lwp_throw: NOW ptid=713935.713935.0 stopped=0 resumed=0
    [threads] resume_one_thread: resuming LWP 713936
    [threads] proceed_one_lwp: lwp 713936
    [threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95
    [threads] resume_one_lwp_throw: Resuming lwp 713936 (continue, signal 0, stop not expected)
    [threads] resume_one_lwp_throw: ptrace errno = 3 (No such process)
    [threads] resume: exit
    [threads] wait_1: enter
    [threads] wait_1: [<all threads>]
    [threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK
    [threads] resume_stopped_resumed_lwps: resuming stopped-resumed LWP LWP 713935.713936 at 7ffff7d35a95: step=0
    [threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95
    [threads] resume_one_lwp_throw: Resuming lwp 713936 (continue, signal 0, stop not expected)
    [threads] resume_one_lwp_throw: ptrace errno = 3 (No such process)
    [threads] operator(): check_zombie_leaders: leader_pid=713931, leader_lp!=NULL=1, num_lwps=2, zombie=0
    [threads] operator(): check_zombie_leaders: leader_pid=713935, leader_lp!=NULL=1, num_lwps=2, zombie=1
    [threads] operator(): Thread group leader 713935 zombie (it exited, or another thread execd).
    [threads] delete_lwp: deleting 713935
    [threads] wait_for_event_filtered: exit (no unwaited-for LWP)
    sigchld_handler
    [threads] wait_1: ret = null_ptid, TARGET_WAITKIND_NO_RESUMED
    [threads] wait_1: exit

What happens is:

- We resume the leader (713935) successfully.
- The leader exits.
- We resume the secondary thread (713936) and get ESRCH. This is expected, as the leader has exited.
- resume_one_lwp_throw throws, and the exception is caught by resume_one_lwp.
- resume_one_lwp checks with check_ptrace_stopped_lwp_gone that the failure can be explained by the LWP becoming zombie, and swallows the error.
- Note that this means that the secondary LWP still has stopped==1.
- wait_1 is called, probably because linux_process_target::resume marks the async pipe at the end.
- The exit event isn't ready yet, probably because the machine is under load, so waitpid returns nothing.
- check_zombie_leaders detects that the leader is zombie and deletes it.
- We try to find a resumed (non-stopped) LWP to get an event from. There is none, since the leader (which was resumed) is now deleted, and the secondary thread is still marked stopped. wait_for_event_filtered returns -1, causing wait_1 to return NO_RESUMED.

What I notice here is that there is some kind of race between the availability of the process' exit notification and the call to wait_1 that results from marking the async pipe at the end of resume. I think what we want from this wait_1 invocation is to keep waiting, as we will eventually get thread exit notifications for both of our threads.

The fix I came up with is to mark the secondary thread as !stopped (or resumed) when we fail to resume it. This makes wait_1 see that there is at least one resumed LWP, so it won't return NO_RESUMED. I think it makes sense to consider it resumed, because we are going to receive an exit event for it. Here are the GDBserver logs with the fix applied:

    [threads] resume: enter
    [threads] thread_needs_step_over: Need step over [LWP 724595]? Ignoring, should remain stopped
    [threads] thread_needs_step_over: Need step over [LWP 724596]? Ignoring, should remain stopped
    [threads] get_pc: pc is 0x555555555195
    [threads] thread_needs_step_over: Need step over [LWP 724597]? No, no breakpoint found at 0x555555555195
    [threads] get_pc: pc is 0x7ffff7d35a95
    [threads] thread_needs_step_over: Need step over [LWP 724598]? No, no breakpoint found at 0x7ffff7d35a95
    [threads] resume: Resuming, no pending status or step over needed
    [threads] resume_one_thread: resuming LWP 724597
    [threads] proceed_one_lwp: lwp 724597
    [threads] resume_one_lwp_throw: continue from pc 0x555555555195
    [threads] resume_one_lwp_throw: Resuming lwp 724597 (continue, signal 0, stop not expected)
    [threads] resume_one_lwp_throw: NOW ptid=724597.724597.0 stopped=0 resumed=0
    [threads] resume_one_thread: resuming LWP 724598
    [threads] proceed_one_lwp: lwp 724598
    [threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95
    [threads] resume_one_lwp_throw: Resuming lwp 724598 (continue, signal 0, stop not expected)
    [threads] resume_one_lwp_throw: ptrace errno = 3 (No such process)
    [threads] resume: exit
    [threads] wait_1: enter
    [threads] wait_1: [<all threads>]
    sigchld_handler
    [threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK
    [threads] operator(): check_zombie_leaders: leader_pid=724595, leader_lp!=NULL=1, num_lwps=2, zombie=0
    [threads] operator(): check_zombie_leaders: leader_pid=724597, leader_lp!=NULL=1, num_lwps=2, zombie=1
    [threads] operator(): Thread group leader 724597 zombie (it exited, or another thread execd).
    [threads] delete_lwp: deleting 724597
    [threads] wait_for_event_filtered: sigsuspend'ing
    sigchld_handler
    [threads] wait_for_event_filtered: waitpid(-1, ...) returned 724598, ERRNO-OK
    [threads] wait_for_event_filtered: waitpid 724598 received 0 (exited)
    [threads] filter_event: 724598 exited
    [threads] wait_for_event_filtered: waitpid(-1, ...) returned 724597, ERRNO-OK
    [threads] wait_for_event_filtered: waitpid 724597 received 0 (exited)
    [threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK
    sigchld_handler
    [threads] wait_1: ret = LWP 724597.724598, exited with retcode 0
    [threads] wait_1: exit

Change-Id: Idf0bdb4cb0313f1b49e4864071650cc83fb3c100
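The race and the fix above can be illustrated with a small, hypothetical C++ model. It is not gdbserver's code: `toy_lwp`, `toy_resume`, `toy_wait` and `replay_scenario` are invented names, only the `stopped` flag is modelled, deleting the zombie leader is modelled as erasing it from a map, and the wait step is assumed to report NO_RESUMED exactly when waitpid has returned nothing and no remaining LWP is marked as resumed, as the explanation above describes.

```cpp
// Hypothetical toy model of the NO_RESUMED race described above.  None of
// these names exist in gdbserver; only the `stopped` flag is modelled.
#include <cassert>
#include <cerrno>
#include <map>

// Stand-in for an LWP: true means gdbserver considers it stopped.
struct toy_lwp
{
  bool stopped = true;
};

// PTRACE_ERRNO stands for the errno of the resume request: 0 on success,
// ESRCH when the LWP is already gone.  With APPLY_FIX, an LWP whose resume
// failed with ESRCH is still marked resumed, since an exit event will come.
void
toy_resume (toy_lwp &lwp, int ptrace_errno, bool apply_fix)
{
  if (ptrace_errno == 0)
    lwp.stopped = false;
  else if (ptrace_errno == ESRCH && apply_fix)
    lwp.stopped = false;
  // Otherwise the error is swallowed and the LWP stays marked stopped.
}

enum class toy_wait_result { KEEP_WAITING, NO_RESUMED };

// Assumed simplification of the wait step when waitpid has returned
// nothing yet: keep waiting if some LWP is resumed, else give up.
toy_wait_result
toy_wait (const std::map<int, toy_lwp> &lwps)
{
  for (const auto &entry : lwps)
    if (!entry.second.stopped)
      return toy_wait_result::KEEP_WAITING;
  return toy_wait_result::NO_RESUMED;
}

// Replays the logged sequence: leader 713935 is resumed (and exits),
// resuming secondary thread 713936 fails with ESRCH, then the zombie
// leader is deleted before any exit event has been reaped.
toy_wait_result
replay_scenario (bool apply_fix)
{
  std::map<int, toy_lwp> lwps = { { 713935, toy_lwp {} },
                                  { 713936, toy_lwp {} } };

  toy_resume (lwps.at (713935), 0, apply_fix);
  toy_resume (lwps.at (713936), ESRCH, apply_fix);

  lwps.erase (713935);
  assert (lwps.size () == 1);

  return toy_wait (lwps);
}
```

In this model, `replay_scenario (false)` returns `NO_RESUMED`, matching the failing run, while `replay_scenario (true)` returns `KEEP_WAITING`, matching the fixed run where wait_1 goes on to sigsuspend and later reaps both exit events. The model deliberately stops at the point where the two behaviours diverge.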

.dir-locals.el
.gitattributes
.gitignore
acinclude.m4
aclocal.m4
ax.cc
ax.h
ChangeLog-2002-2021
config.in
configure
configure.ac
configure.srv
debug.cc
debug.h
dll.cc
dll.h
fork-child.cc
gdb_proc_service.h
gdbreplay.cc
gdbthread.h
hostio.cc
hostio.h
i387-fp.cc
i387-fp.h
inferiors.cc
inferiors.h
linux-aarch32-low.cc
linux-aarch32-low.h
linux-aarch32-tdesc.cc
linux-aarch32-tdesc.h
linux-aarch64-ipa.cc
linux-aarch64-low.cc
linux-aarch64-tdesc.cc
linux-aarch64-tdesc.h
linux-amd64-ipa.cc
linux-arc-low.cc
linux-arm-low.cc
linux-arm-tdesc.cc
linux-arm-tdesc.h
linux-i386-ipa.cc
linux-ia64-low.cc
linux-low.cc
linux-low.h
linux-m68k-low.cc
linux-mips-low.cc
linux-nios2-low.cc
linux-or1k-low.cc
linux-ppc-ipa.cc
linux-ppc-low.cc
linux-ppc-tdesc-init.h
linux-riscv-low.cc
linux-s390-ipa.cc
linux-s390-low.cc
linux-s390-tdesc.h
linux-sh-low.cc
linux-sparc-low.cc
linux-tic6x-low.cc
linux-x86-low.cc
linux-x86-tdesc.cc
linux-x86-tdesc.h
linux-xtensa-low.cc
Makefile.in
mem-break.cc
mem-break.h
netbsd-aarch64-low.cc
netbsd-amd64-low.cc
netbsd-i386-low.cc
netbsd-low.cc
netbsd-low.h
notif.cc
notif.h
proc-service.cc
proc-service.list
README
regcache.cc
regcache.h
remote-utils.cc
remote-utils.h
server.cc
server.h
symbol.cc
target.cc
target.h
tdesc.cc
tdesc.h
thread-db.cc
tracepoint.cc
tracepoint.h
utils.cc
utils.h
win32-i386-low.cc
win32-low.cc
win32-low.h
x86-low.cc
x86-low.h
x86-tdesc.h
xtensa-xtregs.cc

README for GDBserver & GDBreplay
by Stu Grossman and Fred Fish

Introduction:

This is GDBserver, a remote server for Un*x-like systems. It can be used to control the execution of a program on a target system from a GDB on a different host. GDB and GDBserver communicate using the standard remote serial protocol. They communicate via either a serial line or a TCP connection.

For more information about GDBserver, see the GDB manual:

    https://sourceware.org/gdb/current/onlinedocs/gdb/Remote-Protocol.html

Usage (server (target) side):

First, you need to have a copy of the program you want to debug put onto the target system. The program can be stripped to save space if needed, as GDBserver doesn't care about symbols. All symbol handling is taken care of by the GDB running on the host system.

To use the server, you log on to the target system, and run the `gdbserver' program. You must tell it (a) how to communicate with GDB, (b) the name of your program, and (c) its arguments. The general syntax is:

    target> gdbserver COMM PROGRAM [ARGS ...]

For example, using a serial port, you might say:

    target> gdbserver /dev/com1 emacs foo.txt

This tells GDBserver to debug emacs with an argument of foo.txt, and to communicate with GDB via /dev/com1. GDBserver now waits patiently for the host GDB to communicate with it.

To use a TCP connection, you could say:

    target> gdbserver host:2345 emacs foo.txt

This says pretty much the same thing as the last example, except that we are going to communicate with the host GDB via TCP. The `host:2345' argument means that we are expecting to see a TCP connection to local TCP port 2345. (Currently, the `host' part is ignored.) You can choose any number you want for the port number as long as it does not conflict with any existing TCP ports on the target system. This same port number must be used in the host GDB's `target remote' command, which will be described shortly.
Note that if you choose a port number that conflicts with another service, GDBserver will print an error message and exit.

On some targets, GDBserver can also attach to running programs. This is accomplished via the --attach argument. The syntax is:

    target> gdbserver --attach COMM PID

PID is the process ID of a currently running process. It isn't necessary to point GDBserver at a binary for the running process.

Usage (host side):

You need an unstripped copy of the target program on your host system, since GDB needs to examine its symbol tables and such. Start up GDB as you normally would, with the target program as the first argument. (You may need to use the --baud option if the serial line is running at anything except 9600 baud.) That is, `gdb TARGET-PROG', or `gdb --baud BAUD TARGET-PROG'. After that, the only new command you need to know about is `target remote'. Its argument is either a device name (usually a serial device, like `/dev/ttyb'), or a HOST:PORT descriptor. For example:

    (gdb) target remote /dev/ttyb

communicates with the server via serial line /dev/ttyb, and:

    (gdb) target remote the-target:2345

communicates via a TCP connection to port 2345 on host `the-target', where you previously started up GDBserver with the same port number. Note that for TCP connections, you must start up GDBserver prior to using the `target remote' command, otherwise you may get an error that looks something like `Connection refused'.

Building GDBserver:

See the `configure.srv` file for the list of host triplets you can build GDBserver for.

Building GDBserver for your host is very straightforward. If you build GDB natively on a host which GDBserver supports, it will be built automatically when you build GDB.
You can also build just GDBserver:

    % mkdir obj
    % cd obj
    % path-to-toplevel-sources/configure --disable-gdb
    % make all-gdbserver

(If you have a combined binutils+gdb tree, you may want to also disable other directories when configuring, e.g., binutils, gas, gold, gprof, and ld.)

If you prefer to cross-compile to your target, then you can also build GDBserver that way. For example:

    % export CC=your-cross-compiler
    % path-to-toplevel-sources/configure --disable-gdb
    % make all-gdbserver

Using GDBreplay:

A special hacked-down version of GDBserver can be used to replay remote debug log files created by GDB. Before using the GDB "target" command to initiate a remote debug session, use "set remotelogfile <filename>" to tell GDB that you want to make a recording of the serial or tcp session. Note that when replaying the session, GDB communicates with GDBreplay via tcp, regardless of whether the original session was via a serial link or tcp.

Once you are done with the remote debug session, start GDBreplay and tell it the name of the log file and the host and port number that GDB should connect to (typically the same as the host running GDB):

    $ gdbreplay logfile host:port

Then start GDB (preferably in a different screen or window) and use the "target" command to connect to GDBreplay:

    (gdb) target remote host:port

Repeat the same sequence of user commands to GDB that you gave in the original debug session. GDB should not be able to tell that it is talking to GDBreplay rather than a real target, all other things being equal. Note that GDBreplay echoes the command lines to stderr, as well as the contents of the packets it sends and receives. The last command echoed by GDBreplay is the next command that needs to be typed to GDB to continue the session in sync with the original session.
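The packets that GDB, GDBserver and GDBreplay exchange over the remote serial protocol mentioned in the introduction are framed as `$packet-data#checksum`, where the checksum is the sum of the packet-data bytes modulo 256, written as two hexadecimal digits. The following is a minimal C++ sketch of that framing. `rsp_checksum` and `rsp_frame` are hypothetical helpers, not functions from GDB or GDBserver, and the sketch assumes the data needs no escaping: it rejects `$`, `#`, `}` and `*`, which the protocol reserves for framing, escaping and run-length encoding.

```cpp
// Sketch of remote serial protocol packet framing: $<data>#<checksum>.
// Hypothetical helpers, not GDB's or GDBserver's own functions.
#include <cassert>
#include <cstdio>
#include <string>

// Checksum: unsigned sum of the data bytes, modulo 256.
unsigned char
rsp_checksum (const std::string &data)
{
  unsigned int sum = 0;
  for (unsigned char c : data)
    sum += c;
  return static_cast<unsigned char> (sum % 256);
}

// Frame DATA as "$DATA#xx" with a two-digit lowercase hex checksum.
// Assumes DATA contains none of the characters the protocol reserves
// for framing, escaping or run-length encoding.
std::string
rsp_frame (const std::string &data)
{
  assert (data.find_first_of ("$#}*") == std::string::npos);

  char hex[3];
  std::snprintf (hex, sizeof hex, "%02x",
                 static_cast<unsigned int> (rsp_checksum (data)));
  return "$" + data + "#" + hex;
}
```

For instance, `rsp_frame ("Hgp0.0")` yields `$Hgp0.0#ad`, and `rsp_frame ("vCont;c:pae4cd.-1")` yields `$vCont;c:pae4cd.-1#9f`, the same bytes that appear in the GDB debug log earlier on this page.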