binutils-gdb/gdb/testsuite/gdb.base/attach.exp
Pedro Alves 2c8c5d375e testsuite: tcl exec& -> 'kill -9 $pid' is racy (attach-many-short-lived-thread.exp races and others)
The buildbots show that attach-many-short-lived-thread.exp is racy.
But after staring at debug logs and playing with SystemTap scripts for
a (long) while, I figured out that neither GDB, nor the kernel nor the
test's program itself are at fault.

The problem is simply that the testsuite machinery is currently
subject to PID-reuse races.  The attach-many-short-lived-threads.c
test program just happens to be much more susceptible to trigger this
race because threads and processes share the same number space on
Linux, and the test spawns many many short lived threads in
succession, thus enlarging the race window a lot.

Part of the problem is that several tests spawn processes with "exec&"
(in order to test the "attach" command) , and then at the end of the
test, to make sure things are cleaned up, issue a 'remote_spawn "kill
-p $testpid"'.  Since with tcl's "exec&", tcl itself is responsible
for reaping the process's exit status, when we go kill the process,
testpid may have already exited _and_ its status may have (and often
has) been reaped already.  Thus it can happen that another process
meanwhile reuses $testpid, and that "kill" command kills the wrong
process...  Frequently, that happens to be
attach-many-short-lived-thread, but this explains other test's races
as well.

In the attach-many-short-lived-threads test, it sometimes manifests
like this:

 (gdb) file /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads
 Reading symbols from /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads...done.
 (gdb)           Loaded /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads into /home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb
 attach 5940
 Attaching to program: /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads, process 5940
 warning: process 5940 is a zombie - the process has already terminated
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ptrace: Operation not permitted.
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
 info threads
 No threads.
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
 set breakpoint always-inserted on
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on

Other times the process dies while the test is ongoing (the process is
ptrace-stopped):

 (gdb) print again = 1
 Cannot access memory at address 0x6020cc
 (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: reset timer in the inferior

(Recall that on Linux, SIGKILL is not interceptable)

And other times it dies just while we're detaching:

 $4 = 319
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 2: print seconds_left
 detach
 Can't detach Thread 0x7fb13b7de700 (LWP 1842): No such process
 (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: detach

GDB mishandles the latter (it should ignore ESRCH while detaching just
like when continuing), but that's another story.

The fix here is to change spawn_wait_for_attach to use Expect's
'spawn' command instead of Tcl's 'exec&' to spawn programs, because
with spawn we control when to wait for/reap the process.  That allows
killing the process by PID without being subject to pid-reuse races,
because even if the process is already dead, the kernel won't reuse
the process's PID until the zombie is reaped.

The other part of the problem lies in DejaGnu itself, unfortunately.
I have occasionally seen tests (attach-many-short-lived-threads
included, but not only that one) die with a random inexplicable
SIGTERM too, and that too is caused by the same reason, except that in
that case, the rogue SIGTERM is sent from this bit in DejaGnu's remote.exp:

    exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid) && sleep 5 && (kill $pgid || kill $pid) && sleep 5 && (kill -9 $pgid || kill -9 $pid) &"
    ...
    catch "wait -i $shell_id"

Even if the program exits promptly, that whole cascade of kills
carries on in the background, thus potentially killing the poor
process that manages to reuse $pid...

I sent a fix for that to the DejaGnu list:
 http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html

With both patches in place, I haven't seen
attach-many-short-lived-threads.exp fail again.

Tested on x86_64 Fedora 20, native, gdbserver and extended-gdbserver.

gdb/testsuite/ChangeLog:
2015-07-31  Pedro Alves  <palves@redhat.com>

	* gdb.base/attach-pie-misread.exp: Rename $res to $test_spawn_id.
	Use spawn_id_get_pid.  Wait for spawn id after eof.  Use
	kill_wait_spawned_process instead of explicit "kill -9".
	* gdb.base/attach-pie-noexec.exp: Adjust to spawn_wait_for_attach
	returning a spawn id instead of a pid.  Use spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.base/attach-twice.exp: Likewise.
	* gdb.base/attach.exp: Likewise.
	(do_command_attach_tests): Use gdb_spawn_with_cmdline_opts and
	gdb_test_multiple.
	* gdb.base/solib-overlap.exp: Adjust to spawn_wait_for_attach
	returning a spawn id instead of a pid.  Use spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.base/valgrind-infcall.exp: Likewise.
	* gdb.multi/multi-attach.exp: Likewise.
	* gdb.python/py-prompt.exp: Likewise.
	* gdb.python/py-sync-interp.exp: Likewise.
	* gdb.server/ext-attach.exp: Likewise.
	* gdb.threads/attach-into-signal.exp (corefunc): Use
	spawn_wait_for_attach, spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.threads/attach-many-short-lived-threads.exp: Adjust to
	spawn_wait_for_attach returning a spawn id instead of a pid.  Use
	spawn_id_get_pid and kill_wait_spawned_process.
	* gdb.threads/attach-stopped.exp (corefunc): Use
	spawn_wait_for_attach, spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.base/break-interp.exp: Rename $res to $test_spawn_id.
	Use spawn_id_get_pid.  Wait for spawn id after eof.  Use
	kill_wait_spawned_process instead of explicit "kill -9".
	* lib/gdb.exp (can_spawn_for_attach): Adjust comment.
	(kill_wait_spawned_process, spawn_id_get_pid): New procedures.
	(spawn_wait_for_attach): Use spawn instead of exec to spawn
	processes.  Don't map cygwin/windows pids here.  Now returns a
	spawn id list.
2015-07-31 20:06:24 +01:00

477 lines
14 KiB
Plaintext

# Copyright 1997-2015 Free Software Foundation, Inc.
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>. */
# On HP-UX 11.0, this test is causing a process running the program
# "attach" to be left around spinning. Until we figure out why, I am
# commenting out the test to avoid polluting tiamat (our 11.0 nightly
# test machine) with these processes. RT
#
# Setting the magic bit in the target app should work. I added a
# "kill", and also a test for the R3 register warning. JB
if { [istarget "hppa*-*-hpux*"] } {
return 0
}
if {![can_spawn_for_attach]} {
return 0
}
standard_testfile attach.c attach2.c
set binfile2 ${binfile}2
set escapedbinfile [string_to_regexp $binfile]
#execute_anywhere "rm -f ${binfile} ${binfile2}"
remote_exec build "rm -f ${binfile} ${binfile2}"
# For debugging this test
#
#log_user 1
# build the first test case
#
if { [gdb_compile "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } {
untested attach.exp
return -1
}
# Build the in-system-call test
if { [gdb_compile "${srcdir}/${subdir}/${srcfile2}" "${binfile2}" executable {debug}] != "" } {
untested attach.exp
return -1
}
if [get_compiler_info] {
return -1
}
proc do_attach_tests {} {
global gdb_prompt
global binfile
global escapedbinfile
global srcfile
global testfile
global subdir
global timeout
# Figure out a regular expression that will match the sysroot,
# noting that the default sysroot is "target:", and also noting
# that GDB will strip "target:" from the start of filenames when
# operating on the local filesystem
set sysroot ""
set test "show sysroot"
gdb_test_multiple $test $test {
-re "The current system root is \"(.*)\"\..*${gdb_prompt} $" {
set sysroot $expect_out(1,string)
}
}
regsub "^target:" "$sysroot" "(target:)?" sysroot
# Start the program running and then wait for a bit, to be sure
# that it can be attached to.
set test_spawn_id [spawn_wait_for_attach $binfile]
set testpid [spawn_id_get_pid $test_spawn_id]
# Verify that we cannot attach to nonsense.
set test "attach to nonsense is prohibited"
gdb_test_multiple "attach abc" "$test" {
-re "Illegal process-id: abc\\.\r\n$gdb_prompt $" {
pass "$test"
}
-re "Attaching to.*, process .*couldn't open /proc file.*$gdb_prompt $" {
# Response expected from /proc-based systems.
pass "$test"
}
-re "Can't attach to process..*$gdb_prompt $" {
# Response expected on Cygwin
pass "$test"
}
-re "Attaching to.*$gdb_prompt $" {
fail "$test (bogus pid allowed)"
}
}
# Verify that we cannot attach to nonsense even if its initial part is
# a valid PID.
set test "attach to digits-starting nonsense is prohibited"
gdb_test_multiple "attach ${testpid}x" "$test" {
-re "Illegal process-id: ${testpid}x\\.\r\n$gdb_prompt $" {
pass "$test"
}
-re "Attaching to.*, process .*couldn't open /proc file.*$gdb_prompt $" {
# Response expected from /proc-based systems.
pass "$test"
}
-re "Can't attach to process..*$gdb_prompt $" {
# Response expected on Cygwin
pass "$test"
}
-re "Attaching to.*$gdb_prompt $" {
fail "$test (bogus pid allowed)"
}
}
# Verify that we cannot attach to what appears to be a valid
# process ID, but is a process that doesn't exist. Traditionally,
# most systems didn't have a process with ID 0, so we take that as
# the default. However, there are a few exceptions.
set boguspid 0
if { [istarget "*-*-*bsd*"] } {
# In FreeBSD 5.0, PID 0 is used for "swapper". Use -1 instead
# (which should have the desired effect on any version of
# FreeBSD, and probably other *BSD's too).
set boguspid -1
}
set test "attach to nonexistent process is prohibited"
gdb_test_multiple "attach $boguspid" "$test" {
-re "Attaching to.*, process $boguspid.*No such process.*$gdb_prompt $" {
# Response expected on ptrace-based systems (i.e. HP-UX 10.20).
pass "$test"
}
-re "Attaching to.*, process $boguspid failed.*Hint.*$gdb_prompt $" {
# Response expected on ttrace-based systems (i.e. HP-UX 11.0).
pass "$test"
}
-re "Attaching to.*, process $boguspid.*denied.*$gdb_prompt $" {
pass "$test"
}
-re "Attaching to.*, process $boguspid.*not permitted.*$gdb_prompt $" {
pass "$test"
}
-re "Attaching to.*, process .*couldn't open /proc file.*$gdb_prompt $" {
# Response expected from /proc-based systems.
pass "$test"
}
-re "Can't attach to process..*$gdb_prompt $" {
# Response expected on Cygwin
pass "$test"
}
-re "Attaching to.*, process $boguspid.*failed.*$gdb_prompt $" {
# Response expected on the extended-remote target.
pass "$test"
}
}
# Verify that we can attach to the process by first giving its
# executable name via the file command, and using attach with the
# process ID.
# (Actually, the test system appears to do this automatically for
# us. So, we must also be prepared to be asked if we want to
# discard an existing set of symbols.)
set test "set file, before attach1"
gdb_test_multiple "file $binfile" "$test" {
-re "Load new symbol table from.*y or n. $" {
gdb_test "y" "Reading symbols from $escapedbinfile\.\.\.*done." \
"$test (re-read)"
}
-re "Reading symbols from $escapedbinfile\.\.\.*done.*$gdb_prompt $" {
pass "$test"
}
}
set test "attach1, after setting file"
gdb_test_multiple "attach $testpid" "$test" {
-re "Attaching to program.*`?$escapedbinfile'?, process $testpid.*main.*at .*$srcfile:.*$gdb_prompt $" {
pass "$test"
}
-re "Attaching to program.*`?$escapedbinfile\.exe'?, process $testpid.*\[Switching to thread $testpid\..*\].*$gdb_prompt $" {
# Response expected on Cygwin
pass "$test"
}
}
# Verify that we can "see" the variable "should_exit" in the
# program, and that it is zero.
gdb_test "print should_exit" " = 0" "after attach1, print should_exit"
# Detach the process.
gdb_test "detach" \
"Detaching from program: .*$escapedbinfile, process $testpid" \
"attach1 detach"
# Wait a bit for gdb to finish detaching
exec sleep 5
# Purge the symbols from gdb's brain. (We want to be certain the
# next attach, which won't be preceded by a "file" command, is
# really getting the executable file without our help.)
set old_timeout $timeout
set timeout 15
set test "attach1, purging symbols after detach"
gdb_test_multiple "file" "$test" {
-re "No executable file now.*Discard symbol table.*y or n. $" {
gdb_test "y" "No symbol file now." "$test"
}
}
set timeout $old_timeout
# Verify that we can attach to the process just by giving the
# process ID.
set test "attach2, with no file"
set found_exec_file 0
gdb_test_multiple "attach $testpid" "$test" {
-re "Attaching to process $testpid.*Load new symbol table from \"$sysroot$escapedbinfile\.exe\".*y or n. $" {
# On Cygwin, the DLL's symbol tables are loaded prior to the
# executable's symbol table. This in turn always results in
# asking the user for actually loading the symbol table of the
# executable.
gdb_test "y" "Reading symbols from $sysroot$escapedbinfile\.\.\.*done." \
"$test (reset file)"
set found_exec_file 1
}
-re "Attaching to process $testpid.*Reading symbols from $sysroot$escapedbinfile.*main.*at .*$gdb_prompt $" {
pass "$test"
set found_exec_file 1
}
}
if {$found_exec_file == 0} {
set test "load file manually, after attach2"
gdb_test_multiple "file $binfile" "$test" {
-re "A program is being debugged already..*Are you sure you want to change the file.*y or n. $" {
gdb_test "y" "Reading symbols from $escapedbinfile\.\.\.*done." \
"$test (re-read)"
}
-re "Reading symbols from $escapedbinfile\.\.\.*done.*$gdb_prompt $" {
pass "$test"
}
}
}
# Verify that we can modify the variable "should_exit" in the
# program.
gdb_test_no_output "set should_exit=1" "after attach2, set should_exit"
# Verify that the modification really happened.
gdb_breakpoint [gdb_get_line_number "postloop"] temporary
gdb_continue_to_breakpoint "postloop" ".* postloop .*"
# Allow the test process to exit, to cleanup after ourselves.
gdb_continue_to_end "after attach2, exit"
# Make sure we don't leave a process around to confuse
# the next test run (and prevent the compile by keeping
# the text file busy), in case the "set should_exit" didn't
# work.
kill_wait_spawned_process $test_spawn_id
set test_spawn_id [spawn_wait_for_attach $binfile]
set testpid [spawn_id_get_pid $test_spawn_id]
# Verify that we can attach to the process, and find its a.out
# when we're cd'd to some directory that doesn't contain the
# a.out. (We use the source path set by the "dir" command.)
gdb_test "dir [standard_output_file {}]" "Source directories searched: .*" \
"set source path"
gdb_test "cd /tmp" "Working directory /tmp." \
"cd away from process working directory"
# Explicitly flush out any knowledge of the previous attachment.
set test "before attach3, flush symbols"
gdb_test_multiple "symbol-file" "$test" {
-re "Discard symbol table from.*y or n. $" {
gdb_test "y" "No symbol file now." \
"$test"
}
-re "No symbol file now.*$gdb_prompt $" {
pass "$test"
}
}
gdb_test "exec" "No executable file now." \
"before attach3, flush exec"
gdb_test "attach $testpid" \
"Attaching to process $testpid.*Reading symbols from $sysroot$escapedbinfile.*main.*at .*" \
"attach when process' a.out not in cwd"
set test "after attach3, exit"
gdb_test "kill" \
"" \
"$test" \
"Kill the program being debugged.*y or n. $" \
"y"
# Another "don't leave a process around"
kill_wait_spawned_process $test_spawn_id
}
proc do_call_attach_tests {} {
global gdb_prompt
global binfile2
set test_spawn_id [spawn_wait_for_attach $binfile2]
set testpid [spawn_id_get_pid $test_spawn_id]
# Attach
gdb_test "file $binfile2" ".*" "force switch to gdb64, if necessary"
set test "attach call"
gdb_test_multiple "attach $testpid" "$test" {
-re "warning: reading register.*I.*O error.*$gdb_prompt $" {
fail "$test (read register error)"
}
-re "Attaching to.*process $testpid.*libc.*$gdb_prompt $" {
pass "$test"
}
-re "Attaching to.*process $testpid.*\[Switching to thread $testpid\..*\].*$gdb_prompt $" {
pass "$test"
}
}
# See if other registers are problems
set test "info other register"
gdb_test_multiple "i r r3" "$test" {
-re "warning: reading register.*$gdb_prompt $" {
fail "$test"
}
-re "r3.*$gdb_prompt $" {
pass "$test"
}
}
# Get rid of the process
gdb_test "p should_exit = 1"
gdb_continue_to_end
# Be paranoid
kill_wait_spawned_process $test_spawn_id
}
proc do_command_attach_tests {} {
global gdb_prompt
global binfile
global verbose
global GDB
global INTERNAL_GDBFLAGS
global GDBFLAGS
if ![isnative] then {
unsupported "command attach test"
return 0
}
set test_spawn_id [spawn_wait_for_attach $binfile]
set testpid [spawn_id_get_pid $test_spawn_id]
gdb_exit
set res [gdb_spawn_with_cmdline_opts "--pid=$testpid"]
set test "starting with --pid"
gdb_test_multiple "" $test {
-re "Reading symbols from.*$gdb_prompt $" {
pass "$test"
}
}
# Get rid of the process
kill_wait_spawned_process $test_spawn_id
}
# Test ' gdb --pid PID -ex "run" '. GDB used to have a bug where
# "run" would run before the attach finished - PR17347.
proc test_command_line_attach_run {} {
global gdb_prompt
global binfile
if ![isnative] then {
unsupported "commandline attach run test"
return 0
}
with_test_prefix "cmdline attach run" {
set test_spawn_id [spawn_wait_for_attach $binfile]
set testpid [spawn_id_get_pid $test_spawn_id]
set test "run to prompt"
gdb_exit
set res [gdb_spawn_with_cmdline_opts \
"-iex \"set height 0\" -iex \"set width 0\" --pid=$testpid -ex \"start\""]
if { $res != 0} {
fail $test
kill_wait_spawned_process $test_spawn_id
return $res
}
gdb_test_multiple "" $test {
-re {Attaching to.*Start it from the beginning\? \(y or n\) } {
pass $test
}
}
send_gdb "y\n"
set test "run to main"
gdb_test_multiple "" $test {
-re "Temporary breakpoint .* main .*$gdb_prompt $" {
pass $test
}
}
# Get rid of the process
kill_wait_spawned_process $test_spawn_id
}
}
# Start with a fresh gdb
gdb_exit
gdb_start
gdb_reinitialize_dir $srcdir/$subdir
gdb_load ${binfile}
# This is a test of gdb's ability to attach to a running process.
do_attach_tests
# Test attaching when the target is inside a system call
gdb_exit
gdb_start
gdb_reinitialize_dir $srcdir/$subdir
do_call_attach_tests
# Test "gdb --pid"
do_command_attach_tests
test_command_line_attach_run
return 0