Mirror of https://sourceware.org/git/binutils-gdb.git (synced 2024-12-09 04:21:49 +08:00)
2c8c5d375e
The buildbots show that attach-many-short-lived-threads.exp is racy. But after staring at debug logs and playing with SystemTap scripts for a (long) while, I figured out that neither GDB, nor the kernel, nor the test program itself is at fault. The problem is simply that the testsuite machinery is currently subject to PID-reuse races. The attach-many-short-lived-threads.c test program just happens to be much more likely to trigger this race, because threads and processes share the same PID number space on Linux, and the test spawns many short-lived threads in succession, which enlarges the race window a lot.

Part of the problem is that several tests spawn processes with "exec&" (in order to test the "attach" command), and then, at the end of the test, issue a 'remote_spawn "kill -p $testpid"' to make sure things are cleaned up. Since with Tcl's "exec&" Tcl itself is responsible for reaping the process's exit status, by the time we go kill the process, $testpid may have already exited _and_ its status may have (and often has) already been reaped. Another process can then reuse $testpid in the meantime, and the "kill" command kills the wrong process... Frequently that happens to be attach-many-short-lived-threads, but this explains other tests' races as well.

In the attach-many-short-lived-threads test, it sometimes manifests like this:

 (gdb) file /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads
 Reading symbols from /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads...done.
 (gdb) Loaded /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads into /home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb
 attach 5940
 Attaching to program: /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads, process 5940
 warning: process 5940 is a zombie - the process has already terminated
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ptrace: Operation not permitted.
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
 info threads
 No threads.
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
 set breakpoint always-inserted on
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on

Other times the process dies while the test is ongoing (while the process is ptrace-stopped):

 (gdb) print again = 1
 Cannot access memory at address 0x6020cc
 (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: reset timer in the inferior

(Recall that on Linux, SIGKILL cannot be intercepted.)

And other times it dies just while we're detaching:

 $4 = 319
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 2: print seconds_left
 detach
 Can't detach Thread 0x7fb13b7de700 (LWP 1842): No such process
 (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: detach

GDB mishandles the latter case (it should ignore ESRCH while detaching, just as it does when continuing), but that's another story.

The fix here is to change spawn_wait_for_attach to spawn programs with Expect's 'spawn' command instead of Tcl's 'exec&', because with spawn we control when to wait for (reap) the process. That allows killing the process by PID without being subject to PID-reuse races: even if the process is already dead, the kernel won't reuse its PID until the zombie is reaped.

The other part of the problem lies in DejaGnu itself, unfortunately.
I have occasionally seen tests (attach-many-short-lived-threads included, but not only that one) die with a random, inexplicable SIGTERM too, and that has the same cause, except that in that case the rogue SIGTERM is sent from this bit in DejaGnu's remote.exp:

 exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid) && sleep 5 && (kill $pgid || kill $pid) && sleep 5 && (kill -9 $pgid || kill -9 $pid) &"
 ...
 catch "wait -i $shell_id"

Even if the program exits promptly, that whole cascade of kills carries on in the background, potentially killing the poor process that manages to reuse $pid...

I sent a fix for that to the DejaGnu list:
 http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html

With both patches in place, I haven't seen attach-many-short-lived-threads.exp fail again.

Tested on x86_64 Fedora 20, native, gdbserver and extended-gdbserver.

gdb/testsuite/ChangeLog:
2015-07-31  Pedro Alves  <palves@redhat.com>

	* gdb.base/attach-pie-misread.exp: Rename $res to $test_spawn_id. Use spawn_id_get_pid. Wait for spawn id after eof. Use kill_wait_spawned_process instead of explicit "kill -9".
	* gdb.base/attach-pie-noexec.exp: Adjust to spawn_wait_for_attach returning a spawn id instead of a pid. Use spawn_id_get_pid and kill_wait_spawned_process.
	* gdb.base/attach-twice.exp: Likewise.
	* gdb.base/attach.exp: Likewise.
	(do_command_attach_tests): Use gdb_spawn_with_cmdline_opts and gdb_test_multiple.
	* gdb.base/solib-overlap.exp: Adjust to spawn_wait_for_attach returning a spawn id instead of a pid. Use spawn_id_get_pid and kill_wait_spawned_process.
	* gdb.base/valgrind-infcall.exp: Likewise.
	* gdb.multi/multi-attach.exp: Likewise.
	* gdb.python/py-prompt.exp: Likewise.
	* gdb.python/py-sync-interp.exp: Likewise.
	* gdb.server/ext-attach.exp: Likewise.
	* gdb.threads/attach-into-signal.exp (corefunc): Use spawn_wait_for_attach, spawn_id_get_pid and kill_wait_spawned_process.
	* gdb.threads/attach-many-short-lived-threads.exp: Adjust to spawn_wait_for_attach returning a spawn id instead of a pid. Use spawn_id_get_pid and kill_wait_spawned_process.
	* gdb.threads/attach-stopped.exp (corefunc): Use spawn_wait_for_attach, spawn_id_get_pid and kill_wait_spawned_process.
	* gdb.base/break-interp.exp: Rename $res to $test_spawn_id. Use spawn_id_get_pid. Wait for spawn id after eof. Use kill_wait_spawned_process instead of explicit "kill -9".
	* lib/gdb.exp (can_spawn_for_attach): Adjust comment.
	(kill_wait_spawned_process, spawn_id_get_pid): New procedures.
	(spawn_wait_for_attach): Use spawn instead of exec to spawn processes. Don't map cygwin/windows pids here. Now returns a spawn id list.
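The fix rests on one kernel property: a child that has exited but has not yet been reaped keeps its PID, so signalling that PID can only reach the child (or its zombie), never an unrelated process. The sketch below demonstrates that property in plain Tcl, assuming Linux, tclsh 8.5 or later, and /bin/sh. So that it runs without Expect, it swaps in a Tcl command pipeline (`open |...`) for Expect's `spawn`. Tcl does not wait for a pipeline's children until the channel is closed, which plays the role that `wait -i $spawn_id` plays for a spawned process. The helper name `pid_exists` is hypothetical.

```tcl
# Sketch (assumes Linux, tclsh 8.5+, and /bin/sh): a child that has
# exited but has not been reaped still owns its PID, so signalling that
# PID cannot reach an unrelated process that reused the number.

# Hypothetical helper: return 1 if a process with this PID exists,
# zombies included.  "kill -0" sends no signal; it only checks the PID.
proc pid_exists {pid} {
    return [expr {[catch {exec sh -c "kill -0 $pid"}] == 0}]
}

# Start a short-lived child through a command pipeline.  Unlike
# "exec ... &", whose children Tcl reaps on its own, a pipeline's
# children are only waited for when the channel is closed.
set chan [open "|true" r]
set child_pid [lindex [pid $chan] 0]

# Reading to EOF means the child has closed its stdout, i.e. it has
# exited (or is exiting).  Nobody has reaped it yet.
read $chan

# The PID is still reserved: kill -0 succeeds, and a real kill here
# could only hit our own zombie.
set alive_before_reap [pid_exists $child_pid]

# Closing the channel makes Tcl wait for (reap) the child.  Only from
# now on may the kernel hand the PID to some other process.
close $chan
set alive_after_reap [pid_exists $child_pid]

puts "before reap: $alive_before_reap, after reap: $alive_after_reap"
```

With Expect proper, the analogous sequence would be `spawn`, `exp_pid -i $spawn_id`, a kill by PID, and only then `wait -i $spawn_id`. The ordering is what matters: kill first, reap last.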
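The DejaGnu problem is the same race in a different place. Its kill cascade keeps running in a background shell after the child may already have been reaped. Purely as an illustration, and not the patch that was sent to the DejaGnu list, a cascade can be made safe by running it in the foreground and reaping the child only after the last signal. The proc name `kill_and_reap`, the INT/TERM/KILL order, and the 2-second grace period are hypothetical choices. A Tcl command pipeline again stands in for an Expect spawn id, so that the sketch runs under plain tclsh on a Unix-like host. The pipeline's child is not reaped until its channel is closed.

```tcl
# Hypothetical helper (assumes a Unix-like host, tclsh 8.5+, /bin/sh).
# Escalate INT -> TERM -> KILL against a pipeline child, waiting up to
# grace_ms after each signal, and reap the child only at the very end.
# Because the child stays unreaped until "close", every kill targets
# either the live child or its zombie, never a process that reused
# the PID.
proc kill_and_reap {chan {grace_ms 2000}} {
    set child_pid [lindex [pid $chan] 0]
    fconfigure $chan -blocking 0

    foreach sig {INT TERM KILL} {
        # Signalling our own zombie is harmless, so ignore errors.
        catch {exec sh -c "kill -s $sig $child_pid"}

        # Poll for EOF on the child's stdout, taken here as the sign
        # that it has exited.  Stop escalating as soon as it has.
        set deadline [expr {[clock milliseconds] + $grace_ms}]
        while {![eof $chan] && [clock milliseconds] < $deadline} {
            read $chan
            if {![eof $chan]} {
                after 50
            }
        }
        if {[eof $chan]} {
            break
        }
    }

    # Switch back to blocking mode so that close waits for the child
    # (a non-blocking close would not wait for it), then reap it.
    # close reports a signal-killed child as an error; ignore that.
    fconfigure $chan -blocking 1
    catch {close $chan}
    return $child_pid
}

set chan [open "|sleep 30" r]
set started [clock milliseconds]
set victim [kill_and_reap $chan]
puts "stopped $victim after [expr {[clock milliseconds] - $started}] ms"
```

Unlike the background `sh -c "... &"` cascade, this loop stops as soon as the child is gone. It also never releases the PID before it has finished signalling.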
129 lines
5.1 KiB
Plaintext
# Copyright 2009-2015 Free Software Foundation, Inc.
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# Contributed by Jan Kratochvil <jan.kratochvil@redhat.com>.

# Test that GDB can cope with two libraries loaded with overlapping VMA ranges.
# Prelink the libraries first so that they can be loaded at their native
# address.  In that case `struct link_map'.l_addr will be zero.  Then provide
# different, unprelinked library files on disk, which have zero-based VMAs.
# These different files should have their .dynamic section at a different
# offset within the page, so that for the warning
#   warning: .dynamic section for "..." is not at the expected address
# GDB gives the reason
#   (wrong library or version mismatch?)
# and not:
#   difference appears to be caused by prelink, adjusting expectations
# In such a case, both disk libraries will be loaded at VMAs starting at zero.

if {[skip_shlib_tests]} {
    return 0
}

if {![can_spawn_for_attach]} {
    return 0
}

if {[get_compiler_info]} {
    return -1
}

# Library file.
set libname "solib-overlap-lib"
set srcfile_lib ${srcdir}/${subdir}/${libname}.c
# Binary file.
set testfile "solib-overlap-main"
set srcfile ${srcdir}/${subdir}/${testfile}.c

# Base addresses for `prelink -r' which should be compatible with both -m32 and
# -m64 targets.  If they clash with system prelinked libraries, the test would
# give a false PASS.
# In the first pass, prelink lib1 at 0x40000000 and lib2 at 0x41000000.
# In the second pass, try lib1 at 0x50000000 and lib2 at 0x51000000.
foreach prelink_lib1 {0x40000000 0x50000000} { with_test_prefix "$prelink_lib1" {
    set prelink_lib2 [format "0x%x" [expr {$prelink_lib1 + 0x01000000}]]

    # Library file.
    set binfile_lib1 [standard_output_file ${libname}1-${prelink_lib1}.so]
    set binfile_lib1_test_msg OBJDIR/${subdir}/${libname}1-${prelink_lib1}.so
    set binfile_lib2 [standard_output_file ${libname}2-${prelink_lib1}.so]
    set binfile_lib2_test_msg OBJDIR/${subdir}/${libname}2-${prelink_lib1}.so
    set lib_flags {debug}
    # Binary file.
    set binfile_base ${testfile}-${prelink_lib1}
    set binfile [standard_output_file ${binfile_base}]
    set binfile_test_msg OBJDIR/${subdir}/${binfile_base}
    set bin_flags [list debug shlib=${binfile_lib1} shlib=${binfile_lib2}]
    set escapedbinfile [string_to_regexp ${binfile}]

    if { [gdb_compile_shlib ${srcfile_lib} ${binfile_lib1} $lib_flags] != ""
         || [gdb_compile_shlib ${srcfile_lib} ${binfile_lib2} $lib_flags] != ""
         || [gdb_compile ${srcfile} ${binfile} executable $bin_flags] != "" } {
        untested "Could not compile ${binfile_lib1_test_msg}, ${binfile_lib2_test_msg} or ${binfile_test_msg}."
        return -1
    }

    if {[catch {system "prelink -N -r ${prelink_lib1} ${binfile_lib1}"}] != 0
        || [catch {system "prelink -N -r ${prelink_lib2} ${binfile_lib2}"}] != 0} {
        # Maybe we don't have prelink.
        untested "Could not prelink ${binfile_lib1_test_msg} or ${binfile_lib2_test_msg}."
        return -1
    }

    set test_spawn_id [spawn_wait_for_attach $binfile]
    set testpid [spawn_id_get_pid $test_spawn_id]

    remote_exec build "mv -f ${binfile_lib1} ${binfile_lib1}-running"
    remote_exec build "mv -f ${binfile_lib2} ${binfile_lib2}-running"

    # Provide another exported function name to cause different sizes of sections.
    lappend lib_flags additional_flags=-DSYMB

    if { [gdb_compile_shlib ${srcfile_lib} ${binfile_lib1} $lib_flags] != ""
         || [gdb_compile_shlib ${srcfile_lib} ${binfile_lib2} $lib_flags] != ""} {
        untested "Could not recompile ${binfile_lib1_test_msg} or ${binfile_lib2_test_msg}."
        kill_wait_spawned_process $test_spawn_id
        return -1
    }

    clean_restart ${binfile_base}
    # This testcase currently does not support remote targets.
    # gdb_load_shlibs ${binfile_lib1} ${binfile_lib2}

    # Here we should get:
    # warning: .dynamic section for ".../solib-overlap-lib1.so" is not at the expected address (wrong library or version mismatch?)
    # warning: .dynamic section for ".../solib-overlap-lib2.so" is not at the expected address (wrong library or version mismatch?)

    set test attach
    gdb_test_multiple "attach $testpid" $test {
        -re "Attaching to program.*`?$escapedbinfile'?, process $testpid.*$gdb_prompt $" {
            pass $test
        }
        -re "Attaching to program.*`?$escapedbinfile\.exe'?, process $testpid.*\[Switching to thread $testpid\..*\].*$gdb_prompt $" {
            # Response expected on Cygwin.
            pass $test
        }
    }

    # Detach the process.

    gdb_test "detach" "Detaching from program: .*$escapedbinfile, process $testpid"

    # Wait a bit for gdb to finish detaching.

    sleep 5

    kill_wait_spawned_process $test_spawn_id
}}