Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
2015-04-01 20:38:06 +08:00
|
|
|
/* This testcase is part of GDB, the GNU debugger.
|
|
|
|
|
2021-01-01 16:03:39 +08:00
|
|
|
Copyright 2015-2021 Free Software Foundation, Inc.
|
Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
2015-04-01 20:38:06 +08:00
|
|
|
|
|
|
|
This program is free software; you can redistribute it and/or modify
|
|
|
|
it under the terms of the GNU General Public License as published by
|
|
|
|
the Free Software Foundation; either version 3 of the License, or
|
|
|
|
(at your option) any later version.
|
|
|
|
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
GNU General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
|
|
along with this program. If not, see <http://www.gnu.org/licenses/>. */
|
|
|
|
|
|
|
|
#define _GNU_SOURCE
|
|
|
|
#include <assert.h>
|
|
|
|
#include <pthread.h>
|
|
|
|
#include <stdlib.h>
|
|
|
|
#include <unistd.h>
|
|
|
|
#include <stdio.h>
|
Fix tid-reuse sometimes blocks for a very long (infinite?) time.
A failure that seems to cause a long/infinite time is the following:
For a not clear reason, tid-reuse.c spawner thread sometimes gets an error:
tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed.
which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the
after_count breakpoint:
Thread 2 "tid-reuse" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7518700 (LWP 10368)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count
After that, tid-reuse.exp gets the value of reuse_time, but this one kept its
initial value of -1 (as unsigned) :
print reuse_time
$1 = 4294967295
(gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time
tid-reuse then dies, and the .exp script continues (with some FAIL)
till it executes:
set timeout [expr $reuse_time * 2]
leading to the error:
(gdb) ERROR: integer value too large to represent as non-long integer
while executing
"expect {
-i exp8 -timeout 8589934590
-re ".*A problem internal to GDB has been detected" {
fail "$message (GDB internal error)"
gdb_intern..."
("uplevel" body line 1)
invoked from within
"uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer
ERROR: GDB process no longer exists
and then everything blocks.
This last 'GDB process no longer exists' is strange, as I still see the gdb
when this all blocks, e.g.
philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [
philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre
philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip
This patch gives a default value of 60, so that if ever something wrong happens
in tid-reuse, then the value retrieved by the .exp script stays in a reasonable
range.
Simon verified the patch by:
"I replaced the pthread_create call with the value 1 to simulate a
failure, and the test succeeds to fail quickly with your patch applied.
Without your patch, I get the infinite hang that you describe."
Compared to V1:
As suggested by Pedro, this version checks the pthread calls return
code (in particular of pthread_create) and reports the failure reason,
instead of just aborting.
gdb/testsuite/ChangeLog
2018-12-09 Philippe Waroquiers <philippe.waroquiers@skynet.be>
* gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60.
(reuse_time): Initialize to REUSE_TIME_CAP.
(check_rc): New function.
(main): Use REUSE_TIME_CAP instead of hardcoded 60.
Check pthread_create rc.
(spawner_thread_func): Check pthread_create and pthread_join rc.
2018-11-05 03:54:05 +08:00
|
|
|
#include <string.h>
|
Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
2015-04-01 20:38:06 +08:00
|
|
|
#include <limits.h>
|
|
|
|
|
|
|
|
/* How many threads fit in the target's thread number space. */
|
|
|
|
long tid_max = -1;
|
|
|
|
|
|
|
|
/* Number of threads spawned. */
|
|
|
|
unsigned long thread_counter;
|
|
|
|
|
|
|
|
/* How long it takes to spawn as many threads as fits in the thread
|
|
|
|
number space. On systems where thread IDs are just monotonically
|
|
|
|
incremented, this is enough for the tid numbers to wrap around. On
|
|
|
|
targets that randomize thread IDs, this is enough time to give each
|
|
|
|
number in the thread number space some chance of reuse. It'll be
|
Fix tid-reuse sometimes blocks for a very long (infinite?) time.
A failure that seems to cause a long/infinite time is the following:
For a not clear reason, tid-reuse.c spawner thread sometimes gets an error:
tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed.
which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the
after_count breakpoint:
Thread 2 "tid-reuse" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7518700 (LWP 10368)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count
After that, tid-reuse.exp gets the value of reuse_time, but this one kept its
initial value of -1 (as unsigned) :
print reuse_time
$1 = 4294967295
(gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time
tid-reuse then dies, and the .exp script continues (with some FAIL)
till it executes:
set timeout [expr $reuse_time * 2]
leading to the error:
(gdb) ERROR: integer value too large to represent as non-long integer
while executing
"expect {
-i exp8 -timeout 8589934590
-re ".*A problem internal to GDB has been detected" {
fail "$message (GDB internal error)"
gdb_intern..."
("uplevel" body line 1)
invoked from within
"uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer
ERROR: GDB process no longer exists
and then everything blocks.
This last 'GDB process no longer exists' is strange, as I still see the gdb
when this all blocks, e.g.
philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [
philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre
philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip
This patch gives a default value of 60, so that if ever something wrong happens
in tid-reuse, then the value retrieved by the .exp script stays in a reasonable
range.
Simon verified the patch by:
"I replaced the pthread_create call with the value 1 to simulate a
failure, and the test succeeds to fail quickly with your patch applied.
Without your patch, I get the infinite hang that you describe."
Compared to V1:
As suggested by Pedro, this version checks the pthread calls return
code (in particular of pthread_create) and reports the failure reason,
instead of just aborting.
gdb/testsuite/ChangeLog
2018-12-09 Philippe Waroquiers <philippe.waroquiers@skynet.be>
* gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60.
(reuse_time): Initialize to REUSE_TIME_CAP.
(check_rc): New function.
(main): Use REUSE_TIME_CAP instead of hardcoded 60.
Check pthread_create rc.
(spawner_thread_func): Check pthread_create and pthread_join rc.
2018-11-05 03:54:05 +08:00
|
|
|
capped to a lower value if we can't compute it. REUSE_TIME_CAP
|
|
|
|
is the max value, and the default value if ever the program
|
|
|
|
has problem to compute it. */
|
|
|
|
#define REUSE_TIME_CAP 60
|
|
|
|
unsigned int reuse_time = REUSE_TIME_CAP;
|
Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
2015-04-01 20:38:06 +08:00
|
|
|
|
|
|
|
void *
|
|
|
|
do_nothing_thread_func (void *arg)
|
|
|
|
{
|
|
|
|
usleep (1);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
Fix tid-reuse sometimes blocks for a very long (infinite?) time.
A failure that seems to cause a long/infinite time is the following:
For a not clear reason, tid-reuse.c spawner thread sometimes gets an error:
tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed.
which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the
after_count breakpoint:
Thread 2 "tid-reuse" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7518700 (LWP 10368)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count
After that, tid-reuse.exp gets the value of reuse_time, but this one kept its
initial value of -1 (as unsigned) :
print reuse_time
$1 = 4294967295
(gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time
tid-reuse then dies, and the .exp script continues (with some FAIL)
till it executes:
set timeout [expr $reuse_time * 2]
leading to the error:
(gdb) ERROR: integer value too large to represent as non-long integer
while executing
"expect {
-i exp8 -timeout 8589934590
-re ".*A problem internal to GDB has been detected" {
fail "$message (GDB internal error)"
gdb_intern..."
("uplevel" body line 1)
invoked from within
"uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer
ERROR: GDB process no longer exists
and then everything blocks.
This last 'GDB process no longer exists' is strange, as I still see the gdb
when this all blocks, e.g.
philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [
philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre
philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip
This patch gives a default value of 60, so that if ever something wrong happens
in tid-reuse, then the value retrieved by the .exp script stays in a reasonable
range.
Simon verified the patch by:
"I replaced the pthread_create call with the value 1 to simulate a
failure, and the test succeeds to fail quickly with your patch applied.
Without your patch, I get the infinite hang that you describe."
Compared to V1:
As suggested by Pedro, this version checks the pthread calls return
code (in particular of pthread_create) and reports the failure reason,
instead of just aborting.
gdb/testsuite/ChangeLog
2018-12-09 Philippe Waroquiers <philippe.waroquiers@skynet.be>
* gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60.
(reuse_time): Initialize to REUSE_TIME_CAP.
(check_rc): New function.
(main): Use REUSE_TIME_CAP instead of hardcoded 60.
Check pthread_create rc.
(spawner_thread_func): Check pthread_create and pthread_join rc.
2018-11-05 03:54:05 +08:00
|
|
|
static void
|
|
|
|
check_rc (int rc, const char *what)
|
|
|
|
{
|
|
|
|
if (rc != 0)
|
|
|
|
{
|
|
|
|
fprintf (stderr, "unexpected error from %s: %s (%d)\n",
|
|
|
|
what, strerror (rc), rc);
|
|
|
|
assert (0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
2015-04-01 20:38:06 +08:00
|
|
|
void *
|
|
|
|
spawner_thread_func (void *arg)
|
|
|
|
{
|
|
|
|
while (1)
|
|
|
|
{
|
|
|
|
pthread_t child;
|
|
|
|
int rc;
|
|
|
|
|
|
|
|
thread_counter++;
|
|
|
|
|
|
|
|
rc = pthread_create (&child, NULL, do_nothing_thread_func, NULL);
|
Fix tid-reuse sometimes blocks for a very long (infinite?) time.
A failure that seems to cause a long/infinite time is the following:
For a not clear reason, tid-reuse.c spawner thread sometimes gets an error:
tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed.
which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the
after_count breakpoint:
Thread 2 "tid-reuse" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7518700 (LWP 10368)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count
After that, tid-reuse.exp gets the value of reuse_time, but this one kept its
initial value of -1 (as unsigned) :
print reuse_time
$1 = 4294967295
(gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time
tid-reuse then dies, and the .exp script continues (with some FAIL)
till it executes:
set timeout [expr $reuse_time * 2]
leading to the error:
(gdb) ERROR: integer value too large to represent as non-long integer
while executing
"expect {
-i exp8 -timeout 8589934590
-re ".*A problem internal to GDB has been detected" {
fail "$message (GDB internal error)"
gdb_intern..."
("uplevel" body line 1)
invoked from within
"uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer
ERROR: GDB process no longer exists
and then everything blocks.
This last 'GDB process no longer exists' is strange, as I still see the gdb
when this all blocks, e.g.
philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [
philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre
philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip
This patch gives a default value of 60, so that if ever something wrong happens
in tid-reuse, then the value retrieved by the .exp script stays in a reasonable
range.
Simon verified the patch by:
"I replaced the pthread_create call with the value 1 to simulate a
failure, and the test succeeds to fail quickly with your patch applied.
Without your patch, I get the infinite hang that you describe."
Compared to V1:
As suggested by Pedro, this version checks the pthread calls return
code (in particular of pthread_create) and reports the failure reason,
instead of just aborting.
gdb/testsuite/ChangeLog
2018-12-09 Philippe Waroquiers <philippe.waroquiers@skynet.be>
* gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60.
(reuse_time): Initialize to REUSE_TIME_CAP.
(check_rc): New function.
(main): Use REUSE_TIME_CAP instead of hardcoded 60.
Check pthread_create rc.
(spawner_thread_func): Check pthread_create and pthread_join rc.
2018-11-05 03:54:05 +08:00
|
|
|
check_rc (rc, "pthread_create");
|
Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
2015-04-01 20:38:06 +08:00
|
|
|
|
|
|
|
rc = pthread_join (child, NULL);
|
Fix tid-reuse sometimes blocks for a very long (infinite?) time.
A failure that seems to cause a long/infinite time is the following:
For a not clear reason, tid-reuse.c spawner thread sometimes gets an error:
tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed.
which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the
after_count breakpoint:
Thread 2 "tid-reuse" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7518700 (LWP 10368)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count
After that, tid-reuse.exp gets the value of reuse_time, but this one kept its
initial value of -1 (as unsigned) :
print reuse_time
$1 = 4294967295
(gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time
tid-reuse then dies, and the .exp script continues (with some FAIL)
till it executes:
set timeout [expr $reuse_time * 2]
leading to the error:
(gdb) ERROR: integer value too large to represent as non-long integer
while executing
"expect {
-i exp8 -timeout 8589934590
-re ".*A problem internal to GDB has been detected" {
fail "$message (GDB internal error)"
gdb_intern..."
("uplevel" body line 1)
invoked from within
"uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer
ERROR: GDB process no longer exists
and then everything blocks.
This last 'GDB process no longer exists' is strange, as I still see the gdb
when this all blocks, e.g.
philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [
philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre
philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip
This patch gives a default value of 60, so that if ever something wrong happens
in tid-reuse, then the value retrieved by the .exp script stays in a reasonable
range.
Simon verified the patch by:
"I replaced the pthread_create call with the value 1 to simulate a
failure, and the test succeeds to fail quickly with your patch applied.
Without your patch, I get the infinite hang that you describe."
Compared to V1:
As suggested by Pedro, this version checks the pthread calls return
code (in particular of pthread_create) and reports the failure reason,
instead of just aborting.
gdb/testsuite/ChangeLog
2018-12-09 Philippe Waroquiers <philippe.waroquiers@skynet.be>
* gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60.
(reuse_time): Initialize to REUSE_TIME_CAP.
(check_rc): New function.
(main): Use REUSE_TIME_CAP instead of hardcoded 60.
Check pthread_create rc.
(spawner_thread_func): Check pthread_create and pthread_join rc.
2018-11-05 03:54:05 +08:00
|
|
|
check_rc (rc, "pthread_join");
|
Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
2015-04-01 20:38:06 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Called after the program is done counting number of spawned threads
|
|
|
|
for a period, to compute REUSE_TIME. */
|
|
|
|
|
|
|
|
void
|
|
|
|
after_count (void)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Called after enough time has passed for TID reuse to occur. */
|
|
|
|
|
|
|
|
void
|
|
|
|
after_reuse_time (void)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
#ifdef __linux__
|
|
|
|
|
|
|
|
/* Get the running system's configured pid_max. */
|
|
|
|
|
|
|
|
static int
|
|
|
|
linux_proc_get_pid_max (void)
|
|
|
|
{
|
|
|
|
static const char filename[] ="/proc/sys/kernel/pid_max";
|
|
|
|
FILE *file;
|
|
|
|
char buf[100];
|
|
|
|
int retval = -1;
|
|
|
|
|
|
|
|
file = fopen (filename, "r");
|
|
|
|
if (file == NULL)
|
|
|
|
{
|
|
|
|
fprintf (stderr, "unable to open %s\n", filename);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (fgets (buf, sizeof (buf), file) != NULL)
|
|
|
|
retval = strtol (buf, NULL, 10);
|
|
|
|
|
|
|
|
fclose (file);
|
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
|
|
|
|
#endif
|
|
|
|
|
|
|
|
int
|
|
|
|
main (int argc, char *argv[])
|
|
|
|
{
|
|
|
|
pthread_t child;
|
|
|
|
int rc;
|
|
|
|
unsigned int reuse_time_raw = 0;
|
|
|
|
|
|
|
|
rc = pthread_create (&child, NULL, spawner_thread_func, NULL);
|
Fix tid-reuse sometimes blocks for a very long (infinite?) time.
A failure that seems to cause a long/infinite time is the following:
For a not clear reason, tid-reuse.c spawner thread sometimes gets an error:
tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed.
which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the
after_count breakpoint:
Thread 2 "tid-reuse" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7518700 (LWP 10368)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count
After that, tid-reuse.exp gets the value of reuse_time, but this one kept its
initial value of -1 (as unsigned) :
print reuse_time
$1 = 4294967295
(gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time
tid-reuse then dies, and the .exp script continues (with some FAIL)
till it executes:
set timeout [expr $reuse_time * 2]
leading to the error:
(gdb) ERROR: integer value too large to represent as non-long integer
while executing
"expect {
-i exp8 -timeout 8589934590
-re ".*A problem internal to GDB has been detected" {
fail "$message (GDB internal error)"
gdb_intern..."
("uplevel" body line 1)
invoked from within
"uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer
ERROR: GDB process no longer exists
and then everything blocks.
This last 'GDB process no longer exists' is strange, as I still see the gdb
when this all blocks, e.g.
philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [
philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre
philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip
This patch gives a default value of 60, so that if ever something wrong happens
in tid-reuse, then the value retrieved by the .exp script stays in a reasonable
range.
Simon verified the patch by:
"I replaced the pthread_create call with the value 1 to simulate a
failure, and the test succeeds to fail quickly with your patch applied.
Without your patch, I get the infinite hang that you describe."
Compared to V1:
As suggested by Pedro, this version checks the pthread calls return
code (in particular of pthread_create) and reports the failure reason,
instead of just aborting.
gdb/testsuite/ChangeLog
2018-12-09 Philippe Waroquiers <philippe.waroquiers@skynet.be>
* gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60.
(reuse_time): Initialize to REUSE_TIME_CAP.
(check_rc): New function.
(main): Use REUSE_TIME_CAP instead of hardcoded 60.
Check pthread_create rc.
(spawner_thread_func): Check pthread_create and pthread_join rc.
2018-11-05 03:54:05 +08:00
|
|
|
check_rc (rc, "pthread_create spawner_thread");
|
Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
2015-04-01 20:38:06 +08:00
|
|
|
|
|
|
|
#define COUNT_TIME 2
|
|
|
|
sleep (COUNT_TIME);
|
|
|
|
|
|
|
|
#ifdef __linux__
|
|
|
|
tid_max = linux_proc_get_pid_max ();
|
|
|
|
#endif
|
|
|
|
/* If we don't know how many threads it would take to use the whole
|
|
|
|
number space on this system, just run the test for a bit. */
|
|
|
|
if (tid_max > 0)
|
|
|
|
{
|
|
|
|
reuse_time_raw = tid_max / ((float) thread_counter / COUNT_TIME) + 0.5;
|
|
|
|
|
|
|
|
/* Give it a bit more, just in case. */
|
|
|
|
reuse_time = reuse_time_raw + 3;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* 4 seconds were sufficient on the machine this was first observed,
|
|
|
|
an Intel i7-2620M @ 2.70GHz running Linux 3.18.7, with
|
|
|
|
pid_max=32768. Going forward, as machines get faster, this will
|
|
|
|
need less time, unless pid_max is set to a very high number. To
|
|
|
|
avoid unreasonably long test time, cap to an upper bound. */
|
Fix tid-reuse sometimes blocks for a very long (infinite?) time.
A failure that seems to cause a long/infinite time is the following:
For a not clear reason, tid-reuse.c spawner thread sometimes gets an error:
tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed.
which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the
after_count breakpoint:
Thread 2 "tid-reuse" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7518700 (LWP 10368)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count
After that, tid-reuse.exp gets the value of reuse_time, but this one kept its
initial value of -1 (as unsigned) :
print reuse_time
$1 = 4294967295
(gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time
tid-reuse then dies, and the .exp script continues (with some FAIL)
till it executes:
set timeout [expr $reuse_time * 2]
leading to the error:
(gdb) ERROR: integer value too large to represent as non-long integer
while executing
"expect {
-i exp8 -timeout 8589934590
-re ".*A problem internal to GDB has been detected" {
fail "$message (GDB internal error)"
gdb_intern..."
("uplevel" body line 1)
invoked from within
"uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer
ERROR: GDB process no longer exists
and then everything blocks.
This last 'GDB process no longer exists' is strange, as I still see the gdb
when this all blocks, e.g.
philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [
philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre
philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip
This patch gives a default value of 60, so that if ever something wrong happens
in tid-reuse, then the value retrieved by the .exp script stays in a reasonable
range.
Simon verified the patch by:
"I replaced the pthread_create call with the value 1 to simulate a
failure, and the test succeeds to fail quickly with your patch applied.
Without your patch, I get the infinite hang that you describe."
Compared to V1:
As suggested by Pedro, this version checks the pthread calls return
code (in particular of pthread_create) and reports the failure reason,
instead of just aborting.
gdb/testsuite/ChangeLog
2018-12-09 Philippe Waroquiers <philippe.waroquiers@skynet.be>
* gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60.
(reuse_time): Initialize to REUSE_TIME_CAP.
(check_rc): New function.
(main): Use REUSE_TIME_CAP instead of hardcoded 60.
Check pthread_create rc.
(spawner_thread_func): Check pthread_create and pthread_join rc.
2018-11-05 03:54:05 +08:00
|
|
|
if (reuse_time > REUSE_TIME_CAP)
|
|
|
|
reuse_time = REUSE_TIME_CAP;
|
Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
2015-04-01 20:38:06 +08:00
|
|
|
printf ("thread_counter=%lu, tid_max = %ld, reuse_time_raw=%u, reuse_time=%u\n",
|
|
|
|
thread_counter, tid_max, reuse_time_raw, reuse_time);
|
|
|
|
after_count ();
|
|
|
|
|
|
|
|
sleep (reuse_time);
|
|
|
|
|
|
|
|
after_reuse_time ();
|
|
|
|
return 0;
|
|
|
|
}
|