PR 28065 (gdb.threads/access-mem-running-thread-exit.exp intermittent
failure) shows that GDB can hit an unexpected scenario -- it can
happen that the kernel manages to open a /proc/PID/task/LWP/mem file,
but then reading from the file returns 0/EOF, even though the process
hasn't exited or execed.
"0" out of read/write is normally what you get when the address space
of the process the file was open for is gone, because the process
execed or exited. So when GDB gets the 0, it returns memory access
failure. In the bad case in question, the process hasn't execed or
exited, so GDB fails a memory access when the access should have
worked.
GDB has code in place to gracefully handle the case of opening the
/proc/PID/task/LWP/mem just while the LWP is exiting -- most often the
open fails with EACCES or ENOENT. When it happens, GDB just tries
opening the file for a different thread of the process. The testcase
is written such that it stresses GDB's logic of closing/reopening the
/proc/PID/task/LWP/mem file, by constantly spawning short lived
threads.
However, there's a window where the kernel manages to find the thread,
but the thread exits just after and clears its address space pointer.
In this case, the kernel creates a file successfully, but the file
ends up with no address space associated, so a subsequent read/write
returns 0/EOF too, just like if the whole process had execed or
exited. This is the case in question that GDB does not handle.
Oleg Nesterov gave this suggestion as workaround for that race:
gdb can open(/proc/pid/mem) and then read (say) /proc/pid/statm.
If statm reports something non-zero, then open() was "successfull".
I think that might work. However, I didn't try it, because I realized
we have another nasty race that that wouldn't fix.
The other race I realized is that because we close/reopen the
/proc/PID/task/LWP/mem file when GDB switches to a different inferior,
then it can happen that GDB reopens /proc/PID/task/LWP/mem just after
a thread execs, and before GDB has seen the corresponding exec event.
I.e., we can open a /proc/PID/task/LWP/mem file accessing the
post-exec address space thinking we're accessing the pre-exec address
space.
A few months back, Simon, Oleg and I discussed a similar race:
[Bug gdb/26754] Race condition when resuming threads and one does an exec
https://sourceware.org/bugzilla/show_bug.cgi?id=26754
The solution back then was to make the kernel fail any ptrace
operation until the exec event is consumed, with this kernel commit:
commit dbb5afad100a828c97e012c6106566d99f041db6
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Wed May 12 15:33:08 2021 +0200
Commit: Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Wed May 12 10:45:22 2021 -0700
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
This however, only applies to ptrace, not to the /proc/pid/mem file
opening case. Also, even if it did apply to the file open case, we
would want to support current kernels until such a fix is more wide
spread anyhow.
So all in all, this commit gives up on the idea of only ever keeping
one /proc/pid/mem file descriptor open. Instead, make GDB open a
/proc/pid/mem per inferior, and keep it open until the inferior exits,
is detached or execs. Make GDB open the file right after the inferior
is created or is attached to or forks, at which point we know the
inferior is stable and stopped and isn't thus going to exec, or have a
thread exit, and so the file open won't fail (unless the whole process
is SIGKILLed from outside GDB, at which point it doesn't matter
whether we open the file).
This way, we avoid both races described above, at the expense of using
more file descriptors (one per inferior).
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Change-Id: Iff943b95126d0f98a7973a07e989e4f020c29419
Replaces a hard coded line number with a use of gdb_get_line_number.
I suspect that the line number has, over time, come adrift from where
it was supposed to be stopping. When the test was first added, line
770 pointed at the final 'return 0' in function main. Over time, as
things have been added, line 770 now points at some random location in
the middle of main.
So, I've marked the 'return 0' with a comment, and now the test will
always stop there.
I also removed an old comment from 1997 talking about how these tests
will only pass with the HP compiler, followed by an additional comment
from 2000 saying that the tests now pass with GCC.
I get the same results before and after this change.
Calculating "0 - pointer" can indeed result in seeming randomness as
the pointer address varies.
PR 28541
* dwarf.c (display_debug_frames): Don't print cie offset when
invalid, print "invalid" instead. Remove now redundant warning.
Investigating the PR28530 testcase, which has a fuzzed compression
header with an enormous size, I noticed that decompress_contents is
broken when the size doesn't fit in strm.avail_out. It wouldn't be
too hard to support larger sizes (patches welcome!) but for now just
stop decompress_contents from returning rubbish.
PR 28530
* compress.c (decompress_contents): Fail when uncompressed_size
is too big.
(bfd_init_section_decompress_status): Likewise.
The "set index-cache" command is used at the same time as a prefix
command (prefix for "set index-cache directory", for example), and a
boolean setting for turning the index-cache on and off. Even though I
did introduce that, I now don't think it's a good idea to do something
non-standard like this.
First, there's no dedicated CLI command to show whether the index-cache
is enabled, so it has to be custom output in the "show index-cache
handler". Also, it means there's no good way a MI frontend can find out
if the index-cache is enabled. "-gdb-show index-cache" doesn't show it
in the MI output record:
(gdb) interpreter-exec mi "-gdb-show index-cache"
~"\n"
~"The index cache is currently disabled.\n"
^done,showlist={option={name="directory",value="/home/simark/.cache/gdb"}}
Fix this by introducing "set/show index-cache enabled on/off", regular
boolean setting commands. Keep commands "set index-cache on" and "set
index-cache off" as deprecated aliases of "set index-cache enabled",
with respectively the default arguments "on" and "off".
Update tests using "set index-cache on/off" to use the new command.
Update the regexps in gdb.base/maint.exp to figure out whether the
index-cache is enabled or not. Update the doc to mention the new
commands.
Change-Id: I7d5aaaf7fd22bf47bd03e0023ef4fbb4023b37b3
The getter and setter in struct setting always receive and return values
by const reference. This is not necessary for scalar values (like bool
and int), but more importantly it makes it a bit annoying to write a
getter, you have to use a scratch static variable or something similar
that you can refer to:
const bool &
my_getter ()
{
static bool value;
value = function_returning_bool ();
return value;
}
Change the getter and setter function signatures to receive and return
value by value instead of by reference, when the underlying data type is
scalar. This means that string-based settings will still use
references, but all others will be by value. The getter above would
then be re-written as:
bool
my_getter ()
{
return function_returning_bool ();
}
This is useful for a patch later in this series that defines a boolean
setting with a getter and a setter.
Change-Id: Ieca3a2419fcdb75a6f75948b2c920b548a0af0fd
The class_deprecated enumerator isn't assigned anywhere, so remove it.
Commands that are deprecated have cmd_list_element::cmd_deprecated set
instead.
Change-Id: Ib35e540915c52aa65f13bfe9b8e4e22e6007903c
Remove two unnecessary nullptr checks. If aliases is nullptr, then the
for loops will simply be skipped.
Change-Id: I9132063bb17798391f8d019af305383fa8e0229f
I get some diffs when running autoconf in gdbserver, probably leftovers
from commit 5dfe4bfcb9 ("Fix format_pieces selftest on Windows").
Re-generate configure in that directory.
Change-Id: Icdc9906af95fbaf1047a579914b2983f8ec5db08
Always check sections with the corrupt size for non-MMO files. Skip MMO
files for compress_status == COMPRESS_SECTION_NONE since MMO has special
handling for COMPRESS_SECTION_NONE.
PR binutils/28530
* compress.c (bfd_get_full_section_contents): Always check
sections with the corrupt size.
Add/Remove the rvc extension to/from the riscv_subsets once the
.option rvc/norvc is set. So that we don't need to always check
the riscv_opts.rvc in the riscv_subset_supports, just call the
riscv_lookup_subset to search the subset list is enough.
Besides, we will need to dump the instructions according to the
elf architecture attributes. That means the dis-assembler needs
to parse the architecture string from the elf attribute before
dumping any instructions, and also needs to recognized the
INSN_CLASS* classes from riscv_opcodes. Therefore, I suppose
some functions will need to be moved from gas/config/tc-riscv.c
to bfd/elfxx-riscv.c, including riscv_multi_subset_supports and
riscv_subset_supports. This is one of the reasons why we need
this patch.
This patch passes the gcc/binutils regressions of rv32emc-elf,
rv32i-elf, rv64gc-elf and rv64gc-linux toolchains.
bfd/
* elfxx-riscv.c (riscv_remove_subset): Remove the extension
from the subset list.
(riscv_update_subset): Add/Remove an extension to/from the
subset list. This is used for the .option rvc or norvc.
* elfxx-riscv.h: Added the extern bool riscv_update_subset.
gas/
* config/tc-riscv.c (riscv_set_options): Removed the unused
rve flag.
(riscv_opts): Likewise.
(riscv_set_rve): Removed.
(riscv_subset_supports): Removed the riscv_opts.rvc check.
(riscv_set_arch): Don't need to call riscv_set_rve.
(reg_lookup_internal): Call riscv_subset_supports to check
whether the rve is supported.
(s_riscv_option): Add/Remove the rvc extension to/from the
subset list once the .option rvc/norvc is set.
The multi-run logic for mips involves a bit of codegen and rewriting
of files to include per-architecture prefixes. That can result in
files with missing prototypes which cause compiler errors. In the
case of mips-sde-elf targets, we have:
$srcdir/m16run.c -> $builddir/m16mips64r2_run.c
sim_engine_run -> m16mips64r2_engine_run
$srcdir/micromipsrun.c -> micromipsmicromips_run.c
sim_engine_run -> micromips64micromips_engine_run
micromipsmicromips_run.c:80:1: error: no previous prototype for 'micromips64micromips_engine_run' [-Werror=missing-prototypes]
80 | micromips64micromips_engine_run (SIM_DESC sd, int next_cpu_nr, int nr_cpus,
We generate headers for those prototypes in the configure script,
but only include them in the generated multi-run.c file. Update the
rewrite logic to turn the sim-engine.h include into the relevant
generated engine include so these files also have their prototypes.
$srcdir/m16run.c -> $builddir/m16mips64r2_run.c
sim-engine.h -> m16mips64r2_engine.h
$srcdir/micromipsrun.c -> micromipsmicromips_run.c
sim-engine.h -> micromips64micromips_engine.h
We don't need to build this anymore ourselves since the common build
includes it and produces the same object code. We also need to pull
in the split constant modules after the refactoring and pulling them
out of nltvals.def & targ-map.o. This doesn't matter for the sim
directly, but does for gdb and other users of libsim.
We also delete some conditional source tree logic since we already
require this be the "new" combined tree with a ../common/ dir. This
has been the case for decades at this point.
When building gnu-nat.c, we get:
CXX gnu-nat.o
gnu-nat.c: In member function 'virtual void gnu_nat_target::create_inferior(const char*, const string&, char**, int)':
gnu-nat.c:2117:13: error: 'struct inf' has no member named 'target_is_pushed'
2117 | if (!inf->target_is_pushed (this))
| ^~~~~~~~~~~~~~~~
gnu-nat.c:2118:10: error: 'struct inf' has no member named 'push_target'
2118 | inf->push_target (this);
| ^~~~~~~~~~~
This is because of a confusion between the generic `struct inferior`
variable and the gnu-nat-specific `struct inf` variable. Fix by
referring to `inferior`, not `inf`.
Adjust the comment on top of `struct inf` to clarify the purpose of that
type.
Co-Authored-By: Andrea Monaco <andrea.monaco@autistici.org>
Change-Id: I2fe2f7f6ef61a38d79860fd262b08835c963fc77
We see some additional failures when running the testsuite against a GDB
compiled with ASan, compared to a GDB compiled without ASan. Some of
them are caused by the memory leak report shown by the GDB process when
it exits, and the fact that it makes it exit with a non-zero exit code.
I generally try to remember to set ASAN_OPTIONS=detect_leaks=0 in my
environment when running the tests, but I don't always do it. I think
it would be nice if the testsuite did it. I don't see any use to have
leak detection when running the tests. That is, unless we ever have a
test that ensures GDB doesn't leak memory, which isn't going to happen
any time soon.
Here are some tests I found that were affected by this:
gdb.base/batch-exit-status.exp
gdb.base/many-headers.exp
gdb.base/quit.exp
gdb.base/with-mf.exp
gdb.dwarf2/gdb-add-index.exp
gdb.dwarf2/gdb-add-index-symlink.exp
gdb.dwarf2/imported-unit-runto-main.exp
Change-Id: I784c7df8a13979eb96587f735c1d33ba2cc6e0ca
While looking at an apparently malformed executable with
"readelf --debug-dump=loc", I got this warning:
readelf: ./main: Warning: There is a hole [0x89 - 0x95] in .debug_loc section.
However, the executable only has a .debug_loclists section.
This patch fixes the warning messages in display_debug_loc to use the
name of the section that is being processed.
binutils/ChangeLog
2021-11-03 Tom Tromey <tromey@adacore.com>
* dwarf.c (display_debug_loc): Use section name in warnings.
The current register set selection mechanism for AArch64 is static, based
on a pre-populated array of register sets.
This means that we might potentially probe register sets that are not
available. This is OK if the kernel errors out during ptrace, but probing the
tag_ctl register, for example, does not result in a ptrace error if the kernel
supports the tagged address ABI but not MTE (PR 28355).
Making the register set selection dynamic, based on feature checks, solves
this and simplifies the code a bit. It allows us to list all of the register
sets only once, and pick and choose based on HWCAP/HWCAP2 or other properties.
I plan to backport this fix to GDB 11 as well.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28355
Currently for a binary compiled normally (without -fsanitize=address) but with
LD_PRELOAD of ASAN one gets:
$ ASAN_OPTIONS=detect_leaks=0:alloc_dealloc_mismatch=1:abort_on_error=1:fast_unwind_on_malloc=0 LD_PRELOAD=/usr/lib64/libasan.so.6 gdb
=================================================================
==1909567==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete []) on 0x602000001570
#0 0x7f1c98e5efa7 in operator delete[](void*) (/usr/lib64/libasan.so.6+0xb0fa7)
...
0x602000001570 is located 0 bytes inside of 2-byte region [0x602000001570,0x602000001572)
allocated by thread T0 here:
#0 0x7f1c98e5cd1f in __interceptor_malloc (/usr/lib64/libasan.so.6+0xaed1f)
#1 0x557ee4a42e81 in operator new(unsigned long) (/usr/libexec/gdb+0x74ce81)
SUMMARY: AddressSanitizer: alloc-dealloc-mismatch (/usr/lib64/libasan.so.6+0xb0fa7) in operator delete[](void*)
==1909567==HINT: if you don't care about these errors you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0
==1909567==ABORTING
Despite the code called properly operator new[] and operator delete[].
But GDB's new-op.cc provides its own operator new[] which gets translated into
malloc() (which gets recogized as operatore new(size_t)) but as it does not
translate also operators delete[] Address Sanitizer gets confused.
The question is how many variants of the delete operator need to be provided.
There could be 14 operators new but there are only 4, GDB uses 3 of them.
There could be 16 operators delete but there are only 6, GDB uses 2 of them.
It depends on libraries and compiler which of the operators will get used.
Currently being used:
U operator new[](unsigned long)
U operator new(unsigned long)
U operator new(unsigned long, std::nothrow_t const&)
U operator delete[](void*)
U operator delete(void*, unsigned long)
Tested on x86_64-linux.
yyleng gives the pattern length, xstrdup just copies up to the NUL.
So it is quite possible writing at an index of yyleng-2 overflows
the xstrdup allocated string buffer. xmemdup quite handily avoids
this problem, even writing the terminating NUL over the trailing
quote. Use it in ldlex.l too where we'd already had a report of this
problem and fixed it by hand, and to implement xmemdup0 in gas.
binutils/
* deflex.l (single and double quote strings): Use xmemdup.
gas/
* as.h (xmemdup0): Use xmemdup.
ld/
PR 20906
* ldlex.l (double quote string): Use xmemdup.
These are marked inline, so building w/gcc at higher optimization
levels will automatically discard them. But building with -O0 will
trigger unused function warnings, so fix that.
The common before/after cover functions in the common mloop generator
are not used by all architecture ports. Doesn't seem to be a hard
requirement, so marking them optional (i.e. unused) is fine.
The cris execute function is conditionally used depending on the
fast-build mode settings, so mark it unused too.
That assert would be more obvious if it were reported as
"addr_ranges <= end_ranges". Fix that by using the obvious variable
in the final loop. Stop the assertion by using a signed comparison:
It's possible for the rounding up of the arange pointer to exceed the
end of the block when the block size is fuzzed.
* dwarf.c (display_debug_aranges): Use "end_ranges" in loop
displaying ranges rather that "start". Simplify rounding up
to 2*address_size boundary. Use signed comparison in loop.
These rules don't depend on the target compiler settings, so hoist
the build logic up to the common builds for better parallelization.
We have to extend the genmloop.sh logic a bit to allow outputting
to a subdir since it always assumed cwd was the right place.
We leave the cgen maintainer rules in the subdirs for now as they
aren't normally run, and they rely on cgen logic that has not yet
been generalized.
These rules don't depend on the target compiler settings, so hoist
the build logic up to the common builds for better parallelization.
We leave the mips rules in place as they depend on complicated
arch-specific configure logic that needs to be untangled first.
This file doesn't use anything from bfd (sysdep.h), so drop that
include. This avoids an implicit dependency on the generated
config.h which can be problematic for build-time tools.
Also swap stdio.h for stddef.h. This file isn't doing or using
any I/O structures, but it does need NULL.
This patch removes any fake (linker created) function descriptor
symbol if its code entry symbol isn't dynamic, to ensure bogus dynamic
symbols are not created. The change to func_desc_adjust requires that
it be run only once, which means ppc64_elf_tls_setup can't call it for
just a few selected symbols.
PR 28523
* elf64-ppc.c (func_desc_adjust): If a function entry sym is
not dynamic and has no plt entry, hide any associated fake
function descriptor symbol.
(ppc64_elf_edit): Move func_desc_adjust iteration over syms to..
(ppc64_elf_tls_setup): ..here.
I ran into a case where a breakpoint on _exit never triggered, because it was
set past the end of the _exit prologue, past the end of the exit_group system
call (which does not return).
More concretely, the breakpoint was set at the last insn show here:
...
Dump of assembler code for function _exit:
0x00007ffff7e42ea0 <+0>: 12 00 4c 3c addis r2,r12,18
0x00007ffff7e42ea4 <+4>: 60 43 42 38 addi r2,r2,17248
0x00007ffff7e42ea8 <+8>: 00 00 00 60 nop
0x00007ffff7e42eac <+12>: f8 ff e1 fb std r31,-8(r1)
0x00007ffff7e42eb0 <+16>: 78 1b 7f 7c mr r31,r3
0x00007ffff7e42eb4 <+20>: f0 ff c1 fb std r30,-16(r1)
0x00007ffff7e42eb8 <+24>: ea 00 00 38 li r0,234
0x00007ffff7e42ebc <+28>: a0 8b 22 e9 ld r9,-29792(r2)
0x00007ffff7e42ec0 <+32>: 78 fb e3 7f mr r3,r31
0x00007ffff7e42ec4 <+36>: 14 6a c9 7f add r30,r9,r13
0x00007ffff7e42ec8 <+40>: 02 00 00 44 sc
0x00007ffff7e42ecc <+44>: 26 00 00 7c mfcr r0
0x00007ffff7e42ed0 <+48>: 00 10 09 74 andis. r9,r0,4096
...
Fix this by treating system calls the same as branches in skip_prologue:
by default, don't skip, such that the breakpoint is set at 0x00007ffff7e42eb8
instead.
Tested on ppc64le-linux, on a power 8 machine.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28527
On powerpc64le-linux, with test-case gdb.arch/powerpc-addpcis.exp I run into
SIGILL:
...
(gdb) PASS: gdb.arch/powerpc-addpcis.exp: get hexadecimal valueof "$r3"
stepi^M
^M
Program terminated with signal SIGILL, Illegal instruction.^M
The program no longer exists.^M
(gdb) PASS: gdb.arch/powerpc-addpcis.exp: set r4
...
because it's a power9 insn, and I'm running on a power8 machine.
Fix this by handling the SIGILL. Likewise in gdb.arch/powerpc-lnia.exp.
Tested on powerpc64le-linux.
When running test-case gdb.base/step-indirect-call-thunk.exp with target board
unix/-m32/-fPIE/-pie, I run into:
...
(gdb) stepi^M
0x5655552e 22 { /* inc.1 */^M
(gdb) stepi^M
0x56555530 22 { /* inc.1 */^M
(gdb) stepi^M
0x565555f7 in __x86.get_pc_thunk.ax ()^M
(gdb) FAIL: gdb.base/step-indirect-call-thunk.exp: stepi into return thunk
...
In contrast, with unix/-m32 we have instead:
...
(gdb) stepi^M
0x08048407 22 { /* inc.1 */^M
(gdb) stepi^M
23 return x + 1; /* inc.2 */^M
(gdb) stepi^M
0x0804840c 23 return x + 1; /* inc.2 */^M
(gdb) stepi^M
24 } /* inc.3 */^M
(gdb) stepi^M
0x08048410 24 } /* inc.3 */^M
(gdb) stepi^M
0x0804848f in __x86_return_thunk ()^M
(gdb) PASS: gdb.base/step-indirect-call-thunk.exp: stepi into return thunk
...
The test-case doesn't expect to run into __x86.get_pc_thunk.ax, which is a
PIC helper function for x86_64-linux.
Fix this by insn-stepping through it.
Likewise in a few other test-cases.
Tested on x86_64-linux.
Updated manpages to be consistent with help information provided by the
binary. The main changes are:
* Making all long-form options have '--', instead of a single '-';
* added most of the missing options to the manpage;
* removed the information about using '+' instead of '-', since it
doesn't seem to be supported anymore.
This also fixes 2 upstream bugs:
* https://sourceware.org/bugzilla/show_bug.cgi?id=23965; by adding
--args to the manpage
* https://sourceware.org/bugzilla/show_bug.cgi?id=10619; by adding the
double dashes
"tocopy" in this code was an int, which when the size to be copied was
larger than MAXINT could result in tocopy being negative. A negative
value of course is less than BUFSIZE, but when converted to
bfd_size_type is extremely large.
PR 995
* objcopy.c (copy_unknown_object): Correct calculation of "tocopy".
Use better variable types.
Clean up the warnings in sim-if, then reduce the -Werror disable to
the files that still aren't clean that now that we require GNU make
and can set variables on a per-object basis.
Clean up some warnings in dv-lm32cpu, and all in sim-if, then reduce
the -Werror disable to the files that still aren't clean that now that
we require GNU make and can set variables on a per-object basis.
Only two files in here still generates warnings, so reduce the -Werror
disable to that now that we require GNU make and can set variables on
a per-object basis.
Only two files in here still generates warnings, so reduce the -Werror
disable to that now that we require GNU make and can set variables on
a per-object basis.
Fix a few printf warnings in sim-main.c, and then we're left with only
one file in here still generating warnings, so reduce the -Werror
disable to that alone now that we require GNU make and can set variables
on a per-object basis.
Only one file in here still generates warnings, so reduce the -Werror
disable to that alone now that we require GNU make and can set variables
on a per-object basis.