New in v2:
- Disable sharing only for -readnow objfiles, not all objfiles.
As described in PR 27541, we hit an internal error when loading a binary
the standard way and then loading it with the -readnow option:
$ ./gdb -nx -q --data-directory=data-directory ~/a.out -ex "set confirm off" -ex "file -readnow ~/a.out"
Reading symbols from /home/simark/a.out...
Reading symbols from ~/a.out...
/home/simark/src/binutils-gdb/gdb/dwarf2/read.c:8098: internal-error: void create_all_comp_units(dwarf2_per_objfile*): Assertion `per_objfile->per_bfd->all_comp_units.empty ()' failed.
This is a recurring problem that exposes a design issue in the DWARF
per-BFD sharing feature. Things work well when loading a binary with
the same method (with/without index, with/without readnow) twice in a
row. But they don't work so well when loading a binary with different
methods. See this previous fix, for example:
efb763a5ea ("gdb: check for partial symtab presence in dwarf2_initialize_objfile")
That one handled the case where the first load is normal (uses partial
symbols) and the second load uses an index.
The problem is that when loading an objfile with a method A, we create a
dwarf2_per_bfd and some dwarf2_per_cu_data and initialize them with the
data belonging to that method. When loading another obfile sharing the
same BFD but with a different method B, it's not clear how to re-use the
dwarf2_per_bfd/dwarf2_per_cu_data previously created, because they
contain the data specific to method A.
I think the most sensible fix would be to not share a dwarf2_per_bfd
between two objfiles loaded with different methods. That means that two
objfiles sharing the same BFD and loaded the same way would share a
dwarf2_per_bfd. Two objfiles sharing the same BFD but loaded with
different methods would use two different dwarf2_per_bfd structures.
However, this isn't a trivial change. So to fix the known issue quickly
(including in the gdb 10 branch), this patch just disables all
dwarf2_per_bfd sharing for objfiles using READNOW.
Generalize the gdb.base/index-cache-load-twice.exp test to test all
the possible combinations of loading a file with partial symtabs, index
and readnow. Move it to gdb.dwarf2, since it really exercises features
of the DWARF reader.
gdb/ChangeLog:
PR gdb/27541
* dwarf2/read.c (dwarf2_has_info): Don't share dwarf2_per_bfd
with objfiles using READNOW.
gdb/testsuite/ChangeLog:
PR gdb/27541
* gdb.base/index-cache-load-twice.exp: Remove.
* gdb.base/index-cache-load-twice.c: Remove.
* gdb.dwarf2/per-bfd-sharing.exp: New.
* gdb.dwarf2/per-bfd-sharing.c: New.
Change-Id: I9ffcf1e136f3e75242f70e4e58e4ba1fd3083389
Since commit 27012aba8a "Remove Irix 6 workaround from DWARF abbrev reader"
we have:
...
(gdb) file dw2-cu-size^M
Reading symbols from dw2-cu-size...^M
DW_FORM_strp pointing outside of .debug_str section [in module dw2-cu-size]^M
(No debugging symbols found in dw2-cu-size)^M
(gdb) ptype noloc^M
No symbol table is loaded. Use the "file" command.^M
(gdb) FAIL: gdb.dwarf2/dw2-cu-size.exp: ptype noloc
...
The problem is a missing .debug_abbrev terminator in dw2-cu-size.S, which
causes the .debug_abbrev contribution to be merged with the next:
...
Number TAG (0x9b)
1 DW_TAG_compile_unit [has children]
DW_AT_name DW_FORM_string
DW_AT_producer DW_FORM_string
DW_AT_language DW_FORM_data1
DW_AT value: 0 DW_FORM value: 0
2 DW_TAG_variable [no children]
DW_AT_name DW_FORM_string
DW_AT_type DW_FORM_ref4
DW_AT_external DW_FORM_flag
DW_AT value: 0 DW_FORM value: 0
3 DW_TAG_base_type [no children]
DW_AT_name DW_FORM_string
DW_AT_byte_size DW_FORM_data1
DW_AT_encoding DW_FORM_data1
DW_AT value: 0 DW_FORM value: 0
4 DW_TAG_const_type [no children]
DW_AT_type DW_FORM_ref_udata
DW_AT value: 0 DW_FORM value: 0
1 DW_TAG_compile_unit [has children]
DW_AT_producer DW_FORM_strp
DW_AT_language DW_FORM_data1
DW_AT_name DW_FORM_strp
DW_AT_comp_dir DW_FORM_strp
DW_AT_low_pc DW_FORM_addr
DW_AT_high_pc DW_FORM_data8
DW_AT_stmt_list DW_FORM_sec_offset
DW_AT value: 0 DW_FORM value: 0
...
and consequently, abbreviation code 1 no longer refers to a unique entry.
The eventually causes us to access the first attribute of this DIE:
...
<0><124>: Abbrev Number: 1 (DW_TAG_compile_unit)
<125> DW_AT_name : file1.txt
<12f> DW_AT_producer : GNU C 3.3.3
<13b> DW_AT_language : 1 (ANSI C)
...
which has form DW_FORM_string, using DW_FORM_strp.
Fix this by adding the missing .debug_abbrev terminator in dw2-cu-size.S.
gdb/testsuite/ChangeLog:
2021-03-30 Tom de Vries <tdevries@suse.de>
PR testsuite/27604
* gdb.dwarf2/dw2-cu-size.S: Add missing .debug_abbrev terminator.
During reviews, I changed the success/failure variables from int to bool, but
missed updating the code in a couple spots. Given the logic inversion, the
gdbserver code fails instead of succeeding.
Fixed with the following patch. Seems fairly obvious, so I'll push it soon.
gdbserver/ChangeLog:
2021-03-30 Luis Machado <luis.machado@linaro.org>
* server.cc (handle_general_set, handle_query): Update variable
to bool and fix verification logic.
The former two are unused anyway. And having such constants isn't very
helpful either, when they live in a place where updating the register
table wouldn't even allow noticing the need to adjust these constants.
st(1) ... st(7) will never be looked up in the hash table, so there's no
point inserting the entries. It's also not really necessary to do a 2nd
hash lookup after parsing the register number, nor is there a real
reason for having both st and st(0) entries. Plus we can easily do away
with the need for st to be first in the table.
There's no need for the extra level of indirection and the extra storage
needed for the pointer, pointing from one piece of static data to
another. Key checking of rounding being in effect off of the type field
of the structure instead.
There's no need for the extra level of indirection and the extra storage
needed for the pointer, pointing from one piece of static data to
another. Key checking of broadcast being in effect off of the type field
of the structure instead.
There's no need for the extra level of indirection and the extra storage
needed for the pointer, pointing from one piece of static data to
another. Key checking of masking being in effect off of the register
field of the structure instead.
All callers pass unsigned values (in some cases by virtue of passing
non-negative literal numbers).
This in turn requires struct {Mask,RC,Broadcast}_Operation's "operand"
fields to become unsigned, in turn allowing to reduce the amount of
casting needed (the two new casts that are necessary cast _to_
"unsigned" instead of _from_, as that's the form that'll never case
undefined behavior).
Seen after converting bfd_boolean to bool.
mmix +FAIL: ld-mmix/zeroehmmo
./ld-new -L/home/alan/src/binutils-gdb/ld/testsuite/ld-mmix -m mmo -Ttext 0xa00 -T /home/alan/src/binutils-gdb/ld/testsuite/ld-mmix/zeroeh.ld -o tmpdir/dump tmpdir/x.o tmpdir/y.o
/home/alan/src/binutils-gdb/bfd/linker.c:2294:8: runtime error: load of value 253, which is not a valid value for type '_Bool'
* elflink.c (elf_link_add_object_symbols): Don't set h->indx
unless is_elf_hash_table.
This patch supports linking powerpc64 glibc with gold, specifically
the __tls_get_addr call in elf/dl-sym.c. That call lacks marker
relocations tying it to the arg setup instructions, but the arg setup
insns are also contructed lacking the usual relocations on a Global
Dynamic TLS code sequence. So there is no chance that anything in
that sequence might be wrongly edited by the linker.
In fact, the aim of linking glibc could have been supported by simply
omitting the error whenever TLS optimisation is disabled, as it is
when linking a shared library. The patch goes further than that,
disabling TLS GD and LD sequence optimisation on a per-object basis
for object files lacking marker relocs.
PR gold/27625
* powerpc.cc (Powerpc_relobj): Add no_tls_marker_, tls_marker_,
and tls_opt_error_ variables and accessors.
(Target_powerpc::Scan::local, global): Call set_tls_marker and
set_no_tls_marker for GD and LD code sequence relocations.
(Target_powerpc::Relocate::relocate): Downgrade the "lacks marker
reloc" error to a warning when safe to do so, and omit the error
entirely if not optimising TLS sequences. Do not optimise GD and
LD sequences for objects lacking marker relocs.
(Target_powerpc::relocate_relocs): Heed no_tls_marker here too.
I noticed that language_info is only ever called with a value of '1'.
This patch removes the parameter.
2021-03-29 Tom Tromey <tromey@adacore.com>
* top.c (check_frame_language_change): Update.
* language.c (language_info): Remove parameter.
* language.h (language_info): Remove parameter.
On aarch64-linux, I noticed the compile command didn't work at all. It
always gave the following error:
aarch64-linux-gnu-g++: error: : No such file or directory
Turns out we're passing an empty argv entry to GCC (because aarch64 doesn't
have a -m64 option), and GCC's behavior is to think that is a file it needs
to open. One can reproduce it like so:
gcc "" "" "" ""
gcc: error: : No such file or directory
gcc: error: : No such file or directory
gcc: error: : No such file or directory
gcc: error: : No such file or directory
gcc: fatal error: no input files
compilation terminated.
The solution is to check for an empty string and skip adding that to argv.
Regression tested on aarch64-linux/Ubuntu 18.04/20.04.
gdb/ChangeLog:
2021-03-29 Luis Machado <luis.machado@linaro.org>
* compile/compile.c (get_args): Don't add empty argv entries.
It was reported to me that on Ubuntu 14.04 (fairly old) the documentation
fails to build with the following:
gdb/doc/gdb.texinfo:10888: warning: node `Memory' is up for `Memory Tagging' in sectioning but not in menu
gdb/doc/gdb.texinfo:10693: node `Memory' lacks menu item for `Memory Tagging' despite being its Up target
Makefile:491: recipe for target 'gdb.info' failed
make[3]: *** [gdb.info] Error 1
This doesn't seem to happen on Ubuntu 18.04/20.04, but it does make sense.
Fix this by turning @subsection into a @section and adding the
"Memory Tagging" entry to the menu.
gdb/doc/ChangeLog:
2021-03-29 Luis Machado <luis.machado@linaro.org>
* gdb.textinfo (Memory Tagging): Make it a @section.
This test causes several timeouts for Clang, taking too long time to
finish. The reason is, for an infinite loop of the form
while(1); /* suppose this is line 30. */
Clang generates code that looks like
0x00000000004004d4 <+4>: jmp 0x4004d9 <loop+9>
0x00000000004004d9 <+9>: jmp 0x4004d9 <loop+9>
So, the real loop is the instruction at address 0x4004d9. But a
breakpoint that's defined at the loop line (assume line 30 in this
case) is inserted at address 0x4004d4.
(gdb) break 30
Breakpoint 1 at 0x4004d4: file test.c, line 30.
Therefore, continuing a thread that was spinning on the loop does not hit
the breakpoint. The bug is reported at
https://bugs.llvm.org/show_bug.cgi?id=49614
Tweak the infinite loop to spin on a variable to avoid this bug. The
test is unrelated to the bug.
gdb/testsuite/ChangeLog:
2021-03-29 Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
* gdb.mi/user-selected-context-sync.exp: Spin on a variable in
the infinite loop to avoid a Clang bug.
Since c8fbd44a01 (gdb: remove
target_is_pushed free function), procfs.c compilation is broken, which
went unnoticed for lack of functioning buildbots:
/vol/src/gnu/gdb/hg/master/dist/gdb/procfs.c: In member function 'virtual void procfs_target::attach(const char*, int)':
/vol/src/gnu/gdb/hg/master/dist/gdb/procfs.c:1772:8: error: 'inf' was not declared in this scope; did you mean 'info'?
1772 | if (!inf->target_is_pushed (this))
| ^~~
| info
/vol/src/gnu/gdb/hg/master/dist/gdb/procfs.c: In member function 'virtual void procfs_target::create_inferior(const char*, const string&, char**, int)':
/vol/src/gnu/gdb/hg/master/dist/gdb/procfs.c:2865:8: error: 'inf' was not declared in this scope; did you mean 'info'?
2865 | if (!inf->target_is_pushed (this))
| ^~~
| info
Fixed by defining inf. Tested on amd64-pc-solaris2.11 and
sparcv9-sun-solaris2.11.
2021-03-29 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gdb:
* procfs.c (procfs_target::attach): Define inf.
Use it.
(procfs_target::create_inferior): Likewise.
For a long time there hasn't been a need anymore to keep together all
templates with identical mnemonics. Move the MOVQ and MOVABS ones next
to their MOV counterparts. Move the string forms of CMPSD and MOVSD next
to their CMPS / MOVS counterparts. Re-arrange what so fgar was the SSE3
section.
This makes reasonably obvious that MONITOR/MWAIT aren't suitable to
cover by CpuSSE3, but adjusting this is left for another time.
In commit 79dec6b7ba ("x86-64: optimize certain commutative
VEX-encoded insns") I missed the fact that there being subtraction
involved here doesn't matter, as absolute differences get summed up.
This way not only the overall (source) table size shrinks by quite a
bit and the risk of related templates going out of sync with one another
gets lowered, but also (dis)similarities between neighboring templates
become easier to spot.
Note that for certain SSE2AVX templates this results in benign attribute
changes:
- LDMXCSR and STMXCSR: NoAVX gets set,
- MOVMSKPS, PMOVMSKB, PEXTR{B,W} (register destination), and PINSR{B,W}
(register source): IgnoreSize and NoRex64 get set,
- CVT{DQ,PS}2PD, CVTSD2SS, MOVMSKPD, MOVDDUP, PMOV{S,Z}X{BW,WD,DQ}, and
ROUNDSD: NoRex64 gets set,
- CVTSS2SD, INSERTPS, PEXTRW (memory destination), PINSRW (memory
source), and PMOV{S,Z}X{BD,WQ,BQ}: IgnoreSize gets set.
Similarly the "normal" (non-SSE2AVX)
- non-64-bit CVTS{I,S}2SD forms get NoRex64 set,
- CMP{EQ,ORD,NEQ,UNORD}{P,S}{S,D} forms get C set,
all again in a benign way.
The remaining differences in the generated table are due to re-ordering
of entries in the course of being folded into templates.
The table entries are more natural to read (and slightly shorter) when
the prefixes, like is the case for VEX/XOP/EVEX-encoded entries, are
specified as part of the opcode. This is particularly noticable for
side-by-side legacy and SSE2AVX entries.
An implication is that we now need to use "unsigned long long" for the
initially parsed opcode in i386-gen. I don't expect this to be an issue.
Now that all base opcodes are only at most 2 bytes in size, shrink its
template field to just as much. By also shrinking extension_opcode and
operands to just what they really need, we can free up an entire 32-bit
slot (plus 4 left bits past the bitfields themselves).
At present this alters sizeof(struct insn_template) only for 32-bit
builds. In 64-bit builds it instead leaves a padding hole that will
allow to buffer future growth of other fields (opcode_modifier,
cpu_flags, operand_types[]).
Just like is already done for VEX/XOP/EVEX encoded insns, record the
encoding space information in the respective opcode modifier field. Do
this again without changing the source table, but rather by deriving the
values from their existing source representation.
* objdump.c (process_links): Use type int.
* readelf.c (request_dump): Don't increment do_dump, set it.
* windint.h (target_is_bigendian): Use type bfd_boolean.
* windmc.c (target_is_bigendian): Likewise.
* windres.c (target_is_bigendian): Likewise.
elf_backend_link_output_symbol_hook and elf_link_output_symstrtab may
return 2 when a symbol is to be discarded. Update places that use
bfd_boolean rather than int for these functions.
* elflink.c (elf_link_output_symstrtab): Make flinfo parameter
a void pointer.
(bfd_elf_final_link): Delete out_sym_func typedef and don't cast
elf_link_output_symstrtab when calling output_arch_syms and
output_arch_local_syms.
* elf-bfd.h (struct elf_backend_data <elf_backend_output_arch_syms,
elf_backend_output_arch_local_syms>): Change return type of func
arg to match elf_link_output_symstrtab.
* elf-vxworks.h (elf_vxworks_link_output_symbol_hook): Correct
return type.
* elf32-nds32.c (nds32_elf_output_symbol_hook): Correct return type.
(nds32_elf_output_arch_syms): Correct func return type.
Now that the quick functions are separate from the object file format,
there's no need to have elfread.c push a new entry on the objfile 'qf'
list. Instead, this detail can be pushed into the DWARF reader. That
is what this patch implements.
I wasn't sure whether lazy reading still makes sense or not. It's
still only used by ELF, and only in certain situations (like vfork, I
think). It may not be carrying its weight, so we may want to consider
removing this in the future.
Also, I'm unclear on why the various indices are only used for ELF.
This seems sub-optimal. However, I haven't tried to address that
here.
gdb/ChangeLog
2021-03-28 Tom Tromey <tom@tromey.com>
* elfread.c (can_lazily_read_symbols): Move to dwarf2/read.c.
(elf_symfile_read): Simplify.
* dwarf2/read.c (struct lazy_dwarf_reader): Move from elfread.c.
(make_lazy_dwarf_reader): New function.
(make_dwarf_gdb_index, make_dwarf_debug_names): Now static.
(dwarf2_initialize_objfile): Return void. Remove index_kind
parameter. Push on 'qf' list.
* dwarf2/public.h (dwarf2_initialize_objfile): Change return
type. Remove 'index_kind' parameter.
(make_dwarf_gdb_index, make_dwarf_debug_names): Don't declare.
An earlier patch neglected to delete a forward declaration of
elf_sym_fns_lazy_psyms. This is no longer defined. This patch
removes it.
gdb/ChangeLog
2021-03-27 Tom Tromey <tom@tromey.com>
* elfread.c (elf_sym_fns_lazy_psyms): Don't declare.
I noticed that I forgot to make a change in my series to make it
possible to attach multiple debug readers to an objfile. In one spot,
elf_symfile_read still clears the 'qf' list. However, this should
have been removed toward the end of that series.
This patch fixes the offending spot. Tested on x86-64 Fedora 32.
gdb/ChangeLog
2021-03-27 Tom Tromey <tom@tromey.com>
* elfread.c (elf_symfile_read): Don't clear 'qf'.
Resolve some duplicate test name warnings in gdb.arch/powerpc-*.exp
tests by either extending the existing test names, or providing a new
test name.
gdb/testsuite/ChangeLog:
* gdb.arch/powerpc-disassembler-options.exp: Extend some test
names for uniqueness.
* gdb.arch/powerpc-fpscr-gcore.exp: Add more test names for
uniqueness.
While working on gdb-add-index.sh, it appeared that it uses the non
POSIX 'local' keyword. Instead of using local to allow variable
shadowing, I rename the local one to avoid name conflicts altogether.
This commit gets rid of the following shellcheck warning:
In gdb-add-index.sh line 63:
local file="$1"
^--------^ SC2039: In POSIX sh, 'local' is undefined.
gdb/ChangeLog:
* contrib/gdb-add-index.sh: Avoid variable shadowing and get
rid of 'local'.
This changes quick_symbol_functions::map_symbol_filenames to use a
function_view, and updates all the uses. It also changes the final
parameter to 'bool'. A couple of spots are further updated to use
operator() rather than a lambda.
gdb/ChangeLog
2021-03-26 Tom Tromey <tom@tromey.com>
* symtab.c (struct output_source_filename_data): Add 'output'
method and operator().
(output_source_filename_data::output): Rename from
output_source_filename.
(output_partial_symbol_filename): Remove.
(info_sources_command): Update.
(struct add_partial_filename_data): Add operator().
(add_partial_filename_data::operator()): Rename from
maybe_add_partial_symtab_filename.
(make_source_files_completion_list): Update.
* symfile.c (quick_symbol_functions): Update.
* symfile-debug.c (objfile::map_symbol_filenames): Update.
* quick-symbol.h (symbol_filename_ftype): Change type of 'fun' and
'need_fullname'. Remove 'data' parameter.
(struct quick_symbol_functions) <map_symbol_filenames>: Likewise.
* psymtab.c (psymbol_functions::map_symbol_filenames): Update.
* psympriv.h (struct psymbol_functions) <map_symbol_filenames>:
Change type of 'fun' and 'need_fullname'. Remove 'data'
parameter.
* objfiles.h (struct objfile) <map_symbol_filenames>: Change type
of 'fun' and 'need_fullname'. Remove 'data' parameter.
* mi/mi-cmd-file.c (print_partial_file_name): Remove 'ignore'
parameter.
(mi_cmd_file_list_exec_source_files): Update.
* dwarf2/read.c
(dwarf2_base_index_functions::map_symbol_filenames): Update.
I noticed that ada-lang.c creates a lambda to call
aux_add_nonlocal_symbols. However, this code can be simplified a bit
by changing match_data to implement operator(), and then simply
passing the object as the callback. That is what this patch
implements.
gdb/ChangeLog
2021-03-26 Tom Tromey <tom@tromey.com>
* ada-lang.c (struct match_data): Add operator().
(match_data::operator()): Rename from aux_add_nonlocal_symbols.
(callback): Remove 'callback'.
I noticed that psymbol_functions::expand_symtabs_matching calls
make_ignore_params once per psymtab that is matched. This seems
possibly expensive, so this patch hoists the call out of the loop.
gdb/ChangeLog
2021-03-26 Tom Tromey <tom@tromey.com>
* psymtab.c (psymbol_functions::expand_symtabs_matching): Only
call make_ignore_params once.
Currently the psymtab variant of expand_symtabs_matching has this
check:
/* We skip shared psymtabs because file-matching doesn't apply
to them; but we search them later in the loop. */
if (ps->user != NULL)
continue;
In a larger series I'm working on, it's convenient to remove this
check. And, I noticed that a similar check is not done for
expand_symtabs_with_fullname. So, it made sense to me to remove the
check here as well.
gdb/ChangeLog
2021-03-26 Tom Tromey <tom@tromey.com>
* psymtab.c (psymbol_functions::expand_symtabs_matching): Remove
"user" check.
A recent bug (RH BZ 1931344) has exposed a bug in the core file
build-ID support that I introduced a while ago. It is pretty
easy to demonstate the problem following a simplified procedure
outlined in that bug:
[shell1]
shell1$ /usr/libexec/qemu-kvm
[shell2]
shell2$ pkill -SEGV -x qemu-kvm
[shell1]
Segmentation fault (core dumped)
Load this core file into GDB without specifying an executable
(an unfortunate Fedora/RHEL-ism), and GDB will inform the user
to install debuginfo for the "missing" executable:
$ gdb -nx -q core.12345
...
Missing separate debuginfo for the main executable file
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/e2/e9c66d3117fb2bbb5b2be122f04f2664e5df54
Core was generated by `/usr/libexec/qemu-kvm'.
Program terminated with signal SIGSEGV, Segmentation fault.
...
The suggested build-ID is actaully for gmp not qemu-kvm. The problem
lies in _bfd_elf_core_find_build_id, where we loop over program headers
looking for note segments:
/* Read in program headers and parse notes. */
for (i = 0; i < i_ehdr.e_phnum; ++i, ++i_phdr)
{
Elf_External_Phdr x_phdr;
if (bfd_bread (&x_phdr, sizeof (x_phdr), abfd) != sizeof (x_phdr))
goto fail;
elf_swap_phdr_in (abfd, &x_phdr, i_phdr);
if (i_phdr->p_type == PT_NOTE && i_phdr->p_filesz > 0)
{
elf_read_notes (abfd, offset + i_phdr->p_offset,
i_phdr->p_filesz, i_phdr->p_align);
if (abfd->build_id != NULL)
return TRUE;
}
elf_read_notes uses bfd_seek to forward the stream to the location of
the note segment. When control returns to _bfd_elf_core_fild_build_id,
the stream is no longer in the location looking at program headers, and
all subsequent reads will read from the wrong file offset.
To fix this, this patch marks the stream location and ensures
that it is restored after elf_read_notes is called.
bfd/ChangeLog
2021-03-26 Keith Seitz <keiths@redhat.com>
* elfcore.h (_bfd_elf_core_find_build_id): Seek file
offset of program headers after calling elf_read_notes.
This commit adds a couple of tests to the python pretty printer
testing.
I've added a test for the 'array' display hint. This display hint is
tested by gdb.python/py-mi.exp, however, the MI testing is done via
the varobj interface, and this code makes its own direct calls to the
Python pretty printers from gdb/varobj.c. What this means is that the
interface to the pretty printers in gdb/python/py-prettyprint.c is not
tested for the 'array' display hint path.
I also added a test for what happens when the display_hint method
raises an exception. There wasn't a bug that inspired this test, just
while adding the previous test I thought, I wonder what happens if...
The current behaviour of GDB seems reasonable, GDB displays the Python
exception, and then continues printing the value as if display_hint
had returned None. I added a test to lock in this behaviour.
gdb/testsuite/ChangeLog:
* gdb.python/py-prettyprint.c (struct container): Add 'is_array_p'
member.
(make_container): Initialise is_array_p.
* gdb.python/py-prettyprint.exp: Add new tests.
* gdb.python/py-prettyprint.py (ContainerPrinter.display_hint):
Check is_array_p and possibly return 'array'.
Rationale
---------
Let's say you have multiple threads hitting a conditional breakpoint
at the same time, and all of these are going to evaluate to false.
All these threads will need to be resumed.
Currently, GDB fetches one target event (one SIGTRAP representing the
breakpoint hit) and decides that the thread should be resumed. It
calls resume and commit_resume immediately. It then fetches the
second target event, and does the same, until it went through all
threads.
The result is therefore something like:
- consume event for thread A
- resume thread A
- commit resume (affects thread A)
- consume event for thread B
- resume thread B
- commit resume (affects thread B)
- consume event for thread C
- resume thread C
- commit resume (affects thread C)
For targets where it's beneficial to group resumptions requests (most
likely those that implement target_ops::commit_resume), it would be
much better to have:
- consume event for thread A
- resume thread A
- consume event for thread B
- resume thread B
- consume event for thread C
- resume thread C
- commit resume (affects threads A, B and C)
Implementation details
----------------------
To achieve this, this patch adds another check in
maybe_set_commit_resumed_all_targets to avoid setting the
commit-resumed flag of targets that readily have events to provide to
infrun.
To determine if a target has events readily available to report, this
patch adds an `has_pending_events` target_ops method. The method
returns a simple bool to say whether or not it has pending events to
report.
Testing
=======
To test this, I start GDBserver with a program that spawns multiple
threads:
$ ../gdbserver/gdbserver --once :1234 ~/src/many-threads-stepping-over-breakpoints/many-threads-stepping-over-breakpoints
I then connect with GDB and install a conditional breakpoint that always
evaluates to false (and force the evaluation to be done by GDB):
$ ./gdb -nx --data-directory=data-directory \
/home/simark/src/many-threads-stepping-over-breakpoints/many-threads-stepping-over-breakpoints \
-ex "set breakpoint condition-evaluation host" \
-ex "set pag off" \
-ex "set confirm off" \
-ex "maint set target-non-stop on" \
-ex "tar rem :1234" \
-ex "tb main" \
-ex "b 13 if 0" \
-ex c \
-ex "set debug infrun" \
-ex "set debug remote 1" \
-ex "set debug displaced"
I then do "continue" and look at the log.
The remote target receives a bunch of stop notifications for all
threads that have hit the breakpoint. infrun consumes and processes
one event, decides it should not cause a stop, prepares a displaced
step, after which we should see:
[infrun] maybe_set_commit_resumed_all_process_targets: not requesting commit-resumed for target remote, target has pending events
Same for a second thread (since we have 2 displaced step buffers).
For the following threads, their displaced step is deferred since
there are no more buffers available.
After consuming the last event the remote target has to offer, we get:
[infrun] maybe_set_commit_resumed_all_process_targets: enabling commit-resumed for target remote
[infrun] maybe_call_commit_resumed_all_process_targets: calling commit_resumed for target remote
[remote] Sending packet: $vCont;s:p14d16b.14d1b1;s:p14d16b.14d1b2#55
[remote] Packet received: OK
Without the patch, there would have been one vCont;s just after each
prepared displaced step.
gdb/ChangeLog:
yyyy-mm-dd Simon Marchi <simon.marchi@efficios.com>
Pedro Alves <pedro@palves.net>
* async-event.c (async_event_handler_marked): New.
* async-event.h (async_event_handler_marked): Declare.
* infrun.c (maybe_set_commit_resumed_all_targets): Switch to
inferior before calling target method. Don't commit-resumed if
target_has_pending_events is true.
* remote.c (remote_target::has_pending_events): New.
* target-delegates.c: Regenerate.
* target.c (target_has_pending_events): New.
* target.h (target_ops::has_pending_events): New target method.
(target_has_pending_events): New.
Change-Id: I18112ba19a1ff4986530c660f530d847bb4a1f1d
The rationale for this patch comes from the ROCm port [1], the goal
being to reduce the number of back and forths between GDB and the
target when doing successive operations. I'll start with explaining
the rationale and then go over the implementation. In the ROCm / GPU
world, the term "wave" is somewhat equivalent to a "thread" in GDB.
So if you read if from a GPU stand point, just s/thread/wave/.
ROCdbgapi, the library used by GDB [2] to communicate with the GPU
target, gives the illusion that it's possible for the debugger to
control (start and stop) individual threads. But in reality, this is
not how it works. Under the hood, all threads of a queue are
controlled as a group. To stop one thread in a group of running ones,
the state of all threads is retrieved from the GPU, all threads are
destroyed, and all threads but the one we want to stop are re-created
from the saved state. The net result, from the point of view of GDB,
is that the library stopped one thread. The same thing goes if we
want to resume one thread while others are running: the state of all
running threads is retrieved from the GPU, they are all destroyed, and
they are all re-created, including the thread we want to resume.
This leads to some inefficiencies when combined with how GDB works,
here are two examples:
- Stopping all threads: because the target operates in non-stop mode,
when the user interface mode is all-stop, GDB must stop all threads
individually when presenting a stop. Let's suppose we have 1000
threads and the user does ^C. GDB asks the target to stop one
thread. Behind the scenes, the library retrieves 1000 thread
states and restores the 999 others still running ones. GDB asks
the target to stop another one. The target retrieves 999 thread
states and restores the 998 remaining ones. That means that to
stop 1000 threads, we did 1000 back and forths with the GPU. It
would have been much better to just retrieve the states once and
stop there.
- Resuming with pending events: suppose the 1000 threads hit a
breakpoint at the same time. The breakpoint is conditional and
evaluates to true for the first thread, to false for all others.
GDB pulls one event (for the first thread) from the target, decides
that it should present a stop, so stops all threads using
stop_all_threads. All these other threads have a breakpoint event
to report, which is saved in `thread_info::suspend::waitstatus` for
later. When the user does "continue", GDB resumes that one thread
that did hit the breakpoint. It then processes the pending events
one by one as if they just arrived. It picks one, evaluates the
condition to false, and resumes the thread. It picks another one,
evaluates the condition to false, and resumes the thread. And so
on. In between each resumption, there is a full state retrieval
and re-creation. It would be much nicer if we could wait a little
bit before sending those threads on the GPU, until it processed all
those pending events.
To address this kind of performance issue, ROCdbgapi has a concept
called "forward progress required", which is a boolean state that
allows its user (i.e. GDB) to say "I'm doing a bunch of operations,
you can hold off putting the threads on the GPU until I'm done" (the
"forward progress not required" state). Turning forward progress back
on indicates to the library that all threads that are supposed to be
running should now be really running on the GPU.
It turns out that GDB has a similar concept, though not as general,
commit_resume. One difference is that commit_resume is not stateful:
the target can't look up "does the core need me to schedule resumed
threads for execution right now". It is also specifically linked to
the resume method, it is not used in other contexts. The target
accumulates resumption requests through target_ops::resume calls, and
then commits those resumptions when target_ops::commit_resume is
called. The target has no way to check if it's ok to leave resumed
threads stopped in other target methods.
To bridge the gap, this patch generalizes the commit_resume concept in
GDB to match the forward progress concept of ROCdbgapi. The current
name (commit_resume) can be interpreted as "commit the previous resume
calls". I renamed the concept to "commit_resumed", as in "commit the
threads that are resumed".
In the new version, we have two things:
- the commit_resumed_state field in process_stratum_target: indicates
whether GDB requires target stacks using this target to have
resumed threads committed to the execution target/device. If
false, an execution target is allowed to leave resumed threads
un-committed at the end of whatever method it is executing.
- the commit_resumed target method: called when commit_resumed_state
transitions from false to true. While commit_resumed_state was
false, the target may have left some resumed threads un-committed.
This method being called tells it that it should commit them back
to the execution device.
Let's take the "Stopping all threads" scenario from above and see how
it would work with the ROCm target with this change. Before stopping
all threads, GDB would set the target's commit_resumed_state field to
false. It would then ask the target to stop the first thread. The
target would retrieve all threads' state from the GPU and mark that
one as stopped. Since commit_resumed_state is false, it leaves all
the other threads (still resumed) stopped. GDB would then proceed to
call target_stop for all the other threads. Since resumed threads are
not committed, this doesn't do any back and forth with the GPU.
To simplify the implementation of targets, this patch makes it so that
when calling certain target methods, the contract between the core and
the targets guarantees that commit_resumed_state is false. This way,
the target doesn't need two paths, one for commit_resumed_state ==
true and one for commit_resumed_state == false. It can just assert
that commit_resumed_state is false and work with that assumption.
This also helps catch places where we forgot to disable
commit_resumed_state before calling the method, which represents a
probable optimization opportunity. The commit adds assertions in the
target method wrappers (target_resume and friends) to have some
confidence that this contract between the core and the targets is
respected.
The scoped_disable_commit_resumed type is used to disable the commit
resumed state of all process targets on construction, and selectively
re-enable it on destruction (see below for criteria). Note that it
only sets the process_stratum_target::commit_resumed_state flag. A
subsequent call to maybe_call_commit_resumed_all_targets is necessary
to call the commit_resumed method on all target stacks with process
targets that got their commit_resumed_state flag turned back on. This
separation is because we don't want to call the commit_resumed methods
in scoped_disable_commit_resumed's destructor, as they may throw.
On destruction, commit-resumed is not re-enabled for a given target
if:
1. this target has no threads resumed, or
2. this target has at least one resumed thread with a pending status
known to the core (saved in thread_info::suspend::waitstatus).
The first point is not technically necessary, because a proper
commit_resumed implementation would be a no-op if the target has no
resumed threads. But since we have a flag do to a quick check, it
shouldn't hurt.
The second point is more important: together with the
scoped_disable_commit_resumed instance added in fetch_inferior_event,
it makes it so the "Resuming with pending events" described above is
handled efficiently. Here's what happens in that case:
1. The user types "continue".
2. Upon destruction, the scoped_disable_commit_resumed in the
`proceed` function does not enable commit-resumed, as it sees some
threads have pending statuses.
3. fetch_inferior_event is called to handle another event, the
breakpoint hit evaluates to false, and that thread is resumed.
Because there are still more threads with pending statuses, the
destructor of scoped_disable_commit_resumed in
fetch_inferior_event still doesn't enable commit-resumed.
4. Rinse and repeat step 3, until the last pending status is handled
by fetch_inferior_event. In that case,
scoped_disable_commit_resumed's destructor sees there are no more
threads with pending statues, so it asks the target to commit
resumed threads.
This allows us to avoid all unnecessary back and forths, there is a
single commit_resumed call once all pending statuses are processed.
This change required remote_target::remote_stop_ns to learn how to
handle stopping threads that were resumed but pending vCont. The
simplest example where that happens is when using the remote target in
all-stop, but with "maint set target-non-stop on", to force it to
operate in non-stop mode under the hood. If two threads hit a
breakpoint at the same time, GDB will receive two stop replies. It
will present the stop for one thread and save the other one in
thread_info::suspend::waitstatus.
Before this patch, when doing "continue", GDB first resumes the thread
without a pending status:
Sending packet: $vCont;c:p172651.172676#f3
It then consumes the pending status in the next fetch_inferior_event
call:
[infrun] do_target_wait_1: Using pending wait status status->kind = stopped, signal = GDB_SIGNAL_TRAP for Thread 1517137.1517137.
[infrun] target_wait (-1.0.0, status) =
[infrun] 1517137.1517137.0 [Thread 1517137.1517137],
[infrun] status->kind = stopped, signal = GDB_SIGNAL_TRAP
It then realizes it needs to stop all threads to present the stop, so
stops the thread it just resumed:
[infrun] stop_all_threads: Thread 1517137.1517137 not executing
[infrun] stop_all_threads: Thread 1517137.1517174 executing, need stop
remote_stop called
Sending packet: $vCont;t:p172651.172676#04
This is an unnecessary resume/stop. With this patch, we don't commit
resumed threads after proceeding, because of the pending status:
[infrun] maybe_commit_resumed_all_process_targets: not requesting commit-resumed for target extended-remote, a thread has a pending waitstatus
When GDB handles the pending status and stop_all_threads runs, we stop a
resumed but pending vCont thread:
remote_stop_ns: Enqueueing phony stop reply for thread pending vCont-resume (1520940, 1520976, 0)
That thread was never actually resumed on the remote stub / gdbserver,
so we shouldn't send a packet to the remote side asking to stop the
thread.
Note that there are paths that resume the target and then do a
synchronous blocking wait, in sort of nested event loop, via
wait_sync_command_done. For example, inferior function calls, or any
run control command issued from a breakpoint command list. We handle
that making wait_sync_command_one a "sync" point -- force forward
progress, or IOW, force-enable commit-resumed state.
gdb/ChangeLog:
yyyy-mm-dd Simon Marchi <simon.marchi@efficios.com>
Pedro Alves <pedro@palves.net>
* infcmd.c (run_command_1, attach_command, detach_command)
(interrupt_target_1): Use scoped_disable_commit_resumed.
* infrun.c (do_target_resume): Remove
target_commit_resume call.
(commit_resume_all_targets): Remove.
(maybe_set_commit_resumed_all_targets): New.
(maybe_call_commit_resumed_all_targets): New.
(enable_commit_resumed): New.
(scoped_disable_commit_resumed::scoped_disable_commit_resumed)
(scoped_disable_commit_resumed::~scoped_disable_commit_resumed)
(scoped_disable_commit_resumed::reset)
(scoped_disable_commit_resumed::reset_and_commit)
(scoped_enable_commit_resumed::scoped_enable_commit_resumed)
(scoped_enable_commit_resumed::~scoped_enable_commit_resumed):
New.
(proceed): Use scoped_disable_commit_resumed and
maybe_call_commit_resumed_all_targets.
(fetch_inferior_event): Use scoped_disable_commit_resumed.
* infrun.h (struct scoped_disable_commit_resumed): New.
(maybe_call_commit_resumed_all_process_targets): New.
(struct scoped_enable_commit_resumed): New.
* mi/mi-main.c (exec_continue): Use scoped_disable_commit_resumed.
* process-stratum-target.h (class process_stratum_target):
<commit_resumed_state>: New.
* record-full.c (record_full_wait_1): Change commit_resumed_state
around calling commit_resumed.
* remote.c (class remote_target) <commit_resume>: Rename to...
<commit_resumed>: ... this.
(struct stop_reply): Move up.
(remote_target::commit_resume): Rename to...
(remote_target::commit_resumed): ... this. Check if there is any
thread pending vCont resume.
(remote_target::remote_stop_ns): Generate stop replies for resumed
but pending vCont threads.
(remote_target::wait_ns): Add gdb_assert.
* target-delegates.c: Regenerate.
* target.c (target_wait, target_resume): Assert that the current
process_stratum target isn't in commit-resumed state.
(defer_target_commit_resume): Remove.
(target_commit_resume): Remove.
(target_commit_resumed): New.
(make_scoped_defer_target_commit_resume): Remove.
(target_stop): Assert that the current process_stratum target
isn't in commit-resumed state.
* target.h (struct target_ops) <commit_resume>: Rename to ...
<commit_resumed>: ... this.
(target_commit_resume): Remove.
(target_commit_resumed): New.
(make_scoped_defer_target_commit_resume): Remove.
* top.c (wait_sync_command_done): Use
scoped_enable_commit_resumed.
[1] https://github.com/ROCm-Developer-Tools/ROCgdb/
[2] https://github.com/ROCm-Developer-Tools/ROCdbgapi
Change-Id: I836135531a29214b21695736deb0a81acf8cf566
gdb.base/maint-target-async-off.exp fails if you test against
gdbserver with "maint set target-non-stop on" forced.
(gdb) run
Starting program: build/gdb/testsuite/outputs/gdb.base/maint-target-async-off/maint-target-async-off
Breakpoint 1, main () at src/gdb/testsuite/gdb.base/maint-target-async-off.c:21
21 return 0;
(gdb) FAIL: gdb.base/maint-target-async-off.exp: continue until exit (timeout)
Above, GDB just stopped listening to stdin.
Basically, GDB assumes that a target working in non-stop mode
operation also supports async mode; it's a requirement. GDB
misbehaves badly otherwise, and even hits failed assertions.
Fix this by making target_is_non_stop_p return false if async is off.
gdb/ChangeLog:
* target.c (target_always_non_stop_p): Also check whether the
target can async.
Change-Id: I7e52e1061396a5b9b02ada462f68a14b76d68974