This patch initializes the "op" variable in skip_cfa_op() function
of bfd/elf-eh-frame.c to "0" at its declaration point to avoid the
"maybe-uninitialized" warning.
Building binutils on a system with GCC version 13.2.0 and a configure
command that sets the optimization level to "-Og" leads to a build
failure because of a warning being treated as an error:
---------------------------------------------------------------------
$ ./configure CFLAGS="-Og"
$ make
...
CC elf-eh-frame.lo
/src/gdb/bfd/elf-eh-frame.c: In function 'skip_cfa_op':
/src/gdb/bfd/elf-eh-frame.c:354:33: error: 'op' may be used
uninitialized [-Werror=maybe-uninitialized]
354 | switch (op & 0xc0 ? op & 0xc0 : op)
| ~~~~~~~~~~~~~~~~~~~~~~^~~~
/src/gdb/bfd/elf-eh-frame.c:348:12: note: 'op' was declared here
348 | bfd_byte op;
| ^~
cc1: all warnings being treated as errors
...
---------------------------------------------------------------------
The relevant code snippet related to this warning looks like:
---------------------------------------------------------------------
static inline bool
read_byte (bfd_byte **iter, bfd_byte *end, unsigned char *result)
{
if (*iter >= end)
return false;
*result = *((*iter)++);
return true;
}
static bool
skip_cfa_op (bfd_byte **iter, bfd_byte *end,...)
{
bfd_byte op;
if (!read_byte (iter, end, &op))
return false;
switch (op & 0xc0 ? op & 0xc0 : op)
...
}
---------------------------------------------------------------------
This warning probably happens because "-Og" results in GCC not
inlining the "read_byte()" function. Therefore, GCC treats its
invocation inside "skip_cfa_op()" like a black box and that ends
in the aforementioned warning.
Acknowledgement:
Lancelot Six -- for coming with the idea behind this fix.
Jan Beulich -- for reviewing.
bfd/ChangeLog:
* elf-eh-frame.c (skip_cfa_op): Initialize the "op" variable.
Along the lines of 2513312930 ("x86/APX: apply NDD-to-legacy
transformation to further CMOVcc forms") these can similarly be
converted to the shorter legacy-encoded CMOVcc.
We don't really want to align the last section's size to object
alignment (when that section may itself not be aligned as much), we want
image size to be a multiple thereof.
Since this commit:
commit a8651ef518
CommitDate: Fri Jun 14 14:47:38 2024 +0100
gdb/aarch64: prevent crash from in process agent
gdbserver isn't sending expedited registers with its stop reply packets
anymore. The problem is with how the constructor of the
expedited_registers std::vector is called:
The intent of the expedited_registers initialization in
aarch64_linux_read_description is to create a vector with capacity for 6
elements, but that's not how the std::vector constructor works.
Instead it creates a vector pre-populated with 6 elements initialized
with the default value for the type of the elements, and thus the first
6 elements are null pointers. The actual expedited registers are added
starting at the 7th element.
This causes init_target_desc to consider that the expedite_regs list is
empty, since it stops checking at the first nullptr element. The end
result is that gdbserver doesn't send any expedited registers to GDB in
its stop replies.
Fix by not specifying an element count when declaring the vector.
Tested for regressions on aarch64-linux-gnu native-extended-remote.
Approved-By: Andrew Burgess <aburgess@redhat.com>
Commit "b67a17aa7c0c478a" modified the logic of allocating dynamic
relocation space for TLS GD/IE, but only modified the logic of
generation dynamic relocations for TLS GD/IE in ABI v2.00. When
linking an object file of ABI v1.00 with bfd ld of ABI v2.00, it
will cause an assertion failure.
Modified the dynamic relocation generation logic of TLS GD/IE
in ABI v1.00 to be consistent with ABI v2.00.
I noticed that the comments for class parent_map aren't very clear.
This patch attempts to fix this, and also clarifies a point on
parent_map_map::add_map.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Fix formatting of a Python file added in commit:
commit a92e943014
Date: Wed Aug 14 15:16:46 2024 +0100
gdb: implement ::re_set method for catchpoint class
No functional change after this commit.
This syncs binutils-gdb/libiberty with gcc/libiberty up to GCC commit
64028d626a50410dbf29. This picks up the follow 3 GCC commits:
ea238096883 (gcc-delete-unused-func) libiberty/argv.c: remove only_whitespace
5e1d530da87 (gcc-buildargv) libiberty/buildargv: handle input consisting of only white space
a87954610f5 libiberty/buildargv: POSIX behaviour for backslash handling
It is possible to attach a condition to a catchpoint. This can't be
done when the catchpoint is created, but can be done with the
'condition' command, this is documented in the GDB manual:
You can also use the 'if' keyword with the 'watch' command. The
'catch' command does not recognize the 'if' keyword; 'condition' is the
only way to impose a further condition on a catchpoint.
A GDB crash was reported against Fedora GDB where a user had attached
a condition to a catchpoint and then restarted the inferior. When the
catchpoint was hit GDB would immediately segfault. I was able to
reproduce the failure on upstream GDB:
(gdb) file ./some/binary
(gdb) catch syscall write
(gdb) run
...
Catchpoint 1 (returned from syscall write), 0x00007ffff7b594a7 in write () from /lib64/libc.so.6
(gdb) condition 1 $_streq((char *) $rsi, "foobar") == 0
(gdb) run
...
Fatal signal: Segmentation fault
...
What happened here is that on the system in question we had debug
information available for both the main application and also for
libc.
When the condition was attached GDB was stopped inside libc and as the
debug information was available GDB found a reference to the 'char'
type (for the cast) inside libc's debug information.
When the inferior is restarted GDB discards all of the objfiles
associated with shared libraries, and this includes libc. As such the
'char' type, which is objfile owned, is discarded and the reference to
it from the catchpoint's condition expression becomes invalid.
Now, if it were a breakpoint instead of a catchpoint, what would
happen is that after the shared library objfiles had been discarded
we'd call the virtual breakpoint::re_set method on the breakpoint, and
this would update the breakpoint's condition expression. This is
because user breakpoints are actually instances of the code_breakpoint
class and the code_breakpoint::re_set method contains the code to
recompute the breakpoint's condition expression.
However, catchpoints are instances of the catchpoint class which
inherits from the base breakpoint class. The catchpoint class does
not override breakpoint::re_set, and breakpoint::re_set is empty!
The consequence of this is that catchpoint condition expressions are
never recomputed, and the dangling pointer to the now deleted, objfile
owned type 'char' is left around, and, when the catchpoint is hit, the
invalid pointer is used when GDB tries to evaluate the condition
expression.
In this commit I have implemented catchpoint::re_set. This is pretty
simple and just recomputes the condition expression as you'd expect.
If the condition doesn't evaluate then the catchpoint is marked as
disabled_by_cond.
I have also made breakpoint::re_set pure virtual. With the addition
of catchpoint::re_set every sub-class of breakpoint now implements the
::re_set method, and if new sub-classes are added in the future I
think that they _must_ implement ::re_set in order to avoid this
problem. As such falling back to an empty breakpoint::re_set doesn't
seem helpful.
For testing I have not relied on stopping in libc and having libc
debug information available, this doesn't seem like a good idea for
the GDB testsuite. Instead I create a (rather pointless) condition
check that uses a type defined only within a shared library. When the
inferior is restarted the catchpoint will temporarily be marked as
disabled_by_cond (due to the type not being available), but once the
shared library is loaded again the catchpoint will be re-enabled.
Without the fixes above then the same crashing behaviour can be
observed.
One point of note: the dangling pointer of course exposes undefined
behaviour, with no guarantee of a crash. Though a crash is what I
usually see I have see GDB throw random errors from the expression
evaluation code, and once, I saw no problem at all! If you recompile
GDB with the address sanitizer, or run under valgrind, then the bug
will be exposed every time.
After fixing this bug I checked bugzilla and found PR gdb/29960 which
is the same bug. I was able to reproduce the bug before this commit,
and after this commit GDB is no longer crashing.
Before:
(gdb) file /tmp/hello.x
Reading symbols from /tmp/hello.x...
(gdb) run
Starting program: /tmp/hello.x
Hello World
[Inferior 1 (process 1101855) exited normally]
(gdb) catch syscall 1
Catchpoint 1 (syscall 'write' [1])
(gdb) condition 1 write.fd == 1
(gdb) run
Starting program: /tmp/hello.x
Fatal signal: Segmentation fault
...
And after:
(gdb) file /tmp/hello.x
Reading symbols from /tmp/hello.x...
(gdb) run
Starting program: /tmp/hello.x
Hello World
Args: ( 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 )
[Inferior 1 (process 1102373) exited normally]
(gdb) catch syscall 1
Catchpoint 1 (syscall 'write' [1])
(gdb) condition 1 write.fd == 1
(gdb) r
Starting program: /tmp/hello.x
Error in testing condition for breakpoint 1:
Attempt to extract a component of a value that is not a structure.
Catchpoint 1 (call to syscall write), 0x00007ffff7eb94a7 in write ()
from /lib64/libc.so.6
(gdb) ptype write
type = <unknown return type> ()
(gdb)
Notice we get the error now when the condition fails to evaluate.
This seems reasonable given that 'write' will be a function, and
indeed the final 'ptype' shows that it's a function, not a struct.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29960
Reviewed-By: Tom de Vries <tdevries@suse.de>
On riscv64-linux, with test-case gdb.arch/riscv-tdesc-regs.exp I get:
...
(gdb) info registers fflags^M
fflags 0x0 NV:0 DZ:0 OF:0 UF:0 NX:0^M
(gdb) FAIL: gdb.arch/riscv-tdesc-regs.exp: info registers fflags
info registers frm^M
frm 0x0 FRM:0 [RNE (round to nearest; ties to even)]^M
(gdb) FAIL: gdb.arch/riscv-tdesc-regs.exp: info registers frm
...
The FAILs are produced by:
...
foreach reg {fflags frm} {
gdb_test_multiple "info registers $reg" "" {
-re "^info registers $reg\r\n" {
exp_continue
}
-wrap -re "^Invalid register `$reg`" {
fail $gdb_test_name
}
-wrap -re "^$reg\\s+\[^\r\n\]+" {
pass $gdb_test_name
}
}
}
...
The first clause is meant to consume the command.
The '^' char was updated to mean "consume command", so that clause no longer
works since it now attempts to consume the command twice.
Also, it's unnecessary because the following clauses start with ^.
Then, the second clause is unnecessary because there's a default clause
producing the FAIL.
Fix this by simplifying to:
...
foreach reg {fflags frm} {
gdb_test "info registers $reg" "^$reg\\s+\[^\r\n\]+"
}
...
Tested on riscv64-linux.
Approved-By: Andrew Burgess <aburgess@redhat.com>
We recently fixed a bug in libgcc
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115360)
where a symbol was missing a %function .type decoration.
This meant the linker would silently pick the wrong type of 'farcall
stub', involving Arm-mode instructions on Thumb-only CPUs.
This patch emits an error instead, and warns in some other cases, to
encourage users to add the missing '.type foo,%function' directive.
In practice: in arm_type_of_stub() we no longer try to infer which
stub to use if the destination is of unknown type and the CPU is
Thumb-only; so we won't lie to elf32_arm_size_stubs() which does not
check branch_type.
If branch_type is ST_BRANCH_TO_ARM but the CPU is Thumb-only, we now
convert it to ST_BRANCH_TO_THUMB only if the destination is of
absolute type. This is to support the case where the destination of
the branch is defined by the linker script (see thumb-b-lks-sym.s and
thumb-bl-lks-sym.s testcases for instance).
The motivating case is covered by the new farcall-missing-type
testcase, where we now emit an error message. We do not emit an error
when branch_type is ST_BRANCH_UNKNOWN and the CPU supports Arm-mode: a
lot of legacy code (e.g. newlib's crt0.S) lacks the corresponding
'.type foo, %function' directives and even a (too verbose) warning
could be perceived as a nuisance.
Existing testcases where such a warning would trigger:
arm-static-app.s (app_func, app_func2)
arm-rel32.s (foo)
arm-app.s (app_func)
rel32-reject.s () main)
fix-arm1176.s (func_to_branch_to)
but this list is not exhaustive.
For the sake of clarity, the patch replaces occurrences of
sym.st_target_internal = 0; with
sym.st_target_internal = ST_BRANCH_TO_ARM;
enum arm_st_branch_type is defined in include/elf/arm.h, and relies on
ST_BRANCH_TO_ARM==0, as sym.st_target_internal is also initialized to
0 in other target-independent parts of BFD code. (For instance,
swapping the ST_BRANCH_TO_ARM and ST_BRANCH_TO_THUMB entries in the
enum definition leads to 'interesting' results...)
Regarding the testsuite:
* new expected warning for thumb-b-lks-sym and thumb-bl-lks-sym
* new testcase farcall-missing-type to check the new error case
* attr-merge-arch-2b.s, branch-futures (and bfs-1.s) updated to avoid
a diagnostic
Tested on arm-eabi and arm-pe with no regression.
This script is a copy of the current script used by Sourceware's
autoregen buildbots.
It is intended as a helper to regenerate files managed by autotools
(autoconf, automake, aclocal, ....), as well as the toplevel
Makefile.in which is created by autogen.
Other files can be updated when using maintainer-mode, but this is not
covered by this script.
2024-04-19 Christophe Lyon <christophe.lyon@linaro.org>
contrib/
* autoregen.py: New script.
With test-case gdb.dwarf2/dw2-lines.exp on arm-linux, I run into:
...
(gdb) break bar_label^M
Breakpoint 2 at 0x4004f6: file dw2-lines.c, line 29.^M
(gdb) continue^M
Continuing.^M
^M
Breakpoint 2, bar () at dw2-lines.c:29^M
29 foo (2);^M
(gdb) PASS: $exp: cv=2: cdw=32: lv=2: ldw=32: continue to breakpoint: foo \(1\)
...
The pass is incorrect because the continue lands at line 29 with "foo (2)"
instead of line line 27 with "foo (1)".
A minimal version is:
...
$ gdb -q -batch dw2-lines.cv-2-cdw-32-lv-2-ldw-32 -ex "b bar_label"
Breakpoint 1 at 0x4f6: file dw2-lines.c, line 29.
...
where:
...
000004ec <bar>:
4ec: b580 push {r7, lr}
4ee: af00 add r7, sp, #0
000004f0 <bar_label>:
4f0: 2001 movs r0, #1
4f2: f7ff fff1 bl 4d8 <foo>
000004f6 <bar_label_2>:
4f6: 2002 movs r0, #2
4f8: f7ff ffee bl 4d8 <foo>
...
So, how does this happen? In short:
- skip_prologue_sal calls arm_skip_prologue with pc == 0x4ec,
- thumb_analyze_prologue returns 0x4f2
(overshooting by 1 insn, PR tdep/31981), and
- skip_prologue_sal decides that we're mid-line, and updates to 0x4f6.
However, this is a test-case about .debug_line info, so why didn't arm_skip_prologue
use the line info to skip the prologue?
The answer is that the line info starts at bar_label, not at bar.
Fixing that allows us to work around PR tdep/31981.
Likewise in gdb.dwarf2/dw2-line-number-zero.exp.
Instead, add a new test-case gdb.arch/skip-prologue.exp that is dedicated to
checking quality of architecture-specific prologue analysis, without being
written in an architecture-specific way.
If fails on arm-linux for both marm and mthumb:
...
FAIL: gdb.arch/skip-prologue.exp: f2: $bp_addr == $prologue_end_addr (skipped too much)
FAIL: gdb.arch/skip-prologue.exp: f4: $bp_addr == $prologue_end_addr (skipped too much)
...
and passes for:
- x86_64-linux for {m64,m32}x{-fno-PIE/-no-pie,-fPIE/-pie}
- aarch64-linux.
Tested on arm-linux.
MSVC 2022 is more pedantic than MSVC 2019 when it comes to loading PDB
files in readonly mode, and was rejecting PDB files generated by binutils
because of their invalid free-space bitmaps. It's unknown what would
have happened if you tried to use MS tools to modify a PDB created by
binutils, but it probably would have led to a corrupted file.
This patch fixes pdb_write_bitmap so we generate files that MSVC will accept.
Specifically there were three things we were doing wrong:
- We weren't including the superblock (block 0)
- We were setting bits in bytes backwards (MSB to LSB, rather than LSB to MSB)
- We should have been marking the contents of stream 0 as free. This is
because, as the comment says, it's intended to be used for the
directory for the previous write, to allow atomic updates.
In the patch, in order to support ymm rounding for AVX10.2, we derive
evex attribute for all cases instead of only for rc_none to encode U bit.
Also changed some bad_opcode return due to the share of U bit with APX_F.
gas/ChangeLog:
* config/tc-i386.c
(cpu_flags_match): Handle AVX10_2.
(build_evex_prefix): Handle U bit. Derive evex attribute
for all cases.
(check_VecOperands): Handle AVX10.2 and ymm roundings.
* doc/c-i386.texi: Document .avx10.2.
* testsuite/gas/i386/i386.exp: Run AVX10.2 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-rounding-intel.d: New test.
* testsuite/gas/i386/avx10_2-rounding-inval.l: Ditto.
* testsuite/gas/i386/avx10_2-rounding-inval.s: Ditto.
* testsuite/gas/i386/avx10_2-rounding.d: Ditto.
* testsuite/gas/i386/avx10_2-rounding.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-rounding-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-rounding.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-rounding.s: Ditto.
opcodes/ChangeLog:
* i386-dis.c (struct instr_info): Add U bit.
(get_valid_dis386): Handle U bit.
* i386-gen.c (isa_dependencies): Add AVX10.2.
(cpu_flags): Ditto.
* i386-init.h: Regenerated.
* i386-opc.h (CpuAVX10_2): New.
(i386_cpu_flags): Add cpuavx10_2.
* i386-opc.tbl: Add rounding to old entries which do not
permit rounding previously. Also eliminate the redundant
RegXMM for vcvtps2uqq.
* i386-tbl.h: Regenerated.
R_386_TLS_IE is used only in
movl foo@indntpoff, %eax
movl foo@indntpoff, %reg
addl foo@indntpoff, %reg
R_386_TLS_DESC_CALL and R_X86_64_TLSDESC_CALL are used only in
call *x@tlscall(%[er]ax)
* elf32-i386.c (elf_i386_check_tls_transition): Use foo@indntpoff
in comments for R_386_TLS_IE check.
(elf_i386_tls_transition): Use @tlscall in comments for
R_386_TLS_DESC_CALL check.
* elf64-x86-64.c (elf_x86_64_tls_transition): Use @tlscall in
comments for R_X86_64_TLSDESC_CALL check.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Non-default weak undefined symbols in executable and shared library are
always resolved to 0 at runtime and don't need dynamic relocation.
Tested on i686, x86-64, powerpc64le and aarch64.
PR gold/32071
* symtab.cc (Symbol::final_value_is_known): Always resolve
non-default weak undefined symbol in executable and shared library
to 0 at runtime.
* symtab.h (Symbol::needs_dynamic_reloc): Return false for
non-default weak undefined symbol in executable and shared library.
* testsuite/Makefile.am: Add weak_undef_test_3 and
weak_undef_test_4 tests.
* testsuite/Makefile.in: Regenerated.
* testsuite/weak_undef_lib_4.c: New file.
* testsuite/weak_undef_test_3.c: Likewise.
* testsuite/weak_undef_test_4.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
On riscv64-linux, I run into:
...
Expecting: ^(catch syscall[^M
]+)?((&.*)*.*~"Catchpoint 5 .*\\n".*=breakpoint-created,bkpt=\{number="5",type="catchpoint".*\}.*\n\^done[^M
]+[(]gdb[)] ^M
[ ]*)
catch syscall^M
&"catch syscall\n"^M
&"The feature 'catch syscall' is not supported on this architecture yet.\n"^M
^error,msg="The feature 'catch syscall' is not supported on this architecture yet."^M
(gdb) ^M
FAIL: gdb.mi/mi-breakpoint-changed.exp: test_insert_delete_modify: catch syscall (unexpected output)
...
Fix this by:
- factoring out proc supports_catch_syscall out of gdb.base/catch-syscall.exp,
and
- using it in gdb.mi/mi-breakpoint-changed.exp.
Tested on x86_64-linux and riscv64-linux.
Approved-By: Andrew Burgess <aburgess@redhat.com>
I spotted that we have a duplicate condition check in the function
disable_breakpoints_in_freed_objfile.
Lets remove it.
There should be no user visible changes after this commit.
Approved-By: Tom Tromey <tom@tromey.com>
Cleanup includes in dwarf2/*.
1. Add the necessary includes so that clangd reports no errors when
opening header files. This ensures that header files include what
they use.
2. Remove all includes reported as unused by clangd (except
gdb-safe-ctype.h, which I think does some magic that affects what
follows).
Built-tested --enable-threading at "yes" and "no", since there are some
portions of code gated by `#ifdef CXX_STD_THREAD`.
Change-Id: I21debffcd7c2caf90f08e1e0fbba3ce30422d042
Approved-By: Tom Tromey <tom@tromey.com>
I noticed that some gdb.ada tests used regular expressions like:
"Continuing\..*$inferior_exited_re.*" \
Here, the "\." should either be "." or "\\." -- "\." is not really
meaningful.
This patch fixes all the cases of this I could find in gdb.ada. In
one test (fun_renaming.exp), using "\\." would result in failures, and
here I rewrote the tests to use -wrap.
Approved-By: Andrew Burgess <aburgess@redhat.com>
TLS descriptor call,
call *x@tlsdesc(%rax)
or
call *x@tlsdesc(%eax)
calls _dl_tlsdesc_return which expects that RAX/EAX points to the TLS
descriptor. Update x86 linker to issue an error with or without TLS
transition.
bfd/
PR ld/32123
* elf32-i386.c (elf_i386_check_tls_transition): Move
R_386_TLS_DESC_CALL to ...
(elf_i386_tls_transition): Here.
* elf64-x86-64.c (elf_x86_64_check_tls_transition): Move.
R_X86_64_TLSDESC_CALL check to ...
(elf_x86_64_tls_transition): Here.
ld/
PR ld/32123
* testsuite/ld-i386/i386.exp: Run tlsgdesc3.
* testsuite/ld-i386/tlsgdesc3.d: New file.
* testsuite/ld-x86-64/tlsdesc5.d: Likewise.
* testsuite/ld-x86-64/x86-64.exp: Run tlsdesc5.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
The special property really only applies to the "extended" byte regs
having legacy word/dword counterparts.
While touching involved code also drop redundant byte checks from a
conditional in establish_rex(): The other remaining RegRex64 uses only
exist on registers which can't be used as register operands anyway.
Hence RegRex64 as an attribute of a (valid) register operand implies
that it's a byte reg.
While OBJ_MAYBE_ELF presently implies OBJ_ELF (due to obj-multi.h
including obj-elf.h for obscure reasons), there still need to be IS_ELF
checks to cover for the OBJ_MAYBE_ELF case. Note, however, that code
checking for ->debugging being true doesn't need such extra checks, as
the field can only ever be true when IS_ELF.
On the same basis reduce #ifdef-ary in debugging_pseudo().
Also move the field (into what on 64-bit architectures is a 32-bit gap)
and put it inside an OBJ_ELF conditional, too.
While there further switch int to bool in related code.
These ending directives are swallowed by buffer_and_nest() and hence
aren't seen by read_a_source_file(). Thus they also weren't announced to
the listing subsystem. That was, when macro expansions are included,
thus misguided to associate possible output resulting from the first
line of the construct being expanded with both the .endr and that first
line (i.e. showing it twice).
When debugging ROCm code, you might have something like this:
__global__ void kernel ()
{
...
// break here
...
}
int main ()
{
// Code to call `kernel`
}
... where kernel is a function compiled to execute on the GPU. It does
not exist in the host x86-64 program that runs the main function, and
GDB doesn't know about that function until it is called, at which point
the runtime loads the corresponding code object and GDB learns about the
code of the "kernel" function. Before the GPU code object is loaded,
from the point of view of GDB, you might as well have blank lines
instead of the "kernel" function. The DWARF in the host program doesn't
describe anything at these lines.
So, a common problem that users face is:
- Start GDB with the host binary
- Place a breakpoint by line number at the "break here" line
- At this point, GDB only knows about the host code, the lines of the
`kernel` function are a big void.
- GDB finds no code mapped to the "break here" line and searches for
the first following line that has code mapped to it.
- GDB finds that the line with the opening bracket of the `main`
function (or around there) has code mapped to it, places breakpoint
there.
- User runs the program.
- The programs hits the breakpoint at the start of main.
- User is confused, because they didn't ask for a breakpoint in main.
If they continue, the code object eventually gets loaded, GDB reads the
debug info from it, re-evaluates the breakpoint locations, and at this
point the breakpoint is placed at the expected location.
The goal of this patch is to get rid of this annoyance.
A case similar to the one shown above can actually be simulated without
GPU-specific code: using a single source file to generate a library and
an executable loading that library (see the new test
gdb.linespec/line-breakpoint-outside-function.c for an example). Before
the library is loaded, trying to place a breakpoint in the library code
results in the breakpoint "drifting" down to the main function.
To address this problem, make it so that when a user requests a
breakpoint outside a function, GDB makes a pending breakpoint, rather
than placing a breakpoint at the next line with code, which happens to
be in the next function. When the GPU kernel or shared library gets
loaded, the breakpoint resolves to a location in the kernel or library.
Note that we still want breakpoints placed inside a function to
"drift" down to the next line with code. For example, here:
9
10 void foo()
11 {
12 int x;
13
14 x++;
There is probably no code associated to lines 10, 12 and 13, but the
user can still reasonably expect to be able to put a breakpoint there.
In my experience, GCC maps the function prologue to the line with the
opening curly bracket, so the user will be able to place a breakpoint
there anyway (line 11 in the example). But I don't really see a use
case to put a breakpoint above line 10 and expect to get a breakpoint in
foo. So I think that is a reasonable behavior change for GDB.
This is implemented using the following heuristic:
- If a breakpoint is requested at line L but there is no code mapped to
L, search for a following line with associated code (this already
exists today).
- However, if:
1. the found location falls in a function symbol's block
2. the found location's address is equal the entry PC of that
function
3. the found location's line is greater that the requested line
... then we don't place a breakpoint at the found location, we will
end up with a pending breakpoint.
Change the message "No line X in file..." to "No compiled code for line
X in file...". There is clearly a line 9 in the example above, so it
would be weird to say "No line 9 in file...". What we mean is that
there is no code associated to line 9.
All the regressions that I found this patch to cause were:
1. tests specifically this behavior where placing a breakpoint before
a function results in a breakpoint on that function, in which case I
removed the tests or changed them to expect a pending breakpoint
2. linespec tests expecting things like "break -line N garbage" to
error out because of the following garbage, but we now got a
different error because line N now doesn't resolve to something
anymore. For example, before:
(gdb) break -line 3 if foofoofoo == 1
No symbol "foofoofoo" in current context.
became
(gdb) break -line 3 if foofoofoo == 1
No line 3 in the current file.
These tests were modified to refer to a valid line with code, so
that we can still test what we intended to test.
Notes:
- The CUDA compiler "solves" this problem by adding dummy function
symbols between functions, that are never called. So when you try to
insert a breakpoint in the not-yet-loaded kernel, the breakpoint
still drifts, but is placed on some dummy symbol. For reasons that
would be too long to explain here, the ROCm compiler does not do
that, and it is not a desirable option.
- You can have constructs like this:
void host_function()
{
struct foo
{
static void __global__ kernel ()
{
// Place breakpoint here
}
};
// Host code that calls `kernel`
}
The heuristic won't work then, as the breakpoint will drift somewhere
inside the enclosing function, but won't be at the start of that
function. So a bogus breakpoint location will be created on the host
side. I don't think that people are going to use this kind of
construct often though, so we can probably ignore it (or at least it
shouldn't prevent making the more common case better).
ROCm doesn't support passing a lambda kernel function to
hipLaunchKernelGGL (the function used to launch kernels on the
device), but if it eventually does, there will be the same
problem.
I think that to properly support this, we will need some DWARF
improvements to be able to say "there is really nothing at these
lines" in the line table.
Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I3cc12cfa823dc7d8e24dd4d35bced8e8baf7f9b6
In commit:
commit 3055e3d2f1
Date: Tue May 21 15:58:02 2024 +0100
gdb: add GDB side target_ops::fileio_stat implementation
I managed to place a NEWS entry in the wrong place. I put the entry
in 'Changes in GDB 15' rather than 'Changes since GDB 15'. This
commit moves the entry to the correct place.
This header file uses auto_obstack, found in gdbsupport/gdb_obstack.h.
This fixes an error shown when editing addrmap.h with clangd, and makes
it so addrmap.h includes what it uses.
Change-Id: I0b0c8d26638e2150fcb65c601098ed9df5a8945a