Even when an EVEX encoding is available, use of such a prefix ought to
be respected (resulting in an error) rather than ignored. As requested
during review already, introduce a new encoding enumerator to record use
of eGPR-s, and update state transitions accordingly.
The optimize_encoding() change also addresses an internal assembler
error that was previously raised when respective memory operands used
eGPR-s for addressing.
While this results in a change of diagnostic issued for VEX-encoded
insns, the new one is at least no worse than the prior one.
Today I realized that while the .debug_names writer uses DW_FORM_udata
for the DIE offset, DW_FORM_ref_addr would be more appropriate here.
This patch makes this change.
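For illustration only, here is a minimal Python sketch of how the two forms differ in encoding. DW_FORM_udata is a ULEB128 value, so its length varies with the magnitude of the offset, whereas DW_FORM_ref_addr is a fixed-size offset. The sketch assumes 32-bit DWARF and little-endian byte order; the helper names are hypothetical and are not taken from gdb.

```python
import struct


def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 (the DW_FORM_udata encoding)."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_ref_addr_32(offset):
    """Encode an offset as 4 fixed bytes (assumes 32-bit DWARF, little-endian)."""
    return struct.pack("<I", offset)


if __name__ == "__main__":
    for offset in (0x7F, 0x80, 624485):
        print(hex(offset),
              encode_uleb128(offset).hex(),
              encode_ref_addr_32(offset).hex())
```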
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31361
Say why we even mention shared libraries here (ET_DYN), and clarify
symbol resolution. There are of course many other ways that PIEs
resemble PDEs more closely than shared libraries.
PR 19871
* ld.texi (-pie): Clarify.
This patch copies some changes to the compile headers from GCC's
include/ directory. It is the gdb equivalent of the GCC commit
bc0e18a9 -- however, while that commit also necessarily touched
libcc1, this one of course does not.
Tested by rebuilding and also running the gdb.compile tests.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31397
When running test-case gdb.dap/pause.exp 100 times in a loop, it passes
100/100.
But if we remove the two "sleep 0.2" from the test-case, we run into
(copied from dap.log and edited for readability):
...
Traceback (most recent call last):
File "startup.py", line 251, in message
def message():
KeyboardInterrupt
Quit
...
This happens as follows.
CancellationHandler.cancel calls gdb.interrupt to cancel a request in flight.
The idea is that this interrupt triggers while in fn here in message (a nested
function of send_gdb_with_response):
...
def message():
try:
val = fn()
result_q.put(val)
except (Exception, KeyboardInterrupt) as e:
result_q.put(e)
...
but instead it triggers outside the try/except.
Fix this by:
- in CancellationHandler, renaming variable in_flight to in_flight_dap_thread,
and adding a variable in_flight_gdb_thread to be able to distinguish when
a request is in flight in the dap thread or the gdb thread.
- adding a wrapper Cancellable to deal with cancelling the wrapped
event
- using Cancellable in send_gdb and send_gdb_with_response to wrap the posted
event
- in CancellationHandler.cancel, only call gdb.interrupt if
req == self.in_flight_gdb_thread.
This makes the test-case pass 100/100, also when adding the extra stressor of
"taskset -c 0", which makes the failure more likely without the patch.
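The bookkeeping described above can be sketched roughly as follows. This is a toy model, not the code in gdb's DAP implementation: the names CancellationHandler, Cancellable, in_flight_dap_thread and in_flight_gdb_thread come from the description above, while the constructor arguments, the lock and the interrupt callback (a stand-in for gdb.interrupt) are assumptions made for the example.

```python
import threading


class CancellationHandler:
    """Toy model of the bookkeeping; not gdb's actual implementation."""

    def __init__(self, interrupt):
        self.lock = threading.Lock()
        # Request currently being handled by the DAP thread.
        self.in_flight_dap_thread = None
        # Request whose event is currently running on the gdb thread.
        self.in_flight_gdb_thread = None
        # Hypothetical stand-in for gdb.interrupt.
        self.interrupt = interrupt

    def cancel(self, req):
        """Interrupt only if REQ is running on the gdb thread right now."""
        with self.lock:
            if req is not None and req == self.in_flight_gdb_thread:
                self.interrupt()
                return True
            return False


class Cancellable:
    """Wrap FN so that an interrupt can only land inside the try/except."""

    def __init__(self, handler, req, fn):
        self.handler = handler
        self.req = req
        self.fn = fn

    def __call__(self):
        try:
            with self.handler.lock:
                self.handler.in_flight_gdb_thread = self.req
            try:
                return self.fn()
            finally:
                with self.handler.lock:
                    self.handler.in_flight_gdb_thread = None
        except (Exception, KeyboardInterrupt) as e:
            # The caller receives the exception as a value, as in message().
            return e
```

The point of the design is that cancel becomes a no-op unless the request is marked as running on the gdb thread, and that mark is only set inside the region guarded by the try/except.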
Tested on aarch64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
PR dap/31275
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31275
Move functions send_gdb and send_gdb_with_response, as well as class Invoker
to server module.
Separated out to make the following patch easier to read.
Tested on aarch64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
Commit 92d48a1e4e ("Add an arm-tls feature which includes the tpidruro
register from CP15.") introduced the org.gnu.gdb.arm.tls feature, which
adds the tpidruro register, and unconditionally enabled it in
aarch32_create_target_description.
In Linux, the tpidruro register isn't available via ptrace in the 32-bit
kernel but it is available for an aarch32 program running under an arm64
kernel via the ptrace compat interface. This isn't currently implemented
however, which causes GDB on arm-linux with 64-bit kernel to list the
register but show it as unavailable, as reported by Tom de Vries:
$ gdb -q -batch a.out -ex start -ex 'p $tpidruro'
Temporary breakpoint 1 at 0x512
Temporary breakpoint 1, 0xaaaaa512 in main ()
$1 = <unavailable>
Simon Marchi then clarified:
> The only time we should be seeing some "unavailable" registers or memory
> is in the context of tracepoints, for things that are not collected.
> Seeing an unavailable register here is a sign that something is not
> right.
Therefore, disable the TLS feature in aarch32 target descriptions for Linux
and NetBSD targets (the latter doesn't seem to support accessing
tpidruro either, based on a quick look at arm-netbsd-nat.c).
This patch fixes the following tests:
Running gdb.base/inline-frame-cycle-unwind.exp ...
FAIL: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 3: backtrace when the unwind is broken at frame 3
FAIL: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 5: backtrace when the unwind is broken at frame 5
FAIL: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 1: backtrace when the unwind is broken at frame 1
Tested with Ubuntu 22.04.3 on armv8l-linux-gnueabihf in native,
native-gdbserver and native-extended-gdbserver targets with no regressions.
PR tdep/31418
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31418
Approved-By: John Baldwin <jhb@FreeBSD.org>
Only relocation handling for now; relaxation is not implemented yet.
bfd/
* elfnn-riscv.c (riscv_elf_check_relocs): Record GOT reference and
paired relocation for TLSDESC_HI20.
(riscv_elf_adjust_dynamic_symbol): Allocate GOT and reloc slots for
TLSDESC symbols.
(riscv_elf_size_dynamic_sections): Likewise but for local symbols.
(tlsdescoff): New helper to determine static addend for R_TLSDESC.
(riscv_elf_relocate_section): Ignore TLSDESC_CALL reloc for now (it is
relaxation only).
Handle TLSDESC_{LOAD,ADD}_LO12 as paired pcrel relocs.
For TLS GOT slot generation, generalize the logic to handle any
combination of (GD, IE, TLSDESC).
Add TLSDESC Rela generation.
* ld/testsuite/ld-riscv-elf/tls*: Add TLSDESC instruction sequences
next to the existing GD and IE sequences. Update expectations.
As the size calculation is split by global and local symbols, using a
shared constant definition for its size improves clarity.
bfd/
* elfnn-riscv.c: Add macros for sizes of a normal GOT entry, TLS GD and
TLS IE entry.
(allocate_dynrelocs): Replace GOT size expressions with the new
constants.
(riscv_elf_size_dynamic_sections): Likewise.
(riscv_elf_relocate_section): Likewise.
gas/
* tc-riscv.c (percent_op_*): Add support for %tlsdesc_hi,
%tlsdesc_load_lo, %tlsdesc_add_lo and %tlsdesc_call. percent_op_rtype
renamed to percent_op_relax_only as this matcher is extended to handle
jalr as well which is not R-type.
(riscv_ip): Apply the percent_op_relax_only rename and update comment.
(md_apply_fix): Add TLSDESC_* to relaxable list. Add TLSDESC_HI20 to
TLS relocation check list.
* testsuite/gas/riscv/tlsdesc.*: New test cases for TLSDESC relocation
generation.
opcodes/
* riscv-opc.c (riscv_opcodes): Add a new syntax for jalr with
%tlsdesc_call annotations.
Change the statement "use bignum" to "use bigint". This is sufficient
for gp-display-html to work and removes the dependency on bignum.
gprofng/ChangeLog
2024-02-27 Ruud van der Pas <ruud.vanderpas@oracle.com>
PR 31390
* gprofng/gp-display-html: One line change to "use bigint".
Catching this at configure time would be nicer, but we'd need to exactly
match mips_parse_cpu in configure.ac and keep it all in sync.
PR 23877
* config/tc-mips.c (mips_after_parse_args): Don't assert that
mips_parse_cpu returns non-NULL, call as_fatal with an informative
message instead.
gdb.interrupt was introduced to implement DAP request cancellation.
However, because it can be run from another thread, and because I
didn't look deeply enough at the implementation, it turns out to be
racy.
The fix here is to lock accesses to certain globals in extension.c.
Note that this won't work in the case where configure detects that the
C++ compiler doesn't provide thread support. This version of the
patch disables DAP entirely in this situation.
Regression tested on x86-64 Fedora 38. I also ran gdb.dap/pause.exp
in a thread-sanitizer build tree to make sure the reported race is
gone.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31263
The PR testcase overflows one of the exec header fields, e_syms (the
size of the symbol table), leading to the string table offset being
wrong. Things go downhill from there. Fixed by checking for
overflow. This happens to trigger in the ld testsuite, so xfail that
test.
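As a rough illustration of the kind of check involved, the following Python sketch refuses to write a value that does not fit in a fixed-width header field instead of silently truncating it. The function names and the 16-bit width used in the usage example are assumptions for the example; the real field widths depend on the a.out variant.

```python
def fits_unsigned(value, bits):
    """Return True if VALUE is representable as an unsigned BITS-wide field."""
    return 0 <= value < (1 << bits)


def pack_header_field(value, bits, byteorder="little"):
    """Serialize VALUE into a BITS-wide field, failing on overflow."""
    if not fits_unsigned(value, bits):
        raise OverflowError(
            "value %#x does not fit in a %d-bit header field" % (value, bits))
    return value.to_bytes(bits // 8, byteorder)


if __name__ == "__main__":
    print(pack_header_field(0x1234, 16).hex())
    try:
        pack_header_field(0x10000, 16)
    except OverflowError as e:
        print("rejected:", e)
```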
PR 23881
bfd/
* libaout.h (swap_exec_header_out): Return a bool.
* aoutx.h (swap_exec_header_out): Check for overflow in exec
header.
* pdp11.c (swap_exec_header_out): Likewise.
* i386lynx.c (WRITE_HEADERS): Adjust.
ld/
* testsuite/ld-scripts/map-address.exp: xfail pdp11.
While working on a different patch, I found a couple of simple addrmap
cleanups.
In one case, a forward declaration is no longer needed, as the header
now includes addrmap.h.
In the other, an include of addrmap.h is no longer needed.
Tested by rebuilding.
This changes the DAP code to explicitly request that gdb exit.
Previously this could cause crashes, but with the previous cleanups,
this should no longer happen.
This also adds a test that ensures that gdb exits with status 0.
This changes run-on-main-thread.c to clear 'runnables' in a final
cleanup. This avoids an issue where a pending runnable could require
Python, but be run after the Python interpreter was finalized.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31172
Right now, Python is shut down via a final cleanup. However, it seems
to me that it is better for extension languages to be shut down
explicitly, after all the ordinary final cleanups are run. The main
reason for this is that a subsequent patch adds another case like
finalize_values; and rather than add a series of workarounds for
Python shutdown, it seemed better to let these be done via final
cleanups, and then have Python shutdown itself be the special case.
Tom de Vries pointed out that the gdb.dap/pause.exp test writes a
Python file but then does not use it. This patch corrects the
oversight.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31354
Reviewed-By: Tom de Vries <tdevries@suse.de>
The "python" command (and the Python implementation of the gdb
"source" command) does not handle Python exceptions in the same way as
other gdb-facing Python code. In particular, exceptions are turned
into a generic error rather than being routed through
gdbpy_handle_exception, which takes care of converting to 'quit' as
appropriate.
I think this was done this way because PyRun_SimpleFile and friends do
not propagate the Python exception -- they simply indicate that one
occurred.
This patch reimplements these functions to respect the general gdb
convention here. As a bonus, some Windows-specific code can be
removed, as can the _execute_file function.
The bulk of this change is tweaking the test suite to match the new
way that exceptions are displayed. These changes are largely
uninteresting. However, it's worth pointing out the py-error.exp
change. Here, the failure changes because the test changes the host
charset to something that isn't supported by Python. This then
results in a weird error in the new setup.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31354
Acked-By: Tom de Vries <tdevries@suse.de>
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
This patch adds a new function, read_remainder_of_file. This is like
read_text_file_to_string, but reads from an existing 'FILE *'. This
will be used in a subsequent patch.
Reviewed-By: Tom de Vries <tdevries@suse.de>
Say we do:
...
$ make check RUNTESTFLAGS="gdb.dap/ada-nested.exp gdb.dap/pause.exp"
...
and add a perror at the end of pause.exp:
...
dap_shutdown
+
+perror "foo"
...
We run into:
...
UNRESOLVED: gdb.dap/ada-nested.exp: compilation prog.adb
...
This happens because the perror increases the errcnt, which is not reset at
the end of the test-case, and consequently the first pass in the following
test-case is changed into an unresolved.
Version 1.6.3 of dejagnu contains a fix which produces an unresolved at the
end of the test-case, which does reset the errcnt, but the problem above was
observed with version 1.6.1.
Furthermore, we reset the errcnt in clean_restart, but the pass is produced
before, so that doesn't help either.
Fix this by resetting errcnt and warncnt in default_gdb_init.
Tested on x86_64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
PR testsuite/31351
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31351
PR ada/30908 turns out to be a duplicate of PR ada/12607, which was fixed by
commit d56fdf1b97 ("Refine Ada overload matching").
Remove the KFAILs for PR ada/30908.
Tested on x86_64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30908
With test-case gdb.python/py-finish-breakpoint.exp, we run into:
...
(gdb) python print (finishbp_default.hit_count)
Traceback (most recent call last):
File "<string>", line 1, in <module>
RuntimeError: Breakpoint 3 is invalid.
Error while executing Python code.
(gdb) PASS: gdb.python/py-finish-breakpoint.exp: normal conditions: \
check finishBP on default frame has been hit
...
The test producing the pass is:
...
gdb_test "python print (finishbp_default.hit_count)" "1.*" \
"check finishBP on default frame has been hit"
...
so the pass is produced because the 1 in "line 1" matches "1.*".
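The accidental match can be reproduced outside the testsuite. The sketch below uses Python's re.search as a stand-in for the unanchored match done by gdb_test (an approximation, since gdb_test uses Tcl regular expressions and also matches the prompt); the stricter pattern is only an illustration and is not what the fix uses.

```python
import re

# Output copied from the log above.
output = (
    "Traceback (most recent call last):\n"
    '  File "<string>", line 1, in <module>\n'
    "RuntimeError: Breakpoint 3 is invalid.\n"
    "Error while executing Python code.\n"
)

# The pattern used by the test: matches any "1" anywhere in the output.
loose = re.compile(r"1.*")
# Illustrative stricter pattern: a line consisting only of "1".
strict = re.compile(r"^1$", re.MULTILINE)

if __name__ == "__main__":
    print("loose matches error output:", loose.search(output) is not None)
    print("strict matches error output:", strict.search(output) is not None)
```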
Temporary breakpoints are removed when hit, and consequently accessing the
hit_count attribute of a temporary python breakpoint (gdb.Breakpoint class) is
not possible, and as per spec we get a RuntimeError.
So the RuntimeError is correct, and not specific to finish breakpoints.
The test presumably attempts to match:
...
(gdb) python print (finishbp_default.hit_count)
1
...
but most likely this output was never produced by any gdb version.
Fix this by checking whether the finishbp_default breakpoint has hit by
checking that finishbp_default.is_valid() is False.
Tested on aarch64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
PR testsuite/31391
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31391
gdb.base/interrupt.exp reveals that inferior input is
broken on Cygwin:
(gdb) continue
Continuing.
talk to me baby
Input/output error <<< BAD
PASS: gdb.base/interrupt.exp: process is alive
a
[Thread 10688.0x2590 exited with code 1]
[Thread 10688.0x248c exited with code 1]
[Thread 10688.0x930 exited with code 1]
[Thread 10688.0x2c98 exited with code 1]
Program terminated with signal SIGHUP, Hangup.
The program no longer exists.
(gdb) FAIL: gdb.base/interrupt.exp: child process ate our char
a
Ambiguous command "a": actions, add-auto-load-safe-path, add-auto-load-scripts-directory, add-inferior...
(gdb) ERROR: "" is not a unique command name.
The problem is that inflow.c:child_terminal_inferior is failing to put
the inferior in the foreground, because we're passing down the
inferior's Windows PID instead of the Cygwin PID to Cygwin tcsetpgrp.
That is fixed by this commit. Afterwards we will get:
(gdb) continue
Continuing.
talk to me baby
PASS: gdb.base/interrupt.exp: process is alive
a
a <<< GOOD
PASS: gdb.base/interrupt.exp: child process ate our char
[New Thread 7236.0x1c58]
Thread 6 received signal SIGINT, Interrupt. <<< new thread spawned for SIGINT
[Switching to Thread 7236.0x1c58]
0x00007ffa6643ea6b in TlsGetValue () from /cygdrive/c/Windows/System32/KERNELBASE.dll
(gdb) FAIL: gdb.base/interrupt.exp: send_gdb control C
We still have the FAIL seen above because this change has another
consequence. By failing to put the inferior in the foreground
correctly, Ctrl-C was always reaching GDB first. Now that the
inferior is put in the foreground properly, Ctrl-C reaches the
inferior first, which results in a Windows Ctrl-C event, which results
in Windows injecting a new thread in the inferior to report the Ctrl-C
exception => SIGINT. That is, running the test manually:
Before patch:
(gdb) c
Continuing.
[New Thread 2352.0x1f5c]
[New Thread 2352.0x1988]
[New Thread 2352.0x18cc]
<<< Ctrl-C pressed.
Thread 7 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 2352.0x18cc]
0x00007ffa68930b11 in ntdll!DbgBreakPoint () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
Above, GDB got the SIGINT, and it manually passes it down to the
inferior, which reaches windows_nat_target::interrupt(), which
interrupts the inferior with DebugBreakProcess (which injects a new
thread in the inferior that executes an int3 instruction).
After this patch, we have (with "set debugexceptions on" so
DBG_CONTROL_C is visible):
(gdb) c
Continuing.
[New Thread 9940.0x1168]
[New Thread 9940.0x5f8]
gdb: Target exception MS_VC_EXCEPTION at 0x7ffa6638cf19
gdb: Target exception MS_VC_EXCEPTION at 0x7ffa6638cf19
[New Thread 9940.0x3d8]
gdb: Target exception DBG_CONTROL_C at 0x7ffa6643ea6b <<< Ctrl-C
Thread 7 received signal SIGINT, Interrupt. <<< new injected thread
[Switching to Thread 9940.0x3d8]
0x00007ffa6643ea6b in TlsGetValue () from /cygdrive/c/Windows/System32/KERNELBASE.dll
(gdb)
This new behavior is exactly the same as what you see with a MinGW GDB
build. Also, SIGINT reaching inferior first is what you get on Linux
as well currently.
I wrote an initial fix for this before I discovered that Cygwin
downstream had a similar change, so I then combined the patches. I am
adding a Co-Authored-By for that reason.
Co-Authored-By: Takashi Yano <takashi.yano@nifty.ne.jp>
Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: I3a8c3355784c6a817dbd345ba9dac24be06c4b3f
Since we are accessing up to 2 bytes before the relocation target, we
had better make sure there are actually 2 bytes before it.
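The shape of the check can be sketched as follows. This is a Python illustration with hypothetical names, not the code in elf_s390_relocate_section; it only shows why the offset has to be compared against the number of bytes read before it.

```python
def bytes_before(contents, r_offset, count=2):
    """Return the COUNT bytes preceding R_OFFSET, refusing to read out of bounds."""
    if r_offset < count or r_offset > len(contents):
        raise ValueError(
            "r_offset %d leaves fewer than %d bytes before the target"
            % (r_offset, count))
    return contents[r_offset - count:r_offset]


if __name__ == "__main__":
    section = bytes([0xC0, 0x08, 0x00, 0x00, 0x00, 0x00])
    print(bytes_before(section, 2).hex())
    try:
        bytes_before(section, 1)
    except ValueError as e:
        print("rejected:", e)
```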
ChangeLog:
* bfd/elf64-s390.c (elf_s390_relocate_section): Make sure
rel->r_offset is large enough.
The arc-analyze-prologue.S test does not contain debug information, thus
it must be compiled without the -g option. Otherwise GDB will try to
unwind frames using debug information (which does not exist for .S
code!) instead of analyzing frames manually.
Approved-By: Shahab Vahedi <shahab@synopsys.com>
Replace relative long addressing instructions of weak symbols, which
will definitely resolve to zero, with either a load address of 0, a
NOP, or a trapping insn.
This prevents the PC32DBL relocation from overflowing in case the
binary will be loaded at 4GB or more.
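To make the overflow concrete, here is a small Python sketch of the range check for a PC32DBL-style relocation, which stores the PC-relative distance in halfwords as a signed 32-bit value. The function name and the addresses used are assumptions for the example; an undefined weak symbol is modelled as address 0.

```python
def pc32dbl_fits(symbol, addend, place):
    """Check whether (symbol + addend - place) / 2 fits in a signed 32-bit field."""
    delta = symbol + addend - place
    if delta % 2:
        raise ValueError("PC-relative distance must be halfword aligned")
    halfwords = delta // 2
    return -(1 << 31) <= halfwords < (1 << 31)


if __name__ == "__main__":
    weak_undef = 0  # an undefined weak symbol resolves to zero
    for place in (0x1000, 8 << 30):
        print(hex(place), pc32dbl_fits(weak_undef, 2, place))
```

With the referencing instruction placed far enough above address zero, no displacement can reach the symbol, which is why the instruction itself has to be rewritten.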
bfd/ChangeLog:
* bfd/elf64-s390.c (elf_s390_relocate_section): Replace
instructions using undefined weak symbols with relative addressing
to avoid relocation overflows.
ld/ChangeLog:
* ld/testsuite/ld-s390/s390.exp:
* ld/testsuite/ld-s390/8GB.ld: New test.
* ld/testsuite/ld-s390/weakundef-1.dd: New test.
* ld/testsuite/ld-s390/weakundef-1.s: New test.
Hi,
Commits af1bd77 and 3f4ff08 introduced the Pointer Authentication feature with internal names that don't match the actual feature name pauth. The new feature PAuth_LR introduced in Armv9.5-A is an extension of the PAuth feature of Armv8.3-A. Using a different naming for it not based on the formerly "PAC" would create confusion.
Regression tested on aarch64-none-elf, and no regression found.
Ok for binutils-master? I don't have commit access so I need someone to commit on my behalf.
Regards,
Matthieu.
From 58b38358b2788939d81f2df7f5fb4c64a31ae06e Mon Sep 17 00:00:00 2001
From: Matthieu Longo <matthieu.longo@arm.com>
Date: Fri, 23 Feb 2024 11:30:40 +0000
Subject: [PATCH] aarch64: rename internals related to PAuth feature to use
pauth in their naming for coherency
The relsec size is still increased although sec is discarded, which
causes a lot of unused space to be allocated. Avoid increasing the size
if sec was discarded.
bfd/ChangeLog:
* bfd/elfnn-loongarch.c (allocate_dynrelocs): Do not increase
sreloc size when discarded_section.
ld/ChangeLog:
* ld/testsuite/ld-loongarch-elf/ld-loongarch-elf.exp: Add test.
* ld/testsuite/ld-loongarch-elf/pie_discard.d: New test.
* ld/testsuite/ld-loongarch-elf/pie_discard.s: New test.
* ld/testsuite/ld-loongarch-elf/pie_discard.t: New test.
Add reloc_unsign_bits() to fix the overflow check of the other sop_pop relocs.
Then add over/underflow tests for relocs B*, SOP_POP* and PCREL20_S2.
bfd/ChangeLog:
* bfd/elfxx-loongarch.c: Add reloc_unsign_bits().
ld/ChangeLog:
* ld/testsuite/ld-loongarch-elf/ld-loongarch-elf.exp: Add tests.
* ld/testsuite/ld-loongarch-elf/abi1_max_imm.dd: New test.
* ld/testsuite/ld-loongarch-elf/abi1_max_imm.s: New test.
* ld/testsuite/ld-loongarch-elf/abi1_sops.s: New test.
* ld/testsuite/ld-loongarch-elf/abi2_max_imm.s: New test.
* ld/testsuite/ld-loongarch-elf/abi2_overflows.s: New test.
* ld/testsuite/ld-loongarch-elf/max_imm_b16.d: New test.
* ld/testsuite/ld-loongarch-elf/max_imm_b21.d: New test.
* ld/testsuite/ld-loongarch-elf/max_imm_b26.d: New test.
* ld/testsuite/ld-loongarch-elf/max_imm_pcrel20.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_b16.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_b21.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_b26.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_pcrel20.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_s_0_10_10_16_s2.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_s_0_5_10_16_s2.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_s_10_12.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_s_10_16.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_s_10_16_s2.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_s_10_5.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_s_5_20.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_u.d: New test.
* ld/testsuite/ld-loongarch-elf/overflow_u_10_12.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_b16.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_b21.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_b26.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_pcrel20.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_s_0_10_10_16_s2.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_s_0_5_10_16_s2.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_s_10_12.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_s_10_16.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_s_10_16_s2.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_s_10_5.d: New test.
* ld/testsuite/ld-loongarch-elf/underflow_s_5_20.d: New test.
R_LARCH_IRELATIVE: For dynamic relocation that does not distinguish between
32/64 bits, size and bitsize set to 8 and 64.
R_LARCH_TLS_DESC64: Change size to 8.
R_LARCH_SOP_POP_32_S_0_5_10_16_S2: Change src_mask to 0, dst_mask to
0x03fffc1f.
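The dst_mask value can be cross-checked with a short Python computation. This assumes the relocation name encodes a 5-bit field starting at bit 0 and a 16-bit field starting at bit 10; field_mask is a hypothetical helper, not a bfd function.

```python
def field_mask(start, width):
    """Return a mask with WIDTH bits set, beginning at bit START."""
    return ((1 << width) - 1) << start


# Assumed layout for S_0_5_10_16: bits [4:0] and bits [25:10].
dst_mask = field_mask(0, 5) | field_mask(10, 16)

if __name__ == "__main__":
    print(hex(dst_mask))
```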
I noticed a spot in ada-lang.c where the return value of
value_as_address was cast to CORE_ADDR -- a no-op cast. I searched
and found another. This patch fixes both.
PR gdb/31259 reveals one scenario where we run into a
heap-use-after-free reported by thread sanitizer, while running
gdb.base/vfork-follow-parent.exp.
The heap-use-after-free happens during the following scenario:
- linux_nat_wait_1 is about to return an event for T2. It stops all
other threads, and while doing so, stop_wait_callback -> wait_lwp
sees T1 exit, and decides to leave the exit event pending. It
should have set the lp->stopped flag too, but does not -- this is
the bug.
- The event for T2 is reported, is processed by infrun, and we're
back at linux_nat_wait_1.
- linux_nat_wait_1 selects LWP T1 with the pending exit status to
report.
- it sets variable lp to point to the corresponding lwp_info.
- it calls stop_callback and stop_wait_callback for all threads
(because !target_is_non_stop_p ()).
- it calls select_event_lwp to maybe pick another thread than T1, to
prevent starvation.
The problem is the following:
- while calling stop_wait_callback for all threads, it also does this
for T1. While doing so, the corresponding lwp_info is deleted
(callstack stop_wait_callback -> wait_lwp -> exit_lwp ->
delete_lwp), leaving variable lp as a dangling pointer.
- variable lp is passed to select_event_lwp, which dereferences it,
which causes the heap-use-after-free.
Note that the comment here mentions "all other LWP's":
...
/* Now stop all other LWP's ... */
iterate_over_lwps (minus_one_ptid, stop_callback);
/* ... and wait until all of them have reported back that
they're no longer running. */
iterate_over_lwps (minus_one_ptid, stop_wait_callback);
...
The reason the comment says "all other LWP's", and doesn't bother
filtering out LP is that lp->stopped should be true at this point, and
the callbacks (both stop_callback and stop_wait_callback) check that
flag, and do nothing if set. I.e., they skip already-stopped threads,
so they should skip LP.
In this particular scenario, though, we missed setting the stopped
flag right in the first step described above, so LP was iterated over
incorrectly.
The fix is to make wait_lwp set the lp->stopped flag when it decides
to leave the exit event pending. However, going a bit further,
gdbserver has a mark_lwp_dead function to centralize setting up
various lwp flags such that the rest of the code doesn't mishandle
them, and it seems like a good idea to do a similar thing in gdb as
well. That is what this patch does.
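The effect of the missing flag can be shown with a toy model in Python. This is not gdb code: the Lwp class, the pending_exit attribute and stop_wait_all are simplified stand-ins for lwp_info, the pending exit status and the stop_wait_callback iteration, and mark_lwp_dead here is only an analogue of the gdbserver function of the same name.

```python
class Lwp:
    """Simplified stand-in for lwp_info."""

    def __init__(self, name):
        self.name = name
        self.stopped = False
        self.pending_exit = False


def mark_lwp_dead(lwp):
    """Record the exit event and set every flag that goes with it."""
    lwp.pending_exit = True
    lwp.stopped = True


def stop_wait_all(lwps):
    """Toy analogue of iterating stop_wait_callback over all LWPs."""
    for lwp in list(lwps):
        if lwp.stopped:
            continue  # already-stopped threads are skipped
        if lwp.pending_exit:
            lwps.remove(lwp)  # stands in for exit_lwp -> delete_lwp
        else:
            lwp.stopped = True


if __name__ == "__main__":
    # Buggy path: exit left pending without setting the stopped flag.
    t1, t2 = Lwp("T1"), Lwp("T2")
    lwps = [t1, t2]
    t1.pending_exit = True
    stop_wait_all(lwps)
    print("T1 still tracked (buggy):", t1 in lwps)

    # Fixed path: all flags are set in one place.
    t1, t2 = Lwp("T1"), Lwp("T2")
    lwps = [t1, t2]
    mark_lwp_dead(t1)
    stop_wait_all(lwps)
    print("T1 still tracked (fixed):", t1 in lwps)
```

Centralizing the flag updates in one helper means a caller that still holds a reference to the selected LWP cannot have it deleted underneath it by the iteration.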
PR gdb/31259
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259
Co-Authored-By: Tom de Vries <tdevries@suse.de>
Change-Id: I4a6169976f89bf714c478cbb2b7d4c32365e62a9