binutils-gdb/gdb/testsuite/gdb.base/premature-dummy-frame-removal.exp

# Copyright 2021-2022 Free Software Foundation, Inc.

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

# Commit message (dated 2019-08-29 19:37:00 +08:00):
#
# gdb: avoid premature dummy frame garbage collection
#
# Consider the following chain of events:
#
#   * GDB is performing an inferior call, and
#   * the inferior calls longjmp, and
#   * GDB detects that the longjmp has completed, stops, and enters
#     check_longjmp_breakpoint_for_call_dummy (in breakpoint.c), and
#   * GDB tries to unwind the stack in order to check that the dummy
#     frame (set up for the inferior call) is still on the stack, but
#   * the unwind fails, possibly due to missing debug information, so
#   * GDB incorrectly concludes that the inferior has longjmp'd past
#     the dummy frame, and so deletes the dummy frame, including the
#     dummy frame breakpoint, but then
#   * the inferior continues, and eventually returns to the dummy
#     frame, which is usually (always?) on the stack.  The inferior
#     starts trying to execute the random contents of the stack, which
#     results in undefined behaviour.
#
# This situation is already warned about in the comment on the
# function check_longjmp_breakpoint_for_call_dummy, where we say:
#
#   You should call this function only at places where it is safe to
#   currently unwind the whole stack.  Failed stack unwind would
#   discard live dummy frames.
#
# The warning here is fine.  The problem is that, even though we call
# the function from a location within GDB where we hope to be able to
# unwind, sometimes the state of the inferior means that the unwind
# will not succeed.
#
# This commit tries to improve the situation by adding the following
# additional check: when GDB fails to find the dummy frame on the
# stack, instead of just assuming that the dummy frame can be garbage
# collected, first find the stop_reason for the last frame on the
# stack.  If this stop_reason indicates that the stack unwinding may
# have failed then we assume that the dummy frame is still in use.
# However, if the last frame's stop_reason indicates that the stack
# unwind completed successfully then we can be confident that the
# dummy frame is no longer in use, and we garbage collect it.
#
# Tested on x86-64 GNU/Linux.
#
# gdb/ChangeLog:
#
#   * breakpoint.c (check_longjmp_breakpoint_for_call_dummy): Add
#   check for why the backtrace stopped.
#
# gdb/testsuite/ChangeLog:
#
#   * gdb.base/premature-dummy-frame-removal.c: New file.
#   * gdb.base/premature-dummy-frame-removal.exp: New file.
#   * gdb.base/premature-dummy-frame-removal.py: New file.
#
# Change-Id: I8f330cfe0f3f33beb3a52a36994094c4abada07e

# Make an inferior call to a function which uses longjmp. However,
# the backtrace for the function that is called is broken at the point
# where the longjmp is handled. This test is checking to see if the
# inferior call still completes successfully.
#
# This test forces a broken backtrace using Python, but in real life a
# broken backtrace can easily occur when calling through code for
# which there is no debug information if the prologue unwinder fails,
# which can often happen if the code has been optimized.
#
# The problem was that, due to the broken backtrace, GDB failed to
# find the inferior call's dummy frame. GDB then concluded that the
# inferior had longjmp'd backward past the dummy frame and so garbage
# collected the dummy frame, which caused the breakpoint within the
# dummy frame to be deleted.
#
# When the inferior continued, and eventually returned to the dummy
# frame, it would try to execute instructions from the dummy frame
# (which for most, or even all, targets is on the stack), and then
# experience undefined behaviour, often a SIGSEGV.

standard_testfile .c

if { [prepare_for_testing "failed to prepare" $testfile $srcfile] } {
    return -1
}

if ![runto_main] then {
    return 0
}

# Skip this test if Python scripting is not enabled.
if { [skip_python_tests] } { continue }
set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
gdb_test_no_output "source ${pyfile}" "load python file"
gdb_test "p some_func ()" " = 0"

# Commit message (dated 2021-05-26 22:50:05 +08:00):
#
# gdb: remove VALUE_FRAME_ID and fix another frame debug issue
#
# This commit was originally part of this patch series:
#
#   (v1): https://sourceware.org/pipermail/gdb-patches/2021-May/179357.html
#   (v2): https://sourceware.org/pipermail/gdb-patches/2021-June/180208.html
#   (v3): https://sourceware.org/pipermail/gdb-patches/2021-July/181028.html
#
# However, that series is being held up in review, so I wanted to
# break out some of the unrelated fixes in order to get these merged.
#
# This commit addresses two semi-related issues, both of which are
# problems exposed by using 'set debug frame on'.
#
# The first issue is in frame.c in get_prev_frame_always_1, and was
# introduced by this commit:
#
#   commit a05a883fbaba69d0f80806e46a9457727fcbe74c
#   Date: Tue Jun 29 12:03:50 2021 -0400
#
#     gdb: introduce frame_debug_printf
#
# This commit replaced fprint_frame with frame_info::to_string.
# However, the former could handle taking a nullptr while the latter,
# a member function, obviously requires a non-nullptr in order to make
# the function call.  In one place we are not guaranteed to have a
# non-nullptr, and so there is the possibility of triggering undefined
# behaviour.
#
# The second issue addressed in this commit has existed for a while in
# GDB, and would cause this assertion:
#
#   gdb/frame.c:622: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->this_id.p != frame_id_status::COMPUTING' failed.
#
# We attempt to get the frame_id for a frame while we are computing
# the frame_id for that same frame.
#
# What happens is that when GDB stops we create a frame_info object
# for the sentinel frame (frame #-1) and then we attempt to unwind
# this frame to create a frame_info object for frame #0.
#
# In the test case used here to expose the issue we have created a
# Python frame unwinder.  In the Python unwinder we attempt to read
# the program counter register.
#
# Reading this register will initially create a lazy register value.
# The frame-id stored in the lazy register value will be for the
# sentinel frame (lazy register values hold the frame-id for the frame
# from which the register will be unwound).
#
# However, the Python unwinder does actually want to examine the value
# of the program counter, and so the lazy register value is resolved
# into a non-lazy value.  This sends GDB into
# value_fetch_lazy_register in value.c.
#
# Now, inside this function, if 'set debug frame on' is in effect,
# then we want to print something like:
#
#   frame=%d, regnum=%d(%s), ....
#
# Where 'frame=%d' will be the relative frame level of the frame for
# which the register is being fetched, so, in this case we would
# expect to see 'frame=0', i.e. we are reading a register as it would
# be in frame #0.  But, remember, the lazy register value actually
# holds the frame-id for frame #-1 (the sentinel frame).
#
# So, to get the frame_info for frame #0 we used to call:
#
#   frame = frame_find_by_id (VALUE_FRAME_ID (val));
#
# Where VALUE_FRAME_ID is:
#
#   #define VALUE_FRAME_ID(val) (get_prev_frame_id_by_id (VALUE_NEXT_FRAME_ID (val)))
#
# That is, we start with the frame-id for the next frame as obtained
# by VALUE_NEXT_FRAME_ID, then call get_prev_frame_id_by_id to get the
# frame-id of the previous frame.
#
# The get_prev_frame_id_by_id function finds the frame_info for the
# given frame-id (in this case frame #-1), calls get_prev_frame to get
# the previous frame, and then calls get_frame_id.
#
# The problem here is that calling get_frame_id requires that we know
# the frame unwinder, so we then have to try each frame unwinder in
# turn, which would include the Python unwinder.... which is where we
# started, and thus we have a loop!
#
# To prevent this loop GDB has an assertion in place, which is what
# actually triggers.
#
# Solving the assertion failure is pretty easy.  If we consider the
# code in value_fetch_lazy_register and get_prev_frame_id_by_id then
# what we do is:
#
#   1. Start with a frame_id taken from a value,
#   2. Look up the corresponding frame,
#   3. Find the previous frame,
#   4. Get the frame_id for that frame, and
#   5. Look up the corresponding frame,
#   6. Print the frame's level.
#
# Notice that steps 3 and 5 give us the exact same result; step 4 is
# just wasted effort.  We could shorten this process such that we drop
# steps 4 and 5, thus:
#
#   1. Start with a frame_id taken from a value,
#   2. Look up the corresponding frame,
#   3. Find the previous frame,
#   6. Print the frame's level.
#
# This will give the exact same frame as a result, and this is what I
# have done in this patch by removing the use of VALUE_FRAME_ID from
# value_fetch_lazy_register.
#
# Out of curiosity I looked to see how widely VALUE_FRAME_ID was used,
# and saw it was only used in one other place, in
# valops.c:value_assign, where, once again, we take the result of
# VALUE_FRAME_ID and pass it to frame_find_by_id, thus introducing a
# redundant frame_id lookup.
#
# I don't think the value_assign case risks triggering the assertion
# though, as we are unlikely to call value_assign while computing the
# frame_id for a frame.  However, we could make value_assign slightly
# more efficient, with no real additional complexity, by removing the
# use of VALUE_FRAME_ID.
#
# So, in this commit, I completely remove VALUE_FRAME_ID, and replace
# it with a use of VALUE_NEXT_FRAME_ID, followed by a direct call to
# get_prev_frame_always.  This should make no difference in either
# case, and resolves the assertion issue from value.c.
#
# As I said, this patch was originally part of another series, and
# the original test relied on the fixes in that original series.
# However, I was able to create an alternative test for this issue by
# enabling frame debug within an existing test script.
#
# This commit probably fixes bug PR gdb/27938, though the bug doesn't
# have a reproducer attached so it is not possible to know for sure.
#
# Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27938

# When frame debugging is turned on, this test has (previously)
# revealed some crashes due to the Python frame unwinder trying to
# read registers.
#
# Enable frame debug and rerun the test. We don't bother checking the
# output of calling 'p some_func ()' as the output will be full of
# debug, the format of which isn't fixed. All we care about is that
# GDB is still running afterwards.
#
# All of the debug output makes this really slow when testing with the
# special read1 version of expect, hence the timeout factor.
with_read1_timeout_factor 10 {
    gdb_test_no_output "set debug frame on"
    gdb_test "p some_func ()" ".*" \
        "repeat p some_func () with frame debug on"
    gdb_test_no_output "set debug frame off"
}
gdb_test "p 1 + 2 + 3" " = 6"