gdb: unbuffer all input streams when not using readline
This commit should fix PR gdb/28711. What's actually going on is
pretty involved, and there's still a bit of the story that I don't
understand completely, however, from my observed results, I think that
the change I propose making here (or something very similar) is going
to be needed.
The original bug report involves using eclipse to drive gdb using mi
commands. A separate tty is spun off in which to send gdb the mi
commands, this tty is created using the new-ui command.
The behaviour observed is that, given a particular set of mi commands
being sent to gdb, we sometimes see an ESPIPE error from a lseek
call, which ultimately results in gdb terminating.
The problems all originate from gdb_readline_no_editing_callback in
gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
allow glibc to perform buffering on the FILE object being used.
I say sometime, because, gdb_readline_no_editing_callback already
includes a call to disable the glibc buffering, but this is only done
if the input stream is not a tty. In our case the input stream is a
tty, so the buffering is left in place.
The first step to understanding why this problem occurs is to
understand that eclipse sends multiple commands to gdb very quickly
without waiting for and answer to each command, eclipse plans to
collect all of the command results after sending all the commands to
gdb. In fact, eclipse sends the commands to gdb that they appear to
arrive in the gdb process as a single block of data. When reproducing
this issue within the testsuite I find it necessary to send multiple
commands using a single write call.
The next bit of the story gets a little involved, and this is where my
understanding is not complete. I can describe the behaviour that I
observe, and (for me at least) I'm happy that what I'm seeing, if a
little strange, is consistent. In order to fully understand what's
going on I think I would likely need to dive into kernel code, which
currently seems unnecessary given that I'm happy with the solution I'm
proposing.
The following description all relates to input from a tty in which I'm
not using readline. I see the same problems either when using a
new-ui tty, or with gdb's standard, non-readline, mi tty.
Here's what I observe happening when I send multiple commands to gdb
using a single write, if I send gdb this:
command_1\ncommand_2\ncommand_3
then gdb's event loop will wake up (from its select) as it sees there
is input available. We call into gdb_readline_no_editing_callback,
where we call fgetc, glibc will do a single big read, and get back
just:
command_1\n
that is, despite there being multiple lines of input available, I
consistently get just a single line. From glibc a single character is
returned from the fgetc call, and within gdb we accumulate characters,
one at a time, into our own buffer. Eventually gdb sees the '\n'
character, and dispatches the whole 'command_1' into gdb's command
handler, which processes the command and prints the result. We then
return to gdb_readline_no_editing_callback, which in turn returns to
gdb's event loop where we re-enter the select.
Inside the select we immediately see that there is more input waiting
on the input stream, drop out of the select, and call back into
gdb_readline_no_editing_callback. In this function we again call
fgetc where glibc performs another big read. This time glibc gets:
command_2\n
that is, we once again get just a single line, despite there being a
third line available. Just like the first command we copy the whole
string, character by character into gdb's buffer, then handle the
command. After handling the command we go to the event loop, enter,
and then exit the select, and call back to the function
gdb_readline_no_editing_callback.
In gdb_readline_no_editing_callback we again call fgetc, this time
glibc gets the string:
command_3\n
like before, we copy this to gdb's buffer and handle the command, then
we return to the event loop. At this point the select blocks while we
wait for more input to arrive.
The important bit of this is that someone, somewhere is, it appears,
taking care to split the incoming write into lines.
My next experiment is to try something like:
this_is_a_very_long_command\nshort_command\n
However, I actually make 'this_is_a_very_long_command' very long, as
in many hundreds of characters long. One way to do this is:
echo xxxxxx.....xxxxx
and just adding more and more 'x' characters as needed. What I'm
aiming for is to have the first command be longer than glibc's
internal read buffer, which, on my machine, is 1024 characters.
However, for this discussion, lets imagine that glibc's buffer is just
8 characters (we can create just this situation by adding a suitable
setbuf call into gdb_readline_no_editing_callback).
Now, if I send gdb this data:
abcdefghij\nkl\n
The first read from glibc will get 'abcdefgh', that is a full 8
character buffer. Once gdb has copied these to its buffer we call
fgetc again, and now glibc will get 'ij\n', that is, just like before,
multiple lines are split at the '\n' character. The full command,
which is now in gdb's buffer can be handled 'abcdefghij', after which
we go (via the event loop) back to gdb_readline_no_editing_callback.
Now we call fgetc, and glibc will get 'kl\n', which is then handled in
the normal way.
So far, so good. However, there is, apparently, one edge case where
the above rules don't apply.
If the '\n' character is the first character read from the kernel,
then the incoming lines are not split up. So, given glibc's 8
character buffer, if I send gdb this:
abcdefgh\nkl\n
that is the first command is 8 characters plus a newline, then, on the
first read (from within glibc) we get 'abcdefgh' in a single buffer.
As there's no newline gdb calls fgetc again, and glibc does another
large read, now we get:
\nkl\n
which doesn't follow the above pattern - the lines are not split into
separate buffers!
So, gdb reads the first character from glibc using fgetc, this is the
newline. Now gdb has a complete command, and so the command is
handled. We then return to the event loop and enter the select.
The problem is that, as far as the kernel is concerned, there is no
more input pending, it's all been read into glibc's buffer, and so the
select doesn't return. The second command is basically stuck in
glibc's buffer.
If I send another command to gdb, or even just send an empty
command (a lone newline) then the select returns, we call into
gdb_readline_no_editing_callback, and now gdb sees the second
command.
OK, so the above is interesting, but it doesn't explain the ESPIPE
error.
Well, that's a slightly different, but related issue. The ESPIPE
case will _only_ show up when using new-ui to create the separate tty
for mi commands, and is a consequence of this commit:
commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
Date: Thu Jul 18 17:20:04 2019 +0100
Fix for using named pipes on Windows
Prior to this commit, the new-ui command would open the tty three
times, once each for stdin, stderr, and stdout. After this commit we
open the tty just once and reuse the FILE object for all three roles.
Consider the problem case, where glibc has (unexpectedly) read the
second command into its internal buffer. When we handle the first
command we usually end up having to write something to the mi output
stream.
After the above commit the same FILE object represents both the input
and output streams, so, when gdb tries to write to the FILE object,
glibc spots that there is input pending within the input buffer, and
so assumes that we have read ahead of where we should be in the input
file. To correct for this glibc tries to do an lseek call to
reposition the file offset of the output stream prior to writing to
it. However, as the output stream is a tty, and seeking is not
supported on a tty, this lseek call fails, this results in the ESPIPE,
which ultimately causes gdb to terminate.
So, now we understand why the ESPIPE triggers (which was what caused
the gdb crash in the original bug report), and we also understand that
sometime gdb will not handle the second command in a timely
fashion (if the first command is just the wrong length). So, what to
do about all this?
We could revert the commit mentioned above (and implement its
functionality another way). This would certainly resolve the ESPIPE
issue, the buffered input would now only be on the input stream, the
output stream would have no buffered input, and so glibc would never
try to lseek, and so we'd never get the ESPIPE error.
However, this only solves one of the two problems. We would still
suffer from the problem where, if the first command is just the wrong
length, the second command will not (immediately) get handled.
The only solution I can see to this problem is to unbuffer the input
stream. If glibc is not buffering the input, but instead, we read
incoming data character by character from the kernel, then everything
will be fine. As soon as we see the newline at the end of the first
command we will handle the first command. As glibc will have no
buffered input it will not be tempted to lseek, so no ESPIPE error.
When we go have to the event loop there will be more data pending in
the kernel, so the select will immediately return, and the second
command will be processed.
I'm tempted to suggest that we should move the unbuffering of the
input stream out of gdb_readline_no_editing_callback and do it
somewhere earlier, more like when we create the input streams.
However, I've not done that in this commit for a couple of reasons:
1. By keeping the unbuffering in gdb_readline_no_editing_callback
I'm making the smallest possible change that fixes the bug. Moving
the unbuffering somewhere better can be done as a refactor later, if
that 's felt to be important,
2. I don't think making repeated calls to unbuffer the input will
have that much performance impact. We only make the unbuffer call
once per call to gdb_readline_no_editing_callback, and, if the input
stream is already unbuffered we'll return pretty quickly, so I don't
see this as being massively costly,
3. Tom is currently doing lots of gdb stream management changes and
I want to minimise the chances we'll conflict.
So, this commit just changes gdb_readline_no_editing_callback to
always unbuffer the input stream.
The test for this issue sends two commands in a loop, with the first
command growing bigger each time around the loop. I actually make the
first command bigger by just adding whitespace to the front, as gdb
still has to read the complete command (including whitespace) via
glibc, so this is enough to trigger the bug.
The original bug was reported when using a virtual machine, and in
this situation we see this in the strace output:
read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
I'm not completely sure what's going on here, but it appears that the
kernel on the virtual machine is delivering the input to glibc slower
than I see on my real hardware; glibc asks for 1024 bytes, but only
gets 64 bytes the first time. In the second read we see the problem
case, the first character is the newline, but then the entire second
command is included.
If I run this exact example on my real hardware then the first command
would not be truncated at 64 bytes, instead, I'd expect to see the
newline included in the first read, with the second command split into
a second read.
So, for testing, I check cases where the first command is just a few
characters (starting at 8 character), all the way up to 2048
characters. Hopefully, this should mean we hit the problem case for
most machine setups.
The only last question relates to commit afe09f0b6311a4d that I
mentioned earlier. That commit was intended to provide support for
Microsoft named pipes:
https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
I know next to nothing about this topic beyond a brief scan of the
above link, but I think these windows named pipe are closer in
behaviour to unix sockets than to unix named fifos.
I am a little nervous that, after the above commit, we now use the
same FILE for in, err, and out streams. In contrast, in a vanilla C
program, I would expect different FILE objects for each stream.
Still, I'm reluctant to revert the above commit (and provide the same
functionality a different way) without a specific bug to point at,
and, now that the streams are unbuffered, I expect a lot of the read
and write calls are going straight to the kernel with minimal glibc
involvement, so maybe it doesn't really matter. Anyway, I haven't
touched the above patch, but it is something to keep in mind when
working in this area.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711
2021-12-22 20:57:44 +08:00
|
|
|
# Copyright 2022 Free Software Foundation, Inc.
|
|
|
|
|
|
|
|
# This program is free software; you can redistribute it and/or modify
|
|
|
|
# it under the terms of the GNU General Public License as published by
|
|
|
|
# the Free Software Foundation; either version 3 of the License, or
|
|
|
|
# (at your option) any later version.
|
|
|
|
#
|
|
|
|
# This program is distributed in the hope that it will be useful,
|
|
|
|
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
# GNU General Public License for more details.
|
|
|
|
#
|
|
|
|
# You should have received a copy of the GNU General Public License
|
|
|
|
# along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
|
|
|
|
# In the past we would use glibc's buffered input for the mi tty.
|
|
|
|
# This buffering would cause problems if two commands are sent to gdb
|
|
|
|
# in a single write call, and, if the first command (excluding its
|
|
|
|
# trailing newline) exactly filled glibc's internal buffer.
|
|
|
|
#
|
|
|
|
# The solution to this problem was to stop using glibc's buffering for
|
|
|
|
# the mi tty.
|
|
|
|
#
|
|
|
|
# To test for this situation we send two command to gdb in a loop, the
|
|
|
|
# first command gets progressively bigger. We check that gdb
|
|
|
|
# correctly sees both commands.
|
|
|
|
|
|
|
|
load_lib mi-support.exp
|
|
|
|
set MIFLAGS "-i=mi"
|
|
|
|
|
|
|
|
# Start gdb, passing ARGS to mi_gdb_start. Then run a series of tests
|
|
|
|
# passing two commands to gdb in a single write action. The first
|
|
|
|
# command is increasingly long, while the second command stays very
|
|
|
|
# short.
|
|
|
|
#
|
|
|
|
# Check that gdb sees, and performs, both commands.
|
|
|
|
proc run_test { args } {
|
|
|
|
global mi_gdb_prompt
|
|
|
|
global decimal
|
|
|
|
|
|
|
|
gdb_exit
|
|
|
|
if [mi_gdb_start $args] {
|
2022-05-14 00:23:57 +08:00
|
|
|
return
|
gdb: unbuffer all input streams when not using readline
This commit should fix PR gdb/28711. What's actually going on is
pretty involved, and there's still a bit of the story that I don't
understand completely, however, from my observed results, I think that
the change I propose making here (or something very similar) is going
to be needed.
The original bug report involves using eclipse to drive gdb using mi
commands. A separate tty is spun off in which to send gdb the mi
commands, this tty is created using the new-ui command.
The behaviour observed is that, given a particular set of mi commands
being sent to gdb, we sometimes see an ESPIPE error from a lseek
call, which ultimately results in gdb terminating.
The problems all originate from gdb_readline_no_editing_callback in
gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
allow glibc to perform buffering on the FILE object being used.
I say sometime, because, gdb_readline_no_editing_callback already
includes a call to disable the glibc buffering, but this is only done
if the input stream is not a tty. In our case the input stream is a
tty, so the buffering is left in place.
The first step to understanding why this problem occurs is to
understand that eclipse sends multiple commands to gdb very quickly
without waiting for and answer to each command, eclipse plans to
collect all of the command results after sending all the commands to
gdb. In fact, eclipse sends the commands to gdb that they appear to
arrive in the gdb process as a single block of data. When reproducing
this issue within the testsuite I find it necessary to send multiple
commands using a single write call.
The next bit of the story gets a little involved, and this is where my
understanding is not complete. I can describe the behaviour that I
observe, and (for me at least) I'm happy that what I'm seeing, if a
little strange, is consistent. In order to fully understand what's
going on I think I would likely need to dive into kernel code, which
currently seems unnecessary given that I'm happy with the solution I'm
proposing.
The following description all relates to input from a tty in which I'm
not using readline. I see the same problems either when using a
new-ui tty, or with gdb's standard, non-readline, mi tty.
Here's what I observe happening when I send multiple commands to gdb
using a single write, if I send gdb this:
command_1\ncommand_2\ncommand_3
then gdb's event loop will wake up (from its select) as it sees there
is input available. We call into gdb_readline_no_editing_callback,
where we call fgetc, glibc will do a single big read, and get back
just:
command_1\n
that is, despite there being multiple lines of input available, I
consistently get just a single line. From glibc a single character is
returned from the fgetc call, and within gdb we accumulate characters,
one at a time, into our own buffer. Eventually gdb sees the '\n'
character, and dispatches the whole 'command_1' into gdb's command
handler, which processes the command and prints the result. We then
return to gdb_readline_no_editing_callback, which in turn returns to
gdb's event loop where we re-enter the select.
Inside the select we immediately see that there is more input waiting
on the input stream, drop out of the select, and call back into
gdb_readline_no_editing_callback. In this function we again call
fgetc where glibc performs another big read. This time glibc gets:
command_2\n
that is, we once again get just a single line, despite there being a
third line available. Just like the first command we copy the whole
string, character by character into gdb's buffer, then handle the
command. After handling the command we go to the event loop, enter,
and then exit the select, and call back to the function
gdb_readline_no_editing_callback.
In gdb_readline_no_editing_callback we again call fgetc, this time
glibc gets the string:
command_3\n
like before, we copy this to gdb's buffer and handle the command, then
we return to the event loop. At this point the select blocks while we
wait for more input to arrive.
The important bit of this is that someone, somewhere is, it appears,
taking care to split the incoming write into lines.
My next experiment is to try something like:
this_is_a_very_long_command\nshort_command\n
However, I actually make 'this_is_a_very_long_command' very long, as
in many hundreds of characters long. One way to do this is:
echo xxxxxx.....xxxxx
and just adding more and more 'x' characters as needed. What I'm
aiming for is to have the first command be longer than glibc's
internal read buffer, which, on my machine, is 1024 characters.
However, for this discussion, lets imagine that glibc's buffer is just
8 characters (we can create just this situation by adding a suitable
setbuf call into gdb_readline_no_editing_callback).
Now, if I send gdb this data:
abcdefghij\nkl\n
The first read from glibc will get 'abcdefgh', that is a full 8
character buffer. Once gdb has copied these to its buffer we call
fgetc again, and now glibc will get 'ij\n', that is, just like before,
multiple lines are split at the '\n' character. The full command,
which is now in gdb's buffer can be handled 'abcdefghij', after which
we go (via the event loop) back to gdb_readline_no_editing_callback.
Now we call fgetc, and glibc will get 'kl\n', which is then handled in
the normal way.
So far, so good. However, there is, apparently, one edge case where
the above rules don't apply.
If the '\n' character is the first character read from the kernel,
then the incoming lines are not split up. So, given glibc's 8
character buffer, if I send gdb this:
abcdefgh\nkl\n
that is the first command is 8 characters plus a newline, then, on the
first read (from within glibc) we get 'abcdefgh' in a single buffer.
As there's no newline gdb calls fgetc again, and glibc does another
large read, now we get:
\nkl\n
which doesn't follow the above pattern - the lines are not split into
separate buffers!
So, gdb reads the first character from glibc using fgetc, this is the
newline. Now gdb has a complete command, and so the command is
handled. We then return to the event loop and enter the select.
The problem is that, as far as the kernel is concerned, there is no
more input pending, it's all been read into glibc's buffer, and so the
select doesn't return. The second command is basically stuck in
glibc's buffer.
If I send another command to gdb, or even just send an empty
command (a lone newline) then the select returns, we call into
gdb_readline_no_editing_callback, and now gdb sees the second
command.
OK, so the above is interesting, but it doesn't explain the ESPIPE
error.
Well, that's a slightly different, but related issue. The ESPIPE
case will _only_ show up when using new-ui to create the separate tty
for mi commands, and is a consequence of this commit:
commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
Date: Thu Jul 18 17:20:04 2019 +0100
Fix for using named pipes on Windows
Prior to this commit, the new-ui command would open the tty three
times, once each for stdin, stderr, and stdout. After this commit we
open the tty just once and reuse the FILE object for all three roles.
Consider the problem case, where glibc has (unexpectedly) read the
second command into its internal buffer. When we handle the first
command we usually end up having to write something to the mi output
stream.
After the above commit the same FILE object represents both the input
and output streams, so, when gdb tries to write to the FILE object,
glibc spots that there is input pending within the input buffer, and
so assumes that we have read ahead of where we should be in the input
file. To correct for this glibc tries to do an lseek call to
reposition the file offset of the output stream prior to writing to
it. However, as the output stream is a tty, and seeking is not
supported on a tty, this lseek call fails, this results in the ESPIPE,
which ultimately causes gdb to terminate.
So, now we understand why the ESPIPE triggers (which was what caused
the gdb crash in the original bug report), and we also understand that
sometime gdb will not handle the second command in a timely
fashion (if the first command is just the wrong length). So, what to
do about all this?
We could revert the commit mentioned above (and implement its
functionality another way). This would certainly resolve the ESPIPE
issue, the buffered input would now only be on the input stream, the
output stream would have no buffered input, and so glibc would never
try to lseek, and so we'd never get the ESPIPE error.
However, this only solves one of the two problems. We would still
suffer from the problem where, if the first command is just the wrong
length, the second command will not (immediately) get handled.
The only solution I can see to this problem is to unbuffer the input
stream. If glibc is not buffering the input, but instead, we read
incoming data character by character from the kernel, then everything
will be fine. As soon as we see the newline at the end of the first
command we will handle the first command. As glibc will have no
buffered input it will not be tempted to lseek, so no ESPIPE error.
When we go have to the event loop there will be more data pending in
the kernel, so the select will immediately return, and the second
command will be processed.
I'm tempted to suggest that we should move the unbuffering of the
input stream out of gdb_readline_no_editing_callback and do it
somewhere earlier, more like when we create the input streams.
However, I've not done that in this commit for a couple of reasons:
1. By keeping the unbuffering in gdb_readline_no_editing_callback
I'm making the smallest possible change that fixes the bug. Moving
the unbuffering somewhere better can be done as a refactor later, if
that 's felt to be important,
2. I don't think making repeated calls to unbuffer the input will
have that much performance impact. We only make the unbuffer call
once per call to gdb_readline_no_editing_callback, and, if the input
stream is already unbuffered we'll return pretty quickly, so I don't
see this as being massively costly,
3. Tom is currently doing lots of gdb stream management changes and
I want to minimise the chances we'll conflict.
So, this commit just changes gdb_readline_no_editing_callback to
always unbuffer the input stream.
The test for this issue sends two commands in a loop, with the first
command growing bigger each time around the loop. I actually make the
first command bigger by just adding whitespace to the front, as gdb
still has to read the complete command (including whitespace) via
glibc, so this is enough to trigger the bug.
The original bug was reported when using a virtual machine, and in
this situation we see this in the strace output:
read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
I'm not completely sure what's going on here, but it appears that the
kernel on the virtual machine is delivering the input to glibc slower
than I see on my real hardware; glibc asks for 1024 bytes, but only
gets 64 bytes the first time. In the second read we see the problem
case, the first character is the newline, but then the entire second
command is included.
If I run this exact example on my real hardware then the first command
would not be truncated at 64 bytes, instead, I'd expect to see the
newline included in the first read, with the second command split into
a second read.
So, for testing, I check cases where the first command is just a few
characters (starting at 8 character), all the way up to 2048
characters. Hopefully, this should mean we hit the problem case for
most machine setups.
The only last question relates to commit afe09f0b6311a4d that I
mentioned earlier. That commit was intended to provide support for
Microsoft named pipes:
https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
I know next to nothing about this topic beyond a brief scan of the
above link, but I think these windows named pipe are closer in
behaviour to unix sockets than to unix named fifos.
I am a little nervous that, after the above commit, we now use the
same FILE for in, err, and out streams. In contrast, in a vanilla C
program, I would expect different FILE objects for each stream.
Still, I'm reluctant to revert the above commit (and provide the same
functionality a different way) without a specific bug to point at,
and, now that the streams are unbuffered, I expect a lot of the read
and write calls are going straight to the kernel with minimal glibc
involvement, so maybe it doesn't really matter. Anyway, I haven't
touched the above patch, but it is something to keep in mind when
working in this area.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711
2021-12-22 20:57:44 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
set start 1
|
|
|
|
set limit 2049
|
|
|
|
|
|
|
|
mi_gdb_test "set \$a = \"FIRST COMMAND\"" ".*"
|
|
|
|
mi_gdb_test "set \$b = \"TEST COMPLETE\"" ".*"
|
|
|
|
|
|
|
|
for { set i $start } { $i < $limit } { incr i } {
|
|
|
|
|
|
|
|
set cmd ""
|
|
|
|
|
|
|
|
# Create a command that is at least `i` characters long.
|
gdb/testsuite: relax pattern in new gdb.mi/mi-multi-commands.exp test
I saw some failures in the test gdb.mi/mi-multi-commands.exp that I
added recently. This test was added in commit:
commit d08cbc5d3203118da5583296e49273cf82378042
Date: Wed Dec 22 12:57:44 2021 +0000
gdb: unbuffer all input streams when not using readline
The failures I see only occurred when my machine was very heavily
loaded.
In this test I send multiple commands from dejagnu to gdb with a
single send_gdb call. In a well behaving world what I want to happen
is that the gdb console sees both commands arrive and echos the text
of those commands. Then gdb starts processing the first command,
prints the result, and then processes the second command, and prints
the result.
However, what I saw in my loaded environment was that only after
sending the two commands, only the first command was echoed to gdb's
terminal. Then gdb started processing the first command, and started
to write the output. Now, mixed in with the first command output, the
second command was echoed to gdb's terminal. Finally, gdb would
finish printing the first command output, and would read and handle
the second command.
This mixing of command echoing with the first command output was
causing the test matching patterns to fail.
In this commit I change the command I use in the test from a CLI
command to an MI command, this reduces the number of lines of output
that come from the test, CLI commands sent through the MI interpreter
are echoed back like this:
(gdb)
set $a = "FIRST COMMAND"
&"set $a = \"FIRST COMMAND\"\n"
^done
(gdb)
While this is not the case for true MI command:
(gdb)
-data-evaluate-expression $a
^done,value="\"FIRST COMMAND\""
(gdb)
Less output makes for simpler patterns to match against.
Next, when sending two command to gdb I was previously trying to spot
the output of the first command followed by the prompt with nothing
between. This is not really needed, for the first command I can look
for just the ^done,value="\"FIRST COMMAND\"" string, then I can start
looking for the output of the second command.
So long as the second pattern matches up to the gdb prompt, then I can
be sure than nothing is left over in the expect buffer to muck up
later matches.
As to see the second command output gdb must have read in the second
command, the second command output never suffers from the corruption
that the first command output does.
Since making this change, I've not seen a failure in this test.
2022-02-08 04:35:58 +08:00
|
|
|
set first_cmd "-data-evaluate-expression \$a"
|
gdb: unbuffer all input streams when not using readline
This commit should fix PR gdb/28711. What's actually going on is
pretty involved, and there's still a bit of the story that I don't
understand completely, however, from my observed results, I think that
the change I propose making here (or something very similar) is going
to be needed.
The original bug report involves using eclipse to drive gdb using mi
commands. A separate tty is spun off in which to send gdb the mi
commands, this tty is created using the new-ui command.
The behaviour observed is that, given a particular set of mi commands
being sent to gdb, we sometimes see an ESPIPE error from a lseek
call, which ultimately results in gdb terminating.
The problems all originate from gdb_readline_no_editing_callback in
gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
allow glibc to perform buffering on the FILE object being used.
I say sometime, because, gdb_readline_no_editing_callback already
includes a call to disable the glibc buffering, but this is only done
if the input stream is not a tty. In our case the input stream is a
tty, so the buffering is left in place.
The first step to understanding why this problem occurs is to
understand that eclipse sends multiple commands to gdb very quickly
without waiting for and answer to each command, eclipse plans to
collect all of the command results after sending all the commands to
gdb. In fact, eclipse sends the commands to gdb that they appear to
arrive in the gdb process as a single block of data. When reproducing
this issue within the testsuite I find it necessary to send multiple
commands using a single write call.
The next bit of the story gets a little involved, and this is where my
understanding is not complete. I can describe the behaviour that I
observe, and (for me at least) I'm happy that what I'm seeing, if a
little strange, is consistent. In order to fully understand what's
going on I think I would likely need to dive into kernel code, which
currently seems unnecessary given that I'm happy with the solution I'm
proposing.
The following description all relates to input from a tty in which I'm
not using readline. I see the same problems either when using a
new-ui tty, or with gdb's standard, non-readline, mi tty.
Here's what I observe happening when I send multiple commands to gdb
using a single write, if I send gdb this:
command_1\ncommand_2\ncommand_3
then gdb's event loop will wake up (from its select) as it sees there
is input available. We call into gdb_readline_no_editing_callback,
where we call fgetc, glibc will do a single big read, and get back
just:
command_1\n
that is, despite there being multiple lines of input available, I
consistently get just a single line. From glibc a single character is
returned from the fgetc call, and within gdb we accumulate characters,
one at a time, into our own buffer. Eventually gdb sees the '\n'
character, and dispatches the whole 'command_1' into gdb's command
handler, which processes the command and prints the result. We then
return to gdb_readline_no_editing_callback, which in turn returns to
gdb's event loop where we re-enter the select.
Inside the select we immediately see that there is more input waiting
on the input stream, drop out of the select, and call back into
gdb_readline_no_editing_callback. In this function we again call
fgetc where glibc performs another big read. This time glibc gets:
command_2\n
that is, we once again get just a single line, despite there being a
third line available. Just like the first command we copy the whole
string, character by character into gdb's buffer, then handle the
command. After handling the command we go to the event loop, enter,
and then exit the select, and call back to the function
gdb_readline_no_editing_callback.
In gdb_readline_no_editing_callback we again call fgetc, this time
glibc gets the string:
command_3\n
like before, we copy this to gdb's buffer and handle the command, then
we return to the event loop. At this point the select blocks while we
wait for more input to arrive.
The important bit of this is that someone, somewhere is, it appears,
taking care to split the incoming write into lines.
My next experiment is to try something like:
this_is_a_very_long_command\nshort_command\n
However, I actually make 'this_is_a_very_long_command' very long, as
in many hundreds of characters long. One way to do this is:
echo xxxxxx.....xxxxx
and just adding more and more 'x' characters as needed. What I'm
aiming for is to have the first command be longer than glibc's
internal read buffer, which, on my machine, is 1024 characters.
However, for this discussion, lets imagine that glibc's buffer is just
8 characters (we can create just this situation by adding a suitable
setbuf call into gdb_readline_no_editing_callback).
Now, if I send gdb this data:
abcdefghij\nkl\n
The first read from glibc will get 'abcdefgh', that is a full 8
character buffer. Once gdb has copied these to its buffer we call
fgetc again, and now glibc will get 'ij\n', that is, just like before,
multiple lines are split at the '\n' character. The full command,
which is now in gdb's buffer can be handled 'abcdefghij', after which
we go (via the event loop) back to gdb_readline_no_editing_callback.
Now we call fgetc, and glibc will get 'kl\n', which is then handled in
the normal way.
So far, so good. However, there is, apparently, one edge case where
the above rules don't apply.
If the '\n' character is the first character read from the kernel,
then the incoming lines are not split up. So, given glibc's 8
character buffer, if I send gdb this:
abcdefgh\nkl\n
that is the first command is 8 characters plus a newline, then, on the
first read (from within glibc) we get 'abcdefgh' in a single buffer.
As there's no newline gdb calls fgetc again, and glibc does another
large read, now we get:
\nkl\n
which doesn't follow the above pattern - the lines are not split into
separate buffers!
So, gdb reads the first character from glibc using fgetc, this is the
newline. Now gdb has a complete command, and so the command is
handled. We then return to the event loop and enter the select.
The problem is that, as far as the kernel is concerned, there is no
more input pending, it's all been read into glibc's buffer, and so the
select doesn't return. The second command is basically stuck in
glibc's buffer.
If I send another command to gdb, or even just send an empty
command (a lone newline) then the select returns, we call into
gdb_readline_no_editing_callback, and now gdb sees the second
command.
OK, so the above is interesting, but it doesn't explain the ESPIPE
error.
Well, that's a slightly different, but related issue. The ESPIPE
case will _only_ show up when using new-ui to create the separate tty
for mi commands, and is a consequence of this commit:
commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
Date: Thu Jul 18 17:20:04 2019 +0100
Fix for using named pipes on Windows
Prior to this commit, the new-ui command would open the tty three
times, once each for stdin, stderr, and stdout. After this commit we
open the tty just once and reuse the FILE object for all three roles.
Consider the problem case, where glibc has (unexpectedly) read the
second command into its internal buffer. When we handle the first
command we usually end up having to write something to the mi output
stream.
After the above commit the same FILE object represents both the input
and output streams, so, when gdb tries to write to the FILE object,
glibc spots that there is input pending within the input buffer, and
so assumes that we have read ahead of where we should be in the input
file. To correct for this glibc tries to do an lseek call to
reposition the file offset of the output stream prior to writing to
it. However, as the output stream is a tty, and seeking is not
supported on a tty, this lseek call fails, this results in the ESPIPE,
which ultimately causes gdb to terminate.
So, now we understand why the ESPIPE triggers (which was what caused
the gdb crash in the original bug report), and we also understand that
sometime gdb will not handle the second command in a timely
fashion (if the first command is just the wrong length). So, what to
do about all this?
We could revert the commit mentioned above (and implement its
functionality another way). This would certainly resolve the ESPIPE
issue, the buffered input would now only be on the input stream, the
output stream would have no buffered input, and so glibc would never
try to lseek, and so we'd never get the ESPIPE error.
However, this only solves one of the two problems. We would still
suffer from the problem where, if the first command is just the wrong
length, the second command will not (immediately) get handled.
The only solution I can see to this problem is to unbuffer the input
stream. If glibc is not buffering the input, but instead, we read
incoming data character by character from the kernel, then everything
will be fine. As soon as we see the newline at the end of the first
command we will handle the first command. As glibc will have no
buffered input it will not be tempted to lseek, so no ESPIPE error.
When we go have to the event loop there will be more data pending in
the kernel, so the select will immediately return, and the second
command will be processed.
I'm tempted to suggest that we should move the unbuffering of the
input stream out of gdb_readline_no_editing_callback and do it
somewhere earlier, more like when we create the input streams.
However, I've not done that in this commit for a couple of reasons:
1. By keeping the unbuffering in gdb_readline_no_editing_callback
I'm making the smallest possible change that fixes the bug. Moving
the unbuffering somewhere better can be done as a refactor later, if
that 's felt to be important,
2. I don't think making repeated calls to unbuffer the input will
have that much performance impact. We only make the unbuffer call
once per call to gdb_readline_no_editing_callback, and, if the input
stream is already unbuffered we'll return pretty quickly, so I don't
see this as being massively costly,
3. Tom is currently doing lots of gdb stream management changes and
I want to minimise the chances we'll conflict.
So, this commit just changes gdb_readline_no_editing_callback to
always unbuffer the input stream.
The test for this issue sends two commands in a loop, with the first
command growing bigger each time around the loop. I actually make the
first command bigger by just adding whitespace to the front, as gdb
still has to read the complete command (including whitespace) via
glibc, so this is enough to trigger the bug.
The original bug was reported when using a virtual machine, and in
this situation we see this in the strace output:
read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
I'm not completely sure what's going on here, but it appears that the
kernel on the virtual machine is delivering the input to glibc slower
than I see on my real hardware; glibc asks for 1024 bytes, but only
gets 64 bytes the first time. In the second read we see the problem
case, the first character is the newline, but then the entire second
command is included.
If I run this exact example on my real hardware then the first command
would not be truncated at 64 bytes, instead, I'd expect to see the
newline included in the first read, with the second command split into
a second read.
So, for testing, I check cases where the first command is just a few
characters (starting at 8 character), all the way up to 2048
characters. Hopefully, this should mean we hit the problem case for
most machine setups.
The only last question relates to commit afe09f0b6311a4d that I
mentioned earlier. That commit was intended to provide support for
Microsoft named pipes:
https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
I know next to nothing about this topic beyond a brief scan of the
above link, but I think these windows named pipe are closer in
behaviour to unix sockets than to unix named fifos.
I am a little nervous that, after the above commit, we now use the
same FILE for in, err, and out streams. In contrast, in a vanilla C
program, I would expect different FILE objects for each stream.
Still, I'm reluctant to revert the above commit (and provide the same
functionality a different way) without a specific bug to point at,
and, now that the streams are unbuffered, I expect a lot of the read
and write calls are going straight to the kernel with minimal glibc
involvement, so maybe it doesn't really matter. Anyway, I haven't
touched the above patch, but it is something to keep in mind when
working in this area.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711
2021-12-22 20:57:44 +08:00
|
|
|
while { [string length $first_cmd] < $i } {
|
|
|
|
set first_cmd " $first_cmd"
|
|
|
|
}
|
|
|
|
|
|
|
|
# We reset `i`, our loop counter, here. When i is large this
|
|
|
|
# should be a nop as we attempt to make the first command
|
|
|
|
# length be i above. However, the first time around the loop
|
|
|
|
# we start with an i value of 1, however, we can't make a
|
|
|
|
# command that short, so, by resetting i here we effectively
|
|
|
|
# skip the first couple of loop iterations where i is less
|
|
|
|
# than the minimum command length.
|
|
|
|
set i [string length $first_cmd]
|
|
|
|
verbose -log "length of first command is $i"
|
|
|
|
|
gdb/testsuite: relax pattern in new gdb.mi/mi-multi-commands.exp test
I saw some failures in the test gdb.mi/mi-multi-commands.exp that I
added recently. This test was added in commit:
commit d08cbc5d3203118da5583296e49273cf82378042
Date: Wed Dec 22 12:57:44 2021 +0000
gdb: unbuffer all input streams when not using readline
The failures I see only occurred when my machine was very heavily
loaded.
In this test I send multiple commands from dejagnu to gdb with a
single send_gdb call. In a well behaving world what I want to happen
is that the gdb console sees both commands arrive and echos the text
of those commands. Then gdb starts processing the first command,
prints the result, and then processes the second command, and prints
the result.
However, what I saw in my loaded environment was that only after
sending the two commands, only the first command was echoed to gdb's
terminal. Then gdb started processing the first command, and started
to write the output. Now, mixed in with the first command output, the
second command was echoed to gdb's terminal. Finally, gdb would
finish printing the first command output, and would read and handle
the second command.
This mixing of command echoing with the first command output was
causing the test matching patterns to fail.
In this commit I change the command I use in the test from a CLI
command to an MI command, this reduces the number of lines of output
that come from the test, CLI commands sent through the MI interpreter
are echoed back like this:
(gdb)
set $a = "FIRST COMMAND"
&"set $a = \"FIRST COMMAND\"\n"
^done
(gdb)
While this is not the case for true MI command:
(gdb)
-data-evaluate-expression $a
^done,value="\"FIRST COMMAND\""
(gdb)
Less output makes for simpler patterns to match against.
Next, when sending two command to gdb I was previously trying to spot
the output of the first command followed by the prompt with nothing
between. This is not really needed, for the first command I can look
for just the ^done,value="\"FIRST COMMAND\"" string, then I can start
looking for the output of the second command.
So long as the second pattern matches up to the gdb prompt, then I can
be sure than nothing is left over in the expect buffer to muck up
later matches.
As to see the second command output gdb must have read in the second
command, the second command output never suffers from the corruption
that the first command output does.
Since making this change, I've not seen a failure in this test.
2022-02-08 04:35:58 +08:00
|
|
|
set cmd "${first_cmd}\n-data-evaluate-expression \$b\n"
|
gdb: unbuffer all input streams when not using readline
This commit should fix PR gdb/28711. What's actually going on is
pretty involved, and there's still a bit of the story that I don't
understand completely, however, from my observed results, I think that
the change I propose making here (or something very similar) is going
to be needed.
The original bug report involves using eclipse to drive gdb using mi
commands. A separate tty is spun off in which to send gdb the mi
commands, this tty is created using the new-ui command.
The behaviour observed is that, given a particular set of mi commands
being sent to gdb, we sometimes see an ESPIPE error from a lseek
call, which ultimately results in gdb terminating.
The problems all originate from gdb_readline_no_editing_callback in
gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
allow glibc to perform buffering on the FILE object being used.
I say sometime, because, gdb_readline_no_editing_callback already
includes a call to disable the glibc buffering, but this is only done
if the input stream is not a tty. In our case the input stream is a
tty, so the buffering is left in place.
The first step to understanding why this problem occurs is to
understand that eclipse sends multiple commands to gdb very quickly
without waiting for and answer to each command, eclipse plans to
collect all of the command results after sending all the commands to
gdb. In fact, eclipse sends the commands to gdb that they appear to
arrive in the gdb process as a single block of data. When reproducing
this issue within the testsuite I find it necessary to send multiple
commands using a single write call.
The next bit of the story gets a little involved, and this is where my
understanding is not complete. I can describe the behaviour that I
observe, and (for me at least) I'm happy that what I'm seeing, if a
little strange, is consistent. In order to fully understand what's
going on I think I would likely need to dive into kernel code, which
currently seems unnecessary given that I'm happy with the solution I'm
proposing.
The following description all relates to input from a tty in which I'm
not using readline. I see the same problems either when using a
new-ui tty, or with gdb's standard, non-readline, mi tty.
Here's what I observe happening when I send multiple commands to gdb
using a single write, if I send gdb this:
command_1\ncommand_2\ncommand_3
then gdb's event loop will wake up (from its select) as it sees there
is input available. We call into gdb_readline_no_editing_callback,
where we call fgetc, glibc will do a single big read, and get back
just:
command_1\n
that is, despite there being multiple lines of input available, I
consistently get just a single line. From glibc a single character is
returned from the fgetc call, and within gdb we accumulate characters,
one at a time, into our own buffer. Eventually gdb sees the '\n'
character, and dispatches the whole 'command_1' into gdb's command
handler, which processes the command and prints the result. We then
return to gdb_readline_no_editing_callback, which in turn returns to
gdb's event loop where we re-enter the select.
Inside the select we immediately see that there is more input waiting
on the input stream, drop out of the select, and call back into
gdb_readline_no_editing_callback. In this function we again call
fgetc where glibc performs another big read. This time glibc gets:
command_2\n
that is, we once again get just a single line, despite there being a
third line available. Just like the first command we copy the whole
string, character by character into gdb's buffer, then handle the
command. After handling the command we go to the event loop, enter,
and then exit the select, and call back to the function
gdb_readline_no_editing_callback.
In gdb_readline_no_editing_callback we again call fgetc, this time
glibc gets the string:
command_3\n
like before, we copy this to gdb's buffer and handle the command, then
we return to the event loop. At this point the select blocks while we
wait for more input to arrive.
The important bit of this is that someone, somewhere is, it appears,
taking care to split the incoming write into lines.
My next experiment is to try something like:
this_is_a_very_long_command\nshort_command\n
However, I actually make 'this_is_a_very_long_command' very long, as
in many hundreds of characters long. One way to do this is:
echo xxxxxx.....xxxxx
and just adding more and more 'x' characters as needed. What I'm
aiming for is to have the first command be longer than glibc's
internal read buffer, which, on my machine, is 1024 characters.
However, for this discussion, lets imagine that glibc's buffer is just
8 characters (we can create just this situation by adding a suitable
setbuf call into gdb_readline_no_editing_callback).
Now, if I send gdb this data:
abcdefghij\nkl\n
The first read from glibc will get 'abcdefgh', that is a full 8
character buffer. Once gdb has copied these to its buffer we call
fgetc again, and now glibc will get 'ij\n', that is, just like before,
multiple lines are split at the '\n' character. The full command,
which is now in gdb's buffer can be handled 'abcdefghij', after which
we go (via the event loop) back to gdb_readline_no_editing_callback.
Now we call fgetc, and glibc will get 'kl\n', which is then handled in
the normal way.
So far, so good. However, there is, apparently, one edge case where
the above rules don't apply.
If the '\n' character is the first character read from the kernel,
then the incoming lines are not split up. So, given glibc's 8
character buffer, if I send gdb this:
abcdefgh\nkl\n
that is the first command is 8 characters plus a newline, then, on the
first read (from within glibc) we get 'abcdefgh' in a single buffer.
As there's no newline gdb calls fgetc again, and glibc does another
large read, now we get:
\nkl\n
which doesn't follow the above pattern - the lines are not split into
separate buffers!
So, gdb reads the first character from glibc using fgetc, this is the
newline. Now gdb has a complete command, and so the command is
handled. We then return to the event loop and enter the select.
The problem is that, as far as the kernel is concerned, there is no
more input pending, it's all been read into glibc's buffer, and so the
select doesn't return. The second command is basically stuck in
glibc's buffer.
If I send another command to gdb, or even just send an empty
command (a lone newline) then the select returns, we call into
gdb_readline_no_editing_callback, and now gdb sees the second
command.
OK, so the above is interesting, but it doesn't explain the ESPIPE
error.
Well, that's a slightly different, but related issue. The ESPIPE
case will _only_ show up when using new-ui to create the separate tty
for mi commands, and is a consequence of this commit:
commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
Date: Thu Jul 18 17:20:04 2019 +0100
Fix for using named pipes on Windows
Prior to this commit, the new-ui command would open the tty three
times, once each for stdin, stderr, and stdout. After this commit we
open the tty just once and reuse the FILE object for all three roles.
Consider the problem case, where glibc has (unexpectedly) read the
second command into its internal buffer. When we handle the first
command we usually end up having to write something to the mi output
stream.
After the above commit the same FILE object represents both the input
and output streams, so, when gdb tries to write to the FILE object,
glibc spots that there is input pending within the input buffer, and
so assumes that we have read ahead of where we should be in the input
file. To correct for this glibc tries to do an lseek call to
reposition the file offset of the output stream prior to writing to
it. However, as the output stream is a tty, and seeking is not
supported on a tty, this lseek call fails, this results in the ESPIPE,
which ultimately causes gdb to terminate.
So, now we understand why the ESPIPE triggers (which was what caused
the gdb crash in the original bug report), and we also understand that
sometime gdb will not handle the second command in a timely
fashion (if the first command is just the wrong length). So, what to
do about all this?
We could revert the commit mentioned above (and implement its
functionality another way). This would certainly resolve the ESPIPE
issue, the buffered input would now only be on the input stream, the
output stream would have no buffered input, and so glibc would never
try to lseek, and so we'd never get the ESPIPE error.
However, this only solves one of the two problems. We would still
suffer from the problem where, if the first command is just the wrong
length, the second command will not (immediately) get handled.
The only solution I can see to this problem is to unbuffer the input
stream. If glibc is not buffering the input, but instead, we read
incoming data character by character from the kernel, then everything
will be fine. As soon as we see the newline at the end of the first
command we will handle the first command. As glibc will have no
buffered input it will not be tempted to lseek, so no ESPIPE error.
When we go have to the event loop there will be more data pending in
the kernel, so the select will immediately return, and the second
command will be processed.
I'm tempted to suggest that we should move the unbuffering of the
input stream out of gdb_readline_no_editing_callback and do it
somewhere earlier, more like when we create the input streams.
However, I've not done that in this commit for a couple of reasons:
1. By keeping the unbuffering in gdb_readline_no_editing_callback
I'm making the smallest possible change that fixes the bug. Moving
the unbuffering somewhere better can be done as a refactor later, if
that 's felt to be important,
2. I don't think making repeated calls to unbuffer the input will
have that much performance impact. We only make the unbuffer call
once per call to gdb_readline_no_editing_callback, and, if the input
stream is already unbuffered we'll return pretty quickly, so I don't
see this as being massively costly,
3. Tom is currently doing lots of gdb stream management changes and
I want to minimise the chances we'll conflict.
So, this commit just changes gdb_readline_no_editing_callback to
always unbuffer the input stream.
The test for this issue sends two commands in a loop, with the first
command growing bigger each time around the loop. I actually make the
first command bigger by just adding whitespace to the front, as gdb
still has to read the complete command (including whitespace) via
glibc, so this is enough to trigger the bug.
The original bug was reported when using a virtual machine, and in
this situation we see this in the strace output:
read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
I'm not completely sure what's going on here, but it appears that the
kernel on the virtual machine is delivering the input to glibc slower
than I see on my real hardware; glibc asks for 1024 bytes, but only
gets 64 bytes the first time. In the second read we see the problem
case, the first character is the newline, but then the entire second
command is included.
If I run this exact example on my real hardware then the first command
would not be truncated at 64 bytes, instead, I'd expect to see the
newline included in the first read, with the second command split into
a second read.
So, for testing, I check cases where the first command is just a few
characters (starting at 8 character), all the way up to 2048
characters. Hopefully, this should mean we hit the problem case for
most machine setups.
The only last question relates to commit afe09f0b6311a4d that I
mentioned earlier. That commit was intended to provide support for
Microsoft named pipes:
https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
I know next to nothing about this topic beyond a brief scan of the
above link, but I think these windows named pipe are closer in
behaviour to unix sockets than to unix named fifos.
I am a little nervous that, after the above commit, we now use the
same FILE for in, err, and out streams. In contrast, in a vanilla C
program, I would expect different FILE objects for each stream.
Still, I'm reluctant to revert the above commit (and provide the same
functionality a different way) without a specific bug to point at,
and, now that the streams are unbuffered, I expect a lot of the read
and write calls are going straight to the kernel with minimal glibc
involvement, so maybe it doesn't really matter. Anyway, I haven't
touched the above patch, but it is something to keep in mind when
working in this area.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711
2021-12-22 20:57:44 +08:00
|
|
|
|
|
|
|
# We need to call send_gdb ourselves here as gdb_test_multiple
|
|
|
|
# will try to send each line of the command separately (breaking
|
|
|
|
# the command at newline characters). This splitting will more
|
|
|
|
# than likely mean that gdb will see and process the first command
|
|
|
|
# before the second command arrives, this prevents the bug from
|
|
|
|
# triggering.
|
|
|
|
send_gdb "$cmd"
|
|
|
|
|
|
|
|
# Now check for output from the two commands. We do this
|
|
|
|
# using two calls to gdb_test_multiple, this is because the
|
|
|
|
# echoing of the second command can sometime get mixed
|
|
|
|
# unexpectedly with the command output, this is especially
|
|
|
|
# likely when running using the read1 technique.
|
|
|
|
#
|
|
|
|
# When using a single gdb_test_multiple we need to anchor
|
|
|
|
# patterns using a ^, however, this requires us to consume and
|
|
|
|
# discard all lines that are not part of the output that we're
|
|
|
|
# looking for. However, due to the unpredictable
|
|
|
|
# intermingling, it's much easier if we drop the ^ anchor.
|
|
|
|
# However, with this gone dejagnu would sometimes match the
|
|
|
|
# second comand output before the first commands output.
|
|
|
|
#
|
|
|
|
# This approach just looks for the first command output, then,
|
|
|
|
# once that has been found, we start looking for the second
|
|
|
|
# command output, this seems pretty reliable.
|
|
|
|
set seen_first_message false
|
|
|
|
set seen_second_message false
|
|
|
|
|
|
|
|
gdb_test_multiple "" "look for first command output, command length $i" -prompt "$mi_gdb_prompt" {
|
2022-05-09 23:49:03 +08:00
|
|
|
-re "\\^done.*,value=\"\\\\\"FIRST COMMAND\\\\\"\"" {
|
gdb: unbuffer all input streams when not using readline
This commit should fix PR gdb/28711. What's actually going on is
pretty involved, and there's still a bit of the story that I don't
understand completely, however, from my observed results, I think that
the change I propose making here (or something very similar) is going
to be needed.
The original bug report involves using eclipse to drive gdb using mi
commands. A separate tty is spun off in which to send gdb the mi
commands, this tty is created using the new-ui command.
The behaviour observed is that, given a particular set of mi commands
being sent to gdb, we sometimes see an ESPIPE error from a lseek
call, which ultimately results in gdb terminating.
The problems all originate from gdb_readline_no_editing_callback in
gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
allow glibc to perform buffering on the FILE object being used.
I say sometime, because, gdb_readline_no_editing_callback already
includes a call to disable the glibc buffering, but this is only done
if the input stream is not a tty. In our case the input stream is a
tty, so the buffering is left in place.
The first step to understanding why this problem occurs is to
understand that eclipse sends multiple commands to gdb very quickly
without waiting for and answer to each command, eclipse plans to
collect all of the command results after sending all the commands to
gdb. In fact, eclipse sends the commands to gdb that they appear to
arrive in the gdb process as a single block of data. When reproducing
this issue within the testsuite I find it necessary to send multiple
commands using a single write call.
The next bit of the story gets a little involved, and this is where my
understanding is not complete. I can describe the behaviour that I
observe, and (for me at least) I'm happy that what I'm seeing, if a
little strange, is consistent. In order to fully understand what's
going on I think I would likely need to dive into kernel code, which
currently seems unnecessary given that I'm happy with the solution I'm
proposing.
The following description all relates to input from a tty in which I'm
not using readline. I see the same problems either when using a
new-ui tty, or with gdb's standard, non-readline, mi tty.
Here's what I observe happening when I send multiple commands to gdb
using a single write, if I send gdb this:
command_1\ncommand_2\ncommand_3
then gdb's event loop will wake up (from its select) as it sees there
is input available. We call into gdb_readline_no_editing_callback,
where we call fgetc, glibc will do a single big read, and get back
just:
command_1\n
that is, despite there being multiple lines of input available, I
consistently get just a single line. From glibc a single character is
returned from the fgetc call, and within gdb we accumulate characters,
one at a time, into our own buffer. Eventually gdb sees the '\n'
character, and dispatches the whole 'command_1' into gdb's command
handler, which processes the command and prints the result. We then
return to gdb_readline_no_editing_callback, which in turn returns to
gdb's event loop where we re-enter the select.
Inside the select we immediately see that there is more input waiting
on the input stream, drop out of the select, and call back into
gdb_readline_no_editing_callback. In this function we again call
fgetc where glibc performs another big read. This time glibc gets:
command_2\n
that is, we once again get just a single line, despite there being a
third line available. Just like the first command we copy the whole
string, character by character into gdb's buffer, then handle the
command. After handling the command we go to the event loop, enter,
and then exit the select, and call back to the function
gdb_readline_no_editing_callback.
In gdb_readline_no_editing_callback we again call fgetc, this time
glibc gets the string:
command_3\n
like before, we copy this to gdb's buffer and handle the command, then
we return to the event loop. At this point the select blocks while we
wait for more input to arrive.
The important bit of this is that someone, somewhere is, it appears,
taking care to split the incoming write into lines.
My next experiment is to try something like:
this_is_a_very_long_command\nshort_command\n
However, I actually make 'this_is_a_very_long_command' very long, as
in many hundreds of characters long. One way to do this is:
echo xxxxxx.....xxxxx
and just adding more and more 'x' characters as needed. What I'm
aiming for is to have the first command be longer than glibc's
internal read buffer, which, on my machine, is 1024 characters.
However, for this discussion, lets imagine that glibc's buffer is just
8 characters (we can create just this situation by adding a suitable
setbuf call into gdb_readline_no_editing_callback).
Now, if I send gdb this data:
abcdefghij\nkl\n
The first read from glibc will get 'abcdefgh', that is a full 8
character buffer. Once gdb has copied these to its buffer we call
fgetc again, and now glibc will get 'ij\n', that is, just like before,
multiple lines are split at the '\n' character. The full command,
which is now in gdb's buffer can be handled 'abcdefghij', after which
we go (via the event loop) back to gdb_readline_no_editing_callback.
Now we call fgetc, and glibc will get 'kl\n', which is then handled in
the normal way.
So far, so good. However, there is, apparently, one edge case where
the above rules don't apply.
If the '\n' character is the first character read from the kernel,
then the incoming lines are not split up. So, given glibc's 8
character buffer, if I send gdb this:
abcdefgh\nkl\n
that is the first command is 8 characters plus a newline, then, on the
first read (from within glibc) we get 'abcdefgh' in a single buffer.
As there's no newline gdb calls fgetc again, and glibc does another
large read, now we get:
\nkl\n
which doesn't follow the above pattern - the lines are not split into
separate buffers!
So, gdb reads the first character from glibc using fgetc, this is the
newline. Now gdb has a complete command, and so the command is
handled. We then return to the event loop and enter the select.
The problem is that, as far as the kernel is concerned, there is no
more input pending, it's all been read into glibc's buffer, and so the
select doesn't return. The second command is basically stuck in
glibc's buffer.
If I send another command to gdb, or even just send an empty
command (a lone newline) then the select returns, we call into
gdb_readline_no_editing_callback, and now gdb sees the second
command.
OK, so the above is interesting, but it doesn't explain the ESPIPE
error.
Well, that's a slightly different, but related issue. The ESPIPE
case will _only_ show up when using new-ui to create the separate tty
for mi commands, and is a consequence of this commit:
commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
Date: Thu Jul 18 17:20:04 2019 +0100
Fix for using named pipes on Windows
Prior to this commit, the new-ui command would open the tty three
times, once each for stdin, stderr, and stdout. After this commit we
open the tty just once and reuse the FILE object for all three roles.
Consider the problem case, where glibc has (unexpectedly) read the
second command into its internal buffer. When we handle the first
command we usually end up having to write something to the mi output
stream.
After the above commit the same FILE object represents both the input
and output streams, so, when gdb tries to write to the FILE object,
glibc spots that there is input pending within the input buffer, and
so assumes that we have read ahead of where we should be in the input
file. To correct for this glibc tries to do an lseek call to
reposition the file offset of the output stream prior to writing to
it. However, as the output stream is a tty, and seeking is not
supported on a tty, this lseek call fails, this results in the ESPIPE,
which ultimately causes gdb to terminate.
So, now we understand why the ESPIPE triggers (which was what caused
the gdb crash in the original bug report), and we also understand that
sometime gdb will not handle the second command in a timely
fashion (if the first command is just the wrong length). So, what to
do about all this?
We could revert the commit mentioned above (and implement its
functionality another way). This would certainly resolve the ESPIPE
issue, the buffered input would now only be on the input stream, the
output stream would have no buffered input, and so glibc would never
try to lseek, and so we'd never get the ESPIPE error.
However, this only solves one of the two problems. We would still
suffer from the problem where, if the first command is just the wrong
length, the second command will not (immediately) get handled.
The only solution I can see to this problem is to unbuffer the input
stream. If glibc is not buffering the input, but instead, we read
incoming data character by character from the kernel, then everything
will be fine. As soon as we see the newline at the end of the first
command we will handle the first command. As glibc will have no
buffered input it will not be tempted to lseek, so no ESPIPE error.
When we go have to the event loop there will be more data pending in
the kernel, so the select will immediately return, and the second
command will be processed.
I'm tempted to suggest that we should move the unbuffering of the
input stream out of gdb_readline_no_editing_callback and do it
somewhere earlier, more like when we create the input streams.
However, I've not done that in this commit for a couple of reasons:
1. By keeping the unbuffering in gdb_readline_no_editing_callback
I'm making the smallest possible change that fixes the bug. Moving
the unbuffering somewhere better can be done as a refactor later, if
that 's felt to be important,
2. I don't think making repeated calls to unbuffer the input will
have that much performance impact. We only make the unbuffer call
once per call to gdb_readline_no_editing_callback, and, if the input
stream is already unbuffered we'll return pretty quickly, so I don't
see this as being massively costly,
3. Tom is currently doing lots of gdb stream management changes and
I want to minimise the chances we'll conflict.
So, this commit just changes gdb_readline_no_editing_callback to
always unbuffer the input stream.
The test for this issue sends two commands in a loop, with the first
command growing bigger each time around the loop. I actually make the
first command bigger by just adding whitespace to the front, as gdb
still has to read the complete command (including whitespace) via
glibc, so this is enough to trigger the bug.
The original bug was reported when using a virtual machine, and in
this situation we see this in the strace output:
read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
I'm not completely sure what's going on here, but it appears that the
kernel on the virtual machine is delivering the input to glibc slower
than I see on my real hardware; glibc asks for 1024 bytes, but only
gets 64 bytes the first time. In the second read we see the problem
case, the first character is the newline, but then the entire second
command is included.
If I run this exact example on my real hardware then the first command
would not be truncated at 64 bytes, instead, I'd expect to see the
newline included in the first read, with the second command split into
a second read.
So, for testing, I check cases where the first command is just a few
characters (starting at 8 character), all the way up to 2048
characters. Hopefully, this should mean we hit the problem case for
most machine setups.
The only last question relates to commit afe09f0b6311a4d that I
mentioned earlier. That commit was intended to provide support for
Microsoft named pipes:
https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
I know next to nothing about this topic beyond a brief scan of the
above link, but I think these windows named pipe are closer in
behaviour to unix sockets than to unix named fifos.
I am a little nervous that, after the above commit, we now use the
same FILE for in, err, and out streams. In contrast, in a vanilla C
program, I would expect different FILE objects for each stream.
Still, I'm reluctant to revert the above commit (and provide the same
functionality a different way) without a specific bug to point at,
and, now that the streams are unbuffered, I expect a lot of the read
and write calls are going straight to the kernel with minimal glibc
involvement, so maybe it doesn't really matter. Anyway, I haven't
touched the above patch, but it is something to keep in mind when
working in this area.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711
2021-12-22 20:57:44 +08:00
|
|
|
set seen_first_message true
|
gdb/testsuite: address test failures in gdb.mi/mi-multi-commands.exp
The gdb.mi/mi-multi-commands.exp test was added in commit:
commit d08cbc5d3203118da5583296e49273cf82378042
Date: Wed Dec 22 12:57:44 2021 +0000
gdb: unbuffer all input streams when not using readline
And then tweaked in commit:
commit 144459531dd68a1287905079aaa131b777a8cc82
Date: Mon Feb 7 20:35:58 2022 +0000
gdb/testsuite: relax pattern in new gdb.mi/mi-multi-commands.exp test
The second of these commits was intended to address periodic test
failures that I was seeing, and this change did fix some problems,
but, unfortunately, introduced other issues.
The problem is that the test relies on sending two commands to GDB in
a single write. As the characters that make these two commands arrive
they are echoed to GDB's console. However, there is a race between
how quickly the characters are echoed and how quickly GDB decides to
act on the incoming commands.
Usually, both commands are echoed in full before GDB acts on the first
command, but sometimes this is not the case, and GDB can execute the
first command before both commands are fully echoed to the console.
In this case, the output of the first command will be mixed in with
the echoing of the second command.
This mixing of the command echoing and the first command output is
what was causing failures in the original version of the test.
The second commit relaxed the expected output pattern a little, but
was still susceptible to failures, so this commit further relaxes the
pattern.
Now, we look for the first command output with no regard to what is
before, or after the command. Then we look for the first mi prompt to
indicate that the first command has completed.
I believe that this change should make the test more stable than it
was before.
2022-03-22 18:23:26 +08:00
|
|
|
exp_continue
|
|
|
|
}
|
|
|
|
-re "\r\n$mi_gdb_prompt" {
|
|
|
|
gdb_assert $seen_first_message $gdb_test_name
|
gdb: unbuffer all input streams when not using readline
This commit should fix PR gdb/28711. What's actually going on is
pretty involved, and there's still a bit of the story that I don't
understand completely, however, from my observed results, I think that
the change I propose making here (or something very similar) is going
to be needed.
The original bug report involves using eclipse to drive gdb using mi
commands. A separate tty is spun off in which to send gdb the mi
commands, this tty is created using the new-ui command.
The behaviour observed is that, given a particular set of mi commands
being sent to gdb, we sometimes see an ESPIPE error from a lseek
call, which ultimately results in gdb terminating.
The problems all originate from gdb_readline_no_editing_callback in
gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
allow glibc to perform buffering on the FILE object being used.
I say sometime, because, gdb_readline_no_editing_callback already
includes a call to disable the glibc buffering, but this is only done
if the input stream is not a tty. In our case the input stream is a
tty, so the buffering is left in place.
The first step to understanding why this problem occurs is to
understand that eclipse sends multiple commands to gdb very quickly
without waiting for and answer to each command, eclipse plans to
collect all of the command results after sending all the commands to
gdb. In fact, eclipse sends the commands to gdb that they appear to
arrive in the gdb process as a single block of data. When reproducing
this issue within the testsuite I find it necessary to send multiple
commands using a single write call.
The next bit of the story gets a little involved, and this is where my
understanding is not complete. I can describe the behaviour that I
observe, and (for me at least) I'm happy that what I'm seeing, if a
little strange, is consistent. In order to fully understand what's
going on I think I would likely need to dive into kernel code, which
currently seems unnecessary given that I'm happy with the solution I'm
proposing.
The following description all relates to input from a tty in which I'm
not using readline. I see the same problems either when using a
new-ui tty, or with gdb's standard, non-readline, mi tty.
Here's what I observe happening when I send multiple commands to gdb
using a single write, if I send gdb this:
command_1\ncommand_2\ncommand_3
then gdb's event loop will wake up (from its select) as it sees there
is input available. We call into gdb_readline_no_editing_callback,
where we call fgetc, glibc will do a single big read, and get back
just:
command_1\n
that is, despite there being multiple lines of input available, I
consistently get just a single line. From glibc a single character is
returned from the fgetc call, and within gdb we accumulate characters,
one at a time, into our own buffer. Eventually gdb sees the '\n'
character, and dispatches the whole 'command_1' into gdb's command
handler, which processes the command and prints the result. We then
return to gdb_readline_no_editing_callback, which in turn returns to
gdb's event loop where we re-enter the select.
Inside the select we immediately see that there is more input waiting
on the input stream, drop out of the select, and call back into
gdb_readline_no_editing_callback. In this function we again call
fgetc where glibc performs another big read. This time glibc gets:
command_2\n
that is, we once again get just a single line, despite there being a
third line available. Just like the first command we copy the whole
string, character by character into gdb's buffer, then handle the
command. After handling the command we go to the event loop, enter,
and then exit the select, and call back to the function
gdb_readline_no_editing_callback.
In gdb_readline_no_editing_callback we again call fgetc, this time
glibc gets the string:
command_3\n
like before, we copy this to gdb's buffer and handle the command, then
we return to the event loop. At this point the select blocks while we
wait for more input to arrive.
The important bit of this is that someone, somewhere is, it appears,
taking care to split the incoming write into lines.
My next experiment is to try something like:
this_is_a_very_long_command\nshort_command\n
However, I actually make 'this_is_a_very_long_command' very long, as
in many hundreds of characters long. One way to do this is:
echo xxxxxx.....xxxxx
and just adding more and more 'x' characters as needed. What I'm
aiming for is to have the first command be longer than glibc's
internal read buffer, which, on my machine, is 1024 characters.
However, for this discussion, lets imagine that glibc's buffer is just
8 characters (we can create just this situation by adding a suitable
setbuf call into gdb_readline_no_editing_callback).
Now, if I send gdb this data:
abcdefghij\nkl\n
The first read from glibc will get 'abcdefgh', that is a full 8
character buffer. Once gdb has copied these to its buffer we call
fgetc again, and now glibc will get 'ij\n', that is, just like before,
multiple lines are split at the '\n' character. The full command,
which is now in gdb's buffer can be handled 'abcdefghij', after which
we go (via the event loop) back to gdb_readline_no_editing_callback.
Now we call fgetc, and glibc will get 'kl\n', which is then handled in
the normal way.
So far, so good. However, there is, apparently, one edge case where
the above rules don't apply.
If the '\n' character is the first character read from the kernel,
then the incoming lines are not split up. So, given glibc's 8
character buffer, if I send gdb this:
abcdefgh\nkl\n
that is the first command is 8 characters plus a newline, then, on the
first read (from within glibc) we get 'abcdefgh' in a single buffer.
As there's no newline gdb calls fgetc again, and glibc does another
large read, now we get:
\nkl\n
which doesn't follow the above pattern - the lines are not split into
separate buffers!
So, gdb reads the first character from glibc using fgetc, this is the
newline. Now gdb has a complete command, and so the command is
handled. We then return to the event loop and enter the select.
The problem is that, as far as the kernel is concerned, there is no
more input pending, it's all been read into glibc's buffer, and so the
select doesn't return. The second command is basically stuck in
glibc's buffer.
If I send another command to gdb, or even just send an empty
command (a lone newline) then the select returns, we call into
gdb_readline_no_editing_callback, and now gdb sees the second
command.
OK, so the above is interesting, but it doesn't explain the ESPIPE
error.
Well, that's a slightly different, but related issue. The ESPIPE
case will _only_ show up when using new-ui to create the separate tty
for mi commands, and is a consequence of this commit:
commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
Date: Thu Jul 18 17:20:04 2019 +0100
Fix for using named pipes on Windows
Prior to this commit, the new-ui command would open the tty three
times, once each for stdin, stderr, and stdout. After this commit we
open the tty just once and reuse the FILE object for all three roles.
Consider the problem case, where glibc has (unexpectedly) read the
second command into its internal buffer. When we handle the first
command we usually end up having to write something to the mi output
stream.
After the above commit the same FILE object represents both the input
and output streams, so, when gdb tries to write to the FILE object,
glibc spots that there is input pending within the input buffer, and
so assumes that we have read ahead of where we should be in the input
file. To correct for this glibc tries to do an lseek call to
reposition the file offset of the output stream prior to writing to
it. However, as the output stream is a tty, and seeking is not
supported on a tty, this lseek call fails, this results in the ESPIPE,
which ultimately causes gdb to terminate.
So, now we understand why the ESPIPE triggers (which was what caused
the gdb crash in the original bug report), and we also understand that
sometime gdb will not handle the second command in a timely
fashion (if the first command is just the wrong length). So, what to
do about all this?
We could revert the commit mentioned above (and implement its
functionality another way). This would certainly resolve the ESPIPE
issue, the buffered input would now only be on the input stream, the
output stream would have no buffered input, and so glibc would never
try to lseek, and so we'd never get the ESPIPE error.
However, this only solves one of the two problems. We would still
suffer from the problem where, if the first command is just the wrong
length, the second command will not (immediately) get handled.
The only solution I can see to this problem is to unbuffer the input
stream. If glibc is not buffering the input, but instead, we read
incoming data character by character from the kernel, then everything
will be fine. As soon as we see the newline at the end of the first
command we will handle the first command. As glibc will have no
buffered input it will not be tempted to lseek, so no ESPIPE error.
When we go have to the event loop there will be more data pending in
the kernel, so the select will immediately return, and the second
command will be processed.
I'm tempted to suggest that we should move the unbuffering of the
input stream out of gdb_readline_no_editing_callback and do it
somewhere earlier, more like when we create the input streams.
However, I've not done that in this commit for a couple of reasons:
1. By keeping the unbuffering in gdb_readline_no_editing_callback
I'm making the smallest possible change that fixes the bug. Moving
the unbuffering somewhere better can be done as a refactor later, if
that 's felt to be important,
2. I don't think making repeated calls to unbuffer the input will
have that much performance impact. We only make the unbuffer call
once per call to gdb_readline_no_editing_callback, and, if the input
stream is already unbuffered we'll return pretty quickly, so I don't
see this as being massively costly,
3. Tom is currently doing lots of gdb stream management changes and
I want to minimise the chances we'll conflict.
So, this commit just changes gdb_readline_no_editing_callback to
always unbuffer the input stream.
The test for this issue sends two commands in a loop, with the first
command growing bigger each time around the loop. I actually make the
first command bigger by just adding whitespace to the front, as gdb
still has to read the complete command (including whitespace) via
glibc, so this is enough to trigger the bug.
The original bug was reported when using a virtual machine, and in
this situation we see this in the strace output:
read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
I'm not completely sure what's going on here, but it appears that the
kernel on the virtual machine is delivering the input to glibc slower
than I see on my real hardware; glibc asks for 1024 bytes, but only
gets 64 bytes the first time. In the second read we see the problem
case, the first character is the newline, but then the entire second
command is included.
If I run this exact example on my real hardware then the first command
would not be truncated at 64 bytes, instead, I'd expect to see the
newline included in the first read, with the second command split into
a second read.
So, for testing, I check cases where the first command is just a few
characters (starting at 8 character), all the way up to 2048
characters. Hopefully, this should mean we hit the problem case for
most machine setups.
The only last question relates to commit afe09f0b6311a4d that I
mentioned earlier. That commit was intended to provide support for
Microsoft named pipes:
https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
I know next to nothing about this topic beyond a brief scan of the
above link, but I think these windows named pipe are closer in
behaviour to unix sockets than to unix named fifos.
I am a little nervous that, after the above commit, we now use the
same FILE for in, err, and out streams. In contrast, in a vanilla C
program, I would expect different FILE objects for each stream.
Still, I'm reluctant to revert the above commit (and provide the same
functionality a different way) without a specific bug to point at,
and, now that the streams are unbuffered, I expect a lot of the read
and write calls are going straight to the kernel with minimal glibc
involvement, so maybe it doesn't really matter. Anyway, I haven't
touched the above patch, but it is something to keep in mind when
working in this area.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711
2021-12-22 20:57:44 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
gdb_test_multiple "" "look for second command output, command length $i" -prompt "$mi_gdb_prompt" {
|
gdb/testsuite: relax pattern in new gdb.mi/mi-multi-commands.exp test
I saw some failures in the test gdb.mi/mi-multi-commands.exp that I
added recently. This test was added in commit:
commit d08cbc5d3203118da5583296e49273cf82378042
Date: Wed Dec 22 12:57:44 2021 +0000
gdb: unbuffer all input streams when not using readline
The failures I see only occurred when my machine was very heavily
loaded.
In this test I send multiple commands from dejagnu to gdb with a
single send_gdb call. In a well behaving world what I want to happen
is that the gdb console sees both commands arrive and echos the text
of those commands. Then gdb starts processing the first command,
prints the result, and then processes the second command, and prints
the result.
However, what I saw in my loaded environment was that only after
sending the two commands, only the first command was echoed to gdb's
terminal. Then gdb started processing the first command, and started
to write the output. Now, mixed in with the first command output, the
second command was echoed to gdb's terminal. Finally, gdb would
finish printing the first command output, and would read and handle
the second command.
This mixing of command echoing with the first command output was
causing the test matching patterns to fail.
In this commit I change the command I use in the test from a CLI
command to an MI command, this reduces the number of lines of output
that come from the test, CLI commands sent through the MI interpreter
are echoed back like this:
(gdb)
set $a = "FIRST COMMAND"
&"set $a = \"FIRST COMMAND\"\n"
^done
(gdb)
While this is not the case for true MI command:
(gdb)
-data-evaluate-expression $a
^done,value="\"FIRST COMMAND\""
(gdb)
Less output makes for simpler patterns to match against.
Next, when sending two command to gdb I was previously trying to spot
the output of the first command followed by the prompt with nothing
between. This is not really needed, for the first command I can look
for just the ^done,value="\"FIRST COMMAND\"" string, then I can start
looking for the output of the second command.
So long as the second pattern matches up to the gdb prompt, then I can
be sure than nothing is left over in the expect buffer to muck up
later matches.
As to see the second command output gdb must have read in the second
command, the second command output never suffers from the corruption
that the first command output does.
Since making this change, I've not seen a failure in this test.
2022-02-08 04:35:58 +08:00
|
|
|
-re "\\^done,value=\"\\\\\"TEST COMPLETE\\\\\"\"\r\n$mi_gdb_prompt" {
|
gdb: unbuffer all input streams when not using readline
This commit should fix PR gdb/28711. What's actually going on is
pretty involved, and there's still a bit of the story that I don't
understand completely, however, from my observed results, I think that
the change I propose making here (or something very similar) is going
to be needed.
The original bug report involves using eclipse to drive gdb using mi
commands. A separate tty is spun off in which to send gdb the mi
commands, this tty is created using the new-ui command.
The behaviour observed is that, given a particular set of mi commands
being sent to gdb, we sometimes see an ESPIPE error from a lseek
call, which ultimately results in gdb terminating.
The problems all originate from gdb_readline_no_editing_callback in
gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
allow glibc to perform buffering on the FILE object being used.
I say sometime, because, gdb_readline_no_editing_callback already
includes a call to disable the glibc buffering, but this is only done
if the input stream is not a tty. In our case the input stream is a
tty, so the buffering is left in place.
The first step to understanding why this problem occurs is to
understand that eclipse sends multiple commands to gdb very quickly
without waiting for and answer to each command, eclipse plans to
collect all of the command results after sending all the commands to
gdb. In fact, eclipse sends the commands to gdb that they appear to
arrive in the gdb process as a single block of data. When reproducing
this issue within the testsuite I find it necessary to send multiple
commands using a single write call.
The next bit of the story gets a little involved, and this is where my
understanding is not complete. I can describe the behaviour that I
observe, and (for me at least) I'm happy that what I'm seeing, if a
little strange, is consistent. In order to fully understand what's
going on I think I would likely need to dive into kernel code, which
currently seems unnecessary given that I'm happy with the solution I'm
proposing.
The following description all relates to input from a tty in which I'm
not using readline. I see the same problems either when using a
new-ui tty, or with gdb's standard, non-readline, mi tty.
Here's what I observe happening when I send multiple commands to gdb
using a single write, if I send gdb this:
command_1\ncommand_2\ncommand_3
then gdb's event loop will wake up (from its select) as it sees there
is input available. We call into gdb_readline_no_editing_callback,
where we call fgetc, glibc will do a single big read, and get back
just:
command_1\n
that is, despite there being multiple lines of input available, I
consistently get just a single line. From glibc a single character is
returned from the fgetc call, and within gdb we accumulate characters,
one at a time, into our own buffer. Eventually gdb sees the '\n'
character, and dispatches the whole 'command_1' into gdb's command
handler, which processes the command and prints the result. We then
return to gdb_readline_no_editing_callback, which in turn returns to
gdb's event loop where we re-enter the select.
Inside the select we immediately see that there is more input waiting
on the input stream, drop out of the select, and call back into
gdb_readline_no_editing_callback. In this function we again call
fgetc where glibc performs another big read. This time glibc gets:
command_2\n
that is, we once again get just a single line, despite there being a
third line available. Just like the first command we copy the whole
string, character by character into gdb's buffer, then handle the
command. After handling the command we go to the event loop, enter,
and then exit the select, and call back to the function
gdb_readline_no_editing_callback.
In gdb_readline_no_editing_callback we again call fgetc, this time
glibc gets the string:
command_3\n
like before, we copy this to gdb's buffer and handle the command, then
we return to the event loop. At this point the select blocks while we
wait for more input to arrive.
The important bit of this is that someone, somewhere is, it appears,
taking care to split the incoming write into lines.
My next experiment is to try something like:
this_is_a_very_long_command\nshort_command\n
However, I actually make 'this_is_a_very_long_command' very long, as
in many hundreds of characters long. One way to do this is:
echo xxxxxx.....xxxxx
and just adding more and more 'x' characters as needed. What I'm
aiming for is to have the first command be longer than glibc's
internal read buffer, which, on my machine, is 1024 characters.
However, for this discussion, lets imagine that glibc's buffer is just
8 characters (we can create just this situation by adding a suitable
setbuf call into gdb_readline_no_editing_callback).
Now, if I send gdb this data:
abcdefghij\nkl\n
The first read from glibc will get 'abcdefgh', that is a full 8
character buffer. Once gdb has copied these to its buffer we call
fgetc again, and now glibc will get 'ij\n', that is, just like before,
multiple lines are split at the '\n' character. The full command,
which is now in gdb's buffer can be handled 'abcdefghij', after which
we go (via the event loop) back to gdb_readline_no_editing_callback.
Now we call fgetc, and glibc will get 'kl\n', which is then handled in
the normal way.
So far, so good. However, there is, apparently, one edge case where
the above rules don't apply.
If the '\n' character is the first character read from the kernel,
then the incoming lines are not split up. So, given glibc's 8
character buffer, if I send gdb this:
abcdefgh\nkl\n
that is the first command is 8 characters plus a newline, then, on the
first read (from within glibc) we get 'abcdefgh' in a single buffer.
As there's no newline gdb calls fgetc again, and glibc does another
large read, now we get:
\nkl\n
which doesn't follow the above pattern - the lines are not split into
separate buffers!
So, gdb reads the first character from glibc using fgetc, this is the
newline. Now gdb has a complete command, and so the command is
handled. We then return to the event loop and enter the select.
The problem is that, as far as the kernel is concerned, there is no
more input pending, it's all been read into glibc's buffer, and so the
select doesn't return. The second command is basically stuck in
glibc's buffer.
If I send another command to gdb, or even just send an empty
command (a lone newline) then the select returns, we call into
gdb_readline_no_editing_callback, and now gdb sees the second
command.
OK, so the above is interesting, but it doesn't explain the ESPIPE
error.
Well, that's a slightly different, but related issue. The ESPIPE
case will _only_ show up when using new-ui to create the separate tty
for mi commands, and is a consequence of this commit:
commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
Date: Thu Jul 18 17:20:04 2019 +0100
Fix for using named pipes on Windows
Prior to this commit, the new-ui command would open the tty three
times, once each for stdin, stderr, and stdout. After this commit we
open the tty just once and reuse the FILE object for all three roles.
Consider the problem case, where glibc has (unexpectedly) read the
second command into its internal buffer. When we handle the first
command we usually end up having to write something to the mi output
stream.
After the above commit the same FILE object represents both the input
and output streams, so, when gdb tries to write to the FILE object,
glibc spots that there is input pending within the input buffer, and
so assumes that we have read ahead of where we should be in the input
file. To correct for this glibc tries to do an lseek call to
reposition the file offset of the output stream prior to writing to
it. However, as the output stream is a tty, and seeking is not
supported on a tty, this lseek call fails, this results in the ESPIPE,
which ultimately causes gdb to terminate.
So, now we understand why the ESPIPE triggers (which was what caused
the gdb crash in the original bug report), and we also understand that
sometime gdb will not handle the second command in a timely
fashion (if the first command is just the wrong length). So, what to
do about all this?
We could revert the commit mentioned above (and implement its
functionality another way). This would certainly resolve the ESPIPE
issue, the buffered input would now only be on the input stream, the
output stream would have no buffered input, and so glibc would never
try to lseek, and so we'd never get the ESPIPE error.
However, this only solves one of the two problems. We would still
suffer from the problem where, if the first command is just the wrong
length, the second command will not (immediately) get handled.
The only solution I can see to this problem is to unbuffer the input
stream. If glibc is not buffering the input, but instead, we read
incoming data character by character from the kernel, then everything
will be fine. As soon as we see the newline at the end of the first
command we will handle the first command. As glibc will have no
buffered input it will not be tempted to lseek, so no ESPIPE error.
When we go have to the event loop there will be more data pending in
the kernel, so the select will immediately return, and the second
command will be processed.
I'm tempted to suggest that we should move the unbuffering of the
input stream out of gdb_readline_no_editing_callback and do it
somewhere earlier, more like when we create the input streams.
However, I've not done that in this commit for a couple of reasons:
1. By keeping the unbuffering in gdb_readline_no_editing_callback
I'm making the smallest possible change that fixes the bug. Moving
the unbuffering somewhere better can be done as a refactor later, if
that 's felt to be important,
2. I don't think making repeated calls to unbuffer the input will
have that much performance impact. We only make the unbuffer call
once per call to gdb_readline_no_editing_callback, and, if the input
stream is already unbuffered we'll return pretty quickly, so I don't
see this as being massively costly,
3. Tom is currently doing lots of gdb stream management changes and
I want to minimise the chances we'll conflict.
So, this commit just changes gdb_readline_no_editing_callback to
always unbuffer the input stream.
The test for this issue sends two commands in a loop, with the first
command growing bigger each time around the loop. I actually make the
first command bigger by just adding whitespace to the front, as gdb
still has to read the complete command (including whitespace) via
glibc, so this is enough to trigger the bug.
The original bug was reported when using a virtual machine, and in
this situation we see this in the strace output:
read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
I'm not completely sure what's going on here, but it appears that the
kernel on the virtual machine is delivering the input to glibc slower
than I see on my real hardware; glibc asks for 1024 bytes, but only
gets 64 bytes the first time. In the second read we see the problem
case, the first character is the newline, but then the entire second
command is included.
If I run this exact example on my real hardware then the first command
would not be truncated at 64 bytes, instead, I'd expect to see the
newline included in the first read, with the second command split into
a second read.
So, for testing, I check cases where the first command is just a few
characters (starting at 8 character), all the way up to 2048
characters. Hopefully, this should mean we hit the problem case for
most machine setups.
The only last question relates to commit afe09f0b6311a4d that I
mentioned earlier. That commit was intended to provide support for
Microsoft named pipes:
https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
I know next to nothing about this topic beyond a brief scan of the
above link, but I think these windows named pipe are closer in
behaviour to unix sockets than to unix named fifos.
I am a little nervous that, after the above commit, we now use the
same FILE for in, err, and out streams. In contrast, in a vanilla C
program, I would expect different FILE objects for each stream.
Still, I'm reluctant to revert the above commit (and provide the same
functionality a different way) without a specific bug to point at,
and, now that the streams are unbuffered, I expect a lot of the read
and write calls are going straight to the kernel with minimal glibc
involvement, so maybe it doesn't really matter. Anyway, I haven't
touched the above patch, but it is something to keep in mind when
working in this area.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711
2021-12-22 20:57:44 +08:00
|
|
|
pass $gdb_test_name
|
|
|
|
set seen_second_message true
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
# If one of the above tests failed then lets no waste our time
|
|
|
|
# checking different command lengths. The actual bug this
|
|
|
|
# test checks for would result in a timeout, so we don't want
|
|
|
|
# to risk lots more timeouts.
|
|
|
|
if { ! [expr $seen_first_message && $seen_second_message ] } {
|
|
|
|
break
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
foreach_with_prefix args { "" "separate-mi-tty" } {
|
|
|
|
run_test $args
|
|
|
|
}
|