binutils-gdb/gdb/dwarf2/line-header.h
Simon Marchi f71ad5556c gdb: add "id" fields to identify symtabs and subfiles
Printing macros defined in the main source file doesn't work reliably
using various toolchains, especially when DWARF 5 is used.  For example,
using the binaries produced by either of these commands:

    $ gcc --version
    gcc (GCC) 11.2.0
    $ ld --version
    GNU ld (GNU Binutils) 2.38
    $ gcc test.c -g3 -gdwarf-5

    $ clang --version
    clang version 13.0.1
    $ clang test.c -gdwarf-5 -fdebug-macro

I get:

    $ ./gdb -nx -q --data-directory=data-directory a.out
    (gdb) start
    Temporary breakpoint 1 at 0x111d: file test.c, line 6.
    Starting program: /home/simark/build/binutils-gdb-one-target/gdb/a.out

    Temporary breakpoint 1, main () at test.c:6
    6         return ZERO;
    (gdb) p ZERO
    No symbol "ZERO" in current context.

When starting to investigate this (taking the gcc-compiled binary as an
example), we see that GDB fails to look up the appropriate macro scope
when evaluating the expression.  While stopped in
macro_lookup_inclusion:

    (top-gdb) p name
    $1 = 0x62100011a980 "test.c"
    (top-gdb) p source.filename
    $2 = 0x62100011a9a0 "/home/simark/build/binutils-gdb-one-target/gdb/test.c"

`source` is the macro_source_file that we would expect GDB to find.
`name` comes from the symtab::filename field of the symtab we are
stopped in.  GDB doesn't find the appropriate macro_source_file because
the name of the macro_source_file doesn't match exactly the name of the
symtab.

The name of the main symtab comes from the compilation unit's
DW_AT_name, passed to the buildsym_compunit's constructor:

  4815d6125e/gdb/dwarf2/read.c (L10627-10630)

The contents of DW_AT_name, in this case, is "test.c".  It is typically
(in every compiler I tried) the same string that was passed to the
compiler on the command line.

The name of the macro_source_file comes from the line number program
header's file table, from the call to the line_header::file_file_name
method:

  4815d6125e/gdb/dwarf2/macro.c (L54-65)

line_header::file_file_name prepends the directory path that the file
entry refers to, in the file table (if the file name is not already
absolute).  In this case, the file name is "test.c", appended to the
directory "/home/simark/build/binutils-gdb-one-target/gdb".

Because the symtab's name is not created the same way as the
macro_source_file's name is created, we get this mismatch.  GDB fails to
find the appropriate macro scope for the symtab, and we can't print
macros when stopped in that symtab.
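
The mismatch can be reproduced outside GDB with a minimal sketch.  The
helpers below are hypothetical stand-ins, not GDB functions: one returns
DW_AT_name unchanged, the other prepends the file table directory when
the file name is not absolute.  The lookup is modeled as an exact string
comparison, which is a simplification, and POSIX-style paths are assumed.

```cpp
#include <string>

// Hypothetical stand-in for how the main symtab gets its name:
// DW_AT_name is used unchanged.
std::string
symtab_name_from_cu (const std::string &dw_at_name)
{
  return dw_at_name;
}

// Hypothetical stand-in for how the macro_source_file gets its name:
// the file table directory is prepended unless the file name is
// already absolute.
std::string
macro_name_from_file_table (const std::string &dir, const std::string &file)
{
  if (!file.empty () && file[0] == '/')
    return file;
  return dir + "/" + file;
}

// Simplified model of the lookup: the two names must be identical.
bool
names_match (const std::string &symtab_name, const std::string &macro_name)
{
  return symtab_name == macro_name;
}
```

With "test.c" as DW_AT_name and
"/home/simark/build/binutils-gdb-one-target/gdb" as the directory, the
two helpers return different strings, so names_match returns false.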

To make this work, we must ensure that paths produced in these two ways
end up identical.  This can be tricky because of the different ways a
path can be passed to the compiler by the user.

Another thing to consider is that while the main symtab's name (or
subfile, before it becomes a symtab) is created using DW_AT_name, the
main symtab is also referred to using its entry in the line table
header's file table, when processing the line table.  We must therefore
ensure that the same name is produced in both cases, so that a call to
"start_subfile" for the main subfile will correctly find the
already-created subfile, created by buildsym_compunit's constructor.  If
we fail to do that, things still often work, because of a fallback: the
watch_main_source_file_lossage method.  This method determines that if
the main subfile has no symbols but there exists another subfile with
the same basename (e.g. "test.c") that does have symbols, it's probably
because there was some filename mismatch.  So it replaces the main
subfile with that other subfile.  I think that heuristic is useful as a
last effort to work around any bug or bad debug info, but I don't think
we should design things such as to rely on it.  It's a heuristic, it can
get things wrong.  So in my search for a fix, it is important that given
some good debug info, we don't end up relying on that for things to
work.
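
As an illustration of the fallback described above, here is a minimal
sketch of the heuristic.  The lossage_subfile type and the
pick_main_subfile function are hypothetical and much simpler than GDB's
real data structures; the sketch only follows the description given in
this message and assumes POSIX-style path separators.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of a subfile.
struct lossage_subfile
{
  std::string name;
  bool has_symbols;
};

// Return the final path component.
std::string
basename_of (const std::string &path)
{
  std::size_t slash = path.rfind ('/');
  return slash == std::string::npos ? path : path.substr (slash + 1);
}

// Return the index of the subfile to treat as the main one.  If the
// main subfile has no symbols and another subfile with the same
// basename does, assume a filename mismatch and pick that other one.
std::size_t
pick_main_subfile (const std::vector<lossage_subfile> &subfiles,
                   std::size_t main_index)
{
  const lossage_subfile &main_subfile = subfiles[main_index];
  if (main_subfile.has_symbols)
    return main_index;

  for (std::size_t i = 0; i < subfiles.size (); ++i)
    if (i != main_index
        && subfiles[i].has_symbols
        && basename_of (subfiles[i].name) == basename_of (main_subfile.name))
      return i;

  return main_index;
}
```

Because only basenames are compared, two unrelated files that happen to
share a basename would also be merged, which is why this stays a last
resort.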

A first attempt at fixing this was to try to prepend the compilation
directory here or not prepend it there.  In practice, because of all the
possible combinations of debug info the compilers produce, it was not
possible to get something that would produce reliable, consistent paths.

Another attempt at fixing this was to make both macro_source_file
objects and symtab objects use the most complete form of path possible.
That means to prepend directories at least until we get an absolute
path.  In theory, we should end up with the same path in all cases.
This generally worked, but because it changed the symtab names, it
resulted in user-visible changes (for example, paths to source files in
Breakpoint hit messages becoming always absolute).  I didn't find this
very good, first because there is a "set filename-display" setting that
lets the user control how they want the paths to be displayed, and that
would suddenly make this setting completely ineffective (although even
today, it is a bit dependent on the debug info).  Second, it would
require a good amount of testsuite tweaks to make tests accept these
suddenly absolute paths.

This new patch is a slight variation of that: it adds a new field
("name_for_id" in struct subfile, "filename_for_id" in struct symtab)
next to the existing name / filename field.  The goal is to separate the
internal ids used for finding objects from the names used for
presentation.  This field is
used for identifying subfiles, symtabs and macro_source_files
internally.  For DWARF symtabs, this new field is meant to contain the
"most complete possible" path, as discussed above.  So for a given file,
it must always be in the same form, everywhere.  The existing
symtab::filename field remains the one used for printing to the user, so
there shouldn't be any change in how paths are printed.

Changes in the core symtab files are:

 - Add "name_for_id" and "filename_for_id" fields to "struct subfile"
   and "struct symtab", next to existing "name" and "filename" fields.
 - Make buildsym_compunit::buildsym_compunit and
   buildsym_compunit::start_subfile accept a "name_for_id" parameter
   next to the existing "name" ones.
 - Make buildsym_compunit::start_subfile use "name_for_id" for looking
   up existing subfiles.  This is the key thing for making calls
   to start_subfile for the main source file look up the existing
   subfile successfully, and avoid relying on
   watch_main_source_file_lossage.
 - Make sal_macro_scope pass "filename_for_id", rather than "filename",
   to macro_lookup_inclusion.  This is the key thing to making the
   lookup work and macro printing work.
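
The lookup-by-id idea can be sketched as follows.  The id_subfile and
compunit_sketch types are hypothetical and only model the behavior
described in the list above: one name for display, another for
identity.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a subfile carrying both names.
struct id_subfile
{
  std::string name;         // used for presentation
  std::string name_for_id;  // used for lookups
};

// Hypothetical model of the part of buildsym_compunit that tracks
// subfiles.
class compunit_sketch
{
public:
  // Return the index of the subfile identified by NAME_FOR_ID,
  // creating it with display name NAME if it does not exist yet.
  std::size_t start_subfile (const std::string &name,
                             const std::string &name_for_id)
  {
    for (std::size_t i = 0; i < m_subfiles.size (); ++i)
      if (m_subfiles[i].name_for_id == name_for_id)
        return i;

    m_subfiles.push_back ({name, name_for_id});
    return m_subfiles.size () - 1;
  }

  const id_subfile &at (std::size_t i) const
  { return m_subfiles[i]; }

  std::size_t size () const
  { return m_subfiles.size (); }

private:
  std::vector<id_subfile> m_subfiles;
};
```

In this sketch, two calls with different display names but the same
name_for_id return the same subfile, and the display name given by the
first call is the one that is kept.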

Changes in the DWARF files are:

 - Make line_header::file_file_name return the "most complete possible"
   name.  The only pre-existing user of this method is the macro code,
   to give the macro_source_file objects their name.  And we now want
   them to have this "most complete possible" name, which will match the
   corresponding symtab's "filename_for_id".
 - Make dwarf2_cu::start_compunit_symtab pass the "most complete
   possible" name for the main symtab's "filename_for_id".  In this
   context, where the info comes from the compilation unit's DW_AT_name
   / DW_AT_comp_dir, it means prepending DW_AT_comp_dir to DW_AT_name if
   DW_AT_name is not already absolute.
 - Change dwarf2_start_subfile to build a name_for_id for the subfile
   being started.  The simplest way is to re-use
   line_header::file_file_name, since the callers always have a
   file_entry handy.  This ensures that it will get the exact same path
   representation as the macro code does, for the same file (since it
   also uses line_header::file_file_name).
 - Update calls to allocate_symtab to pass the "name_for_id" from the
   subfile.
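
To illustrate what "most complete possible" means here, a minimal
sketch follows.  The function names are hypothetical (they are not the
GDB implementations) and POSIX-style absolute paths are assumed; an
empty string models a missing directory.

```cpp
#include <string>

bool
is_absolute (const std::string &path)
{
  return !path.empty () && path[0] == '/';
}

// Prepend DIR to PATH unless PATH is already absolute or DIR is empty.
std::string
prepend_dir (const std::string &dir, const std::string &path)
{
  if (is_absolute (path) || dir.empty ())
    return path;
  return dir + "/" + path;
}

// File table variant: prepend the include directory, then the
// compilation directory, stopping as soon as the path is absolute.
std::string
most_complete_file_name (const std::string &file_name,
                         const std::string &include_dir,
                         const std::string &comp_dir)
{
  return prepend_dir (comp_dir, prepend_dir (include_dir, file_name));
}

// Compilation unit variant: prepend DW_AT_comp_dir to DW_AT_name if
// DW_AT_name is not already absolute.
std::string
most_complete_cu_name (const std::string &dw_at_name,
                       const std::string &dw_at_comp_dir)
{
  return prepend_dir (dw_at_comp_dir, dw_at_name);
}
```

Both variants give the same string when the compilation unit and the
file table describe the file consistently.  A "." include directory is
kept verbatim by this sketch, so it still yields a path containing
"/./".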

Tests exercising all this are added by the following patch.

Of all the cases I tried, the only one I found that ends up relying on
watch_main_source_file_lossage is the following one:

    $ clang --version
    clang version 13.0.1
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    $ clang  ./test.c -g3 -O0 -gdwarf-4
    $ ./gdb -nx --data-directory=data-directory -q -readnow -iex "set debug symtab-create 1"  a.out
    ...
    [symtab-create] start_subfile: name = test.c, name_for_id = /home/simark/build/binutils-gdb-one-target/gdb/test.c
    [symtab-create] start_subfile: name = ./test.c, name_for_id = /home/simark/build/binutils-gdb-one-target/gdb/./test.c
    [symtab-create] start_subfile: name = ./test.c, name_for_id = /home/simark/build/binutils-gdb-one-target/gdb/./test.c
    [symtab-create] start_subfile: found existing symtab with name_for_id /home/simark/build/binutils-gdb-one-target/gdb/./test.c (/home/simark/build/binutils-gdb-one-target/gdb/./test.c)
    [symtab-create] watch_main_source_file_lossage: using subfile ./test.c as the main subfile

As we can see, there are two forms used for "test.c", one with a "." and
one without.  This comes from the fact that the compilation unit DIE
contains:

    DW_AT_name ("test.c")
    DW_AT_comp_dir ("/home/simark/build/binutils-gdb-one-target/gdb")

without a ".", and the line table for that file contains:

    include_directories[  1] = "."
    file_names[  1]:
               name: "test.c"
          dir_index: 1

When assembling the filename from that entry, we get a ".".

It is a bit unexpected that the main filename resulting from the line
table header does not match exactly the name in the compilation unit.
For instance, gcc uses "./test.c" for the DW_AT_name, which gives
identical paths in the compilation unit and in the line table header.

Similarly, with DWARF 5:

    $ clang  ./test.c -g3 -O0 -gdwarf-5

clang creates two entries that refer to the same file but take different
forms:

    include_directories[  0] = "/home/simark/build/binutils-gdb-one-target/gdb"
    include_directories[  1] = "."
    file_names[  0]:
               name: "test.c"
          dir_index: 0
    file_names[  1]:
               name: "test.c"
          dir_index: 1

The first file name produces a path without a "." while the second
produces one with a ".".  This is not caught by
watch_main_source_file_lossage, because dwarf_decode_lines creates a
symtab for each file entry in the line table.  The main subfile
therefore appears as "non-empty" to watch_main_source_file_lossage.
This results in two symtabs:

    (gdb) maintenance info symtabs
    { objfile /home/simark/build/binutils-gdb-one-target/gdb/a.out ((struct objfile *) 0x613000005d00)
      { ((struct compunit_symtab *) 0x62100011aca0)
        debugformat DWARF 5
        producer clang version 13.0.1
        name test.c
        dirname /home/simark/build/binutils-gdb-one-target/gdb
        blockvector ((struct blockvector *) 0x621000129ec0)
        user ((struct compunit_symtab *) (null))
            { symtab test.c ((struct symtab *) 0x62100011ad20)
              fullname (null)
              linetable ((struct linetable *) 0x0)
            }
            { symtab ./test.c ((struct symtab *) 0x62100011ad60)
              fullname (null)
              linetable ((struct linetable *) 0x621000129ef0)
            }
      }
    }

I am not sure what the consequences of this are, but this is also what
happens before my patch, so I think it's acceptable to leave it as-is.

To handle these two cases nicely, I think we will need a function that
removes the unnecessary "." from path names, something that can be done
later.
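
Such a function could look like the sketch below.  This is only an
illustration of the idea, not something this patch adds; the name
remove_dot_components is hypothetical, POSIX-style separators are
assumed, and ".." components are deliberately left untouched.

```cpp
#include <string>
#include <vector>

// Return PATH with every "." component removed.  Repeated slashes are
// collapsed as a side effect.  A path made only of "." components
// yields an empty string in this sketch.
std::string
remove_dot_components (const std::string &path)
{
  const bool absolute = !path.empty () && path[0] == '/';
  std::vector<std::string> parts;

  // Split on '/' and keep the components that are neither empty
  // nor ".".
  std::size_t pos = 0;
  while (pos <= path.size ())
    {
      std::size_t next = path.find ('/', pos);
      if (next == std::string::npos)
        next = path.size ();

      std::string component = path.substr (pos, next - pos);
      if (!component.empty () && component != ".")
        parts.push_back (component);

      pos = next + 1;
    }

  // Join the remaining components back together.
  std::string result = absolute ? "/" : "";
  for (std::size_t i = 0; i < parts.size (); ++i)
    {
      if (i > 0)
        result += "/";
      result += parts[i];
    }

  return result;
}
```

Applied to the two forms seen above, ".../gdb/./test.c" and
".../gdb/test.c", this gives the same string, which is what would let
both file entries map to a single symtab.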

Finally, I made a change in find_file_and_directory that is necessary to
avoid breaking this test:

    gdb.dwarf2/dw2-compdir-oldgcc.exp: info source gcc42

Without that change, we would get:

    (gdb) info source
    Current source file is /dir/d/dw2-compdir-oldgcc42.S
    Compilation directory is /dir/d

whereas the expected result is:

    (gdb) info source
    Current source file is dw2-compdir-oldgcc42.S
    Compilation directory is /dir/d

This test was added here:

  https://sourceware.org/pipermail/gdb-patches/2012-November/098144.html

Long story short, GCC <= 4.2 apparently had a bug where it would
generate a DW_AT_name with a full path ("/dir/d/dw2-compdir-oldgcc42.S")
and no DW_AT_comp_dir.  The line table has one entry with filename
"dw2-compdir-oldgcc42.S", which refers to directory 0.  Directory 0
normally refers to the compilation unit's comp dir, but it is
non-existent in this case.

This caused some symtab lookup problems, and a workaround was added for
them, which today reads as:

    if (res.get_comp_dir () == nullptr
        && producer_is_gcc_lt_4_3 (cu)
        && res.get_name () != nullptr
        && IS_ABSOLUTE_PATH (res.get_name ()))
      res.set_comp_dir (ldirname (res.get_name ()));

Source: 6577f365eb/gdb/dwarf2/read.c (L9428-9432)

It extracts an artificial DW_AT_comp_dir from DW_AT_name, if there is no
DW_AT_comp_dir and DW_AT_name is absolute.

Prior to my patch, a subfile would get created with filename
"/dir/d/dw2-compdir-oldgcc42.S", from DW_AT_name, and another would get
created with filename "dw2-compdir-oldgcc42.S" from the line table's
file table.  Then watch_main_source_file_lossage would kick in and merge
them, keeping only the "dw2-compdir-oldgcc42.S" one:

    [symtab-create] start_subfile: name = /dir/d/dw2-compdir-oldgcc42.S
    [symtab-create] start_subfile: name = dw2-compdir-oldgcc42.S
    [symtab-create] start_subfile: name = dw2-compdir-oldgcc42.S
    [symtab-create] start_subfile: found existing symtab with name dw2-compdir-oldgcc42.S (dw2-compdir-oldgcc42.S)
    [symtab-create] watch_main_source_file_lossage: using subfile dw2-compdir-oldgcc42.S as the main subfile

And so "info source" would show "dw2-compdir-oldgcc42.S" as the
filename.

With my patch applied, but without the change in
find_file_and_directory, both DW_AT_name and the line table would try to
start a subfile with the same filename_for_id, and there was no need for
watch_main_source_file_lossage - which is what we want:

    [symtab-create] start_subfile: name = /dir/d/dw2-compdir-oldgcc42.S, name_for_id = /dir/d/dw2-compdir-oldgcc42.S
    [symtab-create] start_subfile: name = dw2-compdir-oldgcc42.S, name_for_id = /dir/d/dw2-compdir-oldgcc42.S
    [symtab-create] start_subfile: found existing symtab with name_for_id /dir/d/dw2-compdir-oldgcc42.S (/dir/d/dw2-compdir-oldgcc42.S)
    [symtab-create] start_subfile: name = dw2-compdir-oldgcc42.S, name_for_id = /dir/d/dw2-compdir-oldgcc42.S
    [symtab-create] start_subfile: found existing symtab with name_for_id /dir/d/dw2-compdir-oldgcc42.S (/dir/d/dw2-compdir-oldgcc42.S)

But since the one with name == "/dir/d/dw2-compdir-oldgcc42.S", coming
from DW_AT_name, gets created first, it wins, and the symtab ends up
with "/dir/d/dw2-compdir-oldgcc42.S" as the name, "info source" shows
"/dir/d/dw2-compdir-oldgcc42.S" and the test breaks.

This is not wrong per se; after all, DW_AT_name is
"/dir/d/dw2-compdir-oldgcc42.S", so it wouldn't be wrong to report the
current source file as "/dir/d/dw2-compdir-oldgcc42.S".  If you compile
a file passing "/an/absolute/path.c", DW_AT_name typically contains (at
least with GCC) "/an/absolute/path.c" and GDB tells you that the source
file is "/an/absolute/path.c".  But we can also keep the existing
behavior fairly easily with a little change in find_file_and_directory.
When extracting an artificial DW_AT_comp_dir from DW_AT_name, we now
modify the name to just keep the file part.  The result is coherent with
what compilers do when you compile a file by just passing its filename
("gcc path.c -g"):

      DW_AT_name        ("path.c")
      DW_AT_comp_dir    ("/home/simark/build/binutils-gdb-one-target/gdb")

With this change, filename_for_id is still the full name,
"/dir/d/dw2-compdir-oldgcc42.S", but the filename of the subfile /
symtab (what ends up shown by "info source") is just
"dw2-compdir-oldgcc42.S", and that makes the test happy.

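The adjusted workaround can be modeled with the sketch below.  The
file_and_dir_sketch type and apply_old_gcc_workaround function are
hypothetical, the producer check (producer_is_gcc_lt_4_3) is left out,
and POSIX-style paths are assumed; an empty comp_dir models a missing
DW_AT_comp_dir.

```cpp
#include <string>

// Hypothetical model of the name / comp dir pair of a compilation
// unit.
struct file_and_dir_sketch
{
  std::string name;      // models DW_AT_name
  std::string comp_dir;  // models DW_AT_comp_dir, empty if absent
};

// If there is no comp dir and the name is absolute, split the name
// into an artificial comp dir and a bare file name.
void
apply_old_gcc_workaround (file_and_dir_sketch &fd)
{
  if (!fd.comp_dir.empty () || fd.name.empty () || fd.name[0] != '/')
    return;

  // The name starts with '/', so rfind always finds a slash.
  std::size_t slash = fd.name.rfind ('/');
  fd.comp_dir = slash == 0 ? std::string ("/") : fd.name.substr (0, slash);
  fd.name = fd.name.substr (slash + 1);
}
```

For "/dir/d/dw2-compdir-oldgcc42.S" with no comp dir, the sketch leaves
"dw2-compdir-oldgcc42.S" as the name and "/dir/d" as the comp dir, so
joining the two still gives the full path used as the id.
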
Change-Id: I8b5cc4bb3052afdb172ee815c051187290566307
2022-07-29 20:54:49 -04:00

223 lines
7.4 KiB
C++

/* DWARF 2 debugging format support for GDB.

   Copyright (C) 1994-2022 Free Software Foundation, Inc.

   This file is part of GDB.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program. If not, see <http://www.gnu.org/licenses/>. */

#ifndef DWARF2_LINE_HEADER_H
#define DWARF2_LINE_HEADER_H

#include "gdbtypes.h"

/* dir_index is 1-based in DWARF 4 and before, and is 0-based in DWARF 5 and
   later. */
typedef int dir_index;

/* file_name_index is 1-based in DWARF 4 and before, and is 0-based in DWARF 5
   and later. */
typedef int file_name_index;

struct line_header;

struct file_entry
{
  file_entry () = default;

  file_entry (const char *name_, file_name_index index_, dir_index d_index_,
              unsigned int mod_time_, unsigned int length_)
    : name (name_),
      index (index_),
      d_index (d_index_),
      mod_time (mod_time_),
      length (length_)
  {}

  /* Return the include directory at D_INDEX stored in LH. Returns
     NULL if D_INDEX is out of bounds. */
  const char *include_dir (const line_header *lh) const;

  /* The file name. Note this is an observing pointer. The memory is
     owned by debug_line_buffer. */
  const char *name {};

  /* The index of this file in the file table. */
  file_name_index index {};

  /* The directory index (1-based in DWARF 4 and before, 0-based in DWARF 5
     and later). */
  dir_index d_index {};

  unsigned int mod_time {};

  unsigned int length {};

  /* The associated symbol table, if any. */
  struct symtab *symtab {};
};

/* The line number information for a compilation unit (found in the
   .debug_line section) begins with a "statement program header",
   which contains the following information. */
struct line_header
{
  /* COMP_DIR is the value of the DW_AT_comp_dir attribute of the compilation
     unit in the context of which we are reading this line header, or nullptr
     if unknown or not applicable. */
  explicit line_header (const char *comp_dir)
    : offset_in_dwz {}, m_comp_dir (comp_dir)
  {}

  /* This constructor should only be used to create line_header instances
     to do hash table lookups. */
  line_header (sect_offset sect_off, bool offset_in_dwz)
    : sect_off (sect_off),
      offset_in_dwz (offset_in_dwz)
  {}

  /* Add an entry to the include directory table. */
  void add_include_dir (const char *include_dir);

  /* Add an entry to the file name table. */
  void add_file_name (const char *name, dir_index d_index,
                      unsigned int mod_time, unsigned int length);

  /* Return the include dir at INDEX (0-based in DWARF 5 and 1-based before).
     Returns NULL if INDEX is out of bounds. */
  const char *include_dir_at (dir_index index) const
  {
    int vec_index;
    if (version >= 5)
      vec_index = index;
    else
      vec_index = index - 1;
    if (vec_index < 0 || vec_index >= m_include_dirs.size ())
      return NULL;
    return m_include_dirs[vec_index];
  }

  bool is_valid_file_index (int file_index) const
  {
    if (version >= 5)
      return 0 <= file_index && file_index < file_names_size ();
    return 1 <= file_index && file_index <= file_names_size ();
  }

  /* Return the file name at INDEX (0-based in DWARF 5 and 1-based before).
     Returns NULL if INDEX is out of bounds. */
  file_entry *file_name_at (file_name_index index)
  {
    int vec_index;
    if (version >= 5)
      vec_index = index;
    else
      vec_index = index - 1;
    if (vec_index < 0 || vec_index >= m_file_names.size ())
      return NULL;
    return &m_file_names[vec_index];
  }

  /* A const overload of the same. */
  const file_entry *file_name_at (file_name_index index) const
  {
    line_header *lh = const_cast<line_header *> (this);
    return lh->file_name_at (index);
  }

  /* The indexes are 0-based in DWARF 5 and 1-based in DWARF 4. Therefore,
     this method should only be used to iterate through all file entries in an
     index-agnostic manner. */
  std::vector<file_entry> &file_names ()
  { return m_file_names; }

  /* A const overload of the same. */
  const std::vector<file_entry> &file_names () const
  { return m_file_names; }

  /* Offset of line number information in .debug_line section. */
  sect_offset sect_off {};

  /* OFFSET is for struct dwz_file associated with dwarf2_per_objfile. */
  unsigned offset_in_dwz : 1; /* Can't initialize bitfields in-class. */

  unsigned short version {};
  unsigned char minimum_instruction_length {};
  unsigned char maximum_ops_per_instruction {};
  unsigned char default_is_stmt {};
  int line_base {};
  unsigned char line_range {};
  unsigned char opcode_base {};

  /* standard_opcode_lengths[i] is the number of operands for the
     standard opcode whose value is i. This means that
     standard_opcode_lengths[0] is unused, and the last meaningful
     element is standard_opcode_lengths[opcode_base - 1]. */
  std::unique_ptr<unsigned char[]> standard_opcode_lengths;

  int file_names_size () const
  { return m_file_names.size (); }

  /* The start and end of the statement program following this
     header. These point into dwarf2_per_objfile->line_buffer. */
  const gdb_byte *statement_program_start {}, *statement_program_end {};

  /* Return the most "complete" file name for FE possible.

     This means prepending the directory and compilation directory, as needed,
     until we get an absolute path. */
  std::string file_file_name (const file_entry &fe) const;

  /* Return the compilation directory of the compilation unit in the context of
     which this line header is read. Return nullptr if not applicable. */
  const char *comp_dir () const
  { return m_comp_dir; }

private:
  /* The include_directories table. Note these are observing
     pointers. The memory is owned by debug_line_buffer. */
  std::vector<const char *> m_include_dirs;

  /* The file_names table. This is private because the meaning of indexes
     differs among DWARF versions (The first valid index is 1 in DWARF 4 and
     before, and is 0 in DWARF 5 and later). So the client should use
     file_name_at method for access. */
  std::vector<file_entry> m_file_names;

  /* Compilation directory of the compilation unit in the context of which this
     line header is read. nullptr if unknown or not applicable. */
  const char *m_comp_dir = nullptr;
};

typedef std::unique_ptr<line_header> line_header_up;

inline const char *
file_entry::include_dir (const line_header *lh) const
{
  return lh->include_dir_at (d_index);
}

/* Read the statement program header starting at SECT_OFF in SECTION.
   Return line_header. Returns nullptr if there is a problem reading
   the header, e.g., if it has a version we don't understand.

   NOTE: the strings in the include directory and file name tables of
   the returned object point into the dwarf line section buffer,
   and must not be freed. */
extern line_header_up dwarf_decode_line_header
  (sect_offset sect_off, bool is_dwz, dwarf2_per_objfile *per_objfile,
   struct dwarf2_section_info *section, const struct comp_unit_head *cu_header,
   const char *comp_dir);

#endif /* DWARF2_LINE_HEADER_H */
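
The version-dependent index handling used by include_dir_at and
file_name_at in the header above can be tried in isolation with the
following sketch.  The entry_at function is hypothetical and works on a
plain vector of strings rather than on a line_header.

```cpp
#include <string>
#include <vector>

// Return the entry of TABLE designated by INDEX, or nullptr if INDEX is
// out of bounds.  INDEX is 0-based for DWARF 5 and later and 1-based
// before that.
const std::string *
entry_at (const std::vector<std::string> &table, int dwarf_version, int index)
{
  int vec_index = dwarf_version >= 5 ? index : index - 1;

  if (vec_index < 0 || static_cast<std::size_t> (vec_index) >= table.size ())
    return nullptr;

  return &table[vec_index];
}
```

With a one-entry table, index 1 is valid for DWARF 4 but out of bounds
for DWARF 5, and index 0 is the opposite.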