The DWARF standard is clear that DW_AT_linkage_name is optional.
Compilers may not provide the attribute on functions and variables,
even though the language mangles names. g++ does not for local
variables and functions. Without DW_AT_linkage_name, mangled object
file symbols can't be directly matched against the source-level
DW_AT_name in DWARF info. One possibility is demangling the object
file symbols, but that comes with its own set of problems:
1) A demangler might not be available for the compiler/language.
2) Demangling doesn't give the source function name as stored in
DW_AT_name. Class and template parameters must be stripped at
least.
So this patch takes a simpler approach. A symbol matches DWARF info
if the DWARF address matches the symbol address, and if the symbol
name contains the DWARF name as a sub-string. Very likely the name
matching is entirely superfluous.
PR 29573
* dwarf.c (lookup_symbol_in_function_table): Match a symbol
containing the DWARF source name as a substring.
(lookup_symbol_in_variable_table): Likewise.
(_bfd_dwarf2_find_nearest_line_with_alt): If stash_find_line_fast
returns false, fall back to comp_unit_find_line.
non_mangled incorrectly returned "true" for Ada. Correct that, and
add a few more non-mangled entries. Return a value suitable for
passing to cplus_demangle to control demangling.
* dwarf2.c: Include demangle.h.
(mangle_style): Rename from non_mangled. Return DMGL_* value
to suit lang. Adjust all callers.
The "sec" field in these structures is only set and used in lookup
functions. It always starts off as NULL. So the only possible effect
of the field is to modify the return of the lookup, which was its
purpose back in 2005 when HJ fixed PR990. Since then we solved the
problem of relocatable object files with the fix for PR2338, so this
field is now redundant.
* dwarf.c (struct funcinfo, struct varinfo): Remove "sec" field.
(lookup_symbol_in_function_table): Don't set or test "sec".
(lookup_symbol_in_variable_table): Likewise.
(info_hash_lookup_funcinfo, info_hash_lookup_varinfo): Likewise.
bfd_find_nearest_line_with_alt functions like bfd_find_nearest_line with
the addition of a parameter for specifying the filename of a supplementary
debug file such as one referenced by .gnu_debugaltlink or .debug_sup.
This patch focuses on implementing bfd_find_nearest_line_with_alt
support for ELF/DWARF2 .gnu_debugaltlink. For other targets this
function simply sets the invalid_operation bfd_error.
PR 29529
* dwarf2.c (struct line_info_table): Add new field:
use_dir_and_file_0.
(concat_filename): Use new field to help select the correct table
slot.
(read_formatted_entries): Do not skip entry 0.
(decode_line_info): Set new field depending upon the version of
DWARF being parsed. Initialise filename based upon the setting of
the new field.
While using perf top for MozillaThunderbird I noticed quite some slow
dissably call with source code involved. E.g.
time ./objdump --start-address=0x0000000004e0dcd0 --stop-address=0x0000000004e0df8b -l -d --no-show-raw-insn -S -C /usr/lib64/thunderbird/libxul.so
took 2.071s and I noticed quite some time is spent in
find_abstract_instance:
33.46% objdump objdump [.] find_abstract_instance
18.22% objdump objdump [.] arange_add
13.77% objdump objdump [.] read_attribute_value
4.82% objdump objdump [.] comp_unit_maybe_decode_line_info
3.10% objdump libc.so.6 [.] __memset_avx2_unaligned_erms
where linked list of CU is iterated when searing for where info_ptr
belongs to:
: 3452 for (u = unit->prev_unit; u != NULL; u = u->prev_unit)
0.00 : 4c61f7: mov 0x10(%rbx),%rax
0.00 : 4c61fb: test %rax,%rax
0.00 : 4c61fe: je 4c6215 <find_abstract_instance+0x365>
: 3453 if (info_ptr >= u->info_ptr_unit && info_ptr < u->end_ptr)
0.00 : 4c6200: cmp 0x60(%rax),%rdx
83.20 : 4c6204: jb 4c620c <find_abstract_instance+0x35c>
0.00 : 4c6206: cmp 0x78(%rax),%rdx
6.89 : 4c620a: jb 4c6270 <find_abstract_instance+0x3c0>
: 3452 for (u = unit->prev_unit; u != NULL; u = u->prev_unit)
0.00 : 4c620c: mov 0x10(%rax),%rax
7.90 : 4c6210: test %rax,%rax
0.00 : 4c6213: jne 4c6200 <find_abstract_instance+0x350>
The following scan can be replaced with search in a splay tree and with
that I can get to 1.5s and there are other symbols where the difference
is even bigger.
bfd/ChangeLog:
PR 29081
* dwarf2.c (struct addr_range): New.
(addr_range_intersects): Likewise.
(splay_tree_compare_addr_range): Likewise.
(splay_tree_free_addr_range): Likewise.
(struct dwarf2_debug_file): Add comp_unit_tree.
(find_abstract_instance): Use the splay tree when searching
for a info_ptr.
(stash_comp_unit): Insert to the splay tree.
(_bfd_dwarf2_cleanup_debug_info): Clean up the splay tree.
The PR23230 testcase uses indexed strings without specifying
SW_AT_str_offsets_base. In this case we left u.str with garbage (from
u.val) which then led to a segfault when attempting to access the
string. Fix that by clearing u.str. The patch also adds missing
sanity checks in the recently committed read_indexed_address and
read_indexed_string functions.
PR 29230
* dwarf2.c (read_indexed_address): Return uint64_t. Sanity check idx.
(read_indexed_string): Use uint64_t for str_offset. Sanity check idx.
(read_attribute_value): Clear u.str for indexed string forms when
DW_AT_str_offsets_base is not yet read or missing.
Since commit b43771b045 it has been possible to look up addresses
that match a unit with errors, since ranges are added to a trie while
the unit is being parsed. On error, parse_comp_unit leaves
first_child_die_ptr NULL which results in a NULL info_ptr being passed
to scan_unit_for_symbols. Fix this by setting unit->error.
Also wrap some overlong lines, and fix some formatting errors.
* dwarf2.c: Formatting.
(parse_comp_unit): Set unit->error on err_exit path.
Requiring C99 means that uses of bfd_uint64_t can be replaced with
uint64_t, and similarly for bfd_int64_t, BFD_HOST_U_64_BIT, and
BFD_HOST_64_BIT. This patch does that, removes #ifdef BFD_HOST_*
and tidies a few places that print 64-bit values.
When using perf to profile large binaries, _bfd_dwarf2_find_nearest_line()
becomes a hotspot, as perf wants to get line number information
(for inline-detection purposes) for each and every sample. In Chromium
in particular (the content_shell binary), this entails going through
475k address ranges, which takes a long time when done repeatedly.
Add a radix-256 trie over the address space to quickly map address to
compilation unit spaces; for content_shell, which is 1.6 GB when some
(but not full) debug information turned is on, we go from 6 ms to
0.006 ms (6 µs) for each lookup from address to compilation unit, a 1000x
speedup.
There is a modest RAM increase of 180 MB in this binary (the existing
linked list over ranges uses about 10 MB, and the entire perf job uses
between 2–3 GB for a medium-size profile); for smaller binaries with few
ranges, there should be hardly any extra RAM usage at all.
PR 28592
PR 15994
PR 15935
* dwarf2.c (lookup_address_in_line_info_table): Return bool rather
than a range.
(comp_unit_find_nearest_line): Likewise. Return true if function
info found without line info.
(_bfd_dwarf2_find_nearest_line): Revert range handling code.
Include the language identifier emitted by gas in the set of ones where
no mangled names are expected. Even if there could be "hand-mangled"
names, gas doesn't emit DW_AT_linkage_name in the first place.
Prior to entering the enclosing "else if()" the earlier associated if()
checks function->is_linkage and, if set, uses function->name. The
comment in patch context precedes (and explains) the setting
function->is_linkage. Yet with the flag set, we should then also return
the function name, just like said earlier if() would do when we came
here a 2nd time for the same "addr". And indeed passing the same address
twice on addr2line's command line would resolve the function for the 2nd
instance, but not for the 1st (if this code path is taken). (This,
obviously, is particularly relevant when there's no ELF symbol table in
the first place, like would be the case - naturally - in PE/COFF
binaries, for example.)
PR 28978
* dwarf2.c (scan_unit_for_symbols): When performing second pass,
check to see if the function or variable being processed is the
same as the previous one.
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.
The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bfp opcodes files.
PR28691 is a fuzzing PR that triggers a non-problem of "output changes
per run" with PIEs and/or different compilers. I've closed similar
PRs before as wontfix, but I guess there will be no end of this type
of PR. The trigger is an attribute that usually takes one of the
offset/constant reference DW_FORMs being given an indexed string
DW_FORM. The bfd reader doesn't support indexed strings and returns
an error string instead. The address of the string varies with PIE
runs and/or compiler, and we allow that address to appear in output.
Fix this by validating integer attribute forms, as we do for string
form attributes.
PR 28691
* dwarf2.c (is_str_attr): Rename to..
(is_str_form): ..this. Change param type. Update calls.
(is_int_form): New function.
(read_attribute_value): Handle DW_FORM_addrx2.
(find_abstract_instance): Validate form when using attr.u.val.
(scan_unit_for_symbols, parse_comp_unit): Likewise.
Not returning an error indication here leaves the attribute
uninitialised, which then leads to intemperate behaviour.
PR 28674
* dwarf2.c (read_attribute_value): Return NULL on trying to read
past end of attributes.
Pointer range checking is UB if the values compared are outside the
underlying array elements (plus one).
* dwarf2.c (read_address): Remove accidental commit.
(read_ranges): Compare offset rather than pointers.
Older gcc reports:
.../bfd/dwarf2.c: In function 'read_ranges':
.../bfd/dwarf2.c:3107: error: comparison between signed and unsigned
.../bfd/dwarf2.c: In function 'read_rnglists':
.../bfd/dwarf2.c:3189: error: comparison between signed and unsigned
Similarly for binutils/dwarf.c. Arrange for the left sides of the > to
also be unsigned quantities.
This patch is aimed at the many places in dwarf2.c that blindly
increment a data pointer after calling functions that are meant to
read a fixed number of bytes. The problem with that is with damaged
dwarf we might increment a data pointer past the end of data, which is
UB and complicates (ie. bugs likely) any further use of that data
pointer. To fix those problems, I've moved incrementing of the data
pointer into the functions that do the reads. _bfd_safe_read_leb128
gets the same treatment for consistency.
* libbfd.c (_bfd_safe_read_leb128): Remove length_return parameter.
Replace data pointer with pointer to pointer. Increment pointer
over bytes read.
* libbfd-in.h (_bfd_safe_read_leb128): Update prototype.
* elf-attrs.c (_bfd_elf_parse_attributes): Adjust to suit. Be
careful not to increment data pointer past end. Remove now
redundant pr17512 check.
* wasm-module.c (READ_LEB128): Adjust to suit changes to
_bfd_safe_read_leb128.
* dwarf2.c (read_n_bytes): New inline function, old one renamed to..
(read_blk): ..this. Allocate and return block. Increment bfd_byte**
arg.
(read_3_bytes): New function.
(read_1_byte, read_1_signed_byte, read_2_bytes, read_4_bytes),
(read_8_bytes, read_string, read_indirect_string),
(read_indirect_line_string, read_alt_indirect_string): Take a
byte_byte** arg which is incremented over bytes read. Remove any
bytes_read return. Rewrite limit checks to compare lengths
rather than pointers.
(read_abbrevs, read_attribute_value, read_formatted_entries),
(decode_line_info, find_abstract_instance, read_ranges),
(read_rnglists, scan_unit_for_symbols, parse_comp_unit),
(stash_comp_unit): Adjust to suit. Rewrite limit checks to
compare lengths rather than pointers.
* libbfd.h: Regenerate.
I built GDB with `--enable-targets=all`, then started GDB passing it
an x86-64 executable, finally I ran 'maint selftest', and observed GDB
crash like this:
BFD: BFD (GNU Binutils) 2.36.50.20210519 assertion fail ../../src/bfd/hash.c:438
Aborted (core dumped)
The problem originates from two locations, for example in csky-dis.c
(csky_get_disassembler) where we do this:
const char *sec_name = NULL;
...
sec_name = get_elf_backend_data (abfd)->obj_attrs_section;
if (bfd_get_section_by_name (abfd, sec_name) != NULL)
...
We end up in here because during the selftests GDB forces the
architecture to be csky, but the BFD being accessed is still of type
x86-64. As a result obj_attrs_section returns NULL, which means we
end up passing NULL to bfd_get_section_by_name. If we follow the
function calls from bfd_get_section_by_name we eventually end up in
bfd_hash_hash, which asserts that the string (i.e. the name) is not
NULL.
The same crash can be reproduced in GDB without using the selftests,
for example:
(gdb) file x86_64.elf
(gdb) start
(gdb) set architecture csky
(gdb) disassemble main
Dump of assembler code for function main:
BFD: BFD (GNU Binutils) 2.36.50.20210519 assertion fail ../../src/bfd/hash.c:438
Aborted (core dumped)
The fix I propose here is to have bfd_get_section_by_name return NULL
if name is ever NULL. For consistency I updated
bfd_get_section_by_name_if in the same way, even though I'm not
hitting any problems along that code path right now.
I looked through the source tree and removed two NULL checks in
bfd/dwarf2.c which are no longer needed, its possible that there are
additional NULL checks that could be removed, I just didn't find them.
bfd/ChangeLog:
* section.c (bfd_get_section_by_name): Return NULL if name is
NULL.
(bfd_get_section_by_name_if): Likewise.
* dwarf2.c (read_section): Remove unneeded NULL check.
(find_debug_info): Likewise.
The DWARF5 spec does indeed explicitly say: "A bounded range entry whose
beginning and ending address offsets are equal (including zero) indicates
an empty range and may be ignored."
Since arange_add already ignores empty ranges, remove the whole check
which is equivalent to the check plus explicit continue.
PR binutils/27231
* dwarf2.c (read_rnglists): Ignore empty range when parsing line
number tables.
Advance rngs_ptr when parsing DW_RLE_offset_pair, which was missing in
commit c3757b583d
Author: Mark Wielaard <mark@klomp.org>
Date: Tue Aug 25 15:33:00 2020 +0100
Fix the linker's handling of DWARF-5 line number tables.
PR binutils/27231
* dwarf2.c (read_rnglists): Advance rngs_ptr after
_bfd_safe_read_leb128 when parsing DW_RLE_offset_pair.
When building with gcc with -gdwarf-5 ld tests (including ld-elf/dwarf.exp)
fail because they try to read the .debug_ranges section. But DWARF5
introduces a new .debug_rnglists section that encodes the address ranges
more efficiently. Implement reading the debug_rnglists in bfd/dwarf2.c.
Which makes all tests pass again and fixes several gcc testsuite tests
when defaulting to DWARF5.
* dwarf2.c (struct dwarf2_debug_file): Add dwarf_rnglists_buffer
and dwarf_rnglists_size fields.
(dwarf_debug_sections): Add debug_rnglists.
(dwarf_debug_section_enum): Likewise.
(read_debug_rnglists): New function.
(read_rangelist): New function to call either read_ranges or
read_rnglists. Rename original function to...
(read_ranges): ...this.
(read_rnglists): New function.
PR 25676
bfd * dwarf2.c (struct varinfo): Add unit_offset field to record the
location of the varinfo in the unit's debug info data. Change the
type of the stack field to a boolean.
(lookup_var_by_offset): New function. Returns the varinfo
structure for the variable described at the given offset in the
unit's debug info.
(scan_unit_for_symbols): Add support for variables which have the
DW_AT_specification attribute.
binutils* testsuite/binutils-all/dw4.s: New test source file.
* testsuite/binutils-all/nm.exp: Run the new test.
This patch remedies the following DW_FORM_GNU_ref_alt related problem:
/* FIXME: Do we need to locate the correct CU, in a similar
fashion to the code in the DW_FORM_ref_addr case above ? */
Without the correct CU the wrong abbrevs are used, resulting in
errors and/or wrong file names.
There is scope for further work here. Parsing of CUs should be a two
step process, with the first stage just finding the bounds of the CU.
This would allow find_abstract_instance to quickly find the CU
referenced by DW_FORM_ref_addr or DW_FORM_GNU_ref_alt, then take the
second stage of CU parsing where abbrevs, ranges and suchlike consume
time and memory. As it is, we just process CUs from the start of
.debug_info until we find the one of interest. The testcase in the PR
takes 98G of virtual memory.
PR 25230
* dwarf2.c (struct dwarf2_debug_file): Add line_table and
abbrev_offsets.
(struct abbrev_offset_entry): New.
(hash_abbrev, eq_abbrev, del_abbrev): New functions.
(read_abbrevs): Check whether we have already read abbrevs at
given offset, and add new offset/abbrev to hash table.
(decode_line_info): Keep line table at offset zero in file struct.
Return this for a cu reusing the same dir/file list.
(find_abstract_instance): Find cu for DW_FORM_GNU_ref_alt.
(_bfd_dwarf2_slurp_debug_info): Create offset/abbrev hash tables.
(_bfd_dwarf2_cleanup_debug_info): Adjust deletion of lines and
abbrevs.