This fixes a memory leak in the vanishingly rare cases (found by
fuzzers of course) when something goes wrong in the save_section_vma,
htab_create_alloc or alloc_trie_leaf calls before *pinfo is written.
If *pinfo is not written, _bfd_dwarf2_cleanup_debug_info won't be able
to free that memory.
* dwarf2.c (_bfd_dwarf2_slurp_debug_info): Save stash pointer
on setting up stash.
Another case of fuzzers finding the section size sanity checks are
avoided with SHT_NOBITS sections.
* dwarf2.c (read_section): Check that the DWARF section being
read has contents.
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
I think the test for table->files[file].dir being non-zero is wrong
for DWARF5 where index zero is allowed and is the current directory of
the compilation. Most times this will be covered by the use of
table->comp_dir (from DW_AT_comp_dir) in concat_filename but the point
of putting the current dir in .debug_line was so the section could
stand alone without .debug_info.
Also, there is no need to check for table->dirs non-NULL, the
table->num_dirs test is sufficient.
* dwarf2.c (concat_filename): Correct and simplify tests of
directory index.
The testcase in the PR had a variable with both DW_AT_decl_file and
DW_AT_specification, where the DW_AT_specification also specified
DW_AT_decl_file. This leads to a memory leak as the file name is
malloced and duplicates are not expected.
I've also changed find_abstract_instance to not use a temp for "name",
because that can result in a change in behaviour from the usual last
of duplicate attributes wins.
PR 29925
* dwarf2.c (find_abstract_instance): Delete "name" variable.
Free *filename_ptr before assigning new file name.
(scan_unit_for_symbols): Similarly free func->file and
var->file before assigning.
This patch provides a new function to sanity check section sizes.
It's mostly extracted from what we had in bfd_get_full_section_contents
but also handles compressed debug sections.
Improvements are:
- section file offset is taken into account,
- added checks that a compressed section can be read from file.
The function is then used when handling multiple .debug_* sections
that need to be read into a single buffer, to sanity check sizes
before allocating the buffer.
PR 26946, PR 28834
* Makefile.am (LIBBFD_H_FILES): Add section.c.
* compress.c (bfd_get_full_section_contents): Move section size
sanity checks..
* section.c (_bfd_section_size_insane): ..to here. New function.
* dwarf2.c (read_section): Use _bfd_section_size_insane.
(_bfd_dwarf2_slurp_debug_info): Likewise.
* Makefile.in: Regenerate.
* libbfd.h: Regenerate.
The DWARF standard is clear that DW_AT_linkage_name is optional.
Compilers may not provide the attribute on functions and variables,
even though the language mangles names. g++ does not for local
variables and functions. Without DW_AT_linkage_name, mangled object
file symbols can't be directly matched against the source-level
DW_AT_name in DWARF info. One possibility is demangling the object
file symbols, but that comes with its own set of problems:
1) A demangler might not be available for the compiler/language.
2) Demangling doesn't give the source function name as stored in
DW_AT_name. Class and template parameters must be stripped at
least.
So this patch takes a simpler approach. A symbol matches DWARF info
if the DWARF address matches the symbol address, and if the symbol
name contains the DWARF name as a sub-string. Very likely the name
matching is entirely superfluous.
PR 29573
* dwarf.c (lookup_symbol_in_function_table): Match a symbol
containing the DWARF source name as a substring.
(lookup_symbol_in_variable_table): Likewise.
(_bfd_dwarf2_find_nearest_line_with_alt): If stash_find_line_fast
returns false, fall back to comp_unit_find_line.
non_mangled incorrectly returned "true" for Ada. Correct that, and
add a few more non-mangled entries. Return a value suitable for
passing to cplus_demangle to control demangling.
* dwarf2.c: Include demangle.h.
(mangle_style): Rename from non_mangled. Return DMGL_* value
to suit lang. Adjust all callers.
The "sec" field in these structures is only set and used in lookup
functions. It always starts off as NULL. So the only possible effect
of the field is to modify the return of the lookup, which was its
purpose back in 2005 when HJ fixed PR990. Since then we solved the
problem of relocatable object files with the fix for PR2338, so this
field is now redundant.
* dwarf.c (struct funcinfo, struct varinfo): Remove "sec" field.
(lookup_symbol_in_function_table): Don't set or test "sec".
(lookup_symbol_in_variable_table): Likewise.
(info_hash_lookup_funcinfo, info_hash_lookup_varinfo): Likewise.
bfd_find_nearest_line_with_alt functions like bfd_find_nearest_line with
the addition of a parameter for specifying the filename of a supplementary
debug file such as one referenced by .gnu_debugaltlink or .debug_sup.
This patch focuses on implementing bfd_find_nearest_line_with_alt
support for ELF/DWARF2 .gnu_debugaltlink. For other targets this
function simply sets the invalid_operation bfd_error.
PR 29529
* dwarf2.c (struct line_info_table): Add new field:
use_dir_and_file_0.
(concat_filename): Use new field to help select the correct table
slot.
(read_formatted_entries): Do not skip entry 0.
(decode_line_info): Set new field depending upon the version of
DWARF being parsed. Initialise filename based upon the setting of
the new field.
While using perf top for MozillaThunderbird I noticed quite some slow
dissably call with source code involved. E.g.
time ./objdump --start-address=0x0000000004e0dcd0 --stop-address=0x0000000004e0df8b -l -d --no-show-raw-insn -S -C /usr/lib64/thunderbird/libxul.so
took 2.071s and I noticed quite some time is spent in
find_abstract_instance:
33.46% objdump objdump [.] find_abstract_instance
18.22% objdump objdump [.] arange_add
13.77% objdump objdump [.] read_attribute_value
4.82% objdump objdump [.] comp_unit_maybe_decode_line_info
3.10% objdump libc.so.6 [.] __memset_avx2_unaligned_erms
where linked list of CU is iterated when searing for where info_ptr
belongs to:
: 3452 for (u = unit->prev_unit; u != NULL; u = u->prev_unit)
0.00 : 4c61f7: mov 0x10(%rbx),%rax
0.00 : 4c61fb: test %rax,%rax
0.00 : 4c61fe: je 4c6215 <find_abstract_instance+0x365>
: 3453 if (info_ptr >= u->info_ptr_unit && info_ptr < u->end_ptr)
0.00 : 4c6200: cmp 0x60(%rax),%rdx
83.20 : 4c6204: jb 4c620c <find_abstract_instance+0x35c>
0.00 : 4c6206: cmp 0x78(%rax),%rdx
6.89 : 4c620a: jb 4c6270 <find_abstract_instance+0x3c0>
: 3452 for (u = unit->prev_unit; u != NULL; u = u->prev_unit)
0.00 : 4c620c: mov 0x10(%rax),%rax
7.90 : 4c6210: test %rax,%rax
0.00 : 4c6213: jne 4c6200 <find_abstract_instance+0x350>
The following scan can be replaced with search in a splay tree and with
that I can get to 1.5s and there are other symbols where the difference
is even bigger.
bfd/ChangeLog:
PR 29081
* dwarf2.c (struct addr_range): New.
(addr_range_intersects): Likewise.
(splay_tree_compare_addr_range): Likewise.
(splay_tree_free_addr_range): Likewise.
(struct dwarf2_debug_file): Add comp_unit_tree.
(find_abstract_instance): Use the splay tree when searching
for a info_ptr.
(stash_comp_unit): Insert to the splay tree.
(_bfd_dwarf2_cleanup_debug_info): Clean up the splay tree.
The PR23230 testcase uses indexed strings without specifying
SW_AT_str_offsets_base. In this case we left u.str with garbage (from
u.val) which then led to a segfault when attempting to access the
string. Fix that by clearing u.str. The patch also adds missing
sanity checks in the recently committed read_indexed_address and
read_indexed_string functions.
PR 29230
* dwarf2.c (read_indexed_address): Return uint64_t. Sanity check idx.
(read_indexed_string): Use uint64_t for str_offset. Sanity check idx.
(read_attribute_value): Clear u.str for indexed string forms when
DW_AT_str_offsets_base is not yet read or missing.
Since commit b43771b045 it has been possible to look up addresses
that match a unit with errors, since ranges are added to a trie while
the unit is being parsed. On error, parse_comp_unit leaves
first_child_die_ptr NULL which results in a NULL info_ptr being passed
to scan_unit_for_symbols. Fix this by setting unit->error.
Also wrap some overlong lines, and fix some formatting errors.
* dwarf2.c: Formatting.
(parse_comp_unit): Set unit->error on err_exit path.
Requiring C99 means that uses of bfd_uint64_t can be replaced with
uint64_t, and similarly for bfd_int64_t, BFD_HOST_U_64_BIT, and
BFD_HOST_64_BIT. This patch does that, removes #ifdef BFD_HOST_*
and tidies a few places that print 64-bit values.
When using perf to profile large binaries, _bfd_dwarf2_find_nearest_line()
becomes a hotspot, as perf wants to get line number information
(for inline-detection purposes) for each and every sample. In Chromium
in particular (the content_shell binary), this entails going through
475k address ranges, which takes a long time when done repeatedly.
Add a radix-256 trie over the address space to quickly map address to
compilation unit spaces; for content_shell, which is 1.6 GB when some
(but not full) debug information turned is on, we go from 6 ms to
0.006 ms (6 µs) for each lookup from address to compilation unit, a 1000x
speedup.
There is a modest RAM increase of 180 MB in this binary (the existing
linked list over ranges uses about 10 MB, and the entire perf job uses
between 2–3 GB for a medium-size profile); for smaller binaries with few
ranges, there should be hardly any extra RAM usage at all.
PR 28592
PR 15994
PR 15935
* dwarf2.c (lookup_address_in_line_info_table): Return bool rather
than a range.
(comp_unit_find_nearest_line): Likewise. Return true if function
info found without line info.
(_bfd_dwarf2_find_nearest_line): Revert range handling code.
Include the language identifier emitted by gas in the set of ones where
no mangled names are expected. Even if there could be "hand-mangled"
names, gas doesn't emit DW_AT_linkage_name in the first place.
Prior to entering the enclosing "else if()" the earlier associated if()
checks function->is_linkage and, if set, uses function->name. The
comment in patch context precedes (and explains) the setting
function->is_linkage. Yet with the flag set, we should then also return
the function name, just like said earlier if() would do when we came
here a 2nd time for the same "addr". And indeed passing the same address
twice on addr2line's command line would resolve the function for the 2nd
instance, but not for the 1st (if this code path is taken). (This,
obviously, is particularly relevant when there's no ELF symbol table in
the first place, like would be the case - naturally - in PE/COFF
binaries, for example.)
PR 28978
* dwarf2.c (scan_unit_for_symbols): When performing second pass,
check to see if the function or variable being processed is the
same as the previous one.
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.
The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bfp opcodes files.
PR28691 is a fuzzing PR that triggers a non-problem of "output changes
per run" with PIEs and/or different compilers. I've closed similar
PRs before as wontfix, but I guess there will be no end of this type
of PR. The trigger is an attribute that usually takes one of the
offset/constant reference DW_FORMs being given an indexed string
DW_FORM. The bfd reader doesn't support indexed strings and returns
an error string instead. The address of the string varies with PIE
runs and/or compiler, and we allow that address to appear in output.
Fix this by validating integer attribute forms, as we do for string
form attributes.
PR 28691
* dwarf2.c (is_str_attr): Rename to..
(is_str_form): ..this. Change param type. Update calls.
(is_int_form): New function.
(read_attribute_value): Handle DW_FORM_addrx2.
(find_abstract_instance): Validate form when using attr.u.val.
(scan_unit_for_symbols, parse_comp_unit): Likewise.
Not returning an error indication here leaves the attribute
uninitialised, which then leads to intemperate behaviour.
PR 28674
* dwarf2.c (read_attribute_value): Return NULL on trying to read
past end of attributes.
Pointer range checking is UB if the values compared are outside the
underlying array elements (plus one).
* dwarf2.c (read_address): Remove accidental commit.
(read_ranges): Compare offset rather than pointers.
Older gcc reports:
.../bfd/dwarf2.c: In function 'read_ranges':
.../bfd/dwarf2.c:3107: error: comparison between signed and unsigned
.../bfd/dwarf2.c: In function 'read_rnglists':
.../bfd/dwarf2.c:3189: error: comparison between signed and unsigned
Similarly for binutils/dwarf.c. Arrange for the left sides of the > to
also be unsigned quantities.
This patch is aimed at the many places in dwarf2.c that blindly
increment a data pointer after calling functions that are meant to
read a fixed number of bytes. The problem with that is with damaged
dwarf we might increment a data pointer past the end of data, which is
UB and complicates (ie. bugs likely) any further use of that data
pointer. To fix those problems, I've moved incrementing of the data
pointer into the functions that do the reads. _bfd_safe_read_leb128
gets the same treatment for consistency.
* libbfd.c (_bfd_safe_read_leb128): Remove length_return parameter.
Replace data pointer with pointer to pointer. Increment pointer
over bytes read.
* libbfd-in.h (_bfd_safe_read_leb128): Update prototype.
* elf-attrs.c (_bfd_elf_parse_attributes): Adjust to suit. Be
careful not to increment data pointer past end. Remove now
redundant pr17512 check.
* wasm-module.c (READ_LEB128): Adjust to suit changes to
_bfd_safe_read_leb128.
* dwarf2.c (read_n_bytes): New inline function, old one renamed to..
(read_blk): ..this. Allocate and return block. Increment bfd_byte**
arg.
(read_3_bytes): New function.
(read_1_byte, read_1_signed_byte, read_2_bytes, read_4_bytes),
(read_8_bytes, read_string, read_indirect_string),
(read_indirect_line_string, read_alt_indirect_string): Take a
byte_byte** arg which is incremented over bytes read. Remove any
bytes_read return. Rewrite limit checks to compare lengths
rather than pointers.
(read_abbrevs, read_attribute_value, read_formatted_entries),
(decode_line_info, find_abstract_instance, read_ranges),
(read_rnglists, scan_unit_for_symbols, parse_comp_unit),
(stash_comp_unit): Adjust to suit. Rewrite limit checks to
compare lengths rather than pointers.
* libbfd.h: Regenerate.
I built GDB with `--enable-targets=all`, then started GDB passing it
an x86-64 executable, finally I ran 'maint selftest', and observed GDB
crash like this:
BFD: BFD (GNU Binutils) 2.36.50.20210519 assertion fail ../../src/bfd/hash.c:438
Aborted (core dumped)
The problem originates from two locations, for example in csky-dis.c
(csky_get_disassembler) where we do this:
const char *sec_name = NULL;
...
sec_name = get_elf_backend_data (abfd)->obj_attrs_section;
if (bfd_get_section_by_name (abfd, sec_name) != NULL)
...
We end up in here because during the selftests GDB forces the
architecture to be csky, but the BFD being accessed is still of type
x86-64. As a result obj_attrs_section returns NULL, which means we
end up passing NULL to bfd_get_section_by_name. If we follow the
function calls from bfd_get_section_by_name we eventually end up in
bfd_hash_hash, which asserts that the string (i.e. the name) is not
NULL.
The same crash can be reproduced in GDB without using the selftests,
for example:
(gdb) file x86_64.elf
(gdb) start
(gdb) set architecture csky
(gdb) disassemble main
Dump of assembler code for function main:
BFD: BFD (GNU Binutils) 2.36.50.20210519 assertion fail ../../src/bfd/hash.c:438
Aborted (core dumped)
The fix I propose here is to have bfd_get_section_by_name return NULL
if name is ever NULL. For consistency I updated
bfd_get_section_by_name_if in the same way, even though I'm not
hitting any problems along that code path right now.
I looked through the source tree and removed two NULL checks in
bfd/dwarf2.c which are no longer needed, its possible that there are
additional NULL checks that could be removed, I just didn't find them.
bfd/ChangeLog:
* section.c (bfd_get_section_by_name): Return NULL if name is
NULL.
(bfd_get_section_by_name_if): Likewise.
* dwarf2.c (read_section): Remove unneeded NULL check.
(find_debug_info): Likewise.
The DWARF5 spec does indeed explicitly say: "A bounded range entry whose
beginning and ending address offsets are equal (including zero) indicates
an empty range and may be ignored."
Since arange_add already ignores empty ranges, remove the whole check
which is equivalent to the check plus explicit continue.
PR binutils/27231
* dwarf2.c (read_rnglists): Ignore empty range when parsing line
number tables.
Advance rngs_ptr when parsing DW_RLE_offset_pair, which was missing in
commit c3757b583d
Author: Mark Wielaard <mark@klomp.org>
Date: Tue Aug 25 15:33:00 2020 +0100
Fix the linker's handling of DWARF-5 line number tables.
PR binutils/27231
* dwarf2.c (read_rnglists): Advance rngs_ptr after
_bfd_safe_read_leb128 when parsing DW_RLE_offset_pair.
When building with gcc with -gdwarf-5 ld tests (including ld-elf/dwarf.exp)
fail because they try to read the .debug_ranges section. But DWARF5
introduces a new .debug_rnglists section that encodes the address ranges
more efficiently. Implement reading the debug_rnglists in bfd/dwarf2.c.
Which makes all tests pass again and fixes several gcc testsuite tests
when defaulting to DWARF5.
* dwarf2.c (struct dwarf2_debug_file): Add dwarf_rnglists_buffer
and dwarf_rnglists_size fields.
(dwarf_debug_sections): Add debug_rnglists.
(dwarf_debug_section_enum): Likewise.
(read_debug_rnglists): New function.
(read_rangelist): New function to call either read_ranges or
read_rnglists. Rename original function to...
(read_ranges): ...this.
(read_rnglists): New function.