Commit Graph

24 Commits

Author SHA1 Message Date
Nick Alcock
483546ce4f libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.

It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.

Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.

... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t.  This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly.  This simplifies most of its callers
significantly.

(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)

This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.

libctf/

	* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
	(ctf_type_sect_size): Add static type sizes too.
	(ctf_serialize): Return the new dict rather than updating the
	existing dict.  No longer fail for dicts with static types;
	copy them onto the start of the new types table.
	(ctf_gzwrite): Actually serialize before gzwriting.
	(ctf_write_mem): Improve forced (test-mode) endian-flipping:
	flip dicts even if they are too small to be compressed.
	Improve confusing variable naming.
	* ctf-archive.c (arc_write_one_ctf): Don't bother to call
	ctf_serialize: both the functions we call do so.
	* ctf-string.c (ctf_str_create_atoms): Drop serializing case
	(atoms arg).
	* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
	(ctf_simple_open_internal): Delete.
	(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
	longer bother with syn_ext_strtab or forced atoms table,
	serialization no longer needs them.
	* ctf-create.c (ctf_create): Call ctf_bufopen directly.
	* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
	(ctf_simple_open_internal): Delete.
	(ctf_bufopen_internal): Likewise.
	(ctf_serialize): Adjust.
	* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
	this is supposed to work.
2024-04-19 16:14:47 +01:00
Nick Alcock
cf9da3b0b6 libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.

There are three intertwined changes here:

 - pull the contents of strtabs from newly ctf_bufopened dicts into the
   atoms table, so that future additions will reuse the existing offset etc
   rather than adding new identical strings
 - allow the internal ctf_bufopen done by serialization to contribute its
   existing atoms table, so that existing atoms can be used for the
   remainder of the open process (like name table construction): this atoms
   table currente gets thrown away in the mass reassignment done later in
   ctf_serialize in any case, but it needs to be there during the open.
 - rewrite ctf_str_write_strtab so that a) it uses iterators rather than
   ctf_*_iter, reducing pointless structures which serve no other purpose
   than to implement ordinary variable scope, but more clunkily, and b)
   retains the existing strtab on the front of the new one, with its sort
   retained, rather than resorting, so all existing already-written strtab
   offsets remain valid across the call.

This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.

(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize().  We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them.  This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)

libctf/

	* ctf-create.c (ctf_create): Add (temporary) atoms arg.
	* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
	(ctf_str_create_atoms): Adjust.
	(ctf_str_write_strtab): Likewise.
	(ctf_simple_open_internal): Likewise.
	* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
	(ctf_bufopen): Likewise.
	(ctf_bufopen_internal): Initialize just enough of an
	atoms table: pre-init from the atoms arg if supplied.
	(ctf_simple_open): Adjust.
	* ctf-serialize.c (ctf_serialize): Constify the strtab.
	Move ref list purging into ctf_str_write_strtab.
	Initialize the new dict with the old dict's atoms table.
	Accept the new strtab from ctf_str_write_strtab.
	Adjust for addition of ctf_dynstrtab.
	* ctf-string.c (ctf_strraw_explicit): Improve comments.
	(ctf_str_create_atoms): Prepopulate from an existing atoms table,
	or alternatively pull in all strings from the strtab and turn
	them into atoms.
	(ctf_str_free_atoms): Free the dynstrtab and its strtab.
	(struct ctf_strtab_write_state): Remove.
	(ctf_str_count_strtab): Fold this...
	(ctf_str_populate_sorttab): ... and this...
	(ctf_str_write_strtab): ... into this.  Prepend existing strings
	to the strtab rather than resorting them (and wrecking their
	offsets).  Keep the dynstrtab updated.  Update refs for all
	atoms with refs, whether or not they are strings newly added
	to the strtab.
2024-04-19 16:14:47 +01:00
Nick Alcock
149ce5c263 libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.

This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again.  We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.

Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).

libctf/

	* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
        a movable ref.
	(ctf_add_member_offset): Likewise.
	* ctf-util.c (ctf_realloc): Delete.
	* ctf-serialize.c (ctf_serialize): No longer use it.  Adjust to
	new fields.
	* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
	(ctf_str_free_atom): Free freeable atoms' strings.
	(ctf_str_create_atoms): Create the movable refs dynhash if needed.
	(ctf_str_free_atoms): Destroy it.
	(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
	reversion).  Add new flag.
	(aref_create):  New, populate movable refs if need be.
	(ctf_str_add_ref_internal): Switch back to flags, update refs
	directly for nonprovisional strings (with already-known fixed offsets);
	create refs via aref_create.  Allocate strings only if not within an
	mmapped strtab.
	(ctf_str_add_movable_ref): New.
	(ctf_str_add): Adjust to CTF_STR_* reintroduction.
	(ctf_str_add_external): LIkewise.
	(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
	backpointer.
	(ctf_str_purge_refs): Drop ctf_str_num_refs.
	(ctf_str_update_refs): Fix indentation.
	* ctf-impl.h (struct ctf_str_atom_movable): New.
	(struct ctf_dict.ctf_str_num_refs): Drop.
	(struct ctf_dict.ctf_str_movable_refs): New.
	(ctf_str_add_movable_ref): Declare.
	(ctf_str_move_refs): Likewise.
	(ctf_realloc): Drop.
2024-04-19 16:14:46 +01:00
Nick Alcock
3301ddba1b Revert "libctf: do not corrupt strings across ctf_serialize"
This reverts commit 986e9e3aa0.

(We do not revert the testcase -- it remains valid -- but we are
taking a different, less complex and more robust approach.)

This also deletes the pending refs abstraction without (yet)
replacing it, so some tests will fail for a commit or two.
2024-04-19 16:14:46 +01:00
Nick Alcock
4fa4e3d92a libctf: delete LCTF_DIRTY
This flag was meant as an optimization to avoid reserializing dicts
unnecessarily.  It was critically necessary back when serialization was
done by ctf_update() and you had to call that every time you wanted any
new modifications to the type table to be usable by other types, but
that has been unnecessary for years now, and serialization is only done
once when writing out, which one would naturally assume would always
serialize the dict.  Worse, it never really worked: it only tracked
newly-added types, not things like added symbols which might equally
well require reserialization, and it gets in the way of an upcoming
change.  Delete entirely.

libctf/

	* ctf-create.c (ctf_create): Drop LCTF_DIRTY.
	(ctf_discard): Likewise.
	(ctf_rollback): Likewise.
	(ctf_add_generic): Likewise.
	(ctf_set_array): Likewise.
	(ctf_add_enumerator): Likewise.
	(ctf_add_member_offset): Likewise.
	(ctf_add_variable_forced): Likewise.
	* ctf-link.c (ctf_link_intern_extern_string): Likewise.
	(ctf_link_add_strtab): Likewise.
	* ctf-serialize.c (ctf_serialize): Likewise.
	* ctf-impl.h (LCTF_DIRTY): Likewise.
	(LCTF_LINKING): Renumber.
2024-04-19 16:14:46 +01:00
Nick Alcock
8a60c93096 libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.

But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.

So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them.  (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)

This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account.  Some of these irregularities were hard to define as
anything but bugs.

Notably:

 - The symbol handling was assuming that symbols only needed to be
   looked for in dynamic hashtabs or static linker-laid-out indexed/
   nonindexed layouts, but now we want to check both in case people
   added more symbols to a dict they opened.

 - The code that handles type additions wasn't checking to see if types
   with the same name existed *at all* (so you could do
   ctf_add_typedef (fp, "foo", bar) repeatedly without error).  This
   seems reasonable for types you just added, but we probably *do* want
   to ban addition of types with names that override names we already
   used in the ctf_open()ed portion, since that would probably corrupt
   existing type relationships.  (Doing things this way also avoids
   causing new errors for any existing code that was doing this sort of
   thing.)

 - ctf_lookup_variable entirely failed to work for variables just added
   by ctf_add_variable: you had to write the dict out and read it back
   in again before they appeared.

 - The symbol handling remembered what symbols you looked up but didn't
   remember their types, so you could look up an object symbol and then
   find it popping up when you asked for function symbols, which seems
   less than ideal.  Since we had to rejig things enough to be able to
   distinguish function and object symbols internally anyway (in order
   to give suitable errors if you try to add a symbol with a name that
   already existed in the ctf_open()ed dict), this bug suddenly became
   more visible and was easily fixed.

We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time).  This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).

There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.

libctf/

	* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
	(ctf_dict.ctf_symhash_func): ... this and...
	(ctf_dict.ctf_symhash_objt): ... this.
	(ctf_dict.ctf_stypes): New, counts static types.
	(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
	(LCTF_RDWR): Deleted.
	(LCTF_DIRTY): Renumbered.
	(LCTF_LINKING): Likewise.
	(ctf_lookup_variable_here): New.
	(ctf_lookup_by_sym_or_name): Likewise.
	(ctf_symbol_next_static): Likewise.
	(ctf_add_variable_forced): Likewise.
	(ctf_add_funcobjt_sym_forced): Likewise.
	(ctf_simple_open_internal): Adjust.
	(ctf_bufopen_internal): Likewise.
	* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
	(ctf_create): Migrate a bunch of initializations into bufopen.
	Force recreation of name tables.  Do not forcibly override the
	model, let ctf_bufopen do it.
	(ctf_static_type): New.
	(ctf_update): Drop LCTF_RDWR check.
	(ctf_dynamic_type): Likewise.
	(ctf_add_function): Likewise.
	(ctf_add_type_internal): Likewise.
	(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
	(ctf_set_array): Likewise.
	(ctf_add_struct_sized): Likewise.
	(ctf_add_union_sized): Likewise.
	(ctf_add_enum): Likewise.
	(ctf_add_enumerator): Likewise (only on the target dict).
	(ctf_add_member_offset): Likewise.
	(ctf_add_generic): Drop LCTF_RDWR check.  Ban addition of types
	with colliding names.
	(ctf_add_forward): Note safety under the new rules.
	(ctf_add_variable): Split all but the existence check into...
	(ctf_add_variable_forced): ... this new function.
	(ctf_add_funcobjt_sym): Likewise...
	(ctf_add_funcobjt_sym_forced): ... for this new function.
	* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
	with any stypes.
	(ctf_link_add_strtab): Likewise.
	(ctf_link_shuffle_syms): Likewise.
	(ctf_link_intern_extern_string): Note pre-existing prohibition.
	* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
	(ctf_lookup_variable): Split out looking in a dict but not
	its parent into...
	(ctf_lookup_variable_here): ... this new function.
	(ctf_lookup_symbol_idx): Track whether looking up a function or
	object: cache them separately.
	(ctf_symbol_next): Split out looking in non-dynamic symtypetab
	entries to...
	(ctf_symbol_next_static): ... this new function.  Don't get confused
	by the simultaneous presence of static and dynamic symtypetab entries.
	(ctf_try_lookup_indexed):  Don't waste time looking up symbols by
	index before there can be any idea how symbols are numbered.
	(ctf_lookup_by_sym_or_name): Distinguish between function and
	data object lookups.  Drop LCTF_RDWR.
	(ctf_lookup_by_symbol): Adjust.
	(ctf_lookup_by_symbol_name): Likewise.
	* ctf-open.c (init_types): Rename to...
	(init_static_types): ... this.  Drop LCTF_RDWR.  Populate ctf_stypes.
	(ctf_simple_open): Drop writable arg.
	(ctf_simple_open_internal): Likewise.
	(ctf_bufopen): Likewise.
	(ctf_bufopen_internal): Populate fields only used for writable dicts.
	Drop LCTF_RDWR.
	(ctf_dict_close): Cater for symhash cache split.
	* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
	* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
	* testsuite/libctf-lookup/add-to-opened*: New test.
2024-04-19 16:14:46 +01:00
Nick Alcock
54a0219150 libctf: remove static/dynamic name lookup distinction
libctf internally maintains a set of hash tables for type name lookups,
one for each valid C type namespace (struct, union, enum, and everything
else).

Or, rather, it maintains *two* sets of hash tables: one, a ctf_hash *,
is meant for lookups in ctf_(buf)open()ed dicts with fixed content; the
other, a ctf_dynhash *, is meant for lookups in ctf_create()d dicts.

This distinction was somewhat valuable in the far pre-binutils past when
two different hashtable implementations were used (one expanding, the
other fixed-size), but those days are long gone: the hash table
implementations are almost identical, both wrappers around the libiberty
hashtab. The ctf_dynhash has many more capabilities than the ctf_hash
(iteration, deletion, etc etc) and has no downsides other than starting
at a fixed, arbitrary small size.

That limitation is easy to lift (via a new ctf_dynhash_create_sized()),
following which we can throw away nearly all the ctf_hash
implementation, and all the code to choose between readable and writable
hashtabs; the few convenience functions that are still useful (for
insertion of name -> type mappings) can also be generalized a bit so
that the extra string verification they do is potentially available to
other string lookups as well.

(libctf still has two hashtable implementations, ctf_dynhash, above,
and ctf_dynset, which is a key-only hashtab that can avoid a great many
malloc()s, used for high-volume applications in the deduplicator.)

libctf/

	* ctf-create.c (ctf_create): Eliminate ctn_writable.
	(ctf_dtd_insert): Likewise.
	(ctf_dtd_delete): Likewise.
	(ctf_rollback): Likewise.
	(ctf_name_table): Eliminate ctf_names_t.
	* ctf-hash.c (ctf_dynhash_create): Comment update.
        Reimplement in terms of...
	(ctf_dynhash_create_sized): ... this new function.
	(ctf_hash_create): Remove.
	(ctf_hash_size): Remove.
	(ctf_hash_define_type): Remove.
	(ctf_hash_destroy): Remove.
	(ctf_hash_lookup_type): Rename to...
	(ctf_dynhash_lookup_type): ... this.
	(ctf_hash_insert_type): Rename to...
	(ctf_dynhash_insert_type): ... this, moving validation to...
	* ctf-string.c (ctf_strptr_validate): ... this new function.
	* ctf-impl.h (struct ctf_names): Extirpate.
	(struct ctf_lookup.ctl_hash): Now a ctf_dynhash_t.
	(struct ctf_dict): All ctf_names_t fields are now ctf_dynhash_t.
	(ctf_name_table): Now returns a ctf_dynhash_t.
	(ctf_lookup_by_rawhash): Remove.
	(ctf_hash_create): Likewise.
	(ctf_hash_insert_type): Likewise.
	(ctf_hash_define_type): Likewise.
	(ctf_hash_lookup_type): Likewise.
	(ctf_hash_size): Likewise.
	(ctf_hash_destroy): Likewise.
	(ctf_dynhash_create_sized): New.
	(ctf_dynhash_insert_type): New.
	(ctf_dynhash_lookup_type): New.
	(ctf_strptr_validate): New.
	* ctf-lookup.c (ctf_lookup_by_name_internal): Adapt.
	* ctf-open.c (init_types): Adapt.
	(ctf_set_ctl_hashes): Adapt.
	(ctf_dict_close): Adapt.
	* ctf-serialize.c (ctf_serialize): Adapt.
	* ctf-types.c (ctf_lookup_by_rawhash): Remove.
2024-04-19 16:14:46 +01:00
Alan Modra
59497587af libctf warnings
Seen with every compiler I have if using -fno-inline:
home/alan/src/binutils-gdb/libctf/ctf-create.c: In function ‘ctf_add_encoded’:
/home/alan/src/binutils-gdb/libctf/ctf-create.c:555:3: warning: ‘encoding’ may be used uninitialized [-Wmaybe-uninitialized]
  555 |   memcpy (dtd->dtd_vlen, &encoding, sizeof (encoding));

Seen with gcc-4.9 and probably others at lower optimisation levels:
home/alan/src/binutils-gdb/libctf/ctf-serialize.c: In function 'symtypetab_density':
/home/alan/src/binutils-gdb/libctf/ctf-serialize.c:211:18: warning: 'sym' may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (*max < sym->st_symidx)

Seen with gcc-4.5 and probably others at lower optimisation levels:
/home/alan/src/binutils-gdb/libctf/ctf-types.c:1649:21: warning: 'tp' may be used uninitialized in this function
/home/alan/src/binutils-gdb/libctf/ctf-link.c:765:16: warning: 'parent_i' may be used uninitialized in this function

Also with gcc-4.5:
In file included from /home/alan/src/binutils-gdb/libctf/ctf-endian.h:25:0,
                 from /home/alan/src/binutils-gdb/libctf/ctf-archive.c:24:
/home/alan/src/binutils-gdb/libctf/swap.h:70:0: warning: "_Static_assert" redefined
/usr/include/sys/cdefs.h:568:0: note: this is the location of the previous definition

	* swap.h (_Static_assert): Don't define if already defined.
	* ctf-serialize.c (symtypetab_density): Merge two
	CTF_SYMTYPETAB_FORCE_INDEXED blocks.
	* ctf-create.c (ctf_add_encoded): Avoid "encoding" may be used
	uninitialized warning.
	* ctf-link.c (ctf_link_deduplicating_open_inputs): Avoid
	"parent_i" may be used uninitialized warning.
	* ctf-types.c (ctf_type_rvisit): Avoid "tp" may be used
	uninitialized warning.
2024-04-17 09:24:36 +09:30
Alan Modra
fd67aa1129 Update year range in copyright notice of binutils files
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:

1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
   author I haven't committed, 'Kalray SA.', to cover gas testsuite
   files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
2024-01-04 22:58:12 +10:30
Alan Modra
d87bef3a7b Update year range in copyright notice of binutils files
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
2023-01-01 21:50:11 +10:30
Nick Alcock
3ec2b3c058 libctf: avoid mingw warning
A missing paren led to an intended cast to avoid dependence on the size
of size_t in one argument of ctf_err_warn applying to the wrong type by
mistake.

libctf/ChangeLog:

	* ctf-serialize.c (ctf_write_mem): Fix cast.
2022-06-21 19:27:15 +01:00
Nick Alcock
faf5e6ace8 libctf: add LIBCTF_WRITE_FOREIGN_ENDIAN debugging option
libctf has always handled endianness differences by detecting
foreign-endian CTF dicts on the input and endian-flipping them: dicts
are always written in native endianness.  This makes endian-awareness
very low overhead, but it means that the foreign-endian code paths
almost never get routinely tested, since "make check" usually reads in
dicts ld has just written out: only a few corrupted-CTF tests are
actually in fixed endianness, and even they only test the foreign-
endian code paths when you run make check on a big-endian machine.
(And the fix is surely not to add more .s-based tests like that, because
they are a nightmare to maintain compared to the C-code-based ones.)

To improve on this, add a new environment variable,
LIBCTF_WRITE_FOREIGN_ENDIAN, which causes libctf to unconditionally
endian-flip at ctf_write time, so the output is always in the wrong
endianness.  This then tests the foreign-endian read paths properly
at open time.

Make this easier by restructuring the writeout code in ctf-serialize.c,
which duplicates the maybe-gzip-and-write-out code three times (once
for ctf_write_mem, with thresholding, and once each for
ctf_compress_write and ctf_write just so those can avoid thresholding
and/or compression).  Instead, have the latter two call the former
with thresholds of 0 or (size_t) -1, respectively.

The endian-flipping code itself gains a bit of complexity, because
one single endian-flipper (flip_types) was assuming the input to be
in foreign-endian form and assuming it could pull things out of the
input once they had been flipped and make sense of them. At the
cost of a few lines of duplicated initializations, teach it to
read before flipping if we're flipping to foreign-endianness instead
of away from it.

libctf/
	* ctf-impl.h (ctf_flip_header): No longer static.
	(ctf_flip): Likewise.
	* ctf-open.c (flip_header): Rename to...
	(ctf_flip_header): ... this, now it is not private to one file.
	(flip_ctf): Rename...
	(ctf_flip): ... this too.  Add FOREIGN_ENDIAN arg.
	(flip_types): Likewise.  Use it.
	(ctf_bufopen_internal): Adjust calls.
	* ctf-serialize.c (ctf_write_mem): Add flip_endian path via
	a newly-allocated bounce buffer.
	(ctf_compress_write): Move below ctf_write_mem and reimplement
	in terms of it.
	(ctf_write): Likewise.
	(ctf_gzwrite): Note that this obscure writeout function does not
	support endian-flipping.
2022-03-23 13:48:32 +00:00
Nick Alcock
203bfa2f6b include, libctf, ld: extend variable section to contain functions too
The CTF variable section is an optional (usually-not-present) section in
the CTF dict which contains name -> type mappings corresponding to data
symbols that are present in the linker input but not in the output
symbol table: the idea is that programs that use their own symbol-
resolution mechanisms can use this section to look up the types of
symbols they have found using their own mechanism.

Because these removed symbols (mostly static variables, functions, etc)
all have names that are unlikely to appear in the ELF symtab and because
very few programs have their own symbol-resolution mechanisms, a special
linker flag (--ctf-variables) is needed to emit this section.

Historically, we emitted only removed data symbols into the variable
section.  This seemed to make sense at the time, but in hindsight it
really doesn't: functions are symbols too, and a C program can look them
up just like any other type.  So extend the variable section so that it
contains all static function symbols too (if it is emitted at all), with
types of kind CTF_K_FUNCTION.

This is a little fiddly.  We relied on compiler assistance for data
symbols: the compiler simply emits all data symbols twice, once into the
symtypetab as an indexed symbol and once into the variable section.

Rather than wait for a suitably adjusted compiler that does the same for
function symbols, we can pluck unreported function symbols out of the
symtab and add them to the variable section ourselves.  While we're at
it, we do the same with data symbols: this is redundant right now
because the compiler does it, but it costs very little time and lets the
compiler drop this kludge and save a little space in .o files.

include/
	* ctf.h: Mention the new things we can see in the variable
	section.

ld/
	* testsuite/ld-ctf/data-func-conflicted-vars.d: New test.

libctf/
	* ctf-link.c (ctf_link_deduplicating_variables): Duplicate
	symbols into the variable section too.
	* ctf-serialize.c (symtypetab_delete_nonstatic_vars): Rename
	to...
	(symtypetab_delete_nonstatics): ... this.  Check the funchash
	when pruning redundant variables.
	(ctf_symtypetab_sect_sizes): Adjust accordingly.
	* NEWS: Describe this change.
2022-03-23 13:48:32 +00:00
Alan Modra
a2c5833233 Update year range in copyright notice of binutils files
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.

The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bfp opcodes files.
2022-01-02 12:04:28 +10:30
Nick Alcock
86f64bf43f libctf, serialize: functions with no args have a NULL dtd_vlen
Every place that accesses a function's dtd_vlen accesses it only if the
number of args is nonzero, except the serializer, which always tries to
memcpy it.  The number of bytes it memcpys in this case is zero, but it
is still undefined behaviour to copy zero bytes from a null pointer.
So check for this case explicitly.

libctf/ChangeLog
2021-03-25  Nick Alcock  <nick.alcock@oracle.com>

	PR libctf/27628
	* ctf-serialize.c (ctf_emit_type_sect): Allow for a NULL vlen in
	CTF_K_FUNCTION types.
2021-03-25 16:32:48 +00:00
Nick Alcock
08c428aff4 libctf: eliminate dtd_u, part 5: structs / unions
Eliminate the dynamic member storage for structs and unions as we have
for other dynamic types.  This is much like the previous enum
elimination, except that structs and unions are the only types for which
a full-sized ctf_type_t might be needed.  Up to now, this decision has
been made in the individual ctf_add_{struct,union}_sized functions and
duplicated in ctf_add_member_offset.  The vlen machinery lets us
simplify this, always allocating a ctf_lmember_t and setting the
dtd_data's ctt_size to CTF_LSIZE_SENT: we figure out whether this is
really justified and (almost always) repack things down into a
ctf_stype_t at ctf_serialize time.

This allows us to eliminate the dynamic member paths from the iterators and
query functions in ctf-types.c in favour of always using the large-structure
vlen stuff for dynamic types (the diff is ugly but that's just because of the
volume of reindentation this calls for).  This also means the large-structure
vlen stuff gets more heavily tested, which is nice because it was an almost
totally unused code path before now (it only kicked in for structures of size
>4GiB, and how often do you see those?)

The only extra complexity here is ctf_add_type.  Back in the days of the
nondeduplicating linker this was called a ridiculous number of times for
countless identical copies of structures: eschewing the repeated lookups of the
dtd in ctf_add_member_offset and adding the members directly saved an amazing
amount of time.  Now the nondeduplicating linker is gone, this is extreme
overoptimization: we can rip out the direct addition and use ctf_member_next and
ctf_add_member_offset, just like ctf_dedup_emit does.

We augment a ctf_add_type test to try adding a self-referential struct, the only
thing the ctf_add_type part of this change really perturbs.

This completes the elimination of dtd_u.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtu_members>: Remove.
	<dtd_u>: Likewise.
	(ctf_dmdef_t): Remove.
	(struct ctf_next) <u.ctn_dmd>: Remove.
	* ctf-create.c (INITIAL_VLEN): New, more-or-less arbitrary initial
	vlen size.
	(ctf_add_enum): Use it.
	(ctf_dtd_delete): Do not free the (removed) dmd; remove string
	refs from the vlen on struct deletion.
	(ctf_add_struct_sized): Populate the vlen: do it by hand if
	promoting forwards.  Always populate the full-size
	lsizehi/lsizelo members.
	(ctf_add_union_sized): Likewise.
	(ctf_add_member_offset): Set up the vlen rather than the dmd.
	Expand it as needed, repointing string refs via
	ctf_str_move_pending. Add the member names as pending strings.
	Always populate the full-size lsizehi/lsizelo members.
	(membadd): Remove, folding back into...
	(ctf_add_type_internal): ... here, adding via an ordinary
	ctf_add_struct_sized and _next iteration rather than doing
	everything by hand.
	* ctf-serialize.c (ctf_copy_smembers): Remove this...
	(ctf_copy_lmembers): ... and this...
	(ctf_emit_type_sect): ... folding into here. Figure out if a
	ctf_stype_t is needed here, not in ctf_add_*_sized.
	(ctf_type_sect_size): Figure out the ctf_stype_t stuff the same
	way here.
	* ctf-types.c (ctf_member_next): Remove the dmd path and always
	use the vlen.  Force large-structure usage for dynamic types.
	(ctf_type_align): Likewise.
	(ctf_member_info): Likewise.
	(ctf_type_rvisit): Likewise.
	* testsuite/libctf-regression/type-add-unnamed-struct-ctf.c: Add a
	self-referential type to this test.
	* testsuite/libctf-regression/type-add-unnamed-struct.c: Adjusted
	accordingly.
	* testsuite/libctf-regression/type-add-unnamed-struct.lk: Likewise.
2021-03-18 12:40:40 +00:00
Nick Alcock
77d724a7ec libctf: eliminate dtd_u, part 4: enums
This is the first tricky one, the first complex multi-entry vlen
containing strings.  To handle this in vlen form, we have to handle
pending refs moving around on realloc.

We grow vlen regions using a new ctf_grow_vlen function, and iterate
through the existing enums every time a grow happens, telling the string
machinery the distance between the old and new vlen region and letting
it adjust the pending refs accordingly.  (This avoids traversing all
outstanding refs to find the refs that need adjusting, at the cost of
having to traverse one enum: an obvious major performance win.)

Addition of enums themselves (and also structs/unions later) is a bit
trickier than earlier forms, because the type might be being promoted
from a forward, and forwards have no vlen: so we have to spot that and
create it if needed.

Serialization of enums simplifies down to just telling the string
machinery about the string refs; all the enum type-lookup code loses all
its dynamic member lookup complexity entirely.

A new test is added that iterates over (and gets values of) an enum with
enough members to force a round of vlen growth.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_vlen_alloc>: New.
	(ctf_str_move_pending): Declare.
	* ctf-string.c (ctf_str_add_ref_internal): Fix error return.
	(ctf_str_move_pending): New.
	* ctf-create.c (ctf_grow_vlen): New.
	(ctf_dtd_delete): Zero out the vlen_alloc after free.  Free the
	vlen later: iterate over it and free enum name refs first.
	(ctf_add_generic): Populate dtd_vlen_alloc from vlen.
	(ctf_add_enum): populate the vlen; do it by hand if promoting
	forwards.
	(ctf_add_enumerator): Set up the vlen rather than the dmd.  Expand
	it as needed, repointing string refs via ctf_str_move_pending. Add
	the enumerand names as pending strings.
	* ctf-serialize.c (ctf_copy_emembers): Remove.
	(ctf_emit_type_sect): Copy the vlen into place and ref the
	strings.
	* ctf-types.c (ctf_enum_next): The dynamic portion now uses
	the same code as the non-dynamic.
	(ctf_enum_name): Likewise.
	(ctf_enum_value): Likewise.
	* testsuite/libctf-lookup/enum-many-ctf.c: New test.
	* testsuite/libctf-lookup/enum-many.lk: New test.
2021-03-18 12:40:40 +00:00
Nick Alcock
986e9e3aa0 libctf: do not corrupt strings across ctf_serialize
The preceding change revealed a new bug: the string table is sorted for
better compression, so repeated serialization with type (or member)
additions in the middle can move strings around.  But every
serialization flushes the set of refs (the memory locations that are
automatically updated with a final string offset when the strtab is
updated), so if we are not to have string offsets go stale, we must do
all ref additions within the serialization code (which walks the
complete set of types and symbols anyway). Unfortunately, we were adding
one ref in another place: the type name in the dynamic type definitions,
which has a ref added to it by ctf_add_generic.

So adding a type, serializing (via, say, one of the ctf_write
functions), adding another type with a name that sorts earlier, and
serializing again will corrupt the name of the first type because it no
longer had a ref pointing to its dtd entry's name when its string offset
was shifted later in the strtab to mae way for the other type.

To ensure that we don't miss strings, we also maintain a set of *pending
refs* that will be added later (during serialization), and remove
entries from that set when the ref is finally added.  We always use
ctf_str_add_pending outside ctf-serialize.c, ensure that ctf_serialize
adds all strtab offsets as refs (even those in the dtds) on every
serialization, and mandate that no refs are live on entry to
ctf_serialize and that all pending refs are gone before strtab
finalization.  (Of necessity ctf_serialize has to traverse all strtab
offsets in the dtds in order to serialize them, so adding them as refs
at the same time is easy.)

(Note that we still can't erase unused atoms when we roll back, though
we can erase unused refs: members and enums are still not removed by
rollbacks and might reference strings added after the snapshot.)

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-hash.c (ctf_dynset_elements): New.
	* ctf-impl.h (ctf_dynset_elements): Declare it.
	(ctf_str_add_pending): Likewise.
	(ctf_dict_t) <ctf_str_pending_ref>: New, set of refs that must be
	added during serialization.
	* ctf-string.c (ctf_str_create_atoms): Initialize it.
	(CTF_STR_ADD_REF): New flag.
	(CTF_STR_MAKE_PROVISIONAL): Likewise.
	(CTF_STR_PENDING_REF): Likewise.
	(ctf_str_add_ref_internal): Take a flags word rather than int
	params.  Populate, and clear out, ctf_str_pending_ref.
	(ctf_str_add): Adjust accordingly.
	(ctf_str_add_external): Likewise.
	(ctf_str_add_pending): New.
	(ctf_str_remove_ref): Also remove the potential ref if it is a
	pending ref.
	* ctf-serialize.c (ctf_serialize): Prohibit addition of strings
	with ctf_str_add_ref before serialization.  Ensure that the
	ctf_str_pending_ref set is empty before strtab finalization.
	(ctf_emit_type_sect): Add a ref to the ctt_name.
	* ctf-create.c (ctf_add_generic): Add the ctt_name as a pending
	ref.
	* testsuite/libctf-writable/reserialize-strtab-corruption.*: New test.
2021-03-18 12:40:40 +00:00
Nick Alcock
2a05d50e90 libctf: don't lose track of all valid types upon serialization
One pattern which is rarely done in libctf but which is meant to work is
this:

ctf_create();
ctf_add_*(); // add stuff
ctf_type_*() // look stuff up
ctf_write_*();
ctf_add_*(); // should still work
ctf_type_*() // so should this
ctf_write_*(); // and this

i.e., writing out a dict should not break it and you should be able to
do everything you could do with it before, including writing it out
again.

Unfortunately this has been broken for a while because the field which
indicates the maximum valid type ID was not preserved across
serialization: so type additions after serialization would overwrite
types (obviously disastrous) and type lookups would just fail.

Fix trivial.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-serialize.c (ctf_serialize): Preserve ctf_typemax across
	serialization.
2021-03-18 12:40:40 +00:00
Nick Alcock
81982d20fa libctf: eliminate dtd_u, part 3: functions
One more member vanishes from the dtd_u, leaving only the member for
struct/union/enum members.

There's not much to do here, since as of commit afd78bd6f0 we use
the same representation (type sizes, etc) in the dtu_argv as we will
use in the final vlen, with one exception: the vlen has alignment
padding, and the dtu_argv did not.  Simplify things by adding suitable
padding in both cases.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_argv>: Remove.
	* ctf-create.c (ctf_dtd_delete): No longer free it.
	(ctf_add_function): Use the dtd_vlen, not dtu_argv.  Properly align.
	* ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen.
	* ctf-types.c (ctf_func_type_info): Just use the vlen.
	(ctf_func_type_args): Likewise.
2021-03-18 12:40:40 +00:00
Nick Alcock
534444b1ee libctf: eliminate dtd_u, part 2: arrays
This is even simpler than ints, floats and slices, with the only extra
complication being the need to manually transfer the array parameter in
the rarely-used function ctf_set_array.  (Arrays are unique in libctf in
that they can be modified post facto, not just created and appended to.
I'm not sure why they got this exemption, but it's easy to maintain.)

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_arr>: Remove.
	* ctf-create.c (ctf_add_array): Use the dtd_vlen, not dtu_arr.
	(ctf_set_array): Likewise.
	* ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen.
	* ctf-types.c (ctf_array_info): Just use the vlen.
2021-03-18 12:40:40 +00:00
Nick Alcock
7879dd88ef libctf: eliminate dtd_u, part 1: int/float/slice
This series eliminates a lot of special-case code to handle dynamic
types (types added to writable dicts and not yet serialized).

Historically, when such types have variable-length data in their final
CTF representations, libctf has always worked by adding such types to a
special union (ctf_dtdef_t.dtd_u) in the dynamic type definition
structure, then picking the members out of this structure at
serialization time and packing them into their final form.

This has the advantage that the ctf_add_* code doesn't need to know
anything about the final CTF representation, but the significant
disadvantage that all code that looks up types in any way needs two code
paths, one for dynamic types, one for all others.  Historically libctf
"handled" this by not supporting most type lookups on dynamic types at
all until ctf_update was called to do a complete reserialization of the
entire dict (it didn't emit an error, it just emitted wrong results).
Since commit 676c3ecbad, which eliminated ctf_update in favour of
the internal-only ctf_serialize function, all the type-lookup paths
grew an extra branch to handle dynamic types.

We can eliminate this branch again by dropping the dtd_u stuff and
simply writing out the vlen in (close to) its final form at ctf_add_*
time: type lookup for types using this approach is then identical for
types in writable dicts and types that are in read-only ones, and
serialization is also simplified (we just need to write out the vlen
we already created).

The only complexity lies in type kinds for which multiple
vlen representations are valid depending on properties of the type,
e.g. structures.  But we can start simple, adjusting ints, floats,
and slices to work this way, and leaving everything else as is.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_enc>: Remove.
	<dtd_u.dtu_slice>: Likewise.
	<dtd_vlen>: New.
	* ctf-create.c (ctf_add_generic): Perhaps allocate it.  All
	callers adjusted.
	(ctf_dtd_delete): Free it.
	(ctf_add_slice): Use the dtd_vlen, not dtu_enc.
	(ctf_add_encoded): Likewise.  Assert that this must be an int or
	float.
	* ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen.
	* ctf-dedup.c (ctf_dedup_rhash_type): Use the dtd_vlen, not
	dtu_slice.
	* ctf-types.c (ctf_type_reference): Likewise.
	(ctf_type_encoding): Remove most dynamic-type-specific code: just
	get the vlen from the right place.  Report failure to look up the
	underlying type's encoding.
2021-03-18 12:40:36 +00:00
Nick Alcock
b9a964318a libctf: split up ctf_serialize
ctf_serialize and its various pieces may be split out into a separate
file now, but ctf_serialize is still far too long and disordered, mixing
header initialization, sizing of multiple CTF sections, sorting and
emission of multiple CTF sections, strtab construction and ctf_dict_t
copying into a single ugly organically-grown mess.

Fix the worst of this by migrating all section sizing and emission into
separate functions, two per section (or class of section in the case of
the symtypetabs).  Only the variable section is now sized and emitted
directly in ctf_serialize (because it only takes about three lines to do
so).

The section sizes themselves are still maintained by ctf_serialize so
that it can work out the header offsets, but ctf_symtypetab_sect_sizes
and ctf_emit_symtypetab_sects share a lot of extra state: migrate that
into a shared structure, emit_symtypetab_state_t.

(Test results unchanged.)

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-serialize.c: General reshuffling, and...
	(emit_symtypetab_state_t): New, migrated from
	local variables in ctf_serialize.
	(ctf_serialize): Split out most section sizing and
	emission.
	(ctf_symtypetab_sect_sizes): New (split out).
	(ctf_emit_symtypetab_sects): Likewise.
	(ctf_type_sect_size): Likewise.
	(ctf_emit_type_sect): Likewise.
2021-03-18 12:37:55 +00:00
Nick Alcock
bf4c3185a5 libctf: split serialization and file writeout into its own file
The code to serialize CTF dicts just gets bigger and bigger as the
dictionary's complexity grows: adding symtypetabs almost doubled it on
its own.  It's long past time to split this out into its own source
file, accompanied by the functions that do the actual writeout.

This leaves ctf-create.c populated exclusively by functions related to
actual writable dict creation (ctf_add_*, ctf_create etc), and leaves
both files a much more reasonable size.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-create.c (symtypetab_delete_nonstatic_vars): Move
	into ctf-serialize.c.
	(ctf_symtab_skippable): Likewise.
	(CTF_SYMTYPETAB_EMIT_FUNCTION): Likewise.
	(CTF_SYMTYPETAB_EMIT_PAD): Likewise.
	(CTF_SYMTYPETAB_FORCE_INDEXED): Likewise.
	(symtypetab_density): Likewise.
	(emit_symtypetab): Likewise.
	(emit_symtypetab_index): Likewise.
	(ctf_copy_smembers): Likewise.
	(ctf_copy_lmembers): Likewise.
	(ctf_copy_emembers): Likewise.
	(ctf_sort_var): Likewise.
	(ctf_serialize): Likewise.
	(ctf_gzwrite): Likewise.
	(ctf_compress_write): Likewise.
	(ctf_write_mem): Likewise.
	(ctf_write): Likewise.
	* ctf-serialize.c: New file.
	* Makefile.am (libctf_nobfd_la_SOURCES): Add it.
	* Makefile.in: Regenerate.
2021-03-18 12:37:53 +00:00