libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
/* CTF string table management.
|
2024-01-04 19:52:08 +08:00
|
|
|
Copyright (C) 2019-2024 Free Software Foundation, Inc.
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
|
|
|
This file is part of libctf.
|
|
|
|
|
|
|
|
libctf is free software; you can redistribute it and/or modify it under
|
|
|
|
the terms of the GNU General Public License as published by the Free
|
|
|
|
Software Foundation; either version 3, or (at your option) any later
|
|
|
|
version.
|
|
|
|
|
|
|
|
This program is distributed in the hope that it will be useful, but
|
|
|
|
WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
|
|
|
See the GNU General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
|
|
along with this program; see the file COPYING. If not see
|
|
|
|
<http://www.gnu.org/licenses/>. */
|
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
#include <assert.h>
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
#include <ctf-impl.h>
|
|
|
|
#include <string.h>
|
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
static ctf_str_atom_t *
|
|
|
|
ctf_str_add_ref_internal (ctf_dict_t *fp, const char *str,
|
|
|
|
int flags, uint32_t *ref);
|
|
|
|
|
|
|
|
/* Convert an encoded CTF string name into a pointer to a C string, possibly
|
|
|
|
using an explicit internal provisional strtab rather than the fp-based
|
|
|
|
one. */
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
const char *
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_strraw_explicit (ctf_dict_t *fp, uint32_t name, ctf_strs_t *strtab)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
|
|
|
ctf_strs_t *ctsp = &fp->ctf_str[CTF_NAME_STID (name)];
|
|
|
|
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
if ((CTF_NAME_STID (name) == CTF_STRTAB_0) && (strtab != NULL))
|
|
|
|
ctsp = strtab;
|
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
/* If this name is in the external strtab, and there is a synthetic
|
|
|
|
strtab, use it in preference. (This is used to add the set of strings
|
|
|
|
-- symbol names, etc -- the linker knows about before the strtab is
|
|
|
|
written out.) */
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
|
|
|
|
if (CTF_NAME_STID (name) == CTF_STRTAB_1
|
|
|
|
&& fp->ctf_syn_ext_strtab != NULL)
|
|
|
|
return ctf_dynhash_lookup (fp->ctf_syn_ext_strtab,
|
|
|
|
(void *) (uintptr_t) name);
|
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
/* If the name is in the internal strtab, and the name offset is beyond
|
|
|
|
the end of the ctsp->cts_len but below the ctf_str_prov_offset, this is
|
|
|
|
a provisional string added by ctf_str_add*() but not yet built into a
|
|
|
|
real strtab: get the value out of the ctf_prov_strtab. */
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
|
|
|
|
if (CTF_NAME_STID (name) == CTF_STRTAB_0
|
|
|
|
&& name >= ctsp->cts_len && name < fp->ctf_str_prov_offset)
|
|
|
|
return ctf_dynhash_lookup (fp->ctf_prov_strtab,
|
|
|
|
(void *) (uintptr_t) name);
|
|
|
|
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
if (ctsp->cts_strs != NULL && CTF_NAME_OFFSET (name) < ctsp->cts_len)
|
|
|
|
return (ctsp->cts_strs + CTF_NAME_OFFSET (name));
|
|
|
|
|
|
|
|
/* String table not loaded or corrupt offset. */
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
/* Convert an encoded CTF string name into a pointer to a C string by looking
|
|
|
|
up the appropriate string table buffer and then adding the offset. */
|
|
|
|
const char *
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_strraw (ctf_dict_t *fp, uint32_t name)
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
{
|
|
|
|
return ctf_strraw_explicit (fp, name, NULL);
|
|
|
|
}
|
|
|
|
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
/* Return a guaranteed-non-NULL pointer to the string with the given CTF
|
|
|
|
name. */
|
|
|
|
const char *
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_strptr (ctf_dict_t *fp, uint32_t name)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
|
|
|
const char *s = ctf_strraw (fp, name);
|
|
|
|
return (s != NULL ? s : "(?)");
|
|
|
|
}
|
|
|
|
|
libctf: remove static/dynamic name lookup distinction
libctf internally maintains a set of hash tables for type name lookups,
one for each valid C type namespace (struct, union, enum, and everything
else).
Or, rather, it maintains *two* sets of hash tables: one, a ctf_hash *,
is meant for lookups in ctf_(buf)open()ed dicts with fixed content; the
other, a ctf_dynhash *, is meant for lookups in ctf_create()d dicts.
This distinction was somewhat valuable in the far pre-binutils past when
two different hashtable implementations were used (one expanding, the
other fixed-size), but those days are long gone: the hash table
implementations are almost identical, both wrappers around the libiberty
hashtab. The ctf_dynhash has many more capabilities than the ctf_hash
(iteration, deletion, etc etc) and has no downsides other than starting
at a fixed, arbitrary small size.
That limitation is easy to lift (via a new ctf_dynhash_create_sized()),
following which we can throw away nearly all the ctf_hash
implementation, and all the code to choose between readable and writable
hashtabs; the few convenience functions that are still useful (for
insertion of name -> type mappings) can also be generalized a bit so
that the extra string verification they do is potentially available to
other string lookups as well.
(libctf still has two hashtable implementations, ctf_dynhash, above,
and ctf_dynset, which is a key-only hashtab that can avoid a great many
malloc()s, used for high-volume applications in the deduplicator.)
libctf/
* ctf-create.c (ctf_create): Eliminate ctn_writable.
(ctf_dtd_insert): Likewise.
(ctf_dtd_delete): Likewise.
(ctf_rollback): Likewise.
(ctf_name_table): Eliminate ctf_names_t.
* ctf-hash.c (ctf_dynhash_create): Comment update.
Reimplement in terms of...
(ctf_dynhash_create_sized): ... this new function.
(ctf_hash_create): Remove.
(ctf_hash_size): Remove.
(ctf_hash_define_type): Remove.
(ctf_hash_destroy): Remove.
(ctf_hash_lookup_type): Rename to...
(ctf_dynhash_lookup_type): ... this.
(ctf_hash_insert_type): Rename to...
(ctf_dynhash_insert_type): ... this, moving validation to...
* ctf-string.c (ctf_strptr_validate): ... this new function.
* ctf-impl.h (struct ctf_names): Extirpate.
(struct ctf_lookup.ctl_hash): Now a ctf_dynhash_t.
(struct ctf_dict): All ctf_names_t fields are now ctf_dynhash_t.
(ctf_name_table): Now returns a ctf_dynhash_t.
(ctf_lookup_by_rawhash): Remove.
(ctf_hash_create): Likewise.
(ctf_hash_insert_type): Likewise.
(ctf_hash_define_type): Likewise.
(ctf_hash_lookup_type): Likewise.
(ctf_hash_size): Likewise.
(ctf_hash_destroy): Likewise.
(ctf_dynhash_create_sized): New.
(ctf_dynhash_insert_type): New.
(ctf_dynhash_lookup_type): New.
(ctf_strptr_validate): New.
* ctf-lookup.c (ctf_lookup_by_name_internal): Adapt.
* ctf-open.c (init_types): Adapt.
(ctf_set_ctl_hashes): Adapt.
(ctf_dict_close): Adapt.
* ctf-serialize.c (ctf_serialize): Adapt.
* ctf-types.c (ctf_lookup_by_rawhash): Remove.
2023-12-19 01:47:48 +08:00
|
|
|
/* As above, but return info on what is wrong in more detail.
|
|
|
|
(Used for type lookups.) */
|
|
|
|
|
|
|
|
const char *
|
|
|
|
ctf_strptr_validate (ctf_dict_t *fp, uint32_t name)
|
|
|
|
{
|
|
|
|
const char *str = ctf_strraw (fp, name);
|
|
|
|
|
|
|
|
if (str == NULL)
|
|
|
|
{
|
|
|
|
if (CTF_NAME_STID (name) == CTF_STRTAB_1
|
|
|
|
&& fp->ctf_syn_ext_strtab == NULL
|
|
|
|
&& fp->ctf_str[CTF_NAME_STID (name)].cts_strs == NULL)
|
|
|
|
{
|
|
|
|
ctf_set_errno (fp, ECTF_STRTAB);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
ctf_set_errno (fp, ECTF_BADNAME);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
return str;
|
|
|
|
}
|
|
|
|
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
/* Remove all refs to a given atom. */
|
|
|
|
static void
|
|
|
|
ctf_str_purge_atom_refs (ctf_str_atom_t *atom)
|
|
|
|
{
|
|
|
|
ctf_str_atom_ref_t *ref, *next;
|
|
|
|
|
|
|
|
for (ref = ctf_list_next (&atom->csa_refs); ref != NULL; ref = next)
|
|
|
|
{
|
|
|
|
next = ctf_list_next (ref);
|
|
|
|
ctf_list_delete (&atom->csa_refs, ref);
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
if (atom->csa_flags & CTF_STR_ATOM_MOVABLE)
|
|
|
|
{
|
|
|
|
ctf_str_atom_ref_movable_t *movref;
|
|
|
|
movref = (ctf_str_atom_ref_movable_t *) ref;
|
|
|
|
ctf_dynhash_remove (movref->caf_movable_refs, ref);
|
|
|
|
}
|
|
|
|
|
libctf: remove ctf_malloc, ctf_free and ctf_strdup
These just get in the way of auditing for erroneous usage of strdup and
add a huge irregular surface of "ctf_malloc or malloc? ctf_free or free?
ctf_strdup or strdup?"
ctf_malloc and ctf_free usage has not reliably matched up for many
years, if ever, making the whole game pointless.
Go back to malloc, free, and strdup like everyone else: while we're at
it, fix a bunch of places where we weren't properly checking for OOM.
This changes the interface of ctf_cuname_set and ctf_parent_name_set,
which could strdup but could not return errors (like ENOMEM).
New in v4.
include/
* ctf-api.h (ctf_cuname_set): Can now fail, returning int.
(ctf_parent_name_set): Likewise.
libctf/
* ctf-impl.h (ctf_alloc): Remove.
(ctf_free): Likewise.
(ctf_strdup): Likewise.
* ctf-subr.c (ctf_alloc): Remove.
(ctf_free): Likewise.
* ctf-util.c (ctf_strdup): Remove.
* ctf-create.c (ctf_serialize): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_function): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
(ctf_compress_write): Likewise.
(ctf_write_mem): Likewise.
* ctf-decl.c (ctf_decl_push): Likewise.
(ctf_decl_fini): Likewise.
(ctf_decl_sprintf): Likewise. Check for OOM.
* ctf-dump.c (ctf_dump_append): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dump_free): Likewise.
(ctf_dump): Likewise.
* ctf-open.c (upgrade_types_v1): Likewise.
(init_types): Likewise.
(ctf_file_close): Likewise.
(ctf_bufopen_internal): Likewise. Check for OOM.
(ctf_parent_name_set): Likewise: report the OOM to the caller.
(ctf_cuname_set): Likewise.
(ctf_import): Likewise.
* ctf-string.c (ctf_str_purge_atom_refs): Use malloc, not ctf_alloc;
free, not ctf_free; strdup, not ctf_strdup.
(ctf_str_free_atom): Likewise.
(ctf_str_create_atoms): Likewise.
(ctf_str_add_ref_internal): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_write_strtab): Likewise.
2019-09-17 13:54:23 +08:00
|
|
|
free (ref);
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
/* Free an atom. */
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
static void
|
|
|
|
ctf_str_free_atom (void *a)
|
|
|
|
{
|
|
|
|
ctf_str_atom_t *atom = a;
|
|
|
|
|
|
|
|
ctf_str_purge_atom_refs (atom);
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
|
|
|
|
if (atom->csa_flags & CTF_STR_ATOM_FREEABLE)
|
|
|
|
free (atom->csa_str);
|
|
|
|
|
libctf: remove ctf_malloc, ctf_free and ctf_strdup
These just get in the way of auditing for erroneous usage of strdup and
add a huge irregular surface of "ctf_malloc or malloc? ctf_free or free?
ctf_strdup or strdup?"
ctf_malloc and ctf_free usage has not reliably matched up for many
years, if ever, making the whole game pointless.
Go back to malloc, free, and strdup like everyone else: while we're at
it, fix a bunch of places where we weren't properly checking for OOM.
This changes the interface of ctf_cuname_set and ctf_parent_name_set,
which could strdup but could not return errors (like ENOMEM).
New in v4.
include/
* ctf-api.h (ctf_cuname_set): Can now fail, returning int.
(ctf_parent_name_set): Likewise.
libctf/
* ctf-impl.h (ctf_alloc): Remove.
(ctf_free): Likewise.
(ctf_strdup): Likewise.
* ctf-subr.c (ctf_alloc): Remove.
(ctf_free): Likewise.
* ctf-util.c (ctf_strdup): Remove.
* ctf-create.c (ctf_serialize): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_function): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
(ctf_compress_write): Likewise.
(ctf_write_mem): Likewise.
* ctf-decl.c (ctf_decl_push): Likewise.
(ctf_decl_fini): Likewise.
(ctf_decl_sprintf): Likewise. Check for OOM.
* ctf-dump.c (ctf_dump_append): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dump_free): Likewise.
(ctf_dump): Likewise.
* ctf-open.c (upgrade_types_v1): Likewise.
(init_types): Likewise.
(ctf_file_close): Likewise.
(ctf_bufopen_internal): Likewise. Check for OOM.
(ctf_parent_name_set): Likewise: report the OOM to the caller.
(ctf_cuname_set): Likewise.
(ctf_import): Likewise.
* ctf-string.c (ctf_str_purge_atom_refs): Use malloc, not ctf_alloc;
free, not ctf_free; strdup, not ctf_strdup.
(ctf_str_free_atom): Likewise.
(ctf_str_create_atoms): Likewise.
(ctf_str_add_ref_internal): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_write_strtab): Likewise.
2019-09-17 13:54:23 +08:00
|
|
|
free (atom);
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Create the atoms table. There is always at least one atom in it, the null
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
string: but also pull in atoms from the internal strtab. (We rely on
|
|
|
|
calls to ctf_str_add_external to populate external strtab entries, since
|
|
|
|
these are often not quite the same as what appears in any external
|
|
|
|
strtab, and the external strtab is often huge and best not aggressively
|
libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
2024-03-26 21:04:20 +08:00
|
|
|
pulled in.) */
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
int
|
libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
2024-03-26 21:04:20 +08:00
|
|
|
ctf_str_create_atoms (ctf_dict_t *fp)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
size_t i;
|
|
|
|
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
fp->ctf_str_atoms = ctf_dynhash_create (ctf_hash_string, ctf_hash_eq_string,
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
NULL, ctf_str_free_atom);
|
|
|
|
if (!fp->ctf_str_atoms)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
if (!fp->ctf_prov_strtab)
|
|
|
|
fp->ctf_prov_strtab = ctf_dynhash_create (ctf_hash_integer,
|
|
|
|
ctf_hash_eq_integer,
|
|
|
|
NULL, NULL);
|
|
|
|
if (!fp->ctf_prov_strtab)
|
|
|
|
goto oom_prov_strtab;
|
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
fp->ctf_str_movable_refs = ctf_dynhash_create (ctf_hash_integer,
|
|
|
|
ctf_hash_eq_integer,
|
|
|
|
NULL, NULL);
|
|
|
|
if (!fp->ctf_str_movable_refs)
|
|
|
|
goto oom_movable_refs;
|
|
|
|
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
errno = 0;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
ctf_str_add (fp, "");
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
if (errno == ENOMEM)
|
|
|
|
goto oom_str_add;
|
|
|
|
|
libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
2024-03-26 21:04:20 +08:00
|
|
|
/* Pull in all the strings in the strtab as new atoms. The provisional
|
|
|
|
strtab must be empty at this point, so there is no need to populate
|
|
|
|
atoms from it as well. Types in this subset are frozen and readonly,
|
|
|
|
so the refs list and movable refs list need not be populated. */
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
|
libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
2024-03-26 21:04:20 +08:00
|
|
|
for (i = 0; i < fp->ctf_str[CTF_STRTAB_0].cts_len;
|
|
|
|
i += strlen (&fp->ctf_str[CTF_STRTAB_0].cts_strs[i]) + 1)
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
{
|
libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
2024-03-26 21:04:20 +08:00
|
|
|
ctf_str_atom_t *atom;
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
|
libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
2024-03-26 21:04:20 +08:00
|
|
|
if (fp->ctf_str[CTF_STRTAB_0].cts_strs[i] == 0)
|
|
|
|
continue;
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
|
libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
2024-03-26 21:04:20 +08:00
|
|
|
atom = ctf_str_add_ref_internal (fp, &fp->ctf_str[CTF_STRTAB_0].cts_strs[i],
|
|
|
|
0, 0);
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
|
libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
2024-03-26 21:04:20 +08:00
|
|
|
if (!atom)
|
|
|
|
goto oom_str_add;
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
|
libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
2024-03-26 21:04:20 +08:00
|
|
|
atom->csa_offset = i;
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
}
|
|
|
|
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
return 0;
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
|
|
|
|
oom_str_add:
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
ctf_dynhash_destroy (fp->ctf_str_movable_refs);
|
|
|
|
fp->ctf_str_movable_refs = NULL;
|
|
|
|
oom_movable_refs:
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
ctf_dynhash_destroy (fp->ctf_prov_strtab);
|
|
|
|
fp->ctf_prov_strtab = NULL;
|
|
|
|
oom_prov_strtab:
|
|
|
|
ctf_dynhash_destroy (fp->ctf_str_atoms);
|
|
|
|
fp->ctf_str_atoms = NULL;
|
|
|
|
return -ENOMEM;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
/* Destroy the atoms table and associated refs. */
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
void
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_str_free_atoms (ctf_dict_t *fp)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
ctf_dynhash_destroy (fp->ctf_prov_strtab);
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
ctf_dynhash_destroy (fp->ctf_str_atoms);
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
ctf_dynhash_destroy (fp->ctf_str_movable_refs);
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
if (fp->ctf_dynstrtab)
|
|
|
|
{
|
|
|
|
free (fp->ctf_dynstrtab->cts_strs);
|
|
|
|
free (fp->ctf_dynstrtab);
|
|
|
|
}
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
#define CTF_STR_ADD_REF 0x1
|
|
|
|
#define CTF_STR_PROVISIONAL 0x2
|
|
|
|
#define CTF_STR_MOVABLE 0x4
|
|
|
|
|
|
|
|
/* Allocate a ref and bind it into a ref list. */
|
|
|
|
|
|
|
|
static ctf_str_atom_ref_t *
|
|
|
|
aref_create (ctf_dict_t *fp, ctf_str_atom_t *atom, uint32_t *ref, int flags)
|
|
|
|
{
|
|
|
|
ctf_str_atom_ref_t *aref;
|
|
|
|
size_t s = sizeof (struct ctf_str_atom_ref);
|
|
|
|
|
|
|
|
if (flags & CTF_STR_MOVABLE)
|
|
|
|
s = sizeof (struct ctf_str_atom_ref_movable);
|
|
|
|
|
|
|
|
aref = malloc (s);
|
|
|
|
|
|
|
|
if (!aref)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
aref->caf_ref = ref;
|
|
|
|
|
|
|
|
/* Movable refs get a backpointer to them in ctf_str_movable_refs, and a
|
|
|
|
pointer to ctf_str_movable_refs itself in the ref, for use when freeing
|
|
|
|
refs: they can be moved later in batches via a call to
|
|
|
|
ctf_str_move_refs. */
|
|
|
|
|
|
|
|
if (flags & CTF_STR_MOVABLE)
|
|
|
|
{
|
|
|
|
ctf_str_atom_ref_movable_t *movref = (ctf_str_atom_ref_movable_t *) aref;
|
|
|
|
|
|
|
|
movref->caf_movable_refs = fp->ctf_str_movable_refs;
|
|
|
|
|
|
|
|
if (ctf_dynhash_insert (fp->ctf_str_movable_refs, ref, aref) < 0)
|
|
|
|
{
|
|
|
|
free (aref);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
ctf_list_append (&atom->csa_refs, aref);
|
|
|
|
|
|
|
|
return aref;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Add a string to the atoms table, copying the passed-in string if
|
|
|
|
necessary. Return the atom added. Return NULL only when out of memory
|
|
|
|
(and do not touch the passed-in string in that case).
|
|
|
|
|
|
|
|
Possibly add a provisional entry for this string to the provisional
|
|
|
|
strtab. If the string is in the provisional strtab, update its ref list
|
|
|
|
with the passed-in ref, causing the ref to be updated when the strtab is
|
|
|
|
written out. */
|
|
|
|
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
static ctf_str_atom_t *
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_str_add_ref_internal (ctf_dict_t *fp, const char *str,
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
int flags, uint32_t *ref)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
|
|
|
char *newstr = NULL;
|
|
|
|
ctf_str_atom_t *atom = NULL;
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
int added = 0;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
|
|
|
atom = ctf_dynhash_lookup (fp->ctf_str_atoms, str);
|
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
/* Existing atoms get refs added only if they are provisional:
|
|
|
|
non-provisional strings already have a fixed strtab offset, and just
|
|
|
|
get their ref updated immediately, since its value cannot change. */
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
|
|
|
if (atom)
|
|
|
|
{
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
if (!ctf_dynhash_lookup (fp->ctf_prov_strtab, (void *) (uintptr_t)
|
|
|
|
atom->csa_offset))
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
if (flags & CTF_STR_ADD_REF)
|
|
|
|
{
|
|
|
|
if (atom->csa_external_offset)
|
|
|
|
*ref = atom->csa_external_offset;
|
|
|
|
else
|
|
|
|
*ref = atom->csa_offset;
|
|
|
|
}
|
|
|
|
return atom;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
|
|
|
|
if (flags & CTF_STR_ADD_REF)
|
|
|
|
{
|
|
|
|
if (!aref_create (fp, atom, ref, flags))
|
|
|
|
{
|
|
|
|
ctf_set_errno (fp, ENOMEM);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
return atom;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
/* New atom. */
|
|
|
|
|
libctf: remove ctf_malloc, ctf_free and ctf_strdup
These just get in the way of auditing for erroneous usage of strdup and
add a huge irregular surface of "ctf_malloc or malloc? ctf_free or free?
ctf_strdup or strdup?"
ctf_malloc and ctf_free usage has not reliably matched up for many
years, if ever, making the whole game pointless.
Go back to malloc, free, and strdup like everyone else: while we're at
it, fix a bunch of places where we weren't properly checking for OOM.
This changes the interface of ctf_cuname_set and ctf_parent_name_set,
which could strdup but could not return errors (like ENOMEM).
New in v4.
include/
* ctf-api.h (ctf_cuname_set): Can now fail, returning int.
(ctf_parent_name_set): Likewise.
libctf/
* ctf-impl.h (ctf_alloc): Remove.
(ctf_free): Likewise.
(ctf_strdup): Likewise.
* ctf-subr.c (ctf_alloc): Remove.
(ctf_free): Likewise.
* ctf-util.c (ctf_strdup): Remove.
* ctf-create.c (ctf_serialize): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_function): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
(ctf_compress_write): Likewise.
(ctf_write_mem): Likewise.
* ctf-decl.c (ctf_decl_push): Likewise.
(ctf_decl_fini): Likewise.
(ctf_decl_sprintf): Likewise. Check for OOM.
* ctf-dump.c (ctf_dump_append): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dump_free): Likewise.
(ctf_dump): Likewise.
* ctf-open.c (upgrade_types_v1): Likewise.
(init_types): Likewise.
(ctf_file_close): Likewise.
(ctf_bufopen_internal): Likewise. Check for OOM.
(ctf_parent_name_set): Likewise: report the OOM to the caller.
(ctf_cuname_set): Likewise.
(ctf_import): Likewise.
* ctf-string.c (ctf_str_purge_atom_refs): Use malloc, not ctf_alloc;
free, not ctf_free; strdup, not ctf_strdup.
(ctf_str_free_atom): Likewise.
(ctf_str_create_atoms): Likewise.
(ctf_str_add_ref_internal): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_write_strtab): Likewise.
2019-09-17 13:54:23 +08:00
|
|
|
if ((atom = malloc (sizeof (struct ctf_str_atom))) == NULL)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
goto oom;
|
|
|
|
memset (atom, 0, sizeof (struct ctf_str_atom));
|
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
/* Don't allocate new strings if this string is within an mmapped
|
|
|
|
strtab. */
|
|
|
|
|
|
|
|
if ((unsigned char *) str < (unsigned char *) fp->ctf_data_mmapped
|
|
|
|
|| (unsigned char *) str > (unsigned char *) fp->ctf_data_mmapped + fp->ctf_data_mmapped_len)
|
|
|
|
{
|
|
|
|
if ((newstr = strdup (str)) == NULL)
|
|
|
|
goto oom;
|
|
|
|
atom->csa_flags |= CTF_STR_ATOM_FREEABLE;
|
|
|
|
atom->csa_str = newstr;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
atom->csa_str = (char *) str;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
if (ctf_dynhash_insert (fp->ctf_str_atoms, atom->csa_str, atom) < 0)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
goto oom;
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
added = 1;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
|
|
|
atom->csa_snapshot_id = fp->ctf_snapshots;
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
/* New atoms marked provisional go into the provisional strtab, and get a
|
|
|
|
ref added. */
|
|
|
|
|
|
|
|
if (flags & CTF_STR_PROVISIONAL)
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
{
|
|
|
|
atom->csa_offset = fp->ctf_str_prov_offset;
|
|
|
|
|
|
|
|
if (ctf_dynhash_insert (fp->ctf_prov_strtab, (void *) (uintptr_t)
|
|
|
|
atom->csa_offset, (void *) atom->csa_str) < 0)
|
|
|
|
goto oom;
|
|
|
|
|
|
|
|
fp->ctf_str_prov_offset += strlen (atom->csa_str) + 1;
|
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
if (flags & CTF_STR_ADD_REF)
|
|
|
|
{
|
|
|
|
if (!aref_create (fp, atom, ref, flags))
|
|
|
|
goto oom;
|
|
|
|
}
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
return atom;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
|
|
|
oom:
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
if (added)
|
|
|
|
ctf_dynhash_remove (fp->ctf_str_atoms, atom->csa_str);
|
libctf: remove ctf_malloc, ctf_free and ctf_strdup
These just get in the way of auditing for erroneous usage of strdup and
add a huge irregular surface of "ctf_malloc or malloc? ctf_free or free?
ctf_strdup or strdup?"
ctf_malloc and ctf_free usage has not reliably matched up for many
years, if ever, making the whole game pointless.
Go back to malloc, free, and strdup like everyone else: while we're at
it, fix a bunch of places where we weren't properly checking for OOM.
This changes the interface of ctf_cuname_set and ctf_parent_name_set,
which could strdup but could not return errors (like ENOMEM).
New in v4.
include/
* ctf-api.h (ctf_cuname_set): Can now fail, returning int.
(ctf_parent_name_set): Likewise.
libctf/
* ctf-impl.h (ctf_alloc): Remove.
(ctf_free): Likewise.
(ctf_strdup): Likewise.
* ctf-subr.c (ctf_alloc): Remove.
(ctf_free): Likewise.
* ctf-util.c (ctf_strdup): Remove.
* ctf-create.c (ctf_serialize): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_function): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
(ctf_compress_write): Likewise.
(ctf_write_mem): Likewise.
* ctf-decl.c (ctf_decl_push): Likewise.
(ctf_decl_fini): Likewise.
(ctf_decl_sprintf): Likewise. Check for OOM.
* ctf-dump.c (ctf_dump_append): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dump_free): Likewise.
(ctf_dump): Likewise.
* ctf-open.c (upgrade_types_v1): Likewise.
(init_types): Likewise.
(ctf_file_close): Likewise.
(ctf_bufopen_internal): Likewise. Check for OOM.
(ctf_parent_name_set): Likewise: report the OOM to the caller.
(ctf_cuname_set): Likewise.
(ctf_import): Likewise.
* ctf-string.c (ctf_str_purge_atom_refs): Use malloc, not ctf_alloc;
free, not ctf_free; strdup, not ctf_strdup.
(ctf_str_free_atom): Likewise.
(ctf_str_create_atoms): Likewise.
(ctf_str_add_ref_internal): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_write_strtab): Likewise.
2019-09-17 13:54:23 +08:00
|
|
|
free (atom);
|
|
|
|
free (newstr);
|
2021-03-18 20:37:52 +08:00
|
|
|
ctf_set_errno (fp, ENOMEM);
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
/* Add a string to the atoms table, without augmenting the ref list for this
|
|
|
|
string: return a 'provisional offset' which can be used to return this string
|
|
|
|
until ctf_str_write_strtab is called, or 0 on failure. (Everywhere the
|
|
|
|
provisional offset is assigned to should be added as a ref using
|
|
|
|
ctf_str_add_ref() as well.) */
|
|
|
|
uint32_t
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_str_add (ctf_dict_t *fp, const char *str)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
ctf_str_atom_t *atom;
|
libctf: always name nameless types "", never NULL
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the
raw, unadorned name of CTF types, have one unfortunate wrinkle: they
return NULL not only on error but when returning the name of types
without a name in writable dicts. This was unintended: it not only
makes it impossible to reliably tell if a given call to
ctf_type_name_raw failed (due to a bad string offset say), but also
complicates all its callers, who now have to check for both NULL and "".
The written-out form of CTF has no concept of a NULL pointer instead of
a string: all null strings are strtab offset 0, "". So the more we can
do to remove this distinction from the writable form, the less complex
the rest of our code needs to be.
Armour against NULL in multiple places, arranging to return "" from
ctf_type_name_raw if offset 0 is passed in, and removing a risky
optimization from ctf_str_add* that avoided doing anything if a NULL was
passed in: this added needless irregularity to the functions' API
surface, since "" and NULL should be treated identically, and in the
case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to
the list of references to be updated no matter what the content of the
string happens to be.
This means we can simplify the deduplicator a tiny bit, also fixing a
bug (latent when used by ld) where if the input dict was writable,
we failed to realise when types were nameless and could end up creating
deeply unhelpful synthetic forwards with no name, which we just banned
a few commits ago, so the link failed.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-string.c (ctf_str_add): Treat adding a NULL as adding "".
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Likewise.
* ctf-types.c (ctf_type_name_raw): Always return "" for offset 0.
* ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour
against NULL name.
(ctf_dedup_maybe_synthesize_forward): Likewise.
2021-01-29 21:33:11 +08:00
|
|
|
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
if (!str)
|
libctf: always name nameless types "", never NULL
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the
raw, unadorned name of CTF types, have one unfortunate wrinkle: they
return NULL not only on error but when returning the name of types
without a name in writable dicts. This was unintended: it not only
makes it impossible to reliably tell if a given call to
ctf_type_name_raw failed (due to a bad string offset say), but also
complicates all its callers, who now have to check for both NULL and "".
The written-out form of CTF has no concept of a NULL pointer instead of
a string: all null strings are strtab offset 0, "". So the more we can
do to remove this distinction from the writable form, the less complex
the rest of our code needs to be.
Armour against NULL in multiple places, arranging to return "" from
ctf_type_name_raw if offset 0 is passed in, and removing a risky
optimization from ctf_str_add* that avoided doing anything if a NULL was
passed in: this added needless irregularity to the functions' API
surface, since "" and NULL should be treated identically, and in the
case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to
the list of references to be updated no matter what the content of the
string happens to be.
This means we can simplify the deduplicator a tiny bit, also fixing a
bug (latent when used by ld) where if the input dict was writable,
we failed to realise when types were nameless and could end up creating
deeply unhelpful synthetic forwards with no name, which we just banned
a few commits ago, so the link failed.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-string.c (ctf_str_add): Treat adding a NULL as adding "".
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Likewise.
* ctf-types.c (ctf_type_name_raw): Always return "" for offset 0.
* ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour
against NULL name.
(ctf_dedup_maybe_synthesize_forward): Likewise.
2021-01-29 21:33:11 +08:00
|
|
|
str = "";
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
atom = ctf_str_add_ref_internal (fp, str, CTF_STR_PROVISIONAL, 0);
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
if (!atom)
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
return 0;
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
return atom->csa_offset;
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Like ctf_str_add(), but additionally augment the atom's refs list with the
|
|
|
|
passed-in ref, whether or not the string is already present. There is no
|
|
|
|
attempt to deduplicate the refs list (but duplicates are harmless). */
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
uint32_t
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_str_add_ref (ctf_dict_t *fp, const char *str, uint32_t *ref)
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
{
|
|
|
|
ctf_str_atom_t *atom;
|
libctf: always name nameless types "", never NULL
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the
raw, unadorned name of CTF types, have one unfortunate wrinkle: they
return NULL not only on error but when returning the name of types
without a name in writable dicts. This was unintended: it not only
makes it impossible to reliably tell if a given call to
ctf_type_name_raw failed (due to a bad string offset say), but also
complicates all its callers, who now have to check for both NULL and "".
The written-out form of CTF has no concept of a NULL pointer instead of
a string: all null strings are strtab offset 0, "". So the more we can
do to remove this distinction from the writable form, the less complex
the rest of our code needs to be.
Armour against NULL in multiple places, arranging to return "" from
ctf_type_name_raw if offset 0 is passed in, and removing a risky
optimization from ctf_str_add* that avoided doing anything if a NULL was
passed in: this added needless irregularity to the functions' API
surface, since "" and NULL should be treated identically, and in the
case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to
the list of references to be updated no matter what the content of the
string happens to be.
This means we can simplify the deduplicator a tiny bit, also fixing a
bug (latent when used by ld) where if the input dict was writable,
we failed to realise when types were nameless and could end up creating
deeply unhelpful synthetic forwards with no name, which we just banned
a few commits ago, so the link failed.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-string.c (ctf_str_add): Treat adding a NULL as adding "".
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Likewise.
* ctf-types.c (ctf_type_name_raw): Always return "" for offset 0.
* ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour
against NULL name.
(ctf_dedup_maybe_synthesize_forward): Likewise.
2021-01-29 21:33:11 +08:00
|
|
|
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
if (!str)
|
libctf: always name nameless types "", never NULL
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the
raw, unadorned name of CTF types, have one unfortunate wrinkle: they
return NULL not only on error but when returning the name of types
without a name in writable dicts. This was unintended: it not only
makes it impossible to reliably tell if a given call to
ctf_type_name_raw failed (due to a bad string offset say), but also
complicates all its callers, who now have to check for both NULL and "".
The written-out form of CTF has no concept of a NULL pointer instead of
a string: all null strings are strtab offset 0, "". So the more we can
do to remove this distinction from the writable form, the less complex
the rest of our code needs to be.
Armour against NULL in multiple places, arranging to return "" from
ctf_type_name_raw if offset 0 is passed in, and removing a risky
optimization from ctf_str_add* that avoided doing anything if a NULL was
passed in: this added needless irregularity to the functions' API
surface, since "" and NULL should be treated identically, and in the
case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to
the list of references to be updated no matter what the content of the
string happens to be.
This means we can simplify the deduplicator a tiny bit, also fixing a
bug (latent when used by ld) where if the input dict was writable,
we failed to realise when types were nameless and could end up creating
deeply unhelpful synthetic forwards with no name, which we just banned
a few commits ago, so the link failed.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-string.c (ctf_str_add): Treat adding a NULL as adding "".
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Likewise.
* ctf-types.c (ctf_type_name_raw): Always return "" for offset 0.
* ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour
against NULL name.
(ctf_dedup_maybe_synthesize_forward): Likewise.
2021-01-29 21:33:11 +08:00
|
|
|
str = "";
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
atom = ctf_str_add_ref_internal (fp, str, CTF_STR_ADD_REF
|
|
|
|
| CTF_STR_PROVISIONAL, ref);
|
|
|
|
if (!atom)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return atom->csa_offset;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Like ctf_str_add_ref(), but note that the ref may be moved later on. */
|
|
|
|
uint32_t
|
|
|
|
ctf_str_add_movable_ref (ctf_dict_t *fp, const char *str, uint32_t *ref)
|
|
|
|
{
|
|
|
|
ctf_str_atom_t *atom;
|
|
|
|
|
|
|
|
if (!str)
|
|
|
|
str = "";
|
|
|
|
|
|
|
|
atom = ctf_str_add_ref_internal (fp, str, CTF_STR_ADD_REF
|
|
|
|
| CTF_STR_PROVISIONAL
|
|
|
|
| CTF_STR_MOVABLE, ref);
|
libctf: do not corrupt strings across ctf_serialize
The preceding change revealed a new bug: the string table is sorted for
better compression, so repeated serialization with type (or member)
additions in the middle can move strings around. But every
serialization flushes the set of refs (the memory locations that are
automatically updated with a final string offset when the strtab is
updated), so if we are not to have string offsets go stale, we must do
all ref additions within the serialization code (which walks the
complete set of types and symbols anyway). Unfortunately, we were adding
one ref in another place: the type name in the dynamic type definitions,
which has a ref added to it by ctf_add_generic.
So adding a type, serializing (via, say, one of the ctf_write
functions), adding another type with a name that sorts earlier, and
serializing again will corrupt the name of the first type because it no
longer had a ref pointing to its dtd entry's name when its string offset
was shifted later in the strtab to mae way for the other type.
To ensure that we don't miss strings, we also maintain a set of *pending
refs* that will be added later (during serialization), and remove
entries from that set when the ref is finally added. We always use
ctf_str_add_pending outside ctf-serialize.c, ensure that ctf_serialize
adds all strtab offsets as refs (even those in the dtds) on every
serialization, and mandate that no refs are live on entry to
ctf_serialize and that all pending refs are gone before strtab
finalization. (Of necessity ctf_serialize has to traverse all strtab
offsets in the dtds in order to serialize them, so adding them as refs
at the same time is easy.)
(Note that we still can't erase unused atoms when we roll back, though
we can erase unused refs: members and enums are still not removed by
rollbacks and might reference strings added after the snapshot.)
libctf/ChangeLog
2021-03-18 Nick Alcock <nick.alcock@oracle.com>
* ctf-hash.c (ctf_dynset_elements): New.
* ctf-impl.h (ctf_dynset_elements): Declare it.
(ctf_str_add_pending): Likewise.
(ctf_dict_t) <ctf_str_pending_ref>: New, set of refs that must be
added during serialization.
* ctf-string.c (ctf_str_create_atoms): Initialize it.
(CTF_STR_ADD_REF): New flag.
(CTF_STR_MAKE_PROVISIONAL): Likewise.
(CTF_STR_PENDING_REF): Likewise.
(ctf_str_add_ref_internal): Take a flags word rather than int
params. Populate, and clear out, ctf_str_pending_ref.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_external): Likewise.
(ctf_str_add_pending): New.
(ctf_str_remove_ref): Also remove the potential ref if it is a
pending ref.
* ctf-serialize.c (ctf_serialize): Prohibit addition of strings
with ctf_str_add_ref before serialization. Ensure that the
ctf_str_pending_ref set is empty before strtab finalization.
(ctf_emit_type_sect): Add a ref to the ctt_name.
* ctf-create.c (ctf_add_generic): Add the ctt_name as a pending
ref.
* testsuite/libctf-writable/reserialize-strtab-corruption.*: New test.
2021-03-18 20:37:52 +08:00
|
|
|
if (!atom)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return atom->csa_offset;
|
|
|
|
}
|
|
|
|
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
/* Add an external strtab reference at OFFSET. Returns zero if the addition
|
|
|
|
failed, nonzero otherwise. */
|
|
|
|
int
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_str_add_external (ctf_dict_t *fp, const char *str, uint32_t offset)
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
{
|
|
|
|
ctf_str_atom_t *atom;
|
libctf: always name nameless types "", never NULL
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the
raw, unadorned name of CTF types, have one unfortunate wrinkle: they
return NULL not only on error but when returning the name of types
without a name in writable dicts. This was unintended: it not only
makes it impossible to reliably tell if a given call to
ctf_type_name_raw failed (due to a bad string offset say), but also
complicates all its callers, who now have to check for both NULL and "".
The written-out form of CTF has no concept of a NULL pointer instead of
a string: all null strings are strtab offset 0, "". So the more we can
do to remove this distinction from the writable form, the less complex
the rest of our code needs to be.
Armour against NULL in multiple places, arranging to return "" from
ctf_type_name_raw if offset 0 is passed in, and removing a risky
optimization from ctf_str_add* that avoided doing anything if a NULL was
passed in: this added needless irregularity to the functions' API
surface, since "" and NULL should be treated identically, and in the
case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to
the list of references to be updated no matter what the content of the
string happens to be.
This means we can simplify the deduplicator a tiny bit, also fixing a
bug (latent when used by ld) where if the input dict was writable,
we failed to realise when types were nameless and could end up creating
deeply unhelpful synthetic forwards with no name, which we just banned
a few commits ago, so the link failed.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-string.c (ctf_str_add): Treat adding a NULL as adding "".
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Likewise.
* ctf-types.c (ctf_type_name_raw): Always return "" for offset 0.
* ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour
against NULL name.
(ctf_dedup_maybe_synthesize_forward): Likewise.
2021-01-29 21:33:11 +08:00
|
|
|
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
if (!str)
|
libctf: always name nameless types "", never NULL
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the
raw, unadorned name of CTF types, have one unfortunate wrinkle: they
return NULL not only on error but when returning the name of types
without a name in writable dicts. This was unintended: it not only
makes it impossible to reliably tell if a given call to
ctf_type_name_raw failed (due to a bad string offset say), but also
complicates all its callers, who now have to check for both NULL and "".
The written-out form of CTF has no concept of a NULL pointer instead of
a string: all null strings are strtab offset 0, "". So the more we can
do to remove this distinction from the writable form, the less complex
the rest of our code needs to be.
Armour against NULL in multiple places, arranging to return "" from
ctf_type_name_raw if offset 0 is passed in, and removing a risky
optimization from ctf_str_add* that avoided doing anything if a NULL was
passed in: this added needless irregularity to the functions' API
surface, since "" and NULL should be treated identically, and in the
case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to
the list of references to be updated no matter what the content of the
string happens to be.
This means we can simplify the deduplicator a tiny bit, also fixing a
bug (latent when used by ld) where if the input dict was writable,
we failed to realise when types were nameless and could end up creating
deeply unhelpful synthetic forwards with no name, which we just banned
a few commits ago, so the link failed.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-string.c (ctf_str_add): Treat adding a NULL as adding "".
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Likewise.
* ctf-types.c (ctf_type_name_raw): Always return "" for offset 0.
* ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour
against NULL name.
(ctf_dedup_maybe_synthesize_forward): Likewise.
2021-01-29 21:33:11 +08:00
|
|
|
str = "";
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
atom = ctf_str_add_ref_internal (fp, str, 0, 0);
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
if (!atom)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
atom->csa_external_offset = CTF_SET_STID (offset, CTF_STRTAB_1);
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
|
|
|
|
|
|
|
if (!fp->ctf_syn_ext_strtab)
|
|
|
|
fp->ctf_syn_ext_strtab = ctf_dynhash_create (ctf_hash_integer,
|
|
|
|
ctf_hash_eq_integer,
|
|
|
|
NULL, NULL);
|
|
|
|
if (!fp->ctf_syn_ext_strtab)
|
|
|
|
{
|
|
|
|
ctf_set_errno (fp, ENOMEM);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (ctf_dynhash_insert (fp->ctf_syn_ext_strtab,
|
|
|
|
(void *) (uintptr_t)
|
|
|
|
atom->csa_external_offset,
|
|
|
|
(void *) atom->csa_str) < 0)
|
|
|
|
{
|
|
|
|
/* No need to bother freeing the syn_ext_strtab: it will get freed at
|
|
|
|
ctf_str_write_strtab time if unreferenced. */
|
|
|
|
ctf_set_errno (fp, ENOMEM);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
return 1;
|
|
|
|
}
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
/* Note that refs have moved from (SRC, LEN) to DEST. We use the movable
|
|
|
|
refs backpointer for this, because it is done an amortized-constant
|
|
|
|
number of times during structure member and enumerand addition, and if we
|
|
|
|
did a linear search this would turn such addition into an O(n^2)
|
|
|
|
operation. Even this is not linear, but it's better than that. */
|
|
|
|
int
|
|
|
|
ctf_str_move_refs (ctf_dict_t *fp, void *src, size_t len, void *dest)
|
|
|
|
{
|
|
|
|
uintptr_t p;
|
|
|
|
|
|
|
|
if (src == dest)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
for (p = (uintptr_t) src; p - (uintptr_t) src < len; p++)
|
|
|
|
{
|
|
|
|
ctf_str_atom_ref_t *ref;
|
|
|
|
|
|
|
|
if ((ref = ctf_dynhash_lookup (fp->ctf_str_movable_refs,
|
|
|
|
(ctf_str_atom_ref_t *) p)) != NULL)
|
|
|
|
{
|
|
|
|
int out_of_memory;
|
|
|
|
|
|
|
|
ref->caf_ref = (uint32_t *) (((uintptr_t) ref->caf_ref +
|
|
|
|
(uintptr_t) dest - (uintptr_t) src));
|
|
|
|
ctf_dynhash_remove (fp->ctf_str_movable_refs,
|
|
|
|
(ctf_str_atom_ref_t *) p);
|
|
|
|
out_of_memory = ctf_dynhash_insert (fp->ctf_str_movable_refs,
|
|
|
|
ref->caf_ref, ref);
|
|
|
|
assert (out_of_memory == 0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
/* Remove a single ref. */
|
|
|
|
void
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_str_remove_ref (ctf_dict_t *fp, const char *str, uint32_t *ref)
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
{
|
|
|
|
ctf_str_atom_ref_t *aref, *anext;
|
|
|
|
ctf_str_atom_t *atom = NULL;
|
|
|
|
|
|
|
|
atom = ctf_dynhash_lookup (fp->ctf_str_atoms, str);
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
if (!atom)
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
return;
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
for (aref = ctf_list_next (&atom->csa_refs); aref != NULL; aref = anext)
|
|
|
|
{
|
|
|
|
anext = ctf_list_next (aref);
|
|
|
|
if (aref->caf_ref == ref)
|
|
|
|
{
|
|
|
|
ctf_list_delete (&atom->csa_refs, aref);
|
libctf: remove ctf_malloc, ctf_free and ctf_strdup
These just get in the way of auditing for erroneous usage of strdup and
add a huge irregular surface of "ctf_malloc or malloc? ctf_free or free?
ctf_strdup or strdup?"
ctf_malloc and ctf_free usage has not reliably matched up for many
years, if ever, making the whole game pointless.
Go back to malloc, free, and strdup like everyone else: while we're at
it, fix a bunch of places where we weren't properly checking for OOM.
This changes the interface of ctf_cuname_set and ctf_parent_name_set,
which could strdup but could not return errors (like ENOMEM).
New in v4.
include/
* ctf-api.h (ctf_cuname_set): Can now fail, returning int.
(ctf_parent_name_set): Likewise.
libctf/
* ctf-impl.h (ctf_alloc): Remove.
(ctf_free): Likewise.
(ctf_strdup): Likewise.
* ctf-subr.c (ctf_alloc): Remove.
(ctf_free): Likewise.
* ctf-util.c (ctf_strdup): Remove.
* ctf-create.c (ctf_serialize): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_function): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
(ctf_compress_write): Likewise.
(ctf_write_mem): Likewise.
* ctf-decl.c (ctf_decl_push): Likewise.
(ctf_decl_fini): Likewise.
(ctf_decl_sprintf): Likewise. Check for OOM.
* ctf-dump.c (ctf_dump_append): Use malloc, not ctf_alloc; free, not
ctf_free; strdup, not ctf_strdup.
(ctf_dump_free): Likewise.
(ctf_dump): Likewise.
* ctf-open.c (upgrade_types_v1): Likewise.
(init_types): Likewise.
(ctf_file_close): Likewise.
(ctf_bufopen_internal): Likewise. Check for OOM.
(ctf_parent_name_set): Likewise: report the OOM to the caller.
(ctf_cuname_set): Likewise.
(ctf_import): Likewise.
* ctf-string.c (ctf_str_purge_atom_refs): Use malloc, not ctf_alloc;
free, not ctf_free; strdup, not ctf_strdup.
(ctf_str_free_atom): Likewise.
(ctf_str_create_atoms): Likewise.
(ctf_str_add_ref_internal): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_write_strtab): Likewise.
2019-09-17 13:54:23 +08:00
|
|
|
free (aref);
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
}
|
|
|
|
}
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* A ctf_dynhash_iter_remove() callback that removes atoms later than a given
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
|
|
|
snapshot ID. External atoms are never removed, because they came from the
|
|
|
|
linker string table and are still present even if you roll back type
|
|
|
|
additions. */
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
static int
|
|
|
|
ctf_str_rollback_atom (void *key _libctf_unused_, void *value, void *arg)
|
|
|
|
{
|
|
|
|
ctf_str_atom_t *atom = (ctf_str_atom_t *) value;
|
|
|
|
ctf_snapshot_id_t *id = (ctf_snapshot_id_t *) arg;
|
|
|
|
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
|
|
|
return (atom->csa_snapshot_id > id->snapshot_id)
|
|
|
|
&& (atom->csa_external_offset == 0);
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
|
|
|
/* Roll back, deleting all (internal) atoms created after a particular ID. */
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
void
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_str_rollback (ctf_dict_t *fp, ctf_snapshot_id_t id)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
|
|
|
ctf_dynhash_iter_remove (fp->ctf_str_atoms, ctf_str_rollback_atom, &id);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* An adaptor around ctf_purge_atom_refs. */
|
|
|
|
static void
|
|
|
|
ctf_str_purge_one_atom_refs (void *key _libctf_unused_, void *value,
|
|
|
|
void *arg _libctf_unused_)
|
|
|
|
{
|
|
|
|
ctf_str_atom_t *atom = (ctf_str_atom_t *) value;
|
|
|
|
ctf_str_purge_atom_refs (atom);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Remove all the recorded refs from the atoms table. */
|
|
|
|
void
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_str_purge_refs (ctf_dict_t *fp)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
ctf_dynhash_iter (fp->ctf_str_atoms, ctf_str_purge_one_atom_refs, NULL);
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Update a list of refs to the specified value. */
|
|
|
|
static void
|
|
|
|
ctf_str_update_refs (ctf_str_atom_t *refs, uint32_t value)
|
|
|
|
{
|
|
|
|
ctf_str_atom_ref_t *ref;
|
|
|
|
|
|
|
|
for (ref = ctf_list_next (&refs->csa_refs); ref != NULL;
|
|
|
|
ref = ctf_list_next (ref))
|
libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
2024-03-26 00:39:02 +08:00
|
|
|
*(ref->caf_ref) = value;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Sort the strtab. */
|
|
|
|
static int
|
|
|
|
ctf_str_sort_strtab (const void *a, const void *b)
|
|
|
|
{
|
|
|
|
ctf_str_atom_t **one = (ctf_str_atom_t **) a;
|
|
|
|
ctf_str_atom_t **two = (ctf_str_atom_t **) b;
|
|
|
|
|
|
|
|
return (strcmp ((*one)->csa_str, (*two)->csa_str));
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Write out and return a strtab containing all strings with recorded refs,
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
adjusting the refs to refer to the corresponding string. The returned
|
|
|
|
strtab is already assigned to strtab 0 in this dict, is owned by this
|
|
|
|
dict, and may be NULL on error. Also populate the synthetic strtab with
|
|
|
|
mappings from external strtab offsets to names, so we can look them up
|
|
|
|
with ctf_strptr(). Only external strtab offsets with references are
|
|
|
|
added.
|
|
|
|
|
|
|
|
As a side effect, replaces the strtab of the current dict with the newly-
|
|
|
|
generated strtab. This is an exception to the general rule that
|
|
|
|
serialization does not change the dict passed in, because the alternative
|
|
|
|
is to copy the entire atoms table on every reserialization just to avoid
|
|
|
|
modifying the original, which is excessively costly for minimal gain.
|
|
|
|
|
|
|
|
We use the lazy man's approach and double memory costs by always storing
|
|
|
|
atoms as individually allocated entities whenever they come from anywhere
|
|
|
|
but a freshly-opened, mmapped dict, even though after serialization there
|
|
|
|
is another copy in the strtab; this ensures that ctf_strptr()-returned
|
|
|
|
pointers to them remain valid for the lifetime of the dict.
|
|
|
|
|
|
|
|
This is all rendered more complex because if a dict is ctf_open()ed it
|
|
|
|
will have a bunch of strings in its strtab already, and their strtab
|
|
|
|
offsets can never change (without piles of complexity to rescan the
|
|
|
|
entire dict just to get all the offsets to all of them into the atoms
|
|
|
|
table). Entries below the existing strtab limit are just copied into the
|
|
|
|
new dict: entries above it are new, and are are sorted first, then
|
|
|
|
appended to it. The sorting is purely a compression-efficiency
|
|
|
|
improvement, and we get nearly as good an improvement from sorting big
|
|
|
|
chunks like this as we would from sorting the whole thing. */
|
|
|
|
|
|
|
|
const ctf_strs_writable_t *
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a 'CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict): ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_str_write_strtab (ctf_dict_t *fp)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
ctf_strs_writable_t *strtab;
|
|
|
|
size_t strtab_count = 0;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
uint32_t cur_stroff = 0;
|
|
|
|
ctf_str_atom_t **sorttab;
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
ctf_next_t *it = NULL;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
size_t i;
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
void *v;
|
|
|
|
int err;
|
|
|
|
int new_strtab = 0;
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
int any_external = 0;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
strtab = calloc (1, sizeof (ctf_strs_writable_t));
|
|
|
|
if (!strtab)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
/* The strtab contains the existing string table at its start: figure out
|
|
|
|
how many new strings we need to add. We only need to add new strings
|
|
|
|
that have no external offset, that have refs, and that are found in the
|
|
|
|
provisional strtab. If the existing strtab is empty we also need to
|
|
|
|
add the null string at its start. */
|
|
|
|
|
|
|
|
strtab->cts_len = fp->ctf_str[CTF_STRTAB_0].cts_len;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
if (strtab->cts_len == 0)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
new_strtab = 1;
|
|
|
|
strtab->cts_len++; /* For the \0. */
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
/* Count new entries in the strtab: i.e. entries in the provisional
|
|
|
|
strtab. Ignore any entry for \0, entries which ended up in the
|
|
|
|
external strtab, and unreferenced entries. */
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
while ((err = ctf_dynhash_next (fp->ctf_prov_strtab, &it, NULL, &v)) == 0)
|
|
|
|
{
|
|
|
|
const char *str = (const char *) v;
|
|
|
|
ctf_str_atom_t *atom;
|
|
|
|
|
|
|
|
atom = ctf_dynhash_lookup (fp->ctf_str_atoms, str);
|
|
|
|
if (!ctf_assert (fp, atom))
|
|
|
|
goto err_strtab;
|
|
|
|
|
|
|
|
if (atom->csa_str[0] == 0 || ctf_list_empty_p (&atom->csa_refs) ||
|
|
|
|
atom->csa_external_offset)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
strtab->cts_len += strlen (atom->csa_str) + 1;
|
|
|
|
strtab_count++;
|
|
|
|
}
|
|
|
|
if (err != ECTF_NEXT_END)
|
|
|
|
{
|
|
|
|
ctf_dprintf ("ctf_str_write_strtab: error counting strtab entries: %s\n",
|
|
|
|
ctf_errmsg (err));
|
|
|
|
goto err_strtab;
|
|
|
|
}
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
ctf_dprintf ("%lu bytes of strings in strtab: %lu pre-existing.\n",
|
|
|
|
(unsigned long) strtab->cts_len,
|
|
|
|
(unsigned long) fp->ctf_str[CTF_STRTAB_0].cts_len);
|
|
|
|
|
|
|
|
/* Sort the new part of the strtab. */
|
|
|
|
|
|
|
|
sorttab = calloc (strtab_count, sizeof (ctf_str_atom_t *));
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
if (!sorttab)
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
{
|
|
|
|
ctf_set_errno (fp, ENOMEM);
|
|
|
|
goto err_strtab;
|
|
|
|
}
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
i = 0;
|
|
|
|
while ((err = ctf_dynhash_next (fp->ctf_prov_strtab, &it, NULL, &v)) == 0)
|
|
|
|
{
|
|
|
|
ctf_str_atom_t *atom;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
atom = ctf_dynhash_lookup (fp->ctf_str_atoms, v);
|
|
|
|
if (!ctf_assert (fp, atom))
|
|
|
|
goto err_sorttab;
|
|
|
|
|
|
|
|
if (atom->csa_str[0] == 0 || ctf_list_empty_p (&atom->csa_refs) ||
|
|
|
|
atom->csa_external_offset)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
sorttab[i++] = atom;
|
|
|
|
}
|
|
|
|
|
|
|
|
qsort (sorttab, strtab_count, sizeof (ctf_str_atom_t *),
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
ctf_str_sort_strtab);
|
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
if ((strtab->cts_strs = malloc (strtab->cts_len)) == NULL)
|
|
|
|
goto err_sorttab;
|
|
|
|
|
|
|
|
cur_stroff = fp->ctf_str[CTF_STRTAB_0].cts_len;
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
if (new_strtab)
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
{
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
strtab->cts_strs[0] = 0;
|
|
|
|
cur_stroff++;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
memcpy (strtab->cts_strs, fp->ctf_str[CTF_STRTAB_0].cts_strs,
|
|
|
|
fp->ctf_str[CTF_STRTAB_0].cts_len);
|
|
|
|
|
|
|
|
/* Work over the sorttab, add its strings to the strtab, and remember
|
|
|
|
where they are in the csa_offset for the appropriate atom. No ref
|
|
|
|
updating is done at this point, because refs might well relate to
|
|
|
|
already-existing strings, or external strings, which do not need adding
|
|
|
|
to the strtab and may not be in the sorttab. */
|
|
|
|
|
|
|
|
for (i = 0; i < strtab_count; i++)
|
|
|
|
{
|
|
|
|
sorttab[i]->csa_offset = cur_stroff;
|
|
|
|
strcpy (&strtab->cts_strs[cur_stroff], sorttab[i]->csa_str);
|
|
|
|
cur_stroff += strlen (sorttab[i]->csa_str) + 1;
|
|
|
|
}
|
|
|
|
free (sorttab);
|
|
|
|
sorttab = NULL;
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
/* Update all refs, then purge them as no longer necessary: also update
|
|
|
|
the strtab appropriately. */
|
|
|
|
|
|
|
|
while ((err = ctf_dynhash_next (fp->ctf_str_atoms, &it, NULL, &v)) == 0)
|
|
|
|
{
|
|
|
|
ctf_str_atom_t *atom = (ctf_str_atom_t *) v;
|
|
|
|
uint32_t offset;
|
|
|
|
|
|
|
|
if (ctf_list_empty_p (&atom->csa_refs))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (atom->csa_external_offset)
|
|
|
|
{
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
any_external = 1;
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
offset = atom->csa_external_offset;
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
}
|
|
|
|
else
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
offset = atom->csa_offset;
|
|
|
|
ctf_str_update_refs (atom, offset);
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
if (err != ECTF_NEXT_END)
|
|
|
|
{
|
|
|
|
ctf_dprintf ("ctf_str_write_strtab: error iterating over atoms while updating refs: %s\n",
|
|
|
|
ctf_errmsg (err));
|
|
|
|
goto err_strtab;
|
|
|
|
}
|
|
|
|
ctf_str_purge_refs (fp);
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
if (!any_external)
|
|
|
|
{
|
|
|
|
ctf_dynhash_destroy (fp->ctf_syn_ext_strtab);
|
|
|
|
fp->ctf_syn_ext_strtab = NULL;
|
|
|
|
}
|
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
/* Replace the old strtab with the new one in this dict. */
|
|
|
|
|
|
|
|
if (fp->ctf_dynstrtab)
|
|
|
|
{
|
|
|
|
free (fp->ctf_dynstrtab->cts_strs);
|
|
|
|
free (fp->ctf_dynstrtab);
|
|
|
|
}
|
|
|
|
|
|
|
|
fp->ctf_dynstrtab = strtab;
|
|
|
|
fp->ctf_str[CTF_STRTAB_0].cts_strs = strtab->cts_strs;
|
|
|
|
fp->ctf_str[CTF_STRTAB_0].cts_len = strtab->cts_len;
|
|
|
|
|
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_cth_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
|
|
|
/* All the provisional strtab entries are now real strtab entries, and
|
|
|
|
ctf_strptr() will find them there. The provisional offset now starts right
|
|
|
|
beyond the new end of the strtab. */
|
|
|
|
|
|
|
|
ctf_dynhash_empty (fp->ctf_prov_strtab);
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
fp->ctf_str_prov_offset = strtab->cts_len + 1;
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
return strtab;
|
|
|
|
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
err_sorttab:
|
libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.
This commit adds support for these external strings in the ctf-string.c
strtab tracking layer. It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID), and adds a new function that sets the
csa_offset to the specified offset (plus MSB). Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)
We also have to go through a bit of extra effort at variable-sorting
time. This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results. Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a ne winternal strtab to use.
But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does will write it straight to
disk. Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set. This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
let both variable sorting and name interning when ctf_bufopen is next
called work fine. (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.
include/
* ctf.h (CTF_SET_STID): New.
libctf/
* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
(ctf_str_add_ref): Name the last arg.
(ctf_str_add_external) New.
(ctf_str_add_strraw_explicit): Likewise.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
* ctf-string.c (ctf_strraw_explicit): Split from...
(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
(ctf_str_add_ref_internal): Return the atom, not the
string.
(ctf_str_add): Adjust accordingly.
(ctf_str_add_ref): Likewise. Move up in the file.
(ctf_str_add_external): New: update the csa_offset.
(ctf_str_count_strtab): Only account for strings with no csa_offset
in the internal strtab length.
(ctf_str_write_strtab): If the csa_offset is set, update the
string's refs without writing the string out, and update the
ctf_syn_ext_strtab. Make OOM handling less ugly.
* ctf-create.c (struct ctf_sort_var_arg_cb): New.
(ctf_update): Handle failure to populate the strtab. Pass in the
new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition.
Call ctf_simple_open_internal, not ctf_simple_open.
(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
strings by hand.
* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless.
* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
(ctf_file_close): Destroy the ctf_syn_ext_strtab.
(ctf_simple_open): Rename to, and reimplement as a wrapper around...
(ctf_simple_open_internal): ... this new function, which calls
ctf_bufopen_internal.
(ctf_bufopen): Rename to, and reimplement as a wrapper around...
(ctf_bufopen_internal): ... this new function, which sets
ctf_syn_ext_strtab.
2019-07-14 03:33:01 +08:00
|
|
|
free (sorttab);
|
libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
2024-03-26 03:07:43 +08:00
|
|
|
err_strtab:
|
|
|
|
free (strtab);
|
|
|
|
return NULL;
|
libctf: deduplicate and sort the string table
ctf.h states:
> [...] the CTF string table does not contain any duplicated strings.
Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer. There is no global
view of the strings and no deduplication.
Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go. The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'. Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.
No change in externally-visible API or file format (other than the
reduction in pointless redundancy).
libctf/
* ctf-impl.h: (struct ctf_strs_writable): New, non-const version of
struct ctf_strs.
(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
(struct ctf_str_atom): New, disambiguated single string.
(struct ctf_str_atom_ref): New, points to some other location that
references this string's offset.
(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
Remove member ctf_dtvstrlen: we no longer track the total strlen
as we add strings.
(ctf_str_create_atoms): Declare new function in ctf-string.c.
(ctf_str_free_atoms): Likewise.
(ctf_str_add): Likewise.
(ctf_str_add_ref): Likewise.
(ctf_str_purge_refs): Likewise.
(ctf_str_write_strtab): Likewise.
(ctf_realloc): Declare new function in ctf-util.c.
* ctf-open.c (ctf_bufopen): Create the atoms table.
(ctf_file_close): Destroy it.
* ctf-create.c (ctf_update): Copy-and-free it on update. No longer
special-case the position of the parname string. Construct the
strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
rest of each buffer element is constructed, not via open-coding:
realloc the CTF buffer and append the strtab to it. No longer
maintain ctf_dtvstrlen. Sort the variable entry table later, after
strtab construction.
(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
after buffer element construction instead.
(ctf_copy_lmembers): Likewise.
(ctf_copy_emembers): Likewise.
(ctf_create): No longer maintain the ctf_dtvstrlen.
(ctf_dtd_delete): Likewise.
(ctf_dvd_delete): Likewise.
(ctf_add_generic): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(membadd): Likewise.
* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
if there are active ctf_str_num_refs.
(ctf_strraw): Move to ctf-string.c.
(ctf_strptr): Likewise.
* ctf-string.c: New file, strtab manipulation.
* Makefile.am (libctf_a_SOURCES): Add it.
* Makefile.in: Regenerate.
2019-06-27 20:51:10 +08:00
|
|
|
}
|