binutils-gdb/libctf/ctf-util.c


/* Miscellaneous utilities.
   Copyright (C) 2019-2023 Free Software Foundation, Inc.

   This file is part of libctf.

   libctf is free software; you can redistribute it and/or modify it under
   the terms of the GNU General Public License as published by the Free
   Software Foundation; either version 3, or (at your option) any later
   version.

   This program is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
   See the GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; see the file COPYING.  If not see
   <http://www.gnu.org/licenses/>.  */

#include <ctf-impl.h>
#include <string.h>
libctf, include: support foreign-endianness symtabs with CTF The CTF symbol lookup machinery added recently has one deficit: it assumes the symtab is in the machine's native endianness. This is always true when the linker is writing out symtabs (because cross linkers byteswap symbols only after libctf has been called on them), but may be untrue in the cross case when the linker or another tool (objdump, etc) is reading them. Unfortunately the easy way to model this to the caller, as an endianness field in the ctf_sect_t, is precluded because doing so would change the size of the ctf_sect_t, which would be an ABI break. So, instead, allow the endianness of the symtab to be set after open time, by calling one of the two new API functions ctf_symsect_endianness (for ctf_dict_t's) or ctf_arc_symsect_endianness (for entire ctf_archive_t's). libctf calls these functions automatically for objects opened via any of the BFD-aware mechanisms (ctf_bfdopen, ctf_bfdopen_ctfsect, ctf_fdopen, ctf_open, or ctf_arc_open), but the various mechanisms that just take raw ctf_sect_t's will assume the symtab is in native endianness and need a later call to ctf_*symsect_endianness to adjust it if needed. (This call is basically free if the endianness is actually native: it only costs anything if the symtab endianness was previously guessed wrong, and there is a symtab, and we are using it directly rather than using symtab indexing.) Obviously, calling ctf_lookup_by_symbol or ctf_symbol_next before the symtab endianness is correctly set will probably give wrong answers -- but you can set it at any time as long as it is before then. include/ChangeLog 2020-11-23 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h: Style nit: remove () on function names in comments. (ctf_sect_t): Mention endianness concerns. (ctf_symsect_endianness): New declaration. (ctf_arc_symsect_endianness): Likewise. 
libctf/ChangeLog 2020-11-23 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dict_t) <ctf_symtab_little_endian>: New. (struct ctf_archive_internal) <ctfi_symsect_little_endian>: Likewise. * ctf-create.c (ctf_serialize): Adjust for new field. * ctf-open.c (init_symtab): Note the semantics of repeated calls. (ctf_symsect_endianness): New. (ctf_bufopen_internal): Set ctf_symtab_little_endian suitably for the native endianness. (_Static_assert): Moved... (swap_thing): ... with this... * swap.h: ... to here. * ctf-util.c (ctf_elf32_to_link_sym): Use it, byteswapping the Elf32_Sym if the ctf_symtab_little_endian demands it. (ctf_elf64_to_link_sym): Likewise swap the Elf64_Sym if needed. * ctf-archive.c (ctf_arc_symsect_endianness): New, set the endianness of the symtab used by the dicts in an archive. (ctf_archive_iter_internal): Initialize to unknown (assumed native, do not call ctf_symsect_endianness). (ctf_dict_open_by_offset): Call ctf_symsect_endianness if need be. (ctf_dict_open_internal): Propagate the endianness down. (ctf_dict_open_sections): Likewise. * ctf-open-bfd.c (ctf_bfdopen_ctfsect): Get the endianness from the struct bfd and pass it down to the archive. * libctf.ver: Add ctf_symsect_endianness and ctf_arc_symsect_endianness.
2020-11-24 05:17:44 +08:00
#include "ctf-endian.h"

/* Simple doubly-linked list append routine.  This implementation assumes that
   each list element contains an embedded ctf_list_t as the first member.
   An additional ctf_list_t is used to store the head (l_next) and tail
   (l_prev) pointers.  The current head and tail list elements have their
   previous and next pointers set to NULL, respectively.  */

void
ctf_list_append (ctf_list_t *lp, void *newp)
{
  ctf_list_t *p = lp->l_prev;   /* p = tail list element.  */
  ctf_list_t *q = newp;         /* q = new list element.  */

  lp->l_prev = q;
  q->l_prev = p;
  q->l_next = NULL;

  if (p != NULL)
    p->l_next = q;
  else
    lp->l_next = q;
}
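Because ctf_list_t is internal to libctf, this sketch re-declares a node with the same two fields and copies the append logic above under the hypothetical name `list_append`. The hypothetical `struct item` embeds the node as its first member, which is what lets a node pointer be cast back to its enclosing element: C guarantees that a pointer to a struct, suitably converted, points to its first member.

```c
#include <assert.h>
#include <stddef.h>

/* Re-declaration of libctf's internal two-pointer list node.  In the
   list head, l_next is the first element and l_prev the last.  */
typedef struct ctf_list
{
  struct ctf_list *l_prev;
  struct ctf_list *l_next;
} ctf_list_t;

/* Hypothetical element type: the node must be the first member.  */
struct item
{
  ctf_list_t node;
  int value;
};

/* Same logic as ctf_list_append.  */
static void
list_append (ctf_list_t *lp, void *newp)
{
  ctf_list_t *p;                /* Old tail, or NULL.  */
  ctf_list_t *q = newp;         /* New element.  */

  assert (lp != NULL && newp != NULL);
  p = lp->l_prev;
  lp->l_prev = q;
  q->l_prev = p;
  q->l_next = NULL;
  if (p != NULL)
    p->l_next = q;
  else
    lp->l_next = q;             /* First element is also the head.  */
}

/* Walk head to tail, casting each node back to its enclosing item.  */
static int
sum_items (const ctf_list_t *lp)
{
  const ctf_list_t *n;
  int total = 0;

  for (n = lp->l_next; n != NULL; n = n->l_next)
    total += ((const struct item *) n)->value;
  return total;
}
```

Appending is O(1) because the head keeps a direct tail pointer in l_prev, so no walk is needed to find the end.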

/* Prepend the specified existing element to the given ctf_list_t.  The
   existing pointer should be pointing at a struct with embedded ctf_list_t.  */

void
ctf_list_prepend (ctf_list_t *lp, void *newp)
{
  ctf_list_t *p = newp;         /* p = new list element.  */
  ctf_list_t *q = lp->l_next;   /* q = head list element.  */

  lp->l_next = p;
  p->l_prev = NULL;
  p->l_next = q;

  if (q != NULL)
    q->l_prev = p;
  else
    lp->l_prev = p;
}
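Prepending mirrors appending at the other end: repeated prepends yield the elements in reverse insertion order. The sketch below re-declares the node, copies the prepend logic as the hypothetical `list_prepend`, and adds a hypothetical `collect` helper that reads values from head to tail.

```c
#include <assert.h>
#include <stddef.h>

/* Re-declaration of libctf's internal two-pointer list node.  */
typedef struct ctf_list
{
  struct ctf_list *l_prev;
  struct ctf_list *l_next;
} ctf_list_t;

/* Hypothetical element type: the node must be the first member.  */
struct item
{
  ctf_list_t node;
  int value;
};

/* Same logic as ctf_list_prepend.  */
static void
list_prepend (ctf_list_t *lp, void *newp)
{
  ctf_list_t *p = newp;         /* New element.  */
  ctf_list_t *q;                /* Old head, or NULL.  */

  assert (lp != NULL && newp != NULL);
  q = lp->l_next;
  lp->l_next = p;
  p->l_prev = NULL;
  p->l_next = q;
  if (q != NULL)
    q->l_prev = p;
  else
    lp->l_prev = p;             /* First element is also the tail.  */
}

/* Copy up to MAX values, head to tail, into OUT; return the count.  */
static size_t
collect (const ctf_list_t *lp, int *out, size_t max)
{
  const ctf_list_t *e;
  size_t n = 0;

  for (e = lp->l_next; e != NULL && n < max; e = e->l_next)
    out[n++] = ((const struct item *) e)->value;
  return n;
}
```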

/* Delete the specified existing element from the given ctf_list_t.  The
   existing pointer should be pointing at a struct with embedded ctf_list_t.  */

void
ctf_list_delete (ctf_list_t *lp, void *existing)
{
  ctf_list_t *p = existing;

  if (p->l_prev != NULL)
    p->l_prev->l_next = p->l_next;
  else
    lp->l_next = p->l_next;

  if (p->l_next != NULL)
    p->l_next->l_prev = p->l_prev;
  else
    lp->l_prev = p->l_prev;
}
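Deletion uses only the element's own neighbour pointers, so it is O(1); the head is patched when the element was first or last. This sketch re-declares the node, uses bare nodes as elements for brevity, and copies append and delete under the hypothetical names `list_append` and `list_delete`, plus a hypothetical `list_length` helper.

```c
#include <assert.h>
#include <stddef.h>

/* Re-declaration of libctf's internal two-pointer list node.  */
typedef struct ctf_list
{
  struct ctf_list *l_prev;
  struct ctf_list *l_next;
} ctf_list_t;

/* Same logic as ctf_list_append.  */
static void
list_append (ctf_list_t *lp, void *newp)
{
  ctf_list_t *p = lp->l_prev;
  ctf_list_t *q = newp;

  lp->l_prev = q;
  q->l_prev = p;
  q->l_next = NULL;
  if (p != NULL)
    p->l_next = q;
  else
    lp->l_next = q;
}

/* Same logic as ctf_list_delete.  EXISTING's own pointers are left
   unchanged (stale) after it is unlinked.  */
static void
list_delete (ctf_list_t *lp, void *existing)
{
  ctf_list_t *p = existing;

  assert (lp != NULL && p != NULL);
  if (p->l_prev != NULL)
    p->l_prev->l_next = p->l_next;
  else
    lp->l_next = p->l_next;     /* Deleting the head.  */

  if (p->l_next != NULL)
    p->l_next->l_prev = p->l_prev;
  else
    lp->l_prev = p->l_prev;     /* Deleting the tail.  */
}

static size_t
list_length (const ctf_list_t *lp)
{
  const ctf_list_t *e;
  size_t n = 0;

  for (e = lp->l_next; e != NULL; e = e->l_next)
    n++;
  return n;
}
```

Since the unlinked element's pointers are not cleared, a caller that wants to reuse it should reset them or re-insert it with append or prepend, both of which overwrite them.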
libctf: avoid the need to ever use ctf_update The method of operation of libctf when the dictionary is writable has before now been that types that are added land in the dynamic type section, which is a linked list and hash of IDs -> dynamic type definitions (and, recently, a hash of names): the DTDs are a bit of CTF representing the ctf_type_t and ad hoc C structures representing the vlen. Historically, libctf was unable to do anything with these types, not even look them up by ID, let alone by name: if you wanted to do that (say, if you were adding a type that depended on one you just added) you called ctf_update, which serializes all the DTDs into a CTF file and reopens it, copying its guts over the fp it's called with. The ctf_updated types are then frozen in amber and unchangeable: all lookups will return the types in the static portion in preference to the dynamic portion, and we will refuse to re-add things that already exist in the static portion (and, of late, in the dynamic portion too). The libctf machinery remembers the boundary between static and dynamic types and looks in the right portion for each type. Lots of things still don't quite work with dynamic types (e.g. getting their size), but enough works to do a bunch of additions and then a ctf_update, most of the time. Except it doesn't, because ctf_add_type finds it necessary to walk the full dynamic type definition list looking for types with matching names, so it gets slower and slower with every type you add: fixing this requires calling ctf_update periodically for no other reason than to avoid massively slowing things down. This is all clunky and very slow but kind of works, until you consider that it is in fact possible and indeed necessary to modify one sort of type after it has been added: forwards. These are necessarily promoted to structs, unions or enums, and when they do so *their type ID does not change*. So all of a sudden we are changing types that already exist in the static portion.
ctf_update gets massively confused by this and allocates space enough for the forward (with no members), but then emits the new dynamic type (with all the members) into it. You get an assertion failure after that, if you're lucky, or a coredump. So this commit rejigs things a bit and arranges to exclusively use the dynamic type definitions in writable dictionaries, and the static type definitions in readable dictionaries: we don't at any time have a mixture of static and dynamic types, and you don't need to call ctf_update to make things "appear". The ctf_dtbyname hash I introduced a few months ago, which maps things like "struct foo" to DTDs, is removed, replaced instead by a change of type of the four dictionaries which track names. Rather than just being (unresizable) ctf_hash_t's populated only at ctf_bufopen time, they are now a ctf_names_t structure, which is a pair of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used in readonly dictionaries, and the ctf_dynhash_t being used in writable ones. The decision as to which to use is centralized in the new functions ctf_lookup_by_rawname (which takes a type kind) and ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *). This change lets us switch from using static to dynamic name hashes on the fly across the entirety of libctf without complexifying anything: in fact, because we now centralize the knowledge about how to map from type kind to name hash, it actually simplifies things and lets us throw out quite a lot of now-unnecessary complexity, from ctf_dtbyname (replaced by the dynamic half of the name tables), through to ctf_dtnextid (now that a dictionary's static portion is never referenced if the dictionary is writable, we can just use ctf_typemax to indicate the maximum type: dynamic or non-dynamic does not matter, and we no longer need to track the boundary between the types).
You can now ctf_rollback() as far as you like, even past a ctf_update or for that matter a full writeout; all the iteration functions work just as well on writable as on read-only dictionaries; ctf_add_type no longer needs expensive duplicated code to run over the dynamic types hunting for ones it might be interested in; and the linker no longer needs a hack to call ctf_update so that calling ctf_add_type is not impossibly expensive. There is still a bit more complexity: some new code paths in ctf-types.c need to know how to extract information from dynamic types. This complexity will go away again in a few months when libctf acquires a proper intermediate representation. You can still call ctf_update if you like (it's public API, after all), but its only effect now is to set the point to which ctf_discard rolls back. Obviously *something* still needs to serialize the CTF file before writeout, and this job is done by ctf_serialize, which does everything ctf_update used to except set the counter used by ctf_discard. It is automatically called by the various functions that do CTF writeout: nobody else ever needs to call it. With this in place, forwards that are promoted to non-forwards no longer crash the link, even if it happens tens of thousands of types later. v5: fix tabdamage. libctf/ * ctf-impl.h (ctf_names_t): New. (ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t. (ctf_file_t) <ctf_structs>: Likewise. <ctf_unions>: Likewise. <ctf_enums>: Likewise. <ctf_names>: Likewise. <ctf_lookups>: Improve comment. <ctf_ptrtab_len>: New. <ctf_prov_strtab>: New. <ctf_str_prov_offset>: New. <ctf_dtbyname>: Remove, redundant to the names hashes. <ctf_dtnextid>: Remove, redundant to ctf_typemax. (ctf_dtdef_t) <dtd_name>: Remove. <dtd_data>: Note that the ctt_name is now populated. (ctf_str_atom_t) <csa_offset>: This is now the strtab offset for internal strings too. <csa_external_offset>: New, the external strtab offset. 
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case. (ctf_name_table): New declaration. (ctf_lookup_by_rawname): Likewise. (ctf_lookup_by_rawhash): Likewise. (ctf_set_ctl_hashes): Likewise. (ctf_serialize): Likewise. (ctf_dtd_insert): Adjust. (ctf_simple_open_internal): Likewise. (ctf_bufopen_internal): Likewise. (ctf_list_empty_p): Likewise. (ctf_str_remove_ref): Likewise. (ctf_str_add): Returns uint32_t now. (ctf_str_add_ref): Likewise. (ctf_str_add_external): Now returns a boolean (int). * ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab for strings in the appropriate range. (ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM when adding the null string to the new strtab. (ctf_str_free_atoms): Destroy the ctf_prov_strtab. (ctf_str_add_ref_internal): Add make_provisional argument. If make_provisional, populate the offset and fill in the ctf_prov_strtab accordingly. (ctf_str_add): Return the offset, not the string. (ctf_str_add_ref): Likewise. (ctf_str_add_external): Return a success integer. (ctf_str_remove_ref): New, remove a single ref. (ctf_str_count_strtab): Do not count the initial null string's length or the existence or length of any unreferenced internal atoms. (ctf_str_populate_sorttab): Skip atoms with no refs. (ctf_str_write_strtab): Populate the nullstr earlier. Add one to the cts_len for the null string, since it is no longer done in ctf_str_count_strtab. Adjust for csa_external_offset rename. Populate the csa_offset for both internal and external cases. Flush the ctf_prov_strtab afterwards, and reset the ctf_str_prov_offset. * ctf-create.c (ctf_grow_ptrtab): New. (ctf_create): Call it. Initialize new fields rather than old ones. Tell ctf_bufopen_internal that this is a writable dictionary. Set the ctl hashes and data model. (ctf_update): Rename to... (ctf_serialize): ... this. Leave a compatibility function behind. Tell ctf_simple_open_internal that this is a writable dictionary. 
Pass the new fields along from the old dictionary. Drop ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name. Do not zero out the DTD's ctt_name. (ctf_prefixed_name): Rename to... (ctf_name_table): ... this. No longer return a prefixed name: return the applicable name table instead. (ctf_dtd_insert): Use it, and use the right name table. Pass in the kind we're adding. Migrate away from dtd_name. (ctf_dtd_delete): Adjust similarly. Remove the ref to the deleted ctt_name. (ctf_dtd_lookup_type_by_name): Remove. (ctf_dynamic_type): Always return NULL on read-only dictionaries. No longer check ctf_dtnextid: check ctf_typemax instead. (ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead. (ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use ctf_name_table and the right name table, and migrate away from dtd_name as in ctf_dtd_delete. (ctf_add_generic): Pass in the kind explicitly and pass it to ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away from dtd_name to using ctf_str_add_ref to populate the ctt_name. Grow the ptrtab if needed. (ctf_add_encoded): Pass in the kind. (ctf_add_slice): Likewise. (ctf_add_array): Likewise. (ctf_add_function): Likewise. (ctf_add_typedef): Likewise. (ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking ctt_name rather than dtd_name. (ctf_add_struct_sized): Pass in the kind. Use ctf_lookup_by_rawname, not ctf_hash_lookup_type / ctf_dtd_lookup_type_by_name. (ctf_add_union_sized): Likewise. (ctf_add_enum): Likewise. (ctf_add_enum_encoded): Likewise. (ctf_add_forward): Likewise. (ctf_add_type): Likewise. (ctf_compress_write): Call ctf_serialize: adjust for ctf_size not being initialized until after the call. (ctf_write_mem): Likewise. (ctf_write): Likewise. * ctf-archive.c (arc_write_one_ctf): Likewise. * ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookup_by_rawhash, not ctf_hash_lookup_type. (ctf_lookup_by_id): No longer check the readonly types if the dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not writable. Adjust to use the new name hashes, ctf_name_table, and ctf_ptrtab_len. GNU style fix for the final ptrtab scan. (ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR if set. Drop out early when dictionary is writable. Split the ctf_lookups initialization into... (ctf_set_ctl_hashes): ... this new function. (ctf_simple_open_internal): Adjust. New 'writable' parameter. (ctf_simple_open): Adjust accordingly. (ctf_bufopen): Likewise. (ctf_file_close): Destroy the appropriate name hashes. No longer destroy ctf_dtbyname, which is gone. (ctf_getdatasect): Remove spurious "extern". * ctf-types.c (ctf_lookup_by_rawname): New, look up types in the specified name table, given a kind. (ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *. (ctf_member_iter): Add support for iterating over the dynamic type list. (ctf_enum_iter): Likewise. (ctf_variable_iter): Likewise. (ctf_type_rvisit): Likewise. (ctf_member_info): Add support for types in the dynamic type list. (ctf_enum_name): Likewise. (ctf_enum_value): Likewise. (ctf_func_type_info): Likewise. (ctf_func_type_args): Likewise. * ctf-link.c (ctf_accumulate_archive_names): No longer call ctf_update. (ctf_link_write): Likewise. (ctf_link_intern_extern_string): Adjust for new ctf_str_add_external return value. (ctf_link_add_strtab): Likewise. * ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
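The commit says that in a writable dictionary ctf_typemax alone now marks the highest type ID, and that ctf_update's only remaining effect is to set the point to which ctf_discard rolls back. The toy model below sketches just that bookkeeping. `toy_dict`, `toy_add_type`, `toy_update` and `toy_discard` are hypothetical and share no code with libctf: there are no name hashes, DTDs or serialization here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TYPES 64

/* Hypothetical toy dictionary: type IDs run from 1 to typemax.  */
typedef struct
{
  const char *names[TOY_MAX_TYPES + 1];  /* Index 0 unused.  */
  unsigned long typemax;                  /* Highest type ID in use.  */
  unsigned long update_point;             /* Recorded by toy_update.  */
} toy_dict;

/* Add a type; return its new ID, or 0 if the dictionary is full.  */
static unsigned long
toy_add_type (toy_dict *d, const char *name)
{
  assert (d != NULL && name != NULL);
  if (d->typemax >= TOY_MAX_TYPES)
    return 0;
  d->names[++d->typemax] = name;
  return d->typemax;
}

/* Record the rollback point: the only thing "update" does here.  */
static void
toy_update (toy_dict *d)
{
  d->update_point = d->typemax;
}

/* Drop every type added since the last toy_update.  */
static void
toy_discard (toy_dict *d)
{
  while (d->typemax > d->update_point)
    d->names[d->typemax--] = NULL;
}
```

A single counter suffices because, as the commit explains, a writable dictionary never consults a separate static portion, so there is no static/dynamic boundary to track.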
/* Return 1 if the list is empty.  */

int
ctf_list_empty_p (ctf_list_t *lp)
{
  return (lp->l_next == NULL && lp->l_prev == NULL);
}
libctf, link: tie in the deduplicating linker This fairly intricate commit connects up the CTF linker machinery (which operates in terms of ctf_archive_t's on ctf_link_inputs -> ctf_link_outputs) to the deduplicator (which operates in terms of arrays of ctf_file_t's, all the archives exploded). The nondeduplicating linker is retained, but is not called unless the CTF_LINK_NONDEDUP flag is passed in (which ld never does), or the environment variable LD_NO_CTF_DEDUP is set. Eventually, once we have confidence in the much-more-complex deduplicating linker, I hope the nondeduplicating linker can be removed. In brief, what this does is traverse each input archive in ctf_link_inputs, opening every member (if not already open) and tying child dicts to their parents, shoving them into an array and constructing a corresponding parents array that tells the deduplicator which dict is the parent of which child. We then call ctf_dedup and ctf_dedup_emit with that array of inputs, taking the outputs that result and putting them into ctf_link_outputs where the rest of the CTF linker expects to find them, then linking in the variables just as is done by the nondeduplicating linker. It also implements much of the CU-mapping side of things. The problem CU-mapping introduces is that if you map many input CUs into one output, this is saying that you want many translation units to produce at most one child dict if conflicting types are found in any of them. This means you can suddenly have multiple distinct types with the same name in the same dict, which libctf cannot really represent because it's not something you can do with C translation units. The deduplicator machinery already committed does as best it can with these, hiding types with conflicting names rather than making child dicts out of them: but we still need to call it.
This is done similarly to the main link, taking the inputs (one CU output at a time), deduplicating them, taking the output and making it an input to the final link. Two (significant) optimizations are done: we share atoms tables between all these links and the final link (so e.g. all type hash values are shared, all decorated type names, etc); and any CU-mapped link with only one input (and no child dicts) doesn't need to do anything other than renaming the CU: the CU-mapped link phase can be skipped for it. Put together, large CU-mapped links can save 50% of their memory usage and about as much time (and the memory usage for CU-mapped links is significant, because all those output CUs have to have all their types stored in memory all at once). include/ * ctf-api.h (CTF_LINK_NONDEDUP): New, turn off the deduplicator. libctf/ * ctf-impl.h (ctf_list_splice): New. * ctf-util.c (ctf_list_splice): Likewise. * ctf-link.c (link_sort_inputs_cb_arg_t): Likewise. (ctf_link_sort_inputs): Likewise. (ctf_link_deduplicating_count_inputs): Likewise. (ctf_link_deduplicating_open_inputs): Likewise. (ctf_link_deduplicating_close_inputs): Likewise. (ctf_link_deduplicating_variables): Likewise. (ctf_link_deduplicating_per_cu): Likewise. (ctf_link_deduplicating): Likewise. (ctf_link): Call it.
2020-06-06 05:57:06 +08:00
/* Splice one entire list onto the end of another one.  The list being
   appended (APPEND) is left empty.  */

void
ctf_list_splice (ctf_list_t *lp, ctf_list_t *append)
{
  if (ctf_list_empty_p (append))
    return;

  if (lp->l_prev != NULL)
    lp->l_prev->l_next = append->l_next;
  else
    lp->l_next = append->l_next;

  append->l_next->l_prev = lp->l_prev;
  lp->l_prev = append->l_prev;
  append->l_next = NULL;
  append->l_prev = NULL;
}
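Splicing relinks only the boundary between the two lists, so it is O(1) however long APPEND is. This sketch re-declares the node and copies append, the emptiness test and splice as the hypothetical `list_append`, `list_empty_p` and `list_splice`. The `lp != append` assertion is an addition of this sketch: splicing a list onto itself would corrupt it, and the original code does not check for that.

```c
#include <assert.h>
#include <stddef.h>

/* Re-declaration of libctf's internal two-pointer list node.  */
typedef struct ctf_list
{
  struct ctf_list *l_prev;
  struct ctf_list *l_next;
} ctf_list_t;

/* Same logic as ctf_list_append.  */
static void
list_append (ctf_list_t *lp, void *newp)
{
  ctf_list_t *p = lp->l_prev;
  ctf_list_t *q = newp;

  lp->l_prev = q;
  q->l_prev = p;
  q->l_next = NULL;
  if (p != NULL)
    p->l_next = q;
  else
    lp->l_next = q;
}

static int
list_empty_p (const ctf_list_t *lp)
{
  return (lp->l_next == NULL && lp->l_prev == NULL);
}

/* Same logic as ctf_list_splice: move all of APPEND onto the end of LP,
   leaving APPEND empty.  */
static void
list_splice (ctf_list_t *lp, ctf_list_t *append)
{
  assert (lp != NULL && append != NULL && lp != append);
  if (list_empty_p (append))
    return;

  if (lp->l_prev != NULL)
    lp->l_prev->l_next = append->l_next;   /* Old tail -> first moved.  */
  else
    lp->l_next = append->l_next;           /* LP was empty.  */

  append->l_next->l_prev = lp->l_prev;
  lp->l_prev = append->l_prev;
  append->l_next = NULL;
  append->l_prev = NULL;
}

static size_t
list_length (const ctf_list_t *lp)
{
  const ctf_list_t *e;
  size_t n = 0;

  for (e = lp->l_next; e != NULL; e = e->l_next)
    n++;
  return n;
}
```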
libctf: symbol type linking support This adds facilities to write out the function info and data object sections, which efficiently map from entries in the symbol table to types. The write-side code is entirely new: the read-side code was merely significantly changed and support for indexed tables added (pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff header fields). With this in place, you can use ctf_lookup_by_symbol to look up the types of symbols of function and object type (and, as before, you can use ctf_lookup_variable to look up types of file-scope variables not present in the symbol table, as long as you know their name: but variables that are also data objects are now found in the data object section instead.) (Compatible) file format change: The CTF spec has always said that the function info section looks much like the CTF_K_FUNCTIONs in the type section: an info word (including an argument count) followed by a return type and N argument types. This format is suboptimal: it means function symbols cannot be deduplicated and it causes a lot of ugly code duplication in libctf. But conveniently the compiler has never emitted this! Because it has always emitted a rather different format that libctf has never accepted, we can be sure that there are no instances of this function info section in the wild, and can freely change its format without compatibility concerns or a file format version bump. (And since it has never been emitted in any code that generated any older file format version, either, we need keep no code to read the format as specified at all!) So the function info section is now specified as an array of uint32_t, exactly like the object data section: each entry is a type ID in the type section which must be of kind CTF_K_FUNCTION, the prototype of this function. 
This allows function types to be deduplicated and also correctly encodes the fact that all functions declared in C really are types available to the program: so they should be stored in the type section like all other types. (In format v4, we will be able to represent the types of static functions as well, but that really does require a file format change.) We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the new function info format is in use. A sufficiently new compiler will always set this flag. New libctf will always set this flag: old libctf will refuse to open any CTF dicts that have this flag set. If the flag is not set on a dict being read in, new libctf will disregard the function info section. Format v4 will remove this flag (or, rather, the flag has no meaning there and the bit position may be recycled for some other purpose). New API: Symbol addition: ctf_add_func_sym: Add a symbol with a given name and type. The type must be of kind CTF_K_FUNCTION (a function pointer). Internally this adds a name -> type mapping to the ctf_funchash in the ctf_dict. ctf_add_objt_sym: Add a symbol with a given name and type. The type kind can be anything, including function pointers. This adds to ctf_objthash. These both treat symbols as name -> type mappings: the linker associates symbol names with symbol indexes via the ctf_link_shuffle_syms callback, which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the ctf_dict. Repeated relinks can add more symbols. Variables that are also exposed as symbols are removed from the variable section at serialization time. 
CTF symbol type sections which have enough pads, defined by CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols where most types are unknown, or in archives where most types are defined in some child or parent dict, not in this specific dict) are sorted by name rather than symidx and accompanied by an index which associates each symbol type entry with a name: the existing ctf_lookup_by_symbol will map symbol indexes to symbol names and look the names up in the index automatically. (This is currently ELF-symbol-table-dependent, but there is almost nothing specific to ELF in here and we can add support for other symbol table formats easily.) The compiler also uses index sections to communicate the contents of object file symbol tables without relying on any specific ordering of symbols: it doesn't need to sort them, and libctf will detect an unsorted index section via the absence of the new CTF_F_IDXSORTED header flag, and sort it if needed. Iteration: ctf_symbol_next: Iterator which returns the types and names of symbols one by one, either for function or data symbols. This does not require any sorting: the ctf_link machinery uses it to pull in all the compiler-provided symbols cheaply, but it is not restricted to that use. (Compatible) changes in API: ctf_lookup_by_symbol: can now be called for object and function symbols: never returns ECTF_NOTDATA (which is now not thrown by anything, but is kept for compatibility and because it is a plausible error that we might start throwing again at some later date). Internally we also have changes to the ctf-string functionality so that "external" strings (those where we track a string -> offset mapping, but only write out an offset) can be consulted via the usual means (ctf_strptr) before the strtab is written out.
This is important because ctf_link_add_linker_symbol can now be handed symbols named via strtab offsets, and ctf_link_shuffle_syms must figure out their actual names by looking in the external symtab we have just been fed by the ctf_link_add_strtab callback, long before that strtab is written out. include/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (ctf_symbol_next): New. (ctf_add_objt_sym): Likewise. (ctf_add_func_sym): Likewise. * ctf.h: Document new function info section format. (CTF_F_NEWFUNCINFO): New. (CTF_F_IDXSORTED): New. (CTF_F_MAX): Adjust accordingly. libctf/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New. (_libctf_nonnull_): Likewise. (ctf_in_flight_dynsym_t): New. (ctf_dict_t) <ctf_funcidx_names>: Likewise. <ctf_objtidx_names>: Likewise. <ctf_nfuncidx>: Likewise. <ctf_nobjtidx>: Likewise. <ctf_funcidx_sxlate>: Likewise. <ctf_objtidx_sxlate>: Likewise. <ctf_objthash>: Likewise. <ctf_funchash>: Likewise. <ctf_dynsyms>: Likewise. <ctf_dynsymidx>: Likewise. <ctf_dynsymmax>: Likewise. <ctf_in_flight_dynsym>: Likewise. (struct ctf_next) <u.ctn_next>: Likewise. (ctf_symtab_skippable): New prototype. (ctf_add_funcobjt_sym): Likewise. (ctf_dynhash_sort_by_name): Likewise. (ctf_sym_to_elf64): Rename to... (ctf_elf32_to_link_sym): ... this, and... (ctf_elf64_to_link_sym): ... this. * ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO flag, and presence of index sections. Refactor out ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func sxlate sections if corresponding index section is present. Adjust for new func info section format. (ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error handling. Report incorrect-length index sections. Always do an init_symtab, even if there is no symtab section (there may be index sections still). 
(flip_objts): Adjust comment: func and objt sections are actually identical in structure now, no need to caveat. (ctf_dict_close): Free newly-added data structures. * ctf-create.c (ctf_create): Initialize them. (ctf_symtab_skippable): New, refactored out of init_symtab, with st_nameidx_set check added. (ctf_add_funcobjt_sym): New, add a function or object symbol to the ctf_objthash or ctf_funchash, by name. (ctf_add_objt_sym): Call it. (ctf_add_func_sym): Likewise. (symtypetab_delete_nonstatic_vars): New, delete vars also present as data objects. (CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters: this is a function emission, not a data object emission. (CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit pads for symbols with no type (only set for unindexed sections). (CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters: always emit indexed. (symtypetab_density): New, figure out section sizes. (emit_symtypetab): New, emit a symtypetab. (emit_symtypetab_index): New, emit a symtypetab index. (ctf_serialize): Call them, emitting suitably sorted symtypetab sections and indexes. Set suitable header flags. Copy over new fields. * ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an order on symtypetab index sections. * ctf-link.c (ctf_add_type_mapping): Delete erroneous comment relating to code that was never committed. (ctf_link_one_variable): Improve variable name. (check_sym): New, symtypetab analogue of check_variable. (ctf_link_deduplicating_one_symtypetab): New. (ctf_link_deduplicating_syms): Likewise. (ctf_link_deduplicating): Call them. (ctf_link_deduplicating_per_cu): Note that we don't call them in this case (yet). (ctf_link_add_strtab): Set the error on the fp correctly. (ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add a linker symbol to the in-flight list. 
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the in-flight list into a mapping we can use, now its names are resolvable in the external strtab. * ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with external strtab offsets. (ctf_str_rollback): Adjust comment. (ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from writeout time... (ctf_str_add_external): ... to string addition time. * ctf-lookup.c (ctf_lookup_var_key_t): Rename to... (ctf_lookup_idx_key_t): ... this, now we use it for syms too. <clik_names>: New member, a name table. (ctf_lookup_var): Adjust accordingly. (ctf_lookup_variable): Likewise. (ctf_lookup_by_id): Shuffle further up in the file. (ctf_symidx_sort_arg_cb): New, callback for... (sort_symidx_by_name): ... this new function to sort a symidx found to be unsorted (likely originating from the compiler). (ctf_symidx_sort): New, sort a symidx. (ctf_lookup_symbol_name): Support dynamic symbols with indexes provided by the linker. Use ctf_link_sym_t, not Elf64_Sym. Check the parent if a child lookup fails. (ctf_lookup_by_symbol): Likewise. Work for function symbols too. (ctf_symbol_next): New, iterate over symbols with types (without sorting). (ctf_lookup_idx_name): New, bsearch for symbol names in indexes. (ctf_try_lookup_indexed): New, attempt an indexed lookup. (ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol. (ctf_func_args): Likewise. (ctf_get_dict): Move... * ctf-types.c (ctf_get_dict): ... here. * ctf-util.c (ctf_sym_to_elf64): Re-express as... (ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and st_nameidx_set (always 0, so st_nameidx can be ignored). Look in the ELF strtab for names. (ctf_elf32_to_link_sym): Likewise, for Elf32_Sym. (ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be. * libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
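The commit describes name-sorted index sections that are sorted on load when CTF_F_IDXSORTED is absent, then searched by name (ctf_lookup_idx_name is described as a bsearch over the index). The sketch below shows that sort-then-binary-search pattern with the standard qsort and bsearch. `sym_index_entry`, `entry_cmp`, `index_sort` and `index_lookup` are hypothetical; for simplicity the sketch stores names as C strings paired with type IDs, whereas the real index refers to names through the string table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory index entry: a symbol name and its type ID.  */
typedef struct
{
  const char *name;
  uint32_t type;
} sym_index_entry;

static int
entry_cmp (const void *a, const void *b)
{
  const sym_index_entry *x = a;
  const sym_index_entry *y = b;

  return strcmp (x->name, y->name);
}

/* Sort an index by name, as needed when the producer did not mark it
   sorted.  */
static void
index_sort (sym_index_entry *idx, size_t n)
{
  assert (idx != NULL || n == 0);
  if (n > 1)
    qsort (idx, n, sizeof (sym_index_entry), entry_cmp);
}

/* Binary-search a name-sorted index.  Return the type ID, or 0 (which
   this sketch reserves for "no type") if NAME is absent.  */
static uint32_t
index_lookup (const sym_index_entry *idx, size_t n, const char *name)
{
  sym_index_entry key = { name, 0 };
  const sym_index_entry *hit;

  if (n == 0)
    return 0;
  hit = bsearch (&key, idx, n, sizeof (sym_index_entry), entry_cmp);
  return hit != NULL ? hit->type : 0;
}
```

Sorting once costs O(n log n), after which each lookup is O(log n), which is why an index pays off when a section would otherwise be mostly pads.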
/* Convert a 32-bit ELF symbol to a ctf_link_sym_t. */
ctf_link_sym_t *
ctf_elf32_to_link_sym (ctf_dict_t *fp, ctf_link_sym_t *dst, const Elf32_Sym *src,
                       uint32_t symidx)
{
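  Elf32_Sym tmp;
  int needs_flipping = 0;

  /* Flip the symbol only if the symtab's endianness differs from the
     host's.  */
#ifdef WORDS_BIGENDIAN
  if (fp->ctf_symsect_little_endian)
    needs_flipping = 1;
#else
  if (!fp->ctf_symsect_little_endian)
    needs_flipping = 1;
#endif

  /* Work on a copy so that SRC is never modified.  Only multibyte fields
     need swapping: st_info and st_other are single bytes.  */
  memcpy (&tmp, src, sizeof (Elf32_Sym));
  if (needs_flipping)
    {
      swap_thing (tmp.st_name);
      swap_thing (tmp.st_size);
      swap_thing (tmp.st_shndx);
      swap_thing (tmp.st_value);
    }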
  /* The name must be in the external string table.  */
  if (tmp.st_name < fp->ctf_str[CTF_STRTAB_1].cts_len)
    dst->st_name = (const char *) fp->ctf_str[CTF_STRTAB_1].cts_strs + tmp.st_name;
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the in-flight list into a mapping we can use, now its names are resolvable in the external strtab. * ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with external strtab offsets. (ctf_str_rollback): Adjust comment. (ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from writeout time... (ctf_str_add_external): ... to string addition time. * ctf-lookup.c (ctf_lookup_var_key_t): Rename to... (ctf_lookup_idx_key_t): ... this, now we use it for syms too. <clik_names>: New member, a name table. (ctf_lookup_var): Adjust accordingly. (ctf_lookup_variable): Likewise. (ctf_lookup_by_id): Shuffle further up in the file. (ctf_symidx_sort_arg_cb): New, callback for... (sort_symidx_by_name): ... this new function to sort a symidx found to be unsorted (likely originating from the compiler). (ctf_symidx_sort): New, sort a symidx. (ctf_lookup_symbol_name): Support dynamic symbols with indexes provided by the linker. Use ctf_link_sym_t, not Elf64_Sym. Check the parent if a child lookup fails. (ctf_lookup_by_symbol): Likewise. Work for function symbols too. (ctf_symbol_next): New, iterate over symbols with types (without sorting). (ctf_lookup_idx_name): New, bsearch for symbol names in indexes. (ctf_try_lookup_indexed): New, attempt an indexed lookup. (ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol. (ctf_func_args): Likewise. (ctf_get_dict): Move... * ctf-types.c (ctf_get_dict): ... here. * ctf-util.c (ctf_sym_to_elf64): Re-express as... (ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and st_nameidx_set (always 0, so st_nameidx can be ignored). Look in the ELF strtab for names. (ctf_elf32_to_link_sym): Likewise, for Elf32_Sym. (ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be. * libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
  else
    dst->st_name = _CTF_NULLSTR;
  dst->st_nameidx_set = 0;
  dst->st_symidx = symidx;
  dst->st_shndx = tmp.st_shndx;
  dst->st_type = ELF32_ST_TYPE (tmp.st_info);
  dst->st_value = tmp.st_value;
  return dst;
}
/* Convert a 64-bit ELF symbol to a ctf_link_sym_t. */
ctf_link_sym_t *
ctf_elf64_to_link_sym (ctf_dict_t *fp, ctf_link_sym_t *dst, const Elf64_Sym *src,
                       uint32_t symidx)
{
  Elf64_Sym tmp;
  int needs_flipping = 0;

#ifdef WORDS_BIGENDIAN
  if (fp->ctf_symsect_little_endian)
    needs_flipping = 1;
#else
  if (!fp->ctf_symsect_little_endian)
    needs_flipping = 1;
#endif
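  /* Byte-swap only when the host byte order (WORDS_BIGENDIAN) and the
     symtab byte order recorded in ctf_symsect_little_endian disagree.
     For example, an st_name of 0x10 in a little-endian symtab is stored
     as the bytes 10 00 00 00, which a big-endian host would read as
     0x10000000 without swapping.  st_info and st_other are single bytes
     and never need swapping.  */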
  memcpy (&tmp, src, sizeof (Elf64_Sym));
  if (needs_flipping)
    {
      swap_thing (tmp.st_name);
      swap_thing (tmp.st_size);
      swap_thing (tmp.st_shndx);
      swap_thing (tmp.st_value);
    }
  /* The name must be in the external string table.  */
  if (tmp.st_name < fp->ctf_str[CTF_STRTAB_1].cts_len)
    dst->st_name = (const char *) fp->ctf_str[CTF_STRTAB_1].cts_strs + tmp.st_name;
  else
    dst->st_name = _CTF_NULLSTR;
  dst->st_nameidx_set = 0;
  dst->st_symidx = symidx;
  dst->st_shndx = tmp.st_shndx;
  dst->st_type = ELF32_ST_TYPE (tmp.st_info);
  /* Callers only care whether the value is zero, so make sure a nonzero
     64-bit value does not turn into zero when truncated to 32 bits.  */
  if (_libctf_unlikely_ (tmp.st_value != 0 && ((uint32_t) tmp.st_value == 0)))
    dst->st_value = 1;
  else
    dst->st_value = (uint32_t) tmp.st_value;

  return dst;
}
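The zero-preserving narrowing above can be illustrated in isolation. This sketch assumes, as the comment says, that consumers only ever test the 32-bit value for zero; `narrow_preserving_zeroness` is a hypothetical name, not a libctf function. A nonzero 64-bit value whose low 32 bits are all zero is mapped to 1, and every other value is simply truncated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of libctf): narrow a 64-bit value to 32 bits
   while preserving whether it is zero, mirroring the st_value handling
   above.  */
static uint32_t
narrow_preserving_zeroness (uint64_t value)
{
  /* A nonzero value whose low 32 bits are all zero would truncate to 0,
     so substitute 1 to keep it nonzero.  */
  if (value != 0 && (uint32_t) value == 0)
    return 1;

  /* Otherwise plain truncation already preserves zero/nonzero.  */
  return (uint32_t) value;
}
```

The narrowed value is otherwise lossy (0x100000005 becomes 5), which is harmless only because zeroness is the sole property consulted.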

/* A string appender working on dynamic strings.  Returns NULL on OOM, in
   which case S is left allocated and unchanged.  */

char *
ctf_str_append (char *s, const char *append)
{
  size_t s_len = 0;

  if (append == NULL)
    return s;

  if (s != NULL)
    s_len = strlen (s);

  size_t append_len = strlen (append);

  if ((s = realloc (s, s_len + append_len + 1)) == NULL)
    return NULL;

  memcpy (s + s_len, append, append_len);
  s[s_len + append_len] = '\0';

  return s;
}

/* A version of ctf_str_append that returns the old string on OOM.  */

char *
ctf_str_append_noerr (char *s, const char *append)
{
  char *new_s;

  new_s = ctf_str_append (s, append);
  if (!new_s)
    return s;
  return new_s;
}
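The difference between the two appenders matters for the common `s = append (s, ...)` idiom: when `realloc` fails, `ctf_str_append` returns NULL while the old buffer stays allocated, so assigning the result back to `s` loses the only pointer to it. The sketch below uses renamed copies of both functions (so it compiles without libctf's internal headers) and a hypothetical `join_words` caller; with the `_noerr` variant, an allocation failure truncates the result instead of leaking it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Renamed copy of ctf_str_append.  */
static char *
str_append (char *s, const char *append)
{
  size_t s_len = 0;
  size_t append_len;

  if (append == NULL)
    return s;
  if (s != NULL)
    s_len = strlen (s);
  append_len = strlen (append);

  /* On failure realloc leaves the old block allocated; only the caller's
     own pointer to it remains.  */
  if ((s = realloc (s, s_len + append_len + 1)) == NULL)
    return NULL;
  memcpy (s + s_len, append, append_len);
  s[s_len + append_len] = '\0';
  return s;
}

/* Renamed copy of ctf_str_append_noerr: fall back to the old string.  */
static char *
str_append_noerr (char *s, const char *append)
{
  char *new_s = str_append (s, append);
  return new_s ? new_s : s;
}

/* Hypothetical caller: join N words with ", ".  Because "s = ..." never
   overwrites the buffer pointer with NULL, OOM cannot leak the buffer.  */
static char *
join_words (const char *const *words, size_t n)
{
  char *s = NULL;
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (i > 0)
        s = str_append_noerr (s, ", ");
      s = str_append_noerr (s, words[i]);
    }
  return s;
}
```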

/* A realloc() that refuses to reallocate (returning NULL and emitting a
   debugging message) while there are any active string refs
   (ctf_str_num_refs).  */

void *
ctf_realloc (ctf_dict_t *fp, void *ptr, size_t size)
{
  if (fp->ctf_str_num_refs > 0)
    {
      ctf_dprintf ("%p: attempt to realloc() string table with %lu active refs\n",
                   (void *) fp, (unsigned long) fp->ctf_str_num_refs);
      return NULL;
    }
  return realloc (ptr, size);
}
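The refusal exists because each string ref is a pointer into the buffer under construction, and `realloc` may move that buffer, leaving the refs dangling. A minimal sketch of the same guard, assuming a hypothetical `struct refbuf` in place of `ctf_dict_t` and printing to stderr where libctf uses `ctf_dprintf`:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of ctf_dict_t: a count of
   outstanding pointers ("refs") into a buffer under construction.  */
struct refbuf
{
  size_t num_refs;
};

/* Mirrors ctf_realloc: refuse to move the buffer while any ref could be
   left dangling.  PTR is untouched (and still owned by the caller) when
   NULL is returned because of the guard.  */
static void *
guarded_realloc (struct refbuf *rb, void *ptr, size_t size)
{
  if (rb->num_refs > 0)
    {
      fprintf (stderr, "%p: attempt to realloc() with %lu active refs\n",
               (void *) rb, (unsigned long) rb->num_refs);
      return NULL;
    }
  return realloc (ptr, size);
}
```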

/* Store the specified error code into errp if it is non-NULL, and then
   return NULL for the benefit of the caller.  */

void *
ctf_set_open_errno (int *errp, int error)
{
  if (errp != NULL)
    *errp = error;
  return NULL;
}
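`ctf_set_open_errno` lets an open-style function record an error and produce its NULL result in a single `return`, while tolerating callers that pass a NULL `errp`. A sketch of that idiom, using a renamed copy and a hypothetical `open_thing` function with the standard `EINVAL` code (libctf itself passes errno values or its own `ECTF_*` codes):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Renamed copy of ctf_set_open_errno.  */
static void *
set_open_errno (int *errp, int error)
{
  if (errp != NULL)
    *errp = error;
  return NULL;
}

/* Hypothetical opener: one return statement both reports the error
   (if the caller asked for it) and yields the NULL result.  */
static void *
open_thing (void *buf, size_t size, int *errp)
{
  if (buf == NULL || size == 0)
    return set_open_errno (errp, EINVAL);
  return buf;   /* Stand-in for a successfully opened object.  */
}
```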
/* Store the specified error code into the CTF dict, and then return CTF_ERR /
-1 for the benefit of the caller. */
unsigned long
ctf_set_errno (ctf_dict_t *fp, int err)
{
fp->ctf_errno = err;
return CTF_ERR;
}
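This convention lets a failing function record its error and return in a single statement, for example `return ctf_set_errno (fp, err);`. The following is a minimal, self-contained sketch of the same pattern. It uses hypothetical stand-ins (`my_dict_t`, `my_set_errno`, `MY_ERR`, `MY_ENOTFOUND`) rather than the real libctf types. Because the ID-typed sentinel is meant only for functions that return type IDs, the `int`-returning helper in the sketch returns -1 itself instead of passing the unsigned sentinel through.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ctf_id_t, CTF_ERR and ctf_dict_t.  */
typedef unsigned long my_id_t;
#define MY_ERR ((my_id_t) -1L)
#define MY_ENOTFOUND 1001          /* hypothetical error code */

typedef struct my_dict
{
  int errno_val;                   /* last error recorded on this handle */
  my_id_t ntypes;                  /* valid type IDs are 1..ntypes */
} my_dict_t;

/* Record ERR on FP and hand back the ID-typed error sentinel.  */
static unsigned long
my_set_errno (my_dict_t *fp, int err)
{
  assert (err != 0);               /* recording 0 would look like success */
  fp->errno_val = err;
  return MY_ERR;
}

/* ID-returning lookup: each failure path is a single return statement.  */
static my_id_t
my_lookup (my_dict_t *fp, my_id_t id)
{
  if (id == 0 || id > fp->ntypes)
    return my_set_errno (fp, MY_ENOTFOUND);
  return id;
}

/* int-returning check: the error code is already stored on FP by
   my_lookup, so return a plain -1 rather than the unsigned sentinel.  */
static int
my_check (my_dict_t *fp, my_id_t id)
{
  if (my_lookup (fp, id) == MY_ERR)
    return -1;
  return 0;
}
```

Callers compare the return value against the sentinel and then read the stored code from the handle, much as libctf callers use `ctf_errno (fp)`.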
/* Create a ctf_next_t. */
ctf_next_t *
ctf_next_create (void)
{
return calloc (1, sizeof (struct ctf_next));
}
/* Destroy a ctf_next_t, for early exit from iterators. */
void
ctf_next_destroy (ctf_next_t *i)
{
if (i == NULL)
return;
if (i->ctn_iter_fun == (void (*) (void)) ctf_dynhash_next_sorted)
free (i->u.ctn_sorted_hkv);
if (i->ctn_next)
ctf_next_destroy (i->ctn_next);
free (i);
}
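A caller needs an explicit destroy call only when it abandons an iteration early. A `*_next` function allocates the iterator on its first call (when the iterator pointer is NULL), and it frees the iterator and resets the pointer to NULL itself when iteration ends normally. The sketch below models that lifecycle with a hypothetical array iterator. The names `my_next_t`, `my_array_next` and `my_find_first_negative` are illustrative and are not libctf API. For simplicity, the sketch signals end-of-iteration with a return code of 0, whereas libctf sets the dict's error to `ECTF_NEXT_END`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical iterator state: a position plus an owned snapshot buffer,
   loosely modelled on ctf_next_t.  */
typedef struct my_next
{
  size_t pos;
  size_t n;
  int *snapshot;
} my_next_t;

/* Free an iterator and everything it owns; NULL is accepted.  */
static void
my_next_destroy (my_next_t *i)
{
  if (i == NULL)
    return;
  free (i->snapshot);
  free (i);
}

/* Yield the next element of ARR into *OUT.  Returns 1 when an element is
   produced, 0 at the end (the iterator is freed and *IT reset to NULL),
   and -1 if allocation fails.  *IT must be NULL on the first call.  */
static int
my_array_next (const int *arr, size_t n, my_next_t **it, int *out)
{
  my_next_t *i = *it;

  assert (arr != NULL || n == 0);
  if (i == NULL)
    {
      if ((i = calloc (1, sizeof (my_next_t))) == NULL)
        return -1;
      if (n > 0)
        {
          if ((i->snapshot = malloc (n * sizeof (int))) == NULL)
            {
              free (i);
              return -1;
            }
          memcpy (i->snapshot, arr, n * sizeof (int));
        }
      i->n = n;
      *it = i;
    }

  if (i->pos >= i->n)
    {
      my_next_destroy (i);
      *it = NULL;
      return 0;
    }

  *out = i->snapshot[i->pos++];
  return 1;
}

/* Stop at the first negative element.  Leaving the loop early is the one
   case where the caller, not the iterator, must free the state.  */
static int
my_find_first_negative (const int *arr, size_t n, int *found)
{
  my_next_t *it = NULL;
  int v, rc;

  while ((rc = my_array_next (arr, n, &it, &v)) == 1)
    if (v < 0)
      {
        *found = v;
        my_next_destroy (it);
        return 1;
      }
  return rc;   /* 0: not found, iterator already freed; -1: error.  */
}
```

Because the destroy function accepts NULL, as `ctf_next_destroy` does, a caller can also destroy unconditionally after the loop without checking whether the iteration completed.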
/* Copy a ctf_next_t. */
ctf_next_t *
ctf_next_copy (ctf_next_t *i)
{
ctf_next_t *i2;
if ((i2 = ctf_next_create()) == NULL)
return NULL;
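/* Start from a shallow copy, then give the copy its own sorted key/value
   array (if any) below, so destroying one iterator cannot free the other's.  */
memcpy (i2, i, sizeof (struct ctf_next));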
if (i2->ctn_iter_fun == (void (*) (void)) ctf_dynhash_next_sorted)
{
size_t els = ctf_dynhash_elements ((ctf_dynhash_t *) i->cu.ctn_h);
if ((i2->u.ctn_sorted_hkv = calloc (els, sizeof (ctf_next_hkv_t))) == NULL)
{
free (i2);
return NULL;
}
memcpy (i2->u.ctn_sorted_hkv, i->u.ctn_sorted_hkv,
els * sizeof (ctf_next_hkv_t));
}
return i2;
}
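`ctf_next_copy` duplicates the sorted array rather than relying on `memcpy` alone. After a plain `memcpy`, both iterators would point at the same heap buffer, so destroying either one would free memory the other still uses. The sketch below isolates that deep-copy step using a hypothetical iterator (`my_next_t` with one owned `sorted` buffer, `my_next_copy`). It is not libctf's actual structure, which holds the buffer in a union and copies it only for sorted-hash iteration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical iterator with one owned buffer, standing in for the sorted
   key/value array that ctf_next_copy duplicates.  */
typedef struct my_next
{
  size_t pos;
  size_t n;
  int *sorted;            /* owned; NULL if this iterator has no buffer */
} my_next_t;

/* Free an iterator and its buffer; NULL is accepted.  */
static void
my_next_destroy (my_next_t *i)
{
  if (i == NULL)
    return;
  free (i->sorted);
  free (i);
}

/* Copy I, including its current position, giving the copy a private
   duplicate of the buffer so the two iterators can be advanced and
   destroyed independently.  Returns NULL on allocation failure.  */
static my_next_t *
my_next_copy (const my_next_t *i)
{
  my_next_t *i2;

  assert (i != NULL);
  if ((i2 = malloc (sizeof (my_next_t))) == NULL)
    return NULL;
  memcpy (i2, i, sizeof (my_next_t));

  /* Never share the original's pointer: replace it with a fresh copy.  */
  i2->sorted = NULL;
  if (i->n > 0 && i->sorted != NULL)
    {
      if ((i2->sorted = calloc (i->n, sizeof (int))) == NULL)
        {
          free (i2);
          return NULL;
        }
      memcpy (i2->sorted, i->sorted, i->n * sizeof (int));
    }
  return i2;
}
```

The copy resumes from the same position as the original. Either iterator can then be advanced or destroyed without affecting the other.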