Add tests to cover the full range of behaviors observed around
optional register operands for the `tlbip' and `sysp' instructions,
namely:
* Not all `tlbip' operations take GPR operands. When this is the
case, we should check that neither optional operand was supplied.
* When a `tlbip' operation is labeled with the `F_HASXT' flag, xzr
is not a valid optional operand. In such case, at least the fist
optional register needs to be specified with a non-xzr value.
* The first operand for both insns should be either xzr or an
even-numbered register (n % 2 == 0). In the former scenario, the
second operand should default to xzr too, while in the latter, it
should default to n + 1.
With the addition of 128-bit system registers to the Arm architecture
starting with Armv9.4-a, a mechanism for manipulating their contents
is introduced with the `msrr' and `mrrs' instruction pair.
These move values from one such 128-bit system register into a pair of
contiguous general-purpose registers and vice-versa, as for example:
msrr ttlb0_el1, x0, x1
mrrs x0, x1, ttlb0_el1
This patch adds the necessary support for these instructions, adding
checks for system-register width by defining a new operand type in the
form of `AARCH64_OPND_SYSREG128' and the `aarch64_sys_reg_128bit_p'
predicate, responsible for checking whether the requested system
register table entry is marked as implemented in the 128-bit mode via
the F_REG_128 flag.
The 2020 Architecture Extensions to the Arm A-profile architecture
added FEAT_XS, the XS attribute feature, giving cores the ability to
identify devices which can be subject to long response delays. TLB
invalidate (TLBI) operations and barriers can also be annotated with
this attribute[1].
With the introduction of the 128-bit translation tables with the
Armv8.9-a/Armv9.4-a Translation Hardening Extension, a series of new
TLB invalidate operations are introduced which make use of this
extension. These are added to aarch64_sys_regs_tlbi[] for use
with the `tlbip' insn.
[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2020
The addition of 128-bit page table descriptors and, with it, the
addition of 128-bit system registers for these means that special
"invalidate translation table entry" instructions are needed to cope
with the new 128-bit model. This is introduced with the `tlbpi'
instruction, implemented here.
Some 128-bit system operations (mrrs, msrr, tlbip, and sysp) take two
qualified operands and one of unqualified type (e.g. system register
name, tlbip operation). This creates the need for adequate qualifiers
to handle this.
This patch therefore introduces the `QL_SRC_X2' and `QL_DST_X2' qualifier
specifiers, which expand to `QLF3(NIL,X,X)' and `QLF3(X,X,NIL)',
respectively.
While CRn and CRm fields in the SYSP instruction are 4-bit wide and
are thus able to accommodate values in the range 0-15, the
specifications for the SYSP instructions limit their ranges to 8-9 for
CRm and 0-7 in the case of CRn.
This led to the need to signal in some way to the operand parser that
a given operand is under special restrictions regarding its use. This
is done via the new `F_OPD_NARROW' flag, indicating a narrowing in the
range of operand values for fields in the instruction tagged with the
flag.
The flag is then used in `parse_operands' when the instruction is
assembled, but needs not be taken into consideration during
disassembly.
Mirroring the use of the `sys' - System Instruction assembly
instruction, this implements its 128-bit counterpart, `sysp'.
This optionally takes two contiguous general-purpose registers
starting at an even number or, when these are omitted, by default
sets both of these to xzr.
Syntax:
sysp #<op1>, <Cn>, <Cm>, #<op2>{, <Xt1>, <Xt2>}
Two of the instructions added by the `+d128' architectural extension
add the flexibility to have two optional operands. Prior to the
addition of the `tlbip' and `sysp' instructions, no mnemonic allowed
more than one such optional operand.
With `tlbip' as an example, some TLBIP instruction names do not allow
for any optional operands, while others allow for both to be optional.
In the latter case, it is possible that either the second operand
alone is omitted or both operands are omitted.
Therefore, a considerable degree of flexibility needed to be added to
the way operands were parsed. It was, however, possible to achieve
this with relatively few changes to existing code.
it is noteworthy that opcode flags specifying the optional operand
number are non-orthogonal. For example, we have:
#define F_OPD1_OPT (2 << 12) : 0b10 << 12
#define F_OPD2_OPT (3 << 12) : 0b11 << 12
such that by virtue of the observation that
(F_OPD1_OPT | F_OPD2_OPT) == F_OPD2_OPT
it is impossible to mark both operands 1 and 2 as optional for an
instruction and it is assumed that a maximum of 1 operand can ever be
optional. This is not overly-problematic given that, for optional
pairs, the second optional operand is always found immediately after
the first. Thus, it suffices for us to flag that there is a second
optional operand. With this fact, we can infer its position in the
mnemonic from the position of the first (e.g. if the second operand in
the mnemonic is optional, we know the third is too). We therefore
define the `F_OPD_PAIR_OPT' flag and calculate its position in the
mnemonic from the value encoded by the `F_OPD<n>_OPT' flag.
Another observation is that there is a tight coupling between default
values assigned to the two registers when one (or both) are omitted
from the mnemonic. Namely, if Xt1 has a value of 0x1f (the zero
register is specified), Xt2 defaults to the same value, otherwise Xt2
will be assigned Xt + 1. This meant that where you have default value
validation, in checking the second optional operand's value, it is
also necessary to look at the value assigned to the
previously-processed operand value before deciding its validity. Thus
`process_omitted_operand' needs not only access to its `operand'
argument, but also to the global `inst' struct.
Analysis of the allowed operand values for `sysp' and `tlbip' reveals
a significant departure from the allowed behavior for operand register
pairs (hitherto labeled AARCH64_OPND_PAIRREG) observed for other
insns in this category.
For instructions `casp', `mrrs' and `msrr' the register pair must
always start at an even index and the second register in the pair is
the index + 1. This precludes the use of xzr as the first register,
given it corresponds to register number 31.
This is different in the case of `sysp' and `tlbip', however. These
allow the use of xzr and, where the first operand in the pair is
omitted, this is the default value assigned to it. When this
operand is assigned xzr, it is expected that the second operand will
likewise take on a value of xzr.
These two instructions therefore "break" two rules of register pairs:
* The first of the two registers is odd-numbered.
* The index of the second register is equal to that of the first,
and not n+1.
To allow for this departure from hitherto standard behavior, we
extend the functionality of the assembler by defining an extension of
the AARCH64_OPND_PAIRREG, called AARCH64_OPND_PAIRREG_OR_XZR.
It is used in defining `sysp' and `tlbip' and allows
`operand_general_constraint_met_p' to allow the pair to both take on
the value of xzr.
Given the introduction of the new Armv9.4-a `sysp' insn using the
following syntax:
sysp #<op1>, <Cn>, <Cm>, #<op2>{, <Xt1>, <Xt2>}
and by extension the need to encode 6 assembly operands, extend
Binutils to handle instructions taking 6 operands, up from a previous
maximum of 5.
Indicating the presence of the Armv9.4-a features concerning 128-bit
Page Table Descriptors, 128-bit System Registers and Instructions,
the "+d128" architectural extension flag is added to the list of
possible -march options in Binutils, together with the necessary macro
for encoding d128 instructions.
Add support for compiling build tools with various -Werror settings.
Since the tools don't compile cleanly with the same set of flags as
the rest of the sim code, we need to maintain & test a separate list.
Only bother when not cross-compiling so we don't have to test all the
flags against the build compiler. This should be good enough for our
actual development flows.
Leave the igen code in place as it's meant to be used with newer
(to-be-written) code ported from the ppc version.
The sh code isn't really necessary as the opcodes enums have been
maintained independently from here, and the lists are out-of-sync
already.
Now that the DWARF reader does not use parallel_for_each, we can
remove some of the features that were added just for it: return values
and task sizing.
The thread_pool typed tasks feature could also be removed, but I
haven't done so here. This one seemed less intrusive and perhaps more
likely to be needed at some point.
The previous patches are nearly enough to enable background DWARF
reading. However, this hack in language_defn::get_symbol_name_matcher
causes an early computation of current_language:
/* If currently in Ada mode, and the lookup name is wrapped in
'<...>', hijack all symbol name comparisons using the Ada
matcher, which handles the verbatim matching. */
if (current_language->la_language == language_ada
&& lookup_name.ada ().verbatim_p ())
return current_language->get_symbol_name_matcher_inner (lookup_name);
I considered various options here -- reversing the order of the
checks, or promoting the verbatim mode to not be a purely Ada feature
-- but in the end found that the few calls to this during startup
could be handled more directly.
In the JIT code, and in create_exception_master_breakpoint_hook, gdb
is really looking for a certain kind of symbol (text or data) using a
linkage name. Changing the lookup here is clearer and probably more
efficient as well.
In create_std_terminate_master_breakpoint, the lookup can't really be
done by linkage name (it would require relying on a certain mangling
scheme, and also may trip over versioned symbols) -- but we know that
this spot is C++-specific, and so the language ought to be temporarily
set to C++ here.
After this patch, the "file" case is much faster:
(gdb) file /tmp/gdb
2023-10-23 13:16:54.456 - command started
Reading symbols from /tmp/gdb...
2023-10-23 13:16:54.520 - command finished
Command execution time: 0.225906 (cpu), 0.064313 (wall)
lookup_minimal_symbol_text always loops over all objfiles, even when
an objfile is passed in as an argument. This patch changes the
function to loop over the minimal number of objfiles in the latter
situation.
When gdb starts up with a symbol file, it uses the program's "main" to
decide the "static context" and the initial language. With background
DWARF reading, this means that gdb has to wait for a significant
amount of DWARF to be read synchronously.
This patch introduces lazy language setting. The idea here is that in
many cases, the prompt can show up early, making gdb feel more
responsive.
This changes the 'current_language' global to be a macro that wraps a
function call. This change will let a subsequent patch introduce lazy
language setting.
quick_symbol_functions::read_partial_symbols is no longer implemented,
so both it and quick_symbol_functions::can_lazily_read_symbols can be
removed. This allows for other functions to be removed as well.
Note that SYMFILE_NO_READ is now pretty much dead. I haven't removed
it here -- but could if that's desirable. I tend to think that this
functionality would be better implemented in the core; but whenever I
dive into the non-DWARF readers it is pretty depressing.
dwarf2_has_info and dwarf2_initialize_objfile are only separate
because the DWARF reader implemented lazy psymtab reading. However,
now that this is gone, we can simplify the public DWARF API again.
This patch rearranges the DWARF reader so that more work is done in
the background. This is PR symtab/29942.
The idea here is that there is only a small amount of work that must
be done on the main thread when scanning DWARF -- before the main
scan, the only part is mapping the section data.
Currently, the DWARF reader uses the quick_symbol_functions "lazy"
functionality to defer even starting to read. This patch instead
changes the reader to start reading immediately, but doing more in
worker tasks.
Before this patch, "file" on my machine:
(gdb) file /tmp/gdb
2023-10-23 12:29:56.885 - command started
Reading symbols from /tmp/gdb...
2023-10-23 12:29:58.047 - command finished
Command execution time: 5.867228 (cpu), 1.162444 (wall)
After the patch, more work is done in the background and so this takes
a bit less time:
(gdb) file /tmp/gdb
2023-10-23 13:25:51.391 - command started
Reading symbols from /tmp/gdb...
2023-10-23 13:25:51.712 - command finished
Command execution time: 1.894500 (cpu), 0.320306 (wall)
I think this could be further sped up by using the shared library load
map to avoid objfile loops like the one in expand_symtab_containing_pc
-- it seems like the correct objfile could be chosen more directly.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29942
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30174
This changes the cooked index code to wait for threads in its
public-facing API. That is, the waits are done in cooked_index now,
and never in the cooked_index_shard. Centralizing this decision makes
it easier to wait for other events here as well.
For testing, it's sometimes convenient to be able to request that
DWARF reading be done synchronously. This patch adds a new "maint"
setting for this purpose.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
This adds a new compute_main_name method to quick_symbol_functions.
Currently there are no implementations of this, but a subsequent patch
will add one.
When DWARF reading is done in the background,
read_addrmap_from_aranges will be called from a worker thread.
Because warnings can't be emitted from these threads, this patch adds
a new deferred_warnings parameter to the function, letting the caller
control exactly how the warnings are emitted.
This patch changes the way complaint works in a background thread.
The new approach requires installing a complaint interceptor in each
worker, and then the resulting complaints are treated as one of the
results of the computation. This change is needed for a subsequent
patch, where installing a complaint interceptor around a parallel-for
is no longer a viable approach.
This changes gdb to ensure that gdb's BFD cache is guarded by a lock.
This avoids any races when multiple threads might open a BFD (and thus
use the BFD cache) at the same time.
Currently, this change is not needed because the the main thread waits
for some DWARF scanning to be completed before returning. The only
locking that's required is when opening DWO files, and there's a local
lock to this end in dwarf2/read.c.
However, in the coming patches, the DWARF reader will begin its work
earlier, in the background. This means there is the potential for the
DWARF reader and other code on the main thread to both attempt to open
BFDs at the same time.
This adds a couple of calls to bfd_cache_close at points where a BFD
isn't actively needed by gdb. Normally at these points, all the
needed section data is already mapped, so we can simply close the file
descriptor. This is harmless at worst, because if this is needed
after all, the BFD file descriptor cache will reopen it.
This changes the DWZ code to pre-read the section data and somewhat
simplify the DWZ API. This makes it easier to add the bfd_cache_close
call to the new dwarf2_read_dwz_file function -- after this is done,
there shouldn't be a reason to keep the BFD's file descriptor open.
The DWO code in the DWARF reader currently uses objfile::intern. This
accesses a shared data structure and so would be racy when used from
multiple threads. I don't believe this can happen right now, but
background reading could provoke this, and in any case it's better to
avoid this, just to be sure.
This patch changes this code to just use a std::string. A new type is
introduced to do hash table lookups, to avoid unnecessary copies.
This logic dates back to the original import, and seems to be for
handling systems where `rm -f` (i.e. no files) would error out.
None of that is relevant for us with current automake, so drop it.
Some compilers don't understand the semctl API and think it's an input
argument even when it's used as an output, and then complains that it
is being used uninitialized. Zero it out explicitly to workaround it.
This adds some runtime overhead, but should be fairly minor as it's a
small stack buffer, and shouldn't be that relevant relative to all the
other logic in these functions.
The cgen code uses DI as int64_t and UDI as uint64_t. The DI macros
are used to construct 64-bit values from 32-bit values (for the low
and high parts). The MAKEDI macro casts the high 32-bit value to a
signed 32-bit value before shifting. If this created a negative
value, this would be undefined behavior according to the C standard.
All we care about is shifting the 32-bits as they are to the high
32-bits, not caring about sign extension (since there's nothing left
to shift into), and the low 32-bits being empty. This is what we
get from shifting an unsigned value, so cast it to unsigned 32-bit
to avoid undefined behavior.
While we're here, change the SETLODI macro to truncate the lower
value to 32-bits before we set it. If it was passing in a 64-bit
value, those high bits would get included too, and that's not what
we want.
Similarly, tweak the SETHIDI macro to cast the value to an unsigned
64-bit instead of a signed 64-bit. If the value was only 32-bits,
the behavior would be the same. If it happened to be signed 64-bit,
it would trigger the undefined behavior too.
This patch adds linker support to patch R_BPF_64_NODYLD32 relocations.
The implementation was based on comments and code in LLVM, as the GNU
toolchain does not uses this relocation type.
Currently, only mipsisa32-linux and mipsisa32el-linux is marked
as addr32, which make mipsisa32rN(el) not marked.
This change can fix 2 test failures on mipsisa32rN(el)-linux:
FAIL: MIPS MIPS64 MIPS-3D ASE instructions (-mips3d flag)
FAIL: MIPS MIPS64 MDMX ASE instructions (-mdmx flag)
These failures don't happen for mipsisa32rN-mti-elf etc,
due to that, the output is set as NO_ABI instead of O32, then
gas won't warn:
`fp=64' used with a 32-bit ABI
Maybe, we should change this behaivour in future.
This patch adds AArch32 support for -march=armv8.9-a and
-march=armv9.4-a. The behaviour of the new options can be
expressed using a combination of existing feature flags
and tables.
The cpu_arch_ver entries for ARM_ARCH_V9_4A and ARM_ARCH_V8_9A
are technically redundant but it including them for macro code
consistency across architectures.