This patch adds the library helpers for multiplication, division + modulo
and casts from and to floating point (both binary and decimal).
As described in the intro, the first step is try to reduce further the
passed in precision by skipping over most significant limbs with just zeros
or sign bit copies. For multiplication and division I've implemented
a simple algorithm, using something smarter like Karatsuba or Toom N-Way
might be faster for very large _BitInts (which we don't support right now
anyway), but could mean more code in libgcc, which maybe isn't what people
are willing to accept.
For the to/from floating point conversions the patch uses soft-fp, because
it already has tons of handy macros which can be used for that. In theory
it could be implemented using {,unsigned} long long or {,unsigned} __int128
to/from floating point conversions with some frexp before/after, but at that
point we already need to force it into integer registers and analyze it
anyway. Plus, for 32-bit arches there is no __int128 that could be used
for XF/TF mode stuff.
I know that soft-fp is owned by glibc and I think the op-common.h change
should be propagated there, but the bitint stuff is really GCC specific
and IMHO doesn't belong into the glibc copy.
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
libgcc/
* config/aarch64/t-softfp (softfp_extras): Use += rather than :=.
* config/i386/64/t-softfp (softfp_extras): Likewise.
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Export _BitInt support
routines.
* config/i386/t-softfp (softfp_extras): Add fixxfbitint and
bf, hf and xf mode floatbitint.
(CFLAGS-floatbitintbf.c, CFLAGS-floatbitinthf.c): Add -msse2.
* config/riscv/t-softfp32 (softfp_extras): Use += rather than :=.
* config/rs6000/t-e500v1-fp (softfp_extras): Likewise.
* config/rs6000/t-e500v2-fp (softfp_extras): Likewise.
* config/t-softfp (softfp_floatbitint_funcs): New.
(softfp_bid_list): New.
(softfp_func_list): Add sf and df mode from and to _BitInt libcalls.
(softfp_bid_file_list): New.
(LIB2ADD_ST): Add $(softfp_bid_file_list).
* config/t-softfp-sfdftf (softfp_extras): Add fixtfbitint and
floatbitinttf.
* config/t-softfp-tf (softfp_extras): Likewise.
* libgcc2.c (bitint_reduce_prec): New inline function.
(BITINT_INC, BITINT_END): Define.
(bitint_mul_1, bitint_addmul_1): New helper functions.
(__mulbitint3): New function.
(bitint_negate, bitint_submul_1): New helper functions.
(__divmodbitint4): New function.
* libgcc2.h (LIBGCC2_UNITS_PER_WORD): When building _BitInt support
libcalls, redefine depending on __LIBGCC_BITINT_LIMB_WIDTH__.
(__mulbitint3, __divmodbitint4): Declare.
* libgcc-std.ver.in (GCC_14.0.0): Export _BitInt support routines.
* Makefile.in (lib2funcs): Add _mulbitint3.
(LIB2_DIVMOD_FUNCS): Add _divmodbitint4.
* soft-fp/bitint.h: New file.
* soft-fp/fixdfbitint.c: New file.
* soft-fp/fixsfbitint.c: New file.
* soft-fp/fixtfbitint.c: New file.
* soft-fp/fixxfbitint.c: New file.
* soft-fp/floatbitintbf.c: New file.
* soft-fp/floatbitintdf.c: New file.
* soft-fp/floatbitinthf.c: New file.
* soft-fp/floatbitintsf.c: New file.
* soft-fp/floatbitinttf.c: New file.
* soft-fp/floatbitintxf.c: New file.
* soft-fp/op-common.h (_FP_FROM_INT): Add support for rsize up to
4 * _FP_W_TYPE_SIZE rather than just 2 * _FP_W_TYPE_SIZE.
* soft-fp/bitintpow10.c: New file.
* soft-fp/fixsdbitint.c: New file.
* soft-fp/fixddbitint.c: New file.
* soft-fp/fixtdbitint.c: New file.
* soft-fp/floatbitintsd.c: New file.
* soft-fp/floatbitintdd.c: New file.
* soft-fp/floatbitinttd.c: New file.
The following patch adds a header with generated helper tables to support
computation of powers of 10 from 10^0 to 10^6111 inclusive into a
sufficiently large array of _BitInt limbs. This is split from the rest
of the libgcc _BitInt support because it is quite large and together it
would run into gcc-patches mail length limits.
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
libgcc/
* soft-fp/bitintpow10.h: New file.
Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110956
Rainer Orth successfully tested the patch on Solaris with a full bootstrap.
Some uncommon unwinding table encodings need to access the base pointer
for address computations. We do not have that information in calls to
__deregister_frame_info_bases, and previously simply used nullptr as
base pointer. That is usually fine, but for some Solaris i386 shared
libraries that results in wrong address computations.
To fix this problem we now associate the unwinding object with
the table pointer itself, which is always known, in addition to
the PC range. When deregistering a frame, we first locate the object
using the table pointer, and then use the base pointer stored within
the object to compute the PC range.
libgcc/ChangeLog:
PR libgcc/110956
* unwind-dw2-fde.c: Associate object with address of unwinding
table.
As discussed previously, a.out support is now quite deprecated, and in
some cases removed, in both Binutils itself and NetBSD, so this legacy
default makes little sense. `netbsdelf*` and `netbsdaout*` still work
allowing the user to be explicit about there choice. Additionally, the
configure script warns about the change as Nick Clifton requested.
One possible concern was the status of NetBSD on NS32K, where only a.out
was supported. But per [1] NetBSD has removed support, and if it were to
come back, it would be with ELF. The binutils implementation is
therefore marked obsolete, per the instructions in the last message.
With that patch and this one applied, I have confirmed the following:
--target=i686-unknown-netbsd
--target=i686-unknown-netbsdelf
builds completely
--target=i686-unknown-netbsdaout
properly fails because target is deprecated.
--target=vax-unknown-netbsdaout builds completely except for gas, where
the target is deprecated.
[1]: https://mail-index.netbsd.org/tech-toolchain/2021/07/19/msg004025.html
config/ChangeLog:
* picflag.m4: Simplify SHmedia NetBSD match by presuming ELF.
gcc/ChangeLog:
* configure: Regenerate.
libada/ChangeLog:
* configure: Regenerate.
libgcc/ChangeLog:
* configure: Regenerate.
libiberty/ChangeLog:
* configure: Regenerate.
Apparently some distros have a nagging egrep that helpfully tells you
egrep is deprecated and to use "grep -E". The nag message causes a ld
testsuite failure. What's more the advice isn't that good. The "-E"
flag may not be available with older versions of grep.
This patch fixes bare invocation of egrep within binutils, replacing
it with the autoconf $EGREP or with grep.
config/ChangeLog:
* lib-ld.m4 (AC_LIB_PROG_LD_GNU): Require AC_PROG_EGREP and
invoke $EGREP.
(AC_LIB_PROG_LD): Likewise.
gcc/ChangeLog:
* configure: Regenerate.
intl/ChangeLog:
* configure: Regenerate.
libcpp/ChangeLog:
* configure: Regenerate.
libgcc/ChangeLog:
* configure: Regenerate.
libstdc++-v3/ChangeLog:
* configure: Regenerate.
The problem -fasynchronous-unwind-tables is on by default for riscv linux
We need turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
to .eh_frame data from crtbeginT.o instead of the user-defined object
during static linking.
This turns it off.
OK?
libgcc/ChangeLog:
* config.host (riscv*-*-linux*): Add t-crtstuff to tmake_file.
(riscv*-*-freebsd*): Likewise.
* config/riscv/t-crtstuff: New file.
Enable _Float16 and __bf16 all the time but issue errors when the
types are used in conversion, unary operation, binary operation,
parameter passing or value return when TARGET_SSE2 is not available.
Also undef macros which are used by libgcc/libstdc++ to check the
backend support of the _Float16/__bf16 types when TARGET_SSE2 is not
available.
gcc/ChangeLog:
PR target/109504
* config/i386/i386-builtins.cc
(ix86_register_float16_builtin_type): Remove TARGET_SSE2.
(ix86_register_bf16_builtin_type): Ditto.
* config/i386/i386-c.cc (ix86_target_macros): When TARGET_SSE2
isn't available, undef the macros which are used to check the
backend support of the _Float16/__bf16 types when building
libstdc++ and libgcc.
* config/i386/i386.cc (construct_container): Issue errors for
HFmode/BFmode when TARGET_SSE2 is not available.
(function_value_32): Ditto.
(ix86_scalar_mode_supported_p): Remove TARGET_SSE2 for HFmode/BFmode.
(ix86_libgcc_floating_mode_supported_p): Ditto.
(ix86_emit_support_tinfos): Adjust codes.
(ix86_invalid_conversion): Return diagnostic message string
when there's conversion from/to BF/HFmode w/o TARGET_SSE2.
(ix86_invalid_unary_op): New function.
(ix86_invalid_binary_op): Ditto.
(TARGET_INVALID_UNARY_OP): Define.
(TARGET_INVALID_BINARY_OP): Define.
* config/i386/immintrin.h [__SSE2__]: Remove for fp16/bf16
related instrinsics header files.
* config/i386/i386.h (VALID_SSE2_TYPE_MODE): New macro.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr109504.c: New test.
* gcc.target/i386/sse2-bfloat16-1.c: Adjust error info.
* gcc.target/i386/sse2-float16-1.c: Ditto.
* gcc.target/i386/sse2-float16-4.c: New test.
* gcc.target/i386/sse2-float16-5.c: New test.
* g++.target/i386/float16-1.C: Adjust error info.
libgcc/ChangeLog:
* config/i386/t-softfp: Add -msse2 to libbid HFtype related
files.
Fixes commit r14-1614-g49310a99330849 ("libgcc: Fix eh_frame fast path
in find_fde_tail").
libgcc/
PR libgcc/110179
* unwind-dw2-fde-dip.c (find_fde_tail): Add cast to avoid
implicit conversion of pointer value to integer.
Zfinx has provide fcsr like F, so rouding mode should use fcsr instead
of `soft` fenv.
libgcc/ChangeLog:
* config/riscv/sfp-machine.h (FP_INIT_ROUNDMODE): Check zfinx.
(FP_HANDLE_EXCEPTIONS): Ditto.
Also divmod, but only for scalar modes, for now (because there are no complex
int vectors yet).
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_expand_divmod_libfunc): New function.
(gcn_init_libfuncs): Add div and mod functions for all modes.
Add placeholders for divmod functions.
(TARGET_EXPAND_DIVMOD_LIBFUNC): Define.
libgcc/ChangeLog:
* config/gcn/lib2-divmod-di.c: Reimplement like lib2-divmod.c.
* config/gcn/lib2-divmod.c: Likewise.
* config/gcn/lib2-gcn.h: Add new types and prototypes for all the
new vector libfuncs.
* config/gcn/t-amdgcn: Add new files.
* config/gcn/amdgcn_veclib.h: New file.
* config/gcn/lib2-vec_divmod-di.c: New file.
* config/gcn/lib2-vec_divmod-hi.c: New file.
* config/gcn/lib2-vec_divmod-qi.c: New file.
* config/gcn/lib2-vec_divmod.c: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/predcom-2.c: Avoid vectors on amdgcn.
* gcc.dg/unroll-8.c: Likewise.
* gcc.dg/vect/slp-26.c: Change expected results on amdgdn.
* lib/target-supports.exp
(check_effective_target_vect_int_mod): Add amdgcn.
(check_effective_target_divmod): Likewise.
* gcc.target/gcn/simd-math-3-16.c: New test.
* gcc.target/gcn/simd-math-3-2.c: New test.
* gcc.target/gcn/simd-math-3-32.c: New test.
* gcc.target/gcn/simd-math-3-4.c: New test.
* gcc.target/gcn/simd-math-3-8.c: New test.
* gcc.target/gcn/simd-math-3-char-16.c: New test.
* gcc.target/gcn/simd-math-3-char-2.c: New test.
* gcc.target/gcn/simd-math-3-char-32.c: New test.
* gcc.target/gcn/simd-math-3-char-4.c: New test.
* gcc.target/gcn/simd-math-3-char-8.c: New test.
* gcc.target/gcn/simd-math-3-char-run-16.c: New test.
* gcc.target/gcn/simd-math-3-char-run-2.c: New test.
* gcc.target/gcn/simd-math-3-char-run-32.c: New test.
* gcc.target/gcn/simd-math-3-char-run-4.c: New test.
* gcc.target/gcn/simd-math-3-char-run-8.c: New test.
* gcc.target/gcn/simd-math-3-char-run.c: New test.
* gcc.target/gcn/simd-math-3-char.c: New test.
* gcc.target/gcn/simd-math-3-long-16.c: New test.
* gcc.target/gcn/simd-math-3-long-2.c: New test.
* gcc.target/gcn/simd-math-3-long-32.c: New test.
* gcc.target/gcn/simd-math-3-long-4.c: New test.
* gcc.target/gcn/simd-math-3-long-8.c: New test.
* gcc.target/gcn/simd-math-3-long-run-16.c: New test.
* gcc.target/gcn/simd-math-3-long-run-2.c: New test.
* gcc.target/gcn/simd-math-3-long-run-32.c: New test.
* gcc.target/gcn/simd-math-3-long-run-4.c: New test.
* gcc.target/gcn/simd-math-3-long-run-8.c: New test.
* gcc.target/gcn/simd-math-3-long-run.c: New test.
* gcc.target/gcn/simd-math-3-long.c: New test.
* gcc.target/gcn/simd-math-3-run-16.c: New test.
* gcc.target/gcn/simd-math-3-run-2.c: New test.
* gcc.target/gcn/simd-math-3-run-32.c: New test.
* gcc.target/gcn/simd-math-3-run-4.c: New test.
* gcc.target/gcn/simd-math-3-run-8.c: New test.
* gcc.target/gcn/simd-math-3-run.c: New test.
* gcc.target/gcn/simd-math-3-short-16.c: New test.
* gcc.target/gcn/simd-math-3-short-2.c: New test.
* gcc.target/gcn/simd-math-3-short-32.c: New test.
* gcc.target/gcn/simd-math-3-short-4.c: New test.
* gcc.target/gcn/simd-math-3-short-8.c: New test.
* gcc.target/gcn/simd-math-3-short-run-16.c: New test.
* gcc.target/gcn/simd-math-3-short-run-2.c: New test.
* gcc.target/gcn/simd-math-3-short-run-32.c: New test.
* gcc.target/gcn/simd-math-3-short-run-4.c: New test.
* gcc.target/gcn/simd-math-3-short-run-8.c: New test.
* gcc.target/gcn/simd-math-3-short-run.c: New test.
* gcc.target/gcn/simd-math-3-short.c: New test.
* gcc.target/gcn/simd-math-3.c: New test.
* gcc.target/gcn/simd-math-4-char-run.c: New test.
* gcc.target/gcn/simd-math-4-char.c: New test.
* gcc.target/gcn/simd-math-4-long-run.c: New test.
* gcc.target/gcn/simd-math-4-long.c: New test.
* gcc.target/gcn/simd-math-4-run.c: New test.
* gcc.target/gcn/simd-math-4-short-run.c: New test.
* gcc.target/gcn/simd-math-4-short.c: New test.
* gcc.target/gcn/simd-math-4.c: New test.
* gcc.target/gcn/simd-math-5-16.c: New test.
* gcc.target/gcn/simd-math-5-32.c: New test.
* gcc.target/gcn/simd-math-5-4.c: New test.
* gcc.target/gcn/simd-math-5-8.c: New test.
* gcc.target/gcn/simd-math-5-char-16.c: New test.
* gcc.target/gcn/simd-math-5-char-32.c: New test.
* gcc.target/gcn/simd-math-5-char-4.c: New test.
* gcc.target/gcn/simd-math-5-char-8.c: New test.
* gcc.target/gcn/simd-math-5-char-run-16.c: New test.
* gcc.target/gcn/simd-math-5-char-run-32.c: New test.
* gcc.target/gcn/simd-math-5-char-run-4.c: New test.
* gcc.target/gcn/simd-math-5-char-run-8.c: New test.
* gcc.target/gcn/simd-math-5-char-run.c: New test.
* gcc.target/gcn/simd-math-5-char.c: New test.
* gcc.target/gcn/simd-math-5-long-16.c: New test.
* gcc.target/gcn/simd-math-5-long-32.c: New test.
* gcc.target/gcn/simd-math-5-long-4.c: New test.
* gcc.target/gcn/simd-math-5-long-8.c: New test.
* gcc.target/gcn/simd-math-5-long-run-16.c: New test.
* gcc.target/gcn/simd-math-5-long-run-32.c: New test.
* gcc.target/gcn/simd-math-5-long-run-4.c: New test.
* gcc.target/gcn/simd-math-5-long-run-8.c: New test.
* gcc.target/gcn/simd-math-5-long-run.c: New test.
* gcc.target/gcn/simd-math-5-long.c: New test.
* gcc.target/gcn/simd-math-5-run-16.c: New test.
* gcc.target/gcn/simd-math-5-run-32.c: New test.
* gcc.target/gcn/simd-math-5-run-4.c: New test.
* gcc.target/gcn/simd-math-5-run-8.c: New test.
* gcc.target/gcn/simd-math-5-run.c: New test.
* gcc.target/gcn/simd-math-5-short-16.c: New test.
* gcc.target/gcn/simd-math-5-short-32.c: New test.
* gcc.target/gcn/simd-math-5-short-4.c: New test.
* gcc.target/gcn/simd-math-5-short-8.c: New test.
* gcc.target/gcn/simd-math-5-short-run-16.c: New test.
* gcc.target/gcn/simd-math-5-short-run-32.c: New test.
* gcc.target/gcn/simd-math-5-short-run-4.c: New test.
* gcc.target/gcn/simd-math-5-short-run-8.c: New test.
* gcc.target/gcn/simd-math-5-short-run.c: New test.
* gcc.target/gcn/simd-math-5-short.c: New test.
* gcc.target/gcn/simd-math-5.c: New test.
The HImode libfuncs weren't called and trying to enable them fails because
TARGET_PROMOTE_FUNCTION_MODE wants to widen the arguments but the signedness
isn't known.
libgcc/ChangeLog:
* config/gcn/lib2-gcn.h (QItype, UQItype, HItype, UHItype): Delete.
(__divhi3, __modhi3, __udivhi3, __umodhi3): Delete.
* config/gcn/t-amdgcn: Don't build lib2-divmod-hi.c.
* config/gcn/lib2-divmod-hi.c: Removed.
The eh_frame value is only used by linear_search_fdes, not the binary
search directly in find_fde_tail, so the bug is not immediately
apparent with most programs.
Fixes commit e724b0480b ("libgcc:
Special-case BFD ld unwind table encodings in find_fde_tail").
libgcc/
PR libgcc/109712
* unwind-dw2-fde-dip.c (find_fde_tail): Correct fast path for
parsing eh_frame.
One of my workmates found there is a warning like:
libgcc/config/rs6000/morestack.S:402: Warning: ignoring
incorrect section type for .init_array.00000
when compiling libgcc/config/rs6000/morestack.S.
Since commit r13-6545 touched that file recently, which was
suspected to be responsible for this warning, I did some
investigation and found this is a warning staying for a long
time. For section .init_stack*, it's preferred to use
section type SHT_INIT_ARRAY. So this patch is use
"@init_array" to replace "@progbits".
Although the warning is trivial, Segher suggested me to
post this to fix it, in order to avoid any possible
misunderstanding/confusion on the warning.
As Alan confirmed, this doesn't require a premise check
on if the existing binutils supports "@init_array" or not,
"because if you want split-stack to work, you must link
with gold, any version of binutils that has gold has an
assembler that understands @init_array". (Thanks Alan!)
libgcc/ChangeLog:
* config/i386/morestack.S: Use @init_array rather than
@progbits for section type of section .init_array.
* config/rs6000/morestack.S: Likewise.
* config/s390/morestack.S: Likewise.
speculation_barrier for MIPS needs sync+jr.hb (r2+),
so we implement __speculation_barrier in libgcc, like arm32 does.
gcc/ChangeLog:
* config/mips/mips-protos.h (mips_emit_speculation_barrier): New
prototype.
* config/mips/mips.cc (speculation_barrier_libfunc): New static
variable.
(mips_init_libfuncs): Initialize it.
(mips_emit_speculation_barrier): New function.
* config/mips/mips.md (speculation_barrier): Call
mips_emit_speculation_barrier.
libgcc/ChangeLog:
* config/mips/lib1funcs.S: New file.
define __speculation_barrier and include mips16.S.
* config/mips/t-mips: define LIB1ASMSRC as mips/lib1funcs.S.
define LIB1ASMFUNCS as _speculation_barrier.
set version info for __speculation_barrier.
* config/mips/libgcc-mips.ver: New file.
* config/mips/t-mips16: don't define LIB1ASMSRC as mips16.S
included in lib1funcs.S now.
The radix sort uses two buffers, a1 for input and a2 for output.
After every digit the role of the two buffers is swapped.
When terminating the sort early the code made sure the output
was in a2. However, when we run out of bits, as can happen on
32bit platforms, the sorted result was in a1, as we had just
swapped a1 and a2.
This patch fixes the problem by unconditionally having a1 as
output after every loop iteration.
This bug manifested itself only on 32bit platforms and even then
only in some circumstances, as it needs frames where a swap
is required due to differences in the top-most byte, which is
affected by ASLR. The new logic was validated by exhaustive
search over 32bit input values.
libgcc/ChangeLog:
PR libgcc/109670
* unwind-dw2-fde.c: Fix radix sort buffer management.
The atomic fastpath bypasses the code that releases the sort
array which was lazily allocated during unwinding. We now
check after deregistering if there is an array to free.
libgcc/ChangeLog:
PR libgcc/109685
* unwind-dw2-fde.c: Free sort array in atomic fast path.
Tools from later versions of the OS deprecate or fail to support
earlier OS revisions.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:
* config.host: Arrange to set min Darwin OS versions from
the configured host version.
* config/darwin10-unwind-find-enc-func.c: Do not use current
headers, but declare the nexessary structures locally to the
versions in use for Mac OSX 10.6.
* config/t-darwin: Amend to handle configured min OS
versions.
* config/t-darwin-min-1: New.
* config/t-darwin-min-5: New.
* config/t-darwin-min-8: New.
The non-atomic path does not have range information,
we have to adjust the assert handle that case, too.
libgcc/ChangeLog:
* unwind-dw2-fde.c: Fix assert in non-atomic path.
The assertion in __deregister_frame_info_bases assumes that for every
frame something was inserted into the lookup data structure by
__register_frame_info_bases. Unfortunately, this does not necessarily
hold true as the btree_insert call in __register_frame_info_bases will
not insert anything for empty ranges. Therefore, we need to explicitly
account for such empty ranges in the assertion as `ob` will be a null
pointer for such ranges, hence causing the assertion to fail.
Signed-off-by: Sören Tempel <soeren@soeren-tempel.net>
libgcc/ChangeLog:
* unwind-dw2-fde.c: Accept empty ranges when deregistering frames.
Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
libgcc/ChangeLog:
* config/riscv/atomic.c: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
This patch aligns the configuration to the actual PRU capabilities. It
also reduces the size of the affected libgcc functions.
For a real-world project using integer arithmetics the savings
are significant:
Before:
text data bss dec hex filename
3688 865 544 5097 13e9 hc-sr04-range-sensor.elf
With TARGET_HAS_NO_HW_DIVIDE defined:
text data bss dec hex filename
2824 865 544 4233 1089 hc-sr04-range-sensor.elf
Execution speed also appears to have improved. The moddi3 function is
now executed in half the CPU cycles.
libgcc/ChangeLog:
* config/pru/t-pru (HOST_LIBGCC2_CFLAGS): Add
-DTARGET_HAS_NO_HW_DIVIDE.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
With this, execution time for e.g. __moddi3 go from 59 to 40 cycles in
the "fast" case or from 290 to 200 cycles in the "slow" case (when the
!TARGET_HAS_NO_HW_DIVIDE variant calls division and modulus functions
for 32-bit SImode), as exposed by gcc.c-torture/execute/arith-rand-ll.c
compiled for -march=v10.
Unfortunately, it just puts a performance improvement "dent" of 0.07%
in a arith-rand-ll.c-based performance test - where all loops are also
reduced to 1/10.
The size of every affected libgcc function is reduced to less than
half and they are all now leaf functions.
* config/cris/t-cris (HOST_LIBGCC2_CFLAGS): Add
-DTARGET_HAS_NO_HW_DIVIDE.
RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.
This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.
gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
PR target/104338
* config/riscv/riscv-protos.h: Add helper function stubs.
* config/riscv/riscv.cc: Add helper functions for subword masking.
* config/riscv/riscv.opt: Add command-line flag.
* config/riscv/sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* doc/invoke.texi: Add blurb regarding command-line flag.
libgcc/ChangeLog:
PR target/104338
* config/riscv/atomic.c: Add reference to duplicate logic.
gcc/testsuite/ChangeLog:
PR target/104338
* gcc.target/riscv/inline-atomics-1.c: New test.
* gcc.target/riscv/inline-atomics-2.c: New test.
* gcc.target/riscv/inline-atomics-3.c: New test.
* gcc.target/riscv/inline-atomics-4.c: New test.
* gcc.target/riscv/inline-atomics-5.c: New test.
* gcc.target/riscv/inline-atomics-6.c: New test.
* gcc.target/riscv/inline-atomics-7.c: New test.
* gcc.target/riscv/inline-atomics-8.c: New test.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
muldi3 will deallocate stack space after the call to __save_r26_r31,
then re-allocate the space a short while later. If an interrupt
occurs in that window, it can clobber items on the stack.
PR target/109402
libgcc/
* config/v850/lib1funcs.S (___muldi3): Remove unnecessary
stack manipulations.
The millicode division and remainder routines trap division by zero.
The unwinder needs these directives to unwind divide by zero traps.
2023-04-05 John David Anglin <danglin@gcc.gnu.org>
libgcc/ChangeLog:
PR target/109374
* config/pa/milli64.S (RETURN_COLUMN): Define.
($$divI): Add CFI directives.
($$divU): Likewise.
($$remI): Likewise.
($$remU): Likewise.
We should always carry the exceptions forward. This bug was found when
working on testing glibc math tests, many tests were failing with
Overflow and Underflow flags not set. This was traced to here.
libgcc/ChangeLog:
* config/or1k/sfp-machine.h (FP_HANDLE_EXCEPTIONS): Remove
statement clearing existing exceptions.
x86_64/i686 has for a few months working std::bfloat16_t support, __bf16
there is no longer a storage only type, but can be used for arithmetics
and is supported in libgcc and libstdc++.
The following patch adds similar support for AArch64.
Unlike the x86 changes, this one keeps the old __bf16 mangling of
u6__bf16 rather than DF16b (so an exception from Itanium ABI), but
otherwise __bf16 and decltype (0.0bf16) are the same type and both
in C++ act as extended floating-point type.
2023-03-13 Jakub Jelinek <jakub@redhat.com>
gcc/
* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
(aarch64_bf16_ptr_type_node): Adjust comment.
* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
bfloat16_type_node rather than aarch64_bf16_type_node.
(aarch64_libgcc_floating_mode_supported_p,
aarch64_scalar_mode_supported_p): Also support BFmode.
(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
(aarch64_invalid_binary_op): Remove BFmode related rejections.
(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
aarch64_bf16_type_node.
(aarch64_init_simd_builtin_types): Likewise.
(aarch64_init_bf16_types): Likewise. Don't create bfloat16_type_node,
which is created in tree.cc already.
* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
Don't expect one __bf16 related error.
* gcc.target/aarch64/bfloat16_vector_typecheck_1.c: Adjust or remove
dg-error directives for __bf16 being an extended arithmetic type.
* gcc.target/aarch64/bfloat16_vector_typecheck_2.c: Likewise.
* gcc.target/aarch64/bfloat16_scalar_typecheck.c: Likewise.
* g++.target/aarch64/bfloat_cpp_typecheck.C: Don't expect two __bf16
related errors.
libgcc/
* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
(softfp_truncations): Add tfbf dfbf sfbf hfbf.
(softfp_extras): Add floatdibf floatundibf floattibf floatuntibf.
* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
__extendbfsf2 and __trunc{s,d,t,h}fbf2.
* config/aarch64/sfp-machine.h (_FP_NANFRAC_B, _FP_NANSIGN_B): Define.
* soft-fp/floatundibf.c: New file.
* soft-fp/floatdibf.c: New file.
libstdc++-v3/
* config/abi/pre/gnu.ver (CXXABI_1.3.14): Also export __bf16 tinfos
if it isn't mangled as DF16b but u6__bf16.