My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs. The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on. This patch brings the
compile time down to about 0m0.030s.
The ff_fold_immediate flag is unused after this patch but since I'm
using it in the P2564 patch, I'm not removing it now. Maybe at_eof
can be used instead and then we can remove ff_fold_immediate.
PR c++/111660
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Don't
handle it here.
(cp_fold_r): Handle COND_EXPR here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/hog1.C: New test.
* g++.dg/cpp2a/consteval36.C: New test.
Most of this is introducing the abi_check function to reduce the verbosity
of most places that check -fabi-version.
The start_mangling change is to avoid needing to zero-initialize additional
members of the mangling globals, though I'm not actually adding any.
The comment documents existing semantics.
gcc/cp/ChangeLog:
* mangle.cc (abi_check): New.
(write_prefix, write_unqualified_name, write_discriminator)
(write_type, write_member_name, write_expression)
(write_template_arg, write_template_param): Use it.
(start_mangling): Assign from {}.
* cp-tree.h: Update comment.
Update the test to potentially generate two SEXT.W instructions: one for
incoming function arg, other for function return.
But after commit 8eb9cdd142
("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg")
the test is not supposed to generate either of them so fix the expected
assembler output which was errorneously introduced by commit above.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c (foo2): Change return to unsigned
int as that will potentially generate two SEXT.W instructions.
dg-final: Change to scan-assembler-not SEXT.W.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Declaring a function with both external and internal linkage
in the same TU is translation-time UB. Add an error for this
case as already done for objects.
PR c/111708
gcc/c/ChangeLog:
* c-decl.cc (grokdeclarator): Add error.
gcc/testsuite/ChangeLog:
* gcc.dg/pr111708-1.c: New test.
* gcc.dg/pr111708-2.c: New test.
gcc/fortran/ChangeLog:
* intrinsic.texi (signal): Mention that the argument
passed to the signal handler procedure is passed by reference.
Extend example.
This adds the simplification `a & (x | CST)` to a when we know that
`(a & ~CST) == 0`. In a similar fashion as `a & CST` is handle.
I looked into handling `a | (x & CST)` but that I don't see any decent
simplifications happening.
OK? Bootstrapped and tested on x86_linux-gnu with no regressions.
PR tree-optimization/111432
gcc/ChangeLog:
* match.pd (`a & (x | CST)`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bitops-7.c: New test.
This patch makes sure the profile_count information is initialized for the new
bb created in move_sese_region_to_fn.
gcc/ChangeLog:
* tree-cfg.cc (move_sese_region_to_fn): Initialize profile_count for
new basic block.
When having modula-2 enabled in a development tree and there are any
changes that trigger rebuilds in m2/ doing a 'make all-gcc' in the
build directory might fail due to lack of dependency tracking. This
patch introduces build dependencies into gcc/m2/Make-lang.in using -M*
options. The patch also introduces all -M* options to cc1gm2 and gm2.
gcc/m2/ChangeLog:
PR modula2/111756
* Make-lang.in (CM2DEP): New define conditionally set if
($(CXXDEPMODE),depmode=gcc3).
(GM2_1): Use $(CM2DEP).
(m2/gm2-gcc/%.o): Ensure $(@D)/$(DEPDIR) is created.
Add $(CM2DEP) to the $(COMPILER) command and use $(POSTCOMPILE).
(m2/gm2-gcc/m2configure.o): Ditto.
(m2/gm2-lang.o): Ditto.
(m2/m2pp.o): Ditto.
(m2/gm2-gcc/rtegraph.o): Ditto.
(m2/mc-boot/$(SRC_PREFIX)%.o): Ditto.
(m2/mc-boot-ch/$(SRC_PREFIX)%.o): Ditto.
(m2/mc-boot-ch/$(SRC_PREFIX)%.o): Ditto.
(m2/mc-boot/main.o): Ditto.
(mcflex.o): Ditto.
(m2/gm2-libs-boot/M2RTS.o): Ditto.
(m2/gm2-libs-boot/%.o): Ditto.
(m2/gm2-libs-boot/%.o): Ditto.
(m2/gm2-libs-boot/RTcodummy.o): Ditto.
(m2/gm2-libs-boot/RTintdummy.o): Ditto.
(m2/gm2-libs-boot/wrapc.o): Ditto.
(m2/gm2-libs-boot/UnixArgs.o): Ditto.
(m2/gm2-libs-boot/choosetemp.o): Ditto.
(m2/gm2-libs-boot/errno.o): Ditto.
(m2/gm2-libs-boot/dtoa.o): Ditto.
(m2/gm2-libs-boot/ldtoa.o): Ditto.
(m2/gm2-libs-boot/termios.o): Ditto.
(m2/gm2-libs-boot/SysExceptions.o): Ditto.
(m2/gm2-libs-boot/SysStorage.o): Ditto.
(m2/gm2-compiler-boot/M2GCCDeclare.o): Ditto.
(m2/gm2-compiler-boot/M2Error.o): Ditto.
(m2/gm2-compiler-boot/%.o): Ditto.
(m2/gm2-compiler-boot/%.o): Ditto.
(m2/gm2-compiler-boot/m2flex.o): Ditto.
(m2/gm2-compiler/%.o): Ditto.
(m2/gm2-compiler/m2flex.o): Ditto.
(m2/gm2-libs-iso/%.o): Ditto.
(m2/gm2-libs/%.o): Ditto.
(m2/gm2-libs/%.o): Ditto.
(m2/gm2-libs/choosetemp.o): Ditto.
(m2/boot-bin/mklink$(exeext)): Ditto.
(m2/pge-boot/%.o): Ditto.
(m2/pge-boot/%.o): Ditto.
(m2/gm2-compiler/%.o): Ensure $(@D)/$(DEPDIR) is created and use
$(POSTCOMPILE).
(m2/gm2-compiler/%.o): Ditto.
(m2/gm2-libs-iso/%.o): Ditto.
(m2/gm2-libs/%.o): Ditto.
* README: Purge out of date info.
* gm2-compiler/M2Comp.mod (MakeSaveTempsFileNameExt): Import.
(OnExitDelete): Import.
(GetModuleDefImportStatementList): Import.
(GetModuleModImportStatementList): Import.
(GetImportModule): Import.
(IsImportStatement): Import.
(IsImport): Import.
(GetImportStatementList): Import.
(File): Import.
(Close): Import.
(EOF): Import.
(IsNoError): Import.
(WriteLine): Import.
(WriteChar): Import.
(FlushOutErr): Import.
(WriteS): Import.
(OpenToRead): Import.
(OpenToWrite): Import.
(ReadS): Import.
(WriteS): Import.
(GetM): Import.
(GetMM): Import.
(GetDepTarget): Import.
(GetMF): Import.
(GetMP): Import.
(GetObj): Import.
(GetMD): Import.
(GetMMD): Import.
(GenerateDefDependency): New procedure.
(GenerateDependenciesFromImport): New procedure.
(GenerateDependenciesFromList): New procedure.
(GenerateDependencies): New procedure.
(Compile): Re-write.
(compile): Re-format.
(CreateFileStem): New procedure function.
(DoPass0): Re-write.
(IsLibrary): New procedure function.
(IsUnique): New procedure function.
(Append): New procedure.
(MergeDep): New procedure.
(GetRuleTarget): New procedure function.
(ReadDepContents): New procedure function.
(WriteDep): New procedure.
(WritePhonyDep): New procedure.
(WriteDepContents): New procedure.
(CreateDepFilename): New procedure function.
(Pass0CheckDef): New procedure function.
(Pass0CheckMod): New procedure function.
(DoPass0): Re-write.
(DepContent): New variable.
(DepOutput): New variable.
(BaseName): New procedure function.
* gm2-compiler/M2GCCDeclare.mod (PrintTerse): Handle IsImport.
Replace IsGnuAsmVolatile with IsGnuAsm.
* gm2-compiler/M2Options.def (EXPORT QUALIFIED): Remove list.
(SetM): New procedure.
(GetM): New procedure function.
(SetMM): New procedure.
(GetMM): New procedure function.
(SetMF): New procedure.
(GetMF): New procedure function.
(SetPPOnly): New procedure.
(GetB): New procedure function.
(SetMD): New procedure.
(GetMD): New procedure function.
(SetMMD): New procedure.
(GetMMD): New procedure function.
(SetMQ): New procedure.
(SetMT): New procedure.
(GetMT): New procedure function.
(GetDepTarget): New procedure function.
(SetMP): New procedure.
(GetMP): New procedure function.
(SetObj): New procedure.
(SetSaveTempsDir): New procedure.
* gm2-compiler/M2Options.mod (SetM): New procedure.
(GetM): New procedure function.
(SetMM): New procedure.
(GetMM): New procedure function.
(SetMF): New procedure.
(GetMF): New procedure function.
(SetPPOnly): New procedure.
(GetB): New procedure function.
(SetMD): New procedure.
(GetMD): New procedure function.
(SetMMD): New procedure.
(GetMMD): New procedure function.
(SetMQ): New procedure.
(SetMT): New procedure.
(GetMT): New procedure function.
(GetDepTarget): New procedure function.
(SetMP): New procedure.
(GetMP): New procedure function.
(SetObj): New procedure.
(SetSaveTempsDir): New procedure.
* gm2-compiler/M2Preprocess.def (PreprocessModule): New parameters
topSource and outputDep. Re-write.
(MakeSaveTempsFileNameExt): New procedure function.
(OnExitDelete): New procedure function.
* gm2-compiler/M2Preprocess.mod (GetM): Import.
(GetMM): Import.
(OnExitDelete): Add debugging message.
(RemoveFile): Add debugging message.
(BaseName): Remove.
(BuildCommandLineExecute): New procedure function.
* gm2-compiler/M2Search.def (SetDefExtension): Remove unnecessary
spacing.
* gm2-compiler/SymbolTable.mod (GetSymName): Handle ImportSym and
ImportStatementSym.
* gm2-gcc/m2options.h (M2Options_SetMD): New function.
(M2Options_GetMD): New function.
(M2Options_SetMMD): New function.
(M2Options_GetMMD): New function.
(M2Options_SetM): New function.
(M2Options_GetM): New function.
(M2Options_SetMM): New function.
(M2Options_GetMM): New function.
(M2Options_GetMQ): New function.
(M2Options_SetMF): New function.
(M2Options_GetMF): New function.
(M2Options_SetMT): New function.
(M2Options_SetMP): New function.
(M2Options_GetMP): New function.
(M2Options_GetDepTarget): New function.
* gm2-lang.cc (gm2_langhook_init): Correct comment case.
(gm2_langhook_init_options): Add case OPT_M and
OPT_MM.
(gm2_langhook_post_options): Add case OPT_MF, OPT_MT,
OPT_MD and OPT_MMD.
* lang-specs.h (M2CPP): Pass though MF option.
(MDMMD): New define. Add MDMMD to "@modula-2".
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
The following avoids bogously re-using the simd-clone-info we
currently hang off stmt_info from two different SLP contexts where
a different number of lanes should have chosen a different best
simdclone.
PR tree-optimization/111846
* tree-vectorizer.h (_slp_tree::simd_clone_info): Add.
(SLP_TREE_SIMD_CLONE_INFO): New.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
SLP_TREE_SIMD_CLONE_INFO.
(_slp_tree::~_slp_tree): Release it.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
SLP_TREE_SIMD_CLONE_INFO or STMT_VINFO_SIMD_CLONE_INFO
dependent on if we're doing SLP.
* gcc.dg/vect/pr111846.c: New testcase.
The following patch implements printing of wide_int/widest_int numbers
decimally when asked for that using print_dec{,s,u}, even if they have
precision larger than 64 and get_len () above 1 (right now we printed
them hexadecimally and even negative numbers as huge positive hexadecimal).
In order to avoid the expensive division/modulo by 10^19 twice, once to
estimate how many will be needed and another to actually print it, the
patch prints the 19 digit chunks in reverse order (from least significant
to most significant) and then reorders those with linear complexity to form
the right printed number.
Tested with printing both 256 and 320 bit numbers (first as an example
of even number of 19 digit chunks plus one shorter above it, the second
as an example of odd number of 19 digit chunks plus one shorter above it).
The l * HOST_BITS_PER_WIDE_INT / 3 + 3 estimatition thinking about it now
is one byte too much (one byte for -, one for '\0') and too conservative,
so we could go with l * HOST_BITS_PER_WIDE_INT / 3 + 2 as well, or e.g.
l * HOST_BITS_PER_WIDE_INT * 10 / 33 + 3 as even less conservative
estimation (though more expensive to compute in inline code).
But that l * HOST_BITS_PER_WIDE_INT / 4 + 4; is likely one byte too much
as well, 2 bytes for 0x, one byte for '\0' and where does the 4th one come
from? Of course all of these assuming HOST_BITS_PER_WIDE_INT is a multiple
of 64...
2023-10-17 Jakub Jelinek <jakub@redhat.com>
* wide-int-print.h (print_dec_buf_size): For length, divide number
of bits by 3 and add 3 instead of division by 4 and adding 4.
* wide-int-print.cc (print_decs): Remove superfluous ()s. Don't call
print_hex, instead call print_decu on either negated value after
printing - or on wi itself.
(print_decu): Don't call print_hex, instead print even large numbers
decimally.
(pp_wide_int_large): Assume len from print_dec_buf_size is big enough
even if it returns false.
* pretty-print.h (pp_wide_int): Use print_dec_buf_size to check if
pp_wide_int_large should be used.
* tree-pretty-print.cc (dump_generic_node): Use print_hex_buf_size
to compute needed buffer size.
libgcc/config/avr/libf7/
* libf7.h (F7_SIZEOF): New macro.
* libf7-asm.sx: Use F7_SIZEOF instead of magic number "10".
(F7MOD_D_fma_, __fma): New module and function.
(fma) [-mdouble=64]: Define as alias for __fma.
(fmal) [-mlong-double=64]: Define as alias for __fma.
* libf7-common.mk (F7_ASM_PARTS): Add D_fma.
The following addresses a missed DECL_NOT_GIMPLE_REG_P setting of
a volatile declared parameter which causes inlining to substitute
a constant parameter into a context where its address is required.
The main issue is in update_address_taken which clears
DECL_NOT_GIMPLE_REG_P from the parameter but fails to rewrite it
because is_gimple_reg returns false for volatiles. The following
changes maybe_optimize_var to make the 1:1 correspondence between
clearing DECL_NOT_GIMPLE_REG_P of a register typed decl and
actually rewriting it to SSA.
PR middle-end/111818
* tree-ssa.cc (maybe_optimize_var): When clearing
DECL_NOT_GIMPLE_REG_P always rewrite into SSA.
* gcc.dg/torture/pr111818.c: New testcase.
The following addresses build_reconstructed_reference failing to
build references with a different offset than the models and thus
the caller conditional being off. This manifests when attempting
to build a ref with offset 160 from the model BIT_FIELD_REF <l_4827[9], 8, 0>
onto the same base l_4827 but the models offset being 288. This
cannot work for any kind of ref I can think of, not just with
BIT_FIELD_REFs.
PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Only call
build_reconstructed_reference when the offsets are the same.
* gcc.dg/torture/pr111807.c: New testcase.
RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)
Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.
RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.
Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.
The hunk being removed was introduced way back in 1994 as
5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")
This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.
| | # of unexpected case / # of unique unexpected case
| | gcc | g++ | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/| 264 / 87 | 5 / 2 | 72 / 12 |
| lp64d/medlow
Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this 🙂
I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming 😉
Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.
```
foo2:
sext.w a6,a1 <-- this goes away
beq a1,zero,.L4
li a5,0
li a0,0
.L3:
addw a4,a2,a5
addw a5,a3,a5
addw a0,a4,a0
bltu a5,a6,.L3
ret
.L4:
li a0,0
ret
```
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
PR target/111466
gcc/
* expr.cc (expand_expr_real_2): Do not clear SUBREG_PROMOTED_VAR_P.
gcc/testsuite
* gcc.target/riscv/pr111466.c: New test.
There are two reasons for removing this macro definition:
1. The default in the assembler is to use the nop instruction for filling.
2. For assembly directives: .align [abs-expr[, abs-expr[, abs-expr]]]
The third expression it is the maximum number of bytes that should be
skipped by this alignment directive.
Therefore, it will affect the display of the specified alignment rules
and affect the operating efficiency.
This modification relies on binutils commit 1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c.
(Since the assembler will add nop based on the .align information when doing relax,
it will cause the conditional branch to go out of bounds during the assembly process.
This submission of binutils solves this problem.)
gcc/ChangeLog:
* config/loongarch/loongarch.h (ASM_OUTPUT_ALIGN_WITH_NOP):
Delete.
Co-authored-by: Chenghua Xu <xuchenghua@loongson.cn>
This code fails to link:
import core.math;
real function(real) fn = &sin;
However, when called directly, the D intrinsic `sin()' is expanded by
the front-end into the GCC built-in `__builtin_sin()'. This has been
fixed to now also expand the function when a reference is taken.
As there are D intrinsics and GCC built-ins that don't have a fallback
implementation, raise an error if taking the address is not possible.
gcc/d/ChangeLog:
* d-tree.h (intrinsic_code): Update define for DEF_D_INTRINSIC.
(maybe_reject_intrinsic): New prototype.
* expr.cc (ExprVisitor::visit (SymOffExp *)): Call
maybe_reject_intrinsic.
* intrinsics.cc (intrinsic_decl): Add fallback field.
(intrinsic_decls): Update define for DEF_D_INTRINSIC.
(maybe_reject_intrinsic): New function.
* intrinsics.def (DEF_D_LIB_BUILTIN): Update.
(DEF_CTFE_BUILTIN): Update.
(INTRINSIC_BSF): Declare as library builtin.
(INTRINSIC_BSR): Likewise.
(INTRINSIC_BT): Likewise.
(INTRINSIC_BSF64): Likewise.
(INTRINSIC_BSR64): Likewise.
(INTRINSIC_BT64): Likewise.
(INTRINSIC_POPCNT32): Likewise.
(INTRINSIC_POPCNT64): Likewise.
(INTRINSIC_ROL): Likewise.
(INTRINSIC_ROL_TIARG): Likewise.
(INTRINSIC_ROR): Likewise.
(INTRINSIC_ROR_TIARG): Likewise.
(INTRINSIC_ADDS): Likewise.
(INTRINSIC_ADDSL): Likewise.
(INTRINSIC_ADDU): Likewise.
(INTRINSIC_ADDUL): Likewise.
(INTRINSIC_SUBS): Likewise.
(INTRINSIC_SUBSL): Likewise.
(INTRINSIC_SUBU): Likewise.
(INTRINSIC_SUBUL): Likewise.
(INTRINSIC_MULS): Likewise.
(INTRINSIC_MULSL): Likewise.
(INTRINSIC_MULU): Likewise.
(INTRINSIC_MULUI): Likewise.
(INTRINSIC_MULUL): Likewise.
(INTRINSIC_NEGS): Likewise.
(INTRINSIC_NEGSL): Likewise.
(INTRINSIC_TOPRECF): Likewise.
(INTRINSIC_TOPREC): Likewise.
(INTRINSIC_TOPRECL): Likewise.
gcc/testsuite/ChangeLog:
* gdc.dg/builtins_reject.d: New test.
* gdc.dg/intrinsics_reject.d: New test.
probe_stack_range has an assert to capture the possibility that that
expand_binop might not construct its result in the provided target.
We triggered that internally a little while ago. I'm pretty sure it was in the
testsuite, so no new testcase. The fix is easy, copy the result into the
proper target when needed.
Bootstrapped and regression tested on x86.
gcc/
* explow.cc (probe_stack_range): Handle case when expand_binop
does not construct its result in the expected location.
In the LWN discussion of the "ASCII" art in GCC 14
https://lwn.net/Articles/946733/#Comments
there was some concern about the use of non-ASCII characters in the
output.
Currently -fdiagnostics-text-art-charset defaults to "emoji".
To better handle older terminals by default, this patch special-cases
LANG=C to use -fdiagnostics-text-art-charset=ascii.
gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize): When LANG=C, update
default for -fdiagnostics-text-art-charset from emoji to ascii.
* doc/invoke.texi (fdiagnostics-text-art-charset): Document the above.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
From: Fangrui Song <maskray@google.com>
When using -mcmodel=medium, large data objects larger than the
-mlarge-data-threshold threshold are placed into large data sections
(.lrodata, .ldata, .lbss and some variants). GNU ld and ld.lld 17 place
.l* sections into separate output sections. If small and medium code
model object files are mixed, the .l* sections won't exert relocation
overflow pressure on sections in object files built with -mcmodel=small.
However, when using -mcmodel=large, -mlarge-data-threshold doesn't
apply. This means that the .rodata/.data/.bss sections may exert
relocation overflow pressure on sections in -mcmodel=small object files.
This patch allows -mcmodel=large to generate .l* sections and drops an
unneeded documentation restriction that the value must be the same.
Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
("Large data sections for the large code model")
Signed-off-by: Fangrui Song <maskray@google.com>
gcc/ChangeLog:
* config/i386/i386.cc (ix86_can_inline_p):
Handle CM_LARGE and CM_LARGE_PIC.
(x86_elf_aligned_decl_common): Ditto.
(x86_output_aligned_bss): Ditto.
* config/i386/i386.opt: Update doc for -mlarge-data-threshold=.
* doc/invoke.texi: Update doc for -mlarge-data-threshold=.
gcc/testsuite/ChangeLog:
* gcc.target/i386/large-data.c: New test.
This just moves a few functions out of riscv.cc into riscv-string.cc in an
attempt to keep riscv.cc manageable. This was originally Christoph's code and
I'm just pushing it on his behalf.
Full disclosure: I built rv64gc after changing to verify everything still
builds. Given it was just lifting code from one place to another, I didn't run
the testsuite.
gcc/
* config/riscv/riscv-protos.h (emit_block_move): Remove redundant
prototype. Improve comment.
* config/riscv/riscv.cc (riscv_block_move_straight): Move from riscv.cc
into riscv-string.cc.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.
* config/riscv/riscv-string.cc (riscv_block_move_straight): Add moved
function.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.
A bunch of FP tests expecting specific FP asm output fail when built
with zfa because different insns are generated. And this happens
because those tests don't have an explicit -march and the default
used to configure gcc could end up with zfa causing the false fails.
Fix that by adding the -march explicitly which doesn't have zfa.
BTW it seems we have some duplication in tests for zfa and non-zfa and
it would have been better if they were consolidated, but oh well.
gcc/testsuite:
* gcc.target/riscv/fle-ieee.c: Updates dg-options with
explicit -march=rv64gc and -march=rv32gc.
* gcc.target/riscv/fle-snan.c: Ditto.
* gcc.target/riscv/fle.c: Ditto.
* gcc.target/riscv/flef-ieee.c: Ditto.
* gcc.target/riscv/flef.c: Ditto.
* gcc.target/riscv/flef-snan.c: Ditto.
* gcc.target/riscv/flt-ieee.c: Ditto.
* gcc.target/riscv/flt-snan.c: Ditto.
* gcc.target/riscv/fltf-ieee.c: Ditto.
* gcc.target/riscv/fltf-snan.c: Ditto.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:
addi t4,sp,16
add t2,a6,t4
shl t3,t2,1
ld a2,0(t3)
addi a2,1
sd a2,8(t2)
into the following (one instruction less):
add t2,a6,sp
shl t3,t2,1
ld a2,32(t3)
addi a2,1
sd a2,24(t2)
Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.
gcc/ChangeLog:
* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.
* gcc.target/i386/pr52146.c: Adjust expected output.
Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
This improves the `A CMP 0 ? A : -A` set of match patterns to use
bitwise_equal_p which allows an nop cast between signed and unsigned.
This allows catching a few extra cases which were not being caught before.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/101541
* match.pd (A CMP 0 ? A : -A): Improve
using bitwise_equal_p.
gcc/testsuite/ChangeLog:
PR tree-optimization/101541
* gcc.dg/tree-ssa/phi-opt-36.c: New test.
* gcc.dg/tree-ssa/phi-opt-37.c: New test.
Currently we able to simplify `~a CMP ~b` to `b CMP a` but we should allow a nop
conversion in between the `~` and the `a` which can show up. A similarly thing should
be done for `~a CMP CST`.
I had originally submitted the `~a CMP CST` case as
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585088.html;
I noticed we should do the same thing for the `~a CMP ~b` case and combined
it with that one here.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/31531
gcc/ChangeLog:
* match.pd (~X op ~Y): Allow for an optional nop convert.
(~X op C): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr31531-1.c: New test.
* gcc.dg/tree-ssa/pr31531-2.c: New test.
I want to distinguish between constraint && and fold-expressions there of
written by the user and those implied by template parameter
type-constraints; to that end, let's improve our EXPR_LOCATION for an
explicit fold-expression.
The fold3.C change is needed because this moves the caret from the end of
the expression to the operator, which means the location of the error refers
to the macro invocation rather than the macro definition; both locations are
still printed, but which one is an error and which a note changes.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_fold_expression): Track location range.
* semantics.cc (finish_unary_fold_expr)
(finish_left_unary_fold_expr, finish_right_unary_fold_expr)
(finish_binary_fold_expr): Add location parm.
* constraint.cc (finish_shorthand_constraint): Pass it.
* pt.cc (convert_generic_types_to_packs): Likewise.
* cp-tree.h: Adjust.
gcc/testsuite/ChangeLog:
* g++.dg/concepts/diagnostic3.C: Add expected column.
* g++.dg/cpp1z/fold3.C: Adjust diagnostic lines.
In C++23, since P2448, a constexpr function F that calls a non-constexpr
function N is OK as long as we don't actually call F in a constexpr
context. So instead of giving an error in maybe_save_constexpr_fundef,
we only give an error when evaluating the call. Unfortunately, as shown
in this PR, the diagnostic can be truncated:
z.C:10:13: note: 'constexpr Jam::Jam()' is not usable as a 'constexpr' function because:
10 | constexpr Jam() { ft(); }
| ^~~
...because what? With this patch, we say:
z.C:10:13: note: 'constexpr Jam::Jam()' is not usable as a 'constexpr' function because:
10 | constexpr Jam() { ft(); }
| ^~~
z.C:10:23: error: call to non-'constexpr' function 'int Jam::ft()'
10 | constexpr Jam() { ft(); }
| ~~^~
z.C:8:7: note: 'int Jam::ft()' declared here
8 | int ft() { return 42; }
| ^~
Like maybe_save_constexpr_fundef, explain_invalid_constexpr_fn should
also check the body of a constructor, not just the mem-initializer.
PR c++/111272
gcc/cp/ChangeLog:
* constexpr.cc (explain_invalid_constexpr_fn): Also check the body of
a constructor in C++14 and up.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-diag1.C: New test.
This patch adds a pre-reload splitter to arc.md, to use the bset (set
specific bit instruction) to implement 1<<x (i.e. left shifts of one)
on ARC processors that don't have a barrel shifter.
Currently,
int foo(int x) {
return 1 << x;
}
when compiled with -O2 -mcpu=em is compiled as a loop:
foo: mov_s r2,1 ;3
and.f lp_count,r0, 0x1f
lpnz 2f
add r2,r2,r2
nop
2: # end single insn loop
j_s.d [blink]
mov_s r0,r2 ;4
with this patch we instead generate a single instruction:
foo: bset r0,0,r0
j_s [blink]
2023-10-16 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.md (*ashlsi3_1): New pre-reload splitter to
use bset dst,0,src to implement 1<<x on !TARGET_BARREL_SHIFTER.
The normal form of a CONST_INT which represents an integer of a mode
with fewer bits than in HOST_WIDE_INT is sign extended. This even holds
for unsigned integers.
This fixes an ICE during cse1 where we bail out at rtl.h:2297 since
INTVAL (x.first) == sext_hwi (INTVAL (x.first), precision) does not hold.
gcc/ChangeLog:
* config/s390/vector.md (popcountv8hi2_vx): Sign extend each
unsigned vector element.
void
foo8 (int64_t *restrict a)
{
for (int i = 0; i < 16; ++i)
a[i] = a[i]-16;
}
We use VLS modes instead of VLA modes even it is specified by dynamic LMUL.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Use VLS modes.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c: New test.
For "get_global_range_query" SSA_NAME_RANGE_INFO can be queried.
For "get_range_query", it could get more context-aware range info.
And look at the implementation of "get_range_query", it returns
global range if no local fun info.
So, if not quering for SSA_NAME and not chaning the IL, it would
be ok to use get_range_query to replace get_global_range_query.
gcc/ChangeLog:
* fold-const.cc (expr_not_equal_to): Replace get_global_range_query
by get_range_query.
* gimple-fold.cc (size_must_be_zero_p): Likewise.
* gimple-range-fold.cc (fur_source::fur_source): Likewise.
* gimple-ssa-warn-access.cc (check_nul_terminated_array): Likewise.
* tree-dfa.cc (get_ref_base_and_extent): Likewise.
The OpenACC specification does not mention the '!$ ' sentinel for conditional
compilation and the feature was removed in r11-5572-g1d6f6ac693a860
for PR fortran/98011; update libgomp.texi for this and update a leftover
comment. - Additionally, some other updates are done as well.
libgomp/
* libgomp.texi (Enabling OpenMP): Update for C/C++ attributes;
improve wording especially for Fortran; mention -fopenmp-simd.
(Enabling OpenACC): Minor cleanup; remove conditional compilation
sentinel.
gcc/
* doc/invoke.texi (-fopenacc, -fopenmp, -fopenmp-simd): Use @samp not
@code; document more completely the supported Fortran sentinels.
gcc/fortran
* scanner.cc (skip_free_comments, skip_fixed_comments): Remove
leftover 'OpenACC' from comments about OpenMP's conditional
compilation sentinel.
None of the ACC_* env vars was documented; in particular, the valid valids
for ACC_DEVICE_TYPE found to be lacking as those are not document in the
OpenACC spec.
GCC_ACC_NOTIFY was removed as I failed to find any traces of it but the
addition to the documentation in commit r6-6185-gcdf6119dad04dd
("libgomp.texi: Updates for OpenACC."). It seems to be planned as GCC
version of the ACC_NOTIFY env var used by another compiler for offloading
debugging.
libgomp/
* libgomp.texi (ACC_DEVICE_TYPE, ACC_DEVICE_NUM, ACC_PROFLIB):
Actually document what the function does.
(GCC_ACC_NOTIFY): Remove unused env var.
This patch improves the initial RTL expanded for double word shifts
on architectures with conditional moves, so that later passes don't
need to clean-up unnecessary and/or unused instructions.
Consider the general case, x << y, which is expanded well as:
t1 = y & 32;
t2 = 0;
t3 = x_lo >> 1;
t4 = y ^ ~0;
t5 = t3 >> t4;
tmp_hi = x_hi << y;
tmp_hi |= t5;
tmp_lo = x_lo << y;
out_hi = t1 ? tmp_lo : tmp_hi;
out_lo = t1 ? t2 : tmp_lo;
which is nearly optimal, the only thing that can be improved is
that using a unary NOT operation "t4 = ~y" is better than XOR
with -1, on targets that support it. [Note the one_cmpl_optab
expander didn't fall back to XOR when this code was originally
written, but has been improved since].
Now consider the relatively common idiom of 1LL << y, which
currently produces the RTL equivalent of:
t1 = y & 32;
t2 = 0;
t3 = 1 >> 1;
t4 = y ^ ~0;
t5 = t3 >> t4;
tmp_hi = 0 << y;
tmp_hi |= t5;
tmp_lo = 1 << y;
out_hi = t1 ? tmp_lo : tmp_hi;
out_lo = t1 ? t2 : tmp_lo;
Notice here that t3 is always zero, so the assignment of t5
is a variable shift of zero, which expands to a loop on many
smaller targets, a similar shift by zero in the first tmp_hi
assignment (another loop), that the value of t4 is no longer
required (as t3 is zero), and that the ultimate value of tmp_hi
is always zero.
Fortunately, for many (but perhaps not all) targets this mess
gets cleaned up by later optimization passes. However, this
patch avoids generating unnecessary RTL at expand time, by
calling simplify_expand_binop instead of expand_binop, and
avoiding generating dead or unnecessary code when intermediate
values are known to be zero. For the 1LL << y test case above,
we now generate:
t1 = y & 32;
t2 = 0;
tmp_hi = 0;
tmp_lo = 1 << y;
out_hi = t1 ? tmp_lo : tmp_hi;
out_lo = t1 ? t2 : tmp_lo;
On arc-elf, for example, there are 18 RTL INSN_P instructions
generated by expand before this patch, but only 12 with this patch
(improving both compile-time and memory usage).
2023-10-15 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* optabs.cc (expand_subword_shift): Call simplify_expand_binop
instead of expand_binop. Optimize cases (i.e. avoid generating
RTL) when CARRIES or INTO_INPUT is zero. Use one_cmpl_optab
(i.e. NOT) instead of xor_optab with ~0 to calculate ~OP1.
This patch adds the m2.etags rule to gcc/m2/Make-lang.in which
generates etags for the .cc .c .h files within gcc/m2.
gcc/m2/ChangeLog:
* Make-lang.in (m2.tags): New rule.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
As mentioned in the PR, my estimations on needed buffer size for wide_int
and especially widest_int printing were incorrect, I've used get_len ()
in the estimations, but that is true only for !wi::neg_p (x) values.
Under the hood, we have 3 ways to print numbers.
print_decs which if
if ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
|| (wi.get_len () == 1))
uses sprintf which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive or
negative) and otherwise uses print_hex,
print_decu which if
if ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
|| (wi.get_len () == 1 && !wi::neg_p (wi)))
uses sprintf which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive
only) and print_hex, which doesn't print most significant limbs which are
zero and the first limb which is non-zero prints such that redundant 0
hex digits aren't printed, while all limbs below that are printed with
"%016" PRIx64. For wi::neg_p (x) values, the first limb of the precision
is always non-zero, so we print all the limbs for the precision.
So, the current estimations are accurate if !wi::neg_p (x), or when
print_decs will be used and x.get_len () == 1, otherwise we need to use
estimation based on get_precision () rather than get_len ().
I've introduced new inlines print_{dec{,s,u},hex}_buf_size which compute the
needed buffer length in bytes and return true if WIDE_INT_PRINT_BUFFER_SIZE
isn't sufficient and caller should XALLOCAVEC the buffer.
2023-10-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/111800
gcc/
* wide-int-print.h (print_dec_buf_size, print_decs_buf_size,
print_decu_buf_size, print_hex_buf_size): New inline functions.
* wide-int.cc (assert_deceq): Use print_dec_buf_size.
(assert_hexeq): Use print_hex_buf_size.
* wide-int-print.cc (print_decs): Use print_decs_buf_size.
(print_decu): Use print_decu_buf_size.
(print_hex): Use print_hex_buf_size.
(pp_wide_int_large): Use print_dec_buf_size.
* value-range.cc (irange_bitmask::dump): Use print_hex_buf_size.
* value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks):
Likewise.
* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use
print_dec_buf_size. Use TYPE_SIGN macro in print_dec call argument.
gcc/c-family/
* c-warn.cc (match_case_to_enum_1): Assert w.get_precision ()
is smaller or equal to WIDE_INT_MAX_INL_PRECISION rather than
w.get_len () is smaller or equal to WIDE_INT_MAX_INL_ELTS.
D front-end changes:
- Import dmd v2.105.2.
- A function with enum storage class is now deprecated.
- Global variables can now be initialized with Associative
Arrays.
- Improvements for the C++ header generation of static variables
used in a default argument context.
D runtime changes:
- Import druntime v2.105.2.
- The `core.memory.GC' functions `GC.enable', `GC.disable',
`GC.collect', and `GC.minimize' `have been marked `@safe'.
Phobos changes:
- Import phobos v2.105.2.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd f9efc98fd7.
* dmd/VERSION: Bump version to v2.105.2.
* d-builtins.cc (build_frontend_type): Update for new front-end
interface.
* d-diagnostic.cc (verrorReport): Don't emit tips when error gagging
is turned on.
* d-lang.cc (d_handle_option): Remove obsolete parameter.
(d_post_options): Likewise.
(d_read_ddoc_files): New function.
(d_generate_ddoc_file): New function.
(d_parse_file): Update for new front-end interface.
* expr.cc (ExprVisitor::visit (AssocArrayLiteralExp *)): Check for new
front-end lowering of static associative arrays.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime f9efc98fd7.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/internal/newaa.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos a3f22129d.
* testsuite/libphobos.hash/test_hash.d: Update test.
* testsuite/libphobos.phobos/phobos.exp: Add compiler flags
-Wno-deprecated.
* testsuite/libphobos.phobos_shared/phobos_shared.exp: Likewise.
gcc/testsuite/ChangeLog:
* lib/gdc-utils.exp (gdc-convert-args): Handle new compiler options.