mirror/gcc

mirror of git://gcc.gnu.org/git/gcc.git synced 2025-03-04 06:05:36 +08:00

Author	SHA1	Message	Date
Juzhe-Zhong	c51040cb43	RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx This patch optimizes the following permutation with a consecutive index pattern: typedef char vnx16i __attribute__ ((vector_size (16))); #define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15 vnx16i __attribute__ ((noinline, noclone)) test_1 (vnx16i x, vnx16i y) { return __builtin_shufflevector (x, y, MASK_16); } Before this patch: lui a5,%hi(.LC0) addi a5,a5,%lo(.LC0) vsetivli zero,16,e8,m1,ta,ma vle8.v v3,0(a5) vle8.v v2,0(a1) vrgather.vv v1,v2,v3 vse8.v v1,0(a0) ret After this patch: vsetivli zero,16,e8,mf8,ta,ma vle8.v v2,0(a1) vsetivli zero,4,e32,mf2,ta,ma vrgather.vi v1,v2,3 vsetivli zero,16,e8,mf8,ta,ma vse8.v v1,0(a0) ret Overall, this removes 1 instruction, a vector load, which is much more expensive than VL toggling. Also, with this patch we use vrgather.vi, which reduces vector register consumption by 1. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function. (expand_vec_perm_const_1): Add consecutive pattern recognition. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/def.h: Add new test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.	2023-10-18 15:58:53 +08:00
Tobias Burnus	372c5da215	fortran/intrinsic.texi: Add 'intrinsic' to SIGNAL example gcc/fortran/ChangeLog: * intrinsic.texi (signal): Add 'intrinsic :: signal, sleep' to the example to make it safer.	2023-10-18 09:29:56 +02:00
Haochen Jiang	f019251ac9	Initial Panther Lake Support gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Add Panther Lake. * common/config/i386/i386-common.cc (processor_name): Ditto. (processor_alias_table): Ditto. * common/config/i386/i386-cpuinfo.h (enum processor_types): Add INTEL_PANTHERLAKE. * config.gcc: Add -march=pantherlake. * config/i386/driver-i386.cc (host_detect_local_cpu): Refactor the if clause. Handle pantherlake. * config/i386/i386-c.cc (ix86_target_macros_internal): Handle pantherlake. * config/i386/i386-options.cc (processor_cost_table): Ditto. (m_PANTHERLAKE): New. (m_CORE_HYBRID): Add pantherlake. * config/i386/i386.h (enum processor_type): Ditto. * doc/extend.texi: Ditto. * doc/invoke.texi: Ditto. gcc/testsuite/ChangeLog: * g++.target/i386/mv16.C: Ditto. * gcc.target/i386/funcspec-56.inc: Handle new march.	2023-10-18 14:40:59 +08:00
Haochen Jiang	2aa97c0da4	x86: Add m_CORE_HYBRID for hybrid clients tuning gcc/Changelog: * config/i386/i386-options.cc (m_CORE_HYBRID): New. * config/i386/x86-tune.def: Replace hybrid client tune to m_CORE_HYBRID.	2023-10-18 14:40:22 +08:00
Haochen Jiang	7370c479dd	Initial Clearwater Forest Support gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Handle Clearwater Forest. * common/config/i386/i386-common.cc (processor_name): Add Clearwater Forest. (processor_alias_table): Ditto. * common/config/i386/i386-cpuinfo.h (enum processor_types): Add INTEL_CLEARWATERFOREST. * config.gcc: Add -march=clearwaterforest. * config/i386/driver-i386.cc (host_detect_local_cpu): Handle clearwaterforest. * config/i386/i386-c.cc (ix86_target_macros_internal): Ditto. * config/i386/i386-options.cc (processor_cost_table): Ditto. (m_CLEARWATERFOREST): New. (m_CORE_ATOM): Add clearwaterforest. * config/i386/i386.h (enum processor_type): Ditto. * doc/extend.texi: Ditto. * doc/invoke.texi: Ditto. gcc/testsuite/ChangeLog: * g++.target/i386/mv16.C: Ditto. * gcc.target/i386/funcspec-56.inc: Handle new march.	2023-10-18 14:39:53 +08:00
liuhongt	cead92b7fc	Support 32/64-bit vectorization for _Float16 fma related operations. gcc/ChangeLog: * config/i386/mmx.md (fma<mode>4): New expander. (fms<mode>4): Ditto. (fnma<mode>4): Ditto. (fnms<mode>4): Ditto. (vec_fmaddsubv4hf4): Ditto. (vec_fmsubaddv4hf4): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-fmaddsubhf-1.c: New test. * gcc.target/i386/part-vect-fmahf-1.c: New test.	2023-10-18 09:14:57 +08:00
Juzhe-Zhong	cf7739d4a6	RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832] Robin previously mentioned that dynamic LMUL causes an ICE in SPEC: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html which is caused by an assertion failure. When we enable more tests in rvv.exp with dynamic LMUL, the issue can be reproduced, and it has a PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832 This patch enables more tests in rvv.exp and fixes the bug. PR target/111832 gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.	2023-10-18 09:03:09 +08:00
GCC Administrator	fb69acffa9	Daily bump.	2023-10-18 00:17:58 +00:00
Richard Sandiford	773306e9ef	aarch64: Put LR save slot first in more cases Now that the prologue and epilogue code iterates over saved registers in offset order, we can put the LR save slot first without compromising LDP/STP formation. This isn't worthwhile when shadow call stacks are enabled, since the first two registers are also push/pop candidates, and LR cannot be popped when shadow call stacks are enabled. (LR is instead loaded first and compared against the shadow stack's value.) But otherwise, it seems better to put the LR save slot first, to reduce unnecessary variation with the layout for stack clash protection. gcc/ * config/aarch64/aarch64.cc (aarch64_layout_frame): Don't make the position of the LR save slot dependent on stack clash protection unless shadow call stacks are enabled. gcc/testsuite/ * gcc.target/aarch64/test_frame_2.c: Expect x30 to come before x19. * gcc.target/aarch64/test_frame_4.c: Likewise. * gcc.target/aarch64/test_frame_7.c: Likewise. * gcc.target/aarch64/test_frame_10.c: Likewise.	2023-10-17 23:46:33 +01:00
Richard Sandiford	5758585080	aarch64: Use vecs to store register save order aarch64_save/restore_callee_saves looped over registers in register number order. This in turn meant that we could only use LDP and STP for registers that were consecutive both number-wise and offset-wise (after unsaved registers are excluded). This patch instead builds lists of the registers that we've decided to save, in offset order. We can then form LDP/STP pairs regardless of register number order, which in turn means that we can put the LR save slot first without losing LDP/STP opportunities. gcc/ * config/aarch64/aarch64.h (aarch64_frame): Add vectors that store the list saved GPRs, FPRs and predicate registers. * config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize the lists of saved registers. Use them to choose push candidates. Invalidate pop candidates if we're not going to do a pop. (aarch64_next_callee_save): Delete. (aarch64_save_callee_saves): Take a list of registers, rather than a range. Make !skip_wb select only write-back candidates. (aarch64_expand_prologue): Update calls accordingly. (aarch64_restore_callee_saves): Take a list of registers, rather than a range. Always skip pop candidates. Also skip LR if shadow call stacks are enabled. (aarch64_expand_epilogue): Update calls accordingly. gcc/testsuite/ * gcc.target/aarch64/sve/pcs/stack_clash_2.c: Expect restores to happen in offset order. * gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.	2023-10-17 23:46:33 +01:00
Richard Sandiford	aeb3f0436f	Handle epilogues that contain jumps The prologue/epilogue pass allows the prologue sequence to contain jumps. The sequence is then partitioned into basic blocks using find_many_sub_basic_blocks. This patch treats epilogues in a similar way. Since only one block might need to be split, the patch (re)introduces a find_sub_basic_blocks routine to handle a single block. The new routine hard-codes the assumption that split_block will chain the new block immediately after the original block. The routine doesn't try to replicate the fix for PR81030, since that was specific to gimple->rtl expansion. The patch is needed for follow-on aarch64 patches that add conditional code to the epilogue. The tests are part of those patches. gcc/ * cfgbuild.h (find_sub_basic_blocks): Declare. * cfgbuild.cc (update_profile_for_new_sub_basic_block): New function, split out from... (find_many_sub_basic_blocks): ...here. (find_sub_basic_blocks): New function. * function.cc (thread_prologue_and_epilogue_insns): Handle epilogues that contain jumps.	2023-10-17 23:45:43 +01:00
Andrew Pinski	5e4abf4233	ssa_name_has_boolean_range vs signed-boolean:31 types This turns out to be a latent bug in ssa_name_has_boolean_range where it would return true for all boolean types, but all of the uses of ssa_name_has_boolean_range were expecting 0/1 as the range rather than [-1,0]. So when I fixed vector lower to do all comparisons in boolean_type rather than still in the signed-boolean:31 type (to fix a different issue), the pattern in match for `-(type)!A -> (type)A - 1.` would assume A (which was signed-boolean:31) had a range of [0,1], which broke down and sometimes gave us -1/-2 as values rather than the -1/0 we were expecting. This was the simplest patch I found while testing. We have another way of matching [0,1] range which we could use instead of ssa_name_has_boolean_range, except that it uses only the global ranges rather than the local range (during VRP). I tried to clean this up slightly by using gimple_match_zero_one_valuedp inside ssa_name_has_boolean_range but that failed because it uses only the global ranges. I then tried to change get_nonzero_bits to use the local ranges at optimization time but that also failed because we would remove branches to __builtin_unreachable during evrp and lose information, as we don't set the global ranges during evrp. OK? Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/110817 gcc/ChangeLog: * tree-ssanames.cc (ssa_name_has_boolean_range): Remove the check for boolean type as they don't have "[0,1]" range. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr110817-1.c: New test. * gcc.c-torture/execute/pr110817-2.c: New test. * gcc.c-torture/execute/pr110817-3.c: New test.	2023-10-17 22:44:19 +00:00
Marek Polacek	1fbb7d75ab	c++: accepts-invalid with =delete("") [PR111840] r6-2367 added a DECL_INITIAL check to cp_parser_simple_declaration so that we don't emit multiple errors in g++.dg/parse/error57.C. But that means we don't diagnose int f1() = delete("george_crumb"); anymore, because fn decls often have error_mark_node in their DECL_INITIAL. (The code may be allowed one day via https://wg21.link/P2573R0.) I was hoping I could use cp_parser_error_occurred but that would regress error57.C. PR c++/111840 gcc/cp/ChangeLog: * parser.cc (cp_parser_simple_declaration): Do cp_parser_error for FUNCTION_DECLs. gcc/testsuite/ChangeLog: * g++.dg/parse/error65.C: New test.	2023-10-17 17:44:59 -04:00
Marek Polacek	765c3b8f82	c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660] My recent patch introducing cp_fold_immediate_r caused exponential compile time with nested COND_EXPRs. The problem is that the COND_EXPR case recursively walks the arms of a COND_EXPR, but after processing both arms it doesn't end the walk; it proceeds to walk the sub-expressions of the outermost COND_EXPR, triggering again walking the arms of the nested COND_EXPR, and so on. This patch brings the compile time down to about 0m0.030s. The ff_fold_immediate flag is unused after this patch but since I'm using it in the P2564 patch, I'm not removing it now. Maybe at_eof can be used instead and then we can remove ff_fold_immediate. PR c++/111660 gcc/cp/ChangeLog: * cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Don't handle it here. (cp_fold_r): Handle COND_EXPR here. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/hog1.C: New test. * g++.dg/cpp2a/consteval36.C: New test.	2023-10-17 17:41:44 -04:00
Jason Merrill	bac21b7ea6	c++: mangling tweaks Most of this is introducing the abi_check function to reduce the verbosity of most places that check -fabi-version. The start_mangling change is to avoid needing to zero-initialize additional members of the mangling globals, though I'm not actually adding any. The comment documents existing semantics. gcc/cp/ChangeLog: * mangle.cc (abi_check): New. (write_prefix, write_unqualified_name, write_discriminator) (write_type, write_member_name, write_expression) (write_template_arg, write_template_param): Use it. (start_mangling): Assign from {}. * cp-tree.h: Update comment.	2023-10-17 17:20:02 -04:00
Nathaniel Shead	4f8700078c	c++: Add missing auto_diagnostic_groups to constexpr.cc gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_dynamic_cast_fn): Add missing auto_diagnostic_group. (cxx_eval_call_expression): Likewise. (diag_array_subscript): Likewise. (outside_lifetime_error): Likewise. (potential_constant_expression_1): Likewise. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Marek Polacek <polacek@redhat.com>	2023-10-17 16:19:02 -04:00
Vineet Gupta	9cad42786c	RISC-V/testsuite/pr111466.c: update test and expected output Update the test to potentially generate two SEXT.W instructions: one for the incoming function arg, the other for the function return. But after commit `8eb9cdd142` ("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg") the test is not supposed to generate either of them, so fix the expected assembler output, which was erroneously introduced by the commit above. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr111466.c (foo2): Change return to unsigned int as that will potentially generate two SEXT.W instructions. dg-final: Change to scan-assembler-not SEXT.W. Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>	2023-10-17 13:12:58 -07:00
Martin Uecker	1f186f64b8	c: error for function with external and internal linkage [PR111708] Declaring a function with both external and internal linkage in the same TU is translation-time UB. Add an error for this case as already done for objects. PR c/111708 gcc/c/ChangeLog: * c-decl.cc (grokdeclarator): Add error. gcc/testsuite/ChangeLog: * gcc.dg/pr111708-1.c: New test. * gcc.dg/pr111708-2.c: New test.	2023-10-17 20:19:12 +02:00
Harald Anlauf	5ac63ec5da	Fortran: out of bounds access with nested implied-do IO [PR111837] gcc/fortran/ChangeLog: PR fortran/111837 * frontend-passes.cc (traverse_io_block): Dependency check of loop nest shall be triangular, not banded. gcc/testsuite/ChangeLog: PR fortran/111837 * gfortran.dg/implied_do_io_8.f90: New test.	2023-10-17 18:55:22 +02:00
Tobias Burnus	43c2f85f52	fortran/intrinsic.texi: Improve SIGNAL intrinsic entry gcc/fortran/ChangeLog: * intrinsic.texi (signal): Mention that the argument passed to the signal handler procedure is passed by reference. Extend example.	2023-10-17 18:24:06 +02:00
Andrew Pinski	b18d1cabe2	MATCH: [PR111432] Simplify `a & (x | CST)` to a when we know that (a & ~CST) == 0 This adds the simplification of `a & (x | CST)` to a when we know that `(a & ~CST) == 0`, similar to how `a & CST` is handled. I looked into handling `a | (x & CST)` but I don't see any decent simplifications happening. OK? Bootstrapped and tested on x86_linux-gnu with no regressions. PR tree-optimization/111432 gcc/ChangeLog: * match.pd (`a & (x | CST)`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/bitops-7.c: New test.	2023-10-17 08:51:51 -07:00
Georg-Johann Lay	da65efe433	LibF7: Re-generate f7-renames.h to pick up white-space from f7renames.sh. libgcc/config/avr/libf7/ * f7-renames.h: Re-generate.	2023-10-17 17:19:36 +02:00
Andre Vieira	305034e3ae	tree-cfg: Add count information when creating new bb in move_sese_region_to_fn This patch makes sure the profile_count information is initialized for the new bb created in move_sese_region_to_fn. gcc/ChangeLog: * tree-cfg.cc (move_sese_region_to_fn): Initialize profile_count for new basic block.	2023-10-17 15:02:29 +01:00
Gaius Mulley	ef6696af08	PR modula2/111756: Re-building all-gcc after source changes fails to link When having modula-2 enabled in a development tree and there are any changes that trigger rebuilds in m2/ doing a 'make all-gcc' in the build directory might fail due to lack of dependency tracking. This patch introduces build dependencies into gcc/m2/Make-lang.in using -M* options. The patch also introduces all -M* options to cc1gm2 and gm2. gcc/m2/ChangeLog: PR modula2/111756 * Make-lang.in (CM2DEP): New define conditionally set if ($(CXXDEPMODE),depmode=gcc3). (GM2_1): Use $(CM2DEP). (m2/gm2-gcc/%.o): Ensure $(@D)/$(DEPDIR) is created. Add $(CM2DEP) to the $(COMPILER) command and use $(POSTCOMPILE). (m2/gm2-gcc/m2configure.o): Ditto. (m2/gm2-lang.o): Ditto. (m2/m2pp.o): Ditto. (m2/gm2-gcc/rtegraph.o): Ditto. (m2/mc-boot/$(SRC_PREFIX)%.o): Ditto. (m2/mc-boot-ch/$(SRC_PREFIX)%.o): Ditto. (m2/mc-boot-ch/$(SRC_PREFIX)%.o): Ditto. (m2/mc-boot/main.o): Ditto. (mcflex.o): Ditto. (m2/gm2-libs-boot/M2RTS.o): Ditto. (m2/gm2-libs-boot/%.o): Ditto. (m2/gm2-libs-boot/%.o): Ditto. (m2/gm2-libs-boot/RTcodummy.o): Ditto. (m2/gm2-libs-boot/RTintdummy.o): Ditto. (m2/gm2-libs-boot/wrapc.o): Ditto. (m2/gm2-libs-boot/UnixArgs.o): Ditto. (m2/gm2-libs-boot/choosetemp.o): Ditto. (m2/gm2-libs-boot/errno.o): Ditto. (m2/gm2-libs-boot/dtoa.o): Ditto. (m2/gm2-libs-boot/ldtoa.o): Ditto. (m2/gm2-libs-boot/termios.o): Ditto. (m2/gm2-libs-boot/SysExceptions.o): Ditto. (m2/gm2-libs-boot/SysStorage.o): Ditto. (m2/gm2-compiler-boot/M2GCCDeclare.o): Ditto. (m2/gm2-compiler-boot/M2Error.o): Ditto. (m2/gm2-compiler-boot/%.o): Ditto. (m2/gm2-compiler-boot/%.o): Ditto. (m2/gm2-compiler-boot/m2flex.o): Ditto. (m2/gm2-compiler/%.o): Ditto. (m2/gm2-compiler/m2flex.o): Ditto. (m2/gm2-libs-iso/%.o): Ditto. (m2/gm2-libs/%.o): Ditto. (m2/gm2-libs/%.o): Ditto. (m2/gm2-libs/choosetemp.o): Ditto. (m2/boot-bin/mklink$(exeext)): Ditto. (m2/pge-boot/%.o): Ditto. (m2/pge-boot/%.o): Ditto. 
(m2/gm2-compiler/%.o): Ensure $(@D)/$(DEPDIR) is created and use $(POSTCOMPILE). (m2/gm2-compiler/%.o): Ditto. (m2/gm2-libs-iso/%.o): Ditto. (m2/gm2-libs/%.o): Ditto. * README: Purge out of date info. * gm2-compiler/M2Comp.mod (MakeSaveTempsFileNameExt): Import. (OnExitDelete): Import. (GetModuleDefImportStatementList): Import. (GetModuleModImportStatementList): Import. (GetImportModule): Import. (IsImportStatement): Import. (IsImport): Import. (GetImportStatementList): Import. (File): Import. (Close): Import. (EOF): Import. (IsNoError): Import. (WriteLine): Import. (WriteChar): Import. (FlushOutErr): Import. (WriteS): Import. (OpenToRead): Import. (OpenToWrite): Import. (ReadS): Import. (WriteS): Import. (GetM): Import. (GetMM): Import. (GetDepTarget): Import. (GetMF): Import. (GetMP): Import. (GetObj): Import. (GetMD): Import. (GetMMD): Import. (GenerateDefDependency): New procedure. (GenerateDependenciesFromImport): New procedure. (GenerateDependenciesFromList): New procedure. (GenerateDependencies): New procedure. (Compile): Re-write. (compile): Re-format. (CreateFileStem): New procedure function. (DoPass0): Re-write. (IsLibrary): New procedure function. (IsUnique): New procedure function. (Append): New procedure. (MergeDep): New procedure. (GetRuleTarget): New procedure function. (ReadDepContents): New procedure function. (WriteDep): New procedure. (WritePhonyDep): New procedure. (WriteDepContents): New procedure. (CreateDepFilename): New procedure function. (Pass0CheckDef): New procedure function. (Pass0CheckMod): New procedure function. (DoPass0): Re-write. (DepContent): New variable. (DepOutput): New variable. (BaseName): New procedure function. * gm2-compiler/M2GCCDeclare.mod (PrintTerse): Handle IsImport. Replace IsGnuAsmVolatile with IsGnuAsm. * gm2-compiler/M2Options.def (EXPORT QUALIFIED): Remove list. (SetM): New procedure. (GetM): New procedure function. (SetMM): New procedure. (GetMM): New procedure function. (SetMF): New procedure. 
(GetMF): New procedure function. (SetPPOnly): New procedure. (GetB): New procedure function. (SetMD): New procedure. (GetMD): New procedure function. (SetMMD): New procedure. (GetMMD): New procedure function. (SetMQ): New procedure. (SetMT): New procedure. (GetMT): New procedure function. (GetDepTarget): New procedure function. (SetMP): New procedure. (GetMP): New procedure function. (SetObj): New procedure. (SetSaveTempsDir): New procedure. * gm2-compiler/M2Options.mod (SetM): New procedure. (GetM): New procedure function. (SetMM): New procedure. (GetMM): New procedure function. (SetMF): New procedure. (GetMF): New procedure function. (SetPPOnly): New procedure. (GetB): New procedure function. (SetMD): New procedure. (GetMD): New procedure function. (SetMMD): New procedure. (GetMMD): New procedure function. (SetMQ): New procedure. (SetMT): New procedure. (GetMT): New procedure function. (GetDepTarget): New procedure function. (SetMP): New procedure. (GetMP): New procedure function. (SetObj): New procedure. (SetSaveTempsDir): New procedure. * gm2-compiler/M2Preprocess.def (PreprocessModule): New parameters topSource and outputDep. Re-write. (MakeSaveTempsFileNameExt): New procedure function. (OnExitDelete): New procedure function. * gm2-compiler/M2Preprocess.mod (GetM): Import. (GetMM): Import. (OnExitDelete): Add debugging message. (RemoveFile): Add debugging message. (BaseName): Remove. (BuildCommandLineExecute): New procedure function. * gm2-compiler/M2Search.def (SetDefExtension): Remove unnecessary spacing. * gm2-compiler/SymbolTable.mod (GetSymName): Handle ImportSym and ImportStatementSym. * gm2-gcc/m2options.h (M2Options_SetMD): New function. (M2Options_GetMD): New function. (M2Options_SetMMD): New function. (M2Options_GetMMD): New function. (M2Options_SetM): New function. (M2Options_GetM): New function. (M2Options_SetMM): New function. (M2Options_GetMM): New function. (M2Options_GetMQ): New function. (M2Options_SetMF): New function. 
(M2Options_GetMF): New function. (M2Options_SetMT): New function. (M2Options_SetMP): New function. (M2Options_GetMP): New function. (M2Options_GetDepTarget): New function. * gm2-lang.cc (gm2_langhook_init): Correct comment case. (gm2_langhook_init_options): Add case OPT_M and OPT_MM. (gm2_langhook_post_options): Add case OPT_MF, OPT_MT, OPT_MD and OPT_MMD. * lang-specs.h (M2CPP): Pass through MF option. (MDMMD): New define. Add MDMMD to "@modula-2". Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>	2023-10-17 14:11:24 +01:00
Richard Biener	323209cd73	tree-optimization/111846 - put simd-clone-info into SLP tree The following avoids bogously re-using the simd-clone-info we currently hang off stmt_info from two different SLP contexts where a different number of lanes should have chosen a different best simdclone. PR tree-optimization/111846 * tree-vectorizer.h (_slp_tree::simd_clone_info): Add. (SLP_TREE_SIMD_CLONE_INFO): New. * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize SLP_TREE_SIMD_CLONE_INFO. (_slp_tree::~_slp_tree): Release it. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use SLP_TREE_SIMD_CLONE_INFO or STMT_VINFO_SIMD_CLONE_INFO dependent on if we're doing SLP. * gcc.dg/vect/pr111846.c: New testcase.	2023-10-17 14:24:51 +02:00
Jakub Jelinek	fbdf88a1f6	wide-int-print: Don't print large numbers hexadecimally for print_dec{,s,u} The following patch implements printing of wide_int/widest_int numbers decimally when asked for that using print_dec{,s,u}, even if they have precision larger than 64 and get_len () above 1 (right now we printed them hexadecimally and even negative numbers as huge positive hexadecimal). In order to avoid the expensive division/modulo by 10^19 twice, once to estimate how many will be needed and another to actually print it, the patch prints the 19 digit chunks in reverse order (from least significant to most significant) and then reorders those with linear complexity to form the right printed number. Tested with printing both 256 and 320 bit numbers (first as an example of even number of 19 digit chunks plus one shorter above it, the second as an example of odd number of 19 digit chunks plus one shorter above it). The l * HOST_BITS_PER_WIDE_INT / 3 + 3 estimate, thinking about it now, is one byte too much (one byte for -, one for '\0') and too conservative, so we could go with l * HOST_BITS_PER_WIDE_INT / 3 + 2 as well, or e.g. l * HOST_BITS_PER_WIDE_INT * 10 / 33 + 3 as an even less conservative estimate (though more expensive to compute in inline code). But that l * HOST_BITS_PER_WIDE_INT / 4 + 4; is likely one byte too much as well, 2 bytes for 0x, one byte for '\0' and where does the 4th one come from? Of course all of these assuming HOST_BITS_PER_WIDE_INT is a multiple of 64... 2023-10-17 Jakub Jelinek <jakub@redhat.com> * wide-int-print.h (print_dec_buf_size): For length, divide number of bits by 3 and add 3 instead of division by 4 and adding 4. * wide-int-print.cc (print_decs): Remove superfluous ()s. Don't call print_hex, instead call print_decu on either negated value after printing - or on wi itself. (print_decu): Don't call print_hex, instead print even large numbers decimally. 
(pp_wide_int_large): Assume len from print_dec_buf_size is big enough even if it returns false. * pretty-print.h (pp_wide_int): Use print_dec_buf_size to check if pp_wide_int_large should be used. * tree-pretty-print.cc (dump_generic_node): Use print_hex_buf_size to compute needed buffer size.	2023-10-17 14:25:00 +02:00
Lehua Ding	5bb79a427a	RISC-V: Fix failed testcase when use -cmodel=medany This small patch fixes a testcase that fails when using -cmodel=medany. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/cpymem-1.c: Split check.	2023-10-17 18:00:28 +08:00
Georg-Johann Lay	c4e773b4cc	LibF7: Implement fma / fmal. libgcc/config/avr/libf7/ * libf7.h (F7_SIZEOF): New macro. * libf7-asm.sx: Use F7_SIZEOF instead of magic number "10". (F7MOD_D_fma_, __fma): New module and function. (fma) [-mdouble=64]: Define as alias for __fma. (fmal) [-mlong-double=64]: Define as alias for __fma. * libf7-common.mk (F7_ASM_PARTS): Add D_fma.	2023-10-17 11:45:22 +02:00
Richard Biener	ce55521bcd	middle-end/111818 - failed DECL_NOT_GIMPLE_REG_P setting of volatile The following addresses a missed DECL_NOT_GIMPLE_REG_P setting of a volatile declared parameter which causes inlining to substitute a constant parameter into a context where its address is required. The main issue is in update_address_taken which clears DECL_NOT_GIMPLE_REG_P from the parameter but fails to rewrite it because is_gimple_reg returns false for volatiles. The following changes maybe_optimize_var to make the 1:1 correspondence between clearing DECL_NOT_GIMPLE_REG_P of a register typed decl and actually rewriting it to SSA. PR middle-end/111818 * tree-ssa.cc (maybe_optimize_var): When clearing DECL_NOT_GIMPLE_REG_P always rewrite into SSA. * gcc.dg/torture/pr111818.c: New testcase.	2023-10-17 08:23:33 +02:00
Richard Biener	3aaf704bca	tree-optimization/111807 - ICE in verify_sra_access_forest The following addresses build_reconstructed_reference failing to build references with a different offset than the model's and thus the caller conditional being off. This manifests when attempting to build a ref with offset 160 from the model BIT_FIELD_REF <l_4827[9], 8, 0> onto the same base l_4827 but with the model's offset being 288. This cannot work for any kind of ref I can think of, not just with BIT_FIELD_REFs. PR tree-optimization/111807 * tree-sra.cc (build_ref_for_model): Only call build_reconstructed_reference when the offsets are the same. * gcc.dg/torture/pr111807.c: New testcase.	2023-10-17 08:23:33 +02:00
Vineet Gupta	8eb9cdd142	expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466] RISC-V suffers from extraneous sign extensions, despite/given the ABI guarantee that 32-bit quantities are sign-extended into 64-bit registers, meaning incoming SI function args need not be explicitly sign extended (so do SI return values as most ALU insns implicitly sign-extend too.) Existing REE doesn't seem to handle this well and there are various ideas floating around to smarten REE about it. RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE etc. Another approach would be to prevent EXPAND from generating the sign_extend in the first place which this patch tries to do. The hunk being removed was introduced way back in 1994 as `5069803972` ("expand_expr, case CONVERT_EXPR .. clear the promotion flag") This survived full testsuite run for RISC-V rv64gc with surprisingly no fallouts: test results before/after are exactly same. \| \| # of unexpected case / # of unique unexpected case \| \| gcc \| g++ \| gfortran \| \| rv64imafdc_zba_zbb_zbs_zicond/\| 264 / 87 \| 5 / 2 \| 72 / 12 \| \| lp64d/medlow Granted for something so old to have survived, there must be a valid reason. Unfortunately the original change didn't have additional commentary or a test case. That is not to say it can't/won't possibly break things on other arches/ABIs, hence the RFC for someone to scream that this is just bonkers, don't do this 🙂 I've explicitly CC'ed Jakub and Roger who have last touched subreg promoted notes in expr.cc for insight and/or screaming 😉 Thanks to Robin for narrowing this down in an amazing debugging session @ GNU Cauldron. 
``` foo2: sext.w a6,a1 <-- this goes away beq a1,zero,.L4 li a5,0 li a0,0 .L3: addw a4,a2,a5 addw a5,a3,a5 addw a0,a4,a0 bltu a5,a6,.L3 ret .L4: li a0,0 ret ``` Signed-off-by: Vineet Gupta <vineetg@rivosinc.com> Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com> PR target/111466 gcc/ * expr.cc (expand_expr_real_2): Do not clear SUBREG_PROMOTED_VAR_P. gcc/testsuite * gcc.target/riscv/pr111466.c: New test.	2023-10-16 22:02:26 -06:00
Chenghui Pan	38ad4ad112	LoongArch: Fix vec_initv32qiv16qi template to avoid ICE. Following test code triggers unrecognized insn ICE on LoongArch target with "-O3 -mlasx": void foo (unsigned char *dst, unsigned char *src) { for (int y = 0; y < 16; y++) { for (int x = 0; x < 16; x++) dst[x] = src[x] + 1; dst += 32; src += 32; } } ICE info: ./test.c: In function ‘foo’: ./test.c:8:1: error: unrecognizable insn: 8 | } | ^ (insn 15 14 16 4 (set (reg:V32QI 185 [ vect__24.7 ]) (vec_concat:V32QI (reg:V16QI 186) (const_vector:V16QI [ (const_int 0 [0]) repeated x16 ]))) "./test.c":4:19 -1 (nil)) during RTL pass: vregs ./test.c:8:1: internal compiler error: in extract_insn, at recog.cc:2791 0x12028023b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) /home/panchenghui/upstream/gcc/gcc/rtl-error.cc:108 0x12028026f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) /home/panchenghui/upstream/gcc/gcc/rtl-error.cc:116 0x120a03c5b extract_insn(rtx_insn*) /home/panchenghui/upstream/gcc/gcc/recog.cc:2791 0x12067ff73 instantiate_virtual_regs_in_insn /home/panchenghui/upstream/gcc/gcc/function.cc:1610 0x12067ff73 instantiate_virtual_regs /home/panchenghui/upstream/gcc/gcc/function.cc:1983 0x12067ff73 execute /home/panchenghui/upstream/gcc/gcc/function.cc:2030 This RTL is generated inside loongarch_expand_vector_group_init function (related to vec_initv32qiv16qi template). Original impl doesn't ensure all vec_concat arguments are register type. This patch adds force_reg() to the vec_concat argument generation. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_expand_vector_group_init): fix impl related to vec_initv32qiv16qi template to avoid ICE. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-vec-init-1.c: New test.	2023-10-17 10:08:39 +08:00
Lulu Cheng	b20c7ee066	LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP. There are two reasons for removing this macro definition: 1. The default in the assembler is to use the nop instruction for filling. 2. For assembly directives: .align [abs-expr[, abs-expr[, abs-expr]]] The third expression is the maximum number of bytes that should be skipped by this alignment directive. Therefore, it will affect the display of the specified alignment rules and affect the operating efficiency. This modification relies on binutils commit 1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c. (Since the assembler will add nop based on the .align information when doing relax, it will cause the conditional branch to go out of bounds during the assembly process. This submission of binutils solves this problem.) gcc/ChangeLog: * config/loongarch/loongarch.h (ASM_OUTPUT_ALIGN_WITH_NOP): Delete. Co-authored-by: Chenghua Xu <xuchenghua@loongson.cn>	2023-10-17 09:59:11 +08:00
Juzhe-Zhong	b25b43caf2	RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store Consider the following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1] * a; } return sum1 + sum2; } Before this patch: bar: ble a3,zero,.L5 csrr t0,vlenb csrr a6,vlenb slli t1,t0,3 vsetvli a5,zero,e32,m4,ta,ma sub sp,sp,t1 vid.v v20 vmv.v.x v12,a1 vand.vi v4,v20,1 vmv.v.x v16,a2 vmseq.vi v4,v4,1 slli t3,a6,2 vsetvli zero,a5,e32,m4,ta,ma vmv1r.v v0,v4 viota.m v8,v4 add a7,t3,sp vsetvli a5,zero,e32,m4,ta,mu vand.vi v28,v20,-2 vadd.vi v4,v28,1 vs4r.v v20,0(a7) ----- spill vrgather.vv v24,v12,v8 vrgather.vv v20,v16,v8 vrgather.vv v24,v16,v8,v0.t vrgather.vv v20,v12,v8,v0.t vs4r.v v4,0(sp) ----- spill slli a3,a3,1 addi t4,a6,-1 neg t1,a6 vmv4r.v v0,v20 vmv.v.i v4,0 j .L4 .L13: vsetvli a5,zero,e32,m4,ta,ma .L4: mv a7,a3 mv a4,a3 bleu a3,a6,.L3 csrr a4,vlenb .L3: vmv.v.x v8,t4 vl4re32.v v12,0(sp) ---- spill vand.vv v20,v28,v8 vand.vv v8,v12,v8 vsetvli zero,a4,e32,m4,ta,ma vle32.v v16,0(a0) vsetvli a5,zero,e32,m4,ta,ma add a3,a3,t1 vrgather.vv v12,v16,v20 add a0,a0,t3 vrgather.vv v20,v16,v8 vsub.vv v12,v12,v0 vsetvli zero,a4,e32,m4,tu,ma vadd.vv v4,v4,v12 vmacc.vv v4,v24,v20 bgtu a7,a6,.L13 csrr a1,vlenb slli a1,a1,2 add a1,a1,sp li a4,-1 csrr t0,vlenb vsetvli a5,zero,e32,m4,ta,ma vl4re32.v v12,0(a1) ---- spill vmv.v.i v8,0 vmul.vx v0,v12,a4 li a2,0 slli t1,t0,3 vadd.vi v0,v0,-1 vand.vi v0,v0,1 vmseq.vv v0,v0,v8 vand.vi v12,v12,1 vmerge.vvm v16,v8,v4,v0 vmseq.vv v12,v12,v8 vmv.s.x v1,a2 vmv1r.v v0,v12 vredsum.vs v16,v16,v1 vmerge.vvm v8,v8,v4,v0 vmv.x.s a0,v16 vredsum.vs v8,v8,v1 vmv.x.s a5,v8 add sp,sp,t1 addw a0,a0,a5 jr ra .L5: li a0,0 ret We can see there are multiple horrible register spillings. 
The root cause of this issue is that for a scalar IR load: _5 = *_4; we didn't check whether it is a contiguous load/store or a gather/scatter load/store. It will be translated into either: 1. MASK_LEN_GATHER_LOAD (..., perm indice). 2. Contiguous load/store + VEC_PERM (..., perm indice) In either situation we end up consuming one vector register group (perm indice) that we didn't count before. So in this case we pick LMUL = 4, which is an incorrect choice for the dynamic LMUL cost model. The key of this patch is: if ((type == load_vec_info_type \|\| type == store_vec_info_type) && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info))) { ... } Add one more register consumption if it is not an adjacent load/store. After this patch, it picks LMUL = 2, which is optimal: bar: ble a3,zero,.L4 csrr a6,vlenb vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v6,a2 srli a2,a6,1 vmv.v.x v4,a1 vid.v v12 slli a3,a3,1 vand.vi v0,v12,1 addi t1,a2,-1 vmseq.vi v0,v0,1 slli a6,a6,1 vsetvli zero,a5,e32,m2,ta,ma neg a7,a2 viota.m v2,v0 vsetvli a5,zero,e32,m2,ta,mu vrgather.vv v16,v4,v2 vrgather.vv v14,v6,v2 vrgather.vv v16,v6,v2,v0.t vrgather.vv v14,v4,v2,v0.t vand.vi v18,v12,-2 vmv.v.i v2,0 vadd.vi v20,v18,1 .L3: minu a4,a3,a2 vsetvli zero,a4,e32,m2,ta,ma vle32.v v8,0(a0) vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v4,t1 vand.vv v10,v18,v4 vrgather.vv v6,v8,v10 vsub.vv v6,v6,v14 vsetvli zero,a4,e32,m2,tu,ma vadd.vv v2,v2,v6 vsetvli a1,zero,e32,m2,ta,ma vand.vv v4,v20,v4 vrgather.vv v6,v8,v4 vsetvli zero,a4,e32,m2,tu,ma mv a4,a3 add a0,a0,a6 add a3,a3,a7 vmacc.vv v2,v16,v6 bgtu a4,a2,.L3 vsetvli a1,zero,e32,m2,ta,ma vand.vi v0,v12,1 vmv.v.i v4,0 li a3,-1 vmseq.vv v0,v0,v4 vmv.s.x v1,zero vmerge.vvm v6,v4,v2,v0 vredsum.vs v6,v6,v1 vmul.vx v0,v12,a3 vadd.vi v0,v0,-1 vand.vi v0,v0,1 vmv.x.s a4,v6 vmseq.vv v0,v0,v4 vmv.s.x v1,zero vmerge.vvm v4,v4,v2,v0 vredsum.vs v4,v4,v1 vmv.x.s a0,v4 addw a0,a0,a4 ret .L4: li a0,0 ret No spillings. 
gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Fix big LMUL issue. (get_store_value): New function. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: New test.	2023-10-17 09:06:11 +08:00
Iain Buclaw	ef8f7e3f97	d: Forbid taking the address of an intrinsic with no implementation This code fails to link: import core.math; real function(real) fn = &sin; However, when called directly, the D intrinsic `sin()' is expanded by the front-end into the GCC built-in `__builtin_sin()'. This has been fixed to now also expand the function when a reference is taken. As there are D intrinsics and GCC built-ins that don't have a fallback implementation, raise an error if taking the address is not possible. gcc/d/ChangeLog: * d-tree.h (intrinsic_code): Update define for DEF_D_INTRINSIC. (maybe_reject_intrinsic): New prototype. * expr.cc (ExprVisitor::visit (SymOffExp *)): Call maybe_reject_intrinsic. * intrinsics.cc (intrinsic_decl): Add fallback field. (intrinsic_decls): Update define for DEF_D_INTRINSIC. (maybe_reject_intrinsic): New function. * intrinsics.def (DEF_D_LIB_BUILTIN): Update. (DEF_CTFE_BUILTIN): Update. (INTRINSIC_BSF): Declare as library builtin. (INTRINSIC_BSR): Likewise. (INTRINSIC_BT): Likewise. (INTRINSIC_BSF64): Likewise. (INTRINSIC_BSR64): Likewise. (INTRINSIC_BT64): Likewise. (INTRINSIC_POPCNT32): Likewise. (INTRINSIC_POPCNT64): Likewise. (INTRINSIC_ROL): Likewise. (INTRINSIC_ROL_TIARG): Likewise. (INTRINSIC_ROR): Likewise. (INTRINSIC_ROR_TIARG): Likewise. (INTRINSIC_ADDS): Likewise. (INTRINSIC_ADDSL): Likewise. (INTRINSIC_ADDU): Likewise. (INTRINSIC_ADDUL): Likewise. (INTRINSIC_SUBS): Likewise. (INTRINSIC_SUBSL): Likewise. (INTRINSIC_SUBU): Likewise. (INTRINSIC_SUBUL): Likewise. (INTRINSIC_MULS): Likewise. (INTRINSIC_MULSL): Likewise. (INTRINSIC_MULU): Likewise. (INTRINSIC_MULUI): Likewise. (INTRINSIC_MULUL): Likewise. (INTRINSIC_NEGS): Likewise. (INTRINSIC_NEGSL): Likewise. (INTRINSIC_TOPRECF): Likewise. (INTRINSIC_TOPREC): Likewise. (INTRINSIC_TOPRECL): Likewise. gcc/testsuite/ChangeLog: * gdc.dg/builtins_reject.d: New test. * gdc.dg/intrinsics_reject.d: New test.	2023-10-17 02:20:51 +02:00
GCC Administrator	e16ace7c79	Daily bump.	2023-10-17 00:17:33 +00:00
Jeff Law	b626751a4e	Fix minor problem in stack probing probe_stack_range has an assert to capture the possibility that expand_binop might not construct its result in the provided target. We triggered that internally a little while ago. I'm pretty sure it was in the testsuite, so no new testcase. The fix is easy, copy the result into the proper target when needed. Bootstrapped and regression tested on x86. gcc/ * explow.cc (probe_stack_range): Handle case when expand_binop does not construct its result in the expected location.	2023-10-16 17:16:12 -06:00
David Malcolm	04013e4464	diagnostics: special-case -fdiagnostics-text-art-charset=ascii for LANG=C In the LWN discussion of the "ASCII" art in GCC 14 https://lwn.net/Articles/946733/#Comments there was some concern about the use of non-ASCII characters in the output. Currently -fdiagnostics-text-art-charset defaults to "emoji". To better handle older terminals by default, this patch special-cases LANG=C to use -fdiagnostics-text-art-charset=ascii. gcc/ChangeLog: * diagnostic.cc (diagnostic_initialize): When LANG=C, update default for -fdiagnostics-text-art-charset from emoji to ascii. * doc/invoke.texi (fdiagnostics-text-art-charset): Document the above. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2023-10-16 19:04:50 -04:00
David Malcolm	f8644b6782	diagnostics: fix missing initialization of context->extra_output_kind gcc/ChangeLog: * diagnostic.cc (diagnostic_initialize): Ensure context->extra_output_kind is initialized. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2023-10-16 19:02:34 -04:00
Uros Bizjak	1a64156c7e	i386: Allow -mlarge-data-threshold with -mcmodel=large From: Fangrui Song <maskray@google.com> When using -mcmodel=medium, large data objects larger than the -mlarge-data-threshold threshold are placed into large data sections (.lrodata, .ldata, .lbss and some variants). GNU ld and ld.lld 17 place .l* sections into separate output sections. If small and medium code model object files are mixed, the .l* sections won't exert relocation overflow pressure on sections in object files built with -mcmodel=small. However, when using -mcmodel=large, -mlarge-data-threshold doesn't apply. This means that the .rodata/.data/.bss sections may exert relocation overflow pressure on sections in -mcmodel=small object files. This patch allows -mcmodel=large to generate .l* sections and drops an unneeded documentation restriction that the value must be the same. Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU ("Large data sections for the large code model") Signed-off-by: Fangrui Song <maskray@google.com> gcc/ChangeLog: * config/i386/i386.cc (ix86_can_inline_p): Handle CM_LARGE and CM_LARGE_PIC. (x86_elf_aligned_decl_common): Ditto. (x86_output_aligned_bss): Ditto. * config/i386/i386.opt: Update doc for -mlarge-data-threshold=. * doc/invoke.texi: Update doc for -mlarge-data-threshold=. gcc/testsuite/ChangeLog: * gcc.target/i386/large-data.c: New test.	2023-10-16 23:44:23 +02:00
Christoph Müllner	328745607c	RISC-V: NFC: Move scalar block move expansion code into riscv-string.cc This just moves a few functions out of riscv.cc into riscv-string.cc in an attempt to keep riscv.cc manageable. This was originally Christoph's code and I'm just pushing it on his behalf. Full disclosure: I built rv64gc after changing to verify everything still builds. Given it was just lifting code from one place to another, I didn't run the testsuite. gcc/ * config/riscv/riscv-protos.h (emit_block_move): Remove redundant prototype. Improve comment. * config/riscv/riscv.cc (riscv_block_move_straight): Move from riscv.cc into riscv-string.cc. (riscv_adjust_block_mem, riscv_block_move_loop): Likewise. (riscv_expand_block_move): Likewise. * config/riscv/riscv-string.cc (riscv_block_move_straight): Add moved function. (riscv_adjust_block_mem, riscv_block_move_loop): Likewise. (riscv_expand_block_move): Likewise.	2023-10-16 14:02:15 -06:00
Vineet Gupta	c92737722f	RISC-V/testsuite: add a default march (lacking zfa) to some fp tests A bunch of FP tests expecting specific FP asm output fail when built with zfa because different insns are generated. And this happens because those tests don't have an explicit -march and the default used to configure gcc could end up with zfa causing the false fails. Fix that by adding the -march explicitly which doesn't have zfa. BTW it seems we have some duplication in tests for zfa and non-zfa and it would have been better if they were consolidated, but oh well. gcc/testsuite: * gcc.target/riscv/fle-ieee.c: Updates dg-options with explicit -march=rv64gc and -march=rv32gc. * gcc.target/riscv/fle-snan.c: Ditto. * gcc.target/riscv/fle.c: Ditto. * gcc.target/riscv/flef-ieee.c: Ditto. * gcc.target/riscv/flef.c: Ditto. * gcc.target/riscv/flef-snan.c: Ditto. * gcc.target/riscv/flt-ieee.c: Ditto. * gcc.target/riscv/flt-snan.c: Ditto. * gcc.target/riscv/fltf-ieee.c: Ditto. * gcc.target/riscv/fltf-snan.c: Ditto. Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>	2023-10-16 12:45:54 -07:00
Manolis Tsamis	04c9cf5c78	Implement new RTL optimizations pass: fold-mem-offsets This is a new RTL pass that tries to optimize memory offset calculations by moving them from add immediate instructions to the memory loads/stores. For example it can transform this: addi t4,sp,16 add t2,a6,t4 shl t3,t2,1 ld a2,0(t3) addi a2,1 sd a2,8(t2) into the following (one instruction less): add t2,a6,sp shl t3,t2,1 ld a2,32(t3) addi a2,1 sd a2,24(t2) Although there are places where this is done already, this pass is more powerful and can handle the more difficult cases that are currently not optimized. Also, it runs late enough and can optimize away unnecessary stack pointer calculations. gcc/ChangeLog: * Makefile.in: Add fold-mem-offsets.o. * passes.def: Schedule a new pass. * tree-pass.h (make_pass_fold_mem_offsets): Declare. * common.opt: New options. * doc/invoke.texi: Document new option. * fold-mem-offsets.cc: New file. gcc/testsuite/ChangeLog: * gcc.target/riscv/fold-mem-offsets-1.c: New test. * gcc.target/riscv/fold-mem-offsets-2.c: New test. * gcc.target/riscv/fold-mem-offsets-3.c: New test. * gcc.target/i386/pr52146.c: Adjust expected output. Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>	2023-10-16 13:08:57 -06:00
Iain Buclaw	964fd402c9	d: Merge upstream dmd, druntime 4c18eed967, phobos d945686a4. D front-end changes: - Import latest fixes to mainline. D runtime changes: - Import latest fixes to mainline. Phobos changes: - Import latest fixes to mainline. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 4c18eed967. * d-diagnostic.cc (verrorReport): Update for new front-end interface. (verrorReportSupplemental): Likewise. * d-lang.cc (d_init_options): Likewise. (d_handle_option): Likewise. (d_post_options): Likewise. (d_parse_file): Likewise. * decl.cc (get_symbol_decl): Likewise. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 4c18eed967. * src/MERGE: Merge upstream phobos d945686a4.	2023-10-16 19:14:10 +02:00
Andrew Pinski	c7609acb8a	MATCH: Improve `A CMP 0 ? A : -A` set of patterns to use bitwise_equal_p. This improves the `A CMP 0 ? A : -A` set of match patterns to use bitwise_equal_p which allows a nop cast between signed and unsigned. This allows catching a few extra cases which were not being caught before. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/101541 * match.pd (A CMP 0 ? A : -A): Improve using bitwise_equal_p. gcc/testsuite/ChangeLog: PR tree-optimization/101541 * gcc.dg/tree-ssa/phi-opt-36.c: New test. * gcc.dg/tree-ssa/phi-opt-37.c: New test.	2023-10-16 10:11:13 -07:00
Andrew Pinski	29a4453c7b	[PR31531] MATCH: Improve ~a < ~b and ~a < CST, allow a nop cast in between ~ and a/b Currently we are able to simplify `~a CMP ~b` to `b CMP a` but we should allow a nop conversion in between the `~` and the `a` which can show up. A similar thing should be done for `~a CMP CST`. I had originally submitted the `~a CMP CST` case as https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585088.html; I noticed we should do the same thing for the `~a CMP ~b` case and combined it with that one here. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/31531 gcc/ChangeLog: * match.pd (~X op ~Y): Allow for an optional nop convert. (~X op C): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr31531-1.c: New test. * gcc.dg/tree-ssa/pr31531-2.c: New test.	2023-10-16 09:59:56 -07:00
Jason Merrill	7550130c86	c++: improve fold-expr location I want to distinguish between constraint && and fold-expressions thereof written by the user and those implied by template parameter type-constraints; to that end, let's improve our EXPR_LOCATION for an explicit fold-expression. The fold3.C change is needed because this moves the caret from the end of the expression to the operator, which means the location of the error refers to the macro invocation rather than the macro definition; both locations are still printed, but which one is an error and which a note changes. gcc/cp/ChangeLog: * parser.cc (cp_parser_fold_expression): Track location range. * semantics.cc (finish_unary_fold_expr) (finish_left_unary_fold_expr, finish_right_unary_fold_expr) (finish_binary_fold_expr): Add location parm. * constraint.cc (finish_shorthand_constraint): Pass it. * pt.cc (convert_generic_types_to_packs): Likewise. * cp-tree.h: Adjust. gcc/testsuite/ChangeLog: * g++.dg/concepts/diagnostic3.C: Add expected column. * g++.dg/cpp1z/fold3.C: Adjust diagnostic lines.	2023-10-16 11:11:30 -04:00
Marek Polacek	a22eeaca5c	c++: fix truncated diagnostic in C++23 [PR111272] In C++23, since P2448, a constexpr function F that calls a non-constexpr function N is OK as long as we don't actually call F in a constexpr context. So instead of giving an error in maybe_save_constexpr_fundef, we only give an error when evaluating the call. Unfortunately, as shown in this PR, the diagnostic can be truncated: z.C:10:13: note: 'constexpr Jam::Jam()' is not usable as a 'constexpr' function because: 10 \| constexpr Jam() { ft(); } \| ^~~ ...because what? With this patch, we say: z.C:10:13: note: 'constexpr Jam::Jam()' is not usable as a 'constexpr' function because: 10 \| constexpr Jam() { ft(); } \| ^~~ z.C:10:23: error: call to non-'constexpr' function 'int Jam::ft()' 10 \| constexpr Jam() { ft(); } \| ~~^~ z.C:8:7: note: 'int Jam::ft()' declared here 8 \| int ft() { return 42; } \| ^~ Like maybe_save_constexpr_fundef, explain_invalid_constexpr_fn should also check the body of a constructor, not just the mem-initializer. PR c++/111272 gcc/cp/ChangeLog: * constexpr.cc (explain_invalid_constexpr_fn): Also check the body of a constructor in C++14 and up. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/constexpr-diag1.C: New test.	2023-10-16 08:17:31 -04:00
Roger Sayle	817a701681	ARC: Split asl dst,1,src into bset dst,0,src to implement 1<<x. This patch adds a pre-reload splitter to arc.md, to use the bset (set specific bit instruction) to implement 1<<x (i.e. left shifts of one) on ARC processors that don't have a barrel shifter. Currently, int foo(int x) { return 1 << x; } when compiled with -O2 -mcpu=em is compiled as a loop: foo: mov_s r2,1 ;3 and.f lp_count,r0, 0x1f lpnz 2f add r2,r2,r2 nop 2: # end single insn loop j_s.d [blink] mov_s r0,r2 ;4 with this patch we instead generate a single instruction: foo: bset r0,0,r0 j_s [blink] 2023-10-16 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/arc/arc.md (*ashlsi3_1): New pre-reload splitter to use bset dst,0,src to implement 1<<x on !TARGET_BARREL_SHIFTER.	2023-10-16 13:03:09 +01:00
Stefan Schulze Frielinghaus	d6ebe61889	s390: Fix expander popcountv8hi2_vx The normal form of a CONST_INT which represents an integer of a mode with fewer bits than in HOST_WIDE_INT is sign extended. This even holds for unsigned integers. This fixes an ICE during cse1 where we bail out at rtl.h:2297 since INTVAL (x.first) == sext_hwi (INTVAL (x.first), precision) does not hold. gcc/ChangeLog: * config/s390/vector.md (popcountv8hi2_vx): Sign extend each unsigned vector element.	2023-10-16 13:39:04 +02:00

204635 Commits