mirror/gcc - gcc - Collaboration & Inovation


mirror of git://gcc.gnu.org/git/gcc.git synced 2025-03-03 17:36:03 +08:00

Author	SHA1	Message	Date
Alex Coplan	583ca5f599	aarch64, testsuite: Prevent stp in lr_free_1.c The test is looking for individual stores which are able to be merged into stp instructions. The test currently passes -fno-schedule-fusion -fno-peephole2, presumably to prevent these stores from being turned into stps, but this is no longer sufficient with the new ldp/stp fusion pass. As such, we add --param=aarch64-stp-policy=never to prevent stps being formed. gcc/testsuite/ChangeLog: * gcc.target/aarch64/lr_free_1.c: Add --param=aarch64-stp-policy=never to dg-options.	2023-10-19 11:12:23 +01:00
Alex Coplan	505f1202e3	rtl-ssa: Support inferring uses of mem in change_insns Currently, rtl_ssa::change_insns requires all new uses and defs to be specified explicitly. This turns out to be rather inconvenient for forming load pairs in the new aarch64 load pair pass, as the pass has to determine which mem def the final load pair consumes, and then obtain or create a suitable use (i.e. significant bookkeeping, just to keep the RTL-SSA IR consistent). It turns out to be much more convenient to allow change_insns to infer which def is consumed and create a suitable use of mem itself. This patch does that. gcc/ChangeLog: * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new parameter to give final insn position, infer use of mem if it isn't specified explicitly. (function_info::change_insns): Pass down final insn position to finalize_new_accesses. * rtl-ssa/functions.h: Add parameter to finalize_new_accesses.	2023-10-19 11:12:23 +01:00
Alex Coplan	ba230aa1b8	rtl-ssa: Add entry point to allow re-parenting uses This is needed by the upcoming aarch64 load pair pass, as it can re-order stores (when alias analysis determines this is safe) and thus change which mem def a given use consumes (in the RTL-SSA view, there is no alias disambiguation of memory). gcc/ChangeLog: * rtl-ssa/accesses.cc (function_info::reparent_use): New. * rtl-ssa/functions.h (function_info): Declare new member function reparent_use.	2023-10-19 11:12:22 +01:00
Alex Coplan	c95aab23c1	rtl-ssa: Add drop_memory_access helper Add a helper routine to access-utils.h which removes the memory access from an access_array, if it has one. gcc/ChangeLog: * rtl-ssa/access-utils.h (drop_memory_access): New.	2023-10-19 11:12:22 +01:00
Alex Coplan	c338083377	rtl-ssa: Fix bug in function_info::add_insn_after In the case that !insn->is_debug_insn () && next->is_debug_insn (), this function was missing an update of the prev pointer on the first nondebug insn following the sequence of debug insns starting at next. This can lead to corruption of the insn chain, in that we end up with: insn->next_any_insn ()->prev_any_insn () != insn in this case. This patch fixes that. gcc/ChangeLog: * rtl-ssa/insns.cc (function_info::add_insn_after): Ensure we update the prev pointer on the following nondebug insn in the case that !insn->is_debug_insn () && next->is_debug_insn ().	2023-10-19 11:12:22 +01:00
Haochen Jiang	faa0e82b40	x86: Correct ISA enabled for clients since Arrow Lake gcc/ChangeLog: * config/i386/i386.h: Correct the ISA enabled for Arrow Lake. Also make Clearwater Forest depend on Sierra Forest. * config/i386/i386-options.cc: Revise the order of the macro definition to avoid confusion. * doc/extend.texi: Revise documentation. * doc/invoke.texi: Correct documentation. gcc/testsuite/ChangeLog: * gcc.target/i386/funcspec-56.inc: Group Clearwater Forest with atom cores.	2023-10-19 17:08:36 +08:00
Andrew Stubbs	56ed1055b2	amdgcn: deprecate Fiji device and multilib LLVM wants to remove it, which breaks our build. This patch means that most users won't notice that change, when it comes, and those that do will have chosen to enable Fiji explicitly. I'm selecting gfx900 as the new default as that's the least likely for users to want, which means most users will specify -march explicitly, which means we'll be free to change the default again, when we need to, without breaking anybody's makefiles. gcc/ChangeLog: * config.gcc (amdgcn): Switch default to --with-arch=gfx900. Implement support for --with-multilib-list. * config/gcn/t-gcn-hsa: Likewise. * doc/install.texi: Likewise. * doc/invoke.texi: Mark Fiji deprecated.	2023-10-19 09:46:57 +01:00
Jiahao Xu	8f4bbdc28d	LoongArch:Implement the new vector cost model framework. This patch makes LoongArch use the new vector hooks and implements the costing function determine_suggested_unroll_factor, so that it can suggest the unroll factor for a given loop being vectorized, based on the vec_ops analysis during vector costing and the available issue information. This follows the aarch64 and rs6000 ports. The patch also reduces the cost of unaligned stores, making it equal to the cost of aligned ones in order to avoid odd alignment peeling. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_vector_costs): Inherit from vector_costs. Add a constructor. (loongarch_vector_costs::add_stmt_cost): Use adjust_cost_for_freq to adjust the cost for inner loops. (loongarch_vector_costs::count_operations): New function. (loongarch_vector_costs::determine_suggested_unroll_factor): Ditto. (loongarch_vector_costs::finish_cost): Ditto. (loongarch_builtin_vectorization_cost): Adjust. * config/loongarch/loongarch.opt (loongarch-vect-unroll-limit): New parameter. (loongarcg-vect-issue-info): Ditto. (mmemvec-cost): Delete. * config/loongarch/genopts/loongarch.opt.in (loongarch-vect-unroll-limit): Ditto. (loongarcg-vect-issue-info): Ditto. (mmemvec-cost): Delete. * doc/invoke.texi (loongarcg-vect-unroll-limit): Document new option.	2023-10-19 14:15:38 +08:00
Jiahao Xu	08813894fd	LoongArch:Implement vec_widen standard names. Add support for vec_widen lo/hi patterns. These do not directly match on Loongarch lasx instructions but can be emulated with even/odd + vector merge. gcc/ChangeLog: * config/loongarch/lasx.md (vec_widen_<su>mult_even_v8si): New patterns. (vec_widen_<su>add_hi_<mode>): Ditto. (vec_widen_<su>add_lo_<mode>): Ditto. (vec_widen_<su>sub_hi_<mode>): Ditto. (vec_widen_<su>sub_lo_<mode>): Ditto. (vec_widen_<su>mult_hi_<mode>): Ditto. (vec_widen_<su>mult_lo_<mode>): Ditto. * config/loongarch/loongarch.md (u_bool): New iterator. * config/loongarch/loongarch-protos.h (loongarch_expand_vec_widen_hilo): New prototype. * config/loongarch/loongarch.cc (loongarch_expand_vec_interleave): New function. (loongarch_expand_vec_widen_hilo): New function. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vect-widen-add.c: New test. * gcc.target/loongarch/vect-widen-mul.c: New test. * gcc.target/loongarch/vect-widen-sub.c: New test.	2023-10-19 14:15:35 +08:00
Jiahao Xu	a7b7284fe1	LoongArch:Implement avg and sad standard names. gcc/ChangeLog: * config/loongarch/lasx.md (avg<mode>3_ceil): New patterns. (uavg<mode>3_ceil): Ditto. (avg<mode>3_floor): Ditto. (uavg<mode>3_floor): Ditto. (usadv32qi): Ditto. (ssadv32qi): Ditto. * config/loongarch/lsx.md (avg<mode>3_ceil): New patterns. (uavg<mode>3_ceil): Ditto. (avg<mode>3_floor): Ditto. (uavg<mode>3_floor): Ditto. (usadv16qi): Ditto. (ssadv16qi): Ditto. gcc/testsuite/ChangeLog: * gcc.target/loongarch/avg-ceil-lasx.c: New test. * gcc.target/loongarch/avg-ceil-lsx.c: New test. * gcc.target/loongarch/avg-floor-lasx.c: New test. * gcc.target/loongarch/avg-floor-lsx.c: New test. * gcc.target/loongarch/sad-lasx.c: New test. * gcc.target/loongarch/sad-lsx.c: New test.	2023-10-19 14:15:31 +08:00
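As a rough scalar reference for what the avg and sad standard names compute per element, the sketch below models the unsigned 8-bit cases. The function names `uavg_floor_u8`, `uavg_ceil_u8` and `usad_u8` are hypothetical and are not taken from the patch; the semantics assumed here are floor average as (a + b) >> 1, ceiling average as (a + b + 1) >> 1, both formed in a wider type, and sad as an accumulated sum of absolute differences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Floor average: (a + b) >> 1, with the sum formed in a wider type
   so it cannot overflow the 8-bit element.  */
static uint8_t uavg_floor_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* Ceiling average: (a + b + 1) >> 1, again in a wider type.  */
static uint8_t uavg_ceil_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Unsigned sum of absolute differences of n elements, added onto acc.  */
static uint32_t usad_u8(const uint8_t *a, const uint8_t *b, size_t n,
                        uint32_t acc)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        acc += (uint32_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return acc;
}
```

A vector pattern applies the same arithmetic to every lane at once; the scalar form is only meant as a way to check expected results by hand.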
GCC Administrator	0308461d9d	Daily bump.	2023-10-19 00:18:05 +00:00
Andrew Pinski	b20dbddcc4	Fix expansion of `(a & 2) != 1` I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670 where we would remove the `& CST` part if we ended up not calling expand_single_bit_test. This fixes the problem by introducing a new variable that will be used for calling expand_single_bit_test. As far as I know this can only show up when disabling optimization passes as this above form would have been optimized away. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. PR middle-end/111863 gcc/ChangeLog: * expr.cc (do_store_flag): Don't overwrite arg0 when stripping off `& POW2`. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr111863-1.c: New test.	2023-10-18 15:11:39 -07:00
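To see why losing the `& CST` part matters, the sketch below compares the condition as written with a model in which the mask has been dropped. `cond_mask_dropped` is a hypothetical illustration of that effect, not the code GCC actually generated. Because `a & 2` can only be 0 or 2, the original condition is always true, while the unmasked comparison is false for `a == 1`.

```c
#include <assert.h>

/* The condition as written: a & 2 is either 0 or 2, never 1,
   so this always returns 1.  */
static int cond_as_written(int a)
{
    return (a & 2) != 1;
}

/* Hypothetical model of the same comparison with the mask dropped.  */
static int cond_mask_dropped(int a)
{
    return a != 1;
}

/* Counts the inputs in [lo, hi] on which the two forms disagree.  */
static int count_disagreements(int lo, int hi)
{
    assert(lo <= hi);
    int n = 0;
    for (int a = lo; a <= hi; a++)
        n += cond_as_written(a) != cond_mask_dropped(a);
    return n;
}
```

Over any range that contains 1, the two forms disagree on exactly that one input.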
Andrew Pinski	879c91fccc	[c] Fix PR 101364: ICE after error due to diagnose_arglist_conflict not checking for error When checking whether a function declaration has a conflict due to promotions, there is no test to see if the type is an error mark before c_type_promotes_to is called. c_type_promotes_to is not ready for error_mark and causes an ICE. This adds a check for error before the call to c_type_promotes_to. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR c/101364 gcc/c/ChangeLog: * c-decl.cc (diagnose_arglist_conflict): Test for error mark before calling of c_type_promotes_to. gcc/testsuite/ChangeLog: * gcc.dg/pr101364-1.c: New test.	2023-10-18 15:11:39 -07:00
Andrew Pinski	11e6bcedb4	Fix ICE due to c_safe_arg_type_equiv_p not checking for error_mark node This is a simple error recovery issue when c_safe_arg_type_equiv_p was added in r8-5312-gc65e18d3331aa999. The issue is that after an error, an argument type (of a function type) might turn into an error mark node and c_safe_arg_type_equiv_p was not ready for that. So this just adds a check for error operand for its arguments before getting the main variant. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR c/101285 gcc/c/ChangeLog: * c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error operands early. gcc/testsuite/ChangeLog: * gcc.dg/pr101285-1.c: New test.	2023-10-18 15:11:38 -07:00
Prathamesh Kulkarni	3ec8ecb8e9	PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding. gcc/ChangeLog: PR tree-optimization/111648 * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1 chooses base element from arg, ensure that it's a natural stepped sequence. (build_vec_cst_rand): New param natural_stepped and use it to construct a naturally stepped sequence. (test_nunits_min_2): Add new unit tests Case 6 and Case 7.	2023-10-19 00:29:38 +05:30
Dimitar Dimitrov	fe9767eedc	pru: Implement TARGET_INSN_COST This patch slightly improves the embench-iot benchmark score for PRU code size. There is also small improvement in a few real-world firmware programs. Embench-iot size ------------------------------------------ Benchmark before after delta --------- ---- ---- ----- aha-mont64 4.15 4.15 0 crc32 6.04 6.04 0 cubic 21.64 21.62 -0.02 edn 6.37 6.37 0 huffbench 18.63 18.55 -0.08 matmult-int 5.44 5.44 0 md5sum 25.56 25.43 -0.13 minver 12.82 12.76 -0.06 nbody 15.09 14.97 -0.12 nettle-aes 4.75 4.75 0 nettle-sha256 4.67 4.67 0 nsichneu 3.77 3.77 0 picojpeg 4.11 4.11 0 primecount 7.90 7.90 0 qrduino 7.18 7.16 -0.02 sglib-combined 13.63 13.59 -0.04 slre 5.19 5.19 0 st 14.23 14.12 -0.11 statemate 2.34 2.34 0 tarfind 36.85 36.64 -0.21 ud 10.51 10.46 -0.05 wikisort 7.44 7.41 -0.03 --------- ----- ----- Geometric mean 8.42 8.40 -0.02 Geometric SD 2.00 2.00 0 Geometric range 12.68 12.62 -0.06 gcc/ChangeLog: * config/pru/pru.cc (pru_insn_cost): New function. (TARGET_INSN_COST): Define for PRU. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>	2023-10-18 20:16:20 +03:00
Georg-Johann Lay	67f7bf78ba	LibF7: Implement mul_mant for devices without MUL instruction. libgcc/config/avr/libf7/ * libf7-asm.sx (mul_mant): Implement for devices without MUL. * asm-defs.h (wmov) [!HAVE_MUL]: Fix regno computation. * t-libf7 (F7_ASM_FLAGS): Add -g0.	2023-10-18 19:00:09 +02:00
Andrew Carlotti	ff05a3e91d	aarch64: Replace duplicated selftests gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_test_fractional_cost): Test <= instead of testing < twice.	2023-10-18 16:23:24 +01:00
Jakub Jelinek	bc4bd69faf	cse: Workaround GCC < 5 bug in cse_insn [PR111852] Before the r5-3834 commit for PR63362, GCC 4.8-4.9 refuses to compile cse.cc which contains a variable with rtx_def type, because rtx_def contains a union with poly_uint16 element. poly_int template has defaulted default constructor and a variadic template constructor which could have empty parameter pack. GCC < 5 treated it as non-trivially constructible class and deleted rtunion and rtx_def default constructors. For the cse_insn purposes, all we need is a variable with size and alignment of rtx_def, not necessarily rtx_def itself, which we then memset to 0 and fill in like rtx is normally allocated from heap, so this patch for GCC_VERSION < 5000 uses an unsigned char array of the right size/alignment. 2023-10-18 Jakub Jelinek <jakub@redhat.com> PR bootstrap/111852 * cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of using rtx_def type for memory_extend_buf, use unsigned char array with size of rtx_def and its alignment.	2023-10-18 17:01:26 +02:00
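The idea of using raw storage with the size and alignment of a type, rather than a variable of the type itself, can be sketched in C11 as follows. `struct node` is a hypothetical stand-in for rtx_def, and the function name is made up for this example. The patch itself is C++ and treats the buffer as the object directly; this C sketch moves data in and out with `memcpy` to stay within C's aliasing rules.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rtx_def: a header field plus a union.  */
struct node {
    unsigned short code;
    union { long i; void *p; } u;
};

/* Stores a node with the given code in raw, zeroed storage and
   returns the code read back from that storage.  */
static unsigned short roundtrip_through_raw_storage(unsigned short code)
{
    /* An unsigned char array with the size and alignment of struct node.  */
    alignas(struct node) unsigned char buf[sizeof(struct node)];
    assert((uintptr_t)buf % alignof(struct node) == 0);

    memset(buf, 0, sizeof buf);        /* zero it, as for a fresh object */

    struct node tmp;
    memcpy(&tmp, buf, sizeof tmp);     /* start from the zeroed state */
    tmp.code = code;                   /* fill in the fields we need */
    memcpy(buf, &tmp, sizeof tmp);

    struct node out;
    memcpy(&out, buf, sizeof out);
    return out.code;
}
```

The point of the technique is that no constructor of the original type is ever needed, which is what sidesteps the deleted default constructor on the old compilers.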
Jason Merrill	ef10cb8683	diagnostic: add permerror variants with opt In the discussion of promoting some pedwarns to be errors by default, rather than move them all into -fpermissive it seems to me to make sense to support DK_PERMERROR with an option flag. This way will also work with -fpermissive, but users can also still use -Wno-error=narrowing to downgrade that specific diagnostic rather than everything affected by -fpermissive. So, for diagnostics that we want to make errors by default we can just change the pedwarn call to permerror. The tests check desired behavior for such a permerror in a system header with various flags. The patch preserves the existing permerror behavior of ignoring -w and system headers by default, but respecting them when downgraded to a warning by -fpermissive. This seems similar to but a bit better than the approach of forcing -pedantic-errors that I previously used for -Wnarrowing: specifically, in that now -w by itself is not enough to silence the -Wnarrowing error (integer-pack2.C). gcc/ChangeLog: * doc/invoke.texi: Move -fpermissive to Warning Options. * diagnostic.cc (update_effective_level_from_pragmas): Remove redundant system header check. (diagnostic_report_diagnostic): Move down syshdr/-w check. (diagnostic_impl): Handle DK_PERMERROR with an option number. (permerror): Add new overloads. * diagnostic-core.h (permerror): Declare them. gcc/cp/ChangeLog: * typeck2.cc (check_narrowing): Use permerror. gcc/testsuite/ChangeLog: * g++.dg/ext/integer-pack2.C: Add -fpermissive. * g++.dg/diagnostic/sys-narrow.h: New test. * g++.dg/diagnostic/sys-narrow1.C: New test. * g++.dg/diagnostic/sys-narrow1a.C: New test. * g++.dg/diagnostic/sys-narrow1b.C: New test. * g++.dg/diagnostic/sys-narrow1c.C: New test. * g++.dg/diagnostic/sys-narrow1d.C: New test. * g++.dg/diagnostic/sys-narrow1e.C: New test. * g++.dg/diagnostic/sys-narrow1f.C: New test. * g++.dg/diagnostic/sys-narrow1g.C: New test. * g++.dg/diagnostic/sys-narrow1h.C: New test. 
* g++.dg/diagnostic/sys-narrow1i.C: New test.	2023-10-18 10:00:25 -04:00
Tobias Burnus	af4bb22115	OpenMP: Avoid ICE with LTO and 'omp allocate' gcc/ChangeLog: * gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute so that the auxiliary statement list does not reach LTO. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-13a.f90: New test.	2023-10-18 13:05:35 +02:00
Jakub Jelinek	f1744dd50b	tree-ssa-math-opts: Fix up match_uaddc_usubc [PR111845] GCC ICEs on the first testcase. Successful match_uaddc_usubc ends up with some dead stmts which DCE will remove (hopefully) later all. The ICE is because one of the dead stmts refers to a freed SSA_NAME. The code already gsi_removes a couple of stmts in the /* Remove some statements which can't be kept in the IL because they use SSA_NAME whose setter is going to be removed too. / section for the same reason (the reason for the freed SSA_NAMEs is that we don't really have a replacement for those cases - all we have after a match is combined overflow from the addition/subtraction of 2 operands + a [0, 1] carry in, but not the individual overflows from the former 2 additions), but for the last (most significant) limb case, where we try to match x = op1 + op2 + carry1 + carry2; or x = op1 - op2 - carry1 - carry2; we just gsi_replace the final stmt, but left around the 2 temporary stmts as dead; if we were unlucky enough that those referenced the carry flag that went away, it ICEs. So, the following patch remembers those temporary statements (rather than trying to rediscover them more expensively) and removes them before the final one is replaced. While working on it, I've noticed we didn't support all the reassociated possibilities of writing the addition of 4 operands or subtracting 3 operands from one, we supported e.g. x = ((op1 + op2) + op3) + op4; x = op1 + ((op2 + op3) + op4); but not x = (op1 + (op2 + op3)) + op4; x = op1 + (op2 + (op3 + op4)); Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what we were searching for (if non-NULL) - rhs[0] is inspected in the first loop and has different handling for the MINUS_EXPR case. 
2023-10-18 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/111845 * tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary statements for the 4 operand addition or subtraction of 3 operands from 1 operand cases and remove them when successful. Look for nested additions even from rhs[2], not just rhs[1]. * gcc.dg/pr111845.c: New test. * gcc.target/i386/pr111845.c: New test.	2023-10-18 12:37:40 +02:00
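For readers unfamiliar with the add-with-carry idiom, here is a minimal portable sketch of one limb of a multi-word addition, followed by the four parenthesizations of a 4-operand sum mentioned in the commit message. The names `add_limb` and `sum4_*` are hypothetical, and nothing here claims that GCC recognizes these exact source forms; the sketch only shows that, for unsigned arithmetic, all four groupings yield the same value.

```c
#include <assert.h>
#include <stdint.h>

/* One limb of a multi-word addition: returns x + y + carry_in modulo 2^64
   and stores the outgoing carry (0 or 1) in *carry_out.  */
static uint64_t add_limb(uint64_t x, uint64_t y, uint64_t carry_in,
                         uint64_t *carry_out)
{
    assert(carry_in <= 1 && carry_out != 0);
    uint64_t s1 = x + y;
    uint64_t c1 = s1 < x;          /* overflow of the first addition */
    uint64_t s2 = s1 + carry_in;
    uint64_t c2 = s2 < s1;         /* overflow of the second addition */
    *carry_out = c1 + c2;          /* both cannot overflow, so 0 or 1 */
    return s2;
}

/* The four ways of grouping a 4-operand sum listed in the message.  */
static uint64_t sum4_left(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
    return ((a + b) + c) + d;
}

static uint64_t sum4_tail(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
    return a + ((b + c) + d);
}

static uint64_t sum4_mid(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
    return (a + (b + c)) + d;
}

static uint64_t sum4_right(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
    return a + (b + (c + d));
}
```

Since unsigned addition wraps and is associative, a matcher that accepts only some of these groupings misses code that computes exactly the same result.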
Tobias Burnus	d3961765b5	nvptx: Use fatal_error when -march= is missing not an assert [PR111093] gcc/ChangeLog: PR target/111093 * config/nvptx/nvptx.cc (nvptx_option_override): Issue fatal error instead of an assert ICE when no -march= has been specified.	2023-10-18 12:23:38 +02:00
Iain Sandoe	a4184c8a65	Darwin: Check as for .build_version support and use it if available. This adds support for the minimum OS version data in assembler files. At present, we have no mechanism to detect the SDK version in use, and so that is omitted from build_versions. We follow the implementation in clang, '.build_version' is only emitted (where supported) for target macOS versions >= 10.14. For earlier macOS we fall back to using a '.macosx_version_min' directive. This latter is also emitted when the assembler supports it, but not build_version. gcc/ChangeLog: * config.in: Regenerate. * config/darwin.cc (darwin_file_start): Add assembler directives for the target OS version, where these are supported by the assembler. (darwin_override_options): Check for building >= macOS 10.14. * configure: Regenerate. * configure.ac: Check for assembler support of .build_version directives. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>	2023-10-18 10:36:22 +01:00
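The selection rule described in this message can be sketched as a small decision function. Everything named here (`choose_directive`, the enum, the two capability flags) is hypothetical and only models the prose above: `.build_version` when the assembler supports it and the target is macOS 10.14 or later, otherwise `.macosx_version_min` when the assembler supports that. Emitting nothing when neither applies is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum version_directive {
    DIRECTIVE_NONE,
    DIRECTIVE_MACOSX_VERSION_MIN,
    DIRECTIVE_BUILD_VERSION
};

/* Picks the directive for a target macOS version major.minor, given which
   directives the assembler was found to support at configure time.  */
static enum version_directive
choose_directive(int major, int minor,
                 bool as_has_build_version, bool as_has_version_min)
{
    assert(major >= 0 && minor >= 0);
    bool at_least_10_14 = major > 10 || (major == 10 && minor >= 14);

    if (as_has_build_version && at_least_10_14)
        return DIRECTIVE_BUILD_VERSION;
    if (as_has_version_min)
        return DIRECTIVE_MACOSX_VERSION_MIN;
    return DIRECTIVE_NONE;   /* assumption: emit nothing otherwise */
}
```

Comparing the version as a (major, minor) pair keeps macOS 11 and later on the `.build_version` path.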
Tamar Christina	dd28f90c95	ifcvt: rewrite args handling to remove lookups This refactors the code to remove the args cache and index lookups in favor of a single structure. It also, again, removes the use of std::sort as previously requested but avoids the new asserts in trunk. gcc/ChangeLog: PR tree-optimization/109154 * tree-if-conv.cc (INCLUDE_ALGORITHM): Remove. (typedef struct ifcvt_arg_entry): New. (cmp_arg_entry): New. (gen_phi_arg_condition, gen_phi_nest_statement, predicate_scalar_phi): Use them.	2023-10-18 09:53:48 +01:00
Tamar Christina	04227acbe9	AArch64: Rewrite simd move immediate patterns to new syntax This rewrites the simd MOV patterns to use the new compact syntax. No change in semantics is expected. This will be needed in follow on patches. This also merges the splits into the define_insn which will also be needed soon. gcc/ChangeLog: PR tree-optimization/109154 * config/aarch64/aarch64-simd.md (aarch64_simd_mov<VDMOV:mode>): Rewrite to new syntax. (aarch64_simd_mov<VQMOV:mode>): Rewrite to new syntax and merge in splits.	2023-10-18 09:53:48 +01:00
Tamar Christina	b0fe8f2f96	middle-end: ifcvt: Allow any const IFN in conditional blocks When ifcvt was initially added, masking was not a thing, and as such it was rather conservative in what it supported. For builtins it only allowed C99 builtin functions which it knew it could fold away. These days the vectorizer is able to deal with needing to mask IFNs itself. vectorizable_call is able to vectorize the IFN by emitting a VEC_PERM_EXPR after the operation to emulate the masking. This is then used by match.pd to convert the IFN into a masked variant if it's available. For these reasons the restriction in ifconvert is no longer required and we needlessly block vectorization when we can effectively handle the operations. Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Note: This patch is part of a test series and tests for it are added in the AArch64 patch that adds support for the optab. gcc/ChangeLog: PR tree-optimization/109154 * tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.	2023-10-18 09:53:47 +01:00
Tamar Christina	4b39aeef59	middle-end: Fold vec_cond into conditional ternary or binary operation when sharing operand [PR109154] When we have a vector conditional on a masked target which is doing a selection on the result of a conditional operation where one of the operands of the conditional operation is the other operand of the select, then we can fold the vector conditional into the operation. Concretely this transforms c = mask1 ? (masked_op mask2 a b) : b into c = masked_op (mask1 & mask2) a b The mask is then propagated upwards by the compiler. In the SVE case we don't end up needing a mask AND here since `mask2` will end up in the instruction creating `mask` which gives us a natural &. Such transformations are more common now in GCC 13+ as PRE has now started unsharing of common code in case it can make one branch fully independent. e.g. in this case `b` becomes a loop invariant value after PRE. This transformation removes the extra select for masked architectures but doesn't fix the general case. gcc/ChangeLog: PR tree-optimization/109154 * match.pd: Add new cond_op rule. gcc/testsuite/ChangeLog: PR tree-optimization/109154 * gcc.target/aarch64/sve/pre_cond_share_1.c: New test.	2023-10-18 09:53:47 +01:00
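The fold can be checked with a per-lane scalar model. The sketch below uses addition as the masked operation and assumes that inactive lanes of the masked operation yield `b`; that assumption is what makes the two forms agree here and is not a statement about the actual match.pd rule. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define LANES 4

/* Per-lane model of a masked add.  Assumption: inactive lanes yield b.  */
static void masked_add(const bool *mask, const unsigned *a,
                       const unsigned *b, unsigned *out)
{
    assert(mask && a && b && out);
    for (int i = 0; i < LANES; i++)
        out[i] = mask[i] ? a[i] + b[i] : b[i];
}

/* c = mask1 ? (masked_op mask2 a b) : b  */
static void select_of_masked_op(const bool *mask1, const bool *mask2,
                                const unsigned *a, const unsigned *b,
                                unsigned *c)
{
    unsigned t[LANES];
    masked_add(mask2, a, b, t);
    for (int i = 0; i < LANES; i++)
        c[i] = mask1[i] ? t[i] : b[i];
}

/* c = masked_op (mask1 & mask2) a b  */
static void folded_masked_op(const bool *mask1, const bool *mask2,
                             const unsigned *a, const unsigned *b,
                             unsigned *c)
{
    bool m[LANES];
    for (int i = 0; i < LANES; i++)
        m[i] = mask1[i] && mask2[i];
    masked_add(m, a, b, c);
}

/* Tries every combination of the two 4-lane masks and reports whether
   the two forms produce identical results in all of them.  */
static bool forms_agree(void)
{
    const unsigned a[LANES] = {1, 2, 3, 4};
    const unsigned b[LANES] = {10, 20, 30, 40};

    for (unsigned bits1 = 0; bits1 < 16; bits1++) {
        for (unsigned bits2 = 0; bits2 < 16; bits2++) {
            bool m1[LANES], m2[LANES];
            unsigned x[LANES], y[LANES];
            for (int i = 0; i < LANES; i++) {
                m1[i] = (bits1 >> i) & 1u;
                m2[i] = (bits2 >> i) & 1u;
            }
            select_of_masked_op(m1, m2, a, b, x);
            folded_masked_op(m1, m2, a, b, y);
            for (int i = 0; i < LANES; i++)
                if (x[i] != y[i])
                    return false;
        }
    }
    return true;
}
```

The folded form needs one masked operation instead of a masked operation followed by a select, which is the saving the commit message describes.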
Xi Ruoyao	b588dcb77e	LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc During the review of an LLVM change [1], on LA464 we found that zeroing an fcc with fcmp.caf.s is much faster than a movgr2cf from $r0. [1]: https://github.com/llvm/llvm-project/pull/69300 gcc/ChangeLog: * config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for zeroing a fcc.	2023-10-18 16:25:05 +08:00
Richard Biener	b0372ef12f	Re-instantiate integer mask to traditional vector mask support The following allows passing integer mask data as a traditional vector mask for OMP SIMD clone calls, which is required due to the limited set of OMP SIMD clones in the x86 ABI when using AVX512 but a preferred vector size of 256 bits. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Relax check to again allow passing integer mode masks as traditional vectors.	2023-10-18 10:15:21 +02:00
Tamar Christina	60c231cb65	middle-end: maintain LCSSA throughout loop peeling This final patch updates peeling to maintain LCSSA all the way through. It's significantly easier to maintain it during peeling while we still know where all new edges connect rather than touching it up later as is currently being done. This allows us to remove many of the helper functions that touch up the loops at various parts. The only complication is for loop distribution where we should be able to use the same, however ldist depending on whether redirect_lc_phi_defs is true or not will either try to maintain a limited LCSSA form itself or removes are non-virtual phis. The problem here is that if we maintain LCSSA then in some cases the blocks connecting the two loops get PHIs to keep the loop IV up to date. However there is no loop, the guard condition is rewritten as 0 != 0, to the "loop" always exits. However due to the PHI nodes the probabilities get completely wrong. It seems to think that the impossible exit is the likely edge. This causes incorrect warnings and the presence of the PHIs prevent the blocks to be simplified. While it may be possible to make ldist work with LCSSA form, doing so seems more work than not. For that reason the peeling code has an additional parameter used by only ldist to not connect the two loops during peeling. This preserves the current behaviour from ldist until I can dive into the implementation more. Hopefully that's ok for now. gcc/ChangeLog: * tree-loop-distribution.cc (copy_loop_before): Request no LCSSA. * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional asserts. (slpeel_tree_duplicate_loop_to_edge_cfg): Keep LCSSA during peeling. (find_guard_arg): Look value up through explicit edge and original defs. (vect_do_peeling): Use it. (slpeel_update_phi_nodes_for_guard2): Take explicit exit edge. (slpeel_update_phi_nodes_for_lcssa, slpeel_update_phi_nodes_for_loops): Remove. 
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize phi. * tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Add optional param to turn off LCSSA mode.	2023-10-18 09:03:06 +01:00
Tamar Christina	0c8522870e	middle-end: updated niters analysis to handle multiple exits. This second part updates niters analysis to be able to analyze any number of exits. If we have multiple exits we determine the main exit by finding the first counting IV. The change allows the vectorizer to pass analysis for multiple loops, but we later gracefully reject them. It does however allow us to test if the exit handling is using the right exit everywhere. Additionally since we analyze all exits, we now return all conditions for them and determine which condition belongs to the main exit. The main condition is needed because the vectorizer needs to ignore the main IV condition during vectorization as it will replace it during codegen. To track versioned loops we extend the contract between ifcvt and the vectorizer to store the exit number in aux so that we can match it up again during peeling. gcc/ChangeLog: * tree-if-conv.cc (tree_if_conversion): Record exits in aux. * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use it. * tree-vect-loop.cc (vect_get_loop_niters): Determine main exit. (vec_init_loop_exit_info): Extend analysis when multiple exits. (vect_analyze_loop_form): Record conds and determine main cond. (vect_create_loop_vinfo): Extend bookkeeping of conds. (vect_analyze_loop): Release conds. * tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS, LOOP_VINFO_LOOP_IV_COND): New. (struct vect_loop_form_info): Add conds, alt_loop_conds; (struct loop_vec_info): Add conds, loop_iv_cond.	2023-10-18 09:02:40 +01:00
Tamar Christina	d65e38e616	middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables This is extracted out of the patch series to support early break vectorization in order to simplify the review of that patch series. The goal of this one is to separate out the refactoring from the new functionality. This first patch separates out the vectorizer's definition of an exit to their own values inside loop_vinfo. During vectorization we can have three separate copies for each loop: scalar, vectorized, epilogue. The scalar loop can also be the versioned loop before peeling. Because of this we track 3 different exits inside loop_vinfo corresponding to each of these loops. Additionally each function that uses an exit, when not obviously clear which exit is needed will now take the exit explicitly as an argument. This is because often times the callers switch the loops being passed around. While the caller knows which loops it is, the callee does not. For now the loop exits are simply initialized to same value as before determined by single_exit (..). No change in functionality is expected throughout this patch series. gcc/ChangeLog: * tree-loop-distribution.cc (copy_loop_before): Pass exit explicitly. (loop_distribution::distribute_loop): Bail out of not single exit. * tree-scalar-evolution.cc (get_loop_exit_condition): New. * tree-scalar-evolution.h (get_loop_exit_condition): New. * tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Pass exit explicitly. * tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors, vect_set_loop_condition_partial_vectors_avx512, vect_set_loop_condition_normal, vect_set_loop_condition): Explicitly take exit. (slpeel_tree_duplicate_loop_to_edge_cfg): Explicitly take exit and return new peeled corresponding peeled exit. (slpeel_can_duplicate_loop_p): Explicitly take exit. (find_loop_location): Handle not knowing an explicit exit. 
(vect_update_ivs_after_vectorizer, vect_gen_vector_loop_niters_mult_vf, find_guard_arg, slpeel_update_phi_nodes_for_loops, slpeel_update_phi_nodes_for_guard2): Use new exits. (vect_do_peeling): Update bookkeeping to keep track of exits. * tree-vect-loop.cc (vect_get_loop_niters): Explicitly take exit to analyze. (vec_init_loop_exit_info): New. (_loop_vec_info::_loop_vec_info): Initialize vec_loop_iv, vec_epilogue_loop_iv, scalar_loop_iv. (vect_analyze_loop_form): Initialize exits. (vect_create_loop_vinfo): Set main exit. (vect_create_epilog_for_reduction, vectorizable_live_operation, vect_transform_loop): Use it. (scale_profile_for_vect_loop): Explicitly take exit to scale. * tree-vectorizer.cc (set_uid_loop_bbs): Initialize loop exit. * tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_EPILOGUE_IV_EXIT, LOOP_VINFO_SCALAR_IV_EXIT): New. (struct loop_vec_info): Add vec_loop_iv, vec_epilogue_loop_iv, scalar_loop_iv. (vect_set_loop_condition, slpeel_can_duplicate_loop_p, slpeel_tree_duplicate_loop_to_edge_cfg): Take explicit exits. (vec_init_loop_exit_info): New. (struct vect_loop_form_info): Add loop_exit.	2023-10-18 09:02:12 +01:00
Tamar Christina	46937e1b47	middle-end: refactor vectorizable_comparison to make the main body re-usable. Vectorization of a gcond starts off essentially the same as vectorizing a comparison, with the only difference being how the operands are extracted. This refactors vectorizable_comparison such that we now have a generic function that can be used from vectorizable_early_break. The refactoring splits the gassign checks and actual validation/codegen off to a helper function. No change in functionality expected. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body to ... (vectorizable_comparison_1): ...This.	2023-10-18 09:01:41 +01:00
Juzhe-Zhong	c51040cb43	RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx This patch optimizes the following permutation with a consecutive index pattern: typedef char vnx16i __attribute__ ((vector_size (16))); #define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15 vnx16i __attribute__ ((noinline, noclone)) test_1 (vnx16i x, vnx16i y) { return __builtin_shufflevector (x, y, MASK_16); } Before this patch: lui a5,%hi(.LC0) addi a5,a5,%lo(.LC0) vsetivli zero,16,e8,m1,ta,ma vle8.v v3,0(a5) vle8.v v2,0(a1) vrgather.vv v1,v2,v3 vse8.v v1,0(a0) ret After this patch: vsetivli zero,16,e8,mf8,ta,ma vle8.v v2,0(a1) vsetivli zero,4,e32,mf2,ta,ma vrgather.vi v1,v2,3 vsetivli zero,16,e8,mf8,ta,ma vse8.v v1,0(a0) ret Overall this removes 1 instruction, a vector load, which is much more expensive than VL toggling. Also, with this patch, we use vrgather.vi, which reduces vector register consumption by 1. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function. (expand_vec_perm_const_1): Add consecutive pattern recognition. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/def.h: Add new test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.	2023-10-18 15:58:53 +08:00
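The equivalence this optimization relies on can be checked in plain C: a 16-byte shuffle whose indices repeat 12, 13, 14, 15 gives the same bytes as viewing the vector as four 32-bit elements and replicating element 3. The helper names below are hypothetical and model only the data movement, not the RISC-V instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte shuffle from a single 16-byte source: out[i] = x[idx[i]].  */
static void shuffle_bytes(const uint8_t *x, const uint8_t *idx, uint8_t *out)
{
    for (int i = 0; i < 16; i++) {
        assert(idx[i] < 16);
        out[i] = x[idx[i]];
    }
}

/* Treats x as four 32-bit elements and copies element elt to every slot.  */
static void broadcast_u32(const uint8_t *x, unsigned elt, uint8_t *out)
{
    assert(elt < 4);
    for (int i = 0; i < 4; i++)
        memcpy(out + 4 * i, x + 4 * elt, 4);
}

/* Returns 1 if the MASK_16 shuffle and the element-3 broadcast agree.  */
static int shuffle_equals_broadcast(void)
{
    static const uint8_t mask16[16] = {
        12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15
    };
    uint8_t x[16], via_shuffle[16], via_broadcast[16];

    for (int i = 0; i < 16; i++)
        x[i] = (uint8_t)(0xA0 + i);

    shuffle_bytes(x, mask16, via_shuffle);
    broadcast_u32(x, 3, via_broadcast);
    return memcmp(via_shuffle, via_broadcast, 16) == 0;
}
```

Because the broadcast index fits in an immediate, no index vector has to be loaded from memory, which matches the load that disappears in the "after" assembly.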
Tobias Burnus	372c5da215	fortran/intrinsic.texi: Add 'intrinsic' to SIGNAL example gcc/fortran/ChangeLog: * intrinsic.texi (signal): Add 'intrinsic :: signal, sleep' to the example to make it safer.	2023-10-18 09:29:56 +02:00
Haochen Jiang	f019251ac9	Initial Panther Lake Support gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Add Panther Lake. * common/config/i386/i386-common.cc (processor_name): Ditto. (processor_alias_table): Ditto. * common/config/i386/i386-cpuinfo.h (enum processor_types): Add INTEL_PANTHERLAKE. * config.gcc: Add -march=pantherlake. * config/i386/driver-i386.cc (host_detect_local_cpu): Refactor the if clause. Handle pantherlake. * config/i386/i386-c.cc (ix86_target_macros_internal): Handle pantherlake. * config/i386/i386-options.cc (processor_cost_table): Ditto. (m_PANTHERLAKE): New. (m_CORE_HYBRID): Add pantherlake. * config/i386/i386.h (enum processor_type): Ditto. * doc/extend.texi: Ditto. * doc/invoke.texi: Ditto. gcc/testsuite/ChangeLog: * g++.target/i386/mv16.C: Ditto. * gcc.target/i386/funcspec-56.inc: Handle new march.	2023-10-18 14:40:59 +08:00
Haochen Jiang	2aa97c0da4	x86: Add m_CORE_HYBRID for hybrid clients tuning gcc/ChangeLog: * config/i386/i386-options.cc (m_CORE_HYBRID): New. * config/i386/x86-tune.def: Replace hybrid client tune with m_CORE_HYBRID.	2023-10-18 14:40:22 +08:00
Haochen Jiang	7370c479dd	Initial Clearwater Forest Support gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Handle Clearwater Forest. * common/config/i386/i386-common.cc (processor_name): Add Clearwater Forest. (processor_alias_table): Ditto. * common/config/i386/i386-cpuinfo.h (enum processor_types): Add INTEL_CLEARWATERFOREST. * config.gcc: Add -march=clearwaterforest. * config/i386/driver-i386.cc (host_detect_local_cpu): Handle clearwaterforest. * config/i386/i386-c.cc (ix86_target_macros_internal): Ditto. * config/i386/i386-options.cc (processor_cost_table): Ditto. (m_CLEARWATERFOREST): New. (m_CORE_ATOM): Add clearwaterforest. * config/i386/i386.h (enum processor_type): Ditto. * doc/extend.texi: Ditto. * doc/invoke.texi: Ditto. gcc/testsuite/ChangeLog: * g++.target/i386/mv16.C: Ditto. * gcc.target/i386/funcspec-56.inc: Handle new march.	2023-10-18 14:39:53 +08:00
liuhongt	cead92b7fc	Support 32/64-bit vectorization for _Float16 fma related operations. gcc/ChangeLog: * config/i386/mmx.md (fma<mode>4): New expander. (fms<mode>4): Ditto. (fnma<mode>4): Ditto. (fnms<mode>4): Ditto. (vec_fmaddsubv4hf4): Ditto. (vec_fmsubaddv4hf4): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-fmaddsubhf-1.c: New test. * gcc.target/i386/part-vect-fmahf-1.c: New test.	2023-10-18 09:14:57 +08:00
Juzhe-Zhong	cf7739d4a6	RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832] Last time, Robin mentioned that dynamic LMUL causes an ICE in SPEC: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html which is caused by an assertion failure. When we enable more tests in rvv.exp with dynamic LMUL, the issue can be reproduced and has a PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832 Now, we enable more tests in rvv.exp in this patch and fix the bug. PR target/111832 gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.	2023-10-18 09:03:09 +08:00
GCC Administrator	fb69acffa9	Daily bump.	2023-10-18 00:17:58 +00:00
Richard Sandiford	773306e9ef	aarch64: Put LR save slot first in more cases Now that the prologue and epilogue code iterates over saved registers in offset order, we can put the LR save slot first without compromising LDP/STP formation. This isn't worthwhile when shadow call stacks are enabled, since the first two registers are also push/pop candidates, and LR cannot be popped when shadow call stacks are enabled. (LR is instead loaded first and compared against the shadow stack's value.) But otherwise, it seems better to put the LR save slot first, to reduce unnecessary variation with the layout for stack clash protection. gcc/ * config/aarch64/aarch64.cc (aarch64_layout_frame): Don't make the position of the LR save slot dependent on stack clash protection unless shadow call stacks are enabled. gcc/testsuite/ * gcc.target/aarch64/test_frame_2.c: Expect x30 to come before x19. * gcc.target/aarch64/test_frame_4.c: Likewise. * gcc.target/aarch64/test_frame_7.c: Likewise. * gcc.target/aarch64/test_frame_10.c: Likewise.	2023-10-17 23:46:33 +01:00
Richard Sandiford	5758585080	aarch64: Use vecs to store register save order aarch64_save/restore_callee_saves looped over registers in register number order. This in turn meant that we could only use LDP and STP for registers that were consecutive both number-wise and offset-wise (after unsaved registers are excluded). This patch instead builds lists of the registers that we've decided to save, in offset order. We can then form LDP/STP pairs regardless of register number order, which in turn means that we can put the LR save slot first without losing LDP/STP opportunities. gcc/ * config/aarch64/aarch64.h (aarch64_frame): Add vectors that store the lists of saved GPRs, FPRs and predicate registers. * config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize the lists of saved registers. Use them to choose push candidates. Invalidate pop candidates if we're not going to do a pop. (aarch64_next_callee_save): Delete. (aarch64_save_callee_saves): Take a list of registers, rather than a range. Make !skip_wb select only write-back candidates. (aarch64_expand_prologue): Update calls accordingly. (aarch64_restore_callee_saves): Take a list of registers, rather than a range. Always skip pop candidates. Also skip LR if shadow call stacks are enabled. (aarch64_expand_epilogue): Update calls accordingly. gcc/testsuite/ * gcc.target/aarch64/sve/pcs/stack_clash_2.c: Expect restores to happen in offset order. * gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.	2023-10-17 23:46:33 +01:00
Richard Sandiford	aeb3f0436f	Handle epilogues that contain jumps The prologue/epilogue pass allows the prologue sequence to contain jumps. The sequence is then partitioned into basic blocks using find_many_sub_basic_blocks. This patch treats epilogues in a similar way. Since only one block might need to be split, the patch (re)introduces a find_sub_basic_blocks routine to handle a single block. The new routine hard-codes the assumption that split_block will chain the new block immediately after the original block. The routine doesn't try to replicate the fix for PR81030, since that was specific to gimple->rtl expansion. The patch is needed for follow-on aarch64 patches that add conditional code to the epilogue. The tests are part of those patches. gcc/ * cfgbuild.h (find_sub_basic_blocks): Declare. * cfgbuild.cc (update_profile_for_new_sub_basic_block): New function, split out from... (find_many_sub_basic_blocks): ...here. (find_sub_basic_blocks): New function. * function.cc (thread_prologue_and_epilogue_insns): Handle epilogues that contain jumps.	2023-10-17 23:45:43 +01:00
Andrew Pinski	5e4abf4233	ssa_name_has_boolean_range vs signed-boolean:31 types This turns out to be a latent bug in ssa_name_has_boolean_range where it would return true for all boolean types but all of the uses of ssa_name_has_boolean_range were expecting 0/1 as the range rather than [-1,0]. So when I fixed vector lower to do all comparisons in boolean_type rather than still in the signed-boolean:31 type (to fix a different issue), the pattern in match for `-(type)!A -> (type)A - 1.` would assume A (which was signed-boolean:31) had a range of [0,1] which broke down and sometimes gave us -1/-2 as values rather than what we were expecting of -1/0. This was the simplest patch I found while testing. We have another way of matching [0,1] range which we could use instead of ssa_name_has_boolean_range except that uses only the global ranges rather than the local range (during VRP). I tried to clean this up slightly by using gimple_match_zero_one_valuedp inside ssa_name_has_boolean_range but that failed because it uses only the global ranges. I then tried to change get_nonzero_bits to use the local ranges at the optimization time but that failed also because we would remove branches to __builtin_unreachable during evrp and lose information as we don't set the global ranges during evrp. OK? Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/110817 gcc/ChangeLog: * tree-ssanames.cc (ssa_name_has_boolean_range): Remove the check for boolean type as they don't have "[0,1]" range. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr110817-1.c: New test. * gcc.c-torture/execute/pr110817-2.c: New test. * gcc.c-torture/execute/pr110817-3.c: New test.	2023-10-17 22:44:19 +00:00
Marek Polacek	1fbb7d75ab	c++: accepts-invalid with =delete("") [PR111840] r6-2367 added a DECL_INITIAL check to cp_parser_simple_declaration so that we don't emit multiple errors in g++.dg/parse/error57.C. But that means we don't diagnose int f1() = delete("george_crumb"); anymore, because fn decls often have error_mark_node in their DECL_INITIAL. (The code may be allowed one day via https://wg21.link/P2573R0.) I was hoping I could use cp_parser_error_occurred but that would regress error57.C. PR c++/111840 gcc/cp/ChangeLog: * parser.cc (cp_parser_simple_declaration): Do cp_parser_error for FUNCTION_DECLs. gcc/testsuite/ChangeLog: * g++.dg/parse/error65.C: New test.	2023-10-17 17:44:59 -04:00
Marek Polacek	765c3b8f82	c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660] My recent patch introducing cp_fold_immediate_r caused exponential compile time with nested COND_EXPRs. The problem is that the COND_EXPR case recursively walks the arms of a COND_EXPR, but after processing both arms it doesn't end the walk; it proceeds to walk the sub-expressions of the outermost COND_EXPR, triggering again walking the arms of the nested COND_EXPR, and so on. This patch brings the compile time down to about 0m0.030s. The ff_fold_immediate flag is unused after this patch but since I'm using it in the P2564 patch, I'm not removing it now. Maybe at_eof can be used instead and then we can remove ff_fold_immediate. PR c++/111660 gcc/cp/ChangeLog: * cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Don't handle it here. (cp_fold_r): Handle COND_EXPR here. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/hog1.C: New test. * g++.dg/cpp2a/consteval36.C: New test.	2023-10-17 17:41:44 -04:00
Jason Merrill	bac21b7ea6	c++: mangling tweaks Most of this is introducing the abi_check function to reduce the verbosity of most places that check -fabi-version. The start_mangling change is to avoid needing to zero-initialize additional members of the mangling globals, though I'm not actually adding any. The comment documents existing semantics. gcc/cp/ChangeLog: * mangle.cc (abi_check): New. (write_prefix, write_unqualified_name, write_discriminator) (write_type, write_member_name, write_expression) (write_template_arg, write_template_param): Use it. (start_mangling): Assign from {}. * cp-tree.h: Update comment.	2023-10-17 17:20:02 -04:00
Nathaniel Shead	4f8700078c	c++: Add missing auto_diagnostic_groups to constexpr.cc gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_dynamic_cast_fn): Add missing auto_diagnostic_group. (cxx_eval_call_expression): Likewise. (diag_array_subscript): Likewise. (outside_lifetime_error): Likewise. (potential_constant_expression_1): Likewise. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Marek Polacek <polacek@redhat.com>	2023-10-17 16:19:02 -04:00

204669 Commits