204669 Commits

Author SHA1 Message Date
Alex Coplan
583ca5f599 aarch64, testsuite: Prevent stp in lr_free_1.c
The test is looking for individual stores which are able to be merged
into stp instructions.  The test currently passes -fno-schedule-fusion
-fno-peephole2, presumably to prevent these stores from being turned
into stps, but this is no longer sufficient with the new ldp/stp fusion
pass.

As such, we add --param=aarch64-stp-policy=never to prevent stps being
formed.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/lr_free_1.c: Add
	--param=aarch64-stp-policy=never to dg-options.
2023-10-19 11:12:23 +01:00
Alex Coplan
505f1202e3 rtl-ssa: Support inferring uses of mem in change_insns
Currently, rtl_ssa::change_insns requires all new uses and defs to be
specified explicitly.  This turns out to be rather inconvenient for
forming load pairs in the new aarch64 load pair pass, as the pass has to
determine which mem def the final load pair consumes, and then obtain or
create a suitable use (i.e. significant bookkeeping, just to keep the
RTL-SSA IR consistent).  It turns out to be much more convenient to
allow change_insns to infer which def is consumed and create a suitable
use of mem itself.  This patch does that.

gcc/ChangeLog:

	* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new
	parameter to give final insn position, infer use of mem if it isn't
	specified explicitly.
	(function_info::change_insns): Pass down final insn position to
	finalize_new_accesses.
	* rtl-ssa/functions.h: Add parameter to finalize_new_accesses.
2023-10-19 11:12:23 +01:00
Alex Coplan
ba230aa1b8 rtl-ssa: Add entry point to allow re-parenting uses
This is needed by the upcoming aarch64 load pair pass, as it can
re-order stores (when alias analysis determines this is safe) and thus
change which mem def a given use consumes (in the RTL-SSA view, there is
no alias disambiguation of memory).

gcc/ChangeLog:

	* rtl-ssa/accesses.cc (function_info::reparent_use): New.
	* rtl-ssa/functions.h (function_info): Declare new member
	function reparent_use.
2023-10-19 11:12:22 +01:00
Alex Coplan
c95aab23c1 rtl-ssa: Add drop_memory_access helper
Add a helper routine to access-utils.h which removes the memory access
from an access_array, if it has one.

gcc/ChangeLog:

	* rtl-ssa/access-utils.h (drop_memory_access): New.
2023-10-19 11:12:22 +01:00
Alex Coplan
c338083377 rtl-ssa: Fix bug in function_info::add_insn_after
In the case that !insn->is_debug_insn () && next->is_debug_insn (), this
function was missing an update of the prev pointer on the first nondebug
insn following the sequence of debug insns starting at next.

This can lead to corruption of the insn chain, in that we end up with:

  insn->next_any_insn ()->prev_any_insn () != insn

in this case.  This patch fixes that.

gcc/ChangeLog:

	* rtl-ssa/insns.cc (function_info::add_insn_after): Ensure we
	update the prev pointer on the following nondebug insn in the
	case that !insn->is_debug_insn () && next->is_debug_insn ().
2023-10-19 11:12:22 +01:00
Haochen Jiang
faa0e82b40 x86: Correct ISA enabled for clients since Arrow Lake
gcc/ChangeLog:

	* config/i386/i386.h: Correct the ISA enabled for Arrow Lake.
	Also make Clearwater Forest depend on Sierra Forest.
	* config/i386/i386-options.cc: Revise the order of the macro
	definition to avoid confusion.
	* doc/extend.texi: Revise documentation.
	* doc/invoke.texi: Correct documentation.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/funcspec-56.inc: Group Clearwater Forest
	with atom cores.
2023-10-19 17:08:36 +08:00
Andrew Stubbs
56ed1055b2 amdgcn: deprecate Fiji device and multilib
LLVM wants to remove it, which breaks our build.  This patch means that
most users won't notice that change, when it comes, and those that do will
have chosen to enable Fiji explicitly.

I'm selecting gfx900 as the new default as that's the least likely for users
to want, which means most users will specify -march explicitly, which means
we'll be free to change the default again, when we need to, without breaking
anybody's makefiles.

gcc/ChangeLog:

	* config.gcc (amdgcn): Switch default to --with-arch=gfx900.
	Implement support for --with-multilib-list.
	* config/gcn/t-gcn-hsa: Likewise.
	* doc/install.texi: Likewise.
	* doc/invoke.texi: Mark Fiji deprecated.
2023-10-19 09:46:57 +01:00
Jiahao Xu
8f4bbdc28d LoongArch: Implement the new vector cost model framework.
This patch makes LoongArch use the new vector hooks and implements the costing
function determine_suggested_unroll_factor, so that it can suggest the
unroll factor for a given loop being vectorized, based on the vec_ops analysis
during vector costing and the available issue information.  It follows the
aarch64 and rs6000 ports.

The patch also reduces the cost of unaligned stores, making it equal to the
cost of aligned ones in order to avoid odd alignment peeling.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_vector_costs): Inherit from
	vector_costs.  Add a constructor.
	(loongarch_vector_costs::add_stmt_cost): Use adjust_cost_for_freq to
	adjust the cost for inner loops.
	(loongarch_vector_costs::count_operations): New function.
	(loongarch_vector_costs::determine_suggested_unroll_factor): Ditto.
	(loongarch_vector_costs::finish_cost): Ditto.
	(loongarch_builtin_vectorization_cost): Adjust.
	* config/loongarch/loongarch.opt (loongarch-vect-unroll-limit): New parameter.
	(loongarch-vect-issue-info): Ditto.
	(mmemvec-cost): Delete.
	* config/loongarch/genopts/loongarch.opt.in
	(loongarch-vect-unroll-limit): Ditto.
	(loongarch-vect-issue-info): Ditto.
	(mmemvec-cost): Delete.
	* doc/invoke.texi (loongarch-vect-unroll-limit): Document new option.
2023-10-19 14:15:38 +08:00
Jiahao Xu
08813894fd LoongArch: Implement vec_widen standard names.
Add support for vec_widen lo/hi patterns.  These do not directly
match on Loongarch lasx instructions but can be emulated with
even/odd + vector merge.
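
The emulation described above can be sketched with a scalar model. This is an illustration only: the function names are hypothetical, the element count is fixed at 8, and the model ignores endianness and the 128-bit lane structure of real LASX registers.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 8 };  /* illustrative element count, not tied to any real mode */

/* Reference semantics: widening multiply of the low (high == 0) or
   high (high == 1) half of the input vectors.  */
void widen_mult_half (int64_t out[N / 2], const int32_t a[N],
		      const int32_t b[N], int high)
{
  assert (high == 0 || high == 1);
  for (int i = 0; i < N / 2; i++)
    out[i] = (int64_t) a[i + high * (N / 2)] * b[i + high * (N / 2)];
}

/* Emulation: widening multiplies of the even and odd elements,
   followed by an interleaving merge of the two partial results.  */
void widen_mult_half_emulated (int64_t out[N / 2], const int32_t a[N],
			       const int32_t b[N], int high)
{
  int64_t even[N / 2], odd[N / 2];
  assert (high == 0 || high == 1);

  for (int i = 0; i < N / 2; i++)
    {
      even[i] = (int64_t) a[2 * i] * b[2 * i];
      odd[i] = (int64_t) a[2 * i + 1] * b[2 * i + 1];
    }

  /* Interleave the low or high halves of the even/odd products.  */
  int base = high ? N / 4 : 0;
  for (int i = 0; i < N / 4; i++)
    {
      out[2 * i] = even[base + i];
      out[2 * i + 1] = odd[base + i];
    }
}
```

Both functions produce the same four products for each half, which is why an even/odd multiply plus a merge can stand in for a direct lo/hi instruction.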

gcc/ChangeLog:

	* config/loongarch/lasx.md
	(vec_widen_<su>mult_even_v8si): New patterns.
	(vec_widen_<su>add_hi_<mode>): Ditto.
	(vec_widen_<su>add_lo_<mode>): Ditto.
	(vec_widen_<su>sub_hi_<mode>): Ditto.
	(vec_widen_<su>sub_lo_<mode>): Ditto.
	(vec_widen_<su>mult_hi_<mode>): Ditto.
	(vec_widen_<su>mult_lo_<mode>): Ditto.
	* config/loongarch/loongarch.md (u_bool): New iterator.
	* config/loongarch/loongarch-protos.h
	(loongarch_expand_vec_widen_hilo): New prototype.
	* config/loongarch/loongarch.cc
	(loongarch_expand_vec_interleave): New function.
	(loongarch_expand_vec_widen_hilo): New function.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vect-widen-add.c: New test.
	* gcc.target/loongarch/vect-widen-mul.c: New test.
	* gcc.target/loongarch/vect-widen-sub.c: New test.
2023-10-19 14:15:35 +08:00
Jiahao Xu
a7b7284fe1 LoongArch: Implement avg and sad standard names.
gcc/ChangeLog:

	* config/loongarch/lasx.md
	(avg<mode>3_ceil): New patterns.
	(uavg<mode>3_ceil): Ditto.
	(avg<mode>3_floor): Ditto.
	(uavg<mode>3_floor): Ditto.
	(usadv32qi): Ditto.
	(ssadv32qi): Ditto.
	* config/loongarch/lsx.md
	(avg<mode>3_ceil): New patterns.
	(uavg<mode>3_ceil): Ditto.
	(avg<mode>3_floor): Ditto.
	(uavg<mode>3_floor): Ditto.
	(usadv16qi): Ditto.
	(ssadv16qi): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/avg-ceil-lasx.c: New test.
	* gcc.target/loongarch/avg-ceil-lsx.c: New test.
	* gcc.target/loongarch/avg-floor-lasx.c: New test.
	* gcc.target/loongarch/avg-floor-lsx.c: New test.
	* gcc.target/loongarch/sad-lasx.c: New test.
	* gcc.target/loongarch/sad-lsx.c: New test.
2023-10-19 14:15:31 +08:00
GCC Administrator
0308461d9d Daily bump. 2023-10-19 00:18:05 +00:00
Andrew Pinski
b20dbddcc4 Fix expansion of (a & 2) != 1
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As far as I know this can only show up when disabling optimization
passes, as the above form would otherwise have been optimized away.
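
A minimal sketch of why the `& CST` part must not be dropped. The function names are hypothetical and this is not the committed testcase: the masked value is always 0 or 2, so comparing it against 1 is always true, whereas comparing the unmasked value against 1 is not.

```c
#include <assert.h>

/* The form from the subject line: (a & 2) is either 0 or 2,
   so it can never equal 1 and the result is always 1.  */
int masked_ne_one (int a)
{
  return (a & 2) != 1;
}

/* What the comparison degenerates to if the mask is stripped.
   For a == 1 this yields 0, which differs from the form above.  */
int unmasked_ne_one (int a)
{
  return a != 1;
}

/* Check the masked form over a few small values.  */
void check_masked_form (void)
{
  for (int a = -4; a <= 4; a++)
    assert (masked_ne_one (a) == 1);
}
```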

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

	PR middle-end/111863

gcc/ChangeLog:

	* expr.cc (do_store_flag): Don't overwrite arg0
	when stripping off `& POW2`.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr111863-1.c: New test.
2023-10-18 15:11:39 -07:00
Andrew Pinski
879c91fccc [c] Fix PR 101364: ICE after error due to diagnose_arglist_conflict not checking for error
When checking whether a function declaration has a conflict due to
promotions, there is no test to see if the type is an error mark before calling
c_type_promotes_to. c_type_promotes_to is not ready for error_mark and causes an
ICE.

This adds a check for error before the call of c_type_promotes_to.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR c/101364

gcc/c/ChangeLog:

	* c-decl.cc (diagnose_arglist_conflict): Test for
	error mark before calling of c_type_promotes_to.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr101364-1.c: New test.
2023-10-18 15:11:39 -07:00
Andrew Pinski
11e6bcedb4 Fix ICE due to c_safe_arg_type_equiv_p not checking for error_mark node
This is a simple error recovery issue introduced when c_safe_arg_type_equiv_p
was added in r8-5312-gc65e18d3331aa999. The issue is that after
an error, an argument type (of a function type) might turn
into an error mark node and c_safe_arg_type_equiv_p was not ready
for that. So this just adds a check for error operand for its
arguments before getting the main variant.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR c/101285

gcc/c/ChangeLog:

	* c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error
	operands early.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr101285-1.c: New test.
2023-10-18 15:11:38 -07:00
Prathamesh Kulkarni
3ec8ecb8e9 PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
gcc/ChangeLog:
	PR tree-optimization/111648
	* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1
	chooses base element from arg, ensure that it's a natural stepped
	sequence.
	(build_vec_cst_rand): New param natural_stepped and use it to
	construct a naturally stepped sequence.
	(test_nunits_min_2): Add new unit tests Case 6 and Case 7.
2023-10-19 00:29:38 +05:30
Dimitar Dimitrov
fe9767eedc pru: Implement TARGET_INSN_COST
This patch slightly improves the embench-iot benchmark score for
PRU code size.  There is also a small improvement in a few real-world
firmware programs.

  Embench-iot size
  ------------------------------------------
  Benchmark          before   after    delta
  ---------           ----    ----     -----
  aha-mont64          4.15    4.15         0
  crc32               6.04    6.04         0
  cubic              21.64   21.62     -0.02
  edn                 6.37    6.37         0
  huffbench          18.63   18.55     -0.08
  matmult-int         5.44    5.44         0
  md5sum             25.56   25.43     -0.13
  minver             12.82   12.76     -0.06
  nbody              15.09   14.97     -0.12
  nettle-aes          4.75    4.75         0
  nettle-sha256       4.67    4.67         0
  nsichneu            3.77    3.77         0
  picojpeg            4.11    4.11         0
  primecount          7.90    7.90         0
  qrduino             7.18    7.16     -0.02
  sglib-combined     13.63   13.59     -0.04
  slre                5.19    5.19         0
  st                 14.23   14.12     -0.11
  statemate           2.34    2.34         0
  tarfind            36.85   36.64     -0.21
  ud                 10.51   10.46     -0.05
  wikisort            7.44    7.41     -0.03
  ---------          -----   -----
  Geometric mean      8.42    8.40     -0.02
  Geometric SD        2.00    2.00         0
  Geometric range    12.68   12.62     -0.06

gcc/ChangeLog:

	* config/pru/pru.cc (pru_insn_cost): New function.
	(TARGET_INSN_COST): Define for PRU.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2023-10-18 20:16:20 +03:00
Georg-Johann Lay
67f7bf78ba LibF7: Implement mul_mant for devices without MUL instruction.
libgcc/config/avr/libf7/
	* libf7-asm.sx (mul_mant): Implement for devices without MUL.
	* asm-defs.h (wmov) [!HAVE_MUL]: Fix regno computation.
	* t-libf7 (F7_ASM_FLAGS): Add -g0.
2023-10-18 19:00:09 +02:00
Andrew Carlotti
ff05a3e91d aarch64: Replace duplicated selftests
gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_test_fractional_cost):
	Test <= instead of testing < twice.
2023-10-18 16:23:24 +01:00
Jakub Jelinek
bc4bd69faf cse: Workaround GCC < 5 bug in cse_insn [PR111852]
Before the r5-3834 commit for PR63362, GCC 4.8-4.9 refuses to compile
cse.cc which contains a variable with rtx_def type, because rtx_def
contains a union with poly_uint16 element.  poly_int template has
defaulted default constructor and a variadic template constructor which
could have empty parameter pack. GCC < 5 treated it as non-trivially
constructible class and deleted rtunion and rtx_def default constructors.

For the cse_insn purposes, all we need is a variable with size and alignment
of rtx_def, not necessarily rtx_def itself, which we then memset to 0 and
fill in like rtx is normally allocated from heap, so this patch for
GCC_VERSION < 5000 uses an unsigned char array of the right size/alignment.
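
The idea of the workaround can be sketched in C11 as follows. `struct fake_rtx` is a hypothetical stand-in for rtx_def, and the sketch reads the buffer back with memcpy to stay within C's aliasing rules, which differs from how cse_insn actually uses its buffer.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rtx_def; not GCC's real type.  */
struct fake_rtx
{
  unsigned short code;
  union { long l; void *p; } u;
};

/* Returns 1 if a raw byte buffer declared with the size and alignment
   of struct fake_rtx is suitably aligned and reads back as a zeroed
   object after memset.  */
int raw_buffer_ok (void)
{
  alignas (struct fake_rtx) unsigned char buf[sizeof (struct fake_rtx)];
  struct fake_rtx value;

  assert (sizeof buf == sizeof (struct fake_rtx));

  memset (buf, 0, sizeof buf);         /* zero it like a fresh object */
  memcpy (&value, buf, sizeof value);  /* read it back as the struct */

  return (uintptr_t) buf % alignof (struct fake_rtx) == 0
	 && value.code == 0;
}
```

The buffer never requires the struct's default constructor, which is the property the workaround relies on for the old compilers.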

2023-10-18  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/111852
	* cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of
	using rtx_def type for memory_extend_buf, use unsigned char
	array with size of rtx_def and its alignment.
2023-10-18 17:01:26 +02:00
Jason Merrill
ef10cb8683 diagnostic: add permerror variants with opt
In the discussion of promoting some pedwarns to be errors by default, rather
than move them all into -fpermissive it seems to me to make sense to support
DK_PERMERROR with an option flag.  This way will also work with
-fpermissive, but users can also still use -Wno-error=narrowing to downgrade
that specific diagnostic rather than everything affected by -fpermissive.

So, for diagnostics that we want to make errors by default we can just
change the pedwarn call to permerror.

The tests check desired behavior for such a permerror in a system header
with various flags.  The patch preserves the existing permerror behavior of
ignoring -w and system headers by default, but respecting them when
downgraded to a warning by -fpermissive.

This seems similar to but a bit better than the approach of forcing
-pedantic-errors that I previously used for -Wnarrowing: specifically, in
that now -w by itself is not enough to silence the -Wnarrowing
error (integer-pack2.C).

gcc/ChangeLog:

	* doc/invoke.texi: Move -fpermissive to Warning Options.
	* diagnostic.cc (update_effective_level_from_pragmas): Remove
	redundant system header check.
	(diagnostic_report_diagnostic): Move down syshdr/-w check.
	(diagnostic_impl): Handle DK_PERMERROR with an option number.
	(permerror): Add new overloads.
	* diagnostic-core.h (permerror): Declare them.

gcc/cp/ChangeLog:

	* typeck2.cc (check_narrowing): Use permerror.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/integer-pack2.C: Add -fpermissive.
	* g++.dg/diagnostic/sys-narrow.h: New test.
	* g++.dg/diagnostic/sys-narrow1.C: New test.
	* g++.dg/diagnostic/sys-narrow1a.C: New test.
	* g++.dg/diagnostic/sys-narrow1b.C: New test.
	* g++.dg/diagnostic/sys-narrow1c.C: New test.
	* g++.dg/diagnostic/sys-narrow1d.C: New test.
	* g++.dg/diagnostic/sys-narrow1e.C: New test.
	* g++.dg/diagnostic/sys-narrow1f.C: New test.
	* g++.dg/diagnostic/sys-narrow1g.C: New test.
	* g++.dg/diagnostic/sys-narrow1h.C: New test.
	* g++.dg/diagnostic/sys-narrow1i.C: New test.
2023-10-18 10:00:25 -04:00
Tobias Burnus
af4bb22115 OpenMP: Avoid ICE with LTO and 'omp allocate'
gcc/ChangeLog:

	* gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute
	so that the auxiliary statement list does not reach LTO.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-13a.f90: New test.
2023-10-18 13:05:35 +02:00
Jakub Jelinek
f1744dd50b tree-ssa-math-opts: Fix up match_uaddc_usubc [PR111845]
GCC ICEs on the first testcase.  Successful match_uaddc_usubc ends up with
some dead stmts which DCE will remove (hopefully) later on.
The ICE is because one of the dead stmts refers to a freed SSA_NAME.
The code already gsi_removes a couple of stmts in the
  /* Remove some statements which can't be kept in the IL because they
     use SSA_NAME whose setter is going to be removed too.  */
section for the same reason (the reason for the freed SSA_NAMEs is that
we don't really have a replacement for those cases - all we have after
a match is combined overflow from the addition/subtraction of 2 operands + a
[0, 1] carry in, but not the individual overflows from the former 2
additions), but for the last (most significant) limb case, where we try
to match x = op1 + op2 + carry1 + carry2; or
x = op1 - op2 - carry1 - carry2; we just gsi_replace the final stmt, but
left around the 2 temporary stmts as dead; if we were unlucky enough that
those referenced the carry flag that went away, it ICEs.

So, the following patch remembers those temporary statements (rather than
trying to rediscover them more expensively) and removes them before the
final one is replaced.

While working on it, I've noticed we didn't support all the reassociated
possibilities of writing the addition of 4 operands or subtracting 3
operands from one, we supported e.g.
x = ((op1 + op2) + op3) + op4;
x = op1 + ((op2 + op3) + op4);
but not
x = (op1 + (op2 + op3)) + op4;
x = op1 + (op2 + (op3 + op4));
Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what
we were searching for (if non-NULL) - rhs[0] is inspected in the first
loop and has different handling for the MINUS_EXPR case.
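
For illustration, here is a sketch of a three-limb addition whose most significant limb has the `op1 + op2 + carry1 + carry2` shape discussed above. The function names are hypothetical, and this is not one of the testcases added by the patch. It uses the GCC/Clang builtin `__builtin_add_overflow`.

```c
#include <assert.h>
#include <stdint.h>

/* Most significant limb, left-nested: ((op1 + op2) + carry1) + carry2.  */
uint64_t top_limb_left (uint64_t op1, uint64_t op2, uint64_t c1, uint64_t c2)
{
  return ((op1 + op2) + c1) + c2;
}

/* The same sum, right-nested: op1 + (op2 + (carry1 + carry2)).  */
uint64_t top_limb_right (uint64_t op1, uint64_t op2, uint64_t c1, uint64_t c2)
{
  return op1 + (op2 + (c1 + c2));
}

/* Add two 3-limb numbers, least significant limb first.  */
void add3 (uint64_t r[3], const uint64_t a[3], const uint64_t b[3])
{
  uint64_t t;
  /* Limb 0: one addition, one overflow flag.  */
  uint64_t c0 = __builtin_add_overflow (a[0], b[0], &r[0]);
  /* Limb 1: two additions, each with its own overflow flag.  */
  uint64_t c1 = __builtin_add_overflow (a[1], b[1], &t);
  uint64_t c2 = __builtin_add_overflow (t, c0, &r[1]);
  /* Limb 2: both carries are folded into the final sum.  */
  r[2] = top_limb_left (a[2], b[2], c1, c2);
  /* Unsigned addition is associative, so the nesting does not matter.  */
  assert (r[2] == top_limb_right (a[2], b[2], c1, c2));
}
```

Because every nesting computes the same value, a matcher that only recognizes some of them misses equivalent source forms.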

2023-10-18  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/111845
	* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary
	statements for the 4 operand addition or subtraction of 3 operands
	from 1 operand cases and remove them when successful.  Look for
	nested additions even from rhs[2], not just rhs[1].

	* gcc.dg/pr111845.c: New test.
	* gcc.target/i386/pr111845.c: New test.
2023-10-18 12:37:40 +02:00
Tobias Burnus
d3961765b5 nvptx: Use fatal_error when -march= is missing not an assert [PR111093]
gcc/ChangeLog:

	PR target/111093
	* config/nvptx/nvptx.cc (nvptx_option_override): Issue fatal error
	instead of an assert ICE when no -march= has been specified.
2023-10-18 12:23:38 +02:00
Iain Sandoe
a4184c8a65 Darwin: Check as for .build_version support and use it if available.
This adds support for the minimum OS version data in assembler files.
At present, we have no mechanism to detect the SDK version in use, and
so that is omitted from build_versions.

We follow the implementation in clang: '.build_version' is only emitted
(where supported) for target macOS versions >= 10.14.  For earlier macOS
we fall back to using a '.macosx_version_min' directive.  This latter is
also emitted when the assembler supports it, but not build_version.
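
The selection logic described above might look roughly like the following sketch. The function and parameter names are hypothetical and do not come from darwin.cc, and the exact operand layout of the emitted directives is an assumption based on the directive names given in the message.

```c
#include <assert.h>
#include <stdio.h>

/* Write the version directive for a macOS target into BUF.
   HAS_BUILD_VERSION and HAS_VERSION_MIN describe what the assembler
   supports.  Returns the number of characters written, 0 if none.  */
int format_version_directive (char *buf, size_t len, int major, int minor,
			      int has_build_version, int has_version_min)
{
  assert (buf != NULL && len > 0);

  /* .build_version is only used for macOS 10.14 and later.  */
  int new_enough = major > 10 || (major == 10 && minor >= 14);

  if (has_build_version && new_enough)
    return snprintf (buf, len, "\t.build_version macos, %d, %d\n",
		     major, minor);

  /* Older targets, or assemblers without .build_version.  */
  if (has_version_min)
    return snprintf (buf, len, "\t.macosx_version_min %d, %d\n",
		     major, minor);

  buf[0] = '\0';  /* neither directive is supported */
  return 0;
}
```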

gcc/ChangeLog:

	* config.in: Regenerate.
	* config/darwin.cc (darwin_file_start): Add assembler directives
	for the target OS version, where these are supported by the
	assembler.
	(darwin_override_options): Check for building >= macOS 10.14.
	* configure: Regenerate.
	* configure.ac: Check for assembler support of .build_version
	directives.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-18 10:36:22 +01:00
Tamar Christina
dd28f90c95 ifcvt: rewrite args handling to remove lookups
This refactors the code to remove the args cache and index lookups
in favor of a single structure. It also, again, removes the use of
std::sort as previously requested but avoids the new asserts in
trunk.

gcc/ChangeLog:

	PR tree-optimization/109154
	* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
	(typedef struct ifcvt_arg_entry): New.
	(cmp_arg_entry): New.
	(gen_phi_arg_condition, gen_phi_nest_statement,
	predicate_scalar_phi): Use them.
2023-10-18 09:53:48 +01:00
Tamar Christina
04227acbe9 AArch64: Rewrite simd move immediate patterns to new syntax
This rewrites the simd MOV patterns to use the new compact syntax.
No change in semantics is expected.  This will be needed in follow on patches.

This also merges the splits into the define_insn which will also be needed soon.

gcc/ChangeLog:

	PR tree-optimization/109154
	* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
	Rewrite to new syntax.
	(*aarch64_simd_mov<VQMOV:mode>): Rewrite to new syntax and merge in
	splits.
2023-10-18 09:53:48 +01:00
Tamar Christina
b0fe8f2f96 middle-end: ifcvt: Allow any const IFN in conditional blocks
When ifcvt was initially added masking was not a thing and as such it was
rather conservative in what it supported.

For builtins it only allowed C99 builtin functions which it knew it can fold
away.

These days the vectorizer is able to deal with needing to mask IFNs itself.
vectorizable_call is able to vectorize the IFN by emitting a VEC_PERM_EXPR after
the operation to emulate the masking.

This is then used by match.pd to convert the IFN into a masked variant if it's
available.

For these reasons the restriction in ifconvert is no longer required, and we
needlessly block vectorization when we can effectively handle the operations.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Note: This patch is part of a series, and tests for it are added in the
AArch64 patch that adds support for the optab.

gcc/ChangeLog:

	PR tree-optimization/109154
	* tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.
2023-10-18 09:53:47 +01:00
Tamar Christina
4b39aeef59 middle-end: Fold vec_cond into conditional ternary or binary operation when sharing operand [PR109154]
When we have a vector conditional on a masked target which is doing a selection
on the result of a conditional operation where one of the operands of the
conditional operation is the other operand of the select, then we can fold the
vector conditional into the operation.

Concretely this transforms

  c = mask1 ? (masked_op mask2 a b) : b

into

  c = masked_op (mask1 & mask2) a b

The mask is then propagated upwards by the compiler.  In the SVE case we don't
end up needing a mask AND here since `mask2` will end up in the instruction
creating `mask` which gives us a natural &.
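
The fold can be checked lane by lane with a scalar model. This is only a sketch: `masked_add` is a hypothetical helper, addition stands in for the masked operation, and the model assumes inactive lanes of the masked operation yield `b`.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-lane model of a masked add whose inactive lanes yield b.  */
static int masked_add (bool mask, int a, int b)
{
  return mask ? a + b : b;
}

/* c = mask1 ? (masked_op mask2 a b) : b  */
int before_fold (bool mask1, bool mask2, int a, int b)
{
  return mask1 ? masked_add (mask2, a, b) : b;
}

/* c = masked_op (mask1 & mask2) a b  */
int after_fold (bool mask1, bool mask2, int a, int b)
{
  return masked_add (mask1 && mask2, a, b);
}

/* Compare both forms for every combination of the two mask bits.  */
void check_fold (int a, int b)
{
  for (int m1 = 0; m1 <= 1; m1++)
    for (int m2 = 0; m2 <= 1; m2++)
      assert (before_fold (m1, m2, a, b) == after_fold (m1, m2, a, b));
}
```

Under that assumption the select is redundant in every lane, so only the combined mask remains.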

Such transformations are more common now in GCC 13+ as PRE has now started
unsharing of common code in case it can make one branch fully independent.

e.g. in this case `b` becomes a loop invariant value after PRE.

This transformation removes the extra select for masked architectures but
doesn't fix the general case.

gcc/ChangeLog:

	PR tree-optimization/109154
	* match.pd: Add new cond_op rule.

gcc/testsuite/ChangeLog:

	PR tree-optimization/109154
	* gcc.target/aarch64/sve/pre_cond_share_1.c: New test.
2023-10-18 09:53:47 +01:00
Xi Ruoyao
b588dcb77e LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc
During the review of an LLVM change [1], on LA464 we found that zeroing
an fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

	* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
	zeroing a fcc.
2023-10-18 16:25:05 +08:00
Richard Biener
b0372ef12f Re-instantiate integer mask to traditional vector mask support
The following allows passing integer mask data as a traditional
vector mask for OMP SIMD clone calls, which is required due to
the limited set of OMP SIMD clones in the x86 ABI when using
AVX512 but a preferred vector size of 256 bits.

	* tree-vect-stmts.cc (vectorizable_simd_clone_call):
	Relax check to again allow passing integer mode masks
	as traditional vectors.
2023-10-18 10:15:21 +02:00
Tamar Christina
60c231cb65 middle-end: maintain LCSSA throughout loop peeling
This final patch updates peeling to maintain LCSSA all the way through.

It's significantly easier to maintain it during peeling while we still know
where all new edges connect rather than touching it up later as is currently
being done.

This allows us to remove many of the helper functions that touch up the loops
at various parts.  The only complication is for loop distribution, where we
should be able to use the same approach.  However, ldist, depending on whether
redirect_lc_phi_defs is true or not, will either try to maintain a limited LCSSA
form itself or remove all non-virtual phis.

The problem here is that if we maintain LCSSA then in some cases the blocks
connecting the two loops get PHIs to keep the loop IV up to date.

However there is no loop: the guard condition is rewritten as 0 != 0, so the
"loop" always exits.  However, due to the PHI nodes the probabilities get
completely wrong.  It seems to think that the impossible exit is the likely
edge.  This causes incorrect warnings, and the presence of the PHIs prevents the
blocks from being simplified.

While it may be possible to make ldist work with LCSSA form, doing so seems more
work than not.  For that reason the peeling code has an additional parameter
used by only ldist to not connect the two loops during peeling.

This preserves the current behaviour from ldist until I can dive into the
implementation more.  Hopefully that's ok for now.

gcc/ChangeLog:

	* tree-loop-distribution.cc (copy_loop_before): Request no LCSSA.
	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
	asserts.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Keep LCSSA during peeling.
	(find_guard_arg): Look value up through explicit edge and original defs.
	(vect_do_peeling): Use it.
	(slpeel_update_phi_nodes_for_guard2): Take explicit exit edge.
	(slpeel_update_phi_nodes_for_lcssa, slpeel_update_phi_nodes_for_loops):
	Remove.
	* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize phi.
	* tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Add
	optional param to turn off LCSSA mode.
2023-10-18 09:03:06 +01:00
Tamar Christina
0c8522870e middle-end: updated niters analysis to handle multiple exits.
This second part updates niters analysis to be able to analyze any number of
exits.  If we have multiple exits we determine the main exit by finding the
first counting IV.
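
As an illustration of what a loop with more than one exit looks like at the source level, consider the hypothetical search routine below. It is not taken from the patch. The loop can leave either through the counting condition `i < n` or through the data-dependent `break`.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element equal to KEY, or N if
   there is none.  */
size_t find_first (const int *a, size_t n, int key)
{
  size_t i;

  assert (a != NULL || n == 0);

  for (i = 0; i < n; i++)  /* exit 1: the counting IV reaches n */
    if (a[i] == key)       /* exit 2: data-dependent early break */
      break;

  return i;
}
```

In this sketch the exit controlled by the counting IV `i` would be the natural candidate for the main exit as described above.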

The change allows the vectorizer to pass analysis for loops with multiple exits, but we
later gracefully reject them.  It does however allow us to test if the exit
handling is using the right exit everywhere.

Additionally since we analyze all exits, we now return all conditions for them
and determine which condition belongs to the main exit.

The main condition is needed because the vectorizer needs to ignore the main IV
condition during vectorization as it will replace it during codegen.

To track versioned loops we extend the contract between ifcvt and the vectorizer
to store the exit number in aux so that we can match it up again during peeling.

gcc/ChangeLog:

	* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
	it.
	* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
	(vec_init_loop_exit_info): Extend analysis when multiple exits.
	(vect_analyze_loop_form): Record conds and determine main cond.
	(vect_create_loop_vinfo): Extend bookkeeping of conds.
	(vect_analyze_loop): Release conds.
	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
	LOOP_VINFO_LOOP_IV_COND):  New.
	(struct vect_loop_form_info): Add conds, alt_loop_conds;
	(struct loop_vec_info): Add conds, loop_iv_cond.
2023-10-18 09:02:40 +01:00
Tamar Christina
d65e38e616 middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables
This is extracted out of the patch series to support early break vectorization
in order to simplify the review of that patch series.

The goal of this one is to separate out the refactoring from the new
functionality.

This first patch separates out the vectorizer's definition of an exit to their
own values inside loop_vinfo.  During vectorization we can have three separate
copies for each loop: scalar, vectorized, epilogue.  The scalar loop can also be
the versioned loop before peeling.

Because of this we track 3 different exits inside loop_vinfo corresponding to
each of these loops.  Additionally each function that uses an exit, when not
obviously clear which exit is needed will now take the exit explicitly as an
argument.

This is because often times the callers switch the loops being passed around.
While the caller knows which loops it is, the callee does not.

For now the loop exits are simply initialized to the same value as before,
determined by single_exit (..).

No change in functionality is expected throughout this patch series.

gcc/ChangeLog:

	* tree-loop-distribution.cc (copy_loop_before): Pass exit explicitly.
	(loop_distribution::distribute_loop): Bail out if not single exit.
	* tree-scalar-evolution.cc (get_loop_exit_condition): New.
	* tree-scalar-evolution.h (get_loop_exit_condition): New.
	* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Pass exit
	explicitly.
	* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
	vect_set_loop_condition_partial_vectors_avx512,
	vect_set_loop_condition_normal, vect_set_loop_condition): Explicitly
	take exit.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Explicitly take exit and
	return the corresponding new peeled exit.
	(slpeel_can_duplicate_loop_p): Explicitly take exit.
	(find_loop_location): Handle not knowing an explicit exit.
	(vect_update_ivs_after_vectorizer, vect_gen_vector_loop_niters_mult_vf,
	find_guard_arg, slpeel_update_phi_nodes_for_loops,
	slpeel_update_phi_nodes_for_guard2): Use new exits.
	(vect_do_peeling): Update bookkeeping to keep track of exits.
	* tree-vect-loop.cc (vect_get_loop_niters): Explicitly take exit to
	analyze.
	(vec_init_loop_exit_info): New.
	(_loop_vec_info::_loop_vec_info): Initialize vec_loop_iv,
	vec_epilogue_loop_iv, scalar_loop_iv.
	(vect_analyze_loop_form): Initialize exits.
	(vect_create_loop_vinfo): Set main exit.
	(vect_create_epilog_for_reduction, vectorizable_live_operation,
	vect_transform_loop): Use it.
	(scale_profile_for_vect_loop): Explicitly take exit to scale.
	* tree-vectorizer.cc (set_uid_loop_bbs): Initialize loop exit.
	* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_EPILOGUE_IV_EXIT,
	LOOP_VINFO_SCALAR_IV_EXIT): New.
	(struct loop_vec_info): Add vec_loop_iv, vec_epilogue_loop_iv,
	scalar_loop_iv.
	(vect_set_loop_condition, slpeel_can_duplicate_loop_p,
	slpeel_tree_duplicate_loop_to_edge_cfg): Take explicit exits.
	(vec_init_loop_exit_info): New.
	(struct vect_loop_form_info): Add loop_exit.
2023-10-18 09:02:12 +01:00
Tamar Christina
46937e1b47 middle-end: refactor vectorizable_comparison to make the main body re-usable.
Vectorization of a gcond starts off essentially the same as vectorizing a
comparison with the only difference being how the operands are extracted.

This refactors vectorizable_comparison such that we now have a generic function
that can be used from vectorizable_early_break.  The refactoring splits the
gassign checks and actual validation/codegen off to a helper function.

No change in functionality expected.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body
	to ...
	(vectorizable_comparison_1): ...This.
2023-10-18 09:01:41 +01:00
Juzhe-Zhong
c51040cb43 RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx
This patch optimizes the following permutation with a consecutive index pattern:

typedef char vnx16i __attribute__ ((vector_size (16)));

#define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15

vnx16i __attribute__ ((noinline, noclone))
test_1 (vnx16i x, vnx16i y)
{
  return __builtin_shufflevector (x, y, MASK_16);
}

Before this patch:

        lui     a5,%hi(.LC0)
        addi    a5,a5,%lo(.LC0)
        vsetivli        zero,16,e8,m1,ta,ma
        vle8.v  v3,0(a5)
        vle8.v  v2,0(a1)
        vrgather.vv     v1,v2,v3
        vse8.v  v1,0(a0)
        ret

After this patch:

	vsetivli	zero,16,e8,mf8,ta,ma
	vle8.v	v2,0(a1)
	vsetivli	zero,4,e32,mf2,ta,ma
	vrgather.vi	v1,v2,3
	vsetivli	zero,16,e8,mf8,ta,ma
	vse8.v	v1,0(a0)
	ret

Overall, this removes one instruction, a vector load, which is much more expensive
than VL toggling.

Also, with this patch we use vrgather.vi, which reduces vector register consumption by one.
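
As a rough illustration of why the byte mask above maps to vrgather.vi with
index 3 at e32, the sketch below checks whether a mask repeats one aligned run
of consecutive indices and, if so, returns the index of the wider element.
The helper name consecutive_run_index and its interface are hypothetical; this
is not the code of shuffle_consecutive_patterns, only a minimal model of the
recognition step under that assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code.  Return the wide-element index if
   MASK (N entries) repeats a single run of RUN consecutive indices that
   starts on a multiple of RUN; return -1 otherwise.  */
static int
consecutive_run_index (const unsigned char *mask, int n, int run)
{
  assert (mask != NULL && n > 0);

  /* The run must divide the mask and start on a wide-element boundary.  */
  if (run <= 0 || n % run != 0 || mask[0] % run != 0)
    return -1;

  /* Every entry must equal the first index plus its position in the run.  */
  for (int i = 0; i < n; i++)
    if (mask[i] != mask[0] + i % run)
      return -1;

  return mask[0] / run;
}
```

For MASK_16 with a run of 4 bytes, the helper returns 3, which corresponds to
selecting 32-bit element 3 for every lane.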

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function.
	(expand_vec_perm_const_1): Add consecutive pattern recognition.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls/def.h: Add new test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.
2023-10-18 15:58:53 +08:00
Tobias Burnus
372c5da215 fortran/intrinsic.texi: Add 'intrinsic' to SIGNAL example
gcc/fortran/ChangeLog:

	* intrinsic.texi (signal): Add 'intrinsic :: signal, sleep' to
	the example to make it safer.
2023-10-18 09:29:56 +02:00
Haochen Jiang
f019251ac9 Initial Panther Lake Support
gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_intel_cpu): Add Panther
	Lake.
	* common/config/i386/i386-common.cc (processor_name):
	Ditto.
	(processor_alias_table): Ditto.
	* common/config/i386/i386-cpuinfo.h (enum processor_types):
	Add INTEL_PANTHERLAKE.
	* config.gcc: Add -march=pantherlake.
	* config/i386/driver-i386.cc (host_detect_local_cpu): Refactor
	the if clause. Handle pantherlake.
	* config/i386/i386-c.cc (ix86_target_macros_internal):
	Handle pantherlake.
	* config/i386/i386-options.cc (processor_cost_table): Ditto.
	(m_PANTHERLAKE): New.
	(m_CORE_HYBRID): Add pantherlake.
	* config/i386/i386.h (enum processor_type): Ditto.
	* doc/extend.texi: Ditto.
	* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

	* g++.target/i386/mv16.C: Ditto.
	* gcc.target/i386/funcspec-56.inc: Handle new march.
2023-10-18 14:40:59 +08:00
Haochen Jiang
2aa97c0da4 x86: Add m_CORE_HYBRID for hybrid clients tuning
gcc/ChangeLog:

	* config/i386/i386-options.cc (m_CORE_HYBRID): New.
	* config/i386/x86-tune.def: Replace hybrid client tune with
	m_CORE_HYBRID.
2023-10-18 14:40:22 +08:00
Haochen Jiang
7370c479dd Initial Clearwater Forest Support
gcc/ChangeLog:

	* common/config/i386/cpuinfo.h
	(get_intel_cpu): Handle Clearwater Forest.
	* common/config/i386/i386-common.cc (processor_name):
	Add Clearwater Forest.
	(processor_alias_table): Ditto.
	* common/config/i386/i386-cpuinfo.h (enum processor_types):
	Add INTEL_CLEARWATERFOREST.
	* config.gcc: Add -march=clearwaterforest.
	* config/i386/driver-i386.cc (host_detect_local_cpu): Handle
	clearwaterforest.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
	* config/i386/i386-options.cc (processor_cost_table): Ditto.
	(m_CLEARWATERFOREST): New.
	(m_CORE_ATOM): Add clearwaterforest.
	* config/i386/i386.h (enum processor_type): Ditto.
	* doc/extend.texi: Ditto.
	* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

	* g++.target/i386/mv16.C: Ditto.
	* gcc.target/i386/funcspec-56.inc: Handle new march.
2023-10-18 14:39:53 +08:00
liuhongt
cead92b7fc Support 32/64-bit vectorization for _Float16 fma related operations.
gcc/ChangeLog:

	* config/i386/mmx.md (fma<mode>4): New expander.
	(fms<mode>4): Ditto.
	(fnma<mode>4): Ditto.
	(fnms<mode>4): Ditto.
	(vec_fmaddsubv4hf4): Ditto.
	(vec_fmsubaddv4hf4): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/part-vect-fmaddsubhf-1.c: New test.
	* gcc.target/i386/part-vect-fmahf-1.c: New test.
2023-10-18 09:14:57 +08:00
Juzhe-Zhong
cf7739d4a6 RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]
Robin previously mentioned that dynamic LMUL causes an ICE in SPEC:

https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html

which is caused by an assertion failure.

When we enable more tests in rvv.exp with dynamic LMUL, the issue can be
reproduced, and it has a PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832

This patch enables more tests in rvv.exp and fixes the bug.

	PR target/111832

gcc/ChangeLog:

	* config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.
2023-10-18 09:03:09 +08:00
GCC Administrator
fb69acffa9 Daily bump. 2023-10-18 00:17:58 +00:00
Richard Sandiford
773306e9ef aarch64: Put LR save slot first in more cases
Now that the prologue and epilogue code iterates over saved
registers in offset order, we can put the LR save slot first
without compromising LDP/STP formation.

This isn't worthwhile when shadow call stacks are enabled, since the
first two registers are also push/pop candidates, and LR cannot be
popped when shadow call stacks are enabled.  (LR is instead loaded
first and compared against the shadow stack's value.)

But otherwise, it seems better to put the LR save slot first,
to reduce unnecessary variation with the layout for stack clash
protection.

gcc/
	* config/aarch64/aarch64.cc (aarch64_layout_frame): Don't make
	the position of the LR save slot dependent on stack clash
	protection unless shadow call stacks are enabled.

gcc/testsuite/
	* gcc.target/aarch64/test_frame_2.c: Expect x30 to come before x19.
	* gcc.target/aarch64/test_frame_4.c: Likewise.
	* gcc.target/aarch64/test_frame_7.c: Likewise.
	* gcc.target/aarch64/test_frame_10.c: Likewise.
2023-10-17 23:46:33 +01:00
Richard Sandiford
5758585080 aarch64: Use vecs to store register save order
aarch64_save/restore_callee_saves looped over registers in register
number order.  This in turn meant that we could only use LDP and STP
for registers that were consecutive both number-wise and
offset-wise (after unsaved registers are excluded).

This patch instead builds lists of the registers that we've decided to
save, in offset order.  We can then form LDP/STP pairs regardless of
register number order, which in turn means that we can put the LR save
slot first without losing LDP/STP opportunities.
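
A minimal sketch of the idea, assuming 8-byte GPR save slots: pairing walks a
list of saved registers and pairs neighbours whose offsets are adjacent.  The
struct save_slot type and the count_pairs function are hypothetical names for
illustration, not part of aarch64.cc.

```c
#include <assert.h>

/* Hypothetical model of one saved register: its number and frame offset.  */
struct save_slot
{
  int regno;
  int offset;
};

/* Count the pairs formed by walking SLOTS in list order and pairing two
   neighbouring entries whenever their offsets are 8 bytes apart.  */
static int
count_pairs (const struct save_slot *slots, int n)
{
  assert (n >= 0);

  int pairs = 0;
  int i = 0;
  while (i + 1 < n)
    if (slots[i + 1].offset - slots[i].offset == 8)
      {
	pairs++;
	i += 2;
      }
    else
      i++;
  return pairs;
}
```

With x30 at offset 0 and x19, x20, x21 at offsets 8, 16 and 24, walking the
list in register number order (x19, x20, x21, x30) finds one pair, while
walking it in offset order (x30, x19, x20, x21) finds two.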

gcc/
	* config/aarch64/aarch64.h (aarch64_frame): Add vectors that
	store the lists of saved GPRs, FPRs and predicate registers.
	* config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize
	the lists of saved registers.  Use them to choose push candidates.
	Invalidate pop candidates if we're not going to do a pop.
	(aarch64_next_callee_save): Delete.
	(aarch64_save_callee_saves): Take a list of registers,
	rather than a range.  Make !skip_wb select only write-back
	candidates.
	(aarch64_expand_prologue): Update calls accordingly.
	(aarch64_restore_callee_saves): Take a list of registers,
	rather than a range.  Always skip pop candidates.  Also skip
	LR if shadow call stacks are enabled.
	(aarch64_expand_epilogue): Update calls accordingly.

gcc/testsuite/
	* gcc.target/aarch64/sve/pcs/stack_clash_2.c: Expect restores
	to happen in offset order.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
2023-10-17 23:46:33 +01:00
Richard Sandiford
aeb3f0436f Handle epilogues that contain jumps
The prologue/epilogue pass allows the prologue sequence to contain
jumps.  The sequence is then partitioned into basic blocks using
find_many_sub_basic_blocks.

This patch treats epilogues in a similar way.  Since only one block
might need to be split, the patch (re)introduces a find_sub_basic_blocks
routine to handle a single block.

The new routine hard-codes the assumption that split_block will chain
the new block immediately after the original block.  The routine doesn't
try to replicate the fix for PR81030, since that was specific to
gimple->rtl expansion.

The patch is needed for follow-on aarch64 patches that add conditional
code to the epilogue.  The tests are part of those patches.

gcc/
	* cfgbuild.h (find_sub_basic_blocks): Declare.
	* cfgbuild.cc (update_profile_for_new_sub_basic_block): New function,
	split out from...
	(find_many_sub_basic_blocks): ...here.
	(find_sub_basic_blocks): New function.
	* function.cc (thread_prologue_and_epilogue_insns): Handle
	epilogues that contain jumps.
2023-10-17 23:45:43 +01:00
Andrew Pinski
5e4abf4233 ssa_name_has_boolean_range vs signed-boolean:31 types
This turns out to be a latent bug in ssa_name_has_boolean_range
where it would return true for all boolean types, but all of the
uses of ssa_name_has_boolean_range were expecting 0/1 as the range
rather than [-1,0].
So when I fixed vector lower to do all comparisons in boolean_type
rather than still in the signed-boolean:31 type (to fix a different issue),
the pattern in match for `-(type)!A -> (type)A - 1.` would assume A (which
was signed-boolean:31) had a range of [0,1] which broke down and sometimes
gave us -1/-2 as values rather than what we were expecting of -1/0.
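
The arithmetic can be checked with a small sketch that uses plain int in place
of the GIMPLE types; the function names are hypothetical and only model the
two sides of the pattern.

```c
#include <assert.h>

/* Left-hand side of the pattern: -(type)!A.  */
static int
neg_of_not (int a)
{
  return -(int) !a;
}

/* Right-hand side of the pattern: (type)A - 1.  */
static int
minus_one (int a)
{
  return a - 1;
}

/* The rewrite is only valid when A is known to be 0 or 1, so check that
   precondition before using the right-hand side.  */
static int
rewrite_checked (int a)
{
  assert (a == 0 || a == 1);
  return minus_one (a);
}
```

For A in {0, 1} both sides give -1 and 0.  For A == -1, the "true" value of a
signed boolean, the left-hand side gives 0 but the right-hand side gives -2.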

This was the simplest patch I found while testing.

We have another way of matching [0,1] range which we could use instead
of ssa_name_has_boolean_range except that uses only the global ranges
rather than the local range (during VRP).
I tried to clean this up slightly by using gimple_match_zero_one_valuedp
inside ssa_name_has_boolean_range but that failed due to using
only the global ranges. I then tried to change get_nonzero_bits to use
the local ranges at the optimization time but that failed also because
we would remove branches to __builtin_unreachable during evrp and lose
information as we don't set the global ranges during evrp.

OK? Bootstrapped and tested on x86_64-linux-gnu.

	PR tree-optimization/110817

gcc/ChangeLog:

	* tree-ssanames.cc (ssa_name_has_boolean_range): Remove the
	check for boolean type as they don't have "[0,1]" range.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr110817-1.c: New test.
	* gcc.c-torture/execute/pr110817-2.c: New test.
	* gcc.c-torture/execute/pr110817-3.c: New test.
2023-10-17 22:44:19 +00:00
Marek Polacek
1fbb7d75ab c++: accepts-invalid with =delete("") [PR111840]
r6-2367 added a DECL_INITIAL check to cp_parser_simple_declaration
so that we don't emit multiple errors in g++.dg/parse/error57.C.
But that means we don't diagnose

  int f1() = delete("george_crumb");

anymore, because fn decls often have error_mark_node in their
DECL_INITIAL.  (The code may be allowed one day via https://wg21.link/P2573R0.)

I was hoping I could use cp_parser_error_occurred but that would
regress error57.C.

	PR c++/111840

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_simple_declaration): Do cp_parser_error
	for FUNCTION_DECLs.

gcc/testsuite/ChangeLog:

	* g++.dg/parse/error65.C: New test.
2023-10-17 17:44:59 -04:00
Marek Polacek
765c3b8f82 c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.030s.
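
The blow-up can be reproduced with a toy model, sketched below under the
assumption that each conditional has one nested conditional arm and one leaf
arm.  The node type, the walker and the end_walk flag are hypothetical and
only imitate the shape of the problem; they are not the cp-gimplify.cc code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 32

/* Toy expression: a conditional with two arms, or a leaf if both are NULL.  */
struct node
{
  struct node *arm[2];
};

static struct node pool[2 * MAX_DEPTH + 1];
static unsigned long visits;

/* Visit N.  The handler for a conditional walks both arms itself.  If
   END_WALK is zero, the generic walk then visits the same arms again,
   which models the behaviour described above.  */
static void
walk (struct node *n, int end_walk)
{
  if (n == NULL)
    return;
  visits++;
  if (n->arm[0] == NULL && n->arm[1] == NULL)
    return;
  walk (n->arm[0], end_walk);
  walk (n->arm[1], end_walk);
  if (!end_walk)
    {
      walk (n->arm[0], end_walk);
      walk (n->arm[1], end_walk);
    }
}

/* Build DEPTH nested conditionals: arm 0 nests further, arm 1 is a leaf.  */
static struct node *
build_chain (int depth)
{
  assert (depth >= 0 && depth <= MAX_DEPTH);
  int used = 0;
  struct node *cur = &pool[used++];
  cur->arm[0] = cur->arm[1] = NULL;
  for (int i = 0; i < depth; i++)
    {
      struct node *leaf = &pool[used++];
      leaf->arm[0] = leaf->arm[1] = NULL;
      struct node *cond = &pool[used++];
      cond->arm[0] = cur;
      cond->arm[1] = leaf;
      cur = cond;
    }
  return cur;
}

/* Return the number of node visits for a chain of DEPTH conditionals.  */
static unsigned long
count_visits (int depth, int end_walk)
{
  visits = 0;
  walk (build_chain (depth), end_walk);
  return visits;
}
```

In this model a chain of depth 10 costs 4093 visits when the walk is not
ended and 21 visits when it is, so the cost roughly doubles per nesting level
in the first case and grows linearly in the second.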

The ff_fold_immediate flag is unused after this patch but since I'm
using it in the P2564 patch, I'm not removing it now.  Maybe at_eof
can be used instead and then we can remove ff_fold_immediate.

	PR c++/111660

gcc/cp/ChangeLog:

	* cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Don't
	handle it here.
	(cp_fold_r): Handle COND_EXPR here.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/hog1.C: New test.
	* g++.dg/cpp2a/consteval36.C: New test.
2023-10-17 17:41:44 -04:00
Jason Merrill
bac21b7ea6 c++: mangling tweaks
Most of this is introducing the abi_check function to reduce the verbosity
of most places that check -fabi-version.

The start_mangling change is to avoid needing to zero-initialize additional
members of the mangling globals, though I'm not actually adding any.

The comment documents existing semantics.

gcc/cp/ChangeLog:

	* mangle.cc (abi_check): New.
	(write_prefix, write_unqualified_name, write_discriminator)
	(write_type, write_member_name, write_expression)
	(write_template_arg, write_template_param): Use it.
	(start_mangling): Assign from {}.
	* cp-tree.h: Update comment.
2023-10-17 17:20:02 -04:00
Nathaniel Shead
4f8700078c c++: Add missing auto_diagnostic_groups to constexpr.cc
gcc/cp/ChangeLog:

	* constexpr.cc (cxx_eval_dynamic_cast_fn): Add missing
	auto_diagnostic_group.
	(cxx_eval_call_expression): Likewise.
	(diag_array_subscript): Likewise.
	(outside_lifetime_error): Likewise.
	(potential_constant_expression_1): Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>
2023-10-17 16:19:02 -04:00