Commit Graph

193916 Commits

Author SHA1 Message Date
Jonathan Wakely
29da01709f libstdc++: Fix indentation in allocator base classes
libstdc++-v3/ChangeLog:

	* include/bits/new_allocator.h: Fix indentation.
	* include/ext/malloc_allocator.h: Likewise.
2022-06-14 21:07:47 +01:00
Jonathan Wakely
0a9af7b4ef libstdc++: Check for size overflow in constexpr allocation [PR105957]
libstdc++-v3/ChangeLog:

	PR libstdc++/105957
	* include/bits/allocator.h (allocator::allocate): Check for
	overflow in constexpr allocation.
	* testsuite/20_util/allocator/105975.cc: New test.
2022-06-14 21:07:47 +01:00
Surya Kumari Jangala
3e16b4359e regrename: Fix -fcompare-debug issue in check_new_reg_p [PR105041]
In check_new_reg_p, the nregs of a du chain is computed by obtaining the
MODE of the first element in the chain, and then calling
hard_regno_nregs() with the MODE. But the first element of the chain can
be a DEBUG_INSN whose mode need not be the same as the rest of the
elements in the du chain. This was resulting in fcompare-debug failure
as check_new_reg_p was returning a different result with -g for the same
candidate register. We can instead obtain nregs from the du chain
itself.

2022-06-10  Surya Kumari Jangala  <jskumari@linux.ibm.com>

gcc/
	PR rtl-optimization/105041
	* regrename.cc (check_new_reg_p): Use nregs value from du chain.

gcc/testsuite/
	PR rtl-optimization/105041
	* gcc.target/powerpc/pr105041.c: New test.
2022-06-14 17:36:48 +00:00
Segher Boessenkool
e0e3ce6348 rs6000: Delete VS_scalar
It is just the same as VEC_base, which is a more generic name.

2022-06-14  Segher Boessenkool  <segher@kernel.crashing.org>

	* config/rs6000/vsx.md (VS_scalar): Delete.
	(rest of file): Adjust.
2022-06-14 17:31:15 +00:00
Nathan Sidwell
e8609768fb c++: Elide calls to NOP module initializers
gcc/cp
	* cp-tree.h (fini_modules): Add has_inits parm.
	* decl2.cc (c_parse_final_cleanups): Check for
	inits, adjust fini_modules flags.
	* module.cc (module_state): Rename call_init_p to
	active_init_p.
	(module_state::write_config): Write active_init.
	(module_state::read_config): Read it.
	(module_determine_import_inits): Clear active_init_p
	of covered inits.
	(late_finish_module): Add has_init parm.  Record it.
	(fini_modules): Adjust.

	gcc/testsuite/
	* g++.dg/modules/init-2_a.C: Adjust.
	* g++.dg/modules/init-2_c.C: Adjust.
	* g++.dg/modules/init-2_d.C: New.
2022-06-14 07:57:36 -07:00
Jan Hubicka
8f6c317b3a Fix ipa-cp wrt volatile loads
Add a check for the volatile flag to ipa_load_from_parm_agg.

gcc/ChangeLog:

2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

	PR ipa/105739
	* ipa-prop.cc (ipa_load_from_parm_agg): Punt on volatile loads.

gcc/testsuite/ChangeLog:

2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

	* gcc.dg/ipa/pr105739.c: New test.
2022-06-14 14:05:53 +02:00
Philipp Tomsich
0247ad3e0f RISC-V: Split slli+sh[123]add.uw opportunities to avoid zext.w
When encountering a prescaled (biased) value as a candidate for
sh[123]add.uw, the combine pass will present this as shifted by the
aggregate amount (prescale + shift-amount) with an appropriately
adjusted mask constant that has fewer than 32 bits set.

E.g., here's the failing expression seen in combine for a prescale of
1 and a shift of 2 (note how 0x3fffffff8 >> 3 is 0x7fffffff).
  Trying 7, 8 -> 10:
      7: r78:SI=r81:DI#0<<0x1
        REG_DEAD r81:DI
      8: r79:DI=zero_extend(r78:SI)
        REG_DEAD r78:SI
     10: r80:DI=r79:DI<<0x2+r82:DI
        REG_DEAD r79:DI
        REG_DEAD r82:DI
  Failed to match this instruction:
  (set (reg:DI 80 [ cD.1491 ])
      (plus:DI (and:DI (ashift:DI (reg:DI 81)
                       (const_int 3 [0x3]))
               (const_int 17179869176 [0x3fffffff8]))
          (reg:DI 82)))

To address this, we introduce a splitter handling these cases.
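
The equivalence behind the failing pattern can be checked with plain integer
arithmetic.  The C sketch below is only an illustration, not the splitter
itself, and the function names are hypothetical.  It computes the prescale-1,
shift-2 case once as the three-insn sequence from the dump and once in the
shifted-and-masked form that combine presents.

```c
#include <stdint.h>

/* Insns 7, 8 and 10 from the dump: shift the low 32 bits left by 1,
   zero-extend to 64 bits, then shift left by 2 and add the base.  */
uint64_t
index_as_insn_sequence (uint64_t idx, uint64_t base)
{
  uint32_t prescaled = (uint32_t) idx << 1;  /* r78:SI = r81:DI#0 << 1 */
  uint64_t extended = (uint64_t) prescaled;  /* r79:DI = zero_extend (r78:SI) */
  return (extended << 2) + base;             /* r80:DI = (r79:DI << 2) + r82:DI */
}

/* The form combine presents: a single shift by the aggregate amount
   (1 + 2) and a mask with 31 consecutive bits set.  */
uint64_t
index_as_combined_form (uint64_t idx, uint64_t base)
{
  return ((idx << 3) & UINT64_C (0x3fffffff8)) + base;
}
```

Both functions return the same value for every input, because the mask keeps
exactly bits 3 to 33, which are the bits that survive the 32-bit truncation
followed by the two shifts.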

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Co-developed-by: Manolis Tsamis <manolis.tsamis@vrull.eu>

gcc/ChangeLog:

	* config/riscv/bitmanip.md: Add split to handle opportunities
	for slli + sh[123]add.uw

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zba-shadd.c: New test.
2022-06-14 13:37:51 +02:00
Philipp Tomsich
4bf0dcb049 RISC-V: add consecutive_bits_operand predicate
Provide an easy way to constrain for constants that are a single,
consecutive run of ones.
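
As an illustration of the property being tested (not the predicate's actual
implementation, which is not shown here), a single consecutive run of ones
can be recognised with two bit tricks.  The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if VALUE consists of exactly one consecutive run of
   one bits, e.g. 0x0ff0 or 0x1, and false for 0 or for 0x0f0f.  */
bool
is_single_run_of_ones (uint64_t value)
{
  if (value == 0)
    return false;

  /* Fill the zeros below the run with ones.  */
  uint64_t filled = value | (value - 1);

  /* A single run now has the form 2^k - 1 (or all ones), so adding 1
     leaves no bit in common with it.  */
  return (filled & (filled + 1)) == 0;
}
```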

gcc/ChangeLog:

	* config/riscv/predicates.md (consecutive_bits_operand):
	Implement new predicate.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
2022-06-14 13:35:49 +02:00
Richard Biener
e07a876c07 tree-optimization/105946 - avoid accessing excess args from uninit diag
The uninit diagnostics use pass-by-reference and access attributes,
but that code iterates over the function type's arguments, which can in
some cases apparently outrun the actual arguments, leading to ICEs.
The following simply ignores arguments that are not present.

2022-06-14  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/105946
	* tree-ssa-uninit.cc (maybe_warn_pass_by_reference):
	Do not look at arguments not specified in the function call.
2022-06-14 12:52:49 +02:00
Richard Biener
90467f0ad6 middle-end/105965 - add missing v_c_e <{ el }> simplification
When we got the simplification of bit-field-ref to view-convert
we lost the ability to detect FMAs since we cannot look through

  _1 = {_10};
  _11 = VIEW_CONVERT_EXPR<float>(_1);

the following amends the (view_convert CONSTRUCTOR) pattern
to handle this case.

2022-06-14  Richard Biener  <rguenther@suse.de>

	PR middle-end/105965
	* match.pd (view_convert CONSTRUCTOR): Handle single-element
	CTOR case.

	* gcc.target/i386/pr105965.c: New testcase.
2022-06-14 12:52:49 +02:00
Eric Botcazou
be6676286a Restore bootstrap on ARM
The -Wuse-after-free warning is explicitly disabled for destructors on ARM
because of the special ABI and the previous change to the warning machinery
uncovered another case where the warning data would be incorrectly erased.

gcc/
	* warning-control.cc (copy_warning) [generic version]: Do not erase
	the warning data of the destination location when the no-warning
	bit is not set on the source.
	(copy_warning) [tree version]: Return early if TO is equal to FROM.
	(copy_warning) [gimple version]: Likewise.
gcc/testsuite/
	* g++.dg/warn/Wuse-after-free5.C: New test.
2022-06-14 12:41:11 +02:00
Kewen Lin
f907cf4c07 vect: Move suggested_unroll_factor applying [PR105940]
As PR105940 shows, when the rs6000 port tries to set
m_suggested_unroll_factor to 4 or so, there will be an ICE on:

  exact_div (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
             loop_vinfo->suggested_unroll_factor);

In function vect_analyze_loop_2, the current place where
suggested_unroll_factor is applied can't guarantee that it is
applied in all cases.  As the case shows, the vectorizer
can retry with SLP forced off, and the vf is then reset from
saved_vectorization_factor, which has not had
suggested_unroll_factor applied.  This means it can end
up with a vf that neglects suggested_unroll_factor.
I think that is contrary to the design; we should apply
suggested_unroll_factor after start_over.

	PR tree-optimization/105940

gcc/ChangeLog:

	* tree-vect-loop.cc (vect_analyze_loop_2): Move the place of
	applying suggested_unroll_factor after start_over.
2022-06-14 00:57:01 -05:00
Takayuki 'January June' Suwa
077438933c xtensa: Optimize bitwise AND operation with some specific forms of constants
This patch offers several insn-and-split patterns for bitwise AND with
register and constant that can be represented as:

i.   1's least significant N bits and the others 0's (17 <= N <= 31)
ii.  1's most significant N bits and the others 0's (12 <= N <= 31)
iii. M 1's sequence of bits and trailing N 0's bits, that cannot fit into a
	"MOVI Ax, simm12" instruction (1 <= M <= 16, 1 <= N <= 30)

And also offers shortcuts for conditional branch if each of the abovementioned
operations is (not) equal to zero.

gcc/ChangeLog:

	* config/xtensa/predicates.md (shifted_mask_operand):
	New predicate.
	* config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
	New insn-and-split pattern.
	(*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
	*masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
	*masktrue_const_shifted_mask): Ditto.
2022-06-13 17:25:48 -07:00
Takayuki 'January June' Suwa
70ce04ca35 xtensa: Make use of BALL/BNALL instructions
In the Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation, but a few similar fused instructions exist:

  "BALL  Ax, Ay, label"  // if ((~Ax & Ay) == 0) goto label;
  "BNALL Ax, Ay, label"  // if ((~Ax & Ay) != 0) goto label;

These instructions have never been emitted before, but there seems to be no
reason not to make use of them.
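
The branch conditions in the comments above can be written out as C
predicates.  This is a minimal sketch with hypothetical function names; it
models only the condition, not the branch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* BALL: taken when every bit set in AY is also set in AX.  */
bool
ball_taken (uint32_t ax, uint32_t ay)
{
  return (~ax & ay) == 0;
}

/* BNALL: taken when at least one bit set in AY is clear in AX.  */
bool
bnall_taken (uint32_t ax, uint32_t ay)
{
  return (~ax & ay) != 0;
}
```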

gcc/ChangeLog:

	* config/xtensa/xtensa.md (*masktrue_bitcmpl): New insn pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/xtensa/BALL-BNALL.c: New.
2022-06-13 17:25:48 -07:00
Takayuki 'January June' Suwa
e1b193c1cc xtensa: Simplify conditional branch/move insn patterns
No need to describe the "false side" conditional insn patterns anymore.

gcc/ChangeLog:

	* config/xtensa/xtensa-protos.h (xtensa_emit_branch):
	Remove the first argument.
	(xtensa_emit_bit_branch): Remove it because now called only from the
	output statement of *bittrue insn pattern.
	* config/xtensa/xtensa.cc (gen_int_relational): Remove the last
	argument 'p_invert', and make so that the condition is reversed by
	itself as needed.
	(xtensa_expand_conditional_branch): Share the common path, and remove
	condition inversion code.
	(xtensa_emit_branch, xtensa_emit_movcc): Simplify by removing the
	"false side" pattern.
	(xtensa_emit_bit_branch): Remove it because of the abovementioned
	reason, and move the function body to *bittrue insn pattern.
	* config/xtensa/xtensa.md (*bittrue): Transplant the output
	statement from removed xtensa_emit_bit_branch().
	(*bfalse, *ubfalse, *bitfalse, *maskfalse): Remove the "false side"
	insn patterns.
2022-06-13 17:25:48 -07:00
Takayuki 'January June' Suwa
1c68ec1f8a xtensa: Improve shift operations more
This patch introduces funnel shifter utilization, and rearranges existing
"per-byte shift" insn patterns.

gcc/ChangeLog:

	* config/xtensa/predicates.md (logical_shift_operator,
	xtensa_shift_per_byte_operator): New predicates.
	* config/xtensa/xtensa-protos.h (xtensa_shlrd_which_direction):
	New prototype.
	* config/xtensa/xtensa.cc (xtensa_shlrd_which_direction):
	New helper function for funnel shift patterns.
	* config/xtensa/xtensa.md (ior_op): New code iterator.
	(*ashlsi3_1): Replace with new split pattern.
	(*shift_per_byte): Unify *ashlsi3_3x, *ashrsi3_3x and *lshrsi3_3x.
	(*shift_per_byte_omit_AND_0, *shift_per_byte_omit_AND_1):
	New insn-and-split patterns that redirect to *xtensa_shift_per_byte,
	in order to omit unnecessary bitwise AND operation.
	(*shlrd_reg_<code>, *shlrd_const_<code>, *shlrd_per_byte_<code>,
	*shlrd_per_byte_<code>_omit_AND):
	New insn patterns for funnel shifts.

gcc/testsuite/ChangeLog:

	* gcc.target/xtensa/funnel_shifter.c: New.
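
As background for the funnel shift patterns named above: a funnel shift
treats two 32-bit registers as one 64-bit value, shifts it, and keeps 32 bits
of the result.  The sketch below is an assumption-laden illustration in C
with hypothetical function names, not the insn patterns from the patch.  It
shows the right-shifting variant both as a 64-bit shift and as the
two-shift-and-OR expression that such a pattern would have to recognise.

```c
#include <assert.h>
#include <stdint.h>

/* Shift the 64-bit value HI:LO right by COUNT (0..31) and return the
   low 32 bits.  */
uint32_t
funnel_shift_right (uint32_t hi, uint32_t lo, unsigned count)
{
  assert (count < 32);
  uint64_t pair = ((uint64_t) hi << 32) | lo;
  return (uint32_t) (pair >> count);
}

/* The same result written with 32-bit operations only.  COUNT == 0 is
   handled separately because shifting a 32-bit value by 32 is undefined
   in C.  */
uint32_t
funnel_shift_right_expanded (uint32_t hi, uint32_t lo, unsigned count)
{
  assert (count < 32);
  if (count == 0)
    return lo;
  return (lo >> count) | (hi << (32 - count));
}
```

Passing the same value as both halves gives a rotate right.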
2022-06-13 17:25:48 -07:00
GCC Administrator
c3642271e8 Daily bump.
2022-06-14 00:16:39 +00:00
Iain Buclaw
4f19e078cc libphobos: Check in missing core.sync package module
This was meant to be part of r13-1062 in the merge with upstream
druntime 454471d8.
2022-06-14 00:07:13 +02:00
Jason Merrill
2c11662391 ubsan: -Wreturn-type and ubsan trap-on-error
I noticed that -fsanitize=undefined -fsanitize-undefined-trap-on-error was
omitting the usual -Wreturn-type warning for control flowing off the end of
a function.  This was because the warning code was looking for calls either
to __builtin_unreachable or the UBSan function, but these flags produce a
call to __builtin_trap instead.

gcc/c-family/ChangeLog:

	* c-ubsan.cc (ubsan_instrument_return): Use BUILTINS_LOCATION.

gcc/ChangeLog:

	* tree-cfg.cc (pass_warn_function_return::execute): Also check
	BUILT_IN_TRAP.

gcc/testsuite/ChangeLog:

	* g++.dg/ubsan/return-8.C: New test.
2022-06-13 17:54:37 -04:00
Maciej W. Rozycki
72b185189f RISC-V: Reset the length to the default of 4 for FP comparisons
The default length for floating-point compare operations is overridden
to 8, however the FEQ.fmt, FLT.fmt, FLE.fmt machine instructions and
FGE.fmt, FGT.fmt assembly idioms the relevant RTL insns produce are all
4 bytes long each.  And all the floating-point compare RTL insns that
produce multiple machine instructions explicitly set their lengths.

Remove the override then, letting the default of 4 apply for the single
instruction case.

	gcc/
	* config/riscv/riscv.md (length): Remove the explicit setting
	for "fcmp".
2022-06-13 22:29:45 +01:00
H.J. Lu
751f306688 x86: Require AVX for F16C and VAES
Since F16C and VAES are only usable with AVX, require AVX for F16C and
VAES.
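
The dependency can be pictured as a filter over the reported feature bits.
This is a minimal sketch only: the FEATURE_* constants and the function are
hypothetical and do not correspond to the identifiers used in cpuinfo.h.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only.  */
enum
{
  FEATURE_AVX = 1 << 0,
  FEATURE_F16C = 1 << 1,
  FEATURE_VAES = 1 << 2
};

/* Drop F16C and VAES from REPORTED unless AVX is also available,
   since both are only usable together with AVX.  */
uint32_t
usable_features (uint32_t reported)
{
  if (!(reported & FEATURE_AVX))
    reported &= ~(uint32_t) (FEATURE_F16C | FEATURE_VAES);
  return reported;
}
```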

	libgcc/105920
	* common/config/i386/cpuinfo.h (get_available_features): Require
	AVX for F16C and VAES.
2022-06-13 13:27:13 -07:00
Mark Mentovai
254e88b3d7 libstdc++: Rename __null_terminated to avoid collision with Apple SDK
The macOS 13 SDK (and equivalent-version iOS and other Apple OS SDKs)
contain this definition in <sys/cdefs.h>:

    #define __null_terminated

This collides with the use of __null_terminated in libstdc++'s
experimental fs_path.h.

As libstdc++'s use of this token is entirely internal to fs_path.h, the
simplest workaround, renaming it, is most appropriate. Here, it's
renamed to __nul_terminated, referencing the NUL ('\0') value that is
used to terminate the strings in the context in which this tag structure
is used.

libstdc++-v3/ChangeLog:

	* include/experimental/bits/fs_path.h (__detail::__null_terminated):
	Rename to __nul_terminated to avoid colliding with a macro in
	Apple's SDK.

Signed-off-by: Mark Mentovai <mark@mentovai.com>
2022-06-13 20:25:49 +01:00
Jonathan Wakely
30cc1b65e4 libstdc++: Use type_identity_t for non-deducible std::atomic_xxx args
This is LWG 3220 which is about to become Tentatively Ready.

libstdc++-v3/ChangeLog:

	* include/std/atomic (__atomic_val_t): Use __type_identity_t
	instead of atomic<T>::value_type, as per LWG 3220.
	* testsuite/29_atomics/atomic/lwg3220.cc: New test.
2022-06-13 16:39:07 +01:00
Uros Bizjak
b3dd7d8b48 i386: Return true for (SUBREG (MEM....)) in register_no_elim_operand [PR105927]
Under certain conditions register_operand predicate also allows
subregs of memory operands.  When RTL checking is enabled, these
will fail with REGNO (op).

Allow subregs of memory operands, these are guaranteed
to be reloaded to a register.

2022-06-13  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

	PR target/105927
	* config/i386/predicates.md (register_no_elim_operand):
	Return true for subreg of a memory operand.

gcc/testsuite/ChangeLog:

	PR target/105927
	* gcc.target/i386/pr105927.c: New test.
2022-06-13 17:10:49 +02:00
Iain Buclaw
77718f38f8 d: Match function declarations of gcc built-ins from any module.
Declarations of recognised gcc built-in functions are now matched from
any module.  Previously, only the `core.stdc' package was scanned.

In addition to matching of the symbol, any user-applied `@attributes' or
`pragma(mangle)' name will be applied to the built-in decl as well.
Because there would now be no control over where built-in declarations
are coming from, the warning option `-Wbuiltin-declaration-mismatch' has
been implemented in the D front-end too.

gcc/d/ChangeLog:

	* d-builtins.cc: Include builtins.h.
	(gcc_builtins_libfuncs): Remove.
	(strip_type_modifiers): New function.
	(matches_builtin_type): New function.
	(covariant_with_builtin_type_p): New function.
	(maybe_set_builtin_1): Set front-end built-in if identifier matches
	gcc built-in name.  Apply user-specified attributes and assembler name
	overrides to the built-in.  Warn about built-in declaration mismatches.
	(d_builtin_function): Set IDENTIFIER_DECL_TREE of built-in functions.
	* d-compiler.cc (Compiler::onParseModule): Scan all modules for any
	identifiers that match built-in function names.
	* lang.opt (Wbuiltin-declaration-mismatch): New option.

gcc/testsuite/ChangeLog:

	* gdc.dg/Wbuiltin_declaration_mismatch.d: New test.
	* gdc.dg/builtins.d: New test.
2022-06-13 16:37:13 +02:00
Richard Sandiford
f8baf4004e Add a general mapping from internal fns to target insns
Several existing internal functions map directly to an instruction
defined in target-insns.def.  This patch makes it easier to define
more such functions in future.

This should help to reduce cut-&-paste, but more importantly, it allows
the difference between optab functions and target-insns.def functions
to be abstracted away; both are now treated as “directly-mapped”.

gcc/
	* internal-fn.def (DEF_INTERNAL_INSN_FN): New macro.
	(GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT, GOMP_SIMT_LANE)
	(GOMP_SIMT_LAST_LANE, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_VOTE_ANY)
	(GOMP_SIMT_XCHG_BFLY, GOMP_SIMT_XCHG_IDX): Use it.
	* internal-fn.h (direct_internal_fn_info::directly_mapped): New
	member variable.
	(direct_internal_fn_info::vectorizable): Reduce to 1 bit.
	(direct_internal_fn_p): Also return true for internal functions
	that map directly to instructions defined in target-insns.def.
	(direct_internal_fn): Adjust comment accordingly.
	* internal-fn.cc (direct_insn, optab1, optab2, vectorizable_optab1)
	(vectorizable_optab2): New local macros.
	(not_direct): Initialize directly_mapped.
	(mask_load_direct, load_lanes_direct, mask_load_lanes_direct)
	(gather_load_direct, len_load_direct, mask_store_direct)
	(store_lanes_direct, mask_store_lanes_direct, vec_cond_mask_direct)
	(vec_cond_direct, scatter_store_direct, len_store_direct)
	(vec_set_direct, unary_direct, binary_direct, ternary_direct)
	(cond_unary_direct, cond_binary_direct, cond_ternary_direct)
	(while_direct, fold_extract_direct, fold_left_direct)
	(mask_fold_left_direct, check_ptrs_direct): Use the macros above.
	(expand_GOMP_SIMT_ENTER_ALLOC, expand_GOMP_SIMT_EXIT): Delete.
	(expand_GOMP_SIMT_LANE, expand_GOMP_SIMT_LAST_LANE): Likewise.
	(expand_GOMP_SIMT_ORDERED_PRED, expand_GOMP_SIMT_VOTE_ANY): Likewise.
	(expand_GOMP_SIMT_XCHG_BFLY, expand_GOMP_SIMT_XCHG_IDX): Likewise.
	(direct_internal_fn_types): Handle functions that map to instructions
	defined in target-insns.def.
	(direct_internal_fn_types): Likewise.
	(direct_internal_fn_supported_p): Likewise.
	(internal_fn_expanders): Likewise.
2022-06-13 15:24:34 +01:00
Richard Sandiford
1d205dbac1 Factor out common internal-fn idiom
internal-fn.c has quite a few functions that simply map the result
of the call to an instruction's output operand (if any) and map
each argument to an instruction's input operand, in order.
This patch adds a single function for doing that.  It's really
just a generalisation of expand_direct_optab_fn, but with the
output operand being optional.

Unfortunately, it isn't possible to do this for vcond_mask
because the internal function has a different argument order
from the optab.

gcc/
	* internal-fn.cc (expand_fn_using_insn): New function,
	split out and adapted from...
	(expand_direct_optab_fn): ...here.
	(expand_GOMP_SIMT_ENTER_ALLOC): Use it.
	(expand_GOMP_SIMT_EXIT): Likewise.
	(expand_GOMP_SIMT_LANE): Likewise.
	(expand_GOMP_SIMT_LAST_LANE): Likewise.
	(expand_GOMP_SIMT_ORDERED_PRED): Likewise.
	(expand_GOMP_SIMT_VOTE_ANY): Likewise.
	(expand_GOMP_SIMT_XCHG_BFLY): Likewise.
	(expand_GOMP_SIMT_XCHG_IDX): Likewise.
2022-06-13 15:24:34 +01:00
Iain Buclaw
e55eda2385 d: Improve TypeInfo errors when compiling in -fno-rtti mode
The existing TypeInfo errors can be cryptic.  This alters the diagnostic
to include which expression is requiring `object.TypeInfo'.

gcc/d/ChangeLog:

	* d-tree.h (check_typeinfo_type): Add Expression* parameter.
	(build_typeinfo): Likewise.  Declare new override.
	* expr.cc (ExprVisitor): Call build_typeinfo with Expression*.
	* typeinfo.cc (check_typeinfo_type): Include expression in the
	diagnostic message.
	(build_typeinfo): New override.

gcc/testsuite/ChangeLog:

	* gdc.dg/rtti1.d: New test.
2022-06-13 15:08:18 +02:00
Jakub Jelinek
1158fe4340 openmp: Conforming device numbers and omp_{initial,invalid}_device
OpenMP 5.2 changed once more what device numbers are allowed.
In 5.1, valid device numbers were [0, omp_get_num_devices()].
5.2 makes also -1 valid (calls it omp_initial_device), which is equivalent
in behavior to omp_get_num_devices() number but has the advantage that it
is a constant.  And it also introduces omp_invalid_device which is
also a constant with implementation defined value < -1.  That value should
act like sNaN, any time any device construct (GOMP_target*) or OpenMP runtime
API routine is asked for such a device, the program is terminated.
And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (which
is all but [-1, omp_get_num_devices()] other than omp_invalid_device)
must be treated like omp_invalid_device.

For device constructs, we have a compatibility problem, we've historically
used 2 magic negative values to mean something special.
GOMP_DEVICE_ICV (-1) means device clause wasn't present, pick the
		     omp_get_default_device () number
GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for
			  #pragma omp target if (cond)
			  where if cond is false, we pass -2
But 5.2 requires that omp_initial_device is -1 (there were discussions
about it; the advantage of -1 is that one can, say, iterate over the
[-1, omp_get_num_devices()-1] range to get all devices starting with
the host/initial one).
And also, if user passes -2, unless it is omp_invalid_device, we need to
treat it like non-conforming with OMP_TARGET_OFFLOAD=mandatory.

So, the patch does on the compiler side some number remapping,
user_device_num >= -2U ? user_device_num - 1 : user_device_num.
This remapping is done at compile time if device clause has constant
argument, otherwise at runtime, and means that for user -1 (omp_initial_device)
we pass -2 to GOMP_* in the runtime library where it treats it like host
fallback, while -2 is remapped to -3 (one of the non-conforming device numbers,
for those it doesn't matter which one is which).
omp_invalid_device is then -4.
For the OpenMP device runtime APIs, no remapping is done.
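
The remapping expression quoted above can be tried out in isolation.  The
helper below is a hypothetical stand-alone version written for illustration;
in the patch the remapping is emitted by the compiler rather than called as
a function.

```c
/* Remap a user-provided device clause argument: -1 (omp_initial_device)
   becomes -2 and -2 becomes -3; every other value is left unchanged.
   The comparison is done on the unsigned representation, so only the
   two values -2 and -1 satisfy it.  */
int
remap_device_clause_arg (int user_device_num)
{
  return (unsigned) user_device_num >= -2U
	 ? user_device_num - 1
	 : user_device_num;
}
```

Non-negative device numbers and values below -2, including -4 for
omp_invalid_device, pass through unchanged.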

This patch doesn't deal with the initial default-device-var for
OMP_TARGET_OFFLOAD=mandatory; the spec says that the initial ICV value
for that should in that case depend on whether there are any offloading
devices or not (if not, should be omp_invalid_device), but that means
we can't determine the number of devices lazily (and let libraries have the
possibility to register their offloading data etc.).

2022-06-13  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-expand.cc (expand_omp_target): Remap user provided
	device clause arguments, -1 to -2 and -2 to -3, either
	at compile time if constant, or at runtime.
include/
	* gomp-constants.h (GOMP_DEVICE_INVALID): Define.
libgomp/
	* omp.h.in (omp_initial_device, omp_invalid_device): New enumerators.
	* omp_lib.f90.in (omp_initial_device, omp_invalid_device): New
	parameters.
	* omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise.
	* target.c (resolve_device): Add remapped argument, handle
	GOMP_DEVICE_ICV only if remapped is true (and clear remapped),
	for negative values, treat GOMP_DEVICE_FALLBACK as fallback only
	if remapped, otherwise treat omp_initial_device that way.  For
	omp_invalid_device, always emit gomp_fatal, even when
	OMP_TARGET_OFFLOAD isn't mandatory.
	(GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext,
	GOMP_target_update, GOMP_target_update_ext,
	GOMP_target_enter_exit_data): Pass true as remapped argument to
	resolve_device.
	(omp_target_alloc, omp_target_free, omp_target_is_present,
	omp_target_memcpy_check, omp_target_associate_ptr,
	omp_target_disassociate_ptr, omp_get_mapped_ptr,
	omp_target_is_accessible): Pass false as remapped argument to
	resolve_device.  Treat omp_initial_device the same as
	gomp_get_num_devices ().  Don't bypass resolve_device calls if
	device_num is negative.
	(omp_pause_resource): Treat omp_initial_device the same as
	gomp_get_num_devices ().  Call resolve_device.
	* icv-device.c (omp_set_default_device): Always set to device_num
	even when it is negative.
	* libgomp.texi: Document that Conforming device numbers,
	omp_initial_device and omp_invalid_device are implemented.
	* testsuite/libgomp.c/target-41.c (main): Add test with
	omp_initial_device.
	* testsuite/libgomp.c/target-45.c: New test.
	* testsuite/libgomp.c/target-46.c: New test.
	* testsuite/libgomp.c/target-47.c: New test.
	* testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
	test with omp_initial_device.  Use -5 instead of -1 for negative value
	test.
	* testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
	Likewise.  Reorder stop numbers.
2022-06-13 14:02:37 +02:00
Eric Botcazou
3b598848f6 Introduce -finstrument-functions-once
The goal is to make it possible to use it in (large) production binaries
to do function-level coverage, so the overhead must be minimal and, in
particular, there is no protection against data races, so the "once"
moniker is imprecise.
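
The trade-off described above can be emulated by hand.  The following is a
sketch under stated assumptions, not the code the option generates:
profile_enter is a hypothetical stand-in for the real profiling hook, and the
per-function flag is a plain variable with no synchronisation.

```c
#include <stdbool.h>

/* Number of times the hypothetical hook has run.  */
int enter_hook_calls;

/* Hypothetical stand-in for the instrumentation hook.  */
void
profile_enter (const char *function_name)
{
  (void) function_name;
  enter_hook_calls++;
}

int
instrumented_function (int x)
{
  /* Plain flag without atomics: two threads entering at the same time
     can both observe false, so the hook may run more than once.  */
  static bool already_called;

  if (!already_called)
    {
      already_called = true;
      profile_enter ("instrumented_function");
    }
  return x * 2;
}
```

In a single thread the hook runs on the first call only; with concurrent
first calls it may run several times, which is why "once" is imprecise.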

gcc/
	* common.opt (finstrument-functions): Set explicit value.
	(-finstrument-functions-once): New option.
	* doc/invoke.texi (Program Instrumentation Options): Document it.
	* gimplify.cc (build_instrumentation_call): New static function.
	(gimplify_function_tree): Call it to emit the instrumentation calls
	if -finstrument-functions[-once] is specified.
gcc/testsuite/
	* gcc.dg/instrument-4.c: New test.
2022-06-13 13:35:33 +02:00
Eric Botcazou
cb1ecf3819 Do not erase warning data in gimple_set_location
gimple_set_location is mostly invoked on newly built GIMPLE statements, so
their location is UNKNOWN_LOCATION and setting it will clobber the warning
data of the passed location, if any.

gcc/
	* dwarf2out.cc (output_one_line_info_table): Initialize prev_addr.
	* gimple.h (gimple_set_location): Do not copy warning data from
	the previous location when it is UNKNOWN_LOCATION.
	* optabs.cc (expand_widen_pattern_expr): Always set oprnd{1,2}.
gcc/testsuite/
	* c-c++-common/nonnull-1.c: Remove XFAIL for C++.
2022-06-13 13:35:33 +02:00
Nathan Sidwell
6303eee4b9 c++: Separate late stage module writing
This moves some module writing into a newly added write_end function,
which is called after writing initializers.

	gcc/cp/
	* module.cc (module_state::write): Separate to ...
	(module_state::write_begin, module_state::write_end): ...
	these.
	(module_state::write_readme): Drop extensions parameter.
	(struct module_processing_cookie): Add more fields.
	(finish_module_processing): Adjust state writing call.
	(late_finish_module): Call write_end.
2022-06-13 04:20:49 -07:00
Iain Buclaw
ec486b739b d: Merge upstream dmd 821ed393d, druntime 454471d8, phobos 1206fc94f.
D front-end changes:

    - Import latest bug fixes to mainline.

D runtime changes:

    - Fix duplicate Elf64_Dyn definitions on Solaris.
    - _d_newThrowable has been converted to a template.

Phobos changes:

    - Import latest bug fixes to mainline.

gcc/d/ChangeLog:

	* dmd/MERGE: Merge upstream dmd 821ed393d.
	* expr.cc (ExprVisitor::visit (NewExp *)): Remove handling of
	allocating `@nogc' throwable object.
	* runtime.def (NEWTHROW): Remove.

libphobos/ChangeLog:

	* libdruntime/MERGE: Merge upstream druntime 454471d8.
	* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
	core/sync/package.d.
	* libdruntime/Makefile.in: Regenerate.
	* src/MERGE: Merge upstream phobos 1206fc94f.
2022-06-13 11:38:10 +02:00
Jakub Jelinek
13ea4a6e83 i386: Fix up *<dwi>3_doubleword_mask [PR105911]
Another regression caused by my recent patch.

This time because define_insn_and_split only requires that the
constant mask is const_int_operand.  When it was only SImode,
that wasn't a problem, HImode neither, but for DImode, if we need
to AND the shift count, we might run into the problem that the mask isn't
a representable signed 32-bit immediate.

But, we don't really care about the upper bits of the mask, so
we can just mask the CONST_INT with the mode mask.
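
A small numeric sketch of that masking, assuming 8-bit units and using a
hypothetical helper name (the real change is made in the machine description,
not in a C function like this):

```c
#include <stdint.h>

/* Reduce MASK to the bits that matter for a shift count in a mode of
   MODE_SIZE_BYTES bytes, i.e. AND it with (size * 8) - 1.  */
int64_t
mask_shift_count_operand (int64_t mask, unsigned mode_size_bytes)
{
  const unsigned bits_per_unit = 8;  /* assumption: 8-bit bytes */
  return mask & (int64_t) (mode_size_bytes * bits_per_unit - 1);
}
```

For an 8-byte mode, a mask such as 0xffffffff3f, which does not fit in a
signed 32-bit immediate, is reduced to 0x3f, which does.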

2022-06-13  Jakub Jelinek  <jakub@redhat.com>

	PR target/105911
	* config/i386/i386.md (*ashl<dwi>3_doubleword_mask,
	*<insn><dwi>3_doubleword_mask): Use operands[3] masked with
	(<MODE_SIZE> * BITS_PER_UNIT) - 1 as AND operand instead of
	operands[3] unmodified.

	* gcc.dg/pr105911.c: New test.
2022-06-13 10:54:22 +02:00
Cui,Lili
033e5ee3c4 testsuite: Add -mtune=generic to dg-options for two testcases.
Use -mtune=generic to limit these two test cases, because configuring them with
-mtune=cascadelake or znver3 will vectorize them.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Add
	-mtune=generic to dg-options.
	* gcc.target/i386/pr84101.c: Likewise.
2022-06-13 10:03:54 +08:00
GCC Administrator
fd1fcd4756 Daily bump.
2022-06-13 00:16:18 +00:00
Simon Wright
add1adaa17 Darwin: Truncate kernel-provided version to OS major for Darwin >= 20.
In common with system tools, GCC uses a version obtained from the kernel as
the prevailing macOS target, when that is not overridden by command line or
environment versions (i.e. -mmacosx-version-min=, MACOSX_DEPLOYMENT_TARGET).

Presently, GCC assumes that if the OS version is >= 20, the value used should
include both major and minor version identifiers.  However the system tools
(for those versions) truncate the value to the major version - this leads to
link errors when combining objects built with clang and GCC for example:

ld: warning: object file (null.o) was built for newer macOS version (12.2)
than being linked (12.0)

The change here truncates the values GCC uses to the major version.
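
The effect can be sketched as follows.  The struct and function are
hypothetical and only model the rule stated in this message; representing
the truncated version with a minor number of 0 is an assumption based on the
"12.0" in the linker warning above.

```c
/* A macOS version as a major/minor pair.  */
struct os_version
{
  int major;
  int minor;
};

/* For Darwin 20 (macOS 11) and later, keep only the major number of the
   version obtained from the kernel.  Earlier versions are returned
   unchanged in this sketch.  */
struct os_version
prevailing_macos_target (int darwin_major, struct os_version from_kernel)
{
  if (darwin_major >= 20)
    from_kernel.minor = 0;
  return from_kernel;
}
```

With Darwin 21 and a kernel-derived version of 12.2, the result is 12.0,
matching what the system tools use.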

gcc/ChangeLog:

	PR target/104871
	* config/darwin-driver.cc (darwin_find_version_from_kernel): If the OS
	version is darwin20 (macOS 11) or greater, truncate the version to the
	major number.
2022-06-12 23:22:20 +01:00
Mark Mentovai
6725f186cb Darwin: Future-proof -mmacosx-version-min
f18cbc1ee1 (2021-12-18) updated various parts of gcc to not impose a
Darwin or macOS version maximum of the current known release. Different
parts of gcc accept, variously, Darwin version numbers matching
darwin2*, and macOS major version numbers up to 99. The current released
version is Darwin 21 and macOS 12, with Darwin 22 and macOS 13 expected
for public release later this year. With one major OS release per year,
this strategy is expected to provide another 8 years of headroom.

However, f18cbc1ee1 missed config/darwin-c.c (now .cc), which
continued to impose a maximum of macOS 12 on the -mmacosx-version-min
compiler driver argument. This was last updated from 11 to 12 in
11b9675774 (2021-10-27), but kicking the can down the road one year at
a time is not a viable strategy, and is not in line with the more recent
technique from f18cbc1ee1.

Prior to 556ab51259 (2020-11-06), config/darwin-c.c did not impose a
maximum that needed annual maintenance, as at that point, all macOS
releases had used a major version of 10. The stricter approach imposed
since then was valuable for a time until the particulars of the new
versioning scheme were established and understood, but now that they
are, it's prudent to restore a more permissive approach.

gcc/ChangeLog:

	* config/darwin-c.cc: Make -mmacosx-version-min more future-proof.

Signed-off-by: Mark Mentovai <mark@mentovai.com>
2022-06-12 23:20:57 +01:00
Max Filippov
ff500e1cf1 gcc: xtensa: fix pr95571 test for call0 ABI
gcc/testsuite/
	* g++.target/xtensa/pr95571.C (__xtensa_libgcc_window_spill):
	New definition.
2022-06-11 22:50:14 -07:00
Prathamesh Kulkarni
494bec0250 PR96463: Optimise svld1rq from vectors for little endian AArch64 targets.
The patch folds:
lhs = svld1rq({-1, -1, ...}, rhs)
into:
tmp = mem_ref<vectype> [(elem_type * {ref-all}) rhs]
lhs = vec_perm_expr<tmp, tmp, {0, 1, 2, 3 ...}>.
which is then expanded using aarch64_expand_sve_dupq.

Example:

svint32_t
foo (int32x4_t x)
{
  return svld1rq (svptrue_b8 (), &x[0]);
}

code-gen:
foo:
.LFB4350:
	dup     z0.q, z0.q[0]
	ret

The patch relaxes type-checking for VEC_PERM_EXPR by allowing different
vector types for lhs and rhs provided:
(1) rhs3 is constant and has integer type element.
(2) len(lhs) == len(rhs3) and len(rhs1) == len(rhs2)
(3) lhs and rhs have same element type.
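
The three conditions can be read as a predicate over the operand types. The sketch below uses a simplified, hypothetical struct vec_type (element kind plus element count) instead of GCC's tree types, so it only illustrates the rule and is not the code in verify_gimple_assign_ternary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vector type.  */
enum elem_kind { ELEM_INT32, ELEM_INT64, ELEM_FLOAT32 };

struct vec_type {
  enum elem_kind elem;  /* element type */
  size_t len;           /* number of elements */
};

static bool elem_is_integer(enum elem_kind k)
{
  return k == ELEM_INT32 || k == ELEM_INT64;
}

/* Returns true if lhs = VEC_PERM_EXPR <rhs1, rhs2, rhs3> would pass the
   relaxed check described above, where lhs may have a different vector
   type from rhs1 and rhs2.  */
bool vec_perm_types_ok(struct vec_type lhs, struct vec_type rhs1,
                       struct vec_type rhs2, struct vec_type rhs3,
                       bool rhs3_is_constant)
{
  /* (1) the selector is constant and has integer elements.  */
  if (!rhs3_is_constant || !elem_is_integer(rhs3.elem))
    return false;
  /* (2) len(lhs) == len(rhs3) and len(rhs1) == len(rhs2).  */
  if (lhs.len != rhs3.len || rhs1.len != rhs2.len)
    return false;
  /* (3) lhs and the two inputs share an element type.  */
  return lhs.elem == rhs1.elem && lhs.elem == rhs2.elem;
}
```

For example, an 8-element int32 result built from two 4-element int32 inputs with a constant 8-element integer selector passes, which is the shape the svld1rq fold needs.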

gcc/ChangeLog:
	PR target/96463
	* config/aarch64/aarch64-sve-builtins-base.cc: Include ssa.h.
	(svld1rq_impl::fold): Define.
	* config/aarch64/aarch64.cc (expand_vec_perm_d): Define new members
	op_mode and op_vec_flags.
	(aarch64_evpc_reencode): Initialize newd.op_mode and
	newd.op_vec_flags.
	(aarch64_evpc_sve_dup): New function.
	(aarch64_expand_vec_perm_const_1): Gate existing calls to
	aarch64_evpc_* functions under d->vmode == d->op_mode,
	and call aarch64_evpc_sve_dup.
	(aarch64_vectorize_vec_perm_const): Remove assert
	d->vmode != d->op_mode, and initialize d.op_mode and d.op_vec_flags.
	* tree-cfg.cc (verify_gimple_assign_ternary): Allow different
	vector types for lhs and rhs in VEC_PERM_EXPR if rhs3 is
	constant.

gcc/testsuite/ChangeLog:
	PR target/96463
	* gcc.target/aarch64/sve/acle/general/pr96463-1.c: New test.
	* gcc.target/aarch64/sve/acle/general/pr96463-2.c: Likewise.
2022-06-12 08:55:04 +05:30
GCC Administrator
cbd842717e Daily bump. 2022-06-12 00:16:26 +00:00
Takayuki 'January June' Suwa
cd02f15f1a xtensa: Improve constant synthesis for both integer and floating-point
This patch revises the previous implementation of constant synthesis.

First, it now uses a define_split machine description pattern and runs
after the reload pass, so as not to interfere with optimizations such as
loop invariant motion.

Second, floating-point constants are now processed as well as integer ones.

Third, several new synthesis patterns are added for when the constant
cannot fit into a "MOVI Ax, simm12" instruction, but:

I.   can be represented as a power of two minus one (eg. 32767, 65535 or
     0x7fffffffUL)
       => "MOVI(.N) Ax, -1" + "SRLI Ax, Ax, 1 ... 31" (or "EXTUI")
II.  is between -34816 and 34559
       => "MOVI(.N) Ax, -2048 ... 2047" + "ADDMI Ax, Ax, -32768 ... 32512"
III. (existing case) can fit into a signed 12-bit if the trailing zero bits
     are stripped
       => "MOVI(.N) Ax, -2048 ... 2047" + "SLLI Ax, Ax, 1 ... 31"

The above sequences consist of 5 or 6 bytes and have a latency of 2 clock
cycles, in contrast with "L32R Ax, <litpool>" (3 bytes and a latency of one
clock, but possibly an additional one-clock pipeline stall and an
implementation-specific InstRAM/ROM access penalty) plus 4 bytes of constant
value.

In addition, 3-instruction synthesis patterns (8 or 9 bytes, 3 clock latency)
are also provided when optimizing for speed and the L32R instruction has a
considerable access penalty:

IV.  2-instruction synthesis (any of I ... III) followed by
     "SLLI Ax, Ax, 1 ... 31"
V.   2-instruction synthesis followed by either "ADDX[248] Ax, Ax, Ax"
     or "SUBX8 Ax, Ax, Ax" (multiplying by 3, 5, 7 or 9)
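
The 2-instruction cases I to III can be sketched as a classifier over a 32-bit constant. This is a simplified model in C and not the code in xtensa.cc: the enum, the function names and the order in which the cases are tried are assumptions, and the 3-instruction cases IV and V are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum synth_kind {
  SYNTH_MOVI,        /* fits "MOVI Ax, simm12" directly */
  SYNTH_MOVI_SRLI,   /* case I:   MOVI -1, then SRLI (or EXTUI) */
  SYNTH_MOVI_ADDMI,  /* case II:  MOVI lo, then ADDMI hi */
  SYNTH_MOVI_SLLI,   /* case III: MOVI, then SLLI */
  SYNTH_NONE         /* none of the above, e.g. fall back to L32R */
};

static bool fits_simm12(int32_t v)
{
  return v >= -2048 && v <= 2047;
}

enum synth_kind classify_constant(int32_t v)
{
  uint32_t u = (uint32_t)v;
  int32_t t = v;

  if (fits_simm12(v))
    return SYNTH_MOVI;
  /* Case I: 2^n - 1 is a run of low one bits, so u & (u + 1) == 0.  */
  if ((u & (u + 1u)) == 0)
    return SYNTH_MOVI_SRLI;
  /* Case II: the range quoted in the commit message.  */
  if (v >= -34816 && v <= 34559)
    return SYNTH_MOVI_ADDMI;
  /* Case III: strip trailing zero bits (v is non-zero here, because
     zero fits simm12) and check what is left.  */
  while (t % 2 == 0)
    t /= 2;
  if (fits_simm12(t))
    return SYNTH_MOVI_SLLI;
  return SYNTH_NONE;
}

/* Case II split: v == *lo + *hi, with *lo a simm12 and *hi a multiple
   of 256 in [-32768, 32512].  */
void split_movi_addmi(int32_t v, int32_t *lo, int32_t *hi)
{
  int32_t rem, h;

  assert(v >= -34816 && v <= 34559);
  rem = ((v % 256) + 256) % 256;  /* v mod 256, always non-negative */
  h = v - rem;                    /* round down to a multiple of 256 */
  if (h < -32768)
    h = -32768;
  if (h > 32512)
    h = 32512;
  *hi = h;
  *lo = v - h;
}
```

The bounds of case II follow from the two immediates: -2048 + -32768 = -34816 and 2047 + 32512 = 34559.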

gcc/ChangeLog:

	* config/xtensa/xtensa-protos.h (xtensa_constantsynth):
	New prototype.
	* config/xtensa/xtensa.cc (xtensa_emit_constantsynth,
	xtensa_constantsynth_2insn, xtensa_constantsynth_rtx_SLLI,
	xtensa_constantsynth_rtx_ADDSUBX, xtensa_constantsynth):
	New backend functions that process the abovementioned logic.
	(xtensa_emit_move_sequence): Revert the previous changes.
	* config/xtensa/xtensa.md: New split patterns for integer
	and floating-point, as the frontend part.

gcc/testsuite/ChangeLog:

	* gcc.target/xtensa/constsynth_2insns.c: New.
	* gcc.target/xtensa/constsynth_3insns.c: Ditto.
	* gcc.target/xtensa/constsynth_double.c: Ditto.
2022-06-11 14:39:10 -07:00
Takayuki 'January June' Suwa
ccd02e734e xtensa: Improve instruction cost estimation and suggestion
This patch implements a new target-specific relative RTL insn cost function,
because the default cost estimation is suboptimal, and fixes several "length"
insn attributes that feed into that estimation.

It also introduces a new machine-dependent option, "-mextra-l32r-costs=",
which tells the compiler the implementation-specific InstRAM/ROM access
penalty for the L32R instruction (in clock-cycle units, 0 by default).

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (xtensa_rtx_costs): Correct wrong case
	for ABS and NEG, add missing case for BSWAP and CLRSB, and
	double the costs for integer divisions using libfuncs if
	optimizing for speed, in order to take advantage of fast constant
	division by multiplication.
	(TARGET_INSN_COST): New macro definition.
	(xtensa_is_insn_L32R_p, xtensa_insn_cost): New functions for
	calculating relative costs of RTL insns, for both speed and
	size.
	* config/xtensa/xtensa.md (return, nop, trap): Correct values of
	the attribute "length" that depends on TARGET_DENSITY.
	(define_asm_attributes, blockage, frame_blockage): Add missing
	attributes.
	* config/xtensa/xtensa.opt (-mextra-l32r-costs=): New
	machine-dependent option; preparatory work for now.
2022-06-11 13:15:30 -07:00
Takayuki 'January June' Suwa
fddf0e1057 xtensa: Consider the Loop Option when setmemsi is expanded to small loop
The small-loop expansion now applies to almost any size of aligned block
when the Loop Option is configured and active.

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (xtensa_expand_block_set_small_loop):
	Pass through the block length / loop count conditions if
	zero-overhead looping is configured and active.
2022-06-11 13:15:29 -07:00
Takayuki 'January June' Suwa
9489a1ab05 xtensa: Tweak some widen multiplications
umulsidi3 is faster than umuldi3 even as a library call, and is also a
prerequisite for fast constant division by multiplication.
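
To illustrate why a widening 32x32->64 multiply matters for constant division, here is a minimal sketch in C of unsigned division by 10 using a multiply and a shift. The function names are made up for the example. The magic constant 0xCCCCCCCD is ceil(2^35 / 10), the well-known value for this divisor, and is not taken from the patch.

```c
#include <stdint.h>

/* Widening multiply: two 32-bit operands, full 64-bit product.  This is
   the operation that the umulsidi3 pattern names.  */
static uint64_t umul32x32(uint32_t a, uint32_t b)
{
  return (uint64_t)a * b;
}

/* x / 10 for any 32-bit unsigned x, without a division instruction:
   multiply by ceil(2^35 / 10) and keep the bits above bit 35.  */
uint32_t udiv10(uint32_t x)
{
  return (uint32_t)(umul32x32(x, 0xCCCCCCCDu) >> 35);
}
```

Only the high part of the 64-bit product is needed, so a cheap widening multiply makes this sequence preferable to a call into a division routine.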

gcc/ChangeLog:

	* config/xtensa/xtensa.md (mulsidi3, umulsidi3):
	Split into individual signedness, in order to use libcall
	"__umulsidi3" but not the other.
	(<u>mulhisi3): Merge into one by using code iterator.
	(<u>mulsidi3, mulhisi3, umulhisi3): Remove.
2022-06-11 13:15:26 -07:00
Michael Meissner
fddb7f6512 Disable generating load/store vector pairs for block copies.
Testing has found that using load and store vector pair for block copies
can result in a slow down on power10.  This patch disables using the
vector pair instructions for block copies if we are tuning for power10.

2022-06-11   Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
	not generate block copies with vector pair instructions if we are
	tuning for power10.
2022-06-11 00:40:16 -04:00
GCC Administrator
ef1e4d80dd Daily bump. 2022-06-11 00:16:21 +00:00
Patrick Palka
343d83c7a8 c++: improve TYPENAME_TYPE hashing [PR65328]
For the testcase in this PR, compilation takes very long ultimately due
to our poor hashing of TYPENAME_TYPE causing a huge number of collisions
in the spec_hasher and typename_hasher tables.

In spec_hasher, we don't hash the components of TYPENAME_TYPE, which
means most TYPENAME_TYPE arguments end up contributing the same hash.
This is the safe thing to do uniformly since structural_comptypes may
try resolving a TYPENAME_TYPE via the current instantiation.  But this
behavior of structural_comptypes is suppressed from spec_hasher::equal
via the comparing_specializations flag, which means spec_hasher::hash
can assume it's disabled too.  To that end, this patch makes
spec_hasher::hash set the flag, and teaches iterative_hash_template_arg
to hash the relevant components of TYPENAME_TYPE when the flag is set.

And in typename_hasher, the hash function considers TYPE_IDENTIFIER
instead of the more informative TYPENAME_TYPE_FULLNAME, which this patch
fixes accordingly.
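
The effect of hashing more components can be shown with a toy model. In this C sketch the struct typename_key and both hash functions are hypothetical, and FNV-1a stands in for GCC's own hashing helpers; the point is only that a hash which ignores the distinguishing fields sends every key to the same bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TYPENAME_TYPE.  */
struct typename_key {
  uint32_t context_id;   /* stands in for TYPE_CONTEXT */
  const char *fullname;  /* stands in for TYPENAME_TYPE_FULLNAME */
};

/* Coarse hash: ignores the components, so all keys collide.  */
uint32_t hash_coarse(const struct typename_key *k)
{
  (void)k;
  return 0x9e3779b9u;
}

/* One FNV-1a step over a single byte.  */
static uint32_t fnv1a_step(uint32_t h, uint8_t byte)
{
  return (h ^ byte) * 16777619u;
}

/* Finer hash: mixes in both the context and the full name.  */
uint32_t hash_fine(const struct typename_key *k)
{
  uint32_t h = 2166136261u;
  const char *p;
  int i;

  assert(k != NULL && k->fullname != NULL);
  for (i = 0; i < 4; i++)
    h = fnv1a_step(h, (uint8_t)(k->context_id >> (8 * i)));
  for (p = k->fullname; *p != '\0'; p++)
    h = fnv1a_step(h, (uint8_t)*p);
  return h;
}
```

With hash_coarse, a table holding many such keys degenerates into a single long chain, and every lookup pays for a linear scan of equality comparisons.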

After this patch, compile time for the testcase in the PR falls to
around 30 seconds on my machine (down from dozens of minutes).

	PR c++/65328

gcc/cp/ChangeLog:

	* decl.cc (typename_hasher::hash): Add extra overloads.
	Use iterative_hash_object instead of htab_hash_pointer.
	Hash TYPENAME_TYPE_FULLNAME instead of TYPE_IDENTIFIER.
	(build_typename_type): Use typename_hasher::hash.
	* pt.cc (spec_hasher::hash): Add two-parameter overload.
	Set comparing_specializations around the call to
	hash_tmpl_and_args.
	(iterative_hash_template_arg) <case TYPENAME_TYPE>:
	When comparing_specializations, hash the TYPE_CONTEXT
	and TYPENAME_TYPE_FULLNAME.
	(tsubst_function_decl): Use spec_hasher::hash instead of
	hash_tmpl_and_args.
	(tsubst_template_decl): Likewise.
	(tsubst_decl): Likewise.
2022-06-10 16:10:02 -04:00
Patrick Palka
f9b5a8e58d c++: optimize specialization of templated member functions
This applies one of the lookup_template_class optimizations from the
previous patch to instantiate_template as well.

gcc/cp/ChangeLog:

	* pt.cc (instantiate_template): Don't substitute the context
	of the most general template if that of the partially
	instantiated template is already non-dependent.
2022-06-10 16:09:58 -04:00
Patrick Palka
cb7fd1ea85 c++: optimize specialization of nested templated classes
When substituting a class template specialization, tsubst_aggr_type
substitutes the TYPE_CONTEXT before passing it to lookup_template_class.
This appears to be unnecessary, however, because the initial value
of lookup_template_class's context parameter is unused outside of the
IDENTIFIER_NODE case, and l_t_c performs its own substitution of the
context, anyway.  So this patch removes the redundant substitution in
tsubst_aggr_type.  Doing so causes us to ICE on template/nested5.C
because during lookup_template_class for A<T>::C::D<S> with T=E and S=S,
we substitute and complete the context A<T>::C with T=E, which in turn
registers the desired dependent specialization of D for us which we end
up trying to register twice.  This patch fixes this by checking the
specializations table again after completion of the context.
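
The double-registration problem and its fix follow a general pattern: if work done between a failed lookup and the insert can itself insert the same entry, look again before inserting. Below is a minimal sketch in C with a hypothetical fixed-size table and integer keys. It models the pattern only, not the data structures in pt.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SPECS 16

/* Hypothetical specializations table keyed by a plain integer.  */
struct spec_table {
  int keys[MAX_SPECS];
  size_t count;
};

static bool table_contains(const struct spec_table *t, int key)
{
  size_t i;

  for (i = 0; i < t->count; i++)
    if (t->keys[i] == key)
      return true;
  return false;
}

static void table_insert(struct spec_table *t, int key)
{
  assert(t->count < MAX_SPECS);
  t->keys[t->count++] = key;
}

/* Stand-in for completing the enclosing class: as a side effect it
   registers the nested specialization that the caller is looking for.  */
static void complete_context(struct spec_table *t, int key)
{
  if (!table_contains(t, key))
    table_insert(t, key);
}

void lookup_or_register(struct spec_table *t, int key)
{
  if (table_contains(t, key))
    return;
  complete_context(t, key);
  /* Second lookup: completing the context may have registered the key
     already, so inserting unconditionally would register it twice.  */
  if (table_contains(t, key))
    return;
  table_insert(t, key);
}
```

Without the second lookup, the table would end up with two entries for the same key after a single call.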

This patch also implements a couple of other optimizations:

  * In lookup_template_class, if the context of the partially
    instantiated template is already non-dependent, then we could
    reuse that instead of substituting the context of the most
    general template.
  * During tsubst_decl for the TYPE_DECL for an injected-class-name,
    we can avoid substituting its TREE_TYPE.  We can also avoid
    template argument substitution/coercion for this TYPE_DECL, and
    for class-scope non-template VAR_/TYPE_DECLs more generally.

Together these optimizations improve memory usage for the range-v3
file test/view/zip.cc by about 5%.

gcc/cp/ChangeLog:

	* pt.cc (lookup_template_class): Remove dead stores to
	context parameter.  Don't substitute the context of the
	most general template if that of the partially instantiated
	template is already non-dependent.  Check the specializations
	table again after completing the context of a nested dependent
	specialization.
	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
	TYPE_CONTEXT or pass it to lookup_template_class.
	(tsubst_decl) <case VAR_DECL, case TYPE_DECL>: Avoid substituting
	the TREE_TYPE for DECL_SELF_REFERENCE_P.  Avoid template argument
	substitution or coercion in some cases.
2022-06-10 16:09:48 -04:00