Commit Graph

193888 Commits

Jakub Jelinek
1158fe4340 openmp: Conforming device numbers and omp_{initial,invalid}_device
OpenMP 5.2 changed once more which device numbers are allowed.
In 5.1, valid device numbers were [0, omp_get_num_devices()].
5.2 also makes -1 valid (and calls it omp_initial_device); it is equivalent
in behavior to the omp_get_num_devices() number but has the advantage that it
is a constant.  5.2 also introduces omp_invalid_device, which is likewise
a constant, with an implementation-defined value < -1.  That value should
act like sNaN: any time any device construct (GOMP_target*) or OpenMP runtime
API routine is asked for such a device, the program is terminated.
And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (that
is, everything outside [-1, omp_get_num_devices()] other than omp_invalid_device)
must be treated like omp_invalid_device.

For device constructs, we have a compatibility problem: we've historically
used two magic negative values to mean something special.
GOMP_DEVICE_ICV (-1) means device clause wasn't present, pick the
		     omp_get_default_device () number
GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for
			  #pragma omp target if (cond)
			  where if cond is false, we pass -2
But 5.2 requires that omp_initial_device is -1 (there were discussions
about it; an advantage of -1 is that one can iterate over the
[-1, omp_get_num_devices()-1] range to get all devices starting with
the host/initial one).
And also, if the user passes -2, unless it is omp_invalid_device, we need to
treat it as non-conforming with OMP_TARGET_OFFLOAD=mandatory.
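
As an illustration (a sketch only, assuming a libgomp with this patch), the
iteration mentioned above could look like:

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  /* Visit the initial (host) device first, then every offload device.  */
  for (int d = omp_initial_device; d < omp_get_num_devices (); d++)
    {
      int on_initial = -1;
      #pragma omp target device (d) map(from: on_initial)
      {
        on_initial = omp_is_initial_device ();
      }
      printf ("device %d: initial device? %d\n", d, on_initial);
    }
  return 0;
}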

So, on the compiler side the patch does some number remapping,
user_device_num >= -2U ? user_device_num - 1 : user_device_num.
This remapping is done at compile time if the device clause has a constant
argument, otherwise at runtime, and means that for the user's -1 (omp_initial_device)
we pass -2 to GOMP_* in the runtime library, where it is treated as host
fallback, while -2 is remapped to -3 (one of the non-conforming device numbers;
for those it doesn't matter which one is which).
omp_invalid_device is then -4.
For the OpenMP device runtime APIs, no remapping is done.
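
A minimal sketch of that remapping (the helper name is illustrative only; the
real change emits the computation inline from omp-expand.cc):

/* Map user-visible device numbers to the values passed to GOMP_* entry
   points: -1 (omp_initial_device) -> -2 (host fallback), -2 -> -3 (one of
   the non-conforming numbers); everything else is passed through.  */
int
remap_target_device_num (int user_device_num)
{
  if ((unsigned int) user_device_num >= -2U)
    return user_device_num - 1;
  return user_device_num;
}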

This patch doesn't deal with the initial default-device-var for
OMP_TARGET_OFFLOAD=mandatory; the spec says that the initial ICV value
for that should in that case depend on whether there are any offloading
devices or not (if not, it should be omp_invalid_device), but that means
we can't determine the number of devices lazily (and let libraries have the
possibility to register their offloading data etc.).

2022-06-13  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-expand.cc (expand_omp_target): Remap user provided
	device clause arguments, -1 to -2 and -2 to -3, either
	at compile time if constant, or at runtime.
include/
	* gomp-constants.h (GOMP_DEVICE_INVALID): Define.
libgomp/
	* omp.h.in (omp_initial_device, omp_invalid_device): New enumerators.
	* omp_lib.f90.in (omp_initial_device, omp_invalid_device): New
	parameters.
	* omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise.
	* target.c (resolve_device): Add remapped argument, handle
	GOMP_DEVICE_ICV only if remapped is true (and clear remapped),
	for negative values, treat GOMP_DEVICE_FALLBACK as fallback only
	if remapped, otherwise treat omp_initial_device that way.  For
	omp_invalid_device, always emit gomp_fatal, even when
	OMP_TARGET_OFFLOAD isn't mandatory.
	(GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext,
	GOMP_target_update, GOMP_target_update_ext,
	GOMP_target_enter_exit_data): Pass true as remapped argument to
	resolve_device.
	(omp_target_alloc, omp_target_free, omp_target_is_present,
	omp_target_memcpy_check, omp_target_associate_ptr,
	omp_target_disassociate_ptr, omp_get_mapped_ptr,
	omp_target_is_accessible): Pass false as remapped argument to
	resolve_device.  Treat omp_initial_device the same as
	gomp_get_num_devices ().  Don't bypass resolve_device calls if
	device_num is negative.
	(omp_pause_resource): Treat omp_initial_device the same as
	gomp_get_num_devices ().  Call resolve_device.
	* icv-device.c (omp_set_default_device): Always set to device_num
	even when it is negative.
	* libgomp.texi: Document that Conforming device numbers,
	omp_initial_device and omp_invalid_device are implemented.
	* testsuite/libgomp.c/target-41.c (main): Add test with
	omp_initial_device.
	* testsuite/libgomp.c/target-45.c: New test.
	* testsuite/libgomp.c/target-46.c: New test.
	* testsuite/libgomp.c/target-47.c: New test.
	* testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
	test with omp_initial_device.  Use -5 instead of -1 for negative value
	test.
	* testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
	Likewise.  Reorder stop numbers.
2022-06-13 14:02:37 +02:00
Eric Botcazou
3b598848f6 Introduce -finstrument-functions-once
The goal is to make it possible to use this in (large) production binaries
to do function-level coverage, so the overhead must be minimal and, in
particular, there is no protection against data races, so the "once"
moniker is imprecise.
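
A usage sketch (illustrative only; __cyg_profile_func_* are GCC's standard
instrumentation hooks, and the coverage bookkeeping here is hypothetical):

/* Build the program with -finstrument-functions-once; each function then
   calls the enter/exit hooks only the first time it runs (with no
   data-race protection, as noted above).  */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter (void *fn, void *callsite)
{
  (void) callsite;
  /* Record FN as covered, e.g. set a bit or append it to a log.  */
  (void) fn;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit (void *fn, void *callsite)
{
  /* Nothing needed for pure coverage.  */
  (void) fn;
  (void) callsite;
}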

gcc/
	* common.opt (finstrument-functions): Set explicit value.
	(-finstrument-functions-once): New option.
	* doc/invoke.texi (Program Instrumentation Options): Document it.
	* gimplify.cc (build_instrumentation_call): New static function.
	(gimplify_function_tree): Call it to emit the instrumentation calls
	if -finstrument-functions[-once] is specified.
gcc/testsuite/
	* gcc.dg/instrument-4.c: New test.
2022-06-13 13:35:33 +02:00
Eric Botcazou
cb1ecf3819 Do not erase warning data in gimple_set_location
gimple_set_location is mostly invoked on newly built GIMPLE statements, so
their location is UNKNOWN_LOCATION and setting it will clobber the warning
data of the passed location, if any.

gcc/
	* dwarf2out.cc (output_one_line_info_table): Initialize prev_addr.
	* gimple.h (gimple_set_location): Do not copy warning data from
	the previous location when it is UNKNOWN_LOCATION.
	* optabs.cc (expand_widen_pattern_expr): Always set oprnd{1,2}.
gcc/testsuite/
	* c-c++-common/nonnull-1.c: Remove XFAIL for C++.
2022-06-13 13:35:33 +02:00
Nathan Sidwell
6303eee4b9 c++: Separate late stage module writing
This moves some module writing into a newly added write_end function,
which is called after writing initializers.

	gcc/cp/
	* module.cc (module_state::write): Separate to ...
	(module_state::write_begin, module_state::write_end): ...
	these.
	(module_state::write_readme): Drop extensions parameter.
	(struct module_processing_cookie): Add more fields.
	(finish_module_processing): Adjust state writing call.
	(late_finish_module): Call write_end.
2022-06-13 04:20:49 -07:00
Iain Buclaw
ec486b739b d: Merge upstream dmd 821ed393d, druntime 454471d8, phobos 1206fc94f.
D front-end changes:

    - Import latest bug fixes to mainline.

D runtime changes:

    - Fix duplicate Elf64_Dyn definitions on Solaris.
    - _d_newThrowable has been converted to a template.

Phobos changes:

    - Import latest bug fixes to mainline.

gcc/d/ChangeLog:

	* dmd/MERGE: Merge upstream dmd 821ed393d.
	* expr.cc (ExprVisitor::visit (NewExp *)): Remove handling of
	allocating `@nogc' throwable object.
	* runtime.def (NEWTHROW): Remove.

libphobos/ChangeLog:

	* libdruntime/MERGE: Merge upstream druntime 454471d8.
	* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
	core/sync/package.d.
	* libdruntime/Makefile.in: Regenerate.
	* src/MERGE: Merge upstream phobos 1206fc94f.
2022-06-13 11:38:10 +02:00
Jakub Jelinek
13ea4a6e83 i386: Fix up *<dwi>3_doubleword_mask [PR105911]
Another regression caused by my recent patch.

This time it is because the define_insn_and_split only requires that the
constant mask is a const_int_operand.  When it was only SImode,
that wasn't a problem, nor was HImode, but for DImode, if we need
to AND the shift count, we might run into the problem that the mask isn't
representable as a signed 32-bit immediate.

But, we don't really care about the upper bits of the mask, so
we can just mask the CONST_INT with the mode mask.

2022-06-13  Jakub Jelinek  <jakub@redhat.com>

	PR target/105911
	* config/i386/i386.md (*ashl<dwi>3_doubleword_mask,
	*<insn><dwi>3_doubleword_mask): Use operands[3] masked with
	(<MODE_SIZE> * BITS_PER_UNIT) - 1 as AND operand instead of
	operands[3] unmodified.

	* gcc.dg/pr105911.c: New test.
2022-06-13 10:54:22 +02:00
Cui,Lili
033e5ee3c4 testsuite: Add -mtune=generic to dg-options for two testcases.
Use -mtune=generic to constrain these two test cases, because configuring GCC
to default to -mtune=cascadelake or znver3 would otherwise vectorize them.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Add
	-mtune=generic to dg-options.
	* gcc.target/i386/pr84101.c: Likewise.
2022-06-13 10:03:54 +08:00
GCC Administrator
fd1fcd4756 Daily bump. 2022-06-13 00:16:18 +00:00
Simon Wright
add1adaa17 Darwin: Truncate kernel-provided version to OS major for Darwin >= 20.
In common with system tools, GCC uses a version obtained from the kernel as
the prevailing macOS target, when that is not overridden by command line or
environment versions (i.e. -mmacosx-version-min=, MACOSX_DEPLOYMENT_TARGET).

Presently, GCC assumes that if the OS version is >= 20, the value used should
include both major and minor version identifiers.  However, the system tools
(for those versions) truncate the value to the major version - this leads to
link errors when combining objects built with clang and GCC for example:

ld: warning: object file (null.o) was built for newer macOS version (12.2)
than being linked (12.0)

The change here truncates the values GCC uses to the major version.
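
In effect (a hypothetical helper for illustration, not the actual
darwin-driver.cc code):

#include <string.h>

/* For Darwin >= 20, keep only the macOS major version:
   "12.2" becomes "12", matching what the system tools report.  */
void
truncate_to_major (char *version)
{
  char *dot = strchr (version, '.');
  if (dot)
    *dot = '\0';
}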

gcc/ChangeLog:

	PR target/104871
	* config/darwin-driver.cc (darwin_find_version_from_kernel): If the OS
	version is darwin20 (macOS 11) or greater, truncate the version to the
	major number.
2022-06-12 23:22:20 +01:00
Mark Mentovai
6725f186cb Darwin: Future-proof -mmacosx-version-min
f18cbc1ee1 (2021-12-18) updated various parts of gcc to not impose a
Darwin or macOS version maximum of the current known release. Different
parts of gcc accept, variously, Darwin version numbers matching
darwin2*, and macOS major version numbers up to 99. The current released
version is Darwin 21 and macOS 12, with Darwin 22 and macOS 13 expected
for public release later this year. With one major OS release per year,
this strategy is expected to provide another 8 years of headroom.

However, f18cbc1ee1 missed config/darwin-c.c (now .cc), which
continued to impose a maximum of macOS 12 on the -mmacosx-version-min
compiler driver argument. This was last updated from 11 to 12 in
11b9675774 (2021-10-27), but kicking the can down the road one year at
a time is not a viable strategy, and is not in line with the more recent
technique from f18cbc1ee1.

Prior to 556ab51259 (2020-11-06), config/darwin-c.c did not impose a
maximum that needed annual maintenance, as at that point, all macOS
releases had used a major version of 10. The stricter approach imposed
since then was valuable for a time until the particulars of the new
versioning scheme were established and understood, but now that they
are, it's prudent to restore a more permissive approach.

gcc/ChangeLog:

	* config/darwin-c.cc: Make -mmacosx-version-min more future-proof.

Signed-off-by: Mark Mentovai <mark@mentovai.com>
2022-06-12 23:20:57 +01:00
Max Filippov
ff500e1cf1 gcc: xtensa: fix pr95571 test for call0 ABI
gcc/testsuite/
	* g++.target/xtensa/pr95571.C (__xtensa_libgcc_window_spill):
	New definition.
2022-06-11 22:50:14 -07:00
Prathamesh Kulkarni
494bec0250 PR96463: Optimise svld1rq from vectors for little endian AArch64 targets.
The patch folds:
lhs = svld1rq({-1, -1, ...}, rhs)
into:
tmp = mem_ref<vectype> [(elem_type * {ref-all}) rhs]
lhs = vec_perm_expr<tmp, tmp, {0, 1, 2, 3 ...}>.
which is then expanded using aarch64_expand_sve_dupq.

Example:

svint32_t
foo (int32x4_t x)
{
  return svld1rq (svptrue_b8 (), &x[0]);
}

code-gen:
foo:
.LFB4350:
	dup     z0.q, z0.q[0]
	ret

The patch relaxes type-checking for VEC_PERM_EXPR by allowing different
vector types for lhs and rhs provided:
(1) rhs3 is constant and has an integer element type.
(2) len(lhs) == len(rhs3) and len(rhs1) == len(rhs2)
(3) lhs and rhs have same element type.

gcc/ChangeLog:
	PR target/96463
	* config/aarch64/aarch64-sve-builtins-base.cc: Include ssa.h.
	(svld1rq_impl::fold): Define.
	* config/aarch64/aarch64.cc (expand_vec_perm_d): Define new members
	op_mode and op_vec_flags.
	(aarch64_evpc_reencode): Initialize newd.op_mode and
	newd.op_vec_flags.
	(aarch64_evpc_sve_dup): New function.
	(aarch64_expand_vec_perm_const_1): Gate existing calls to
	aarch64_evpc_* functions under d->vmode == d->op_mode,
	and call aarch64_evpc_sve_dup.
	(aarch64_vectorize_vec_perm_const): Remove assert
	d->vmode != d->op_mode, and initialize d.op_mode and d.op_vec_flags.
	* tree-cfg.cc (verify_gimple_assign_ternary): Allow different
	vector types for lhs and rhs in VEC_PERM_EXPR if rhs3 is
	constant.

gcc/testsuite/ChangeLog:
	PR target/96463
	* gcc.target/aarch64/sve/acle/general/pr96463-1.c: New test.
	* gcc.target/aarch64/sve/acle/general/pr96463-2.c: Likewise.
2022-06-12 08:55:04 +05:30
GCC Administrator
cbd842717e Daily bump. 2022-06-12 00:16:26 +00:00
Takayuki 'January June' Suwa
cd02f15f1a xtensa: Improve constant synthesis for both integer and floating-point
This patch revises the previous implementation of constant synthesis.

First, it is changed to use a define_split machine description pattern and to run
after the reload pass, in order not to interfere with optimizations such as
loop-invariant motion.

Second, not only integer but also floating-point constants are subject to processing.

Third, several new synthesis patterns are added for when the constant cannot fit
into a "MOVI Ax, simm12" instruction, but:

I.   can be represented as a power of two minus one (eg. 32767, 65535 or
     0x7fffffffUL)
       => "MOVI(.N) Ax, -1" + "SRLI Ax, Ax, 1 ... 31" (or "EXTUI")
II.  is between -34816 and 34559
       => "MOVI(.N) Ax, -2048 ... 2047" + "ADDMI Ax, Ax, -32768 ... 32512"
III. (existing case) can fit into a signed 12-bit if the trailing zero bits
     are stripped
       => "MOVI(.N) Ax, -2048 ... 2047" + "SLLI Ax, Ax, 1 ... 31"

The above sequences consist of 5 or 6 bytes and have a latency of 2 clock cycles,
in contrast with "L32R Ax, <litpool>" (3 bytes and one clock of latency, but it may
suffer an additional one-clock pipeline stall and an implementation-specific
InstRAM/ROM access penalty) plus 4 bytes of constant value.

In addition, 3-instruction synthesis patterns (8 or 9 bytes, 3 clocks of latency)
are also provided when optimizing for speed and the L32R instruction has a
considerable access penalty:

IV.  2-instructions synthesis (any of I ... III) followed by
     "SLLI Ax, Ax, 1 ... 31"
V.   2-instructions synthesis followed by either "ADDX[248] Ax, Ax, Ax"
     or "SUBX8 Ax, Ax, Ax" (multiplying by 3, 5, 7 or 9)

gcc/ChangeLog:

	* config/xtensa/xtensa-protos.h (xtensa_constantsynth):
	New prototype.
	* config/xtensa/xtensa.cc (xtensa_emit_constantsynth,
	xtensa_constantsynth_2insn, xtensa_constantsynth_rtx_SLLI,
	xtensa_constantsynth_rtx_ADDSUBX, xtensa_constantsynth):
	New backend functions that process the abovementioned logic.
	(xtensa_emit_move_sequence): Revert the previous changes.
	* config/xtensa/xtensa.md: New split patterns for integer
	and floating-point, as the frontend part.

gcc/testsuite/ChangeLog:

	* gcc.target/xtensa/constsynth_2insns.c: New.
	* gcc.target/xtensa/constsynth_3insns.c: Ditto.
	* gcc.target/xtensa/constsynth_double.c: Ditto.
2022-06-11 14:39:10 -07:00
Takayuki 'January June' Suwa
ccd02e734e xtensa: Improve instruction cost estimation and suggestion
This patch implements a new target-specific relative RTL insn cost function
because of suboptimal cost estimation by default, and fixes several "length"
insn attributes (related to the cost estimation).

And also introduces a new machine-dependent option "-mextra-l32r-costs="
that tells implementation-specific InstRAM/ROM access penalty for L32R
instruction to the compiler (in clock-cycle units, 0 by default).

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (xtensa_rtx_costs): Correct wrong case
	for ABS and NEG, add missing case for BSWAP and CLRSB, and
	double the costs for integer divisions using libfuncs if
	optimizing for speed, in order to take advantage of fast constant
	division by multiplication.
	(TARGET_INSN_COST): New macro definition.
	(xtensa_is_insn_L32R_p, xtensa_insn_cost): New functions for
	calculating relative costs of RTL insns, for both speed and
	size.
	* config/xtensa/xtensa.md (return, nop, trap): Correct values of
	the attribute "length" that depends on TARGET_DENSITY.
	(define_asm_attributes, blockage, frame_blockage): Add missing
	attributes.
	* config/xtensa/xtensa.opt (-mextra-l32r-costs=): New machine-
	dependent option, however, preparatory work for now.
2022-06-11 13:15:30 -07:00
Takayuki 'January June' Suwa
fddf0e1057 xtensa: Consider the Loop Option when setmemsi is expanded to small loop
Now it applies to aligned blocks of almost any size under such circumstances.

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (xtensa_expand_block_set_small_loop):
	Pass through the block length / loop count conditions if
	zero-overhead looping is configured and active.
2022-06-11 13:15:29 -07:00
Takayuki 'January June' Suwa
9489a1ab05 xtensa: Tweak some widen multiplications
umulsidi3 is faster than umuldi3 even as a library call, and is also a
prerequisite for fast constant division by multiplication.

gcc/ChangeLog:

	* config/xtensa/xtensa.md (mulsidi3, umulsidi3):
	Split into individual signedness, in order to use libcall
	"__umulsidi3" but not the other.
	(<u>mulhisi3): Merge into one by using code iterator.
	(<u>mulsidi3, mulhisi3, umulhisi3): Remove.
2022-06-11 13:15:26 -07:00
Michael Meissner
fddb7f6512 Disable generating load/store vector pairs for block copies.
Testing has found that using load and store vector pair instructions for block
copies can result in a slowdown on power10.  This patch disables using the
vector pair instructions for block copies if we are tuning for power10.

2022-06-11   Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
	not generate block copies with vector pair instructions if we are
	tuning for power10.
2022-06-11 00:40:16 -04:00
GCC Administrator
ef1e4d80dd Daily bump. 2022-06-11 00:16:21 +00:00
Patrick Palka
343d83c7a8 c++: improve TYPENAME_TYPE hashing [PR65328]
For the testcase in this PR, compilation takes very long ultimately due
to our poor hashing of TYPENAME_TYPE causing a huge number of collisions
in the spec_hasher and typename_hasher tables.

In spec_hasher, we don't hash the components of TYPENAME_TYPE, which
means most TYPENAME_TYPE arguments end up contributing the same hash.
This is the safe thing to do uniformly since structural_comptypes may
try resolving a TYPENAME_TYPE via the current instantiation.  But this
behavior of structural_comptypes is suppressed from spec_hasher::equal
via the comparing_specializations flag, which means spec_hasher::hash
can assume it's disabled too.  To that end, this patch makes
spec_hasher::hash set the flag, and teaches iterative_hash_template_arg
to hash the relevant components of TYPENAME_TYPE when the flag is set.

And in typename_hasher, the hash function considers TYPE_IDENTIFIER
instead of the more informative TYPENAME_TYPE_FULLNAME, which this patch
fixes accordingly.

After this patch, compile time for the testcase in the PR falls to
around 30 seconds on my machine (down from dozens of minutes).

	PR c++/65328

gcc/cp/ChangeLog:

	* decl.cc (typename_hasher::hash): Add extra overloads.
	Use iterative_hash_object instead of htab_hash_pointer.
	Hash TYPENAME_TYPE_FULLNAME instead of TYPE_IDENTIFIER.
	(build_typename_type): Use typename_hasher::hash.
	* pt.cc (spec_hasher::hash): Add two-parameter overload.
	Set comparing_specializations around the call to
	hash_tmpl_and_args.
	(iterative_hash_template_arg) <case TYPENAME_TYPE>:
	When comparing_specializations, hash the TYPE_CONTEXT
	and TYPENAME_TYPE_FULLNAME.
	(tsubst_function_decl): Use spec_hasher::hash instead of
	hash_tmpl_and_args.
	(tsubst_template_decl): Likewise.
	(tsubst_decl): Likewise.
2022-06-10 16:10:02 -04:00
Patrick Palka
f9b5a8e58d c++: optimize specialization of templated member functions
This applies one of the lookup_template_class optimizations from the
previous patch to instantiate_template as well.

gcc/cp/ChangeLog:

	* pt.cc (instantiate_template): Don't substitute the context
	of the most general template if that of the partially
	instantiated template is already non-dependent.
2022-06-10 16:09:58 -04:00
Patrick Palka
cb7fd1ea85 c++: optimize specialization of nested templated classes
When substituting a class template specialization, tsubst_aggr_type
substitutes the TYPE_CONTEXT before passing it to lookup_template_class.
This appears to be unnecessary, however, because the initial value
of lookup_template_class's context parameter is unused outside of the
IDENTIFIER_NODE case, and l_t_c performs its own substitution of the
context, anyway.  So this patch removes the redundant substitution in
tsubst_aggr_type.  Doing so causes us to ICE on template/nested5.C
because during lookup_template_class for A<T>::C::D<S> with T=E and S=S,
we substitute and complete the context A<T>::C with T=E, which in turn
registers the desired dependent specialization of D for us which we end
up trying to register twice.  This patch fixes this by checking the
specializations table again after completion of the context.

This patch also implements a couple of other optimizations:

  * In lookup_template_class, if the context of the partially
    instantiated template is already non-dependent, then we could
    reuse that instead of substituting the context of the most
    general template.
  * During tsubst_decl for the TYPE_DECL for an injected-class-name,
    we can avoid substituting its TREE_TYPE.  We can also avoid
    template argument substitution/coercion for this TYPE_DECL, and
    for class-scope non-template VAR_/TYPE_DECLs more generally.

Together these optimizations improve memory usage for the range-v3
file test/view/zip.cc by about 5%.

gcc/cp/ChangeLog:

	* pt.cc (lookup_template_class): Remove dead stores to
	context parameter.  Don't substitute the context of the
	most general template if that of the partially instantiated
	template is already non-dependent.  Check the specializations
	table again after completing the context of a nested dependent
	specialization.
	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
	TYPE_CONTEXT or pass it to lookup_template_class.
	(tsubst_decl) <case TYPE_DECL, case TYPE_DECL>: Avoid substituting
	the TREE_TYPE for DECL_SELF_REFERENCE_P.  Avoid template argument
	substitution or coercion in some cases.
2022-06-10 16:09:48 -04:00
Nathan Sidwell
e6d369bbdb c++: Add a late-writing step for modules
To add a module initializer optimization, we need to defer finishing writing
out the module file until the end of determining the dynamic initializers.
This is achieved by passing some saved-state from the main module writing
to a new function that completes it.

This patch merely adds the skeleton of that state and moves things around,
allowing the finalization of the ELF file to be postponed.  None of the
contents writing is moved, or the init optimization added.

	gcc/cp/
	* cp-tree.h (fini_modules): Add some parameters.
	(finish_module_processing): Return an opaque pointer.
	* decl2.cc (c_parse_final_cleanups): Propagate a cookie from
	finish_module_processing to fini_modules.
	* module.cc (struct module_processing_cookie): New.
	(finish_module_processing): Return a heap-allocated cookie.
	(late_finish_module): New.  Finish out the module writing.
	(fini_modules): Adjust.
2022-06-10 12:32:22 -07:00
Jakub Jelinek
1eff4872d2 openmp: Call dlopen with "libmemkind.so.0" rather than "libmemkind.so"
On Thu, Jun 09, 2022 at 12:11:28PM +0200, Thomas Schwinge wrote:
> > This patch adds support for dlopening libmemkind.so
>
> Instead of 'dlopen'ing literally 'libmemkind.so':
> ..., shouldn't this instead 'dlopen' 'libmemkind.so.0'?  At least for
> Debian/Ubuntu, the latter ('libmemkind.so.0') is shipped in the "library"
> package:

I agree and I've actually noticed it too right before committing, but I thought
I'll investigate and tweak incrementally because "libmemkind.so"
is what I've actually tested (it is what llvm libomp uses).

Here is the now tested incremental fix.

2022-06-10  Jakub Jelinek  <jakub@redhat.com>

	* allocator.c (gomp_init_memkind): Call dlopen with "libmemkind.so.0"
	rather than "libmemkind.so".
2022-06-10 21:19:51 +02:00
Nathan Sidwell
c08ba00487 c++: Adjust module initializer calling emission
We special-case emitting the calls of module initializer functions.  It's
simpler to just emit a static fn to do that, and add it onto the front of
the global init fn chain.  We can also move the calculation of the set of
initializers to call to the point of use.

	gcc/cp/
	* cp-tree.h (module_has_import_init): Rename to ...
	(module_determined_import_inits): ... here.
	* decl2.cc (start_objects): Do not handle module initializers
	here.
	(c_parse_final_cleanups): Generate a separate module
	initializer calling function and add it to the list.  Shrink
	the c-lang region.
	* module.cc (num_init_calls_needed): Delete.
	 (module_has_import_init): Rename to ...
	(module_determined_import_inits): ... here. Do the
	calculation here ...
	(finish_module_processing): ... rather than here.
	(module_add_import_initializers): Reformat.

	gcc/testsuite/
	* g++.dg/modules/init-3_a.C: New.
	* g++.dg/modules/init-3_b.C: New.
	* g++.dg/modules/init-3_c.C: New.
2022-06-10 09:27:40 -07:00
Thomas Schwinge
1459b55d24 libgomp nvptx plugin: Remove '--with-cuda-driver=[...]' etc. configuration option
That means, exposing to the user only the '--without-cuda-driver' behavior:
including the GCC-shipped 'include/cuda/cuda.h' (not system <cuda.h>), and
'dlopen'ing the CUDA Driver library (not linking it).

For development purposes, the libgomp nvptx plugin developer may still manually
override that, to get the previous '--with-cuda-driver' behavior.

	libgomp/
	* plugin/Makefrag.am: Evaluate 'if PLUGIN_NVPTX_DYNAMIC' to true.
	* plugin/configfrag.ac (--with-cuda-driver)
	(--with-cuda-driver-include, --with-cuda-driver-lib)
	(CUDA_DRIVER_INCLUDE, CUDA_DRIVER_LIB, PLUGIN_NVPTX_CPPFLAGS)
	(PLUGIN_NVPTX_LDFLAGS, PLUGIN_NVPTX_LIBS, PLUGIN_NVPTX_DYNAMIC):
	Remove.
	* testsuite/libgomp-test-support.exp.in (cuda_driver_include)
	(cuda_driver_lib): Remove.
	* testsuite/lib/libgomp.exp (libgomp_init): Don't consider these.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-06-10 17:08:57 +02:00
Jonathan Wakely
671970a562 libstdc++: Make std::lcm and std::gcd detect overflow [PR105844]
When I fixed PR libstdc++/92978 I introduced a regression whereby
std::lcm(INT_MIN, 1) and std::lcm(50000, 49999) would no longer produce
errors during constant evaluation. Those calls are undefined, because
they violate the preconditions that |m| and the result can be
represented in the return type (which is int in both those cases). The
regression occurred because __absu<unsigned>(INT_MIN) is well-formed,
due to the explicit casts to unsigned in that new helper function, and
the out-of-range multiplication is well-formed, because unsigned
arithmetic wraps instead of overflowing.

To fix 92978 I made std::gcd and std::lcm calculate |m| and |n|
immediately, yielding a common unsigned type that was used to calculate
the result. That was partly correct, but there's no need to use an
unsigned type. Doing so only suppresses the overflow errors so the
compiler can't detect them. This change replaces __absu with __abs_r
that returns the common type (not its corresponding unsigned type). This
way we can detect overflow in __abs_r when required, while still
supporting the most-negative value when it can be represented in the
result type. To detect LCM results that are out of range of the result
type we still need explicit checks, because neither constant evaluation
nor UBsan will complain about unsigned wrapping for cases such as
std::lcm(500000u, 499999u). We can detect those overflows efficiently by
using __builtin_mul_overflow and asserting.
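
A sketch of the behaviour described above (assuming a libstdc++ with this
fix; the commented-out lines are the undefined calls, which are again
rejected during constant evaluation):

#include <numeric>
#include <climits>

static_assert(std::lcm(4, 6) == 12);
static_assert(std::gcd(INT_MIN + 1, 1) == 1);    // |INT_MIN + 1| is representable
// constexpr int bad1 = std::lcm(INT_MIN, 1);    // rejected: |INT_MIN| not representable in int
// constexpr int bad2 = std::lcm(50000, 49999);  // rejected: result 2499950000 overflows int

int main() { return 0; }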

libstdc++-v3/ChangeLog:

	PR libstdc++/105844
	* include/experimental/numeric (experimental::gcd): Simplify
	assertions. Use __abs_r instead of __absu.
	(experimental::lcm): Likewise. Remove use of __detail::__lcm so
	overflow can be detected.
	* include/std/numeric (__detail::__absu): Rename to __abs_r and
	change to allow signed result type, so overflow can be detected.
	(__detail::__lcm): Remove.
	(gcd): Simplify assertions. Use __abs_r instead of __absu.
	(lcm): Likewise. Remove use of __detail::__lcm so overflow can
	be detected.
	* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines.
	* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
	* testsuite/26_numerics/gcd/105844.cc: New test.
	* testsuite/26_numerics/lcm/105844.cc: New test.
2022-06-10 15:24:29 +01:00
Jonathan Wakely
1e65f2ed99 libstdc++: Fix lifetime bugs for non-TLS eh_globals [PR105880]
This ensures that the single-threaded fallback buffer eh_globals is not
destroyed during program termination, using the same immortalization
technique used for error category objects.

Also ensure that init._M_init can still be read after init has been
destroyed, by making it a static data member.

libstdc++-v3/ChangeLog:

	PR libstdc++/105880
	* libsupc++/eh_globals.cc (eh_globals): Ensure constant init and
	prevent destruction during termination.
	(__eh_globals_init::_M_init): Replace with static member _S_init.
	(__cxxabiv1::__cxa_get_globals_fast): Update.
	(__cxxabiv1::__cxa_get_globals): Likewise.
2022-06-10 15:24:29 +01:00
Roger Sayle
1753a71201 PR rtl-optimization/7061: Complex number arguments on x86_64-like ABIs.
This patch addresses the issue in comment #6 of PR rtl-optimization/7061
(a four digit PR number) from 2006 where on x86_64 complex number arguments
are unconditionally spilled to the stack.

For the test cases below:
float re(float _Complex a) { return __real__ a; }
float im(float _Complex a) { return __imag__ a; }

GCC with -O2 currently generates:

re:	movq    %xmm0, -8(%rsp)
        movss   -8(%rsp), %xmm0
        ret
im:	movq    %xmm0, -8(%rsp)
        movss   -4(%rsp), %xmm0
        ret

with this patch we now generate:

re:	ret
im:	movq    %xmm0, %rax
        shrq    $32, %rax
        movd    %eax, %xmm0
        ret

[Technically, this shift can be performed on %xmm0 in a single
instruction, but the backend needs to be taught to do that, the
important bit is that the SCmode argument isn't written to the
stack].

The patch itself is to emit_group_store where just before RTL
expansion commits to writing to the stack, we check if the store
group consists of a single scalar integer register that holds
a complex mode value; on x86_64 SCmode arguments are passed in
DImode registers.  If this is the case, we can use a SUBREG to
"view_convert" the integer to the equivalent complex mode.

An interesting corner case that showed up during testing is that
x86_64 also passes HCmode arguments in DImode registers(!), i.e.
using modes of different sizes.  This is easily handled/supported
by first converting to an integer mode of the correct size, and
then generating a complex mode SUBREG of this.  This is similar
in concept to the patch I proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html

2022-06-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR rtl-optimization/7061
	* expr.cc (emit_group_store): For groups that consist of a single
	scalar integer register that hold a complex mode value, use
	gen_lowpart to generate a SUBREG to "view_convert" to the complex
	mode.  For modes of different sizes, first convert to an integer
	mode of the appropriate size.

gcc/testsuite/ChangeLog
	PR rtl-optimization/7061
	* gcc.target/i386/pr7061-1.c: New test case.
	* gcc.target/i386/pr7061-2.c: New test case.
2022-06-10 15:16:55 +01:00
Jonathan Wakely
b370ed0bf9 libstdc++: Make std::hash<basic_string<>> allocator-agnostic (LWG 3705)
This new library issue was recently moved to Tentatively Ready by an LWG
poll, so I'm making the change on trunk.

As noted in PR libstc++/105907 the std::hash specializations for PMR
strings were not treated as slow hashes by the unordered containers, so
this change preserves that. The new specializations for custom
allocators are also not treated as slow, for the same reason. For the
versioned namespace (i.e. unstable ABI) we don't have to worry about
that, so we can enable hash code caching for all basic_string
specializations.
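
A sketch of what the change enables (assuming a libstdc++ with this patch;
MyAlloc and MyString are minimal hypothetical names used for illustration):

#include <string>
#include <functional>
#include <memory>
#include <cstddef>

template<class T>
struct MyAlloc
{
  using value_type = T;
  MyAlloc() = default;
  template<class U> MyAlloc(const MyAlloc<U>&) {}
  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template<class T, class U>
bool operator==(const MyAlloc<T>&, const MyAlloc<U>&) { return true; }
template<class T, class U>
bool operator!=(const MyAlloc<T>&, const MyAlloc<U>&) { return false; }

using MyString = std::basic_string<char, std::char_traits<char>, MyAlloc<char>>;

int main()
{
  MyString s = "hello";
  // Previously only the standard typedefs (and the PMR strings) had
  // std::hash specializations; the new partial specialization covers
  // any allocator.
  std::size_t h = std::hash<MyString>{}(s);
  return h == 0;
}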

libstdc++-v3/ChangeLog:

	* include/bits/basic_string.h (__hash_str_base): New class
	template.
	(hash<basic_string<C, char_traits<C>, A>>): Define partial
	specialization for each of the standard character types.
	(hash<string>, hash<wstring>, hash<u8string>, hash<u16string>)
	(hash<u32string>): Remove explicit specializations.
	* include/std/string (__hash_string_base): Remove class
	template.
	(hash<pmr::string>, hash<pmr::wstring>, hash<pmr::u8string>)
	(hash<pmr::u16string>, hash<pmr::u32string>): Remove explicit
	specializations.
	* testsuite/21_strings/basic_string/hash/hash.cc: Test with
	custom allocators.
	* testsuite/21_strings/basic_string/hash/hash_char8_t.cc:
	Likewise.
2022-06-10 14:39:25 +01:00
Antoni Boucher
5940b4e59f libgccjit: Support getting the size of a float [PR105829]
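
A usage sketch (illustrative only, assuming a libgccjit built with this change):

#include <libgccjit.h>
#include <stdio.h>

int
main (void)
{
  gcc_jit_context *ctxt = gcc_jit_context_acquire ();
  gcc_jit_type *float_type
    = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_FLOAT);
  /* Before this change, gcc_jit_type_get_size did not handle
     floating-point types.  */
  printf ("sizeof (float) according to libgccjit: %ld\n",
          (long) gcc_jit_type_get_size (float_type));
  gcc_jit_context_release (ctxt);
  return 0;
}
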
2022-06-09  Antoni Boucher  <bouanto@zoho.com>

gcc/jit/
	PR jit/105829
	* libgccjit.cc: Add support for floating-point types in
	gcc_jit_type_get_size.

gcc/testsuite/
	PR jit/105829
	* jit.dg/test-types.c: Add tests for gcc_jit_type_get_size.
2022-06-09 21:50:25 -04:00
GCC Administrator
e3bba42fb5 Daily bump. 2022-06-10 00:16:43 +00:00
Takayuki 'January June' Suwa
29dc90a580 xtensa: Add clrsbsi2 insn pattern
> (clrsb:m x)
> Represents the number of redundant leading sign bits in x, represented
> as an integer of mode m, starting at the most significant bit position.

This is exactly what the NSA instruction (never emitted before) calculates
in the Xtensa ISA.

gcc/ChangeLog:

	* config/xtensa/xtensa.md (clrsbsi2): New insn pattern.

libgcc/ChangeLog:

	* config/xtensa/lib1funcs.S (__clrsbsi2): New function.
	* config/xtensa/t-xtensa (LIB1ASMFUNCS): Add _clrsbsi2.
2022-06-09 15:07:59 -07:00
Takayuki 'January June' Suwa
e44e7face1 xtensa: Optimize '(~x & y)' to '((x & y) ^ y)'
In the Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation.

gcc/ChangeLog:

	* config/xtensa/xtensa.md (*andsi3_bitcmpl):
	New insn_and_split pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/xtensa/check_zero_byte.c: New.
2022-06-09 15:07:47 -07:00
Takayuki 'January June' Suwa
9777d446e2 xtensa: Make one_cmplsi2 optimizer-friendly
In the Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation.  But a few optimizers assume that bitwise negation can be
done by a single insn.

As a result, for example, '((x < 0) ? ~x : x)' could never be optimized to
'(x ^ (x >> 31))' before.

This patch relaxes that limitation by putting the insn expansion off until
the split pass.
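
An illustrative example of the idiom in question (sketch only; the function
name is hypothetical):

/* With this change, the optimizers are able to turn this into the branchless
   'x ^ (x >> 31)' form: for x >= 0 the arithmetic shift yields 0, and for
   x < 0 it yields all-ones, so the XOR gives x or ~x respectively.  */
int
one_complement_abs (int x)
{
  return x < 0 ? ~x : x;
}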

gcc/ChangeLog:

	* config/xtensa/xtensa.md (one_cmplsi2):
	Rearrange as an insn_and_split pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/xtensa/one_cmpl_abs.c: New.
2022-06-09 15:07:22 -07:00
Takayuki 'January June' Suwa
2fcc69d8ce xtensa: Implement bswaphi2 insn pattern
This patch adds a bswaphi2 insn pattern that is one instruction shorter than
the default expansion.

gcc/ChangeLog:

	* config/xtensa/xtensa.md (bswaphi2): New insn pattern.
2022-06-09 15:07:08 -07:00
Joseph Myers
6458486345 Update gcc sv.po
* sv.po: Update.
2022-06-09 22:04:25 +00:00
Segher Boessenkool
a05aac0a13 rs6000: Delete FP_ISA3
FP_ISA3 is exactly the same as SFDF, just a less obvious name.  So,
let's delete it.

2022-06-09  Segher Boessenkool  <segher@kernel.crashing.org>

	* config/rs6000/rs6000.md (FP_ISA3): Delete.
	(float<QHI:mode><FP_ISA3:mode>2): Rename to...
	(float<QHI:mode><SFDF:mode>2): ... this.  Adjust.
	(*float<QHI:mode><FP_ISA3:mode>2_internal): Rename to...
	(*float<QHI:mode><SFDF:mode>2_internal): ... this.  Adjust.
	(floatuns<QHI:mode><FP_ISA3:mode>2): Rename to...
	(floatuns<QHI:mode><SFDF:mode>2): ... this.  Adjust.
	(*floatuns<QHI:mode><FP_ISA3:mode>2_internal): Rename to...
	(*floatuns<QHI:mode><SFDF:mode>2_internal): ... this.  Adjust.
2022-06-09 19:35:53 +00:00
Jakub Jelinek
699e9a0f67 openmp: Fix up include of the generic allocator.c
As reported by Richard Sandiford, #include "../../../allocator.c"
has one too many ../s, dunno why it worked for me when using
../configure (VPATH = ../../../libgomp)

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	* config/linux/allocator.c: Fix up #include directive.
2022-06-09 19:44:50 +02:00
Jakub Jelinek
4c334e0e4f c++: Fix up ICE on __builtin_shufflevector constexpr evaluation [PR105871]
As the following testcase shows, a BIT_FIELD_REF result doesn't have to have
just integral type, it can also have vector type.  And in that case
cxx_eval_bit_field_ref just ICEs on it, because it is unprepared for that
case: it creates the initial value with build_int_cst (sure, that one could be
easily replaced with build_zero_cst) and then expects to come up with the
final value through shifts, ands and ors, but that doesn't work for
vectors.

We already call fold_ternary if whole is a VECTOR_CST; this patch does the
same if the result doesn't have integral type.  And, since there is no guarantee
fold_ternary will succeed and the callers certainly don't expect NULL to be
returned, it also diagnoses those cases as non-constant and returns the
original t.

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	PR c++/105871
	* constexpr.cc (cxx_eval_bit_field_ref): For BIT_FIELD_REF with
	non-integral result type use fold_ternary too like for BIT_FIELD_REFs
	from VECTOR_CST.  If fold_ternary returns NULL, diagnose non-constant
	expression, set *non_constant_p and return t, instead of returning
	NULL.

	* g++.dg/pr105871.C: New test.
2022-06-09 17:42:31 +02:00
Maciej W. Rozycki
702a11ade2 RISC-V: Use a tab rather than space with FSFLAGS
Consistently use a tab rather than a space as the separator between the
assembly instruction mnemonic and its operand with FSFLAGS instructions
produced with the unordered FP comparison RTL insns.

	gcc/
	* config/riscv/riscv.md
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default)
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Emit a tab
	rather than space with FSFLAGS.
2022-06-09 14:34:34 +01:00
Nathan Sidwell
97b81fb036 c++: Better module initializer code
Every module interface needs to emit a global initializer, but it
might have nothing to init.  In those cases, there's no need for any
idempotency boolean to be emitted.

	gcc/cp
	* cp-tree.h (module_initializer_kind): Replace with ...
	(module_global_init_needed, module_has_import_inits): ...
	these.
	* decl2.cc (start_objects): Add has_body parm.  Reorganize
	module initializer creation.
	(generate_ctor_or_dtor_function): Adjust.
	(c_parse_final_cleanups): Adjust.
	(vtv_start_verification_constructor_init_function): Adjust.
	* module.cc (module_initializer_kind): Replace with ...
	(module_global_init_needed, module_has_import_inits): ...
	these.

	gcc/testsuite/
	* g++.dg/modules/init-2_a.C: Check no idempotency.
	* g++.dg/modules/init-2_b.C: Check idempotency.
2022-06-09 06:22:15 -07:00
Tobias Burnus
209de00fdb OpenMP: Handle ancestor:1 with discover_declare_target
gcc/
	* omp-offload.cc (omp_discover_declare_target_tgt_fn_r,
	omp_discover_declare_target_fn_r): Don't walk reverse-offload
	target regions.

gcc/testsuite/
	* c-c++-common/gomp/reverse-offload-1.c: New.
2022-06-09 14:48:24 +02:00
Jakub Jelinek
2dc19a1b59 doc: Fix up -Waddress documentation
When looking up the -Waddress documentation due to some PR that mentioned it,
I noticed some typos and thus I'm fixing them.

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	* doc/invoke.texi (-Waddress): Fix a typo in small example.
	Fix typos inptr_t -> intptr_t and uinptr_t -> uintptr_t.
2022-06-09 10:19:53 +02:00
Jakub Jelinek
17f52a1c72 openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library
This patch adds support for dlopening libmemkind.so on Linux and uses it
for some kinds of allocations (but not yet e.g. pinned memory).
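
A usage sketch (assuming libmemkind.so is available at run time; otherwise
omp_init_allocator may return omp_null_allocator):

#include <omp.h>
#include <stdlib.h>

int
main (void)
{
  /* Request high-bandwidth memory; with this patch the allocation can be
     backed by libmemkind when the library can be dlopened.  */
  omp_allocator_handle_t al
    = omp_init_allocator (omp_high_bw_mem_space, 0, NULL);
  if (al == omp_null_allocator)
    return 1;
  double *buf = (double *) omp_alloc (1024 * sizeof (double), al);
  if (buf == NULL)
    abort ();
  omp_free (buf, al);
  omp_destroy_allocator (al);
  return 0;
}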

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	* allocator.c: Include dlfcn.h if LIBGOMP_USE_MEMKIND is defined.
	(enum gomp_memkind_kind): New type.
	(struct omp_allocator_data): Add memkind field if LIBGOMP_USE_MEMKIND
	is defined.
	(struct gomp_memkind_data): New type.
	(memkind_data, memkind_data_once): New variables.
	(gomp_init_memkind, gomp_get_memkind): New functions.
	(omp_init_allocator): Initialize data.memkind, don't fail for
	omp_high_bw_mem_space if libmemkind supports it.
	(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
	memkind support if LIBGOMP_USE_MEMKIND is defined.
	* config/linux/allocator.c: New file.
2022-06-09 10:14:42 +02:00
Cui,Lili
269edf4e5e Update {skylake,icelake,alderlake}_cost to add a bit preference to vector store.
Since the integer vector construction cost has changed, we need to adjust the
load and store costs for Intel processors.

With the patch applied
538.imagick_r: gets ~6% improvement on ADL for multicopy.
525.x264_r   : gets ~2% improvement on ADL and ICX for multicopy.
with no measurable changes for other benchmarks.

gcc/ChangeLog

	PR target/105493
	* config/i386/x86-tune-costs.h (skylake_cost): Raise the gpr load cost
	from 4 to 6 and gpr store cost from 6 to 8. Change SSE loads and
	unaligned loads cost from {6, 6, 6, 10, 20} to {8, 8, 8, 8, 16}.
	(icelake_cost): Ditto.
	(alderlake_cost): Raise the gpr store cost from 6 to 8 and SSE loads,
	stores and unaligned stores cost from {6, 6, 6, 10, 15} to
	{8, 8, 8, 10, 15}.

gcc/testsuite/

	PR target/105493
	* gcc.target/i386/pr91446.c: Adjust to expect vectorization
	* gcc.target/i386/pr99881.c: XFAIL.
	* gcc.target/i386/pr105493.c: New.
	* g++.target/i386/pr105638.C: Use other sequence checks
	instead of vpxor, because code generation changed.
2022-06-09 14:59:44 +08:00
Haochen Gui
2fc6e3d55f This patch replaces shift and ior insns with one rotate and mask insn for the split patterns which are for DI byte swap on Power6.
gcc/
	* config/rs6000/rs6000.md (define_split for bswapdi load): Merge shift
	and ior insns to one rotate and mask insn.
	(define_split for bswapdi register): Likewise.

gcc/testsuite/
	* gcc.target/powerpc/pr93453-1.c: New.
2022-06-09 13:31:09 +08:00
GCC Administrator
02b4e2de32 Daily bump. 2022-06-09 00:16:26 +00:00
Jason Merrill
e8ed26c2ac c++: non-templated friends [PR105852]
The previous patch for 105852 avoids copying DECL_TEMPLATE_INFO from a
non-templated friend, but it really shouldn't have it in the first place.

	PR c++/105852

gcc/cp/ChangeLog:

	* decl.cc (duplicate_decls): Change non-templated friend
	check to an assert.
	* pt.cc	(tsubst_function_decl): Don't set DECL_TEMPLATE_INFO
	on non-templated friends.
	(tsubst_friend_function): Adjust.
2022-06-08 16:38:25 -04:00
Jason Merrill
7d87790a87 c++: redeclared hidden friend take 2 [PR105852]
My previous patch for 105761 avoided copying DECL_TEMPLATE_INFO from a
friend to a later definition, but in this testcase we have first a
non-friend declaration and then a definition, and we need to avoid copying
in that case as well.  But we do still want to set new_template_info to
avoid GC trouble.

With this change, the modules dump correctly identifies ::foo as a
non-template function in tpl-friend-2_a.C.

Along the way I noticed that the duplicate_decls handling of
DECL_UNIQUE_FRIEND_P was backwards for templates, where we don't clobber
DECL_LANG_SPECIFIC (olddecl) with DECL_LANG_SPECIFIC (newdecl) like we do
for non-templates.

	PR c++/105852
	PR c++/105761

gcc/cp/ChangeLog:

	* decl.cc (duplicate_decls): Avoid copying template info
	from non-templated friend even if newdecl isn't a definition.
	Correct handling of DECL_UNIQUE_FRIEND_P on templates.
	* pt.cc (non_templated_friend_p): New.
	* cp-tree.h (non_templated_friend_p): Declare it.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/tpl-friend-2_a.C: Adjust expected dump.
	* g++.dg/template/friend74.C: New test.
2022-06-08 16:37:50 -04:00