Adds an initial body of documentation for the D front-end - other than
the existing documentation for command-line usage/the man page.
Documentation covers code generation choices specific to GNU D - what
attributes are supported, intrinsics, pragmas, predefined versions,
language extensions, missing features and deviations from spec.
More could be added or elaborated upon, such as what linkage
different symbols get, mixed language programming with C and C++, the
anatomy of a TypeInfo and ModuleInfo object, and so on. This is enough
as a first wave just to get it off the ground.
gcc/d/ChangeLog:
* Make-lang.in (D_TEXI_FILES): Add d/implement-d.texi.
* gdc.texi: Adjust introduction, include implement-d.texi.
* implement-d.texi: New file.
While most PA 2.0 instructions support both 32 and 64-bit traps
and conditions, the addi and subi instructions only support 32-bit
traps and conditions. Thus, we need to force immediate operands
to register operands on the 64-bit target and use the add/sub
instructions which can trap on 64-bit signed overflow.
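The width distinction can be sketched in C with GCC's __builtin_add_overflow.
This only illustrates the arithmetic, not the pa.md patterns, and the helper
names are made up for the example. A 64-bit addition can overflow even though
the same operation evaluated on the low 32-bit words does not, which is why a
32-bit trap condition is not sufficient for DImode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does A + B overflow when the overflow condition
   is evaluated on 32-bit signed values?  */
bool
overflows_32 (int32_t a, int32_t b)
{
  int32_t sum;
  return __builtin_add_overflow (a, b, &sum);
}

/* Hypothetical helper: the same check at the full 64-bit width.  */
bool
overflows_64 (int64_t a, int64_t b)
{
  int64_t sum;
  return __builtin_add_overflow (a, b, &sum);
}
```

For example, INT64_MAX + 1 overflows at 64 bits, but the low word of
INT64_MAX is -1 when read as a 32-bit signed value, and -1 + 1 does not
overflow at 32 bits.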
2022-11-30 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (addvdi3): Force operand 2 to a register.
Remove "addi,tsv,*" instruction from unnamed pattern.
(subvdi3): Force operand 1 to a register.
Remove "subi,tsv" instruction from unnamed pattern.
According to the architecture pseudocode the FEAT_MOPS sequences overwrite the NZCV flags
as part of their operation, so GCC needs to model that in the relevant RTL patterns.
For the testcase:
void g();
void foo (int a, size_t N, char *__restrict__ in,
char *__restrict__ out)
{
if (a != 3)
__builtin_memcpy (out, in, N);
if (a > 3)
g ();
}
we will currently generate:
foo:
cmp w0, 3
bne .L6
.L1:
ret
.L6:
cpyfp [x3]!, [x2]!, x1!
cpyfm [x3]!, [x2]!, x1!
cpyfe [x3]!, [x2]!, x1!
ble .L1 // Flags reused after CPYF* sequence
b g
This is wrong as the result of cmp needs to be recalculated after the MOPS sequence.
With this patch we'll insert a "cmp w0, 3" before the ble, similar to what clang does.
Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk and to the GCC 12 branch after some baking time.
gcc/ChangeLog:
* config/aarch64/aarch64.md (aarch64_cpymemdi): Specify clobber of CC reg.
(*aarch64_cpymemdi): Likewise.
(aarch64_movmemdi): Likewise.
(aarch64_setmemdi): Likewise.
(*aarch64_setmemdi): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/mops_5.c: New test.
* gcc.target/aarch64/mops_6.c: Likewise.
* gcc.target/aarch64/mops_7.c: Likewise.
Continue labels in an unrolled loop require a unique label per
iteration. Previously this used the Statement body node for each
unrolled iteration to generate a new entry in the label hash table.
This does not work when the continue label has an identifier, as said
named label is pointing to the outer UnrolledLoopStatement node.
What would happen is that during the lowering of `continue label', an
automatic label associated with the unrolled loop would be generated,
and a jump to that label inserted, but because it was never pushed by
the visitor for the loop itself, it subsequently never gets emitted.
To fix, correctly use the UnrolledLoopStatement as the key to look up
and store the break/continue label pair, but remove the continue label
from the value entry after every loop to force a new label to be
generated by the next call to `push_continue_label'.
PR d/107592
gcc/d/ChangeLog:
* toir.cc (IRVisitor::push_unrolled_continue_label): New method.
(IRVisitor::pop_unrolled_continue_label): New method.
(IRVisitor::visit (UnrolledLoopStatement *)): Use them instead of
push_continue_label and pop_continue_label.
gcc/testsuite/ChangeLog:
* gdc.dg/pr107592.d: New test.
Fixes:
gcc/fortran/parse.cc:5782:32: warning: for loop has empty body [-Wempty-body]
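This class of warning is easy to reproduce. The sketch below is not the code
from parse.cc; both function names are invented for illustration. It shows
how a stray semicolon after a for header turns the intended body into a
statement that runs exactly once.

```c
/* Stray semicolon: the loop body is empty, so the increment below
   runs once, after the loop has finished.  */
int
count_with_stray_semicolon (int n)
{
  int count = 0;
  for (int i = 0; i < n; i++);
    count++;
  return count;
}

/* Without the semicolon the increment is the loop body.  */
int
count_fixed (int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    count++;
  return count;
}
```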
gcc/fortran/ChangeLog:
* parse.cc (parse_omp_structured_block): Remove extra semicolon.
According to the documentation, the -Werror= option makes the specified
warning into an error and also automatically implies that option. Then
it seems that the behavior of the compiler when specifying
-Werror=array-bounds=X should be the same as specifying
"-Werror=array-bounds -Warray-bounds=X", so we expect to receive
array-bounds pass diagnostics and they must be processed as errors.
In practice, we observe that the array-bounds pass is indeed invoked,
but its diagnostics are processed as warnings, not errors.
This happens because Warray-bounds and Warray-bounds= are
declared as two different options in common.opt, so when
diagnostic_classify_diagnostic is called, DK_ERROR is set for
the Warray-bounds= option, but diagnostic_report_diagnostic called from
warning_at receives opt_index of Warray-bounds, so information about
DK_ERROR is lost. Fix this by using Alias in declaration of
Warray-bounds (similar to Wattribute-alias).
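The lost classification can be modelled with a small table keyed by option
index. This is a simplified sketch, not GCC's diagnostic machinery: the
enumerators and functions below are hypothetical stand-ins, and only the
shape of the lookup matters.

```c
/* Hypothetical stand-ins for diagnostic kinds and option indices.  */
enum diag_kind { KIND_UNSPECIFIED, KIND_WARNING, KIND_ERROR };
enum option_index { OPT_ARRAY_BOUNDS, OPT_ARRAY_BOUNDS_EQ, OPT_COUNT };

/* Per-option classification, zero-initialized to KIND_UNSPECIFIED.  */
static enum diag_kind classification[OPT_COUNT];

/* Record that diagnostics controlled by OPT are reported as KIND.  */
void
classify (enum option_index opt, enum diag_kind kind)
{
  classification[opt] = kind;
}

/* Kind used when a diagnostic is reported under OPT.  An option with
   no explicit classification stays a plain warning.  */
enum diag_kind
report_kind (enum option_index opt)
{
  if (classification[opt] == KIND_UNSPECIFIED)
    return KIND_WARNING;
  return classification[opt];
}

/* Models the alias: both spellings resolve to one index.  */
enum option_index
canonical (enum option_index opt)
{
  return opt == OPT_ARRAY_BOUNDS ? OPT_ARRAY_BOUNDS_EQ : opt;
}
```

Classifying OPT_ARRAY_BOUNDS_EQ as an error and then reporting under
OPT_ARRAY_BOUNDS yields a warning, while reporting under the canonical
index yields an error.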
Co-authored-by: Franz Sirl <Franz.Sirl-kernel@lauterbach.com>
gcc/ChangeLog:
PR driver/107787
* common.opt (Warray-bounds): Turn into alias of
-Warray-bounds=1.
* builtins.cc (c_strlen): Use OPT_Warray_bounds_
instead of OPT_Warray_bounds.
* diagnostic-spec.cc (nowarn_spec_t::nowarn_spec_t): Ditto.
* gimple-array-bounds.cc (array_bounds_checker::check_array_ref,
array_bounds_checker::check_mem_ref,
array_bounds_checker::check_addr_expr,
array_bounds_checker::check_array_bounds): Ditto.
* gimple-ssa-warn-restrict.cc (maybe_diag_access_bounds): Ditto.
gcc/c-family/ChangeLog:
PR driver/107787
* c-common.cc (fold_offsetof,
convert_vector_to_array_for_subscript): Use OPT_Warray_bounds_
instead of OPT_Warray_bounds.
gcc/testsuite/ChangeLog:
PR driver/107787
* gcc.dg/Warray-bounds-34.c: Correct the regular expression
for -Warray-bounds=.
* gcc.dg/Warray-bounds-43.c: Likewise.
* gcc.dg/pr107787.c: New test.
PR tree-optimization/101301
PR tree-optimization/103680
gcc/ChangeLog:
* tree-switch-conversion.cc (bit_test_cluster::emit):
Handle correctly remaining probability.
(switch_decision_tree::try_switch_expansion): Fix BB's count
where a cluster expansion happens.
(switch_decision_tree::emit_cmp_and_jump_insns): Fill up also
BB count.
(switch_decision_tree::do_jump_if_equal): Likewise.
(switch_decision_tree::emit_case_nodes): Handle special case
for BT expansion which can also fallback to a default BB.
* tree-switch-conversion.h (cluster::cluster): Add
m_default_prob probability.
The testcase from the PR at -O2 shows
((_277 == 2) AND (_79 == 0))
OR ((NOT (_277 == 0)) AND (NOT (_277 > 2)) AND (NOT (_277 == 2)) AND (_79 == 0))
OR ((NOT (pretmp_300 == 255)) AND (_277 == 0) AND (NOT (_277 > 2)) AND (NOT (_277 == 2)) AND (_79 == 0))
which we fail to simplify. The following patch makes us simplify
the relations on _277, producing
((_79 == 0) AND (_277 == 2))
OR ((_79 == 0) AND (_277 <= 1) AND (NOT (_277 == 0)))
OR ((_79 == 0) AND (_277 == 0) AND (NOT (pretmp_300 == 255)))
which might be an incremental step to resolve a bogus uninit
diagnostic at -O2. The patch uses maybe_fold_and_comparison for this.
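That the two forms are equivalent can be checked by brute force. The sketch
below assumes unsigned operands (the types of the SSA names are not given
here) and uses x, y and p as hypothetical stand-ins for _277, _79 and
pretmp_300.

```c
#include <stdbool.h>

/* The predicate as shown before simplification.  */
bool
original_predicate (unsigned x, unsigned y, unsigned char p)
{
  return (x == 2 && y == 0)
	 || (!(x == 0) && !(x > 2) && !(x == 2) && y == 0)
	 || (!(p == 255) && x == 0 && !(x > 2) && !(x == 2) && y == 0);
}

/* The predicate as shown after simplification.  */
bool
simplified_predicate (unsigned x, unsigned y, unsigned char p)
{
  return (y == 0 && x == 2)
	 || (y == 0 && x <= 1 && !(x == 0))
	 || (y == 0 && x == 0 && !(p == 255));
}

/* Compare both forms over a small exhaustive range of inputs.  */
bool
predicates_agree (void)
{
  for (unsigned x = 0; x < 8; x++)
    for (unsigned y = 0; y < 2; y++)
      for (unsigned p = 0; p < 256; p++)
	if (original_predicate (x, y, (unsigned char) p)
	    != simplified_predicate (x, y, (unsigned char) p))
	  return false;
  return true;
}
```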
PR tree-optimization/107919
* gimple-predicate-analysis.cc (simplify_1): Rename to ...
(simplify_1a): .. this.
(simplify_1b): New.
(predicate::simplify): Call both simplify_1a and simplify_1b.
We fail to simplify
((_145 != 0B) AND (_531 == 2) AND (_109 == 0))
OR ((NOT (_145 != 0B)) AND (_531 == 2) AND (_109 == 0))
OR ((NOT (_531 == 2)) AND (_109 == 0))
because the existing simplification of !A && B || A && B is implemented
too simplistically. The following re-implements it, which fixes the
bogus uninit diagnostic when using -O1 but not yet at -O2.
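As a sanity check of the Boolean identity involved, the three-way chain can
be enumerated in C. Here a, b and c are hypothetical stand-ins for
(_145 != 0B), (_531 == 2) and (_109 == 0). The fully reduced form, just c,
is derived by hand for this sketch and is not quoted from the patch.

```c
#include <stdbool.h>

/* The chain as written above: two terms that differ only in the sign
   of A, plus a term for the negation of B.  */
bool
unsimplified (bool a, bool b, bool c)
{
  return (a && b && c) || (!a && b && c) || (!b && c);
}

/* Hand-derived reduction: the first two terms merge into (b && c),
   which then merges with (!b && c) into c.  */
bool
reduced (bool a, bool b, bool c)
{
  (void) a;
  (void) b;
  return c;
}

/* Compare both forms over the full truth table.  */
bool
all_inputs_agree (void)
{
  for (int bits = 0; bits < 8; bits++)
    {
      bool a = bits & 1, b = bits & 2, c = bits & 4;
      if (unsimplified (a, b, c) != reduced (a, b, c))
	return false;
    }
  return true;
}
```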
PR tree-optimization/107919
* gimple-predicate-analysis.cc (predicate::simplify_2):
Handle predicates of arbitrary length.
* g++.dg/warn/Wuninitialized-pr107919-1.C: New testcase.
r13-254-gdd3c7873a61019e9 added an optimization for {a, +, a} (x-1),
but as can be seen on the following testcase, the way it is written
where chrec_fold_multiply is called with type doesn't work for pointers:
res = build_int_cst (TREE_TYPE (x), 1);
res = chrec_fold_plus (TREE_TYPE (x), x, res);
res = chrec_convert_rhs (type, res, NULL);
res = chrec_fold_multiply (type, chrecr, res);
while what we were doing before and what is still used if the condition
doesn't match is fine:
res = chrec_convert_rhs (TREE_TYPE (chrecr), x, NULL);
res = chrec_fold_multiply (TREE_TYPE (chrecr), chrecr, res);
res = chrec_fold_plus (type, CHREC_LEFT (chrec), res);
because it performs chrec_fold_multiply on TREE_TYPE (chrecr) and converts
only afterwards.
I think the easiest fix is to ignore the new path for pointer types.
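The identity behind the optimization is a + a*(x-1) == a*x, which holds for
wrapping integer arithmetic. The C sketch below checks that on unsigned
integers; the function names are invented for illustration and do not
mirror tree-chrec.cc. There is no such multiplication on pointer types,
which is the case the fix excludes.

```c
#include <stdint.h>

/* Value of the affine chrec {base, +, step} at iteration N, using
   wrapping unsigned arithmetic.  */
uint64_t
chrec_apply_affine (uint64_t base, uint64_t step, uint64_t n)
{
  return base + step * n;
}

/* The shortcut for {a, +, a} applied to x-1: a single multiply.  */
uint64_t
chrec_apply_shortcut (uint64_t a, uint64_t x)
{
  return a * x;
}
```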
2022-11-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107835
* tree-chrec.cc (chrec_apply): Don't handle "{a, +, a} (x-1)"
as "a*x" if type is a pointer type.
* gcc.c-torture/compile/pr107835.c: New test.
Add support for gfx803 as an alias for fiji.
Add test cases for all supported 'isa' values.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa): Add gfx803.
* config/gcn/t-omp-device: Add gfx803.
libgomp/ChangeLog:
* testsuite/libgomp.c/declare-variant-4-fiji.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx900.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx906.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx908.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx90a.c: New test.
* testsuite/libgomp.c/declare-variant-4.h: New header file.
The test uses target_clones, which requires ifunc support.
for gcc/testsuite/ChangeLog
PR target/107304
* gcc.target/i386/pr107304.c: dg-require ifunc support.
The old stack check was performed before the stack was dropped,
which would cause the detection tool to report a memory leak.
The current stack check scheme is as follows:
'-fstack-clash-protection':
1. When the frame->total_size is smaller than the guard page size,
the stack is dropped according to the original scheme, and there
is no need to perform stack detection in the prologue.
2. When frame->total_size is greater than or equal to guard page size,
the first step to drop the stack is to drop the space required by
the caller-save registers. This space needs to save the caller-save
registers, so an implicit stack check is performed.
So only the rest of the stack space needs to be checked.
'-fstack-check':
There is no one-time stack drop followed by page-by-page detection as
described in the documentation. As with '-fstack-clash-protection',
detection is performed immediately after each page drop.
When frame->total_size is not 0, only the size required to save the
s registers is dropped in the first stack adjustment.
The test cases are referenced from aarch64.
gcc/ChangeLog:
* config/loongarch/linux.h (STACK_CHECK_MOVING_SP):
Define this macro to 1.
* config/loongarch/loongarch.cc (STACK_CLASH_PROTECTION_GUARD_SIZE):
Size of guard page.
(loongarch_first_stack_step): Return the size of the first drop stack
according to whether stack checking is performed.
(loongarch_emit_probe_stack_range): Adjust the method of stack checking in prologue.
(loongarch_output_probe_stack_range): Delete useless code.
(loongarch_expand_prologue): Adjust the method of stack checking in prologue.
(loongarch_option_override_internal): Enforce that interval is the same
size as size so the mid-end does the right thing.
* config/loongarch/loongarch.h (STACK_CLASH_MAX_UNROLL_PAGES):
New macro to decide whether to loop stack detection.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp:
* gcc.target/loongarch/stack-check-alloca-1.c: New test.
* gcc.target/loongarch/stack-check-alloca-2.c: New test.
* gcc.target/loongarch/stack-check-alloca-3.c: New test.
* gcc.target/loongarch/stack-check-alloca-4.c: New test.
* gcc.target/loongarch/stack-check-alloca-5.c: New test.
* gcc.target/loongarch/stack-check-alloca-6.c: New test.
* gcc.target/loongarch/stack-check-alloca.h: New test.
* gcc.target/loongarch/stack-check-cfa-1.c: New test.
* gcc.target/loongarch/stack-check-cfa-2.c: New test.
* gcc.target/loongarch/stack-check-prologue-1.c: New test.
* gcc.target/loongarch/stack-check-prologue-2.c: New test.
* gcc.target/loongarch/stack-check-prologue-3.c: New test.
* gcc.target/loongarch/stack-check-prologue-4.c: New test.
* gcc.target/loongarch/stack-check-prologue-5.c: New test.
* gcc.target/loongarch/stack-check-prologue-6.c: New test.
* gcc.target/loongarch/stack-check-prologue-7.c: New test.
* gcc.target/loongarch/stack-check-prologue.h: New test.
PR analyzer/103546 tracks various false positives seen on
flex-generated lexers.
Whilst investigating them, I noticed an ICE with
-fanalyzer-call-summaries due to attempting to store sm-state
for an UNKNOWN svalue, which this patch fixes.
This patch also provides known_function implementations of all of the
external functions called by the lexer, reducing the number of false
positives.
The patch doesn't eliminate all false positives, but adds integration
tests to try to establish a baseline from which the remaining false
positives can be fixed.
gcc/analyzer/ChangeLog:
PR analyzer/103546
* analyzer.h (register_known_file_functions): New decl.
* program-state.cc (sm_state_map::replay_call_summary): Reject
attempts to store sm-state for caller_sval that can't have
associated state.
* region-model-impl-calls.cc (register_known_functions): Call
register_known_file_functions.
* sm-fd.cc (class kf_isatty): New.
(register_known_fd_functions): Register it.
* sm-file.cc (class kf_ferror): New.
(class kf_fileno): New.
(class kf_getc): New.
(register_known_file_functions): New.
gcc/ChangeLog:
PR analyzer/103546
* doc/invoke.texi (Static Analyzer Options): Add isatty, ferror,
fileno, and getc to the list of functions known to the analyzer.
gcc/testsuite/ChangeLog:
PR analyzer/103546
* gcc.dg/analyzer/ferror-1.c: New test.
* gcc.dg/analyzer/fileno-1.c: New test.
* gcc.dg/analyzer/flex-with-call-summaries.c: New test.
* gcc.dg/analyzer/flex-without-call-summaries.c: New test.
* gcc.dg/analyzer/getc-1.c: New test.
* gcc.dg/analyzer/isatty-1.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/analyzer/ChangeLog:
PR analyzer/105784
* region-model-manager.cc
(region_model_manager::maybe_fold_binop): For POINTER_PLUS_EXPR,
PLUS_EXPR and MINUS_EXPR, eliminate requirement that the final
type matches that of arg0 in favor of a cast.
gcc/testsuite/ChangeLog:
PR analyzer/105784
* gcc.dg/analyzer/torture/fold-ptr-arith-pr105784.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
In a SFINAE context composite_pointer_type returns error_mark_node if
the given pointer types are incompatible. But the SPACESHIP_EXPR case
of cp_build_binary_op wasn't prepared for this error_mark_node result,
which led to an ICE (from spaceship_comp_cat) for the below testcase.
(In a non-SFINAE context composite_pointer_type issues a permerror and
returns cv void* in this case, so this ICE seems specific to SFINAE.)
PR c++/107542
gcc/cp/ChangeLog:
* typeck.cc (cp_build_binary_op): In the SPACESHIP_EXPR case,
handle an error_mark_node result type.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/spaceship-sfinae2.C: New test.
gcc/fortran/ChangeLog:
PR fortran/107874
* simplify.cc (gfc_simplify_merge): When simplifying MERGE with a
constant scalar MASK, ensure that arguments TSOURCE and FSOURCE are
either constant or will be evaluated.
* trans-intrinsic.cc (gfc_conv_intrinsic_merge): Evaluate arguments
before generating conditional expression.
gcc/testsuite/ChangeLog:
PR fortran/107874
* gfortran.dg/merge_init_expr_2.f90: Adjust code to the corrected
simplification.
* gfortran.dg/merge_1.f90: New test.
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
GCC assumes that any global variable might be modified by operator new,
and so in the testcase for this PR all data members get reloaded after
allocating new storage. By making local copies of the _M_start and
_M_finish members we avoid that, and then the compiler has enough info
to remove the dead branches that trigger bogus -Warray-bounds warnings.
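The same idea can be sketched in C. This is an analogy, not the libstdc++
code: the struct, the global and the allocate_hook pointer are all
hypothetical, with the indirect call standing in for a replaceable
operator new that the compiler must assume may modify globals.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct int_vec { int *start; int *finish; };

/* Global object that an opaque call might modify, as far as the
   compiler can tell.  */
struct int_vec global_vec;

/* Hypothetical replaceable allocator, called indirectly so that the
   compiler cannot see what it does.  */
void *(*allocate_hook) (size_t) = malloc;

/* Return a newly allocated copy of the elements of global_vec and
   store the element count in *COUNT_OUT, or return NULL on failure.  */
int *
copy_elements (size_t *count_out)
{
  /* Local copies taken before allocating: the allocator call cannot
     change them, so they need not be reloaded afterwards.  */
  int *const start = global_vec.start;
  int *const finish = global_vec.finish;
  const size_t count = (size_t) (finish - start);

  int *copy = allocate_hook ((count ? count : 1) * sizeof *copy);
  if (copy == NULL)
    return NULL;
  if (count != 0)
    memcpy (copy, start, count * sizeof *copy);
  *count_out = count;
  return copy;
}
```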
libstdc++-v3/ChangeLog:
PR libstdc++/107852
PR libstdc++/106199
PR libstdc++/100366
* include/bits/vector.tcc (vector::_M_fill_insert): Copy
_M_start and _M_finish members before allocating.
(vector::_M_default_append): Likewise.
(vector::_M_range_insert): Likewise.
There's no need to call a _M_xxx_dispatch function with a
statically-known __false_type tag, we can just directly call the
function that should be dispatched to. This will compile a tiny bit
faster and save a function call with optimization or inlining turned
off.
Also add the always_inline attribute to the __iterator_category helper
used for dispatching on the iterator category.
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator_base_types.h (__iterator_category):
Add always_inline attribute.
* include/bits/stl_vector.h (assign(Iter, Iter)): Call
_M_assign_aux directly, instead of _M_assign_dispatch.
(insert(const_iterator, Iter, Iter)): Call _M_range_insert
directly instead of _M_insert_dispatch.
These names (and __unused) are defined as macros by newlib.
libstdc++-v3/ChangeLog:
* include/std/format: Rename all variables called __used or
__packed.
* testsuite/17_intro/badnames.cc: Add no_pch options.
* testsuite/17_intro/names.cc: Check __packed, __unused and
__used.
Here we're crashing when using the explicit specialization of the
function template g with trailing requirements ultimately because
earlier decls_match (called indirectly from register_specialization) for
the explicit specialization returned false since the template has
trailing requirements whereas the specialization doesn't.
In r12-2230-gddd25bd1a7c8f4, we fixed a similar issue concerning template
requirements instead of trailing requirements. We could extend that fix
to ignore trailing requirement mismatches for explicit specializations
as well, but it seems cleaner to just propagate constraints from the
specialized template to the specialization when declaring an explicit
specialization so that decls_match will naturally return true in this
case. And it looks like determine_specialization already does this,
albeit inconsistently (only when specializing a non-template member
function of a class template as in cpp2a/concepts-explicit-spec4.C).
So this patch makes determine_specialization consistently propagate
constraints from the specialized template to the specialization, which
in turn lets us get rid of the function_requirements_equivalent_p special
case added by r12-2230.
PR c++/107864
gcc/cp/ChangeLog:
* decl.cc (function_requirements_equivalent_p): Don't check
DECL_TEMPLATE_SPECIALIZATION.
* pt.cc (determine_specialization): Propagate constraints when
specializing a function template too. Simplify by using
add_outermost_template_args.
gcc/testsuite/ChangeLog:
* g++.dg/concepts/explicit-spec1a.C: New test.
The following deals with the situation where we have
<bb 2> [local count: 1073741824]:
_5 = bytes.D.25336._M_impl.D.24643._M_start;
_6 = bytes.D.25336._M_impl.D.24643._M_finish;
pretmp_66 = bytes.D.25336._M_impl.D.24643._M_end_of_storage;
if (_5 != _6)
goto <bb 3>; [70.00%]
else
goto <bb 4>; [30.00%]
...
<bb 6> [local count: 329045359]:
_89 = operator new (4);
_43 = bytes.D.25336._M_impl.D.24643._M_start;
_Num_44 = _137 - _43;
if (_Num_44 != 0)
but fail to see that _137 is equal to _5 and thus eventually _Num_44
is zero, because operator new could possibly clobber the global
bytes variable.
The following resolves this in value-numbering by using the
predicated values for _5 == _6 recorded for the dominating
condition.
PR tree-optimization/107852
* tree-ssa-sccvn.cc (visit_phi): Use equivalences recorded
as predicated values to elide more redundant PHIs.
* gcc.dg/tree-ssa/ssa-fre-101.c: New testcase.
When we version loops for vectorization during if-conversion it
can happen that either loop vanishes because we run some VN and
CFG cleanup. If the to-be vectorized part vanishes we already
redirect the versioning condition to the original loop. The following
does the same in case the original loop vanishes as happened
for the testcase in the bug in the past (but no longer).
PR tree-optimization/106995
* tree-if-conv.cc (pass_if_conversion::execute): Also redirect the
versioning condition to the original loop if this very loop
vanished during CFG cleanup.
The following avoids ICEing with a mismatched prototype for alloca
and -Walloca-larger-than using irange for checks which doesn't
like mismatched types.
PR tree-optimization/107898
* gimple-ssa-warn-alloca.cc (alloca_call_type): Check
the type of the alloca argument is compatible with size_t
before querying ranges.
The target clone pass is the only small IPA pass that doesn't disable
itself after errors but has properties whose verification can fail
because we cut off build SSA passes after errors.
PR ipa/107897
* multiple_target.cc (pass_target_clone::gate): Disable
after errors.
The description argument was missing in AC_DEFINE(ENABLE_MULTIARCH, 1),
which makes autoheader fail.
Thanks to Lulu Cheng for pointing it out.
gcc/ChangeLog:
* configure.ac: Add description for
AC_DEFINE(ENABLE_MULTIARCH, 1).
Usually a requirement starting with 'typename' is a type-requirement, but it
might be a simple-requirement such as a functional cast to a typename-type.
PR c++/101733
gcc/cp/ChangeLog:
* parser.cc (cp_parser_requirement): Parse tentatively for the
'typename' case.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires32.C: New test.
Some clang folks mailed me asking about being less permissive about
'concept bool', so let's bump it up from pedwarn to permerror.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_decl_specifier_seq): Change 'concept bool'
diagnostic from pedwarn to permerror.
There was a small typo in the comment where "Also" was written
twice. The second "also" should have been "handle".
This fixes that.
Committed as obvious after a build.
gcc/ChangeLog:
* match.pd ((A / (1 << B)) -> (A >> B).):
Fix comment.
The motivation of this patch is to correct the wrong estimate of the
number of instructions needed to load a 64-bit constant on rv32 in the
current cost model (riscv_integer_cost). In the current implementation,
if a constant requires more than 3 instructions (riscv_const_insn and
riscv_legitimate_constant_p), the constant is put into the constant pool
when expanding gimple to rtl (legitimate_constant_p hook and
emit_move_insn). The inaccurate cost model therefore leads to suboptimal
codegen on rv32, and this fix corrects the wrong estimate.
e.g. the current codegen for loading 0x839290001 in rv32
lui a5,%hi(.LC0)
lw a0,%lo(.LC0)(a5)
lw a1,%lo(.LC0+4)(a5)
.LC0:
.word 958988289
.word 8
output after this patch
li a0,958988288
addi a0,a0,1
li a1,8
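The numbers in the two listings can be reproduced with a little arithmetic.
The C sketch below assumes the usual split of a 32-bit word into a
lui-style upper part and a signed 12-bit addi immediate; the helper names
are invented for illustration and are not taken from riscv.cc.

```c
#include <stdint.h>

/* Low and high 32-bit halves of a 64-bit constant, as held in a
   register pair on rv32.  */
uint32_t
low_word (uint64_t value)
{
  return (uint32_t) value;
}

uint32_t
high_word (uint64_t value)
{
  return (uint32_t) (value >> 32);
}

/* Signed 12-bit immediate that an addi would add.  */
int32_t
addi_part (uint32_t word)
{
  int32_t low12 = (int32_t) (word & 0xfffu);
  return low12 >= 0x800 ? low12 - 0x1000 : low12;
}

/* Upper part of WORD, rounded so that adding addi_part (WORD) gives
   back WORD.  */
uint32_t
lui_part (uint32_t word)
{
  return (word + 0x800u) & 0xfffff000u;
}
```

For 0x839290001 the high word is 8 and the low word is 958988289, which
splits into 958988288 plus 1, matching the three instructions shown.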
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_build_integer): Improve some cases
of loading 64bit constants for rv32.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rv32-load-64bit-constant.c: New test.
We produce inefficient code for some synthesized SImode conditional set
operations (i.e. ones that are not directly implemented in hardware) on
RV64. For example a piece of C code like this:
int
sleu (unsigned int x, unsigned int y)
{
return x <= y;
}
gets compiled (at `-O2') to this:
sleu:
sgtu a0,a0,a1 # 9 [c=4 l=4] *sgtu_disi
xori a0,a0,1 # 10 [c=4 l=4] *xorsi3_internal/1
andi a0,a0,1 # 16 [c=4 l=4] anddi3/1
ret # 25 [c=0 l=4] simple_return
or (at `-O1') to this:
sleu:
sgtu a0,a0,a1 # 9 [c=4 l=4] *sgtu_disi
xori a0,a0,1 # 10 [c=4 l=4] *xorsi3_internal/1
sext.w a0,a0 # 16 [c=4 l=4] extendsidi2/0
ret # 24 [c=0 l=4] simple_return
This is because the middle end expands a SLEU operation missing from
RISC-V hardware into a sequence of a SImode SGTU operation followed by
an explicit SImode XORI operation with immediate 1. And while the SGTU
machine instruction (alias SLTU with the input operands swapped) gives a
properly sign-extended 32-bit result which is valid both as a SImode and
as a DImode operand, the middle end does not see that through a SImode XORI
operation, because we tell the middle end that the RISC-V target (unlike
MIPS) may hold values in DImode integer registers that are valid for
SImode operations even if not properly sign-extended.
However the RISC-V psABI requires that 32-bit function arguments and
results passed in 64-bit integer registers be properly sign-extended, so
this is explicitly done at the conclusion of the function.
Fix this by making the backend use a sequence of a DImode SGTU operation
followed by a SImode SEQZ operation instead. The latter operation is
known by the middle end to produce a properly sign-extended 32-bit
result and therefore combine gets rid of the sign-extension operation
that follows and actually folds it into the very same XORI machine
operation resulting in:
sleu:
sgtu a0,a0,a1 # 9 [c=4 l=4] *sgtu_didi
xori a0,a0,1 # 16 [c=4 l=4] xordi3/1
ret # 25 [c=0 l=4] simple_return
instead (although the SEQZ alias SLTIU against immediate 1 machine
instruction would equally do and is actually retained at `-O0'). This
is handled analogously for the remaining synthesized operations of this
kind, i.e. `SLE', `SGEU', and `SGE'.
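At the source level the two expansions compute the same value. The C sketch
below only demonstrates that equivalence; the function names are made up,
and the actual instruction selection depends on the backend.

```c
/* Reference semantics of the comparison.  */
int
sleu_reference (unsigned int x, unsigned int y)
{
  return x <= y;
}

/* Old expansion: SGTU followed by an XOR with 1.  */
int
sleu_via_xor (unsigned int x, unsigned int y)
{
  return (x > y) ^ 1;
}

/* New expansion: SGTU followed by a test for equality with zero.  */
int
sleu_via_seqz (unsigned int x, unsigned int y)
{
  return (x > y) == 0;
}
```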
gcc/
* config/riscv/riscv.cc (riscv_emit_int_order_test): Use EQ 0
rather than XOR 1 for LE and LEU operations.
gcc/testsuite/
* gcc.target/riscv/sge.c: New test.
* gcc.target/riscv/sgeu.c: New test.
* gcc.target/riscv/sle.c: New test.
* gcc.target/riscv/sleu.c: New test.
gcc/fortran/ChangeLog:
PR fortran/107819
* trans-stmt.cc (gfc_conv_elemental_dependencies): In checking for
elemental dependencies, treat dummy argument with VALUE attribute
as implicitly having intent(in).
gcc/testsuite/ChangeLog:
PR fortran/107819
* gfortran.dg/elemental_dependency_7.f90: New test.