Per the clarification in [1], the macro is supposed to check for partial
clobbering of single HW registers. Since PRU declares only 8-bit
HW registers, and the ABI does not define individual bit clobbering,
it is safe to remove the implementation.
[1] https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00778.html
gcc/ChangeLog:
2020-05-05 Dimitar Dimitrov <dimitar@dinux.eu>
* config/pru/pru.c (pru_hard_regno_call_part_clobbered): Remove.
(TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Remove.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
TI has clarified [1] that R3.w0 is caller saved, so allow the compiler to
use it. This is a safe change because older GCC versions treat R3.w0 as
a fixed register and never use it.
[1] https://e2e.ti.com/support/tools/ccs/f/81/t/849993
gcc/ChangeLog:
2020-05-05 Dimitar Dimitrov <dimitar@dinux.eu>
* config/pru/pru.h: Mark R3.w0 as caller saved.
gcc/testsuite/ChangeLog:
2020-05-05 Dimitar Dimitrov <dimitar@dinux.eu>
* gcc.target/pru/lra-framepointer-fragmentation-1.c: Update test to
take into account additional available registers.
* gcc.target/pru/lra-framepointer-fragmentation-2.c: Ditto.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
Use the new @insn syntax for simpler gen_* invocation.
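A rough sketch of the mechanism (operand names are illustrative, not the
actual pru.md contents): prefixing a pattern name with '@' makes genemit
produce a generator that takes the iterator values as leading arguments,
so one call site handles all modes:

  ;; In pru.md:
  (define_insn "@pruloop<mode>" ...)

  /* In pru.c, instead of selecting a mode-specific generator by hand:  */
  emit_insn (gen_pruloop (mode, count_reg, label));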
gcc/ChangeLog:
2020-05-05 Dimitar Dimitrov <dimitar@dinux.eu>
* config/pru/pru.c (pru_emit_doloop): Use new gen_doloop_end_internal
and gen_doloop_begin_internal.
(pru_reorg_loop): Use gen_pruloop with mode.
* config/pru/pru.md: Use new @insn syntax.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
In function handle_vector_size_attribute, the local variable nunits is
supposed to be initialized by function type_valid_for_vector_size.
However, in case ARGS is null, the function may return a non-null
value and leave nunits uninitialized. This results in the warning/error:
gcc/poly-int.h: In function 'tree_node* handle_vector_size_attribute(tree_node**, tree, tree, int, bool*)':
gcc/poly-int.h:330:3: error: 'nunits' may be used uninitialized in this function [-Werror=maybe-uninitialized]
330 | ((void) (&(RES).coeffs[0] == (C *) 0), \
| ^
gcc/c-family/c-attribs.c:3695:26: note: 'nunits' was declared here
3695 | unsigned HOST_WIDE_INT nunits;
|
Add attribute nonnull for argument args in order to silence the warning,
and add an assert statement in order to verify the invariant.
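A minimal, self-contained model of the warning and the fix (names are
illustrative, not the actual GCC code):

  #include <assert.h>

  /* Only initializes *nunits when args is non-null, yet may still
     report success; this is what leaves nunits uninitialized.  */
  static int
  helper (const int *args, unsigned *nunits)
  {
    if (args == 0)
      return 1;
    *nunits = (unsigned) *args;
    return 1;
  }

  __attribute__ ((nonnull (1)))
  static unsigned
  handler (const int *args)
  {
    unsigned nunits;
    /* The attribute cannot be checked at call sites for a function
       local to the compilation unit, so verify it explicitly.  */
    assert (args != 0);
    if (!helper (args, &nunits))
      return 0;
    return nunits;
  }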
gcc/c-family/ChangeLog:
2020-05-05 Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
* c-attribs.c (handle_vector_size_attribute): Add attribute
nonnull for argument args in order to silence the warning of
uninitialized variable usage. Since this is local to the
compilation unit and thus cannot be checked at call sites by the
compiler, added an assert statement in order to verify this.
This patch addresses a missed optimization caused by the cselib changes.
Already in the past postreload could replace sp = sp + const_int with
sp = regxy if regxy already has the right value, but with the cselib
changes it happens several times more often. It can result in smaller
code, so it seems undesirable to prevent such optimizations, but
unfortunately it can get in the way of stack adjustment coalescing,
where e.g. if we used to have sp = sp + 32; sp = sp - 8;, previously
we'd turn that into sp = sp + 24;, but now postreload optimizes
into sp = r12; sp = sp - 8; and csa gives up.
The patch just adds a REG_EQUAL note when changing sp = sp + const into
sp = reg, where we remember that it was actually a stack adjustment by a
certain constant, and the combine-stack-adj changes then make use of those
REG_EQUAL notes, together with LR tracking (csa did enable the note
problem, it just didn't simulate each insn), so that we can add the needed
clobbers etc. (taken from the other stack adjustment insn).
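Concretely, after postreload replaces the addition, the copy

  (set (reg/f:DI 7 sp) (reg:DI 39 r12))

now carries a note along these lines (register numbers illustrative):

  (expr_list:REG_EQUAL (plus:DI (reg/f:DI 7 sp) (const_int 32)) (nil))

so csa can still see that the insn is really a stack adjustment by 32 and
can either coalesce it with a neighbouring adjustment or undo the
postreload replacement when needed.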
2020-05-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/94516
* postreload.c (reload_cse_simplify): When replacing sp = sp + const
with sp = reg, add REG_EQUAL note with sp + const.
* combine-stack-adj.c (try_apply_stack_adjustment): Change return
type from int to bool. Add LIVE and OTHER_INSN arguments. Undo
postreload sp = sp + const to sp = reg optimization if needed and
possible.
(combine_stack_adjustments_for_block): Add LIVE argument. Handle
reg = sp insn with sp + const REG_EQUAL note. Adjust
try_apply_stack_adjustment caller, call
df_simulate_initialize_forwards and df_simulate_one_insn_forwards.
(combine_stack_adjustments): Allocate and free LIVE bitmap,
adjust combine_stack_adjustments_for_block caller.
Whew, this took a while. We fail to parse "p->template A<T>::a()"
(where p is of type A<T> *) because since r249752 we treat the RHS of the ->
as dependent and avoid a lookup in the enclosing context: since that rev
cp_parser_template_name checks parser->context->object_type too, which
here is unknown_type_node, signalling a type-dependent object:
7756 if (dependent_p)
7757 /* Tell cp_parser_lookup_name that there was an object, even though it's
7758 type-dependent. */
7759 parser->context->object_type = unknown_type_node;
with which cp_parser_template_name returns identifier 'A', cp_parser_class_name
then creates a TEMPLATE_ID_EXPR A<T>, but then
23735 decl = make_typename_type (scope, decl, tag_type, tf_error);
in cp_parser_class_name fails because scope is NULL. Then we return
error_mark_node and parse errors ensue.
I've tried various approaches, e.g. keeping TEMPLATE_ID_EXPR around
instead of calling make_typename_type, which didn't work, whereupon I
realized that since we don't want to perform name lookup if we've seen
the template keyword and the scope is dependent, we can adjust
parser->context->object_type and use the type of the object expression
as the scope, even if it's type-dependent. This should be in line with
[basic.lookup.classref]p4. If the postfix expression doesn't have a type,
use typeof to carry its type. This typeof will be processed in
tsubst/TYPENAME_TYPE.
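For reference, the construct that used to be rejected (a sketch in the
spirit of the new tests, not necessarily their exact contents):

  template <typename T>
  struct A
  {
    void a ();
  };

  template <typename T>
  void
  f (A<T> *p)
  {
    p->template A<T>::a (); // failed to parse since r249752
  }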
PR c++/94799
* parser.c (cp_parser_postfix_dot_deref_expression): If we have
a type-dependent object of class type, stash it to
parser->context->object_type. If the postfix expression doesn't have
a type, use typeof.
(cp_parser_class_name): Consider object scope too.
(cp_parser_lookup_name): Remove code dealing with the case when
object_type is unknown_type_node.
* g++.dg/lookup/this1.C: Adjust dg-error.
* g++.dg/template/lookup12.C: New test.
* g++.dg/template/lookup13.C: New test.
* g++.dg/template/lookup14.C: New test.
* g++.dg/template/lookup15.C: New test.
PR gcov-profile/93623
* libgcov-interface.c (__gcov_fork): Do not flush;
reset counters only in the child process.
(__gcov_execl): Dump counters only, and reset them
only if exec* fails.
(__gcov_execlp): Likewise.
(__gcov_execle): Likewise.
(__gcov_execv): Likewise.
(__gcov_execvp): Likewise.
(__gcov_execve): Likewise.
gcc/ChangeLog:
2020-04-17 Martin Liska <mliska@suse.cz>
PR gcov-profile/94636
* gcov.c (main): Print total lines summary at the end.
(generate_results): Expect file_name to always be non-null.
Print a newline after the intermediate file is printed in order to
align with what we do for normal files.
libstdc++-v3/ChangeLog:
2020-02-04 Martin Liska <mliska@suse.cz>
PR c/92472
* include/parallel/multiway_merge.h:
Use const for _Compare template argument.
This rewrites hybrid SLP detection to be simpler and cope with
group size changes in the SLP graph. In particular, detection now
works starting from non-SLP stmts, following use->def chains rather
than walking the SLP graph and following def->use chains.
It's all temporary of course since non-SLP and thus hybrid SLP
will go away.
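The direction of the walk can be sketched like this (illustrative C, not
the actual GCC data structures):

  #include <stddef.h>

  /* Minimal model: each stmt records the stmts defining its operands.
     pure_slp marks stmts covered by an SLP instance; hybrid marks SLP
     stmts that also feed non-SLP uses and thus need loop
     vectorization too.  */
  struct stmt
  {
    struct stmt **defs;
    size_t ndefs;
    int pure_slp;
    int hybrid;
  };

  /* Called for every non-SLP stmt: follow use->def edges and mark the
     reachable SLP defs as hybrid.  */
  static void
  mark_hybrid_defs (struct stmt *s)
  {
    for (size_t i = 0; i < s->ndefs; i++)
      {
        struct stmt *d = s->defs[i];
        if (d->pure_slp && !d->hybrid)
          {
            d->hybrid = 1;
            mark_hybrid_defs (d); /* SLP defs feeding a hybrid stmt
                                     become hybrid as well.  */
          }
      }
  }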
2020-05-05 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (struct vdhs_data): New.
(vect_detect_hybrid_slp): New walker.
(vect_detect_hybrid_slp): Rewrite.
We now always vectorize two BBs; adjust the selector to also scan
for integer multiplication vectorization explicitly.
2020-05-05 Richard Biener <rguenther@suse.de>
PR testsuite/92177
* gcc.dg/vect/bb-slp-22.c: Adjust.
This fixes the lack of an escape point for externally declared variables.
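For illustration (hedged, not the committed testcases): stores into an
externally visible variable can be observed by other translation units,
so IPA points-to analysis must treat such a variable as an escape point:

  int g;        /* externally visible */
  int *p = &g;  /* g is reachable from other TUs through p */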
2020-05-05 Richard Biener <rguenther@suse.de>
PR ipa/94947
* tree-ssa-structalias.c (ipa_pta_execute): Use
varpool_node::externally_visible_p ().
(refered_from_nonlocal_var): Likewise.
* gcc.dg/torture/pr94947-1.c: New testcase.
* gcc.dg/torture/pr94947-2.c: Likewise.
The link phase is always partial (-r) for VxWorks in kernel mode, which
means that it uses incremental LTO linking by default (-flinker-output=rel).
But in this mode the LTO plugin outputs a warning if one of the object files
involved in the link does not contain LTO bytecode, before switching to
nolto-rel mode. We do not do repeated incremental linking for VxWorks,
so silence the warning.
lto-plugin/
* lto-plugin.c: Document -linker-output-auto-nolto-rel option.
(linker_output_set): Change type to bool.
(linker_output_known): Likewise.
(linker_output_auto_nolto_rel): New variable.
(all_symbols_read_handler): Take it into account.
<LDPO_REL>: Do not issue the warning if it is set.
(process_option): Process -linker-output-auto-nolto-rel.
(cleanup_handler): Remove unused variable.
(onload) <LDPT_LINKER_OUTPUT>: Adjust to above type change.
gcc/
* gcc.c (LTO_PLUGIN_SPEC): Define if not already.
(LINK_PLUGIN_SPEC): Execute LTO_PLUGIN_SPEC.
* config/vxworks.h (LTO_PLUGIN_SPEC): Define.
The CONSTRUCTOR_NO_CLEARING flag was invented to avoid generating a memset
for CONSTRUCTORs that lack elements, but it turns out that the gimplifier
can generate a memcpy for them instead, which is worse performance-wise,
so this patch prevents it from doing that for them.
* gimplify.c (gimplify_init_constructor): Do not put the constructor
into static memory if it is not complete.
This fixes the case of not using the multithreaded model when
only conditionally storing to the destination. We cannot elide
the load in this case.
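The situation is a conditional store in a loop, e.g. (a hedged sketch,
not the committed testcase):

  int *p;

  void
  f (int n, int k)
  {
    for (int i = 0; i < n; ++i)
      if (i < k)
        *p = i;  /* stored only on some iterations */
  }

If store motion promotes *p to a register here without the multithreaded
model, it must first load the old value of *p so that iterations which do
not store can write back the original contents; with this fix the load is
only elided when the multithreaded model is used or the stored value is
computed on every iteration.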
2020-05-05 Richard Biener <rguenther@suse.de>
PR tree-optimization/94949
* tree-ssa-loop-im.c (execute_sm): Check whether we use
the multithreaded model or always compute the stored value
before eliding a load.
* gcc.dg/torture/pr94949.c: New testcase.
The attached patch eliminates a redundant zero extend from the AArch64 backend. Given the following C code:
unsigned long long foo(unsigned a)
{
return ~a;
}
prior to this patch, AArch64 GCC at -O2 generates:
foo:
mvn w0, w0
uxtw x0, w0
ret
but the uxtw is redundant, since the mvn clears the upper half of the x0 register. After applying this patch, GCC at -O2 gives:
foo:
mvn w0, w0
ret
Testing:
Added regression test which passes after applying the change to aarch64.md.
Full bootstrap and regression on aarch64-linux with no additional failures.
* config/aarch64/aarch64.md (*one_cmpl_zero_extend): New.
* gcc.target/aarch64/mvn_zero_ext.c: New test.
The popcount* testcases show yet another creative way to write popcount,
but rather than adjusting the popcount matcher to deal with it, I think
we just should canonicalize those (X + (X << C) to X * (1 + (1 << C))
and (X << C1) + (X << C2) to X * ((1 << C1) + (1 << C2)), because for
multiplication we already have simplification rules that can handle nested
multiplication (X * CST1 * CST2), while for the shifts and adds we have
nothing like that. And the user could have written the multiplication
anyway, so if we don't emit the fastest or smallest code for multiplication
by a constant, we should improve that. At least on the testcases the
emitted code seems reasonable according to cost, except that perhaps we
could in some cases try to improve expansion of vector multiplication by a
uniform constant.
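For example, with C == 3, X + (X << 3) is X + 8*X and becomes X * 9, and
(X << 1) + (X << 3) is 2*X + 8*X and becomes X * 10; any further
multiplication by a constant then folds through the existing
X * CST1 * CST2 rules.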
2020-05-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94800
* match.pd (X + (X << C) to X * (1 + (1 << C)),
(X << C1) + (X << C2) to X * ((1 << C1) + (1 << C2))): New
canonicalizations.
* gcc.dg/tree-ssa/pr94800.c: New test.
* gcc.dg/tree-ssa/popcount5.c: New test.
* gcc.dg/tree-ssa/popcount5l.c: New test.
* gcc.dg/tree-ssa/popcount5ll.c: New test.
This insn-and-split splits into a HI->V?HImode broadcast for avx2 and
later, but either the operands need to be %xmm0-%xmm15 (i.e. a
VEX-encoded insn), or the insn needs both AVX512BW and AVX512VL.
Now, the Yv constraint is v for AVX512VL and x otherwise, so for
-mavx512vl -mno-avx512bw we ICE if we end up with a %xmm16+ register
from RA. The Yw constraint is v for AVX512VL plus AVX512BW and nothing
otherwise, so in this pattern we actually need xYw.
2020-05-05 Jakub Jelinek <jakub@redhat.com>
PR target/94942
* config/i386/mmx.md (*vec_dupv4hi): Use xYw constraints instead of Yv.
* gcc.target/i386/pr94942.c: New test.
On x86 (the only target with umulv4_optab) one can use mull; seto to check
for overflow instead of performing a wider multiplication and comparing
the high bits.
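The kind of source this matches (a sketch, not the committed testcase):

  int
  mul_overflows (unsigned int a, unsigned int b)
  {
    return (((unsigned long long) a * b) >> 32) != 0;
  }

which can now be recognized as .MUL_OVERFLOW (a, b) != 0 and expand on
x86 to a single mull followed by seto, instead of a full widening
multiply and a comparison of the high bits.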
2020-05-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94914
* match.pd ((((type)A * B) >> prec) != 0 to .MUL_OVERFLOW(A, B) != 0):
New simplification.
* gcc.target/i386/pr94914.c: New test.