In function handle_vector_size_attribute local variable nunits is
supposed to be initialized by function type_valid_for_vector_size.
However, in case ARGS is null the function may return with a non-null
value and leave nunits uninitialized. This results in warning/error:
gcc/poly-int.h: In function 'tree_node* handle_vector_size_attribute(tree_node**, tree, tree, int, bool*)':
gcc/poly-int.h:330:3: error: 'nunits' may be used uninitialized in this function [-Werror=maybe-uninitialized]
330 | ((void) (&(RES).coeffs[0] == (C *) 0), \
| ^
gcc/c-family/c-attribs.c:3695:26: note: 'nunits' was declared here
3695 | unsigned HOST_WIDE_INT nunits;
|
Added attribute nonnull for argument args in order to silence warning
and added an assert statement in order to check the invariant candidate.
gcc/c-family/ChangeLog:
2020-05-05 Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
* c-attribs.c (handle_vector_size_attribute): Add attribute
nonnull for argument args in order to silence warning of
uninitialized variable usage. Since this is local to the
compilation unit and thus cannot be checked at call sites by the
compiler, added an assert statement in order to verify this.
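The shape of the fix can be sketched outside of GCC as follows. This is a
minimal illustration only: size_is_valid and vector_units are hypothetical
stand-ins (not the real type_valid_for_vector_size or
handle_vector_size_attribute), and the attribute is written in plain
GCC/Clang syntax rather than through GCC's internal macros.

```cpp
#include <cassert>

// Hypothetical stand-in for the helper: it writes *nunits only when args
// is non-null, so a null args would leave the caller's variable untouched.
static bool size_is_valid(const int *args, unsigned long *nunits)
{
  if (!args)
    return true;
  *nunits = static_cast<unsigned long>(*args) / sizeof(int);
  return true;
}

// Declaring the parameter nonnull tells the compiler that the null path
// is not expected to be taken by callers of this function.
unsigned long vector_units(const int *args) __attribute__((nonnull(1)));

unsigned long vector_units(const int *args)
{
  // The attribute is a promise rather than a check, so the invariant is
  // asserted as well.
  assert(args != nullptr);

  unsigned long nunits;
  if (!size_is_valid(args, &nunits))
    return 0;
  return nunits;
}
```

Whether a given compiler version still warns on such a reduced example is
not something this sketch claims; it only shows the attribute plus assert
combination described above.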
This patch addresses a missed optimization caused by the cselib changes.
Already in the past postreload could replace sp = sp + const_int with
sp = regxy if regxy already has the right value, but with the cselib
changes it happens several times more often. It can result in smaller
code, so it seems undesirable to prevent such optimizations, but
unfortunately it can get in the way of stack adjustment coalescing,
where e.g. if we used to have sp = sp + 32; sp = sp - 8;, previously
we'd turn that into sp = sp + 24;, but now postreload optimizes it
into sp = r12; sp = sp - 8; and csa gives up.
The patch just adds a REG_EQUAL note when changing sp = sp + const into
sp = reg, so that we remember it was actually a stack adjustment by a certain
constant, and the combine-stack-adj changes then make use of those REG_EQUAL
notes, together with LR tracking (csa did enable the note problem, it just
didn't simulate each insn), so that we can add the needed clobbers etc.
(taken from the other stack adjustment insn).
2020-05-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/94516
* postreload.c (reload_cse_simplify): When replacing sp = sp + const
with sp = reg, add REG_EQUAL note with sp + const.
* combine-stack-adj.c (try_apply_stack_adjustment): Change return
type from int to bool. Add LIVE and OTHER_INSN arguments. Undo
postreload sp = sp + const to sp = reg optimization if needed and
possible.
(combine_stack_adjustments_for_block): Add LIVE argument. Handle
reg = sp insn with sp + const REG_EQUAL note. Adjust
try_apply_stack_adjustment caller, call
df_simulate_initialize_forwards and df_simulate_one_insn_forwards.
(combine_stack_adjustments): Allocate and free LIVE bitmap,
adjust combine_stack_adjustments_for_block caller.
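As a rough illustration of the idea (a toy model, not GCC's RTL or the
actual csa code), the following sketch treats each insn as either a constant
adjustment of sp or a register copy into sp. The has_note flag plays the
role of the REG_EQUAL note; Insn and coalesce are hypothetical names, and
liveness tracking and clobbers are left out entirely.

```cpp
#include <cassert>
#include <vector>

// Toy insn: either sp = sp + delta, or sp = reg. For a register copy,
// has_note says that delta records the equivalent constant adjustment.
struct Insn {
  enum Kind { AddConst, CopyReg } kind;
  long delta;
  bool has_note;
};

// Merge runs of adjacent insns whose constant effect on sp is known.
// A run of length one is kept unchanged, so a lone sp = reg stays as is.
std::vector<Insn> coalesce(const std::vector<Insn> &in)
{
  std::vector<Insn> out, run;
  auto flush = [&] {
    if (run.size() == 1) {
      out.push_back(run[0]);
    } else if (run.size() > 1) {
      long sum = 0;
      for (const Insn &r : run)
        sum += r.delta;
      if (sum != 0)
        out.push_back({Insn::AddConst, sum, false});
    }
    run.clear();
  };
  for (const Insn &i : in) {
    if (i.kind == Insn::AddConst || i.has_note) {
      run.push_back(i);
    } else {
      flush();
      out.push_back(i);   // unknown effect, acts as a barrier
    }
  }
  flush();
  return out;
}
```

In this model sp = r12 (noted as sp + 32) followed by sp = sp - 8 collapses
into a single sp = sp + 24, while the same copy without a note blocks the
merge.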
Whew, this took a while. We fail to parse "p->template A<T>::a()"
(where p is of type A<T> *) because since r249752 we treat the RHS of the ->
as dependent and avoid a lookup in the enclosing context: since that rev
cp_parser_template_name checks parser->context->object_type too, which
here is unknown_type_node, signalling a type-dependent object:
  if (dependent_p)
    /* Tell cp_parser_lookup_name that there was an object, even though it's
       type-dependent.  */
    parser->context->object_type = unknown_type_node;
with which cp_parser_template_name returns identifier 'A', cp_parser_class_name
then creates a TEMPLATE_ID_EXPR A<T>, but then
  decl = make_typename_type (scope, decl, tag_type, tf_error);
in cp_parser_class_name fails because scope is NULL. Then we return
error_mark_node and parse errors ensue.
I've tried various approaches, e.g. keeping TEMPLATE_ID_EXPR around
instead of calling make_typename_type, which didn't work, whereupon I
realized that since we don't want to perform name lookup if we've seen
the template keyword and the scope is dependent, we can adjust
parser->context->object_type and use the type of the object expression
as the scope, even if it's type-dependent. This should be in line with
[basic.lookup.classref]p4. If the postfix expression doesn't have a type,
use typeof to carry its type. This typeof will be processed in
tsubst/TYPENAME_TYPE.
PR c++/94799
* parser.c (cp_parser_postfix_dot_deref_expression): If we have
a type-dependent object of class type, stash it to
parser->context->object_type. If the postfix expression doesn't have
a type, use typeof.
(cp_parser_class_name): Consider object scope too.
(cp_parser_lookup_name): Remove code dealing with the case when
object_type is unknown_type_node.
* g++.dg/lookup/this1.C: Adjust dg-error.
* g++.dg/template/lookup12.C: New test.
* g++.dg/template/lookup13.C: New test.
* g++.dg/template/lookup14.C: New test.
* g++.dg/template/lookup15.C: New test.
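For reference, a reduced example of the construct in question might look
like the following. It is a sketch written for this description, not a copy
of the new tests; call_through is a hypothetical name.

```cpp
#include <cassert>

template <typename T>
struct A {
  int a() { return 42; }
};

template <typename T>
int call_through(A<T> *p)
{
  // p is type-dependent here. The template keyword marks A as a template
  // name, and A<T>::a is then resolved using the type of the object
  // expression as the scope.
  return p->template A<T>::a();
}
```

A compiler with the regression described above would be expected to reject
the member access inside call_through with a parse error.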
PR gcov-profile/93623
* libgcov-interface.c (__gcov_fork): Do not flush
and reset only in child process.
(__gcov_execl): Dump counters only and reset them
only if exec* fails.
(__gcov_execlp): Likewise.
(__gcov_execle): Likewise.
(__gcov_execv): Likewise.
(__gcov_execvp): Likewise.
(__gcov_execve): Likewise.
gcc/ChangeLog:
2020-04-17 Martin Liska <mliska@suse.cz>
PR gcov-profile/94636
* gcov.c (main): Print total lines summary at the end.
(generate_results): Expect file_name always being non-null.
Print newline after intermediate file is printed in order to align with
what we do for normal files.
libstdc++-v3/ChangeLog:
2020-02-04 Martin Liska <mliska@suse.cz>
PR c/92472
* include/parallel/multiway_merge.h:
Use const for _Compare template argument.
This rewrites hybrid SLP detection to be simpler and cope with
group size changes in the SLP graph. In particular detection
works starting from non-SLP stmts following use->def chains
rather than walking the SLP graph and following def->use chains.
It's all temporary of course since non-SLP and thus hybrid SLP
will go away.
2020-05-05 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (struct vdhs_data): New.
(vect_detect_hybrid_slp): New walker.
(vect_detect_hybrid_slp): Rewrite.
We now always vectorize two BBs, adjust the selector to also scan
for integer multiplication vectorization explicitly.
2020-05-05 Richard Biener <rguenther@suse.de>
PR testsuite/92177
* gcc.dg/vect/bb-slp-22.c: Adjust.
This fixes lack of an escape point of externally declared variables.
2020-05-05 Richard Biener <rguenther@suse.de>
PR ipa/94947
* tree-ssa-structalias.c (ipa_pta_execute): Use
varpool_node::externally_visible_p ().
(refered_from_nonlocal_var): Likewise.
* gcc.dg/torture/pr94947-1.c: New testcase.
* gcc.dg/torture/pr94947-2.c: Likewise.
The link phase is always partial (-r) for VxWorks in kernel mode, which
means that it uses incremental LTO linking by default (-flinker-output=rel).
But in this mode the LTO plugin outputs a warning if one of the object files
involved in the link does not contain LTO bytecode, before switching to
nolto-rel mode. We do not do repeated incremental linking for VxWorks so
silence the warning.
lto-plugin/
* lto-plugin.c: Document -linker-output-auto-nolto-rel option.
(linker_output_set): Change type to bool.
(linker_output_known): Likewise.
(linker_output_auto_nolto_rel): New variable.
(all_symbols_read_handler): Take it into account.
<LDPO_REL>: Do not issue the warning if it is set.
(process_option): Process -linker-output-auto-nolto-rel.
(cleanup_handler): Remove unused variable.
(onload) <LDPT_LINKER_OUTPUT>: Adjust to above type change.
gcc/
* gcc.c (LTO_PLUGIN_SPEC): Define if not already.
(LINK_PLUGIN_SPEC): Execute LTO_PLUGIN_SPEC.
* config/vxworks.h (LTO_PLUGIN_SPEC): Define.
The CONSTRUCTOR_NO_CLEARING flag was invented to avoid generating a memset
for CONSTRUCTORS that lack elements, but it turns out that the gimplifier
can generate a memcpy for them instead, which is worse performance-wise,
so this prevents it from doing that for them.
* gimplify.c (gimplify_init_constructor): Do not put the constructor
into static memory if it is not complete.
This fixes the case of not using the multithreaded model when
only conditionally storing to the destination. We cannot elide
the load in this case.
2020-05-05 Richard Biener <rguenther@suse.de>
PR tree-optimization/94949
* tree-ssa-loop-im.c (execute_sm): Check whether we use
the multithreaded model or always compute the stored value
before eliding a load.
* gcc.dg/torture/pr94949.c: New testcase.
The attached patch eliminates a redundant zero extend from the AArch64 backend. Given the following C code:
  unsigned long long foo(unsigned a)
  {
    return ~a;
  }
prior to this patch, AArch64 GCC at -O2 generates:
  foo:
          mvn     w0, w0
          uxtw    x0, w0
          ret
but the uxtw is redundant, since the mvn clears the upper half of the x0 register. After applying this patch, GCC at -O2 gives:
  foo:
          mvn     w0, w0
          ret
Testing:
Added regression test which passes after applying the change to aarch64.md.
Full bootstrap and regression on aarch64-linux with no additional failures.
* config/aarch64/aarch64.md (*one_cmpl_zero_extend): New.
* gcc.target/aarch64/mvn_zero_ext.c: New test.
The popcount* testcases show yet another creative way to write popcount,
but rather than adjusting the popcount matcher to deal with it, I think
we should just canonicalize X + (X << C) to X * (1 + (1 << C))
and (X << C1) + (X << C2) to X * ((1 << C1) + (1 << C2)), because for
multiplication we already have simplification rules that can handle nested
multiplication (X * CST1 * CST2), while for the shifts and adds we have
nothing like that. And the user could have written the multiplication anyway,
so if we don't emit the fastest or smallest code for the multiplication by
constant, we should improve that. At least on the testcases the
emitted code seems reasonable according to cost, except that perhaps we could
in some cases try to improve expansion of vector multiplication by
uniform constant.
2020-05-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94800
* match.pd (X + (X << C) to X * (1 + (1 << C)),
(X << C1) + (X << C2) to X * ((1 << C1) + (1 << C2))): New
canonicalizations.
* gcc.dg/tree-ssa/pr94800.c: New test.
* gcc.dg/tree-ssa/popcount5.c: New test.
* gcc.dg/tree-ssa/popcount5l.c: New test.
* gcc.dg/tree-ssa/popcount5ll.c: New test.
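The two canonicalizations rely on plain modular arithmetic for unsigned
types. A small sketch of the equivalences, with hypothetical function names
and 32-bit unsigned operands chosen only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// X + (X << C), as a user might write it.
constexpr std::uint32_t shift_add(std::uint32_t x, unsigned c)
{
  return x + (x << c);
}

// Canonical form: X * (1 + (1 << C)).
constexpr std::uint32_t shift_add_mul(std::uint32_t x, unsigned c)
{
  return x * (1u + (1u << c));
}

// (X << C1) + (X << C2).
constexpr std::uint32_t two_shifts(std::uint32_t x, unsigned c1, unsigned c2)
{
  return (x << c1) + (x << c2);
}

// Canonical form: X * ((1 << C1) + (1 << C2)).
constexpr std::uint32_t two_shifts_mul(std::uint32_t x, unsigned c1, unsigned c2)
{
  return x * ((1u << c1) + (1u << c2));
}

// Both sides wrap modulo 2^32, so they agree even when bits shift out.
static_assert(shift_add(0xdeadbeefu, 8) == shift_add_mul(0xdeadbeefu, 8),
              "shift-and-add equals multiplication by 1 + 2^C");
static_assert(two_shifts(0xdeadbeefu, 8, 16) == two_shifts_mul(0xdeadbeefu, 8, 16),
              "sum of two shifts equals multiplication by 2^C1 + 2^C2");
```

Shift counts are assumed to be smaller than the operand width, as they are
in the popcount idioms.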
This insn and split splits into HI->V?HImode broadcast for avx2 and later,
but either the operands need to be %xmm0-%xmm15 (i.e. VEX encoded insn), or
the insn needs both AVX512BW and AVX512VL.
Now, Yv constraint is v for AVX512VL and x otherwise, so for -mavx512vl -mno-avx512bw
we ICE if we end up with a %xmm16+ register from RA.
Yw constraint is v for AVX512VL and AVX512BW and nothing otherwise, so
in this pattern we actually need xYw.
2020-05-05 Jakub Jelinek <jakub@redhat.com>
PR target/94942
* config/i386/mmx.md (*vec_dupv4hi): Use xYw constraints instead of Yv.
* gcc.target/i386/pr94942.c: New test.
On x86 (the only target with umulv4_optab) one can use mull; seto to check
for overflow instead of performing wider multiplication and performing
comparison on the high bits.
2020-05-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94914
* match.pd ((((type)A * B) >> prec) != 0 to .MUL_OVERFLOW(A, B) != 0):
New simplification.
* gcc.target/i386/pr94914.c: New test.
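The source-level idiom and its overflow-checking counterpart can be sketched
like this. The function names are hypothetical, 32-bit operands are assumed,
and __builtin_mul_overflow is the GCC/Clang built-in, used here as the
closest user-visible equivalent of .MUL_OVERFLOW.

```cpp
#include <cassert>
#include <cstdint>

// Widening form: compute the 64-bit product and test the high 32 bits.
bool overflows_wide(std::uint32_t a, std::uint32_t b)
{
  return ((static_cast<std::uint64_t>(a) * b) >> 32) != 0;
}

// Built-in form: reports whether the 32-bit product wrapped.
bool overflows_builtin(std::uint32_t a, std::uint32_t b)
{
  std::uint32_t res;
  return __builtin_mul_overflow(a, b, &res);
}
```

Both functions are expected to agree for all inputs, which is what makes
the simplification valid.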
Pattern explosion and manual mode checks can be avoided by using
int_nonimmediate_operand special predicate.
While there, rewrite *x86_mov<SWI48:mode>cc_0_m1_neg_leu<SWI:mode>
to a combine pass splitter.
* config/i386/i386.md (*testqi_ext_3): Use
int_nonimmediate_operand instead of manual mode checks.
(*x86_mov<SWI48:mode>cc_0_m1_neg_leu<SWI:mode>):
Use int_nonimmediate_operand predicate. Rewrite
define_insn_and_split pattern to a combine pass splitter.
C++ makes mismatched prototype and implementation OK.
2020-05-05 Richard Biener <rguenther@suse.de>
* targhooks.h (default_add_stmt_cost): Add vec_info * parameter.
I've recently tested i386-pc-solaris2.11 bootstrap on Solaris 11/x86
with only the bundled tools (using /usr/gnu/bin/as from binutils 2.30 in
this case). It failed compiling libgo/runtime/proc.c, creating invalid
assembly:
proc.s: Assembler messages:
proc.s:2092: Error: junk at end of line, first unrecognized character is `*'
.globl __emutls_v.*runtime.g
and several more errors. This is completely unexpected since Solaris
does support TLS. It turned out that 32-bit TLS detection in
gcc/configure had failed:
configure:25145: checking assembler for thread-local storage support
configure:25158: /usr/gnu/bin/as --fatal-warnings -o conftest.o conftest.s >&5
conftest.s: Assembler messages:
conftest.s:6: Error: relocated field and relocation type differ in signedness
conftest.s:7: Error: @TLSLDM reloc is not supported with 64-bit output format
conftest.s:7: Error: junk `@tlsldm' after expression
which isn't unexpected given that the bundled gas has been configured
for x86_64-pc-solaris2.11, i.e. 64-bit-default.
This is easily fixed by explicitly passing --32 for the 32-bit case,
matching what is done for the 64-bit test.
Tested on i386-pc-solaris2.11 with 32-bit-default and 64-bit-default gas
as well as with /usr/bin/as, always correctly detecting TLS support.
* configure.ac <i[34567]86-*-*>: Add --32 to tls_as_opt on Solaris.
* configure: Regenerate.
Soonish we'll get SLP nodes which have no corresponding scalar
stmt and thus no stmt_vec_info and thus no way to get back to
the associated vec_info. This patch makes the vec_info available
as part of the APIs instead of putting that back-pointer into
the leaf data structures.
2020-05-05 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::vinfo): Remove.
(STMT_VINFO_LOOP_VINFO): Likewise.
(STMT_VINFO_BB_VINFO): Likewise.
* tree-vect-data-refs.c: Adjust for the above, adding vec_info *
parameters and adjusting calls.
* tree-vect-loop-manip.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-patterns.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vect-stmts.c: Likewise.
* tree-vectorizer.c: Likewise.
* target.def (add_stmt_cost): Add vec_info * parameter.
* target.h (stmt_in_inner_loop_p): Likewise.
* targhooks.c (default_add_stmt_cost): Adjust.
* doc/tm.texi: Re-generate.
* config/aarch64/aarch64.c (aarch64_extending_load_p): Add
vec_info * parameter and adjust.
(aarch64_sve_adjust_stmt_cost): Likewise.
(aarch64_add_stmt_cost): Likewise.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
As mentioned in the previous PR94460 patch, the RTL patterns look too
large/complicated; we can simplify them by just performing two 2-arg
permutations to move the arguments into the right spots and then just
doing the plus/minus (or signed saturation version thereof).
2020-05-05 Jakub Jelinek <jakub@redhat.com>
PR target/94460
* config/i386/sse.md (avx2_ph<plusminus_mnemonic>wv16hi3,
ssse3_ph<plusminus_mnemonic>wv8hi3, ssse3_ph<plusminus_mnemonic>wv4hi3,
avx2_ph<plusminus_mnemonic>dv8si3, ssse3_ph<plusminus_mnemonic>dv4si3,
ssse3_ph<plusminus_mnemonic>dv2si3): Simplify RTL patterns.
When folding a CALL_EXPR, we can avoid copying it until folding changes
one of its arguments. And when folding a TREE_VEC, we can avoid using
an intermediate releasing_vec by copying the TREE_VEC as soon as folding
changes one of its arguments, like we do in the CALL_EXPR case.
Incidentally, the CALL_EXPR change also fixes the testcase in PR94038.
The reason is that the call to maybe_constant_value from cp_fold on
the call 'bar<int>()' now reuses the result of the earlier call to
maybe_constant_value from fold_for_warn, via the cv_cache. This earlier
call passes uid_sensitive=true, whereas the call from cp_fold passes
uid_sensitive=false, and so by reusing the cached result of the earlier
call we now avoid instantiating bar<int> at all.
gcc/cp/ChangeLog:
PR c++/94038
* cp-gimplify.c (cp_fold) <case CALL_EXPR>: Move some variable
declarations closer to their uses. Copy the CALL_EXPR only
when one of its arguments has changed.
<case TREE_VEC>: Instead of first collecting the folded
arguments into a releasing_vec, just make a copy of the TREE_VEC
as soon as folding changes one of its arguments.
gcc/testsuite/ChangeLog:
PR c++/94038
* g++.dg/warn/pr94038.C: New test.
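The copy-on-change strategy used for both cases can be sketched in isolation.
This is a generic illustration rather than the cp_fold code: fold_one and
fold_all are hypothetical, and a shared vector of ints stands in for the
CALL_EXPR or TREE_VEC operands.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-element fold: negative values fold to 0.
int fold_one(int v) { return v < 0 ? 0 : v; }

// Fold every element, but copy the container only once folding actually
// changes an element. If nothing changes, the original is returned.
std::shared_ptr<const std::vector<int>>
fold_all(const std::shared_ptr<const std::vector<int>> &orig)
{
  std::shared_ptr<std::vector<int>> copy;   // created lazily
  for (std::size_t i = 0; i < orig->size(); ++i) {
    int folded = fold_one((*orig)[i]);
    if (folded != (*orig)[i]) {
      if (!copy)
        copy = std::make_shared<std::vector<int>>(*orig);
      (*copy)[i] = folded;
    }
  }
  if (copy)
    return copy;
  return orig;
}
```

Returning the original object when nothing changed is what allows later
lookups keyed on that object to hit a cache, along the lines of the
cv_cache reuse described above.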
This should return void according to the Itanium C++ ABI.
2020-05-04 Fangrui Song <maskray@google.com>
* libsupc++/cxxabi.h (__cxa_finalize): Fix return type.
The previous URL to an entry in the wayback machine now redirects to a
page saying "SGI.com Tech Archive Resources now retired" so use an older
entry from the archive.
* doc/xml/faq.xml: Use working link for SGI STL FAQ.
* doc/html/*: Regenerate.
Calculating the size of a chunk being returned to the upstream allocator
was done with a 32-bit type, so it wrapped if the chunk was 4GB or
larger.
I don't know how to test this without allocating 4GB, so there's no test
in the testsuite. It has been tested manually with allocation sizes and
alignments exceeding 4GB.
PR libstdc++/94906
* src/c++17/memory_resource.cc
(monotonic_buffer_resource::_Chunk::release): Use size_t for shift
operands.
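The underlying issue is that the type of the left operand decides the width
of a shift. A minimal sketch, assuming the chunk stores the base-2 logarithm
of its size; chunk_bytes is a hypothetical helper, and std::uint64_t is used
instead of size_t so that the example behaves the same on 32-bit hosts:

```cpp
#include <cassert>
#include <cstdint>

// The byte count is recovered from log2_size with a shift. With a 32-bit
// left operand such as 1u the result cannot represent 4GB or more, so the
// operand is widened before shifting.
constexpr std::uint64_t chunk_bytes(unsigned log2_size)
{
  return std::uint64_t{1} << log2_size;
}

static_assert(chunk_bytes(32) == 4294967296ull, "4GB chunk is representable");
```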
This fixes two compilation errors preventing bootstrap with Ada
on x86_64-pc-cygwin.
2020-05-04 Mikael Pettersson <mikpelinux@gmail.com>
PR bootstrap/94918
* mingw32.h: Prevent windows.h from including emmintrin.h on Cygwin64.
* s-oscons-tmplt.c (Serial_Port_Descriptor): Use System.Win32.HANDLE
also on Cygwin.
When long doubles are 64 bit, the AIX C library overrides the definitions
but GCC builtins point to 128 bit names. This patch overrides the
builtins for fmodl, frexpl, ldexpl and modfl to refer to the 64 bit symbols.
2020-05-04 Clement Chigot <clement.chigot@atos.net>
David Edelsohn <dje.gcc@gmail.com>
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Override explicit
for fmodl, frexpl, ldexpl and modfl builtins.
create_output_operand coerces an output operand to the insn's
predicates, using a suggested rtx location if convenient.
But if that rtx location is actually required rather than
optional, the builder of the insn has to emit a move afterwards.
(We could instead add a new interface that does this automatically,
but that's future work.)
This PR shows that we were failing to emit the move for some of the
vector load internal functions. I think there are other routines in
internal-fn.c that potentially have the same problem, but this patch is
supposed to be a conservative subset suitable for backporting to GCC 10.
2020-05-04 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR middle-end/94941
* internal-fn.c (expand_load_lanes_optab_fn): Emit a move if the
chosen lhs is different from the gcall lhs.
(expand_mask_load_optab_fn): Likewise.
(expand_gather_load_optab_fn): Likewise.
gcc/testsuite/
PR middle-end/94941
* gcc.target/aarch64/sve/acle/general/unoptimized_1.c: New test.
This corrects an oversight: the coro.gro object is a
compiler-generated entity and should be marked as
artificial and ignored.
gcc/cp/ChangeLog:
2020-05-04 Iain Sandoe <iain@sandoe.co.uk>
* coroutines.cc (morph_fn_to_coro): Mark the coro.gro variable
as artificial and ignored.