NetBSD does not include the null terminator when in its reported socket
length. Port the upstream bugfix for the issue (#6627).
This was likely missed during the usual upstream merge because the gc
and gccgo socket implementations have diverged quite a bit.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/261741
In g:70cdb21e579191fe9f0f1d45e328908e59c0179e, DECL/global variable has handled
misaligned stores, but it didn't handle PARALLEL values, and I refer the
other part of this function, I found the PARALLEL need handled by
emit_group_* functions, so I add a check, and using emit_group_store if
storing a PARALLEL value, also checked this change didn't break the
testcase(gcc.target/arm/unaligned-argument-3.c) added by the orginal changes.
For riscv64 target, struct S {int a; double b;} will pack into a parallel
value to return and it has TImode when misaligned access is supported,
however TImode required 16-byte align, but it only 8-byte align, so it go to
the misaligned stores handling, then it will try to generate move
instruction from a PARALLEL value.
Tested on following target without introduced new reguression:
- riscv32/riscv64 elf
- x86_64-linux
- arm-eabi
v2 changes:
- Use maybe_emit_group_store instead of emit_group_store.
- Remove push_temp_slots/pop_temp_slots, emit_group_store only require
stack temp slot when dst is CONCAT or PARALLEL, however
maybe_emit_group_store will always use REG for dst if needed.
gcc/ChangeLog:
PR target/96759
* expr.c (expand_assignment): Handle misaligned stores with PARALLEL
value.
gcc/testsuite/ChangeLog:
PR target/96759
* g++.target/riscv/pr96759.C: New.
* gcc.target/riscv/pr96759.c: New.
On AIX, duplication of type descriptors can occur if one is
declared in the libgo and one in the Go program being compiled.
The AIX linker isn't able to merge them together as Linux one does.
One solution is to always load libgo first but that needs a huge mechanism in
gcc core. Thus, this patch ensures that the duplication isn't visible
for the end user.
In reflect and internal/reflectlite, the comparison of rtypes is made on their
name and not only on their addresses.
In reflect, toType() function is using a canonicalization map to force rtypes
having the same rtype.String() to return the same Type. This can't be made in
internal/reflectlite as it needs sync package. But, for now, it doesn't matter
as internal/reflectlite is not widely used.
Fixesgolang/go#39276
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/260158
This patch implements the omp_get_supported_active_levels runtime routine
from the OpenMP 5.0 specification, which returns the maximum number of
active nested parallel regions supported by this implementation. The
current maximum (set using the omp_set_max_active_levels routine or the
OMP_MAX_ACTIVE_LEVELS environment variable) cannot exceed this number.
2020-10-13 Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* env.c (gomp_max_active_levels_var): Initialize to
gomp_supported_active_levels.
(initialize_env): Limit gomp_max_active_levels_var to be at most
equal to gomp_supported_active_levels.
* fortran.c (omp_get_supported_active_levels): Add ialias_redirect.
(omp_get_supported_active_levels_): New.
* icv.c (omp_set_max_active_levels): Limit gomp_max_active_levels_var
to at most equal to gomp_supported_active_levels.
(omp_get_supported_active_levels): New.
* libgomp.h (gomp_supported_active_levels): New.
* libgomp.map (OMP_5.0.1): Add omp_get_supported_active_levels and
omp_get_supported_active_levels_.
* libgomp.texi (omp_get_supported_active_levels): New.
(omp_set_max_active_levels): Update. Add reference to
omp_get_supported_active_levels.
* omp.h.in (omp_get_supported_active_levels): New.
* omp_lib.f90.in (omp_get_supported_active_levels): New.
* omp_lib.h.in (omp_get_supported_active_levels): New.
* testsuite/libgomp.c/lib-2.c (main): Check omp_get_max_active_levels
against omp_get_supported_active_levels.
* testsuite/libgomp.fortran/lib4.f90 (lib4): Likewise.
The following testcases are miscompiled (the first one since my improvements
to rotate discovery on GIMPLE, the other one for many years) because
combiner optimizes nested ROTATEs with narrowing SUBREG in between (i.e.
the outer rotate is performed in shorter precision than the inner one) to
just one ROTATE of the rotated constant. While that (under certain
conditions) can work for shifts, it can't work for rotates where we can only
do that with rotates of the same precision.
2020-10-13 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/97386
* combine.c (simplify_shift_const_1): Don't optimize nested ROTATEs if
they have different modes.
* gcc.c-torture/execute/pr97386-1.c: New test.
* gcc.c-torture/execute/pr97386-2.c: New test.
There's a read of a freed block while accessing the default_slot in
calc_switch_ranges.
default_slot->intersect (def_range);
It seems the default_slot got swiped from under us, and the valgrind
dump indicates the free came from the get_or_insert in the same
function:
irange *&slot = m_edge_table->get_or_insert (e, &existed);
So it looks like the get_or_insert is actually freeing the value of
the previously allocated default_slot. Looking down the chain
from get_or_insert, we see it calls hash_table<>::expand, which
actually does a free while doing a resize of sorts:
if (!m_ggc)
Allocator <value_type> ::data_free (oentries);
else
ggc_free (oentries);
This patch avoids keeping a pointer to the default_slot across multiple
calls to get_or_insert in the loop.
gcc/ChangeLog:
PR tree-optimization/97379
* gimple-range-edge.cc (outgoing_range::calc_switch_ranges): Do
not save hash slot across calls to hash_table<>::get_or_insert.
Using -O2 made the tests subject to LDRD vs. LDM tuning.
The simplest fix seems to be to use -Os, so that LDM is
unequivocally a win.
gcc/testsuite/
* gcc.target/arm/stack-protector-5.c: Use -Os rather than -O2.
* gcc.target/arm/stack-protector-6.c: Likewise.
This patch adds a tuning structure for Neoverse N2 to allow for further
tuning.
For now it's just a deduplication of the Neoverse N1 struct that it was
reusing but with the SVE width set to 128.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
* config/aarch64/aarch64.c (neoversen2_tunings): Define.
* config/aarch64/aarch64-cores.def (neoverse-n2): Use it.
This makes the only consumer of STMT_VINFO_SAME_ALIGN_REFS, the
loop peeling for alignment code, use locally computed data and
then removes STMT_VINFO_SAME_ALIGN_REFS and its computation.
It also adjusts the auto_vec<> move CTOR/assignment so you
can write
auto_vec<..> foo = bar.copy ();
and have foo own the generated copy.
2020-10-13 Richard Biener <rguenther@suse.de>
PR tree-optimization/97382
* tree-vectorizer.h (_stmt_vec_info::same_align_refs): Remove.
(STMT_VINFO_SAME_ALIGN_REFS): Likewise.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Do not
allocate STMT_VINFO_SAME_ALIGN_REFS.
(vec_info::free_stmt_vec_info): Do not release
STMT_VINFO_SAME_ALIGN_REFS.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences):
Do not compute self and read-read dependences.
(vect_dr_aligned_if_related_peeled_dr_is): New helper.
(vect_dr_aligned_if_peeled_dr_is): Likewise.
(vect_update_misalignment_for_peel): Use it instead of
iterating over STMT_VINFO_SAME_ALIGN_REFS.
(dr_align_group_sort_cmp): New function.
(vect_enhance_data_refs_alignment): Count the number of
same aligned refs here and elide uses of STMT_VINFO_SAME_ALIGN_REFS.
(vect_find_same_alignment_drs): Remove.
(vect_analyze_data_refs_alignment): Do not call it.
* vec.h (auto_vec<T, 0>::auto_vec): Adjust CTOR to take
a vec<>&&, assert it isn't using auto storage.
(auto_vec& operator=): Apply a similar change.
* gcc.dg/vect/no-vfa-vect-dv-2.c: Remove same align dump
scanning.
* gcc.dg/vect/vect-103.c: Likewise.
* gcc.dg/vect/vect-91.c: Likewise.
* gfortran.dg/vect/vect-4.f90: Likewise.
This propagates needed values from the point where number of iterations
is calculated on composite loops to the places where that information
is needed to use the more efficient square root discovery to compute
the starting iterator values from the logical iteration number.
2020-10-13 Jakub Jelinek <jakub@redhat.com>
* omp-low.c (add_taskreg_looptemp_clauses): For triangular loops
with non-constant number of iterations add another 4 _looptemp_
clauses before the (optional) one for lastprivate.
(lower_omp_for_lastprivate): Skip those clauses when looking for
the lastprivate clause.
(lower_omp_for): For triangular loops with non-constant number of
iterations add another 4 _looptemp_ clauses.
* omp-expand.c (expand_omp_for_init_counts): For triangular loops
with non-constant number of iterations set counts[0],
fd->first_inner_iterations, fd->factor and fd->adjn1 from the newly
added _looptemp_ clauses.
(expand_omp_for_init_vars): Initialize the newly added _looptemp_
clauses.
(find_lastprivate_looptemp): New function.
(expand_omp_for_static_nochunk, expand_omp_for_static_chunk,
expand_omp_taskloop_for_outer): Use it instead of manually skipping
_looptemp_ clauses.
this patch fixes tramp3d ICE with PGO. It has turned out to be by a misupdate
in ignore_edge I introduced in previous patch that made us to not compute
SCCs correctly with -fno-lto.
While looking for problem I proofread the sources and also fortified the
srouces for situation where we insert a summary for no good reason and noticed
a problem that early ipa-modref disabled itself in some cases.
I also noticed that param_index is treamed as uhwi while it is signed (that
wastes file space).
Bootstrapping/regtesting x86_64-linux, will commit it tomorrow if that passes.
gcc/ChangeLog:
2020-10-13 Jan Hubicka <hubicka@ucw.cz>
PR ipa/97389
* ipa-modref.c (dump_lto_records): Fix formating of dump file.
(modref_summary::dump): Do not check loads to be non-null.
(modref_summary_lto::dump): Do not check loads to be non-null.
(merge_call_side_effects): Improve debug output.
(analyze_call): Crash when cur_summary->loads is NULL.
(analyze_function): Update.
(modref_summaries::insert): Insert only into summaries, not
optimization_summaries.
(modref_summaries::duplicate): Likewise; crash when load or sotres
are NULL.
(modref_summaries_lto::duplicate): Crash when loads or stores are NULL.
(write_modref_records): param_index is signed.
(read_modref_records): param_index is signed.
(modref_write): Crash when loads or stores are NULL.
(read_section): Compensate previous change.
(pass_modref::execute): Do not check optimization_summaries t be
non-NULL.
(ignore_edge): Fix.
(compute_parm_map): Fix formating.
(modref_propagate_in_scc): Do not expect loads/stores to be NULL.
The analyzer's initial worklist was only populated with non-static
functions in the TU (along with those that look promising for call
summaries). Hence some static functions that were never explicitly
called but could be called via function pointers were not being
analyzed.
This patch remedies this by ensuring that functions that escape as
function pointers get added to the worklist, if they haven't been
already. Another fix would be to simply analyze all functions that
we have a body for, but too much of the testsuite relies on static
test functions not being directly analyzed.
gcc/analyzer/ChangeLog:
PR analyzer/97258
* engine.cc (impl_region_model_context::on_escaped_function): New
vfunc.
(exploded_graph::add_function_entry): Use m_functions_with_enodes
to implement idempotency.
(add_any_callbacks): New.
(exploded_graph::build_initial_worklist): Use the above to find
callbacks that are reachable from global initializers.
(exploded_graph::on_escaped_function): New.
* exploded-graph.h
(impl_region_model_context::on_escaped_function): New decl.
(exploded_graph::on_escaped_function): New decl.
(exploded_graph::m_functions_with_enodes): New field.
* region-model-reachability.cc
(reachable_regions::reachable_regions): Replace "store" param with
"model" param; use it to initialize m_model.
(reachable_regions::add): When getting the svalue for the region,
call get_store_value on the model rather than using an initial
value.
(reachable_regions::mark_escaped_clusters): Add ctxt param and
use it to call on_escaped_function when a function_region escapes.
* region-model-reachability.h
(reachable_regions::reachable_regions): Replace "store" param with
"model" param.
(reachable_regions::mark_escaped_clusters): Add ctxt param.
(reachable_regions::m_model): New field.
* region-model.cc (region_model::handle_unrecognized_call): Update
for change in reachable_regions ctor.
(region_model::handle_unrecognized_call): Pass ctxt to
mark_escaped_clusters.
(region_model::get_reachable_svalues): Update for change in
reachable_regions ctor.
(region_model::get_initial_value_for_global): Read-only variables
keep their initial values.
* region-model.h (region_model_context::on_escaped_function): New
vfunc.
(noop_region_model_context::on_escaped_function): New.
gcc/testsuite/ChangeLog:
PR analyzer/97258
* gcc.dg/analyzer/callbacks-1.c: New test.
* gcc.dg/analyzer/callbacks-2.c: New test.
* gcc.dg/analyzer/callbacks-3.c: New test.
Martin Liška reported warnings about type mismatches in the cases in
the recently-introduced mathfn_built_in_type. This patch adjusts the
macros to use the combined_fn enumerators rather than the
(currently same-numbered) built_in_function ones.
for gcc/ChangeLog
* builtins.c (mathfn_built_in_type): Use CFN_ enumerators.
Enable the sincos optimization within callers of these
(single-argument) elementary functions.
for gcc/ada/ChangeLog
* libgnat/a-ngelfu.ads (Sin, Cos): Make the single-argument
functions inline.
Correct MIPS I assembly build errors in switchcontext.S:
.../libphobos/libdruntime/config/mips/switchcontext.S: Assembler messages:
.../libphobos/libdruntime/config/mips/switchcontext.S:50: Error: opcode not supported on this processor: mips1 (mips1) `sdc1 $f20,(0*8-((6*8+4+(-6*8+4&7))))($sp)'
etc., due to the use of the MIPS II LDC1 and SDC1 hardware instructions
for FP register load and store operations. Instead use the L.D and S.D
generic assembly instructions, which are strict aliases for the LDC1 and
SDC1 instructions respectively and produce identical machine code where
the assembly for the MIPS II or a higher ISA has been requested, however
they become assembly macros and expand to compatible sequences of LWC1
and SWC1 hardware instructions where the assembly for the MIPS I ISA is
in effect.
libphobos/
* libdruntime/config/mips/switchcontext.S [__mips_hard_float]:
Use L.D and S.D generic assembly instructions rather than LDC1
and SDC1 MIPS II hardware instructions.
We were ignoring the return value if op2 returned false and getting garbage ranges propagated.
gcc/ChangeLog:
PR tree-optimization/97381
* gimple-range-gori.cc (gori_compute::compute_operand2_range): If a range cannot be
calculated through operand 2, return false.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97381.c: New test.
libstdc++-v3/ChangeLog:
* include/std/ranges (take_while_view::begin): Constrain the
const overload further as per LWG 3450.
(take_while_view::end): Likewise.
* testsuite/std/ranges/adaptors/take_while.cc: Add test for LWG
3450.
Now that the frontend bug PR96805 is fixed, we can cleanly apply the
proposed resolution for this issue.
This slightly deviates from the proposed resolution by declaring _CI a
member of take_view instead of take_view::_Sentinel, since it doesn't
depend on anything within _Sentinel anymore.
libstdc++-v3/ChangeLog:
PR libstdc++/95322
* include/std/ranges (take_view::_CI): Define this alias
template as per LWG 3449 and remove ...
(take_view::_Sentinel::_CI): ... this type alias.
(take_view::_Sentinel::operator==): Adjust use of _CI
accordingly. Define a second overload that accepts an iterator
of the opposite constness as per LWG 3449.
(take_while_view::_Sentinel::operator==): Likewise.
* testsuite/std/ranges/adaptors/95322.cc: Add tests for LWG 3449.
The doxygen comments for these algos all incorrectly claim to use
(first - last) as the difference from the start of the output range to
the return value. As reported on the mailing list by Johannes Choo, it
should be (last - first).
libstdc++-v3/ChangeLog:
* include/bits/stl_algobase.h (copy, move, copy_backward)
(move_backward): Fix documentation for returned iterator.
gcc/ChangeLog:
PR tree-optimization/97378
* range-op.cc (operator_trunc_mod::wi_fold): Return VARYING for mod by zero.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97378.c: New test.
This patch adds two new warnings:
-Wanalyzer-write-to-const
-Wanalyzer-write-to-string-literal
for code paths where the analyzer detects a write to a constant region.
As noted in the documentation part of the patch, the analyzer doesn't
prioritize detection of such writes, in that the state-merging logic
will blithely lose the distinction between const and non-const regions.
Hence false negatives are likely to arise due to state-merging.
However, if the analyzer does happen to spot such a write, it seems worth
reporting, hence this patch.
gcc/analyzer/ChangeLog:
* analyzer.opt (Wanalyzer-write-to-const): New.
(Wanalyzer-write-to-string-literal): New.
* region-model-impl-calls.cc (region_model::impl_call_memcpy):
Call check_for_writable_region.
(region_model::impl_call_memset): Likewise.
(region_model::impl_call_strcpy): Likewise.
* region-model.cc (class write_to_const_diagnostic): New.
(class write_to_string_literal_diagnostic): New.
(region_model::check_for_writable_region): New.
(region_model::set_value): Call check_for_writable_region.
* region-model.h (region_model::check_for_writable_region): New
decl.
gcc/ChangeLog:
* doc/invoke.texi: Document -Wanalyzer-write-to-const and
-Wanalyzer-write-to-string-literal.
gcc/testsuite/ChangeLog:
PR c/83347
PR middle-end/90404
PR analyzer/95007
* gcc.dg/analyzer/write-to-const-1.c: New test.
* gcc.dg/analyzer/write-to-string-literal-1.c: New test.
gcc/cp/ChangeLog:
PR c++/97201
* error.c (dump_type_suffix): Handle both the C and C++ forms of
zero-length arrays.
libstdc++-v3/ChangeLog:
PR c++/97201
* libsupc++/new (operator new): Add attribute alloc_size and malloc.
gcc/testsuite/ChangeLog:
PR c++/97201
* g++.dg/warn/Wplacement-new-size-8.C: Adjust expected message.
* g++.dg/warn/Warray-bounds-10.C: New test.
* g++.dg/warn/Warray-bounds-11.C: New test.
* g++.dg/warn/Warray-bounds-12.C: New test.
* g++.dg/warn/Warray-bounds-13.C: New test.
Also resolves:
PR middle-end/97342 - bogus -Wstringop-overflow with nonzero signed and unsigned offsets
PR middle-end/97023 - missing warning on buffer overflow in chained mempcpy
PR middle-end/96384 - bogus -Wstringop-overflow= storing into multidimensional array with index in range
gcc/ChangeLog:
PR middle-end/97342
PR middle-end/97023
PR middle-end/96384
* builtins.c (access_ref::access_ref): Initialize new member. Use
new enum.
(access_ref::size_remaining): Define new member function.
(inform_access): Handle expressions referencing objects.
(gimple_call_alloc_size): Call get_size_range instead of get_range.
(gimple_call_return_array): New function.
(get_range): Rename...
(get_offset_range): ...to this. Improve detection of ranges from
types of expressions.
(gimple_call_return_array): Adjust calls to get_range per above.
(compute_objsize): Same. Set maximum size or offset instead of
failing for unknown objects and handle more kinds of expressions.
(compute_objsize): Call access_ref::size_remaining.
(compute_objsize): Have transitional wrapper fail for pointers
into unknown objects.
(expand_builtin_strncmp): Call access_ref::size_remaining and
handle new cases.
* builtins.h (access_ref::size_remaining): Declare new member function.
(access_ref::set_max_size_range): Define new member function.
(access_ref::add_ofset, access_ref::add_max_ofset): Same.
(access_ref::add_base0): New data member.
* calls.c (get_size_range): Change argument type. Handle new
condition.
* calls.h (get_size_range): Adjust signature.
(enum size_range_flags): Define new type.
* gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref): Correct
argument to get_size_range.
* tree-ssa-strlen.c (get_range): Handle anti-ranges.
(maybe_warn_overflow): Check DECL_P before assuming it's one.
gcc/testsuite/ChangeLog:
PR middle-end/97342
PR middle-end/97023
PR middle-end/96384
* c-c++-common/Wrestrict.c: Adjust comment.
* gcc.dg/Wstringop-overflow-34.c: Remove xfail.
* gcc.dg/Wstringop-overflow-43.c: Remove xfails. Adjust regex patterns.
* gcc.dg/pr51683.c: Prune out expected warning.
* gcc.target/i386/pr60693.c: Same.
* g++.dg/warn/Wplacement-new-size-8.C: New test.
* gcc.dg/Wstringop-overflow-41.c: New test.
* gcc.dg/Wstringop-overflow-44.s: New test.
* gcc.dg/Wstringop-overflow-45.c: New test.
* gcc.dg/Wstringop-overflow-46.c: New test.
* gcc.dg/Wstringop-overflow-47.c: New test.
* gcc.dg/Wstringop-overflow-49.c: New test.
* gcc.dg/Wstringop-overflow-50.c: New test.
* gcc.dg/Wstringop-overflow-51.c: New test.
* gcc.dg/Wstringop-overflow-52.c: New test.
* gcc.dg/Wstringop-overflow-53.c: New test.
* gcc.dg/Wstringop-overflow-54.c: New test.
* gcc.dg/Wstringop-overflow-55.c: New test.
* gcc.dg/Wstringop-overread-5.c: New test.
Resolves:
PR c++/96511 - Incorrect -Wplacement-new on POINTER_PLUS into an array with 4-byte elements
PR middle-end/96384 - bogus -Wstringop-overflow= storing into multidimensional array with index in range
gcc/ChangeLog:
PR c++/96511
PR middle-end/96384
* builtins.c (get_range): Return full range of type when neither
value nor its range is available. Fail for ranges inverted due
to the signedness of offsets.
(compute_objsize): Handle more special array members. Handle
POINTER_PLUS_EXPR and VIEW_CONVERT_EXPR that come up in front end
code.
(access_ref::offset_bounded): Define new member function.
* builtins.h (access_ref::eval): New data member.
(access_ref::offset_bounded): New member function.
(access_ref::offset_zero): New member function.
(compute_objsize): Declare a new overload.
* gimple-array-bounds.cc (array_bounds_checker::check_array_ref): Use
enum special_array_member.
* tree.c (component_ref_size): Use special_array_member.
* tree.h (special_array_member): Define a new type.
(component_ref_size): Change signature.
gcc/cp/ChangeLog:
PR c++/96511
PR middle-end/96384
* init.c (warn_placement_new_too_small): Call builtin_objsize instead
of duplicating what it does.
gcc/testsuite/ChangeLog:
PR c++/96511
PR middle-end/96384
* g++.dg/init/strlen.C: Add expected warning.
* g++.dg/warn/Wplacement-new-size-1.C: Relax warnings.
* g++.dg/warn/Wplacement-new-size-2.C: Same.
* g++.dg/warn/Wplacement-new-size-6.C: Same.
* gcc.dg/Warray-bounds-58.c: Adjust
* gcc.dg/Wstringop-overflow-37.c: Same.
* g++.dg/warn/Wplacement-new-size-7.C: New test.
this is largely mechanical patch fixing some suboptimal datastructure decision
in modref. It records three different things
1) optimization_summaries that are used by tree-ssa-alias to disambiguate
(computed by local passes or ipa execute)
2) summaries produced by local analysis and used by the ipa execute
3) summaries_lto produced by analysis when streaming is expected,
streamed, used by ipa execute
All three items are stored in "summaries" datastructure where 1 dn 2
are mixed and differentiated by "finished" flags.
This use extra memory and also makes it impossible to use modref while producing
other IPA summaries (by ipa-prop and ipa-devirt). This patch separates the
summaries into three special purpose datastructures.
There is one fix to propagation in ipa_merge_modref_summary_after_inlining
where check to ignore stores was placed incorrectly. This seems to lead
to increased clobber disambiguations:
Alias oracle query stats:
refs_may_alias_p: 64267006 disambiguations, 74475486 queries
ref_maybe_used_by_call_p: 142119 disambiguations, 65169365 queries
call_may_clobber_ref_p: 22975 disambiguations, 28762 queries
nonoverlapping_component_refs_p: 0 disambiguations, 36803 queries
nonoverlapping_refs_since_match_p: 19401 disambiguations, 55550 must overlaps, 75722 queries
aliasing_component_refs_p: 54714 disambiguations, 759027 queries
TBAA oracle: 23636760 disambiguations 56001742 queries
16112157 are in alias set 0
10614737 queries asked about the same object
125 queries asked about the same alias set
0 access volatile
3994423 are dependent in the DAG
1643540 are aritificially in conflict with void *
Modref stats:
modref use: 11667 disambiguations, 40207 queries
modref clobber: 1508990 disambiguations, 1829697 queries
3916688 tbaa queries (2.140621 per modref query)
623504 base compares (0.340769 per modref query)
PTA query stats:
pt_solution_includes: 967354 disambiguations, 13605701 queries
pt_solutions_intersect: 1032982 disambiguations, 13121107 queries
Bootstrapped/regtested x86_64-linux. I plan to commit it tomorrow if there are
no complains.
gcc/ChangeLog:
2020-10-11 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (modref_summaries): Remove field IPA.
(class modref_summary_lto): New global variable.
(class modref_summaries_lto): New.
(modref_summary::modref_summary): Remove loads_lto and stores_lto.
(modref_summary::~modref_summary): Remove loads_lto and stores_lto.
(modref_summary::useful_p): Do not use lto_useful.
(modref_records_lto): New typedef.
(struct modref_summary_lto): New type.
(modref_summary_lto::modref_summary_lto): New member function.
(modref_summary_lto::~modref_summary_lto): New member function.
(modref_summary_lto::useful_p): New member function.
(modref_summary::dump): Do not handle lto.
(modref_summary_lto::dump): New member function.
(get_modref_function_summary): Use optimization_summary.
(merge_call_side_effects): Use optimization_summary.
(analyze_call): Use optimization_summary.
(struct summary_ptrs): New struture.
(analyze_load): Update to handle separate lto and non-lto summaries.
(analyze_store): Likewise.
(analyze_stmt): Likewise.
(remove_summary): Break out from ...
(analyze_function): ... here; update to handle seprated summaries.
(modref_summaries::insert): Do not handle lto summary.
(modref_summaries_lto::insert): New member function.
(modref_summaries::duplicate): Do not handle lto summary.
(modref_summaries_lto::duplicate): New member function.
(read_modref_records): Expect nolto_ret or lto_ret to be NULL>
(modref_write): Write lto summary.
(read_section): Handle separated summaries.
(modref_read): Initialize separated summaries.
(modref_transform): Handle separated summaries.
(pass_modref::execute): Turn summary to optimization_summary; handle
separate summaries.
(ignore_edge): Handle separate summaries.
(ipa_merge_modref_summary_after_inlining): Likewise.
(collapse_loads): Likewise.
(modref_propagate_in_scc): Likewise.
(pass_ipa_modref::execute): Likewise.
(ipa_modref_c_finalize): Likewise.
* ipa-modref.h (modref_records_lto): Remove typedef.
(struct modref_summary): Remove stores_lto, loads_lto and finished
fields; remove lto_useful_p member function.
This introduces a permute optimization phase for SLP which is
intended to cover the existing permute eliding for SLP reductions
plus handling commonizing the easy cases.
It currently uses graphds to compute a postorder on the reverse
SLP graph and it handles all cases vect_attempt_slp_rearrange_stmts
did (hopefully - I've adjusted most testcases that triggered it
a few days ago). It restricts itself to move around bijective
permutations to simplify things for now, mainly around constant nodes.
As a prerequesite it makes the SLP graph cyclic (ugh). It looks
like it would pay off to compute a PRE/POST order visit array
once and elide all the recursive SLP graph walks and their
visited hash-set. At least for the time where we do not change
the SLP graph during such walk.
I do not like using graphds too much but at least I don't have to
re-implement yet another RPO walk, so maybe it isn't too bad.
It now computes permute placement during iteration and thus should
get cycles more obviously correct.
Richard.
2020-10-06 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_slp_analyze_instance_dependence):
Use SLP_TREE_REPRESENTATIVE.
* tree-vectorizer.h (_slp_tree::vertex): New member used
for graphds interfacing.
* tree-vect-slp.c (vect_build_slp_tree_2): Allocate space
for PHI SLP children.
(vect_analyze_slp_backedges): New function filling in SLP
node children for PHIs that correspond to backedge values.
(vect_analyze_slp): Call vect_analyze_slp_backedges for the
graph.
(vect_slp_analyze_node_operations): Deal with a cyclic graph.
(vect_schedule_slp_instance): Likewise.
(vect_schedule_slp): Likewise.
(slp_copy_subtree): Remove.
(vect_slp_rearrange_stmts): Likewise.
(vect_attempt_slp_rearrange_stmts): Likewise.
(vect_slp_build_vertices): New functions.
(vect_slp_permute): Likewise.
(vect_slp_perms_eq): Likewise.
(vect_optimize_slp): Remove special code to elide
permutations with SLP reductions. Implement generic
permute optimization.
* gcc.dg/vect/bb-slp-50.c: New testcase.
* gcc.dg/vect/bb-slp-51.c: Likewise.
gcc-4.8.5 does not accept case clauses with non-literal type, which
happens for "QImode" as it expands to (scalar_int_mode
((scalar_int_mode::from_int) E_QImode)).
Use E_QImode instead in arm_preferred_simd_mode, to fix the
build. Same for HImode, SImode, HFmode and SFmode as introduced by a
recent patch.
2020-10-12 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm.c (arm_preferred_simd_mode): Use E_FOOmode
instead of FOOmode.
The patch fixes the following 2 issues:
.MASK_STORE_LANES (&a, 4B, max_mask_34, vect_array.12);
here we miss to return the last argument as stored value.
ivtmp_32 = ivtmp_31 + POLY_INT_CST [4, 4];
here we miss a bail out in vect_recog_over_widening_pattern.
gcc/ChangeLog:
PR tree-optimization/97079
* internal-fn.c (internal_fn_stored_value_index): Handle also
.MASK_STORE_LANES.
* tree-vect-patterns.c (vect_recog_over_widening_pattern): Bail
out for unsupported TREE_TYPE.
gcc/testsuite/ChangeLog:
PR tree-optimization/97079
* gcc.target/aarch64/sve/pr97079.c: New test.
When a VEC_PERM SLP node just permutes existing lanes this confuses
the SLP subgraph detection where I tried to elide a node-based
visited hash-map in a way that doesn't work. Fixed by adding such.
2020-10-12 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_bb_partition_graph_r): Use visited
hash-map.
(vect_bb_partition_graph): Likewise.
When processing the cond expression, vect_recog_mask_conversion_pattern
doesn't consider the situation that two operands of rhs1 are different
vectypes, leading to a vect ICE. This patch adds the identification and
handling of the situation to fix the problem.
gcc/ChangeLog:
PR target/96757
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Add
the identification and handling of the dropped situation in the
cond expression processing phase.
gcc/testsuite/ChangeLog:
PR target/96757
* gcc.target/aarch64/pr96757.c: New test.
This patch fixes the PR by adjusting the input types of the intrinsic
prototypes to the ones mandated by ACLE
Turns out the tests in the testsuite were already using the correct
ones, but implicit conversions hid the bug...
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/97349
* config/aarch64/arm_neon.h (vdupq_n_p8, vdupq_n_p16,
vdupq_n_p64, vdupq_n_s8, vdupq_n_s16, vdupq_n_u8, vdupq_n_u16):
Fix argument type.
gcc/testsuite/
PR target/97349
* gcc.target/aarch64/simd/pr97349.c: New test.
The vector copysign pattern incorrectly assumes that vector
if_then_else operates on bits, not on elements. This can theoretically
mislead the optimizers. Fix by changing it to use bitwise operations,
like commit 2930bb321794 ("PR94613: Fix vec_sel builtin for IBM Z") did
for vec_sel builtin.
gcc/ChangeLog:
2020-10-07 Ilya Leoshkevich <iii@linux.ibm.com>
* config/s390/s390-protos.h (s390_build_signbit_mask): New
function.
* config/s390/s390.c (s390_contiguous_bitmask_vector_p):
Bitcast the argument to an integral mode.
(s390_expand_vec_init): Do not call
s390_contiguous_bitmask_vector_p with a scalar argument.
(s390_build_signbit_mask): New function.
* config/s390/vector.md (copysign<mode>3): Use bitwise
operations.
Some of the larger tests in the phobos testsuite on occasion trigger the
default timeout limit. Increasing the limit to 10 minutes should give
compilation enough time to finish.
libphobos/ChangeLog:
* testsuite/lib/libphobos.exp: Define tool_timeout, set to 600.
Fixes a symbol resolver bug where a private alias becomes public if used
before its declaration.
Reviewed-on: https://github.com/dlang/dmd/pull/11831
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 70aabfb51