As described in the PR, the recursive_directory_iterator constructor
calls advance(ec), but ec is a pointer so it calls _Dir::advance(bool).
The intention was to either call advance() or advance(*ec) depending
on whether the pointer is null or not.
This fixes the bug and renames the parameter to ecptr to make similar
mistakes less likely in future.
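
The overload mix-up can be reproduced in isolation. The following is a minimal sketch, not the libstdc++ code: `Dir`, `buggy` and `fixed` are hypothetical stand-ins, and only the shape of the overload set is taken from the description above.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical stand-in for _Dir; it only records which overload ran.
struct Dir {
  std::string last_call;
  void advance(bool /*skip_permission_denied*/ = false) {
    last_call = "advance(bool)";
  }
  void advance(std::error_code& ec) {
    ec.clear();
    last_call = "advance(error_code&)";
  }
};

// Buggy form: a pointer cannot bind to error_code&, but it converts
// implicitly to bool, so advance(bool) is selected without a diagnostic.
std::string buggy(std::error_code* ec) {
  Dir d;
  d.advance(ec);
  return d.last_call;
}

// Fixed form: dispatch explicitly on whether the pointer is null.
std::string fixed(std::error_code* ecptr) {
  Dir d;
  if (ecptr)
    d.advance(*ecptr);
  else
    d.advance();
  return d.last_call;
}
```

Naming the parameter ecptr makes a call such as advance(ecptr) look wrong at a glance, which is the point of the rename.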
libstdc++-v3/ChangeLog:
PR libstdc++/97731
* src/filesystem/dir.cc (recursive_directory_iterator): Call the
right overload of _Dir::advance.
* testsuite/experimental/filesystem/iterators/97731.cc: New test.
This fixes some multiple definition errors caused by the changes for
PR libstdc++/90295. The previous solution for inlining the members of
std::exception_ptr but still exporting them from the library was to
suppress the 'inline' keyword on those functions when compiling
libsupc++/eh_ptr.cc, so they get defined in that file. That produces ODR
violations though, because there are now both inline and non-inline
definitions in the library, due to the use of std::exception_ptr in
other files such as src/c++11/future.cc.
The new solution is to define all the relevant members as 'inline'
unconditionally, but use __attribute__((used)) to cause definitions to
be emitted in libsupc++/eh_ptr.cc as before. This doesn't quite work
however, because PR c++/67453 means the attribute is ignored on
constructors and destructors. As a workaround, the old solution
(conditionally inline) is still used for those members, but they are
given the always_inline attribute so that they aren't emitted in
src/c++11/future.o as inline definitions.
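
A rough sketch of the "always inline, conditionally used" idea follows. The `handle` class and the `HANDLE_USED` and `EMIT_DEFINITIONS` macros are hypothetical; the real macro is _GLIBCXX_EH_PTR_USED and its exact definition is not reproduced here.

```cpp
#include <cassert>
#include <utility>

// Hypothetical macro: in the one translation unit that must export the
// symbol (built with EMIT_DEFINITIONS) it expands to the GNU "used"
// attribute; everywhere else it expands to nothing.
#if defined(EMIT_DEFINITIONS) && defined(__GNUC__)
# define HANDLE_USED __attribute__((__used__))
#else
# define HANDLE_USED
#endif

class handle {
  void* p_;
public:
  explicit handle(void* p = nullptr) : p_(p) {}
  void swap(handle& other) noexcept;
  void* get() const noexcept { return p_; }
};

// The definition is inline in every translation unit, so there is only
// one kind of definition. The attribute asks the compiler to emit it
// even when nothing in the exporting file calls it.
HANDLE_USED inline void handle::swap(handle& other) noexcept {
  std::swap(p_, other.p_);
}
```

The sketch deliberately applies the attribute to an ordinary member function only, since the message notes that it is ignored on constructors and destructors.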
libstdc++-v3/ChangeLog:
PR libstdc++/97729
* include/std/future (__basic_future::_M_get_result): Use
nullptr for null pointer constant.
* libsupc++/eh_ptr.cc (operator==, operator!=): Remove
definitions.
* libsupc++/exception_ptr.h (_GLIBCXX_EH_PTR_USED): Define
macro to conditionally add __attribute__((__used__)).
(operator==, operator!=, exception_ptr::exception_ptr())
(exception_ptr::exception_ptr(const exception_ptr&))
(exception_ptr::~exception_ptr())
(exception_ptr::operator=(const exception_ptr&))
(exception_ptr::swap(exception_ptr&)): Always define as
inline. Add macro to be conditionally "used".
The test uses -fpic and doesn't query the target support
for that option otherwise, resulting in failure on configurations
not supporting -fpic such as VxWorks for kernel mode.
2020-11-03 Olivier Hainque <hainque@adacore.com>
gcc/testsuite/
* gcc.dg/sms-12.c: Add dg-require-effective-target fpic.
The change moves the definitions of PROBE_STACK_FIRST_REG
and PROBE_STACK_SECOND_REG to a more appropriate place for such
items (here, in aarch64.md as suggested by Richard), and adjusts
their value from r9/r10 to r10/r11 to free r9 for a possibly
more general purpose (e.g. as a static chain at least on targets
which have a private use of r18, such as Windows or VxWorks).
2020-11-07 Olivier Hainque <hainque@adacore.com>
gcc/
* config/aarch64/aarch64.md: Define PROBE_STACK_FIRST_REGNUM
and PROBE_STACK_SECOND_REGNUM constants, designating r10/r11.
Replacements for the PROBE_STACK_FIRST/SECOND_REG constants in
aarch64.c.
* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG): Remove.
(PROBE_STACK_SECOND_REG): Remove.
(aarch64_emit_probe_stack_range): Adjust to the _REG -> _REGNUM
suffix update for PROBE_STACK register numbers.
They say third time is the charm.. It looks like the testcase
disables the cost model and so on AArch64 we end up being able to
do the permute but on x86 we can't. However when analyzing the
testcase I didn't disable the cost model hence the difference.
So I now guard the testcase on vect_load_lanes as there's not a
"can do any permute" test directive and load lanes is what I will
be fixing up next year so this should catch it.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-11b.c: Guard statements.
Sometimes the overflow flag will leak into the IL. Drop it while
creating ranges.
There are various places we could plug this. This patch just plugs things
at get_tree_range which is the entry point for ranges from tree expressions.
It fixes the PR, and probably fixes the ranger entirely, but we may need
to revisit this.
For example, I looked to see if there were other places that created
ranges with TREE_OVERFLOW set, and there are various. For example,
the following code pattern appears multiple times in vr-values.c:
else if (is_gimple_min_invariant (op0))
vr0.set (op0);
This can pick up TREE_OVERFLOW from the IL if present. However, the
ranger won't see them so we're good.
At some point we should audit all this. Or perhaps just nuke all
TREE_OVERFLOW's at irange::set.
For now, this will do.
gcc/ChangeLog:
PR tree-optimization/97721
* gimple-range.cc (get_tree_range): Drop overflow from constants.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97721.c: New test.
This change fixes a bug in the i386 backend when adding
-fzero-call-used-regs=all on a target that has no x87
registers.
When there are no x87 registers available, we should not
zero stack registers.
gcc/ChangeLog:
PR target/97715
* config/i386/i386.c (zero_all_st_registers): Return
earlier when the FPU is disabled.
gcc/testsuite/ChangeLog:
PR target/97715
* gcc.target/i386/zero-scratch-regs-32.c: New test.
- Add a missing 'explicit' to a basic_stringbuf constructor.
- Set up the get/put area pointers in the constructor from strings using
different allocator types.
- Remove public basic_stringbuf::__sv_type alias.
- Do not construct temporary basic_string objects with a
default-constructed allocator.
Also, change which basic_string constructor is used, as a minor
compile-time optimization. Constructing from a basic_string_view
requires more work from the compiler, so just use a pointer and length.
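
As an illustration of the pointer-and-length construction, here is a minimal sketch. `TaggedAlloc`, `OtherString` and `copy_with_allocator` are hypothetical names, and the code is not taken from <sstream>.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical minimal allocator, used only to get a second allocator type.
template <typename T>
struct TaggedAlloc {
  using value_type = T;
  TaggedAlloc() = default;
  template <typename U>
  TaggedAlloc(const TaggedAlloc<U>&) noexcept {}
  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>().deallocate(p, n);
  }
};

template <typename T, typename U>
bool operator==(const TaggedAlloc<T>&, const TaggedAlloc<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const TaggedAlloc<T>&, const TaggedAlloc<U>&) noexcept {
  return false;
}

using OtherString =
    std::basic_string<char, std::char_traits<char>, TaggedAlloc<char>>;

// Copy the characters using the (pointer, length, allocator) constructor.
// No string_view conversion is involved and no temporary string with a
// default-constructed allocator is created.
OtherString copy_with_allocator(const std::string& s,
                                const TaggedAlloc<char>& a) {
  return OtherString(s.data(), s.size(), a);
}
```

Passing the length explicitly also preserves embedded NUL characters, which a constructor taking only a pointer would not.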
libstdc++-v3/ChangeLog:
* include/std/sstream (basic_stringbuf(const allocator_type&)):
Add explicit.
(basic_stringbuf(const basic_string<C,T,SA>&, openmode, const A&)):
Call _M_stringbuf_init. Construct _M_string from pointer and length
to avoid constraint checks for string view.
(basic_stringbuf::view()): Make __sv_type alias local to the
function.
(basic_istringstream(const basic_string<C,T,SA>&, openmode, const A&)):
Pass string to _M_streambuf instead of constructing a temporary
with the wrong allocator.
(basic_ostringstream(const basic_string<C,T,SA>&, openmode, const A&)):
Likewise.
(basic_stringstream(const basic_string<C,T,SA>&, openmode, const A&)):
Likewise.
* src/c++20/sstream-inst.cc: Use string_view and wstring_view
typedefs in explicit instantiations.
* testsuite/27_io/basic_istringstream/cons/char/1.cc: Add more
tests for constructors.
* testsuite/27_io/basic_ostringstream/cons/char/1.cc: Likewise.
* testsuite/27_io/basic_stringbuf/cons/char/1.cc: Likewise.
* testsuite/27_io/basic_stringbuf/cons/char/2.cc: Likewise.
* testsuite/27_io/basic_stringbuf/cons/wchar_t/1.cc: Likewise.
* testsuite/27_io/basic_stringbuf/cons/wchar_t/2.cc: Likewise.
* testsuite/27_io/basic_stringstream/cons/char/1.cc: Likewise.
The following fixes SLP vectorization of stores that were
pattern recognized. Since in SLP vectorization pattern analysis
happens after dataref group analysis we have to adjust the groups
with the pattern stmts. This has some effects down the pipeline
and exposes cases where we looked at the wrong pattern/non-pattern
stmts.
2020-11-05 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_slp_analyze_node_dependences):
Use the original stmts.
(vect_slp_analyze_node_alignment): Use the pattern stmt.
* tree-vect-slp.c (vect_fixup_store_groups_with_patterns):
New function.
(vect_slp_analyze_bb_1): Call it.
* gcc.dg/vect/bb-slp-69.c: New testcase.
This optimizes sequential permutes, i.e. if there are two permutes back to back
this function applies the permute of the parent to the child and removes the
parent.
This relies on the materialization point calculation in optimize SLP.
This allows us to remove useless permutes such as
ldr q0, [x0, x3]
ldr q2, [x1, x3]
trn1 v1.4s, v0.4s, v0.4s
trn2 v0.4s, v0.4s, v0.4s
trn1 v0.4s, v1.4s, v0.4s
mov v1.16b, v3.16b
fcmla v1.4s, v0.4s, v2.4s, #0
fcmla v1.4s, v0.4s, v2.4s, #90
str q1, [x2, x3]
from the sequence the vectorizer puts out and give
ldr q0, [x0, x3]
ldr q2, [x1, x3]
mov v1.16b, v3.16b
fcmla v1.4s, v0.4s, v2.4s, #0
fcmla v1.4s, v0.4s, v2.4s, #90
str q1, [x2, x3]
instead.
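
The folding of back-to-back permutes can be modelled on plain lane indices. This is a simplified sketch for single-input, four-lane permutes only; `Perm`, `compose` and `is_noop` are hypothetical helpers, not the vectorizer's data structures.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A permute is modelled as: result lane i takes input lane p[i].
using Perm = std::array<std::size_t, 4>;

// Apply `child` first and `parent` second: the combined permute selects
// lane child[parent[i]] of the original input.
Perm compose(const Perm& child, const Perm& parent) {
  Perm out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = child[parent[i]];
  return out;
}

// A permute that maps every lane to itself can be dropped entirely.
bool is_noop(const Perm& p) {
  for (std::size_t i = 0; i < p.size(); ++i)
    if (p[i] != i)
      return false;
  return true;
}
```

For example, swapping adjacent lanes twice composes to the identity, so both permutes disappear, in the same spirit as the trn1/trn2 sequence above.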
gcc/ChangeLog:
* tree-vect-slp.c (vect_slp_tree_permute_noop_p): New.
(vect_optimize_slp): Optimize permutes.
(vectorizable_slp_permutation): Fix typo.
My previous patch accidentally enabled some tests on x86 because my target
selector foo was weak.. This now properly only runs them on AArch64.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-11b.c: Update testcase.
* gcc.dg/vect/slp-perm-6.c: Update target selector.
The change to clear the external_die_map slot after creating
the concrete instance DIE broke abstract origin processing which
tried to make sure to have those point to the early abstract instance
and not the concrete instance. The following restores this by
eventually following the abstract origin link in the concrete instance.
2020-11-05 Richard Biener <rguenther@suse.de>
PR debug/97718
* dwarf2out.c (add_abstract_origin_attribute): Make sure to
point to the abstract instance.
This patch stores the SLP instance kind in the SLP instance so that we can use
it later when detecting load/store lanes support.
This also changes the load/store lane support check to only check if the SLP
kind is a store. This means that in order for the load/store lanes to work all
instances must be of kind store.
gcc/ChangeLog:
* tree-vect-loop.c (vect_analyze_loop_2): Check kind.
* tree-vect-slp.c (vect_build_slp_instance): New.
(enum slp_instance_kind): Move to...
* tree-vectorizer.h (enum slp_instance_kind): ...here.
(SLP_INSTANCE_KIND): New.
This patch is to make vector CTOR with char/short leverage direct
move instructions when they are available. With one constructed
test case, it can speed up 145% for char and 190% for short on P9.
Tested SPEC2017 x264_r at -Ofast on P9, it gets 1.61% speedup
(but based on unexpected SLP see PR96789).
Bootstrapped/regtested on powerpc64{,le}-linux-gnu P8 and
powerpc64le-linux-gnu P9.
gcc/ChangeLog:
PR target/96933
* config/rs6000/rs6000.c (rs6000_expand_vector_init): Use direct move
instructions for vector construction with char/short types.
* config/rs6000/rs6000.md (p8_mtvsrwz_v16qisi2): New define_insn.
(p8_mtvsrd_v16qidi2): Likewise.
gcc/testsuite/ChangeLog:
PR target/96933
* gcc.target/powerpc/pr96933-1.c: New test.
* gcc.target/powerpc/pr96933-2.c: New test.
* gcc.target/powerpc/pr96933-3.c: New test.
* gcc.target/powerpc/pr96933-4.c: New test.
* gcc.target/powerpc/pr96933.h: New test.
* gcc.target/powerpc/pr96933-run.h: New test.
This moves the code that checks for load/store lanes further in the pipeline and
places it after slp_optimize. This would allow us to perform optimizations on
the SLP tree and only bail out if we really have a permute.
This change allows us to handle permutes such as {1,1,1,1} which should
be handled by a load and replicate.
This change however makes it all or nothing. Either all instances can be handled
or none at all. This is why some of the test cases have been adjusted.
gcc/ChangeLog:
* tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store lanes
check to ...
* tree-vect-loop.c (vect_analyze_loop_2): ...here.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-11b.c: Update output scan.
* gcc.dg/vect/slp-perm-6.c: Likewise.
Go programs expect to be able to get reliable backtrace information
with correct file/line information, but -fipa-icf-functions breaks
that because it merges together distinct functions which should have
distinct file/line info.
* go-lang.c (go_langhook_post_options): Disable
-fipa-icf-functions if it was not explicitly enabled.
Add a timestamp to supplement the global range cache to detect when a value
may become stale.
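
The staleness check can be pictured with a small model. This is a sketch under assumptions: the member names echo the ChangeLog below, but the class, its integer keys and its exact semantics are hypothetical and much simpler than the real temporal_cache.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical model: names are plain ints, and every update of a name's
// cached value bumps a global clock and records the new stamp.
class TemporalCache {
  struct Entry {
    unsigned stamp = 0;
    std::vector<int> deps;
  };
  std::unordered_map<int, Entry> entries_;
  unsigned clock_ = 0;

public:
  // Record that the cached value of `name` was just (re)computed.
  void set_timestamp(int name) { entries_[name].stamp = ++clock_; }

  // Record that the value of `name` was computed from `dep`.
  void set_dependency(int name, int dep) {
    entries_[name].deps.push_back(dep);
  }

  // A value is current if none of its dependencies changed after it.
  bool current_p(int name) const {
    auto it = entries_.find(name);
    if (it == entries_.end())
      return false;
    for (int dep : it->second.deps) {
      auto d = entries_.find(dep);
      if (d != entries_.end() && d->second.stamp > it->second.stamp)
        return false;
    }
    return true;
  }
};
```

A caller that finds current_p false would recompute the value and call set_timestamp again, which is roughly what the range_of_stmt entry below describes.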
gcc/
PR tree-optimization/97515
* gimple-range-cache.h (class ranger_cache): New prototypes plus
temporal cache pointer.
* gimple-range-cache.cc (struct range_timestamp): New.
(class temporal_cache): New.
(temporal_cache::temporal_cache): New.
(temporal_cache::~temporal_cache): New.
(temporal_cache::get_timestamp): New.
(temporal_cache::set_dependency): New.
(temporal_cache::temporal_value): New.
(temporal_cache::current_p): New.
(temporal_cache::set_timestamp): New.
(temporal_cache::set_always_current): New.
(ranger_cache::ranger_cache): Allocate the temporal cache.
(ranger_cache::~ranger_cache): Free temporal cache.
(ranger_cache::get_non_stale_global_range): New.
(ranger_cache::set_global_range): Add a timestamp.
(ranger_cache::register_dependency): New. Add timestamp dependency.
* gimple-range.cc (gimple_ranger::range_of_range_op): Add operand
dependencies.
(gimple_ranger::range_of_phi): Ditto.
(gimple_ranger::range_of_stmt): Check if global range is stale, and
recalculate if so.
gcc/testsuite/
* gcc.dg/pr97515.c: Check listing for folding of entire function.
As noted in PR 96817 this new test fails if the library is built without
futexes. That's expected of course, but we might as well fail more
obviously than a deadlock that eventually times out.
libstdc++-v3/ChangeLog:
* testsuite/18_support/96817.cc: Fail if the library is
configured to not use futexes.
I forgot to cost vectorized PHIs. Scalar PHIs are just costed
as scalar_stmt so the following costs vector PHIs as vector_stmt.
2020-11-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vectorizable_phi): Adjust prototype.
* tree-vect-stmts.c (vect_transform_stmt): Adjust.
(vect_analyze_stmt): Pass cost_vec to vectorizable_phi.
* tree-vect-loop.c (vectorizable_phi): Do costing.
This properly sets the abnormal flag when vectorizing live lanes
when the original scalar was live across an abnormal edge.
2020-11-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/97709
* tree-vect-loop.c (vectorizable_live_operation): Set
SSA_NAME_OCCURS_IN_ABNORMAL_PHI when necessary.
* gcc.dg/vect/bb-slp-pr97709.c: New testcase.
The following patch generalizes the x ? 1 : 0 -> (int) x optimization
to handle also left shifts by constant.
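
At the source level the equivalence looks like this. The two functions are a hypothetical illustration of the before and after forms, using 8 as the power-of-two constant.

```cpp
#include <cassert>

// Before: a conditional selecting between a power of two and zero.
int select_form(bool cond) { return cond ? 8 : 0; }

// After: the condition converted to int and shifted by log2(8) == 3.
int shift_form(bool cond) { return static_cast<int>(cond) << 3; }
```

Both forms yield 8 for true and 0 for false; the second needs no branch or conditional move.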
During x86_64-linux and i686-linux bootstraps + regtests it triggered
in 1514 unique non-LTO -m64 cases (sort -u on log mentioning
filename, function name and shift count) and 1866 -m32 cases.
Unfortunately, the patch regresses (before the tests have been adjusted):
+FAIL: gcc.dg/tree-ssa/ssa-ccp-11.c scan-tree-dump-times optimized "if " 0
+FAIL: gcc.dg/vect/bb-slp-pattern-2.c -flto -ffat-lto-objects scan-tree-dump-times slp1 "optimized: basic block" 1
+FAIL: gcc.dg/vect/bb-slp-pattern-2.c scan-tree-dump-times slp1 "optimized: basic block" 1
and in both cases it actually results in worse code.
> > We'd need some optimization that would go through all PHI edges and
> > compute if some use of the phi results don't actually compute a constant
> > across all the PHI edges - 1 & 0 and 0 & 1 is always 0.
> PRE should do this, IMHO only optimizing it at -O2 is fine.
> > Similarly, in the slp vectorization test there is:
> > a[0] = b[0] ? 1 : 7;
> note this, carefully avoiding the already "optimized" b[0] ? 1 : 0 ...
> So the option is to put : 7 in the 2, 4 and 8 cases as well. The testcase
> wasn't added for any real-world case but is artificial I guess for
> COND_EXPR handling of invariants.
> But yeah, for things like SLP it means we eventually have to
> implement reverse transforms for all of this to make the lanes
> matching. But that's true anyway for things like x + 1 vs. x + 0
> or x / 3 vs. x / 2 or other simplifications we do.
2020-11-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/97690
* tree-ssa-phiopt.c (conditional_replacement): Also optimize
cond ? pow2p_cst : 0 as ((type) cond) << cst.
* gcc.dg/tree-ssa/phi-opt-22.c: New test.
* gcc.dg/tree-ssa/ssa-ccp-11.c: Use -O2 instead of -O1.
* gcc.dg/vect/bb-slp-pattern-2.c (foo): Use ? 2 : 7, ? 4 : 7 and
? 8 : 7 instead of ? 2 : 0, ? 4 : 0, ? 8 : 0.
Clang and EDG say the class member access expressions __urng.min() and
__urng.max() are not constant expressions, because the object expression
__urng is not usable in a constant expression. Use a qualified-id to call
those static member functions instead.
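
A minimal sketch of the pattern, assuming a toy generator: `ToyUrng` and `urng_range` are hypothetical and are not the code in uniform_int_dist.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical generator with constexpr static min() and max().
struct ToyUrng {
  using result_type = std::uint32_t;
  static constexpr result_type min() { return 1; }
  static constexpr result_type max() { return 10; }
  result_type operator()() { return 4; }
};

template <typename Urng>
typename Urng::result_type urng_range(Urng&) {
  // With a named reference parameter `urng`, writing
  //   constexpr auto range = urng.max() - urng.min();
  // is rejected by some compilers, because `urng` itself is not usable
  // in a constant expression. The qualified-id form names no object.
  constexpr auto range = Urng::max() - Urng::min();
  return range;
}
```

The qualified form computes the same value and is accepted regardless of how a compiler treats the object expression.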
Co-authored-by: Stephan Bergmann <sbergman@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/uniform_int_dist.h (uniform_int_distribution::_S_nd):
Use qualified-id to refer to static member functions.
Change the default that is used by the Git server hook and also
by git_update_version.py. Both should use True now.
contrib/ChangeLog:
* gcc-changelog/git_repository.py: Set strict=True
for parse_git_revisions as a default.
This re-instantiates the previously removed CSE, fixing the
FAIL of gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c
It turns out the previous approach still works.
2020-11-04 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vectorizable_induction): Re-instantiate
previously removed CSE of SLP IVs.
Add -mfloat-abi=soft and skip the tests if -mfloat-abi=hard is
supplied.
This avoids failures when testing with overridden flags such as
-mthumb/-mcpu=cortex-m4/-mfloat-abi=hard
2020-11-04 Christophe Lyon <christophe.lyon@linaro.org>
gcc/testsuite/
* gcc.target/arm/pure-code/no-literal-pool-m0.c: Add dg-skip-if
and -mfloat-abi=soft option.
* gcc.target/arm/pure-code/no-literal-pool-m23.c: Likewise.
Especially when using mklog.py, it is easy to forget to fill in
the entries after the '\t* file.c (section):' or '\t(section):'.
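
The kind of check involved can be sketched as follows. The pattern is a hypothetical approximation written for this example; it is not copied from git_commit.py, which is a Python script.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical pattern: a tab, an optional "* file " prefix, a
// parenthesized item, a colon, and then nothing but whitespace.
bool has_empty_description(const std::string& line) {
  static const std::regex empty_item(R"(\t(\* \S+ )?\(\S+\):\s*)");
  return std::regex_match(line, empty_item);
}
```

Lines that carry text after the colon do not match, so only entries that were left unfilled are reported.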
contrib/ChangeLog:
* gcc-changelog/git_commit.py (item_parenthesis_empty_regex,
item_parenthesis_regex): Add.
(check_for_empty_description): Use them.
* gcc-changelog/test_email.py (test_emptry_entry_desc,
test_emptry_entry_desc_2): Add.
* gcc-changelog/test_patches.txt: Add two testcases for it.
This patch finds the base expression of reduction array sections and uses it
in checks of whether the allocate clause lists only variables that have been privatized.
Also fixes a pasto that caused an ICE.
2020-11-04 Jakub Jelinek <jakub@redhat.com>
PR c++/97670
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Look through array reductions to find
underlying decl to clear in the allocate_head bitmap.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): Look through array reductions to
find underlying decl to clear in the aligned_head bitmap.
gcc/cp/
* semantics.c (finish_omp_clauses): Look through array reductions to
find underlying decl to clear in the aligned_head bitmap. Use
DECL_UID (t) instead of DECL_UID (OMP_CLAUSE_DECL (c)) when clearing
in the bitmap. Only diagnose errors about allocate vars not being
privatized on the same construct on allocate clause if it has
a DECL_P OMP_CLAUSE_DECL.
gcc/testsuite/
* c-c++-common/gomp/allocate-4.c: New test.
* g++.dg/gomp/allocate-2.C: New test.
* g++.dg/gomp/allocate-3.C: New test.
The previous fix was pastoed too quickly; the following fixes the
correct spot - the memset, not the allocation.
2020-11-04 Richard Biener <rguenther@suse.de>
PR bootstrap/97666
* tree-vect-slp.c (vect_build_slp_tree_2): Revert previous
fix and instead adjust the memset.
While i386elf.h was originally derived from sysv4.h it has not been kept
up to date with the development of the compiler. Two changes are made:
* The return convention now follows the i386 and x86_64 SVR4 ABIs again.
* The more efficient default version of ASM_OUTPUT_ASCII in elfos.h is used.
2020-11-04 Pat Bernardi <bernardi@adacore.com>
gcc/ChangeLog
* config/i386/i386elf.h (SUBTARGET_RETURN_IN_MEMORY): Remove.
(ASM_OUTPUT_ASCII): Likewise.
(DEFAULT_PCC_STRUCT_RETURN): Define.
* config/i386/i386.c (ix86_return_in_memory): Remove
SUBTARGET_RETURN_IN_MEMORY.
We cannot, as things stand, handle Objective-C tree codes in
the switch and deal with this by calling out to a function that
has a dummy version when Objective-C is not enabled.
Because of the way the logic works (with a fall through to a
'sorry' in case of unhandled expressions), the function reports
cases that are known to be unsuitable for constant exprs. The
dummy function always reports 'false' and thus will fall through
to the 'sorry'.
gcc/c-family/ChangeLog:
* c-objc.h (objc_non_constant_expr_p): New.
* stub-objc.c (objc_non_constant_expr_p): New.
gcc/cp/ChangeLog:
* constexpr.c (potential_constant_expression_1): Handle
expressions known to be non-constant for Objective-C.
gcc/objc/ChangeLog:
* objc-act.c (objc_non_constant_expr_p): New.
C2x adds the nodiscard standard attribute, with an optional string
argument, as in C++; implement it for C.
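
For illustration, here is the attribute with its optional string, written as C++ since, as the message says, the C2x attribute mirrors it (the string form is C++20 syntax there). The function names and the message text are hypothetical.

```cpp
#include <cassert>

// The string is an optional hint that a compiler may include in the
// diagnostic issued when the return value is discarded.
[[nodiscard("the status code reports whether the write succeeded")]]
int write_record(int value) {
  return value >= 0 ? 0 : -1;
}

int checked_write(int value) {
  // The result is used here, so no diagnostic is expected.
  int status = write_record(value);
  // A bare statement `write_record(value);` would typically draw a
  // warning that quotes the string above.
  return status;
}
```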
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c/
2020-11-04 Joseph Myers <joseph@codesourcery.com>
* c-decl.c (handle_nodiscard_attribute): New.
(std_attribute_table): Add nodiscard.
* c-parser.c (c_parser_std_attribute): Expect argument to
nodiscard attribute to be a string. Do not special-case ignoring
nodiscard.
* c-typeck.c (maybe_warn_nodiscard): New.
(build_compound_expr, emit_side_effect_warnings): Call
maybe_warn_nodiscard.
(c_process_expr_stmt, c_finish_stmt_expr): Also call
emit_side_effect_warnings if warn_unused_result.
gcc/testsuite/
2020-11-04 Joseph Myers <joseph@codesourcery.com>
* gcc.dg/c2x-attr-nodiscard-1.c, gcc.dg/c2x-attr-nodiscard-2.c,
gcc.dg/c2x-attr-nodiscard-3.c, gcc.dg/c2x-attr-nodiscard-4.c: New
tests.
* gcc.dg/c2x-attr-syntax-5.c: Remove nodiscard test.
gcc/ChangeLog
PR target/97540
* ira.c (ira_setup_alts): Extract memory from operand only
for special memory constraint.
* recog.c (asm_operand_ok): Ditto.
* lra-constraints.c (process_alt_operands): MEM_P is
required for normal memory constraint.
gcc/testsuite/ChangeLog
* gcc.target/i386/pr97540.c: New test.