As PR94023 shows, the expected SLP requires misaligned vector access
support. This patch guards the check with the vect_hw_misalign target
condition so that it is only expected where that support exists.
gcc/testsuite/ChangeLog
2020-03-09 Kewen Lin <linkw@gcc.gnu.org>
PR testsuite/94023
* gcc.dg/vect/slp-perm-12.c: Expect loop vectorized messages only on
vect_hw_misalign targets.
We are unconditionally emitting an error here, without first checking complain.
gcc/cp/ChangeLog:
PR c++/93729
* call.c (convert_like_real): Check complain before emitting an error
about binding a bit-field to a reference.
gcc/testsuite/ChangeLog:
PR c++/93729
* g++.dg/concepts/pr93729.C: New test.
I noticed that in some concepts diagnostic messages, we were printing typename
types incorrectly, e.g. printing remove_reference_t<T> as
typename remove_reference<T>::remove_reference_t
instead of
typename remove_reference<T>::type.
Fix this by printing the TYPENAME_TYPE_FULLNAME instead of the TYPE_NAME in
cxx_pretty_printer::simple_type_specifier, which is consistent with how
dump_typename in error.c does it.
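A minimal sketch of the two spellings, using only the standard library and
not code from the patch: remove_reference_t<T> is an alias for the nested
"type" member of remove_reference<T>, which is the name the diagnostic
should print. The helper name strip_ref is hypothetical.

```cpp
#include <type_traits>

// Hypothetical helper whose return type is spelled through the alias template.
template<typename T>
std::remove_reference_t<T> strip_ref(T&& t) { return t; }

// The alias and the long-hand typename form denote the same type, so the
// member to print is "type", not "remove_reference_t".
static_assert(std::is_same<std::remove_reference_t<int&>,
                           typename std::remove_reference<int&>::type>::value,
              "alias names the nested ::type member");
```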
gcc/cp/ChangeLog:
* cxx-pretty-print.c (cxx_pretty_printer::simple_type_specifier)
[TYPENAME_TYPE]: Print the TYPENAME_TYPE_FULLNAME instead of the
TYPE_NAME.
gcc/testsuite/ChangeLog:
* g++.dg/concepts/diagnostic4.C: New test.
This patch extends region_model::get_representative_tree so that dumps
are able to refer to string literals, which I've found useful in
investigating a state-bloat issue.
Doing so uncovered a bug in the handling of views I introduced in
r10-7024-ge516294a1acb28aaaad44cfd583cc6a80354044e where the code was
erroneously using TREE_TYPE on the view region's type, rather than just
using its type, which the patch also fixes.
gcc/analyzer/ChangeLog:
* analyzer.h (class array_region): New forward decl.
* program-state.cc (selftest::test_program_state_dumping_2): New.
(selftest::analyzer_program_state_cc_tests): Call it.
* region-model.cc (array_region::constant_from_key): New.
(region_model::get_representative_tree): Handle region_svalue by
generating an ADDR_EXPR.
(region_model::get_representative_path_var): In view handling,
remove erroneous TREE_TYPE when determining the type of the tree.
Handle array regions and STRING_CST.
(selftest::assert_dump_tree_eq): New.
(ASSERT_DUMP_TREE_EQ): New macro.
(selftest::test_get_representative_tree): New selftest.
(selftest::analyzer_region_model_cc_tests): Call it.
* region-model.h (region::dyn_cast_array_region): New vfunc.
(array_region::dyn_cast_array_region): New vfunc implementation.
(array_region::constant_from_key): New decl.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/malloc-4.c: Update expected output of leak to
reflect fix to region_model::get_representative_path_var, adding
the missing "*" from the cast.
This patch fixes a bug in which summarized state dumps involving a
non-NULL pointer to a region for which get_representative_path_var
returned NULL were erroneously dumped as "NULL".
It also extends sm-state dumps so that they show representative tree
values, where available.
Finally, it adds some selftest coverage for such dumps. Doing so
requires replacing some %qE with a dump_quoted_tree, to avoid
C vs C++ differences between "make selftest-c" and "make selftest-c++".
gcc/analyzer/ChangeLog:
* analyzer.h (dump_quoted_tree): New decl.
* engine.cc (exploded_node::dump_dot): Pass region model to
sm_state_map::print.
* program-state.cc: Include diagnostic-core.h.
(sm_state_map::print): Add "model" param and use it to print
representative trees. Only print origin information if non-null.
(sm_state_map::dump): Pass NULL for model to print call.
(program_state::print): Pass region model to sm_state_map::print.
(program_state::dump_to_pp): Use spaces rather than newlines when
summarizing. Pass region_model to sm_state_map::print.
(ana::selftest::assert_dump_eq): New function.
(ASSERT_DUMP_EQ): New macro.
(ana::selftest::test_program_state_dumping): New function.
(ana::selftest::analyzer_program_state_cc_tests): Call it.
* program-state.h (program_state::print): Add model param.
* region-model.cc (dump_quoted_tree): New function.
(map_region::print_fields): Use dump_quoted_tree rather than
%qE to avoid lang-dependent output.
(map_region::dump_child_label): Likewise.
(region_model::dump_summary_of_map): For SK_REGION, when
get_representative_path_var fails, print the region id rather than
erroneously printing NULL.
* sm.cc (state_machine::get_state_by_name): New function.
* sm.h (state_machine::get_state_by_name): New decl.
PR c++/94027
* mangle.c (find_substitution): Don't call same_type_p on template
args that cannot match.
Now that same_type_p rejects argument packs, we need to be more careful
when calling it with template argument vector contents.
The mangler needs to do some comparisons to find the special
substitutions. While that code looks a little ugly, this seems the
smallest fix.
Inline assembler instructions don't have latency info and the scheduler does
not attempt to schedule them at all - it does not even honor latencies of
asm source operands. As a result, SIMD intrinsics which are implemented using
inline assembler perform very poorly, particularly on in-order cores.
Add new patterns and intrinsics for widening multiplies, which results in a
63% speedup for the example in the PR, thus fixing the reported regression.
gcc/
PR target/91598
* config/aarch64/aarch64-builtins.c (TYPES_TERNOPU_LANE): Add define.
* config/aarch64/aarch64-simd.md
(aarch64_vec_<su>mult_lane<Qlane>): Add new insn for widening lane mul.
(aarch64_vec_<su>mlal_lane<Qlane>): Likewise.
* config/aarch64/aarch64-simd-builtins.def: Add intrinsics.
* config/aarch64/arm_neon.h:
(vmlal_lane_s16): Expand using intrinsics rather than inline asm.
(vmlal_lane_u16): Likewise.
(vmlal_lane_s32): Likewise.
(vmlal_lane_u32): Likewise.
(vmlal_laneq_s16): Likewise.
(vmlal_laneq_u16): Likewise.
(vmlal_laneq_s32): Likewise.
(vmlal_laneq_u32): Likewise.
(vmull_lane_s16): Likewise.
(vmull_lane_u16): Likewise.
(vmull_lane_s32): Likewise.
(vmull_lane_u32): Likewise.
(vmull_laneq_s16): Likewise.
(vmull_laneq_u16): Likewise.
(vmull_laneq_s32): Likewise.
(vmull_laneq_u32): Likewise.
* config/aarch64/iterators.md (Vcondtype): New iterator for lane mul.
(Qlane): Likewise.
The syntax for lane specifiers uses a vector element rather than a vector:
fmls v0.2s, v1.2s, v1.s[1] // rather than v1.2s[1]
Fix all the lane specifiers to use Vetype which uses the correct element type.
gcc/
* aarch64/aarch64-simd.md (aarch64_mla_elt<mode>): Correct lane syntax.
(aarch64_mla_elt_<vswap_width_name><mode>): Likewise.
(aarch64_mls_elt<mode>): Likewise.
(aarch64_mls_elt_<vswap_width_name><mode>): Likewise.
(aarch64_fma4_elt<mode>): Likewise.
(aarch64_fma4_elt_<vswap_width_name><mode>): Likewise.
(aarch64_fma4_elt_to_64v2df): Likewise.
(aarch64_fnma4_elt<mode>): Likewise.
(aarch64_fnma4_elt_<vswap_width_name><mode>): Likewise.
(aarch64_fnma4_elt_to_64v2df): Likewise.
testsuite/
* gcc.target/aarch64/fmla_intrinsic_1.c: Check for correct lane syntax.
* gcc.target/aarch64/fmls_intrinsic_1.c: Likewise.
* gcc.target/aarch64/mla_intrinsic_1.c: Likewise.
* gcc.target/aarch64/mls_intrinsic_1.c: Likewise.
The two affected SVE2 patterns in this patch output a movprfx'ed instruction in their second alternative
but don't set the "movprfx" attribute, which will result in the wrong instruction length being assumed by the midend.
This patch fixes that in the same way as the other SVE patterns in the backend.
Bootstrapped and tested on aarch64-none-linux-gnu.
2020-03-06 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* config/aarch64/aarch64-sve2.md (@aarch64_sve_<sve_int_op><mode>):
Specify movprfx attribute.
(@aarch64_sve_<sve_int_op>_lane_<mode>): Likewise.
aix61.h, aix71.h and aix72.h intend to prevent SUM_IN_TOC and FP_IN_TOC
when cmodel=large. This patch defines the variables associated with the
target options to 1 to _enable_ NO_SUM_IN_TOC and NO_FP_IN_TOC.
Bootstrapped on powerpc-ibm-aix7.2.0.0
2020-03-06 David Edelsohn <dje.gcc@gmail.com>
PR target/94065
* config/rs6000/aix61.h (TARGET_NO_SUM_IN_TOC): Set to 1 for
cmodel=large.
(TARGET_NO_FP_IN_TOC): Same.
* config/rs6000/aix71.h: Same.
* config/rs6000/aix72.h: Same.
The test uses -O1, at which the macu instruction is generated by the
combiner and not in the expand step. My previous patch, "arc: Improve code
gen for 64bit add/sub operations.", splits the 64-bit add in the expand
step, which makes it impossible for the combiner to match the multiply
and accumulate on 64-bit data, hence the error. This patch steps up the
optimization level so that the macu instruction is generated at expand
time.
xxxx-xx-xx Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/tumaddsidi4.c: Step-up optimization level.
Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
The converting constructor of join_view::_Sentinel<true> needs to be able to
access the private members of join_view::_Sentinel<false>.
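The access problem can be sketched with a simplified stand-in for
join_view::_Sentinel<_Const>. Every name below is hypothetical and the real
class holds a sentinel rather than an int; the point is only that
Sentinel<true> and Sentinel<false> are unrelated classes, so one needs a
friend declaration to read the other's private members.

```cpp
#include <type_traits>

// Simplified stand-in for join_view::_Sentinel<_Const>; names are hypothetical.
template<bool Const>
class Sentinel
{
public:
  Sentinel() = default;
  explicit Sentinel(int end) : m_end(end) { }

  // Converting constructor, enabled only for Sentinel<true>. It reads
  // other.m_end, a private member of a different specialization.
  template<bool C2 = Const,
           typename = typename std::enable_if<C2>::type>
  Sentinel(const Sentinel<!Const>& other) : m_end(other.m_end) { }

  int end() const { return m_end; }

private:
  // Without this declaration the constructor above could not access
  // Sentinel<!Const>::m_end.
  friend class Sentinel<!Const>;

  int m_end = 0;
};
```

Removing the friend declaration makes the conversion from Sentinel<false>
to Sentinel<true> fail to compile with an access error.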
libstdc++-v3/ChangeLog:
* include/std/ranges (join_view::_Sentinel<_Const>): Befriend
join_view::_Sentinel<!_Const>.
* testsuite/std/ranges/adaptors/join.cc: Augment test.
This works around PR 93978 by avoiding having to instantiate the body of
ranges::empty() when checking the constraints of view_interface::operator
bool(). When ranges::empty() has an auto return type, then we must instantiate
its body in order to determine whether the requires expression {
ranges::empty(_M_derived()); } is well-formed. But this means instantiating
view_interface::empty() and hence view_interface::_M_derived(), all before we've
yet deduced the return type of join_view::end(). (The reason
view_interface::operator bool() is needed in join_view::end() in the first place
is because in this function we perform direct initialization of
join_view::_Sentinel from a join_view, and so we try to find a conversion
sequence from the latter to the former that goes through this conversion
operator.)
Giving ranges::empty() a concrete return type of bool should be safe according
to [range.prim.empty]/4 which says "whenever ranges::empty(E) is a valid
expression, it has type bool."
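The difference between a deduced and a declared return type can be sketched
as follows. This is a simplified illustration with hypothetical function
names, not the libstdc++ implementation: with a declared return type, the
type of a call is known from the declaration alone, so an unevaluated use
does not instantiate the body.

```cpp
#include <type_traits>
#include <vector>

// Deduced return type: naming the type of a call forces the body to be
// instantiated, because the type is only known from the return statement.
template<typename T>
auto empty_deduced(const T& t) { return t.size() == 0; }

// Declared return type: the type of a call is known from the declaration.
template<typename T>
bool empty_declared(const T& t) { return t.size() == 0; }

// int has no size(), so instantiating either body for int would be an error.
// This still compiles because only the declaration of empty_declared<int>
// is needed; decltype(empty_deduced(0)) would be a hard error instead.
static_assert(std::is_same<decltype(empty_declared(0)), bool>::value,
              "type known without instantiating the body");

// For types that do have size(), both forms behave identically.
inline bool agree_on(const std::vector<int>& v)
{ return empty_deduced(v) == empty_declared(v); }
```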
This fixes the test case in PR 93978 when compiling without -Wall, but with -Wall
the test case still fails due to the issue described in PR c++/94038, I think.
I still don't quite understand why the test case doesn't fail without -O.
libstdc++-v3/ChangeLog:
PR libstdc++/93978
* include/bits/range_access.h (__cust_access::_Empty::operator()):
Declare return type to be bool instead of auto.
* testsuite/std/ranges/adaptors/93978.cc: New test.
When the target doesn't define PTHREAD_RWLOCK_INITIALIZER we use a
wrapper around pthread_rwlock_init, but the wrapper only takes one
argument and we try to call it with two.
This went unnoticed on most targets because they do define the
PTHREAD_RWLOCK_INITIALIZER macro, but it causes a bootstrap failure on
darwin8.
PR libstdc++/93244
* include/std/shared_mutex [!PTHREAD_RWLOCK_INITIALIZER]
(__shared_mutex_pthread::__shared_mutex_pthread()): Remove incorrect
second argument to __glibcxx_rwlock_init.
* testsuite/30_threads/shared_timed_mutex/94069.cc: New test.
The checks for PR 93244 don't actually pass on Windows (which is the
target where the bug is present) because of a different bug, PR 94063.
This adjusts the tests to not be affected by 94063 so that they verify
that 93244 was fixed.
PR libstdc++/93244
* testsuite/27_io/filesystem/path/generic/generic_string.cc: Adjust
test to not fail due to PR 94063.
* testsuite/27_io/filesystem/path/generic/utf.cc: Likewise.
* testsuite/27_io/filesystem/path/generic/wchar_t.cc: Likewise.
zTPF uses the same numeric value for ENOSYS and ENOTSUP.
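A minimal sketch of the kind of guard this calls for, assuming a switch over
errno values. The function errno_name is hypothetical and the preprocessor
condition in system_error.cc may be spelled differently; two case labels
with the same value would otherwise be a compile-time error.

```cpp
#include <cerrno>
#include <string>

// Hypothetical helper: returns a short name for a few errno values.
std::string errno_name(int e)
{
  switch (e)
    {
    case ENOSYS:
      return "ENOSYS";
#if ENOTSUP != ENOSYS
    // Only emitted when the two macros have distinct values; otherwise
    // this would be a duplicate case label.
    case ENOTSUP:
      return "ENOTSUP";
#endif
    default:
      return "other";
    }
}
```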
libstdc++-v3/ChangeLog:
2020-03-06 Andreas Krebbel <krebbel@linux.ibm.com>
* src/c++11/system_error.cc: Omit the ENOTSUP case statement if it
would match ENOSYS.
2020-03-06 Delia Burduv <delia.burduv@arm.com>
* config/arm/arm_neon.h (bfloat16x4x2_t): New typedef.
(bfloat16x8x2_t): New typedef.
(bfloat16x4x3_t): New typedef.
(bfloat16x8x3_t): New typedef.
(bfloat16x4x4_t): New typedef.
(bfloat16x8x4_t): New typedef.
(vst2_bf16): New.
(vst2q_bf16): New.
(vst3_bf16): New.
(vst3q_bf16): New.
(vst4_bf16): New.
(vst4q_bf16): New.
* config/arm/arm-builtins.c (v2bf_UP): Define.
(VAR13): New.
(arm_init_simd_builtin_types): Init Bfloat16x2_t eltype.
* config/arm/arm-modes.def (V2BF): New mode.
* config/arm/arm-simd-builtin-types.def
(Bfloat16x2_t): New entry.
* config/arm/arm_neon_builtins.def
(vst2): Changed to VAR13 and added v4bf, v8bf
(vst3): Changed to VAR13 and added v4bf, v8bf
(vst4): Changed to VAR13 and added v4bf, v8bf
* config/arm/iterators.md (VDXBF): New iterator.
(VQ2BF): New iterator.
* config/arm/neon.md (neon_vst2<mode>): Used new iterators.
(neon_vst2<mode>): Used new iterators.
(neon_vst3<mode>): Used new iterators.
(neon_vst3<mode>): Used new iterators.
(neon_vst3qa<mode>): Used new iterators.
(neon_vst3qb<mode>): Used new iterators.
(neon_vst4<mode>): Used new iterators.
(neon_vst4<mode>): Used new iterators.
(neon_vst4qa<mode>): Used new iterators.
(neon_vst4qb<mode>): Used new iterators.
* gcc.target/arm/simd/bf16_vstn_1.c: New test.
This patch adds the Armv8.6-a ACLE intrinsics for bfcvtn, bfcvtn2 and
bfcvt as part of the BFloat16 extension.
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
The intrinsics are declared in arm_bf16.h and arm_neon.h and the RTL
patterns are defined in aarch64-simd.md.
2020-03-06 Delia Burduv <delia.burduv@arm.com>
gcc/
* config/aarch64/aarch64-simd-builtins.def
(bfcvtn): New built-in function.
(bfcvtn_q): New built-in function.
(bfcvtn2): New built-in function.
(bfcvt): New built-in function.
* config/aarch64/aarch64-simd.md
(aarch64_bfcvtn<q><mode>): New pattern.
(aarch64_bfcvtn2v8bf): New pattern.
(aarch64_bfcvtbf): New pattern.
* config/aarch64/arm_bf16.h (float32_t): New typedef.
(vcvth_bf16_f32): New intrinsic.
* config/aarch64/arm_neon.h (vcvt_bf16_f32): New intrinsic.
(vcvtq_low_bf16_f32): New intrinsic.
(vcvtq_high_bf16_f32): New intrinsic.
* config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
(UNSPEC_BFCVTN): New UNSPEC.
(UNSPEC_BFCVTN2): New UNSPEC.
(UNSPEC_BFCVT): New UNSPEC.
* config/arm/types.md (bf_cvt): New type.
gcc/testsuite/
* gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/bfcvt-nobf16.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: New test.
gcc/ChangeLog:
2020-03-06 Andreas Krebbel <krebbel@linux.ibm.com>
* config/s390/s390.md ("tabort"): Get rid of two consecutive
blanks in format string.
After adding --param max-inline-insns-size=1, all targets remove the
redundant store at dse1, except that some targets such as AArch64 and MIPS
expand the struct initialization into a loop due to CLEAR_RATIO.
Tested on cross compiler of riscv32, riscv64, x86, x86_64, mips, mips64,
aarch64, nds32 and arm.
gcc/testsuite/ChangeLog
PR tree-optimization/90883
* g++.dg/tree-ssa/pr90883.c: Add --param max-inline-insns-size=1.
Add aarch64-*-* mips*-*-* to XFAIL.
On x86, when AVX and AVX512 are enabled, vector move instructions can
be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
0: c5 f9 6f d1 vmovdqa %xmm1,%xmm2
4: 62 f1 fd 08 6f d1 vmovdqa64 %xmm1,%xmm2
We prefer VEX encoding over EVEX since VEX is shorter. Also AVX512F
only supports 512-bit vector moves. AVX512F + AVX512VL supports 128-bit
and 256-bit vector moves. xmm16-xmm31 and ymm16-ymm31 are disallowed in
128-bit and 256-bit modes when AVX512VL is disabled. Mode attributes on
x86 vector move patterns indicate target preferences of vector move
encoding. For scalar register to register move, we can use 512-bit
vector move instructions to move 32-bit/64-bit scalar if AVX512VL isn't
available. With AVX512F and AVX512VL, we should use VEX encoding for
128-bit/256-bit vector moves if upper 16 vector registers aren't used.
This patch adds a function, ix86_output_ssemov, to generate vector moves:
1. If zmm registers are used, use EVEX encoding.
2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
will be generated.
3. If xmm16-xmm31/ymm16-ymm31 registers are used:
a. With AVX512VL, AVX512VL vector moves will be generated.
b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
move will be done with zmm register move.
There is no need to set mode attribute to XImode explicitly since
ix86_output_ssemov can properly encode xmm16-xmm31/ymm16-ymm31 registers
with and without AVX512VL.
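The selection rules above can be sketched as a small decision function.
This is only an illustration of the rules as listed, not the code of
ix86_output_ssemov; the type and function names are hypothetical.

```cpp
// Hypothetical summary of which encoding a vector move ends up with.
enum class Encoding { SseOrVex, Avx512vl, EvexZmm };

// Hypothetical description of one move and of the enabled ISA.
struct MoveInfo
{
  bool uses_zmm;       // any zmm register operand
  bool uses_ext_regs;  // any xmm16-xmm31 / ymm16-ymm31 operand
  bool have_avx512vl;  // AVX512VL enabled
};

Encoding choose_encoding(const MoveInfo& m)
{
  if (m.uses_zmm)
    return Encoding::EvexZmm;    // rule 1
  if (!m.uses_ext_regs)
    return Encoding::SseOrVex;   // rule 2
  if (m.have_avx512vl)
    return Encoding::Avx512vl;   // rule 3a
  return Encoding::EvexZmm;      // rule 3b: done with a zmm register move
}
```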
Tested on AVX2 and AVX512 with and without --with-arch=native.
gcc/
PR target/89229
PR target/89346
* config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
* config/i386/i386.c (ix86_get_ssemov): New function.
(ix86_output_ssemov): Likewise.
* config/i386/sse.md (VMOVE:mov<mode>_internal): Call
ix86_output_ssemov for TYPE_SSEMOV. Remove TARGET_AVX512VL
check.
(*movxi_internal_avx512f): Call ix86_output_ssemov for TYPE_SSEMOV.
(*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
Remove ext_sse_reg_operand and TARGET_AVX512VL check.
(*movti_internal): Likewise.
(*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
gcc/testsuite/
PR target/89229
PR target/89346
* gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated.
* gcc.target/i386/pr89229-2a.c: New test.
* gcc.target/i386/pr89229-2b.c: Likewise.
* gcc.target/i386/pr89229-2c.c: Likewise.
* gcc.target/i386/pr89229-3a.c: Likewise.
* gcc.target/i386/pr89229-3b.c: Likewise.
* gcc.target/i386/pr89229-3c.c: Likewise.
* gcc.target/i386/pr89346.c: Likewise.
Bug 93577, apparently a regression (although it isn't very clear to me
exactly when it was introduced; tests I made with various past
compilers produced inconclusive results, including e.g. ICEs appearing
with 64-bit-host compilers for some versions but not 32-bit-host
compilers for the same versions) is a C front-end tree-checking ICE
processing initializers for structs using the VLA-in-struct extension.
There is an error for such initializers, but other processing that
still takes place for them results in the ICE.
This patch ensures that processing of initializers for variable-size
types stops earlier to avoid the code that results in the ICE (and
ensures it stops earlier for error_mark_node to avoid ICEs in the
check for variable-size types), adjusts the conditions for the "empty
scalar initializer" diagnostic to avoid consequent excess errors in
the case of a bad type name, and adds tests for a few variations on
what such initializers might look like, as well as tests for cases
identified from ICEs seen with an earlier version of this patch.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/93577
gcc/c:
* c-typeck.c (pop_init_level): Do not diagnose initializers as
empty when initialized type is error_mark_node.
(set_designator, process_init_element): Ignore initializers for
elements of a variable-size type or of error_mark_node.
gcc/testsuite:
* gcc.dg/pr93577-1.c, gcc.dg/pr93577-2.c, gcc.dg/pr93577-3.c,
gcc.dg/pr93577-4.c, gcc.dg/pr93577-5.c, gcc.dg/pr93577-6.c: New
tests.
* gcc.dg/vla-init-1.c: Expect fewer errors about VLA initializer.
PR tree-optimization/91890
* gimple-ssa-warn-restrict.c (maybe_diag_overlap): Remove LOC argument.
Use gimple_or_expr_nonartificial_location.
(check_bounds_overlap): Drop LOC argument to maybe_diag_access_bounds.
Use gimple_or_expr_nonartificial_location.
* gimple.c (gimple_or_expr_nonartificial_location): New function.
* gimple.h (gimple_or_expr_nonartificial_location): Declare it.
* tree-ssa-strlen.c (maybe_warn_overflow): Use
gimple_or_expr_nonartificial_location.
(maybe_diag_stxncpy_trunc, handle_builtin_stxncpy_strncat): Likewise.
(maybe_warn_pointless_strcmp): Likewise.
* gcc.dg/pragma-diag-8.c: New test.
As the testcases show, the -O0 macros for intrinsics that require constant
argument(s) should first cast each argument to the type the -O1+ inline
function uses, and only afterwards to whatever type e.g. a builtin needs.
The PR reported one macro which violated this. I've grepped for all double
casts and filtered out the meaningful ones, where the first cast is to a
__m{128,256,512}{,d,i} type that is then cast to a same-sized __v* type with
the same kind of element type (float, double, integral). The remaining 7
macros were using different casts, and I've double-checked them against the
inline function types.
2020-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/94046
* config/i386/avx2intrin.h (_mm_mask_i32gather_ps): Fix first cast of
SRC and MASK arguments to __m128 from __m128d.
(_mm256_mask_i32gather_ps): Fix first cast of MASK argument to __m256
from __m256d.
(_mm_mask_i64gather_ps): Fix first cast of MASK argument to __m128
from __m128d.
* config/i386/xopintrin.h (_mm_permute2_pd): Fix first cast of C
argument to __m128i from __m128d.
(_mm256_permute2_pd): Fix first cast of C argument to __m256i from
__m256d.
(_mm_permute2_ps): Fix first cast of C argument to __m128i from __m128.
(_mm256_permute2_ps): Fix first cast of C argument to __m256i from
__m256.
* g++.target/i386/pr94046-1.C: New test.
* g++.target/i386/pr94046-2.C: New test.
There's a -Wunused-but-set-variable warning in operations/all.cc which
can be fixed with [[maybe_unused]].
The statements in operations/copy.cc give -Wunused-value warnings. I
think I meant to use |= rather than !=.
And operations/file_size.cc gets -Wsign-compare warnings.
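The |= versus != slip in copy.cc can be sketched like this. The functions
are hypothetical and not taken from the testsuite; the explicit void cast in
the buggy variant is there only to keep this sketch warning-free.

```cpp
// Hypothetical check that reports a failure by returning true.
inline bool step_failed(int i) { return i == 2; }

// Buggy form: "failed != step_failed(i)" only compares and discards the
// result, so "failed" never changes. Without the void cast this is what
// -Wunused-value points at.
inline bool any_failed_buggy(int n)
{
  bool failed = false;
  for (int i = 0; i < n; ++i)
    (void) (failed != step_failed(i));
  return failed;
}

// Intended form: accumulate the result of every step.
inline bool any_failed_fixed(int n)
{
  bool failed = false;
  for (int i = 0; i < n; ++i)
    failed |= step_failed(i);
  return failed;
}
```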
* testsuite/27_io/filesystem/operations/all.cc: Mark unused variable.
* testsuite/27_io/filesystem/operations/copy.cc: Fix typo.
* testsuite/experimental/filesystem/operations/copy.cc: Likewise.
* testsuite/27_io/filesystem/operations/file_size.cc: Use correct type
for return value, and in comparison.
* testsuite/experimental/filesystem/operations/file_size.cc: Likewise.
asan_test.cc tries to allocate 0xf0000000 bytes for 32bit targets in
a disabled DISABLED_DemoOOM test. Since the testcase is compiled with
-Werror, the compilation fails with:
error: argument 1 value '4026531840' exceeds maximum object size 2147483647
Compile with -Wno-alloc-size-larger-than to avoid compilation failure.
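The numbers in the error message can be checked with a few constants. The
variable names are hypothetical; the limit is the value quoted in the
diagnostic, which equals 2^31 - 1.

```cpp
#include <cstdint>

// Size the disabled test tries to allocate.
constexpr std::uint64_t requested = 0xf0000000u;
// Maximum object size quoted in the diagnostic (2^31 - 1).
constexpr std::uint64_t max_object_32 = 0x7fffffffu;

static_assert(requested == 4026531840ull, "matches the value in the error");
static_assert(max_object_32 == 2147483647ull, "matches the quoted limit");
static_assert(requested > max_object_32, "hence the warning, an error with -Werror");
```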
* g++.dg/asan/asan_test.C (dg-options): Add
-Wno-alloc-size-larger-than.
I don't think this is actually required to compile, because using
operator<< without a definition of the ostream doesn't seem valid to me.
But it's easy to make it work.
PR libstdc++/94051
* include/std/string_view: Include <bits/ostream_insert.h>.
* testsuite/21_strings/basic_string_view/inserters/94051.cc: New test.
A BOZ constant cannot appear as a component initialiser for a derived
type.
gcc/fortran/ChangeLog:
PR93792
* decl.c (variable_decl): If param and initializer check
for BOZ, if found, output an error, set m to MATCH_ERROR
and goto cleanup.
gcc/testsuite/ChangeLog:
PR93792
* gfortran.dg/pr93792.f90: New test.
This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and vfmat
as part of the BFloat16 extension.
(https://developer.arm.com/docs/101028/latest.)
The intrinsics are declared in arm_neon.h and the RTL patterns are
defined in neon.md.
Two new tests are added to check assembler output and lane indices.
2020-03-05 Delia Burduv <delia.burduv@arm.com>
* config/arm/arm_neon.h (vbfmmlaq_f32): New.
(vbfmlalbq_f32): New.
(vbfmlaltq_f32): New.
(vbfmlalbq_lane_f32): New.
(vbfmlaltq_lane_f32): New.
(vbfmlalbq_laneq_f32): New.
(vbfmlaltq_laneq_f32): New.
* config/arm/arm_neon_builtins.def (vmmla): New.
(vfmab): New.
(vfmat): New.
(vfmab_lane): New.
(vfmat_lane): New.
(vfmab_laneq): New.
(vfmat_laneq): New.
* config/arm/iterators.md (BF_MA): New int iterator.
(bt): New int attribute.
(VQXBF): Copy of VQX with V8BF.
* config/arm/neon.md (neon_vmmlav8bf): New insn.
(neon_vfma<bt>v8bf): New insn.
(neon_vfma<bt>_lanev8bf): New insn.
(neon_vfma<bt>_laneqv8bf): New expand.
(neon_vget_high<mode>): Changed iterator to VQXBF.
* config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC.
(UNSPEC_BFMAB): New UNSPEC.
(UNSPEC_BFMAT): New UNSPEC.
2020-03-05 Delia Burduv <delia.burduv@arm.com>
* gcc.target/arm/simd/bf16_ma_1.c: New test.
* gcc.target/arm/simd/bf16_ma_2.c: New test.
* gcc.target/arm/simd/bf16_mmla_1.c: New test.
The following testcase fails to assemble, as CONST_STRING in the DEBUG_INSNs
is printed as is, so if it contains \n and/or \r, we are in trouble:
.loc 1 14 3
# DEBUG haystack => [si]
# DEBUG needle => "
"
In the gimple dumps we print those (STRING_CSTs) as
# DEBUG haystack => D#1
# DEBUG needle => "\n"
so this patch uses what we use in tree printing for the CONST_STRINGs too.
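The effect of the escaping can be sketched with a small stand-alone helper.
This is a hypothetical function, not GCC's pretty_print_string, and it
handles only a handful of characters.

```cpp
#include <string>

// Hypothetical helper: returns a copy of S in which a few control and
// quoting characters are replaced by C-style escape sequences, so that the
// result stays on one line when printed inside an assembler comment.
std::string escape_for_dump(const std::string& s)
{
  std::string out;
  for (char c : s)
    switch (c)
      {
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      default:   out += c; break;
      }
  return out;
}
```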
2020-03-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/93399
* tree-pretty-print.h (pretty_print_string): Declare.
* tree-pretty-print.c (pretty_print_string): Remove forward
declaration, no longer static. Change nbytes parameter type
from unsigned to size_t.
* print-rtl.c (print_value) <case CONST_STRING>: Use
pretty_print_string and shrink way too long strings.
* gcc.dg/pr93399.c: New test.
> > where POINTER_PLUS_EXPR last operand has sizetype type, thus unsigned,
> > and in the testcase gimple_assign_rhs2 (def) is thus 0xf000000000000001ULL
> > which multiplied by 8 doesn't fit into signed HWI. If it would be treated
> > as signed offset instead, it would fit (-0xfffffffffffffffLL, multiplied
> > by 8 is -0x7ffffffffffffff8LL). Unfortunately with the poly_int obfuscation
> > I'm not sure how to convert it from unsigned to signed poly_int.
>
> mem_ref_offset provides a boiler-plate for this:
>
> poly_offset_int::from (wi::to_poly_wide (TREE_OPERAND (t, 1)), SIGNED);
Thanks, that seems to work.
The test now works on both big-endian and little-endian.
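The arithmetic in the quoted discussion can be checked with plain 64-bit
integers. This is a sketch with hypothetical names and does not use
poly_int; it assumes the usual two's complement conversion from unsigned to
signed.

```cpp
#include <cstdint>

// Value of gimple_assign_rhs2 (def) from the testcase, as unsigned sizetype.
constexpr std::uint64_t raw = 0xf000000000000001ULL;

// Reinterpret the same 64 bits as a signed offset.
inline std::int64_t as_signed(std::uint64_t v)
{ return static_cast<std::int64_t>(v); }

// The multiplication by 8 mentioned in the discussion.
inline std::int64_t times_eight(std::int64_t v)
{ return v * 8; }
```

Treated as signed, the offset is -0xfffffffffffffff, and multiplying it by 8
gives -0x7ffffffffffffff8, which still fits in a signed 64-bit value.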
2020-03-05 Richard Biener <rguenther@suse.de>
Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/93582
* tree-ssa-sccvn.c (vn_reference_lookup_3): Treat POINTER_PLUS_EXPR
last operand as signed when looking for memset offset. Formatting
fix.
* gcc.dg/tree-ssa/pr93582-11.c: New test.
Co-authored-by: Richard Biener <rguenther@suse.de>