I went a little bit too simple with implementing SLP gather support
for emulated and builtin based gathers. The following fixes the
conflict that appears when running into .MASK_LOAD where we rely
on vect_get_operand_map and the bolted-on STMT_VINFO_GATHER_SCATTER_P
checking wrecks that. The following properly integrates this with
vect_get_operand_map, adding another special index refering to
the vect_check_gather_scatter analyzed offset.
This unbreaks aarch64 (and hopefully riscv), I'll followup with
more fixes and testsuite coverage for x86 where I think I got
masked gather SLP support wrong.
* tree-vect-slp.cc (off_map, off_op0_map, off_arg2_map,
off_arg3_arg2_map): New.
(vect_get_operand_map): Get flag whether the stmt was
recognized as gather or scatter and use the above
accordingly.
(vect_get_and_check_slp_defs): Adjust.
(vect_build_slp_tree_2): Likewise.
The omp_lock_hint_* parameters were deprecated in favor of
omp_sync_hint_*. While omp.h contained deprecation markers for those,
the omp_lib module only contained them for omp_{g,s}_nested.
Note: The -Wdeprecated-declarations warning will only become active once
openmp_version / _OPENMP is bumped from 201511 (4.5) to 201811 (5.0).
libgomp/ChangeLog:
* omp_lib.f90.in: Tag omp_lock_hint_* as being deprecated when
_OPENMP >= 201811.
1. Remove "m_" prefix as they are not private members.
2. Rename infos -> local_infos, info -> global_info to clarify their meaning.
Pushed as it is obvious.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): Rename variables.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.
(pre_vsetvl::emit_vsetvl): Ditto.
With the patch enabling the vectorization of early-breaks, we'd like to allow
bitfield lowering in such loops, which requires the relaxation of allowing
multiple exits when doing so. In order to avoid a similar issue to PR107275,
the code that rejects loops with certain types of gimple_stmts was hoisted from
'if_convertible_loop_p_1' to 'get_loop_body_in_if_conv_order', to avoid trying
to lower bitfields in loops we are not going to vectorize anyway.
This also ensures 'ifcvt_local_dec' doesn't accidentally remove statements it
shouldn't as it will never come across them. I made sure to add a comment to
make clear that there is a direct connection between the two and if we were to
enable vectorization of any other gimple statement we should make sure both
handle it.
gcc/ChangeLog:
* tree-if-conv.cc (if_convertible_loop_p_1): Move check from here ...
(get_loop_body_if_conv_order): ... to here.
(if_convertible_loop_p): Remove single_exit check.
(tree_if_conversion): Move single_exit check to if-conversion part and
support multiple exits.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-bitfield-read-1-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-2-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-8.c: New test.
* gcc.dg/vect/vect-bitfield-read-9.c: New test.
Co-Authored-By: Andre Vieira <andre.simoesdiasvieira@arm.com>
The bitfield vectorization support does not currently recognize bitfields inside
gconds. This means they can't be used as conditions for early break
vectorization which is a functionality we require.
This adds support for them by explicitly matching and handling gcond as a
source.
Testcases are added in the testsuite update patch as the only way to get there
is with the early break vectorization. See tests:
- vect-early-break_20.c
- vect-early-break_21.c
gcc/ChangeLog:
* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
from original statement.
(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.
Co-Authored-By: Andre Vieira <andre.simoesdiasvieira@arm.com>
Hi, all
This patch aims to fix some scan-asm fail of pr89229-{5,6,7}b.c since we emit
scalar vmov{s,d} here, when trying to use x/ymm 16+ w/o avx512vl but with
avx512f+evex512.
If everyone has no objection to the modification of this behavior, then we tend
to solve these failures by modifying these testcases.
BRs,
Lin
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr89229-5b.c: Modify test.
* gcc.target/i386/pr89229-6b.c: Ditto.
* gcc.target/i386/pr89229-7b.c: Ditto.
The need to initialize edge probabilities has made make_eh_edges
undesirably hard to use. I suppose we don't want make_eh_edges to
initialize the probability of the newly-added edge itself, so that the
caller takes care of it, but identifying the added edge in need of
adjustments is inefficient and cumbersome. Change make_eh_edges so
that it returns the added edge.
for gcc/ChangeLog
* tree-eh.cc (make_eh_edges): Return the new edge.
* tree-eh.h (make_eh_edges): Likewise.
This patch adds checks for attempting to change the active member of a
union by methods other than a member access expression.
To be able to properly distinguish `*(&u.a) = ` from `u.a = `, this
patch redoes the solution for c++/59950 to avoid extranneous *&; it
seems that the only case that needed the workaround was when copying
empty classes.
This patch also ensures that constructors for a union field mark that
field as the active member before entering the call itself; this ensures
that modifications of the field within the constructor's body don't
cause false positives (as these will not appear to be member access
expressions). This means that we no longer need to start the lifetime of
empty union members after the constructor body completes.
As a drive-by fix, this patch also ensures that value-initialised unions
are considered to have activated their initial member for the purpose of
checking stores and accesses, which catches some additional mistakes
pre-C++20.
PR c++/101631
PR c++/102286
gcc/cp/ChangeLog:
* call.cc (build_over_call): Fold more indirect refs for trivial
assignment op.
* class.cc (type_has_non_deleted_trivial_default_ctor): Create.
* constexpr.cc (cxx_eval_call_expression): Start lifetime of
union member before entering constructor.
(cxx_eval_component_reference): Check against first member of
value-initialised union.
(cxx_eval_store_expression): Activate member for
value-initialised union. Check for accessing inactive union
member indirectly.
* cp-tree.h (type_has_non_deleted_trivial_default_ctor):
Forward declare.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-89336-3.C: Fix union initialisation.
* g++.dg/cpp1y/constexpr-union6.C: New test.
* g++.dg/cpp1y/constexpr-union7.C: New test.
* g++.dg/cpp2a/constexpr-union2.C: New test.
* g++.dg/cpp2a/constexpr-union3.C: New test.
* g++.dg/cpp2a/constexpr-union4.C: New test.
* g++.dg/cpp2a/constexpr-union5.C: New test.
* g++.dg/cpp2a/constexpr-union6.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
This patch improves the errors given when casting from void* in C++26 to
include the expected type if the types of the pointed-to objects were
not similar. It also ensures (for all standard modes) that void* casts
are checked even for DECL_ARTIFICIAL declarations, such as
lifetime-extended temporaries, and is only ignored for cases where we
know it's OK (e.g. source_location::current) or have no other choice
(heap-allocated data).
gcc/cp/ChangeLog:
* constexpr.cc (is_std_source_location_current): New.
(cxx_eval_constant_expression): Only ignore cast from void* for
specific cases and improve other diagnostics.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-cast4.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
This patch is an optimization tweak for cp_fold_r. If we cp_fold_r the
COND_EXPR's op0 first, we may be able to evaluate it to a constant if -O.
cp_fold has:
3143 if (callee && DECL_DECLARED_CONSTEXPR_P (callee)
3144 && !flag_no_inline)
...
3151 r = maybe_constant_value (x, /*decl=*/NULL_TREE,
flag_no_inline is 1 for -O0:
1124 if (opts->x_optimize == 0)
1125 {
1126 /* Inlining does not work if not optimizing,
1127 so force it not to be done. */
1128 opts->x_warn_inline = 0;
1129 opts->x_flag_no_inline = 1;
1130 }
but otherwise it's 0 and cp_fold will maybe_constant_value calls to
constexpr functions. And if it doesn't, then folding the COND_EXPR
will keep both arms, and we can avoid calling maybe_constant_value.
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold_r): Don't call maybe_constant_value.
This patch enables the compiler to use inbranch simdclones when generating
masked loops in autovectorization.
gcc/ChangeLog:
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function
compatible with mask parameters in clone.
* tree-vect-stmts.cc (vect_build_all_ones_mask): Allow vector boolean
typed masks.
(vectorizable_simd_clone_call): Enable the use of masked clones in
fully masked loops.
When analyzing a loop and choosing a simdclone to use it is possible to choose
a simdclone that cannot be used 'inbranch' for a loop that can use partial
vectors. This may lead to the vectorizer deciding to use partial vectors which
are not supported for notinbranch simd clones. This patch fixes that by
disabling the use of partial vectors once a notinbranch simd clone has been
selected.
gcc/ChangeLog:
PR tree-optimization/110485
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
vectors usage if a notinbranch simdclone has been selected.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/pr110485.c: New test.
The vect_get_smallest_scalar_type helper function was using any argument to a
simd clone call when trying to determine the smallest scalar type that would be
vectorized. This included the function pointer type in a MASK_CALL for
instance, and would result in the wrong type being selected. Instead this
patch special cases simd_clone_call's and uses only scalar types of the
original function that get transformed into vector types.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special case
simd clone calls and only use types that are mapped to vectors.
(simd_clone_call_p): New helper function.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentation
between targets with different pointer sizes.
* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
Teach parloops how to handle a poly nit and bound e ahead of the changes to
enable non-constant simdlen.
gcc/ChangeLog:
* tree-parloops.cc (try_transform_to_exit_first_loop_alt): Accept
poly NIT and ALT_BOUND.
SVE simd clones require to be compiled with a SVE target enabled or the argument
types will not be created properly. To achieve this we need to copy
DECL_FUNCTION_SPECIFIC_TARGET from the original function declaration to the
clones. I decided it was probably also a good idea to copy
DECL_FUNCTION_SPECIFIC_OPTIMIZATION in case the original function is meant to
be compiled with specific optimization options.
gcc/ChangeLog:
* tree-parloops.cc (create_loop_fn): Copy specific target and
optimization options to clone.
On merge, reuse a merged node's possibly cached hash code only if we are on the
same type of hash and this hash is stateless.
Usage of function pointers or std::function as hash functor will prevent reusing
cached hash code.
libstdc++-v3/ChangeLog
* include/bits/hashtable_policy.h
(_Hash_code_base::_M_hash_code(const _Hash&, const _Hash_node_value<>&)): Remove.
(_Hash_code_base::_M_hash_code<_H2>(const _H2&, const _Hash_node_value<>&)): Remove.
* include/bits/hashtable.h
(_M_src_hash_code<_H2>(const _H2&, const key_type&, const __node_value_type&)): New.
(_M_merge_unique<>, _M_merge_multi<>): Use latter.
* testsuite/23_containers/unordered_map/modifiers/merge.cc
(test04, test05, test06): New test cases.
In the case of convert_argument, we would return the same expression
back rather than error_mark_node after the error message about
trying to convert to an incomplete type. This causes issues in
the gimplfier trying to see if another conversion is needed.
The code here dates back to before the revision history too so
it might be the case it never noticed we should return an error_mark_node.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR c/100532
gcc/c/ChangeLog:
* c-typeck.cc (convert_argument): After erroring out
about an incomplete type return error_mark_node.
gcc/testsuite/ChangeLog:
* gcc.dg/pr100532-1.c: New test.
In a similar way we don't warn about NULL pointer constant conversion to
a different named address we should not warn to a different sso endian
either.
This adds the simple check.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR c/104822
gcc/c/ChangeLog:
* c-typeck.cc (convert_for_assignment): Check for null pointer
before warning about an incompatible scalar storage order.
gcc/testsuite/ChangeLog:
* gcc.dg/sso-18.c: New test.
* gcc.dg/sso-19.c: New test.
While checking another change, I noticed that the new permerror overloads
break gettext with "permerror used incompatibly as both
--keyword=permerror:2 --flag=permerror:2:gcc-internal-format and
--keyword=permerror:3 --flag=permerror:3:gcc-internal-format". So let's
change the name.
gcc/ChangeLog:
* diagnostic-core.h (permerror): Rename new overloads...
(permerror_opt): To this.
* diagnostic.cc: Likewise.
gcc/cp/ChangeLog:
* typeck2.cc (check_narrowing): Adjust.
Since these strings are passed to error_at, they should be marked for
translation with G_, like other diagnostic messages, rather than _, which
forces immediate (redundant) translation. The use of N_ is less
problematic, but also imprecise.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_primary_expression): Use G_.
(cp_parser_using_enum): Likewise.
* decl.cc (identify_goto): Likewise.
SPARK RM 6.1.11 introduces a new aspect Side_Effects to denote
those functions which may have output parameters, write global
variables, raise exceptions and not terminate. This adds support
for this aspect and the corresponding pragma in the frontend.
Handling of this aspect in the frontend is very similar to
the handling of aspect Extensions_Visible: both are Boolean
aspects whose expression should be static, they can be specified
on the same entities, with the same rule of inheritance from
overridden to overriding primitives for tagged types.
There is no impact on code generation.
gcc/ada/
* aspects.ads: Add aspect Side_Effects.
* contracts.adb (Add_Pre_Post_Condition)
(Inherit_Subprogram_Contract): Add support for new contract.
* contracts.ads: Update comments.
* einfo-utils.adb (Get_Pragma): Add support.
* einfo-utils.ads (Prag): Update comment.
* errout.ads: Add explain codes.
* par-prag.adb (Prag): Add support.
* sem_ch13.adb (Analyze_Aspect_Specifications)
(Check_Aspect_At_Freeze_Point): Add support.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper)
(Analyze_Subprogram_Declaration): Call new analysis procedure to
check SPARK legality rules.
(Analyze_SPARK_Subprogram_Specification): New procedure to check
SPARK legality rules. Use an explain code for the error.
(Analyze_Subprogram_Specification): Move checks to new subprogram.
This code was effectively dead, as the kind for parameters was set
to E_Void at this point to detect early references.
* sem_ch6.ads (Analyze_Subprogram_Specification): Add new
procedure.
* sem_prag.adb (Analyze_Depends_In_Decl_Part)
(Analyze_Global_In_Decl_Part): Adapt legality check to apply only
to functions without side-effects.
(Analyze_If_Present): Extract functionality in new procedure
Analyze_If_Present_Internal.
(Analyze_If_Present_Internal): New procedure to analyze given
pragma kind.
(Analyze_Pragmas_If_Present): New procedure to analyze given
pragma kind associated with a declaration.
(Analyze_Pragma): Adapt support for Always_Terminates and
Exceptional_Cases. Add support for Side_Effects. Make sure to call
Analyze_If_Present to ensure pragma Side_Effects is analyzed prior
to analyzing pragmas Global and Depends. Use explain codes for the
errors.
* sem_prag.ads (Analyze_Pragmas_If_Present): Add new procedure.
* sem_util.adb (Is_Function_With_Side_Effects): New query function
to determine if a function is a function with side-effects.
* sem_util.ads (Is_Function_With_Side_Effects): Same.
* snames.ads-tmpl: Declare new names for pragma and aspect.
* doc/gnat_rm/implementation_defined_aspects.rst: Document new aspect.
* doc/gnat_rm/implementation_defined_pragmas.rst: Document new pragma.
* gnat_rm.texi: Regenerate.
Rewrite for loop containing an exit (which violates GNATcheck
rule Exits_From_Conditional_Loops), to use a while loop
which contains the exit criteria in its condition.
Also, move special case of first time through loop, to come
before loop.
gcc/ada/
* libgnat/s-imagef.adb (Set_Image_Fixed): Refactor loop.
Exempt the GNATcheck rule "Unassigned_OUT_Parameters"
with the rationale "the OUT parameter is assigned by component".
gcc/ada/
* libgnat/s-imguti.adb (Set_Decimal_Digits): Add pragma to exempt
Unassigned_OUT_Parameters.
(Set_Floating_Invalid_Value): Likewise
Add documentation for the -Q gnatbind switch in GNAT User's Guide and
improve gnatbind's help output for the switch to emphasize that it adds the
requested number of stacks to the secondary stack pool generated by the
binder.
gcc/ada/
* bindusg.adb (Display): Make it clear -Q adds to the number of
secondary stacks generated by the binder.
* doc/gnat_ugn/building_executable_programs_with_gnat.rst:
Document the -Q gnatbind switch and fix references to old
runtimes.
* gnat-style.texi: Regenerate.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
This patch is intended as a readability improvement. It doesn't
change the behavior of the compiler.
gcc/ada/
* sem_ch3.adb (Constrain_Array): Replace manual list length
computation by call to List_Length.
As noted on the PR, commit r13-1544, the fix for PR53431, did not handle
the specific case of -Wunknown-pragmas, because that warning is issued
during preprocessing, but not by libcpp directly (it comes from the
cb_def_pragma callback). Address that by handling this pragma in
addition to libcpp pragmas during the early pragma handler.
gcc/c-family/ChangeLog:
PR c++/89038
* c-pragma.cc (handle_pragma_diagnostic_impl): Handle
-Wunknown-pragmas during early processing.
gcc/testsuite/ChangeLog:
PR c++/89038
* c-c++-common/cpp/Wunknown-pragmas-1.c: New test.
This PR was fixed by r12-4797 and r12-5454. Add test coverage from the PR
that is not represented elsewhere.
gcc/testsuite/ChangeLog:
PR preprocessor/82335
* c-c++-common/cpp/diagnostic-pragma-3.c: New test.
As the testcase shows, when a PHI node dominates the loop there is no new
definition inside the loop. As such there would be no PHI nodes to update.
When we maintain LCSSA form we create an intermediate node in between the two
loops to thread alongt the value. However later on when we update the second
loop we don't have any PHI nodes to update and so adjust_phi_and_debug_stmts
does nothing. This leaves us with an incorrect phi node. Normally this does
nothing and just gets ignored. But in the case of the vUSE chain we end up
corrupting the chain.
As such whenever a PHI node's argument dominates the loop, we should remove
the newly created PHI node after edge redirection.
The one exception to this is when the loop has been versioned. In such cases
the versioned loop may not use the value but the second loop can.
When this happens and we add the loop guard unless the join block has the PHI
it can't find the original value for use inside the guard block.
The next refactoring in the series moves the formation of the guard block
inside peeling itself. Here we have all the information and wouldn't
need to re-create it later.
gcc/ChangeLog:
PR tree-optimization/111860
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Remove PHI nodes that dominate loop.
gcc/testsuite/ChangeLog:
PR tree-optimization/111860
* gcc.dg/vect/pr111860.c: New test.
The following implements SLP vectorization support for gathers
without relying on IFNs being pattern detected (and supported by
the target). That includes support for emulated gathers but also
the legacy x86 builtin path.
PR tree-optimization/111131
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Make
sure to update all gather/scatter stmt DRs, not only those
that eventually got VMAT_GATHER_SCATTER set.
* tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add.
(vect_get_and_check_slp_defs): Handle gathers/scatters,
adding the offset as SLP operand and comparing base and scale.
(vect_build_slp_tree_1): Handle gathers.
(vect_build_slp_tree_2): Likewise.
* gcc.dg/vect/vect-gather-1.c: Now expected to vectorize
everywhere.
* gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere.
Massage the scale case to more reliably produce a different
one. Scan for the specific messages.
* gcc.dg/vect/vect-gather-3.c: Masked gather is also supported
for AVX2, but not emulated.
* gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere.
Massage to more properly ensure this.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize
everywhere.
The following moves the builtin decl gather vectorization path along
the internal function and emulated gather vectorization paths,
simplifying the existing function down to generating the call and
required conversions to the actual argument types. This thereby
exposes the unique support of two times larger number of offset
or data vector lanes. It also makes the code path handle SLP
in principle (but SLP build needs adjustments for this, patch coming).
* tree-vect-stmts.cc (vect_build_gather_load_calls): Rename
to ...
(vect_build_one_gather_load_call): ... this. Refactor,
inline widening/narrowing support ...
(vectorizable_load): ... here, do gather vectorization
with builtin decls along other gather vectorization.
The test is trying to check that we don't use q-register stores with
-mstrict-align, so actually check specifically for that.
This is a prerequisite to avoid regressing:
scan-assembler-not "add\tx0, x0, :"
with the upcoming ldp fusion pass, as we change where the ldps are
formed such that a register is used rather than a symbolic (lo_sum)
address for the first load.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr71727.c: Adjust scan-assembler-not to
make sure we don't have q-register stores with -mstrict-align.
With the new ldp/stp pass enabled, there is a change in the codegen for
this test as follows:
add x8, sp, 16
ptrue p3.h, mul3
str p3, [x8]
- str x8, [sp, 8]
- str x9, [sp]
+ stp x9, x8, [sp]
ptrue p3.d, vl8
ptrue p2.s, vl7
ptrue p1.h, vl6
i.e. we now form an stp that we were missing previously. This patch
adjusts the scan-assembler such that it should pass whether or not
we form the stp.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pcs/args_9.c: Adjust scan-assemblers to
allow for stp.
The test is looking for individual stores which are able to be merged
into stp instructions. The test currently passes -fno-schedule-fusion
-fno-peephole2, presumably to prevent these stores from being turned
into stps, but this is no longer sufficient with the new ldp/stp fusion
pass.
As such, we add --param=aarch64-stp-policy=never to prevent stps being
formed.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/lr_free_1.c: Add
--param=aarch64-stp-policy=never to dg-options.
Currently, rtl_ssa::change_insns requires all new uses and defs to be
specified explicitly. This turns out to be rather inconvenient for
forming load pairs in the new aarch64 load pair pass, as the pass has to
determine which mem def the final load pair consumes, and then obtain or
create a suitable use (i.e. significant bookkeeping, just to keep the
RTL-SSA IR consistent). It turns out to be much more convenient to
allow change_insns to infer which def is consumed and create a suitable
use of mem itself. This patch does that.
gcc/ChangeLog:
* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new
parameter to give final insn position, infer use of mem if it isn't
specified explicitly.
(function_info::change_insns): Pass down final insn position to
finalize_new_accesses.
* rtl-ssa/functions.h: Add parameter to finalize_new_accesses.
This is needed by the upcoming aarch64 load pair pass, as it can
re-order stores (when alias analysis determines this is safe) and thus
change which mem def a given use consumes (in the RTL-SSA view, there is
no alias disambiguation of memory).
gcc/ChangeLog:
* rtl-ssa/accesses.cc (function_info::reparent_use): New.
* rtl-ssa/functions.h (function_info): Declare new member
function reparent_use.
Add a helper routine to access-utils.h which removes the memory access
from an access_array, if it has one.
gcc/ChangeLog:
* rtl-ssa/access-utils.h (drop_memory_access): New.
In the case that !insn->is_debug_insn () && next->is_debug_insn (), this
function was missing an update of the prev pointer on the first nondebug
insn following the sequence of debug insns starting at next.
This can lead to corruption of the insn chain, in that we end up with:
insn->next_any_insn ()->prev_any_insn () != insn
in this case. This patch fixes that.
gcc/ChangeLog:
* rtl-ssa/insns.cc (function_info::add_insn_after): Ensure we
update the prev pointer on the following nondebug insn in the
case that !insn->is_debug_insn () && next->is_debug_insn ().
gcc/ChangeLog:
* config/i386/i386.h: Correct the ISA enabled for Arrow Lake.
Also make Clearwater Forest depends on Sierra Forest.
* config/i386/i386-options.cc: Revise the order of the macro
definition to avoid confusion.
* doc/extend.texi: Revise documentation.
* doc/invoke.texi: Correct documentation.
gcc/testsuite/ChangeLog:
* gcc.target/i386/funcspec-56.inc: Group Clearwater Forest
with atom cores.
LLVM wants to remove it, which breaks our build. This patch means that
most users won't notice that change, when it comes, and those that do will
have chosen to enable Fiji explicitly.
I'm selecting gfx900 as the new default as that's the least likely for users
to want, which means most users will specify -march explicitly, which means
we'll be free to change the default again, when we need to, without breaking
anybody's makefiles.
gcc/ChangeLog:
* config.gcc (amdgcn): Switch default to --with-arch=gfx900.
Implement support for --with-multilib-list.
* config/gcn/t-gcn-hsa: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Mark Fiji deprecated.
This patch make loongarch use the new vector hooks and implements the costing
function determine_suggested_unroll_factor, to make it be able to suggest the
unroll factor for a given loop being vectorized base vec_ops analysis during
vector costing and the available issue information. Referring to aarch64 and
rs6000 port.
The patch also reduces the cost of unaligned stores, making it equal to the
cost of aligned ones in order to avoid odd alignment peeling.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_vector_costs): Inherit from
vector_costs. Add a constructor.
(loongarch_vector_costs::add_stmt_cost): Use adjust_cost_for_freq to
adjust the cost for inner loops.
(loongarch_vector_costs::count_operations): New function.
(loongarch_vector_costs::determine_suggested_unroll_factor): Ditto.
(loongarch_vector_costs::finish_cost): Ditto.
(loongarch_builtin_vectorization_cost): Adjust.
* config/loongarch/loongarch.opt (loongarch-vect-unroll-limit): New parameter.
(loongarcg-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* config/loongarch/genopts/loongarch.opt.in
(loongarch-vect-unroll-limit): Ditto.
(loongarcg-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* doc/invoke.texi (loongarcg-vect-unroll-limit): Document new option.
Add support for vec_widen lo/hi patterns. These do not directly
match on Loongarch lasx instructions but can be emulated with
even/odd + vector merge.
gcc/ChangeLog:
* config/loongarch/lasx.md
(vec_widen_<su>mult_even_v8si): New patterns.
(vec_widen_<su>add_hi_<mode>): Ditto.
(vec_widen_<su>add_lo_<mode>): Ditto.
(vec_widen_<su>sub_hi_<mode>): Ditto.
(vec_widen_<su>sub_lo_<mode>): Ditto.
(vec_widen_<su>mult_hi_<mode>): Ditto.
(vec_widen_<su>mult_lo_<mode>): Ditto.
* config/loongarch/loongarch.md (u_bool): New iterator.
* config/loongarch/loongarch-protos.h
(loongarch_expand_vec_widen_hilo): New prototype.
* config/loongarch/loongarch.cc
(loongarch_expand_vec_interleave): New function.
(loongarch_expand_vec_widen_hilo): New function.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vect-widen-add.c: New test.
* gcc.target/loongarch/vect-widen-mul.c: New test.
* gcc.target/loongarch/vect-widen-sub.c: New test.