mirror/gcc

mirror of git://gcc.gnu.org/git/gcc.git synced 2024-11-28 09:20:03 +08:00

Author	SHA1	Message	Date
Jeff Law	df2e832c90	[RISC-V][PR target/109279] Improve RISC-V constant synthesis This is a small improvement to the constant synthesis code to capture a case appended to PR 109279. The case in question has the property that the high 32 bits have the value one less than the low 32 bits and the highest bit in the low 32 bits is on. The example used in BZ is 0xcccccccccccccccd which comes up computing N/10. When we construct a constant with bit 31 on, it gets implicitly sign extended. So something like 0xcccccccd when constructed would generate 0xffffffffcccccccd. The low bits are precisely what we want and the high bits are a "-1". Both properties are useful. We left shift that value by 32 positions into a temporary and add that temporary to the original value. Concretely: 0xffffffffcccccccd + 0xcccccccd00000000 ------------------ 0xcccccccccccccccd Tested in my tester on rv32 and rv64, waiting on the pre-commit tester to do its thing. PR target/109279 gcc/ * config/riscv/riscv.cc (riscv_build_integer): Handle another 64-bit synthesis where high half is one less than the low half and the 32-bit sign bit is on. gcc/testsuite/ * gcc.target/riscv/synthesis-16.c: New test.	2024-11-22 16:12:45 -07:00
Andrew Pinski	76c2023294	text-art: Fix comment in types.h The comment references INCLUDE_MEMORY but the code actually checks INCLUDE_VECTOR. So fix up the comment to mention INCLUDE_VECTOR. Pushed as obvious. gcc/ChangeLog: * text-art/types.h: Fix comment. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>	2024-11-22 13:13:57 -08:00
Georg-Johann Lay	982d10b74b	AVR: Tabify avr-common.cc according to coding rules. gcc/ * common/config/avr/avr-common.cc: Tabify.	2024-11-22 21:51:10 +01:00
Georg-Johann Lay	939362411d	AVR: target/117726 - Tweak ashiftrt:SI and lshiftrt:SI insns. This patch is similar to r15-5569 (tweak ashift:SI) but for ashiftrt and lshiftrt codes. It splits constant shift offsets > 16 into a 3-operand byte shift and a 2-operand residual bit shift. Moreover, some of the constraint alternatives have been promoted to 3-operand alternatives regardless of options. For example, ashift:HI and lshiftrt:HI can support 3 operands for offsets 9...12 without any overhead. Apart from that, it's a bit of code clean up for 2-byte and 4-byte shift insns: Use one RTL peephole with any_shift code iterator instead of 3 individual peepholes. It also removes some useless split insns; presumably introduced during the cc0 -> CCmode work. PR target/117726 gcc/ * config/avr/avr-passes.cc (avr_split_shift): Also handle ASHIFTRT and LSHIFTRT codes for 4-byte shifts. (constr_split_shift4): New code_attr. (avr_emit_shift): Adjust to new shift capabilities. * config/avr/predicates.md (scratch_or_d_register_operand): rename to scratch_or_dreg_operand. * config/avr/avr.md: Same. (define_peephole2): Write the RTL scratch peephole for 2-byte and 4-byte shifts that generates sh<mode>3_const insns using code iterator any_shift. (ashlhi3_const_split, ashrhi3_const_split, ashrhi3_const_split) (lshrsi3_const_split, lshrhi3_const_split): Remove useless split insns. (define_split) [avropt_split_bit_shift]: Add splitters for 4-byte ASHIFTRT and LSHIFTRT insns using avr_split_shift(). (ashrsi3, ashrsi3, ashrsi3_const): Add "r,0,C4a" and "r,r,C4a" constraint alternatives depending on 2op, 3op. (lshrsi3, lshrsi3, lshrsi3_const): Add "r,0,C4r" and "r,r,C4r" constraint alternatives depending on 2op, 3op. Add "r,r,C15". (lshrhi3, lshrhi3, lshrhi3_const, ashlhi3, ashlhi3) (ashlhi3_const): Add "r,r,C7c" alternative. (ashrpsi, ashrpsi3): Add "r,r,C22" alternative. (ashlqi, ashlqi): Turn C06 alternative into "r,r,C06". config/avr/constraints.md (C14, C22, C30, C7c): New constraints. 
* config/avr/avr.cc (ashlhi3_out, lshrhi3_out) [case 7, 9, 10, 11, 12]: Support as 3-operand insn. (lshrsi3_out) [case 15]: Same. (ashrsi3_out) [case 30]: Same. (ashrhi3_out) [case 14]: Same. (ashrqi3_out) [case 6]: Same. (avr_out_ashrpsi3) [case 22]: Same. * config/avr/avr.h: Fix comment typo. * doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Document.	2024-11-22 21:47:31 +01:00
Joseph Myers	84a335eb4f	c: Fix typeof_unqual handling of qualified array types [PR112841] As reported in bug 112841, typeof_unqual fails to remove qualifiers from qualified array types. In C23 (unlike in previous standard versions), array types are considered to have the qualifiers of the element type, so typeof_unqual should remove such qualifiers (and an example in the standard shows that is as intended). Fix this by calling strip_array_types when checking for the presence of qualifiers. (The reason we check for qualifiers rather than just using TYPE_MAIN_VARIANT unconditionally is to avoid, as a quality of implementation matter, unnecessarily losing typedef information in the case where the type is already unqualified.) Bootstrapped with no regressions for x86_64-pc-linux-gnu. PR c/112841 gcc/c/ * c-parser.cc (c_parser_typeof_specifier): Call strip_array_types when checking for type qualifiers for typeof_unqual. gcc/testsuite/ * gcc.dg/c23-typeof-4.c: New test.	2024-11-22 20:33:10 +00:00
Siddhesh Poyarekar	684595188d	tree-optimization/117355: object size for PHI nodes with negative offsets When the object size estimate is returned for a PHI node, it is the maximum possible value, which is fine in isolation. When combined with negative offsets however, it may sometimes end up in zero size because the resultant size was larger than the wholesize, leading size_for_offset to conclude that there's a potential underflow. Fix this by allowing a non-strict mode to size_for_offset, which conservatively returns the size (or wholesize) in case of a negative offset. gcc/ChangeLog: PR tree-optimization/117355 * tree-object-size.cc (size_for_offset): New argument STRICT, return SZ if it is set to false. (plus_stmt_object_size): Adjust call to SIZE_FOR_OFFSET. gcc/testsuite/ChangeLog: PR tree-optimization/117355 * g++.dg/ext/builtin-object-size2.C (test9): New test. (main): Call it. * gcc.dg/builtin-object-size-3.c (test10): Adjust expected size. Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>	2024-11-22 15:25:09 -05:00
Georg-Johann Lay	5f95136e5e	AVR: Use Var(avropt_xxx) for option variables in avr.opt. This is a no-op refactoring that uses a prefix of avropt_ (formerly: avr_) for variables defined qua Var() directives in avr.opt. This makes it easier to spot values that come directly from avr.opt in the rest of the backend. gcc/ * config/avr/avr.opt (avr_bits_e, avr_lra_p, avr_mmcu) (avr_gasisr_prologues, avr_n_flash, avr_log_details) (avr_branch_cost, avr_split_bit_shift, avr_strict_X) (avr_flmap, avr_rodata_in_ram, avr_sp8, avr_fuse_add) (avr_warn_addr_space_convert, avr_warn_misspelled_isr) (avr_fuse_move, avr_double, avr_long_double): Rename to respectively: avropt_bits_e, avropt_lra_p, avropt_mmcu, avropt_gasisr_prologues, avropt_n_flash, avropt_log_details, avropt_branch_cost, avropt_split_bit_shift, avropt_strict_X, avropt_flmap, avropt_rodata_in_ram, avropt_sp8, avropt_fuse_add, avropt_warn_addr_space_convert, avropt_warn_misspelled_isr, avropt_fuse_move, avropt_double, avropt_long_double. * config/avr/avr.h: Same. * config/avr/avr.cc: Same. * config/avr/avr.md: Same. * config/avr/avr-passes.cc * config/avr/avr-log.cc: Same. * common/config/avr/avr-common.cc: Same.	2024-11-22 21:08:38 +01:00
Jakub Jelinek	27778979c9	Add -f{,no-}assume-sane-operators-new-delete options [PR110137] The following patch adds a new option for optimizations related to replaceable global operators new/delete. The option isn't called -fassume-sane-operator-new (which clang++ implements), because 1) clang++ option means something different; initially it was an option to add malloc attribute to those declarations (but we have malloc attribute on all <new> calls already unconditionally); later it was changed to add noalias attribute rather than malloc, whatever it means, but it is certainly about the return value from the operator new (whether it can alias with other pointers); we already assume malloc-ish behavior that it doesn't alias any other pointers 2) the option only affects operator new, we want it affect also operator delete The option basically allows to choose between pre-PR101480 behavior (now the default, more optimistic) and post-PR101480 behavior (safer but penalizing most of the code in the wild for rare needs). I've tried to explain stuff in the documentation too. 2024-11-22 Jakub Jelinek <jakub@redhat.com> PR c++/110137 PR middle-end/101480 gcc/ * doc/invoke.texi (-fassume-sane-operators-new-delete, -fno-assume-sane-operators-new-delete): Document. * gimple.cc (gimple_call_fnspec): Handle -f{,no-}assume-sane-operators-new-delete. * ipa-inline-transform.cc (inline_call): Also clear flag_assume_sane_operators_new_delete on caller when inlining -fno-assume-sane-operators-new-delete callee into -fassume-sane-operators-new-delete caller. gcc/c-family/ * c.opt (fassume-sane-operators-new-delete): New option. gcc/testsuite/ * g++.dg/tree-ssa/pr110137-1.C: New test. * g++.dg/tree-ssa/pr110137-2.C: New test. * g++.dg/tree-ssa/pr110137-3.C: New test. * g++.dg/tree-ssa/pr110137-4.C: New test. * g++.dg/torture/pr10148.C: Add -fno-assume-sane-operators-new-delete as dg-additional-options. * g++.dg/warn/Warray-bounds-16.C: Revert 2021-11-10 changes.	2024-11-22 19:52:35 +01:00
Jakub Jelinek	c25c172959	match.pd: Fix up the new simplifiers using with_possible_nonzero_bits2 [PR117420] The following testcase shows wrong-code caused by incorrect use of with_possible_nonzero_bits2. That matcher is defined as /* Slightly extended version, do not make it recursive to keep it cheap. */ (match (with_possible_nonzero_bits2 @0) with_possible_nonzero_bits@0) (match (with_possible_nonzero_bits2 @0) (bit_and:c with_possible_nonzero_bits@0 @2)) and because with_possible_nonzero_bits includes the SSA_NAME case with integral/pointer argument, both forms can actually match when a SSA_NAME with integral/pointer type has a def stmt which is BIT_AND_EXPR assignment with say SSA_NAME with integral/pointer type as one of its operands (or INTEGER_CST, another with_possible_nonzero_bits case). And in match.pd the latter actually wins if both match and so when using (with_possible_nonzero_bits2 @0) the @0 will actually be one of the BIT_AND_EXPR operands if that form is matched. Now, with_possible_nonzero_bits2 and with_certain_nonzero_bits2 were added for the /* X == C (or X & Z == Y \| C) is impossible if ~nonzero(X) & C != 0. */ (for cmp (eq ne) (simplify (cmp:c (with_possible_nonzero_bits2 @0) (with_certain_nonzero_bits2 @1)) (if (wi::bit_and_not (wi::to_wide (@1), get_nonzero_bits (@0)) != 0) { constant_boolean_node (cmp == NE_EXPR, type); }))) simplifier, but even for that one I think they do not do a good job, they might actually pessimize stuff rather than optimize, but at least does not result in wrong-code, because the operands are solely tested with wi::to_wide or get_nonzero_bits, but not actually used in the simplification. The reason why it can pessimize stuff is say if we have # RANGE [irange] int ... MASK 0xb VALUE 0x0 x_1 = ...; # RANGE [irange] int ... 
MASK 0x8 VALUE 0x0 _2 = x_1 & 0xc; _3 = _2 == 2; then if it used just with_possible_nonzero_bits@0, @0 would have get_nonzero_bits (@0) 0x8 and (2 & ~8) != 0, so we can fold it into _3 = 0; But as it uses (with_possible_nonzero_bits2 @0), @0 is x_1 rather than _2 and get_nonzero_bits (@0) is unnecessarily conservative, 0xb rather than 0x8 and (2 & ~0xb) == 0, so we don't optimize. Now, with_possible_nonzero_bits2 can actually improve stuff as well in that pattern, if say value ranges aren't fully computed yet or the BIT_AND_EXPR assignment has been added later and the lhs doesn't have range computed yet, get_nonzero_range on the BIT_AND_EXPR lhs will be all bits set, while on the BIT_AND_EXPR operand might actually succeed. I believe better would be to either modify get_nonzero_bits so that it special cases the SSA_NAME with BIT_AND_EXPR def_stmt (but one level deep only like with_possible_nonzero_bits2, no recursion), in that case return bitwise and of get_nonzero_bits (non-recursive) for the lhs and both operands, and possibly BIT_AND_EXPR itself e.g. for GENERIC matching during by returning bitwise and of both operands. Then with_possible_nonzero_bits2 could be needed for the GENERIC case, perhaps have the second match #if GENERIC, but changed so that the @N operand always is the whole thing rather than its operand which is error-prone. Or add get_nonzero_bits wrapper with a different name which would do that. with_certain_nonzero_bits2 could be changed similarly, these days we can test known non-zero bits rather than possible non-zero bits on SSA_NAMEs too, we record both mask and value, so possible nonzero bits (aka. get_nonzero_bits) is mask () \| value (), while known nonzero bits is value () & ~mask (), with a new function (get_known_nonzero_bits or get_certain_nonzero_bits etc.) which handles that. Anyway, the following patch doesn't do what I wrote above just yet, for that single pattern it is just a missed optimization. 
But the with_possible_nonzero_bits2 uses in the 3 new simplifiers are just completely incorrect, because they don't just use the @0 operand in get_nonzero_bits (pessimizing stuff if value ranges are fully computed), but also use it in the replacement, then they act as if the BIT_AND_EXPR wasn't there at all. While we could use (with_possible_nonzero_bits2@3 @0) and use get_nonzero_bits (@0) and use @3 in the replacement, that would still often be a pessimization, so I've just used with_possible_nonzero_bits@0. 2024-11-22 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/117420 * match.pd ((X >> C1) << (C1 + C2) -> X << C2, (X >> C1) * (C2 << C1) -> X * C2, X / (1 << C) -> X /[ex] (1 << C)): Use with_possible_nonzero_bits@0 rather than (with_possible_nonzero_bits2 @0). * gcc.dg/torture/pr117420.c: New test.	2024-11-22 19:50:22 +01:00
Jakub Jelinek	44984f7f75	c-family: Yet another fix for _BitInt & __sync_* builtins [PR117641] Sorry, the last patch only partially fixed the __sync_* ICEs with _BitInt(128) on ia32. Even for !fetch we need to error out and return 0. I was afraid of APIs like __atomic_exchange/__atomic_compare_exchange, those obviously need to be supported even on _BitInt(128) on ia32, but they actually never sync_resolve_size, they are handled by adding the size argument and using the library version much earlier. For fetch && !orig_format (i.e. __atomic_fetch_* etc.) we need to return -1 so that we handle it with a manual __atomic_load + __atomic_compare_exchange loop in the caller, all other cases should be rejected. 2024-11-22 Jakub Jelinek <jakub@redhat.com> PR c/117641 * c-common.cc (sync_resolve_size): For size 16 with _BitInt on targets where TImode isn't supported, use goto incompatible if !fetch. * gcc.dg/bitint-117.c: New test.	2024-11-22 19:47:52 +01:00
Andrew Pinski	cdd7171a6b	libsanitizer: Move language level from gnu++14 to gnu++17 While compiling libsanitizer for aarch64-linux-gnu, I noticed the new warning: ``` ../../../../libsanitizer/asan/asan_interceptors.cpp: In function ‘char* ___interceptor_strcpy(char*, const char*)’: ../../../../libsanitizer/asan/asan_interceptors.cpp:554:6: warning: ‘if constexpr’ only available with ‘-std=c++17’ or ‘-std=gnu++17’ [-Wc++17-extensions] 554 \| if constexpr (SANITIZER_APPLE) { \| ^~~~~~~~~ ``` So compiler-rt upstream compiles this as gnu++17 (the current default for clang), so let's update it to be similar. Build and tested on aarch64-linux-gnu. PR sanitizer/117731 libsanitizer/ChangeLog: * asan/Makefile.am (AM_CXXFLAGS): Replace gnu++14 with gnu++17. * asan/Makefile.in: Regenerate. * hwasan/Makefile.am (AM_CXXFLAGS): Replace gnu++14 with gnu++17. * hwasan/Makefile.in: Regenerate. * interception/Makefile.am (AM_CXXFLAGS): Replace gnu++14 with gnu++17. * interception/Makefile.in: Regenerate. * libbacktrace/Makefile.am (AM_CXXFLAGS): Replace gnu++14 with gnu++17. * libbacktrace/Makefile.in (AM_CXXFLAGS): Regenerate. * lsan/Makefile.am (AM_CXXFLAGS): Replace gnu++14 with gnu++17. * lsan/Makefile.in: Regenerate. * sanitizer_common/Makefile.am (AM_CXXFLAGS): Replace gnu++14 with gnu++17. * sanitizer_common/Makefile.in: Regenerate. * tsan/Makefile.am (AM_CXXFLAGS): Replace gnu++14 with gnu++17. * tsan/Makefile.in: Regenerate. * ubsan/Makefile.am (AM_CXXFLAGS): Replace gnu++14 with gnu++17. * ubsan/Makefile.in: Regenerate. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>	2024-11-22 08:52:44 -08:00
Dimitar Dimitrov	eeff504238	testsuite: RISC-V: Fix vector flags handling [PR117603] The DejaGnu routine "riscv_get_arch" fails to infer the correct architecture string when GCC is built for RV32EC. This causes invalid architecture string to be produced by "add_options_for_riscv_v": xgcc: error: '-march=rv32cv': first ISA subset must be 'e', 'i' or 'g' Fix by adding the E base ISA variant to the list of possible architecture modifiers. Also, the V extension is added to the machine string without checking whether dependent extensions are available. This results in errors when GCC is built for RV32EC: Executing on host: .../xgcc ... -march=rv32ecv ... cc1: error: ILP32E ABI does not support the 'D' extension cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension Fix by disabling vector tests for RISC-V if V extension cannot be added to current architecture. Tested riscv32-none-elf for -march=rv32ec using GNU simulator. Most of the remaining failures are due to explicit addition of vector options, yet missing "dg-require-effective-target riscv_v_ok": === gcc Summary === # of expected passes 211958 # of unexpected failures 1826 # of expected failures 1059 # of unresolved testcases 5209 # of unsupported tests 15513 Ensured riscv64-unknown-linux-gnu tested with qemu has no new passing or failing tests, before and after applying this patch: Running target riscv-sim/-march=rv64imafdc/-mabi=lp64d/-mcmodel=medlow ... === gcc Summary === # of expected passes 237209 # of unexpected failures 335 # of expected failures 1670 # of unresolved testcases 43 # of unsupported tests 16767 PR target/117603 gcc/testsuite/ChangeLog: * lib/target-supports.exp (riscv_get_arch): Add comment about function purpose. Add E ISA to list of possible modifiers. (check_vect_support_and_set_flags): Do not advertise vector support if V extension cannot be enabled. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>	2024-11-22 18:29:24 +02:00
Tobias Burnus	f34422e06c	OpenMP: Add 'interop' clause to 'dispatch' for C/C++ Will fail with an error if/as no suitable 'append_args' has been specified, given that 'append_args' is not yet implemented. gcc/c-family/ChangeLog: * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_INTEROP. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_interop): New. (c_parser_omp_clause_name, c_parser_omp_all_clauses, c_parser_omp_dispatch_body): Handle 'interop' clause. * c-typeck.cc (c_finish_omp_clauses): Likewise. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_name, cp_parser_omp_all_clauses, cp_parser_omp_dispatch_body): Handle 'interop' clause. * pt.cc (tsubst_omp_clauses): Likewise. * semantics.cc (finish_omp_clauses): Likewise. gcc/ChangeLog: * gimplify.cc (gimplify_call_expr): Add initial support for dispatch's 'interop' clause. (gimplify_scan_omp_clauses): Handle interop clause. * tree-pretty-print.cc (dump_omp_clause): Likewise. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_INTEROP. * tree.cc (omp_clause_num_ops, omp_clause_code_name): Add interop. gcc/testsuite/ChangeLog: * c-c++-common/gomp/dispatch-11.c: New test. * c-c++-common/gomp/dispatch-12.c: New test.	2024-11-22 16:15:17 +01:00
Tobias Burnus	8f0c8e577a	OpenMP: 'interop' construct - add C/C++ parser support, improve Fortran parsing Add middle end support for the 'interop' directive and the 'init', 'use', and 'destroy' clauses - but fail with a sorry, unimplemented in gimplify.cc. For Fortran, generate the tree code, update the internal representation, add some more diagnostic checks and update for newer specification changes ('fr' only takes a single value, but it integer expressions are permitted again [like with the old syntax] not only constant identifiers). For C and C++, this patch adds the full parser support for 'interop'. Still missing is actually handling the directive in the middle end and in libgomp. The GOMP_INTEROP_IFR_* internal values have been changed to have space for vendor specific values that are adjacent to the existing values but negative, if needed. gcc/c-family/ChangeLog: * c-common.h (enum c_omp_region_type): Add C_ORT_INTEROP and C_ORT_OMP_INTEROP. (c_omp_interop_t_p): New prototype. * c-omp.cc (c_omp_interop_t_p): Check whether the type is omp_interop_t. (c_omp_directives): Uncomment 'interop'. * c-pragma.cc (omp_pragmas): Add 'interop'. * c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_INTEROP. (enum pragma_omp_clause): Add init, use, and destroy clauses. gcc/c/ChangeLog: * c-parser.cc (INCLUDE_STRING): Define. (c_parser_pragma): Handle 'interop' directive. (c_parser_omp_clause_name): Handle init, use, and destroy clauses. (c_parser_omp_all_clauses): Likewise; use C_ORT_OMP_INTEROP, if 'use' is permitted, for c_finish_omp_clauses. (c_parser_omp_clause_destroy, c_parser_omp_modifier_prefer_type, c_parser_omp_clause_init, c_parser_omp_clause_use, OMP_INTEROP_CLAUSE_MASK, c_parser_omp_interop): New. * c-typeck.cc (c_finish_omp_clauses): Add missing OPT_Wopenmp to a warning; handle new clauses. gcc/cp/ChangeLog: * parser.cc (INCLUDE_STRING): Define. (cp_parser_omp_clause_name): Handle init, use, and destroy clauses. 
(cp_parser_omp_all_clauses): Likewise; use C_ORT_OMP_INTEROP, if 'use' is permitted, for c_finish_omp_clauses. (cp_parser_omp_modifier_prefer_type, cp_parser_omp_clause_init, OMP_INTEROP_CLAUSE_MASK, cp_parser_omp_interop): New. (cp_parser_pragma): Handle 'interop' directive. * pt.cc (tsubst_omp_clauses): Handle init, use, and destroy clauses. (tsubst_stmt): Handle OMP_INTEROP. * semantics.cc (cp_omp_init_prefer_type_update): New. (finish_omp_clauses): Handle init, use, and destroy clauses and add clause check for 'depend' on 'interop'. gcc/fortran/ChangeLog: * gfortran.h (gfc_omp_namelist): Cleanup interop internal representation. * dump-parse-tree.cc (show_omp_namelist): Update for changed internal representation. * match.cc (gfc_free_omp_namelist): Likewise. * openmp.cc (gfc_match_omp_prefer_type, gfc_match_omp_init): Likewise; also handle some corner cases better and update for newer 6.0 changes related to 'fr'. (resolve_omp_clauses): Add type-check for interop variables. * trans-openmp.cc (gfc_trans_omp_clauses): Handle init, use and destroy clauses. (gfc_trans_openmp_interop): New. (gfc_trans_omp_directive): Call it. gcc/ChangeLog: * gimplify.cc (gimplify_expr): Handle OMP_INTEROP by printing "sorry, uninplemented". * omp-api.h (omp_get_fr_id_from_name): Change return type to 'char'. * omp-general.cc (omp_get_fr_id_from_name): Likewise; return GOMP_INTEROP_IFR_UNKNOWN not 0 if not found. (omp_get_name_from_fr_id): Return "<unknown>" not NULL if not found (used for dumps). * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DESTROY, OMP_CLAUSE_USE, and OMP_CLAUSE_INIT. * tree-pretty-print.cc (dump_omp_init_prefer_type): New. (dump_omp_clause): Handle init, use and destroy clauses. (dump_generic_node): Handle interop directive. * tree.cc (omp_clause_num_ops, omp_clause_code_name): Add new init/use/destroy clauses. * tree.def (OACC_LOOP): Fix comment. (OMP_INTEROP): Add. 
* tree.h (OMP_INTEROP_CLAUSES, OMP_CLAUSE_INIT_TARGET, OMP_CLAUSE_INIT_TARGETSYNC, OMP_CLAUSE_INIT_PREFER_TYPE): New. include/ChangeLog: * gomp-constants.h (GOMP_INTEROP_IFR_NONE): Rename ... (GOMP_INTEROP_IFR_UNKNOWN): ... to this. And change value. (GOMP_INTEROP_IFR_SEPARATOR): Likewise. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/interop-1.f90: Update for parser changes, spec changes and add new tests. * gfortran.dg/gomp/interop-2.f90: Likewise. * gfortran.dg/gomp/interop-3.f90: Likewise. * c-c++-common/gomp/interop-1.c: New test. * c-c++-common/gomp/interop-2.c: New test. * c-c++-common/gomp/interop-3.c: New test. * c-c++-common/gomp/interop-4.c: New test. * g++.dg/gomp/interop-5.C: New test. * gfortran.dg/gomp/interop-4.f90: New test.	2024-11-22 15:30:53 +01:00
Evgeny Karpov	8d7f2d53c8	MAINTAINERS: Add myself to write after approval ChangeLog: * MAINTAINERS: Add myself to write after approval.	2024-11-22 13:37:34 +01:00
Jakub Jelinek	d6d1fdcf95	i386: Make __builtin_ia32_f{nstenv,ldenv,nstsw,fnclex} builtins internal [PR117165] As the comment says, these builtins are meant to be internal for the atomic support and cause various ICEs when using them directly in various conditions. So the following patch makes them internal. We do have also internal-fn.*, but those target specific builtins would need to be there in generic code, so I've just added space to their name, which is the old way to hide builtins/attributes etc. 2024-11-22 Jakub Jelinek <jakub@redhat.com> PR target/117165 * config/i386/i386-builtin.def (IX86_BUILTIN_FNSTENV, IX86_BUILTIN_FLDENV, IX86_BUILTIN_FNSTSW, IX86_BUILTIN_FNCLEX): Add space to the end of the builtin name to make it really internal. * gcc.target/i386/pr117165.c: New test.	2024-11-22 11:33:34 +01:00
Jakub Jelinek	77f4b1097e	testsuite: Fix up vector-{8,9,10}.c tests On Thu, Nov 21, 2024 at 01:30:39PM +0100, Christoph Müllner wrote: > > > * gcc.dg/tree-ssa/satd-hadamard.c: New test. > > > * gcc.dg/tree-ssa/vector-10.c: New test. > > > * gcc.dg/tree-ssa/vector-8.c: New test. > > > * gcc.dg/tree-ssa/vector-9.c: New test. I see FAILs on i686-linux or on x86_64-linux (in the latter with -m32 testing). One problem is that vector-10.c doesn't use -Wno-psabi option and uses a function which returns a vector and takes vector as first parameter, the other problems are that 3 other tests don't arrange for at least basic vector ISA support, plus non-standardly test only on x86_64-*-*, while normally one would allow both i?86-*-* x86_64-*-* and if it is e.g. specific to 64-bit, also check for lp64 or int128 or whatever else is needed. E.g. Solaris I think has i?86-*-* triplet even for 64-bit code, etc. The following patch fixes these. 2024-11-22 Jakub Jelinek <jakub@redhat.com> * gcc.dg/tree-ssa/satd-hadamard.c: Add -msse2 as dg-additional-options on x86. Also scan-tree-dump on i?86-*-*. * gcc.dg/tree-ssa/vector-8.c: Likewise. * gcc.dg/tree-ssa/vector-9.c: Likewise. * gcc.dg/tree-ssa/vector-10.c: Add -Wno-psabi to dg-additional-options.	2024-11-22 10:02:59 +01:00
Tamar Christina	a9473f9c6f	middle-end:For multiplication try swapping operands when matching complex multiply [PR116463] This commit fixes the failures of complex.exp=fast-math-complex-mls-*.c on the GCC 14 branch and some of the ones on the master. The current matching just looks for one order for multiplication and was relying on canonicalization to always give the right order because of the TWO_OPERANDS. However when it comes to the multiplication trying only one order is a bit fragile as they can be flipped. The failing tests on the branch are: void fms180snd(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N], _Complex TYPE c[restrict N]) { for (int i = 0; i < N; i++) c[i] -= a[i] * (b[i] * I * I); } void fms180fst(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N], _Complex TYPE c[restrict N]) { for (int i = 0; i < N; i++) c[i] -= (a[i] * I * I) * b[i]; } The issue is just a small difference in commutative operations. we look for {R,R} * {R,I} but found {R,I} * {R,R}. Since the DF analysis is cached, we should be able to swap operands and retry for multiply cheaply. There is a constraint being checked by vect_validate_multiplication for the data flow of the operands feeding the multiplications. So e.g. between the nodes: note: node 0x4d1d210 (max_nunits=2, refcnt=3) vector(2) double note: op template: _27 = _10 * _25; note: stmt 0 _27 = _10 * _25; note: stmt 1 _29 = _11 * _25; note: node 0x4d1d060 (max_nunits=2, refcnt=2) vector(2) double note: op template: _26 = _11 * _24; note: stmt 0 _26 = _11 * _24; note: stmt 1 _28 = _10 * _24; we require the lanes to come from the same source which vect_validate_multiplication checks. As such it doesn't make sense to flip them individually because that would invalidate the earlier linear_loads_p checks which have validated that the arguments all come from the same datarefs. This patch thus flips the operands in unison to still maintain this invariant, but also honor the commutative nature of multiplication. 
gcc/ChangeLog: PR tree-optimization/116463 * tree-vect-slp-patterns.cc (complex_mul_pattern::matches, complex_fms_pattern::matches): Try swapping operands on multiply.	2024-11-22 08:05:54 +00:00
Lulu Cheng	9286411658	LoongArch: Modify the document to remove options that don't exist. gcc/ChangeLog: * doc/invoke.texi: Remove the non-existent option '-msmall-data-limit' and add a description of '-G'.	2024-11-22 15:23:30 +08:00
Lulu Cheng	a3a375b2d1	LoongArch: Remove redundant code. TARGET_ASM_ALIGNED_{HI,SI,QI}_OP are defined repeatedly and deleted. gcc/ChangeLog: * config/loongarch/loongarch-builtins.cc (loongarch_builtin_vectorized_function): Delete. (LARCH_GET_BUILTIN): Delete. * config/loongarch/loongarch-protos.h (loongarch_builtin_vectorized_function): Delete. * config/loongarch/loongarch.cc (TARGET_ASM_ALIGNED_HI_OP): Delete. (TARGET_ASM_ALIGNED_SI_OP): Delete. (TARGET_ASM_ALIGNED_DI_OP): Delete.	2024-11-22 15:23:26 +08:00
Haochen Jiang	45135f9d5f	i386/testsuite: Enhance AVX10.2 vmovd/w testcases Under -fno-omit-frame-pointer, %ebp will be used, which is the Solaris/x86 default. Both check %ebp and %esp to avoid error on that. gcc/testsuite/ChangeLog: PR target/117697 * gcc.target/i386/avx10_2-vmovd-1.c: Both check %esp and %ebp. * gcc.target/i386/avx10_2-vmovw-1.c: Ditto.	2024-11-22 10:47:23 +08:00
Lulu Cheng	f0cb64fb3f	LoongArch: Fix clerical errors in lasx_xvreplgr2vr_* and lsx_vreplgr2vr_*. [x]vldi.{b/h/w/d} is not implemented in LoongArch. Use the macro [x]vrepli.{b/h/w/d} to replace. gcc/ChangeLog: * config/loongarch/lasx.md: Fixed. * config/loongarch/lsx.md: Fixed.	2024-11-22 09:53:45 +08:00
Xi Ruoyao	ae7e25662f	LoongArch: Make __builtin_lsx_vorn_v and __builtin_lasx_xvorn_v arguments and return values unsigned Align them with other vector bitwise builtins. This may break programs directly invoking __builtin_lsx_vorn_v or __builtin_lasx_xvorn_v, but doing so is not supported (as builtins are not documented, only intrinsics are documented and users should use them instead). gcc/ChangeLog: * config/loongarch/loongarch-builtins.cc (vorn_v, xvorn_v): Use unsigned vector modes. * config/loongarch/lsxintrin.h (__lsx_vorn_v): Cast arguments to v16u8. * config/loongarch/lasxintrin.h (__lasx_xvorn_v): Cast arguments to v32u8. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lsx/lsx-builtin.c (__lsx_vorn_v): Change arguments and return value to v16u8. * gcc.target/loongarch/vector/lasx/lasx-builtin.c (__lasx_xvorn_v): Change arguments and return value to v32u8.	2024-11-22 09:42:44 +08:00
GCC Administrator	8500a8c32b	Daily bump.	2024-11-22 00:20:01 +00:00
Jeff Law	9b7917b34f	[RISC-V][PR target/117690] Add missing shift in constant synthesis As hinted out in the BZ, we were missing a left shift in the constant synthesis in the case where the upper 32 bits can be synthesized using a shNadd of the low 32 bits. This adjusts the synthesis to add the missing left shift and adjusts the cost to account for the additional instruction. Regression tested on riscv64-elf in my tester. Waiting for the pre-commit tester before moving forward. PR target/117690 gcc/ * config/riscv/riscv.cc (riscv_build_integer): Add missing left shift when using shNadd to derive upper 32 bits from lower 32 bits. gcc/testsuite * gcc.target/riscv/pr117690.c: New test. * gcc.target/riscv/synthesis-13.c: Adjust expected output.	2024-11-21 16:21:07 -07:00
Arsen Arsenović	ffeee625c5	doc/cpp: Document __has_include_next While hacking on an unrelated change, I noticed that __has_include_next hasn't been documented at all. This patch adds it to the __has_include manual node. gcc/ChangeLog: * doc/cpp.texi (__has_include): Document __has_include_next also. (Conditional Syntax): Mention __has_include_next in the description for the __has_include menu entry.	2024-11-21 23:48:49 +01:00
Joseph Myers	338d687e2a	c: Give errors more consistently for void parameters [PR114816] Cases of void parameters, other than a parameter list of (void) (or equivalent with a typedef for void) in its entirety, have been made a constraint violation in C2Y (N3344 alternative 1 was adopted), as part of a series of changes to eliminate unnecessary undefined behavior by turning it into constraint violations, implementation-defined behavior or something else with stricter bounds on what behavior is allowed. Previously, these were implicitly undefined behavior (see DR#295), with only some cases listed in Annex J as undefined (but even those cases not having wording in the normative text to make them explicitly undefined). As discussed in bug 114816, GCC is not entirely consistent about diagnosing such usages; unnamed void parameters get errors when not the entire parameter list, while qualified and register void (the cases listed in Annex J) get errors as a single unnamed parameter, but named void parameters are accepted with a warning (in a declaration that's not a definition; it's not possible to define a function with incomplete parameter types). Following C2Y, make all these cases into errors. The errors are not conditional on the standard version, given that this was previously implicit undefined behavior. Since it wasn't possible anyway to define such functions, only declare them without defining them (or otherwise use such parameters in function type names that can't correspond to any defined function), hopefully the risks of compatibility issues are small. Bootstrapped with no regressions for x86-64-pc-linux-gnu. PR c/114816 gcc/c/ * c-decl.cc (grokparms): Do not warn for void parameter type here. (get_parm_info): Give errors for void parameters even when named. gcc/testsuite/ * gcc.dg/c2y-void-parm-1.c: New test. * gcc.dg/noncompile/920616-2.c, gcc.dg/noncompile/921116-1.c, gcc.dg/parm-incomplete-1.c: Update expected diagnostics.	2024-11-21 21:46:00 +00:00
David Malcolm	4574f15bb3	json parsing: avoid relying on floating point equality [PR117677] gcc/ChangeLog: PR bootstrap/117677 * json-parsing.cc (selftest::test_parse_number): Replace ASSERT_EQ of 'double' values with ASSERT_NEAR. Eliminate ASSERT_PRINT_EQ for such values. * selftest.h (ASSERT_NEAR): New. (ASSERT_NEAR_AT): New. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2024-11-21 14:36:23 -05:00
David Malcolm	b599498e18	testsuite: add print-stack.exp I wrote this support file to help me debug Tcl issues in the testsuite. Adding a call to: print_stack_backtrace somewhere in a .exp file (along with "load_lib print-stack.exp") leads to the interpreter printing a backtrace in a form that e.g. Emacs can consume, with filename:linenum: lines, and quoting the line of .exp source code. For example, adding a print_stack_backtrace to scansarif.exp in run-sarif-pytest I get this output: VVV START OF BACKTRACE VVV /home/david/coding/gcc-newgit/src/gcc/testsuite/lib/scansarif.exp:142: frame 16 in proc print_stack_backtrace 142 \| print_stack_backtrace <proc>: frame 15 in proc run-sarif-pytest <eval>: frame 14 in proc dg-final-proc /usr/share/dejagnu/dg.exp:851: frame 13 in proc dg-final-proc 851 \| if {[catch "dg-final-proc $prog" errmsg]} { <eval>: frame 12 in proc saved-dg-test /home/david/coding/gcc-newgit/src/gcc/testsuite/lib/gcc-dg.exp:1080: frame 11 in proc saved-dg-test 1080 \| if { [ catch { eval saved-dg-test $args } errmsg ] } { /usr/share/dejagnu/dg.exp:559: frame 10 in proc dg-test 559 \| dg-test $testcase $options ${default-extra-options} /home/david/coding/gcc-newgit/src/gcc/testsuite/gcc.dg/sarif-output/sarif-output.exp:28: frame 9 28 \| dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" "" <eval>: frame 8 <eval>: frame 7 /usr/share/dejagnu/runtest.exp:1460: frame 6 1460 \| if { [catch "uplevel #0 source $test_file_name"] == 1 } { /usr/share/dejagnu/runtest.exp:1886: frame 5 in proc dg-runtest 1886 \| runtest $test_name /usr/share/dejagnu/runtest.exp:1845: frame 4 in proc dg-runtest 1845 \| foreach test_name [lsort [find ${dir} *.exp]] { /usr/share/dejagnu/runtest.exp:1788: frame 3 in proc dg-runtest 1788 \| foreach dir "${test_top_dirs}" { /usr/share/dejagnu/runtest.exp:1669: frame 2 in proc dg-runtest 1669 \| foreach pass $multipass { /usr/share/dejagnu/runtest.exp:1619: frame 1 in proc dg-runtest 1619 \| foreach current_target $target_list { 
^^^ END OF BACKTRACE ^^^ and can click on the lines in Emacs's compilation buffer to take me to the relevant places. I found this made it much easier to debug my .exp files. That said, I'm uncomfortable with Tcl, and so (a) there may be a better way of doing this (b) I may have made mistakes gcc/testsuite/ChangeLog: * lib/print-stack.exp: New file. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2024-11-21 14:36:16 -05:00
Christoph Müllner	ae0d842f3e	testsuite: tree-ssa: Limit targets for vec perm tests Recently added test cases assume optimized code generation for certain vectorized code. However, this optimization might not be applied if the backends don't support the optimized permutation. The tests are confirmed to work on aarch64 and x86-64, so this patch restricts the tests accordingly. Tested on x86-64. PR117728 gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/satd-hadamard.c: Restrict to aarch64 and x86-64. * gcc.dg/tree-ssa/vector-8.c: Likewise. * gcc.dg/tree-ssa/vector-9.c: Likewise. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>	2024-11-21 20:18:50 +01:00
Jason Merrill	819f67a2f6	c++: inline variables and modules We weren't writing out the definition of an inline variable, so the importer either got an undefined symbol or 0. gcc/cp/ChangeLog: * module.cc (has_definition): Also true for inline vars. gcc/testsuite/ChangeLog: * g++.dg/modules/inline-1_a.C: New test. * g++.dg/modules/inline-1_b.C: New test.	2024-11-21 19:37:23 +01:00
Jason Merrill	74498be0e6	c++: modules and debug marker stmts 21_strings/basic_string/operations/contains/nonnull.cc was failing because the module was built with debug markers and the testcase was built not expecting debug markers, so we crashed in lower_stmt. Let's accommodate this by discarding debug marker statements we don't want. gcc/cp/ChangeLog: * module.cc (trees_in::core_vals) [STATEMENT_LIST]: Skip DEBUG_BEGIN_STMT if !MAY_HAVE_DEBUG_MARKER_STMTS.	2024-11-21 19:37:23 +01:00
Jason Merrill	03c7145a41	c++: modules and tsubst_friend_class In 20_util/function_objects/mem_fn/constexpr.cc we start to instantiate _Mem_fn_base's friend declaration of _Bind_check_arity before we've loaded the namespace-scope declaration, so lookup_imported_hidden_friend doesn't find it. But then we load the namespace-scope declaration in lookup_template_class during substitution, and so when we get around to pushing the result of substitution, they conflict. Fixed by calling lazy_load_pendings in lookup_imported_hidden_friend. gcc/cp/ChangeLog: * name-lookup.cc (lookup_imported_hidden_friend): Call lazy_load_pendings.	2024-11-21 19:37:23 +01:00
Georg-Johann Lay	873cffc792	AVR: target/117726 - Better optimizations of ASHIFT:SI insns. This patch improves the 4-byte ASHIFT insns. 1) It adds a "r,r,C15" alternative for improved long << 15. 2) It adds 3-operand alternatives (depending on options) and splits them after peephole2 / before avr-fuse-move into a 3-operand byte shift and a 2-operand residual bit shift. For better control, it introduces new option -msplit-bit-shift that's activated at -O2 and higher per default. 2) is even performed with -Os, but not with -Oz. PR target/117726 gcc/ * config/avr/avr.opt (-msplit-bit-shift): Add new optimization option. * common/config/avr/avr-common.cc (avr_option_optimization_table) [OPT_LEVELS_2_PLUS]: Turn on -msplit-bit-shift. * config/avr/avr.h (machine_function.n_avr_fuse_add_executed): New bool component. * config/avr/avr.md (attr "isa") <2op, 3op>: Add new values. (attr "enabled"): Handle them. (ashlsi3, ashlsi3, ashlsi3_const): Add "r,r,C15" alternative. Add "r,0,C4l" and "r,r,C4l" alternatives (depending on 2op / 3op). (define_split) [avr_split_bit_shift]: Add 2 new ashift:ALL4 splitters. (define_peephole2) [ashift:ALL4]: Add (match_dup 3) so that the scratch won't overlap with the output operand of the matched insn. (ashl<mode>3_const_split): Remove unused ashift:ALL4 splitter. config/avr/avr-passes.cc (emit_valid_insn) (emit_valid_move_clobbercc): Move out of anonymous namespace. (make_avr_pass_fuse_add) <gate>: Don't override. <execute>: Set n_avr_fuse_add_executed according to func->machine->n_avr_fuse_add_executed. (pass_data avr_pass_data_split_after_peephole2): New object. (avr_pass_split_after_peephole2): New rtl_opt_pass. (avr_emit_shift): New static function. (avr_shift_is_3op, avr_split_shift_p, avr_split_shift) (make_avr_pass_split_after_peephole2): New functions. * config/avr/avr-passes.def (avr_pass_split_after_peephole2): Insert new pass after pass_peephole2. 
* config/avr/avr-protos.h (n_avr_fuse_add_executed, avr_shift_is_3op, avr_split_shift_p) (avr_split_shift, avr_optimize_size_level) (make_avr_pass_split_after_peephole2): New prototypes. * config/avr/avr.cc (n_avr_fuse_add_executed): New global variable. (avr_optimize_size_level): New function. (avr_set_current_function): Set n_avr_fuse_add_executed according to cfun->machine->n_avr_fuse_add_executed. (ashlsi3_out) [case 15]: Output optimized code for this offset. (avr_rtx_costs_1) [ASHIFT, SImode]: Adjust costs of offsets 15, 16. * config/avr/constraints.md (C4a, C4r, C4r): New constraints. * pass_manager.h (pass_manager): Adjust comments.	2024-11-21 17:59:38 +01:00
Georg-Johann Lay	938094abec	AVR: Fix a nit in avr-passes.cc::absint_t.dump(). gcc/ * config/avr/avr-passes.cc (absint_t::dump): Fix missing newline in dump.	2024-11-21 17:56:34 +01:00
Jeff Law	41fb3a5669	[RISC-V][PR target/116590] Avoid emitting multiple instructions from fmacc patterns So much like my patch from last week, this removes alternatives that create multiple instructions that we really should have never needed. In this case it fixes one of two bugs in pr116590. In particular we don't want vmvNr instructions for thead-vector. Those instructions were emitted as part of those two instruction sequences. I've tested this in my tester and assuming the pre-commit tester is happy, I'll push it to the trunk. PR target/116590 gcc * config/riscv/vector.md (pred_mul_<optab>mode_undef): Drop unnecessary alternatives. (pred_<madd_msub><mode>): Likewise. (pred_<macc_msac><mode>): Likewise. (pred_<madd_msub><mode>_scalar): Likewise. (pred_<macc_msac><mode>_scalar): Likewise. (pred_mul_neg_<optab><mode>_undef): Likewise. (pred_<nmsub_nmadd><mode>): Likewise. (pred_<nmsac_nmacc><mode>): Likewise. (pred_<nmsub_nmadd><mode>_scalar): Likewise. (pred_<nmsac_nmacc><mode>_scalar): Likewise. gcc/testsuite * gcc.target/riscv/pr116590.c: New test.	2024-11-21 08:24:10 -07:00
Pan Li	fbca864a7b	Match: Refactor the unsigned SAT_ADD match pattern [NFC] This patch would like to refactor the unsigned SAT_ADD pattern by: * Extract type check outside. * Extract common sub pattern. * Re-arrange the related match pattern forms together. * Remove unnecessary helper pattern matches. The below test suites are passed for this patch. * The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. gcc/ChangeLog: * match.pd: Refactor sorts of unsigned SAT_ADD match pattern. Signed-off-by: Pan Li <pan2.li@intel.com> Signed-off-by: Pan Li <pan2.li@intel.com>	2024-11-21 22:15:50 +08:00
Tamar Christina	dbc38dd9e9	middle-end: Pass along SLP node when costing vector loads/stores With the support to SLP only we now pass the VMAT through the SLP node, however the majority of the costing calls inside vectorizable_load and vectorizable_store do not pass the SLP node along. Due to this the backend costing never sees the VMAT for these cases anymore. Additionally the helper around record_stmt_cost when both SLP and stmt_vinfo are passed would only pass the SLP node along. However the SLP node doesn't contain all the info available in the stmt_vinfo and we'd have to go through the SLP_TREE_REPRESENTATIVE anyway. As such I changed the function to just always pass both along. Unlike the VMAT changes, I don't believe there to be a correctness issue here but would minimize the amount of churn in the backend costing until vectorizer costing as a whole is revisited in GCC 16. These changes re-enable the cost model on AArch64 and also correctly find the VMATs on loads and stores fixing testcases such as sve_iters_low_2.c. gcc/ChangeLog: * tree-vect-data-refs.cc (vect_get_data_access_cost): Pass NULL for SLP node. * tree-vect-stmts.cc (record_stmt_cost): Expose. (vect_get_store_cost, vect_get_load_cost): Extend with SLP node. (vectorizable_store, vectorizable_load): Pass SLP node to all costing. * tree-vectorizer.h (record_stmt_cost): Always pass both SLP node and stmt_vinfo to costing. (vect_get_load_cost, vect_get_store_cost): Extend with SLP node.	2024-11-21 12:49:35 +00:00
Rainer Orth	116b1c5489	Use decl size in Solaris ASM_DECLARE_OBJECT_NAME [PR102296] Solaris has modified versions of ASM_DECLARE_OBJECT_NAME on both i386 and sparc. When commit `ce597aedd7` Author: Ilya Enkovich <ilya.enkovich@intel.com> Date: Thu Aug 7 08:04:55 2014 +0000 elfos.h (ASM_DECLARE_OBJECT_NAME): Use decl size instead of type size. was applied, those were missed. At the same time, the testcase was restricted to Linux though there's nothing Linux-specific in there, so the error remained undetected. This patch fixes the definitions to match elfos.h and enables the test on Solaris, too. Bootstrapped without regressions on i386-pc-solaris2.11 and sparc-sun-solaris2.11. 2024-11-19 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> gcc/testsuite: PR target/102296 * gcc.target/i386/struct-size.c: Enable on *-*-solaris*. gcc: PR target/102296 * config/i386/sol2.h (ASM_DECLARE_OBJECT_NAME): Use decl size instead of type size. * config/sparc/sol2.h (ASM_DECLARE_OBJECT_NAME): Likewise.	2024-11-21 13:41:19 +01:00
Christoph Müllner	1c4d39ada3	forwprop: Try to blend two isomorphic VEC_PERM sequences This extends forwprop by yet another VEC_PERM optimization: It attempts to blend two isomorphic vector sequences by using the redundancy in the lane utilization in these sequences. This redundancy in lane utilization comes from the way how specific scalar statements end up vectorized: two VEC_PERMs on top, binary operations on both of them, and a final VEC_PERM to create the result. Here is an example of this sequence: v_in = {e0, e1, e2, e3} v_1 = VEC_PERM <v_in, v_in, {0, 2, 0, 2}> // v_1 = {e0, e2, e0, e2} v_2 = VEC_PERM <v_in, v_in, {1, 3, 1, 3}> // v_2 = {e1, e3, e1, e3} v_x = v_1 + v_2 // v_x = {e0+e1, e2+e3, e0+e1, e2+e3} v_y = v_1 - v_2 // v_y = {e0-e1, e2-e3, e0-e1, e2-e3} v_out = VEC_PERM <v_x, v_y, {0, 1, 6, 7}> // v_out = {e0+e1, e2+e3, e0-e1, e2-e3} To remove the redundancy, lanes 2 and 3 can be freed, which allows to change the last statement into: v_out' = VEC_PERM <v_x, v_y, {0, 1, 4, 5}> // v_out' = {e0+e1, e2+e3, e0-e1, e2-e3} The cost of eliminating the redundancy in the lane utilization is that lowering the VEC PERM expression could get more expensive because of tighter packing of the lanes. Therefore this optimization is not done alone, but only in case we identify two such sequences that can be blended. Once all candidate sequences have been identified, we try to blend them, so that we can use the freed lanes for the second sequence. On success we convert 2x (2x BINOP + 1x VEC_PERM) to 2x VEC_PERM + 2x BINOP + 2x VEC_PERM traded for 4x VEC_PERM + 2x BINOP. The implemented transformation reuses (rewrites) the statements of the first sequence and the last VEC_PERM of the second sequence. The remaining four statements of the second sequence are left untouched and will be eliminated by DCE later. This targets x264_pixel_satd_8x4, which calculates the sum of absolute transformed differences (SATD) using Hadamard transformation. 
We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7% speedup on an AArch64 machine. Bootstrapped and reg-tested on x86-64 and AArch64 (all languages). gcc/ChangeLog: * tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data structure to store analysis results of a vec perm simplify sequence. (get_vect_selector_index_map): Helper to get an index map from the provided vector permute selector. (recognise_vec_perm_simplify_seq): Helper to recognise a vec perm simplify sequence. (narrow_vec_perm_simplify_seq): Helper to pack the lanes more tight. (can_blend_vec_perm_simplify_seqs_p): Test if two vec perm sequences can be blended. (calc_perm_vec_perm_simplify_seqs): Helper to calculate the new permutation indices. (blend_vec_perm_simplify_seqs): Helper to blend two vec perm simplify sequences. (process_vec_perm_simplify_seq_list): Helper to process a list of vec perm simplify sequences. (append_vec_perm_simplify_seq_list): Helper to add a vec perm simplify sequence to the list. (pass_forwprop::execute): Integrate new functionality. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/satd-hadamard.c: New test. * gcc.dg/tree-ssa/vector-10.c: New test. * gcc.dg/tree-ssa/vector-8.c: New test. * gcc.dg/tree-ssa/vector-9.c: New test. * gcc.target/aarch64/sve/satd-hadamard.c: New test. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>	2024-11-21 13:38:54 +01:00
H.J. Lu	42a8005c63	apx-ndd-tls-1[ab].c: Add -std=gnu17 Since GCC 15 defaults to -std=gnu23, add -std=gnu17 to apx-ndd-tls-1[ab].c to avoid: gcc.target/i386/apx-ndd-tls-1a.c: In function ‘k’: gcc.target/i386/apx-ndd-tls-1a.c:29:7: error: too many arguments to function ‘l’ gcc.target/i386/apx-ndd-tls-1a.c:25:5: note: declared here * gcc.target/i386/apx-ndd-tls-1a.c: -std=gnu17. * gcc.target/i386/apx-ndd-tls-1b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-11-21 19:12:27 +08:00
Rainer Orth	0f7def8549	libgomp: testsuite: Fix libgomp.c/alloc-pinned-3.c etc. for C23 on non-Linux Since the switch to a C23 default, three libgomp tests FAIL on Solaris: FAIL: libgomp.c/alloc-pinned-3.c (test for excess errors) UNRESOLVED: libgomp.c/alloc-pinned-3.c compilation failed to produce executable FAIL: libgomp.c/alloc-pinned-4.c (test for excess errors) UNRESOLVED: libgomp.c/alloc-pinned-4.c compilation failed to produce executable FAIL: libgomp.c/alloc-pinned-6.c (test for excess errors) UNRESOLVED: libgomp.c/alloc-pinned-6.c compilation failed to produce executable Excess errors: /vol/gcc/src/hg/master/local/libgomp/testsuite/libgomp.c/alloc-pinned-3.c:104:3: error: too many arguments to function 'set_pin_limit' Fixed by adding the missing size argument to the stub functions. Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11. 2024-11-20 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> libgomp: * testsuite/libgomp.c/alloc-pinned-3.c [!__linux__] (set_pin_limit): Add size arg. * testsuite/libgomp.c/alloc-pinned-4.c [!__linux__] (set_pin_limit): Likewise. * testsuite/libgomp.c/alloc-pinned-6.c [!__linux__] (set_pin_limit): Likewise.	2024-11-21 11:46:36 +01:00
Jakub Jelinek	806563f11e	include: Add new post-DWARF 5 DW_LANG_* enumerators DWARF changed the language code assignment to be on a web page and after DWARF 5 has been published already 27 codes have been assigned. We have some of those already in the header, but most of them were missing, including one added just yesterday (DW_LANG_C23). Note, this is really post-DWARF 5 stuff rather than DWARF 6, because DWARF 6 plans to switch from DW_AT_language to DW_AT_language_{name,version} pair where we'll say DW_LNAME_C with 202311 version instead of this. 2024-11-21 Jakub Jelinek <jakub@redhat.com> * dwarf2.h (enum dwarf_source_language): Add comment where the post DWARF 5 additions start. Refresh list from https://dwarfstd.org/languages.html.	2024-11-21 10:17:03 +01:00
Richard Biener	7e9b0d90d3	tree-optimization/117720 - check alignment for VMAT_STRIDED_SLP While vectorizable_store was already checking the alignment requirement of the stores and falling back to elementwise accesses if not honored, the vectorizable_load path wasn't doing this. After the previous change to disregard alignment checking for VMAT_STRIDED_SLP in get_group_load_store_type this now tripped on power. PR tree-optimization/117720 * tree-vect-stmts.cc (vectorizable_load): For VMAT_STRIDED_SLP verify the chosen load type is OK with regard to alignment.	2024-11-21 10:04:55 +01:00
Jakub Jelinek	ab8d3606bb	c-family, docs: Adjust descriptions/documentation for C23 publication As C23 has been published already https://www.iso.org/standard/82075.html we don't need to say that it is expected to be published etc. Furthermore, standards.texi was still documenting that -std=gnu17 is the default. 2024-11-21 Jakub Jelinek <jakub@redhat.com> gcc/ * doc/invoke.texi (-std=c23): Adjust documentation for publication of the ISO/IEC 9899:2024 standard. * doc/standards.texi: Likewise. Document -std=gnu17 and -std=gnu23 options. Mention that -std=gnu23 rather than -std=gnu17 is now the default for C. gcc/c-family/ * c.opt (std=c23, std=gnu23, std=iso9899:2024): Adjust description for publication of the ISO/IEC 9899:2024 standard.	2024-11-21 09:40:37 +01:00
Jakub Jelinek	05ab9447fe	phiopt: Improve spaceship_replacement for HONOR_NANS [PR117612] The following patch optimizes spaceship followed by comparisons of the spaceship value even for floating point spaceship when NaNs can appear. operator<=> for this emits roughly signed char c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; else c = 2; and I believe the /* The optimization may be unsafe due to NaNs. */ comment just isn't true. Sure, the i == j comparison doesn't raise exceptions on qNaNs, but if one of the operands is qNaN, then i == j is false and i < j or i > j is then executed and raises exceptions even on qNaNs. And we can safely optimize say c == -1 comparison after the above into i < j, that also raises exceptions like before and handles NaNs the same way as the original. The only unsafe transformation would be c == 0 or c != 0, turning it into i == j or i != j wouldn't raise exception, so I'm not doing that optimization (but other parts of the compiler optimize the i < j comparison away anyway). Anyway, to match the HONOR_NANS case, we need to verify that the second comparison has true edge to the phi_bb (yielding there -1 or 1), it can't be the false edge because when NaNs are honored, the false edge is for both the case where the inverted comparison is true or when one of the operands is NaN. Similarly we need to ensure that the two non-equality comparisons are the opposite, while for -ffast-math we can in some cases get one comparison x >= 5.0 and the other x > 5.0 and it is fine, because NaN is UB, when NaNs are honored, they must be different to leave the unordered case with 2 value as the last one remaining. The patch also punts if HONOR_NANS and the phi has just 3 arguments instead of 4. When NaNs are honored, we also in some cases need to perform some comparison and then invert its result (so that exceptions are properly thrown and we get the correct result). 
2024-11-21 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/94589 PR tree-optimization/117612 * tree-ssa-phiopt.cc (spaceship_replacement): Handle HONOR_NANS (TREE_TYPE (lhs1)) case when possible. * gcc.dg/pr94589-5.c: New test. * gcc.dg/pr94589-6.c: New test. * g++.dg/opt/pr94589-5.C: New test. * g++.dg/opt/pr94589-6.C: New test.	2024-11-21 09:39:06 +01:00
Jakub Jelinek	ca7430f145	phiopt: Fix a pasto in spaceship_replacement [PR117612] When working on the PR117612 fix, I've noticed a pasto in tree-ssa-phiopt.cc (spaceship_replacement). The code is if (absu_hwi (tree_to_shwi (arg2)) != 1) return false; if (e1->flags & EDGE_TRUE_VALUE) { if (tree_to_shwi (arg0) != 2 \|\| absu_hwi (tree_to_shwi (arg1)) != 1 \|\| wi::to_widest (arg1) == wi::to_widest (arg2)) return false; } else if (tree_to_shwi (arg1) != 2 \|\| absu_hwi (tree_to_shwi (arg0)) != 1 \|\| wi::to_widest (arg0) == wi::to_widest (arg1)) return false; where arg{0,1,2,3} are PHI args and wants to ensure that if e1 is a true edge, then arg0 is 2 and one of arg{1,2} is -1 and one is 1, otherwise arg1 is 2 and one of arg{0,2} is -1 and one is 1. But due to a pasto, the latter case doesn't verify that arg0 is different from arg2, it could be both -1 or both 1 and we wouldn't punt. The wi::to_widest (arg0) == wi::to_widest (arg1) test is always false when we've made sure in the earlier conditions that arg1 is 2 and arg0 is -1 or 1, so never 2. 2024-11-21 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/94589 PR tree-optimization/117612 * tree-ssa-phiopt.cc (spaceship_replacement): Fix up a pasto in check when arg1 is 2.	2024-11-21 09:38:01 +01:00
Jakub Jelinek	7272e09c9b	c: Add u{,l,ll,imax}abs builtins [PR117024] The following patch adds u{,l,ll,imax}abs builtins, which just fold to ABSU_EXPR, similarly to how {,l,ll,imax}abs builtins fold to ABS_EXPR. 2024-11-21 Jakub Jelinek <jakub@redhat.com> PR c/117024 gcc/ * coretypes.h (enum function_class): Add function_c2y_misc enumerator. * builtin-types.def (BT_FN_UINTMAX_INTMAX, BT_FN_ULONG_LONG, BT_FN_ULONGLONG_LONGLONG): New DEF_FUNCTION_TYPE_1s. * builtins.def (DEF_C2Y_BUILTIN): Define. (BUILT_IN_UABS, BUILT_IN_UIMAXABS, BUILT_IN_ULABS, BUILT_IN_ULLABS): New builtins. * builtins.cc (fold_builtin_abs): Handle also folding of uabs to ABSU_EXPR. (fold_builtin_1): Handle BUILT_IN_U{,L,LL,IMAX}ABS. gcc/lto/ChangeLog: * lto-lang.cc (flag_isoc2y): New variable. gcc/ada/ChangeLog: * gcc-interface/utils.cc (flag_isoc2y): New variable. gcc/testsuite/ * gcc.c-torture/execute/builtins/lib/abs.c (uintmax_t): New typedef. (uabs, ulabs, ullabs, uimaxabs): New functions. * gcc.c-torture/execute/builtins/uabs-1.c: New test. * gcc.c-torture/execute/builtins/uabs-1.x: New file. * gcc.c-torture/execute/builtins/uabs-1-lib.c: New file. * gcc.c-torture/execute/builtins/uabs-2.c: New test. * gcc.c-torture/execute/builtins/uabs-2.x: New file. * gcc.c-torture/execute/builtins/uabs-2-lib.c: New file. * gcc.c-torture/execute/builtins/uabs-3.c: New test. * gcc.c-torture/execute/builtins/uabs-3.x: New test. * gcc.c-torture/execute/builtins/uabs-3-lib.c: New test.	2024-11-21 09:34:28 +01:00
Kewen Lin	10e702789e	rs6000: Adjust FLOAT128 signbit2 expander for P8 LE [PR114567] As the associated test case shows, signbit generated assembly is sub-optimal for _Float128 argument from memory on P8 LE. On P8 LE, p8swap pass puts an explicit AND -16 on the memory, which causes mode_dependent_address_p to consider it invalid to change its mode, and combine fails to make use of the existing pattern signbit<SIGNBIT:mode>2_dm_mem. Considering it's always more efficient to make use of 8 bytes load and shift on P8 LE, this patch is to adjust the current expander and treat it specially. PR target/114567 gcc/ChangeLog: * config/rs6000/rs6000.md (expander signbit<FLOAT128:mode>2): Adjust. (signbit<mode>2_dm_mem): Rename to ... (signbit<mode>2_dm_mem): ... this. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr114567.c: New test.	2024-11-21 07:41:34 +00:00
Kewen Lin	baf536754f	rs6000: Use standard name {add,sub}v1ti3 for altivec_v{add,sub}uqm This patch is to adjust define_insn altivec_v{add,sub}uqm with standard names, as the associated test case shows, w/o this patch, it ends up with scalar {add,subf}c/{add,subf}e, the standard names help to exploit v{add,sub}uqm. gcc/ChangeLog: * config/rs6000/altivec.md (altivec_vadduqm): Rename to ... (addv1ti3): ... this. (altivec_vsubuqm): Rename to ... (subv1ti3): ... this. * config/rs6000/rs6000-builtins.def (__builtin_altivec_vadduqm): Replace bif expander altivec_vadduqm with addv1ti3. (__builtin_altivec_vsubuqm): Replace bif expander altivec_vsubuqm with subv1ti3. gcc/testsuite/ChangeLog: * gcc.target/powerpc/p8vector-int128-3.c: New test.	2024-11-21 07:41:33 +00:00
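
Illustration for commit 1c4d39ada3 (forwprop VEC_PERM blending): the lane example in that commit message can be checked with a small model. This is only a sketch of the VEC_PERM semantics described in the message, not GCC code; the helper names vec_perm and run_sequence and the sample input values are made up for this illustration.

```python
def vec_perm(a, b, sel):
    """Model VEC_PERM <a, b, sel>: index i < len(a) picks a[i], else b[i - len(a)]."""
    n = len(a)
    return [a[i] if i < n else b[i - n] for i in sel]


def run_sequence(v_in, final_sel):
    """Replay the sequence from the commit message with a chosen final selector."""
    v_1 = vec_perm(v_in, v_in, [0, 2, 0, 2])   # {e0, e2, e0, e2}
    v_2 = vec_perm(v_in, v_in, [1, 3, 1, 3])   # {e1, e3, e1, e3}
    v_x = [p + q for p, q in zip(v_1, v_2)]    # sums, duplicated in lanes 2 and 3
    v_y = [p - q for p, q in zip(v_1, v_2)]    # differences, duplicated likewise
    return vec_perm(v_x, v_y, final_sel)


if __name__ == "__main__":
    v_in = [10, 3, 7, 2]                        # arbitrary sample values for e0..e3
    print(run_sequence(v_in, [0, 1, 6, 7]))     # original final selector
    print(run_sequence(v_in, [0, 1, 4, 5]))     # narrowed selector, lanes 2 and 3 freed
```

Both selectors give the same result because lanes 2 and 3 of v_x and v_y merely repeat lanes 0 and 1, which is the redundancy the commit exploits to make room for a second sequence.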


215670 Commits