I've noticed a number of potential problems in hash tables, of three
kinds: insertion of entries that seem empty, dangling insertions, and
lookups during insertions.
These problems may all have the effect of replacing a deleted entry
with one that seems empty, which may disconnect double-hashing chains
involving that entry, and thus cause entries to go missing.
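To illustrate why an entry that merely looks empty is worse than one properly
marked deleted, here is a self-contained toy (it uses linear probing rather
than GCC's double hashing, but the failure mode is the same):

  // Toy open-addressing table, not GCC's hash-table.h: a lookup stops at an
  // EMPTY slot, so turning a DELETED slot into an EMPTY-looking one
  // disconnects the probe chain and makes later entries unfindable.
  #include <cassert>
  #include <cstddef>

  enum slot_state { EMPTY, DELETED, OCCUPIED };

  struct toy_table
  {
    static const std::size_t N = 8;
    int keys[N];
    slot_state state[N];

    toy_table () { for (std::size_t i = 0; i < N; i++) state[i] = EMPTY; }

    void insert (int key)
    {
      for (std::size_t i = 0; i < N; i++)
        {
          std::size_t s = (key + i) % N;
          if (state[s] != OCCUPIED)
            { keys[s] = key; state[s] = OCCUPIED; return; }
        }
    }

    bool lookup (int key) const
    {
      for (std::size_t i = 0; i < N; i++)
        {
          std::size_t s = (key + i) % N;
          if (state[s] == EMPTY)
            return false;                      // chain ends here
          if (state[s] == OCCUPIED && keys[s] == key)
            return true;
          // DELETED: keep probing past it.
        }
      return false;
    }
  };

  int main ()
  {
    toy_table t;
    t.insert (0);              // occupies slot 0
    t.insert (8);              // collides with 0, ends up in slot 1
    t.state[0] = DELETED;      // proper removal: 8 is still reachable
    assert (t.lookup (8));
    t.state[0] = EMPTY;        // entry that "seems empty": chain is cut
    assert (!t.lookup (8));    // 8 is still stored but can't be found
    return 0;
  }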
This patch detects such problems by recording a pending insertion and
checking that it's completed before other potentially-conflicting
operations. The additional field is only introduced when checking is
enabled.
for gcc/ChangeLog
* hash-table.h (check_complete_insertion, check_insert_slot):
New hash_table methods.
(m_inserting_slot): New hash_table field.
(begin, hash_table ctors, ~hash_table): Check previous insert.
(expand, empty_slow, clear_slot, find_with_hash): Likewise.
(remove_elt_with_hash, traverse_noresize): Likewise.
(gt_pch_nx): Likewise.
(find_slot_with_hash): Likewise. Record requested insert.
This adds tests from bugzilla for PR103770 and its duplicates.
for gcc/testsuite/ChangeLog
* gcc.dg/pr103770.c: New test.
* gcc.dg/pr103859.c: New test.
* gcc.dg/pr105065.c: New test.
In the M-Class Arm ARM (Architecture Reference Manual):
https://developer.arm.com/documentation/ddi0553/bu/?lang=en
these MVE instructions only have a '!' writeback variant and at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714
we found that the Um constraint would also allow through a
register offset writeback, resulting in an assembler error.
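As a hypothetical illustration (not the new test case), source along these
lines, where the pointer advances by a runtime stride rather than the fixed
data size, is the sort of code where an unsupported writeback address could
end up being formed:

  // Hypothetical sketch only: vst2 supports only '!' (POST_INC by the data
  // size) writeback, so a register-offset writeback address formed for the
  // variable stride below must not be matched by the memory constraint.
  #include <arm_mve.h>

  void
  store_pairs (int32_t *p, int stride, int32x4x2_t v, int n)
  {
    for (int i = 0; i < n; i++)
      {
        vst2q_s32 (p, v);
        p += stride;
      }
  }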
Here I have added a new constraint and predicate for these
instructions, which (uniquely, AFAICT) only support a `!` writeback
increment by the data size (inside the compiler this is a POST_INC).
No regressions in arm-none-eabi with MVE and MVE.FP.
gcc/ChangeLog:
PR target/107714
* config/arm/arm-protos.h (mve_struct_mem_operand): New prototype.
* config/arm/arm.cc (mve_struct_mem_operand): New function.
* config/arm/constraints.md (Ug): New constraint.
* config/arm/mve.md (mve_vst4q<mode>): Change constraint.
(mve_vst2q<mode>): Likewise.
(mve_vld4q<mode>): Likewise.
(mve_vld2q<mode>): Likewise.
* config/arm/predicates.md (mve_struct_operand): New predicate.
gcc/testsuite/ChangeLog:
PR target/107714
* gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test.
Update test cases with error messages that changed as a result.
gcc/fortran/ChangeLog:
PR fortran/102595
* decl.cc (attr_decl1): Guard against NULL pointer.
* parse.cc (match_deferred_characteristics): Include BT_CLASS in check for
derived being undefined.
gcc/testsuite/ChangeLog:
PR fortran/102595
* gfortran.dg/class_result_4.f90: Update error message check.
* gfortran.dg/pr85779_3.f90: Update error message check.
Just like the recently-added checks for empty entries, add checks for
deleted entries as well. This didn't catch any problems, but it might
prevent future accidents. Suggested by David Malcolm.
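As a rough, self-contained sketch of the shape of the added check (the
sentinel values here are stand-ins, not GCC's real per-type hash traits):

  #include <cassert>

  // Stand-in sentinels; GCC's hash traits define these per entry type.
  const int EMPTY_KEY = 0;
  const int DELETED_KEY = -1;

  // The kind of assertion added after an insertion: the stored key must not
  // look like either sentinel, or later lookups would misread the slot.
  void
  check_inserted (int stored_key)
  {
    assert (stored_key != EMPTY_KEY);
    assert (stored_key != DELETED_KEY);
  }

  int main () { check_inserted (42); return 0; }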
for gcc/ChangeLog
* hash-map.h (put, get_or_insert): Check that added entry
doesn't look deleted either.
* hash-set.h (add): Likewise.
In take_address_of, we may refrain from completing a decl_address
INSERT if gsi is NULL, so don't even ask for an INSERT in this case.
for gcc/ChangeLog
* tree-parloops.cc (take_address_of): Skip INSERT if !gsi.
Check, after inserting entries, that they don't look empty.
for gcc/ChangeLog
* hash-map.h (put, get_or_insert): Check that entry does not
look empty after insertion.
Check, after adding a key to a hash set, that the entry does not look
empty.
for gcc/ChangeLog
* hash-set.h (add): Check that the inserted entry does not
look empty.
When decl is NULL, don't record its mapping in the
decl_to_instance_map.
for gcc/ada/ChangeLog
* gcc-interface/trans.cc (Sloc_to_locus): Don't map NULL decl.
When adding a catch-all partition, we map NULL to it. That mapping is
ineffective and unnecessary. Drop it.
for gcc/lto/ChangeLog
* lto-partition.cc (lto_1_to_1_map): Drop NULL partition
mapping.
cxx_eval_call_expression requests an INSERT even in cases when it
would later decide not to insert. This could break double-hashing
chains. Arrange for it to use NO_INSERT when the insertion would not
be completed.
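The pattern being adopted looks roughly like the following self-contained
sketch, with std::unordered_map standing in for GCC's hash_table and its
NO_INSERT/INSERT lookup modes:

  #include <string>
  #include <unordered_map>

  static std::unordered_map<std::string, int> cache;

  // Illustrative only: probe read-only first; only create an entry once we
  // know we will fill it, so no half-inserted slot is ever left behind.
  int
  evaluate (const std::string &key, bool cacheable, int computed)
  {
    auto it = cache.find (key);          // analogous to a NO_INSERT lookup
    if (it != cache.end ())
      return it->second;

    if (!cacheable)
      return computed;                   // would have abandoned the slot

    cache.emplace (key, computed);       // analogous to INSERT, completed
    return computed;                     // immediately with a real value
  }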
for gcc/cp/ChangeLog
* constexpr.cc (cxx_eval_call_expression): Do not request an
INSERT that would not be completed.
Insertion of a tm_restart_node in tm_restart failed to record the
newly-allocated node in the hash table.
for gcc/ChangeLog
* trans-mem.cc (split_bb_make_tm_edge): Record new node in
tm_restart.
lookup_expr_in_table is not used for insertions, but it mistakenly
used INSERT rather than NO_INSERT.
for gcc/ChangeLog
* postreload-gcse.cc (lookup_expr_in_table): Use NO_INSERT.
If a result doesn't have a default def, don't attempt to remap it.
for gcc/ChangeLog
* tree-inline.cc (declare_return_variable): Don't remap NULL
default def of result.
When a TREE_OPERAND is NULL, do not cache it.
for gcc/ChangeLog
* tree-ssa-loop-niter.cc (expand_simple_operations): Refrain
from caching NULL TREE_OPERANDs.
Use NO_INSERT to test whether inserting should be attempted.
for gcc/cp/ChangeLog
* constraint.cc (normalize_concept_check): Use NO_INSERT for
pre-insertion check.
Avoid adding NULL vnodes to referenced tables.
for gcc/ChangeLog
* varpool.cc (symbol_table::remove_unreferenced_decls): Do not
add NULL vnodes to referenced table.
Avoid hash table lookups between requesting an insert and storing the
inserted value in avail_exprs_stack. Lookups before the insert is
completed could fail to find double-hashed elements.
for gcc/ChangeLog
* tree-ssa-scopedtables.cc
(avail_exprs_stack::lookup_avail_expr): Finish hash table
insertion before further lookups.
This is perhaps not an actual problem, but the check is added for safety.
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_expand_prologue): Fix to check
DF availability before use of DF_* macros.
The middle-end doesn't have a preferred canonical form for expressing
zero-extension: it sometimes uses an AND, sometimes pairs of SHIFTs,
and sometimes a zero_extend. Pending changes to RTL simplification
may alter some of these representations, so a few additional
patterns are required to recognize these alternate representations
and avoid any testsuite regressions.
As an example, *popcountsi2_zext is currently represented as:
  [(set (match_operand:DI 0 "register_operand" "=r")
        (and:DI
          (subreg:DI
            (popcount:SI
              (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
          (const_int 63)))
   (clobber (reg:CC FLAGS_REG))]
This patch adds an alternate/equivalent pattern that matches:
  [(set (match_operand:DI 0 "register_operand" "=r")
        (zero_extend:DI
          (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
   (clobber (reg:CC FLAGS_REG))]
Another example is *popcounthi2 which is currently represented as:
  [(set (match_operand:SI 0 "register_operand")
        (popcount:SI
          (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand"))))
   (clobber (reg:CC FLAGS_REG))]
This patch adds an alternate/equivalent pattern that matches:
  [(set (match_operand:SI 0 "register_operand")
        (zero_extend:SI
          (popcount:HI (match_operand:HI 1 "nonimmediate_operand"))))
   (clobber (reg:CC FLAGS_REG))]
The contents of the machine description definitions remain the same;
it's just that the expected RTL is slightly different but equivalent.
Providing both forms makes the backend more robust to middle-end
changes [and possibly catches some missed optimizations].
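For reference, a function of roughly the following shape (illustrative only,
not taken from the new tests; it needs a popcount-capable target, e.g. x86_64
with -mpopcnt) is the kind of source that produces an SImode popcount whose
result is zero-extended to DImode, and can reach the backend in either of the
forms above:

  // Illustrative example, not from the testsuite.
  unsigned long long
  popcount_zext (unsigned int x)
  {
    return (unsigned long long) __builtin_popcount (x);
  }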
2022-12-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (*clzsi2_lzcnt_zext_2): New define_insn_and_split
to match the ZERO_EXTEND form of *clzsi2_lzcnt_zext.
(*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to match
ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep.
(*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match
ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi.
(*popcountsi2_zext_2): Likewise, new define_insn_and_split to match
ZERO_EXTEND form of *popcountsi2_zext.
(*popcountsi2_zext_2_falsedep): Likewise, new define_insn to match
ZERO_EXTEND form of *popcountsi2_zext_falsedep.
(*popcounthi2_2): Likewise, new define_insn_and_split to match
ZERO_EXTEND form of *popcounthi2.
(define_peephole2): New ZERO_EXTEND variant of the HImode
popcount&1-using-parity-flag peephole2.
This patch is a one line change, to call ix86_expand_clear instead of
emit_move_insn with const0_rtx in ix86_split_ashl, allowing the backend
to use an xor instruction to clear a register if appropriate.
The effect is demonstrated with the following function.
__int128 foo(__int128 x, unsigned long long b) {
return ((__int128)b << 72) + x;
}
Previously, with -O2, GCC would generate:
foo: movl $0, %eax
salq $8, %rdx
addq %rdi, %rax
adcq %rsi, %rdx
ret
With this patch, it now generates:
foo: xorl %eax, %eax
salq $8, %rdx
addq %rdi, %rax
adcq %rsi, %rdx
ret
2022-12-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_split_ashl): Call
ix86_expand_clear to generate an xor instruction.
gcc/testsuite/ChangeLog
* gcc.target/i386/ashlti3-1.c: New test case.
This patch restructures the loop over the GP registers
which saves/restores them as part of the prologue/epilogue.
No functional change is intended by this patch, but it
offers the possibility to use load-pair/store-pair instructions.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_next_saved_reg): New function.
(riscv_is_eh_return_data_register): New function.
(riscv_for_each_saved_reg): Restructure loop.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
The comment above the enumeration of existing attributes got out of
order and a few entries were missing.
This patch synchronizes the comment with the code.
This commit does not include any functional change.
gcc/ChangeLog:
* config/riscv/riscv.md: Sync comments with code.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Newer binutils uses all caps, where it was all lower case
previously, so make the configure test case-insensitive. Pushed as obvious.
This should resolve:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100383
gcc/
* configure.ac: Use grep -i for case-insensitive test.
* configure: Regenerate.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
This improves the readability of RTL dumps. No functional changes.
gcc/
* config/xtensa/xtensa.md (unspec): Extract UNSPEC_* constants
into this enum.
(unspecv): Extract UNSPECV_* constants into this enum.
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_expand_prologue): Modify to
exit the inspection loops as soon as the need for the stack
pointer is determined.
Like d0bbecb1c4, we add a wrapper to
prevent it from pulling in stdint.h from the standard C library.
gcc/testsuite:
* gcc.target/riscv/rvv/vsetvl/riscv_vector.h: New.
This patch fixes an issue of visiting a non-existing block of the CFG.
Since the block indices of a CFG in GCC are not always contiguous, we could
potentially visit a gap block which does not exist in the current CFG.
This patch avoids visiting such non-existing blocks.
I noticed this issue in my internal regression testing of the current
testsuite when I changed the X86 server machine. This patch fixes it:
17:27:15 job(build_and_test_rv32): Increased FAIL List:
17:27:15 job(build_and_test_rv32): FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-46.c
-O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error: Segmentation fault)
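The underlying pitfall can be sketched in a self-contained way (stand-in
types, not the riscv-vsetvl code):

  // Basic-block indices can have gaps, so walking every index in order can
  // dereference a hole; walking only the blocks that exist avoids that.
  #include <cstdio>
  #include <vector>

  struct block { int index; };

  int
  main ()
  {
    // Blocks 0, 1 and 3 exist; block 2 was removed, leaving a gap.
    block b0 = {0}, b1 = {1}, b3 = {3};
    std::vector<block *> by_index = { &b0, &b1, nullptr, &b3 };

    // Buggy pattern: assumes indices are contiguous; crashes at index 2.
    //   for (int i = 0; i <= 3; i++) use (by_index[i]);

    // Safe pattern: skip the holes (GCC's FOR_EACH_BB_FN walks only real
    // blocks).
    for (block *b : by_index)
      if (b)
        std::printf ("visiting block %d\n", b->index);
    return 0;
  }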
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(pass_vsetvl::compute_global_backward_infos): Only visit blocks
that exist in the CFG.
(pass_vsetvl::prune_expressions): Ditto.
PR106680 shows that -m32 -mpowerpc64 behaves differently from
-mpowerpc64 -m32; this is determined by how we
handle option powerpc64 in rs6000_handle_option.
Segher pointed out that this difference should be taken as
a bug and that we should ensure option powerpc64 is
independent of -m32/-m64. So this patch removes the
handling in rs6000_handle_option and adds the necessary
support in rs6000_option_override_internal instead.
With this patch, if users specify -m{no-,}powerpc64, the
specified value is honoured; otherwise, for 64-bit it
always enables OPTION_MASK_POWERPC64, while for 32-bit
with TARGET_POWERPC64 and OS_MISSING_POWERPC64 it disables
OPTION_MASK_POWERPC64.
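The decision logic just described can be restated as the following
self-contained sketch; all names are stand-ins rather than the actual
rs6000 flags or functions:

  #include <cstdio>

  bool
  decide_powerpc64 (bool user_specified, bool user_value, bool is_64bit,
                    bool implicit_value, bool os_missing_powerpc64)
  {
    if (user_specified)
      return user_value;         // -m{no-,}powerpc64 is always honoured
    if (is_64bit)
      return true;               // 64-bit always enables it
    if (implicit_value && os_missing_powerpc64)
      return false;              // 32-bit, only implicitly enabled, and the
                                 // OS cannot support it: disable it again
    return implicit_value;       // otherwise keep the TARGET_DEFAULT or
                                 // cpu-mask value
  }

  int
  main ()
  {
    // -m64 without an explicit powerpc64 option: enabled.
    std::printf ("%d\n", decide_powerpc64 (false, false, true, false, false));
    // -m32 on an OS missing powerpc64, flag set only via the cpu mask:
    // disabled.
    std::printf ("%d\n", decide_powerpc64 (false, false, false, true, true));
    return 0;
  }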
By the way, following Segher's suggestion, I tried warning
when OPTION_MASK_POWERPC64 is set for OS_MISSING_POWERPC64.
If we warn for the case where powerpc64 is specified explicitly,
some test cases using -m32 -mpowerpc64 on ppc64-linux
need updates, and the artificial run
with "--target_board=unix'{-m32/-mpowerpc64}'" produces
noisy warnings on ppc64-linux. If we warn for the case where
it's specified implicitly, the flag can simply have been initialized
from TARGET_DEFAULT (like -m32 on ppc64-linux) or set from the
given cpu mask, so we would have to special-case those and not warn.
Following Segher's latest comment, I decided not to warn at all and
keep the behaviour consistent with before.
Bootstrapped and regress-tested on:
- powerpc64-linux-gnu P7 and P8 {-m64,-m32}
- powerpc64le-linux-gnu P9 and P10
- powerpc-ibm-aix7.2.0.0 {-maix64,-maix32}
- powerpc-darwin9 (with Iain's help)
PR target/106680
gcc/ChangeLog:
* common/config/rs6000/rs6000-common.cc (rs6000_handle_option): Remove
the adjustment for option powerpc64 in -m64 handling, and remove the
whole -m32 handling.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): When no
explicit powerpc64 option is provided, enable it for -m64. For 32 bit
and OS_MISSING_POWERPC64, disable powerpc64 if it's enabled but not
specified explicitly.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr106680-1.c: New test.
* gcc.target/powerpc/pr106680-2.c: New test.
* gcc.target/powerpc/pr106680-3.c: New test.
* gcc.target/powerpc/pr106680-4.c: New test.
2022-12-27 Kewen Lin <linkw@linux.ibm.com>
Iain Sandoe <iain@sandoe.co.uk>
  if (mdaz-ftz)
    link crtfastmath.o
  else if ((Ofast || ffast-math || funsafe-math-optimizations)
           && !shared && !mno-daz-ftz)
    link crtfastmath.o
  else
    Don't link crtfastmath.o
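For context, the effect of linking crtfastmath.o on x86 is roughly the
following startup-time tweak (a sketch of the effect, not the actual
crtfastmath source):

  // Set the FTZ and DAZ bits in MXCSR so denormals are flushed to, and
  // treated as, zero.
  #include <xmmintrin.h>

  static void
  enable_daz_ftz (void)
  {
    _MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);   // FTZ (bit 15)
    _mm_setcsr (_mm_getcsr () | 0x0040);           // DAZ (bit 6)
  }

  int
  main ()
  {
    enable_daz_ftz ();
    return 0;
  }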
gcc/ChangeLog:
PR target/55522
PR target/36821
* config/i386/gnu-user-common.h (GNU_USER_TARGET_MATHFILE_SPEC):
Link crtfastmath.o whenever -mdaz-ftz is specified. Don't link
crtfastmath.o when -shared or -mno-daz-ftz is specified.
* config/i386/i386.opt (mdaz-ftz): New option.
* doc/invoke.texi (x86 options): Document -mdaz-ftz.
This patch tweaks the x86 backend to use the movss and movsd instructions
to perform some vector permutations on integer vectors (V4SI and V2DI) in
the same way they are used for floating point vectors (V4SF and V2DF).
As a motivating example, consider:
typedef unsigned int v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));
v4si foo(v4si x,v4si y) { return (v4si){y[0],x[1],x[2],x[3]}; }
v4sf bar(v4sf x,v4sf y) { return (v4sf){y[0],x[1],x[2],x[3]}; }
which is currently compiled with -O2 to:
foo: movdqa %xmm0, %xmm2
shufps $80, %xmm0, %xmm1
movdqa %xmm1, %xmm0
shufps $232, %xmm2, %xmm0
ret
bar: movss %xmm1, %xmm0
ret
With this patch, both functions compile to the same form.
Likewise for the V2DI case:
typedef unsigned long v2di __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));
v2di foo(v2di x,v2di y) { return (v2di){y[0],x[1]}; }
v2df bar(v2df x,v2df y) { return (v2df){y[0],x[1]}; }
which currently generates:
foo: shufpd $2, %xmm0, %xmm1
movdqa %xmm1, %xmm0
ret
bar: movsd %xmm1, %xmm0
ret
2022-12-25 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386-builtin.def (__builtin_ia32_movss): Update
CODE_FOR_sse_movss to CODE_FOR_sse_movss_v4sf.
(__builtin_ia32_movsd): Likewise, update CODE_FOR_sse2_movsd to
CODE_FOR_sse2_movsd_v2df.
* config/i386/i386-expand.cc (split_convert_uns_si_sse): Update
gen_sse_movss call to gen_sse_movss_v4sf, and gen_sse2_movsd call
to gen_sse2_movsd_v2df.
(expand_vec_perm_movs): Also allow V4SImode with TARGET_SSE and
V2DImode with TARGET_SSE2.
* config/i386/sse.md
(avx512fp16_fcmaddcsh_v8hf_mask3<round_expand_name>): Update
gen_sse_movss call to gen_sse_movss_v4sf.
(avx512fp16_fmaddcsh_v8hf_mask3<round_expand_name>): Likewise.
(sse_movss_<mode>): Renamed from sse_movss using VI4F_128 mode
iterator to handle both V4SF and V4SI.
(sse2_movsd_<mode>): Likewise, renamed from sse2_movsd using
VI8F_128 mode iterator to handle both V2DF and V2DI.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse-movss-4.c: New test case.
* gcc.target/i386/sse2-movsd-3.c: New test case.