aarch64: Avoid false dependencies for SVE unary operations
For calls like:

	z0 = svabs_s8_x (p0, z1)

we previously generated:

	abs	z0.b, p0/m, z1.b

However, this creates a false dependency on z0 (the merge input). This can lead to strange results in some cases, e.g. serialising the operation behind arbitrary earlier operations, or preventing two iterations of a loop from being executed in parallel.

This patch therefore ties the input to the output, using a MOVPRFX if necessary and possible. (The SVE2 unary long instructions do not support MOVPRFX.) An illustrative example follows the ChangeLog below.

When testing the patch, I hit a bug in the big-endian SVE move optimisation in aarch64_maybe_expand_sve_subreg_move. I don't have an independent testcase for it, so I didn't split it out into a separate patch.

gcc/
* config/aarch64/aarch64.c (aarch64_maybe_expand_sve_subreg_move): Do not optimize LRA subregs.
* config/aarch64/aarch64-sve.md (@aarch64_pred_<SVE_INT_UNARY:optab><mode>): Tie the input to the output.
(@aarch64_sve_revbhw_<SVE_ALL:mode><PRED_HSD:mode>): Likewise.
(*<ANY_EXTEND:optab><SVE_PARTIAL_I:mode><SVE_HSDI:mode>2): Likewise.
(@aarch64_pred_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>): Likewise.
(*cnot<mode>): Likewise.
(@aarch64_pred_<SVE_COND_FP_UNARY:optab><mode>): Likewise.
(@aarch64_sve_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>): Likewise.
(@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>): Likewise.
(@aarch64_sve_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>): Likewise.
(@aarch64_sve_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>): Likewise.
(@aarch64_sve_<optab>_trunc<SVE_FULL_SDF:mode><SVE_FULL_HSF:mode>): Likewise.
(@aarch64_sve_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>): Likewise.
(@aarch64_sve_<optab>_nontrunc<SVE_FULL_HSF:mode><SVE_FULL_SDF:mode>): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_pred_<SVE2_COND_FP_UNARY_LONG:sve_fp_op><mode>): Likewise.
(@aarch64_pred_<SVE2_COND_FP_UNARY_NARROWB:sve_fp_op><mode>): Likewise.
(@aarch64_pred_<SVE2_U32_UNARY:sve_int_op><mode>): Likewise.
(@aarch64_pred_<SVE2_COND_INT_UNARY_FP:sve_fp_op><mode>): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/abs_f16.c (abs_f16_x_untied): Expect a MOVPRFX instruction.
* gcc.target/aarch64/sve/acle/asm/abs_f32.c (abs_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_f64.c (abs_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_s16.c (abs_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_s32.c (abs_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_s64.c (abs_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_s8.c (abs_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cls_s16.c (cls_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cls_s32.c (cls_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cls_s64.c (cls_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cls_s8.c (cls_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_s16.c (clz_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_s32.c (clz_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_s64.c (clz_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_s8.c (clz_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_u16.c (clz_u16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_u32.c (clz_u32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_u64.c (clz_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_u8.c (clz_u8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_s16.c (cnot_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_s32.c (cnot_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_s64.c (cnot_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_s8.c (cnot_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_u16.c (cnot_u16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_u32.c (cnot_u32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_u64.c (cnot_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_u8.c (cnot_u8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_bf16.c (cnt_bf16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_f16.c (cnt_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_f32.c (cnt_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_f64.c (cnt_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_s16.c (cnt_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_s32.c (cnt_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_s64.c (cnt_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_s8.c (cnt_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_u16.c (cnt_u16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_u32.c (cnt_u32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_u64.c (cnt_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_u8.c (cnt_u8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_bf16.c (cvt_bf16_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_f16.c (cvt_f16_f32_x_untied)
(cvt_f16_f64_x_untied, cvt_f16_s16_x_untied, cvt_f16_s32_x_untied)
(cvt_f16_s64_x_untied, cvt_f16_u16_x_untied, cvt_f16_u32_x_untied)
(cvt_f16_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_f32.c (cvt_f32_f16_x_untied)
(cvt_f32_f64_x_untied, cvt_f32_s16_x_untied, cvt_f32_s32_x_untied)
(cvt_f32_s64_x_untied, cvt_f32_u16_x_untied, cvt_f32_u32_x_untied)
(cvt_f32_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_f64.c (cvt_f64_f16_x_untied)
(cvt_f64_f32_x_untied, cvt_f64_s16_x_untied, cvt_f64_s32_x_untied)
(cvt_f64_s64_x_untied, cvt_f64_u16_x_untied, cvt_f64_u32_x_untied)
(cvt_f64_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_s16.c (cvt_s16_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_s32.c (cvt_s32_f16_x_untied)
(cvt_s32_f32_x_untied, cvt_s32_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_s64.c (cvt_s64_f16_x_untied)
(cvt_s64_f32_x_untied, cvt_s64_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_u16.c (cvt_u16_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_u32.c (cvt_u32_f16_x_untied)
(cvt_u32_f32_x_untied, cvt_u32_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cvt_u64.c (cvt_u64_f16_x_untied)
(cvt_u64_f32_x_untied, cvt_u64_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/extb_s16.c (extb_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/extb_s32.c (extb_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/extb_s64.c (extb_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/exth_s32.c (exth_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/exth_s64.c (exth_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/extw_s64.c (extw_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/neg_f16.c (neg_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/neg_f32.c (neg_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/neg_f64.c (neg_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/neg_s16.c (neg_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/neg_s32.c (neg_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/neg_s64.c (neg_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/neg_s8.c (neg_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/not_s16.c (not_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/not_s32.c (not_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/not_s64.c (not_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/not_s8.c (not_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/not_u16.c (not_u16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/not_u32.c (not_u32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/not_u64.c (not_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/not_u8.c (not_u8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rbit_s16.c (rbit_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rbit_s32.c (rbit_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rbit_s64.c (rbit_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rbit_s8.c (rbit_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rbit_u16.c (rbit_u16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rbit_u32.c (rbit_u32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rbit_u64.c (rbit_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rbit_u8.c (rbit_u8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/recpx_f16.c (recpx_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/recpx_f32.c (recpx_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/recpx_f64.c (recpx_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revb_s16.c (revb_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revb_s32.c (revb_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revb_s64.c (revb_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revb_u16.c (revb_u16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revb_u32.c (revb_u32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revb_u64.c (revb_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revh_s32.c (revh_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revh_s64.c (revh_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revh_u32.c (revh_u32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revh_u64.c (revh_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revw_s64.c (revw_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/revw_u64.c (revw_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rinta_f16.c (rinta_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rinta_f32.c (rinta_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rinta_f64.c (rinta_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rinti_f16.c (rinti_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rinti_f32.c (rinti_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rinti_f64.c (rinti_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintm_f16.c (rintm_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintm_f32.c (rintm_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintm_f64.c (rintm_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintn_f16.c (rintn_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintn_f32.c (rintn_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintn_f64.c (rintn_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintp_f16.c (rintp_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintp_f32.c (rintp_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintp_f64.c (rintp_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintx_f16.c (rintx_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintx_f32.c (rintx_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintx_f64.c (rintx_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintz_f16.c (rintz_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintz_f32.c (rintz_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/rintz_f64.c (rintz_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/sqrt_f16.c (sqrt_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/sqrt_f32.c (sqrt_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/sqrt_f64.c (sqrt_f64_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/cvtx_f32.c (cvtx_f32_f64_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/logb_f16.c (logb_f16_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/logb_f32.c (logb_f32_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/logb_f64.c (logb_f64_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/qabs_s16.c (qabs_s16_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/qabs_s32.c (qabs_s32_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/qabs_s64.c (qabs_s64_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/qabs_s8.c (qabs_s8_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/qneg_s16.c (qneg_s16_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/qneg_s32.c (qneg_s32_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/qneg_s64.c (qneg_s64_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/qneg_s8.c (qneg_s8_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/recpe_u32.c (recpe_u32_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/rsqrte_u32.c (rsqrte_u32_x_untied): Ditto.
* gcc.target/aarch64/sve2/acle/asm/cvtlt_f32.c (cvtlt_f32_f16_x_untied): Expect a MOV instruction.
* gcc.target/aarch64/sve2/acle/asm/cvtlt_f64.c (cvtlt_f64_f32_x_untied): Likewise.
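To make the false-dependency problem concrete, here is a minimal ACLE sketch of the kind of call described above; it is not part of the patch. It assumes a function whose input arrives in z1 and whose result is returned in z0, as in the *_untied tests, and that SVE is enabled (e.g. -march=armv8.2-a+sve); the actual register allocation of course depends on the surrounding code. The before/after assembly in the comments is taken from the description above and from the updated abs_s8.c test expectations.

    #include <arm_sve.h>

    svint8_t
    abs_untied (svbool_t pg, svint8_t x)
    {
      /* Before this patch the "_x" (don't-care predication) form compiled to:
             abs     z0.b, p0/m, z1.b
         which also reads z0 as the merge input, so the result carries a
         false dependency on whatever last wrote z0.

         After this patch the untied case becomes:
             movprfx z0, z1
             abs     z0.b, p0/m, z1.b
         tying the destination to the input, so the only input dependencies
         are z1 and the governing predicate.  */
      return svabs_s8_x (pg, x);
    }

The SVE2 unary long instructions cannot be prefixed with MOVPRFX, which is why the corresponding aarch64-sve2.md pattern in the diff below simply ties operand 2 to operand 0 (the "0" constraint) instead of adding a movprfx alternative.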
parent 4aff491ffc
commit a4d9837ee4
@ -2925,14 +2925,17 @@

;; Integer unary arithmetic predicated with a PTRUE.
(define_insn "@aarch64_pred_<optab><mode>"
[(set (match_operand:SVE_I 0 "register_operand" "=w")
[(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
(unspec:SVE_I
[(match_operand:<VPRED> 1 "register_operand" "Upl")
[(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
(SVE_INT_UNARY:SVE_I
(match_operand:SVE_I 2 "register_operand" "w"))]
(match_operand:SVE_I 2 "register_operand" "0, w"))]
UNSPEC_PRED_X))]
"TARGET_SVE"
"<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
"@
<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>
movprfx\t%0, %2\;<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
[(set_attr "movprfx" "*,yes")]
)

;; Predicated integer unary arithmetic with merging.
@ -2998,15 +3001,18 @@
|
||||
|
||||
;; Predicated integer unary operations.
|
||||
(define_insn "@aarch64_pred_<optab><mode>"
|
||||
[(set (match_operand:SVE_FULL_I 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_FULL_I
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(unspec:SVE_FULL_I
|
||||
[(match_operand:SVE_FULL_I 2 "register_operand" "w")]
|
||||
[(match_operand:SVE_FULL_I 2 "register_operand" "0, w")]
|
||||
SVE_INT_UNARY)]
|
||||
UNSPEC_PRED_X))]
|
||||
"TARGET_SVE && <elem_bits> >= <min_elem_bits>"
|
||||
"<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
"@
|
||||
<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>
|
||||
movprfx\t%0, %2\;<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Another way of expressing the REVB, REVH and REVW patterns, with this
|
||||
@ -3014,15 +3020,18 @@
|
||||
;; of lanes and the data mode decides the granularity of the reversal within
|
||||
;; each lane.
|
||||
(define_insn "@aarch64_sve_revbhw_<SVE_ALL:mode><PRED_HSD:mode>"
|
||||
[(set (match_operand:SVE_ALL 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_ALL 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_ALL
|
||||
[(match_operand:PRED_HSD 1 "register_operand" "Upl")
|
||||
[(match_operand:PRED_HSD 1 "register_operand" "Upl, Upl")
|
||||
(unspec:SVE_ALL
|
||||
[(match_operand:SVE_ALL 2 "register_operand" "w")]
|
||||
[(match_operand:SVE_ALL 2 "register_operand" "0, w")]
|
||||
UNSPEC_REVBHW)]
|
||||
UNSPEC_PRED_X))]
|
||||
"TARGET_SVE && <PRED_HSD:elem_bits> > <SVE_ALL:container_bits>"
|
||||
"rev<SVE_ALL:Vcwtype>\t%0.<PRED_HSD:Vetype>, %1/m, %2.<PRED_HSD:Vetype>"
|
||||
"@
|
||||
rev<SVE_ALL:Vcwtype>\t%0.<PRED_HSD:Vetype>, %1/m, %2.<PRED_HSD:Vetype>
|
||||
movprfx\t%0, %2\;rev<SVE_ALL:Vcwtype>\t%0.<PRED_HSD:Vetype>, %1/m, %2.<PRED_HSD:Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated integer unary operations with merging.
|
||||
@ -3071,28 +3080,34 @@
|
||||
|
||||
;; Predicated sign and zero extension from a narrower mode.
|
||||
(define_insn "*<optab><SVE_PARTIAL_I:mode><SVE_HSDI:mode>2"
|
||||
[(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_HSDI 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_HSDI
|
||||
[(match_operand:<SVE_HSDI:VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<SVE_HSDI:VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(ANY_EXTEND:SVE_HSDI
|
||||
(match_operand:SVE_PARTIAL_I 2 "register_operand" "w"))]
|
||||
(match_operand:SVE_PARTIAL_I 2 "register_operand" "0, w"))]
|
||||
UNSPEC_PRED_X))]
|
||||
"TARGET_SVE && (~<SVE_HSDI:narrower_mask> & <SVE_PARTIAL_I:self_mask>) == 0"
|
||||
"<su>xt<SVE_PARTIAL_I:Vesize>\t%0.<SVE_HSDI:Vetype>, %1/m, %2.<SVE_HSDI:Vetype>"
|
||||
"@
|
||||
<su>xt<SVE_PARTIAL_I:Vesize>\t%0.<SVE_HSDI:Vetype>, %1/m, %2.<SVE_HSDI:Vetype>
|
||||
movprfx\t%0, %2\;<su>xt<SVE_PARTIAL_I:Vesize>\t%0.<SVE_HSDI:Vetype>, %1/m, %2.<SVE_HSDI:Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated truncate-and-sign-extend operations.
|
||||
(define_insn "@aarch64_pred_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>"
|
||||
[(set (match_operand:SVE_FULL_HSDI 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_FULL_HSDI 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_FULL_HSDI
|
||||
[(match_operand:<SVE_FULL_HSDI:VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<SVE_FULL_HSDI:VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(sign_extend:SVE_FULL_HSDI
|
||||
(truncate:SVE_PARTIAL_I
|
||||
(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")))]
|
||||
(match_operand:SVE_FULL_HSDI 2 "register_operand" "0, w")))]
|
||||
UNSPEC_PRED_X))]
|
||||
"TARGET_SVE
|
||||
&& (~<SVE_FULL_HSDI:narrower_mask> & <SVE_PARTIAL_I:self_mask>) == 0"
|
||||
"sxt<SVE_PARTIAL_I:Vesize>\t%0.<SVE_FULL_HSDI:Vetype>, %1/m, %2.<SVE_FULL_HSDI:Vetype>"
|
||||
"@
|
||||
sxt<SVE_PARTIAL_I:Vesize>\t%0.<SVE_FULL_HSDI:Vetype>, %1/m, %2.<SVE_FULL_HSDI:Vetype>
|
||||
movprfx\t%0, %2\;sxt<SVE_PARTIAL_I:Vesize>\t%0.<SVE_FULL_HSDI:Vetype>, %1/m, %2.<SVE_FULL_HSDI:Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated truncate-and-sign-extend operations with merging.
|
||||
@ -3212,20 +3227,23 @@
|
||||
)
|
||||
|
||||
(define_insn "*cnot<mode>"
|
||||
[(set (match_operand:SVE_FULL_I 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_FULL_I
|
||||
[(unspec:<VPRED>
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 5 "aarch64_sve_ptrue_flag")
|
||||
(eq:<VPRED>
|
||||
(match_operand:SVE_FULL_I 2 "register_operand" "w")
|
||||
(match_operand:SVE_FULL_I 2 "register_operand" "0, w")
|
||||
(match_operand:SVE_FULL_I 3 "aarch64_simd_imm_zero"))]
|
||||
UNSPEC_PRED_Z)
|
||||
(match_operand:SVE_FULL_I 4 "aarch64_simd_imm_one")
|
||||
(match_dup 3)]
|
||||
UNSPEC_SEL))]
|
||||
"TARGET_SVE"
|
||||
"cnot\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
"@
|
||||
cnot\t%0.<Vetype>, %1/m, %2.<Vetype>
|
||||
movprfx\t%0, %2\;cnot\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated logical inverse with merging.
|
||||
@ -3383,14 +3401,17 @@
|
||||
|
||||
;; Predicated floating-point unary operations.
|
||||
(define_insn "@aarch64_pred_<optab><mode>"
|
||||
[(set (match_operand:SVE_FULL_F 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_FULL_F 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_FULL_F
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:SVE_FULL_F 2 "register_operand" "w")]
|
||||
(match_operand:SVE_FULL_F 2 "register_operand" "0, w")]
|
||||
SVE_COND_FP_UNARY))]
|
||||
"TARGET_SVE"
|
||||
"<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
"@
|
||||
<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vetype>
|
||||
movprfx\t%0, %2\;<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated floating-point unary arithmetic with merging.
|
||||
@ -8575,26 +8596,32 @@
|
||||
|
||||
;; Predicated float-to-integer conversion, either to the same width or wider.
|
||||
(define_insn "@aarch64_sve_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>"
|
||||
[(set (match_operand:SVE_FULL_HSDI 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_FULL_HSDI 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_FULL_HSDI
|
||||
[(match_operand:<SVE_FULL_HSDI:VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<SVE_FULL_HSDI:VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:SVE_FULL_F 2 "register_operand" "w")]
|
||||
(match_operand:SVE_FULL_F 2 "register_operand" "0, w")]
|
||||
SVE_COND_FCVTI))]
|
||||
"TARGET_SVE && <SVE_FULL_HSDI:elem_bits> >= <SVE_FULL_F:elem_bits>"
|
||||
"fcvtz<su>\t%0.<SVE_FULL_HSDI:Vetype>, %1/m, %2.<SVE_FULL_F:Vetype>"
|
||||
"@
|
||||
fcvtz<su>\t%0.<SVE_FULL_HSDI:Vetype>, %1/m, %2.<SVE_FULL_F:Vetype>
|
||||
movprfx\t%0, %2\;fcvtz<su>\t%0.<SVE_FULL_HSDI:Vetype>, %1/m, %2.<SVE_FULL_F:Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated narrowing float-to-integer conversion.
|
||||
(define_insn "@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>"
|
||||
[(set (match_operand:VNx4SI_ONLY 0 "register_operand" "=w")
|
||||
[(set (match_operand:VNx4SI_ONLY 0 "register_operand" "=w, ?&w")
|
||||
(unspec:VNx4SI_ONLY
|
||||
[(match_operand:VNx2BI 1 "register_operand" "Upl")
|
||||
[(match_operand:VNx2BI 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:VNx2DF_ONLY 2 "register_operand" "w")]
|
||||
(match_operand:VNx2DF_ONLY 2 "register_operand" "0, w")]
|
||||
SVE_COND_FCVTI))]
|
||||
"TARGET_SVE"
|
||||
"fcvtz<su>\t%0.<VNx4SI_ONLY:Vetype>, %1/m, %2.<VNx2DF_ONLY:Vetype>"
|
||||
"@
|
||||
fcvtz<su>\t%0.<VNx4SI_ONLY:Vetype>, %1/m, %2.<VNx2DF_ONLY:Vetype>
|
||||
movprfx\t%0, %2\;fcvtz<su>\t%0.<VNx4SI_ONLY:Vetype>, %1/m, %2.<VNx2DF_ONLY:Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated float-to-integer conversion with merging, either to the same
|
||||
@ -8756,26 +8783,32 @@
|
||||
;; Predicated integer-to-float conversion, either to the same width or
|
||||
;; narrower.
|
||||
(define_insn "@aarch64_sve_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>"
|
||||
[(set (match_operand:SVE_FULL_F 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_FULL_F 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_FULL_F
|
||||
[(match_operand:<SVE_FULL_HSDI:VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<SVE_FULL_HSDI:VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")]
|
||||
(match_operand:SVE_FULL_HSDI 2 "register_operand" "0, w")]
|
||||
SVE_COND_ICVTF))]
|
||||
"TARGET_SVE && <SVE_FULL_HSDI:elem_bits> >= <SVE_FULL_F:elem_bits>"
|
||||
"<su>cvtf\t%0.<SVE_FULL_F:Vetype>, %1/m, %2.<SVE_FULL_HSDI:Vetype>"
|
||||
"@
|
||||
<su>cvtf\t%0.<SVE_FULL_F:Vetype>, %1/m, %2.<SVE_FULL_HSDI:Vetype>
|
||||
movprfx\t%0, %2\;<su>cvtf\t%0.<SVE_FULL_F:Vetype>, %1/m, %2.<SVE_FULL_HSDI:Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated widening integer-to-float conversion.
|
||||
(define_insn "@aarch64_sve_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>"
|
||||
[(set (match_operand:VNx2DF_ONLY 0 "register_operand" "=w")
|
||||
[(set (match_operand:VNx2DF_ONLY 0 "register_operand" "=w, ?&w")
|
||||
(unspec:VNx2DF_ONLY
|
||||
[(match_operand:VNx2BI 1 "register_operand" "Upl")
|
||||
[(match_operand:VNx2BI 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:VNx4SI_ONLY 2 "register_operand" "w")]
|
||||
(match_operand:VNx4SI_ONLY 2 "register_operand" "0, w")]
|
||||
SVE_COND_ICVTF))]
|
||||
"TARGET_SVE"
|
||||
"<su>cvtf\t%0.<VNx2DF_ONLY:Vetype>, %1/m, %2.<VNx4SI_ONLY:Vetype>"
|
||||
"@
|
||||
<su>cvtf\t%0.<VNx2DF_ONLY:Vetype>, %1/m, %2.<VNx4SI_ONLY:Vetype>
|
||||
movprfx\t%0, %2\;<su>cvtf\t%0.<VNx2DF_ONLY:Vetype>, %1/m, %2.<VNx4SI_ONLY:Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated integer-to-float conversion with merging, either to the same
|
||||
@ -8948,14 +8981,17 @@
|
||||
|
||||
;; Predicated float-to-float truncation.
|
||||
(define_insn "@aarch64_sve_<optab>_trunc<SVE_FULL_SDF:mode><SVE_FULL_HSF:mode>"
|
||||
[(set (match_operand:SVE_FULL_HSF 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_FULL_HSF 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_FULL_HSF
|
||||
[(match_operand:<SVE_FULL_SDF:VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<SVE_FULL_SDF:VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:SVE_FULL_SDF 2 "register_operand" "w")]
|
||||
(match_operand:SVE_FULL_SDF 2 "register_operand" "0, w")]
|
||||
SVE_COND_FCVT))]
|
||||
"TARGET_SVE && <SVE_FULL_SDF:elem_bits> > <SVE_FULL_HSF:elem_bits>"
|
||||
"fcvt\t%0.<SVE_FULL_HSF:Vetype>, %1/m, %2.<SVE_FULL_SDF:Vetype>"
|
||||
"@
|
||||
fcvt\t%0.<SVE_FULL_HSF:Vetype>, %1/m, %2.<SVE_FULL_SDF:Vetype>
|
||||
movprfx\t%0, %2\;fcvt\t%0.<SVE_FULL_HSF:Vetype>, %1/m, %2.<SVE_FULL_SDF:Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated float-to-float truncation with merging.
|
||||
@ -9002,14 +9038,17 @@
|
||||
|
||||
;; Predicated BFCVT.
|
||||
(define_insn "@aarch64_sve_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>"
|
||||
[(set (match_operand:VNx8BF_ONLY 0 "register_operand" "=w")
|
||||
[(set (match_operand:VNx8BF_ONLY 0 "register_operand" "=w, ?&w")
|
||||
(unspec:VNx8BF_ONLY
|
||||
[(match_operand:VNx4BI 1 "register_operand" "Upl")
|
||||
[(match_operand:VNx4BI 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:VNx4SF_ONLY 2 "register_operand" "w")]
|
||||
(match_operand:VNx4SF_ONLY 2 "register_operand" "0, w")]
|
||||
SVE_COND_FCVT))]
|
||||
"TARGET_SVE_BF16"
|
||||
"bfcvt\t%0.h, %1/m, %2.s"
|
||||
"@
|
||||
bfcvt\t%0.h, %1/m, %2.s
|
||||
movprfx\t%0, %2\;bfcvt\t%0.h, %1/m, %2.s"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated BFCVT with merging.
|
||||
@ -9099,14 +9138,17 @@
|
||||
|
||||
;; Predicated float-to-float extension.
|
||||
(define_insn "@aarch64_sve_<optab>_nontrunc<SVE_FULL_HSF:mode><SVE_FULL_SDF:mode>"
|
||||
[(set (match_operand:SVE_FULL_SDF 0 "register_operand" "=w")
|
||||
[(set (match_operand:SVE_FULL_SDF 0 "register_operand" "=w, ?&w")
|
||||
(unspec:SVE_FULL_SDF
|
||||
[(match_operand:<SVE_FULL_SDF:VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<SVE_FULL_SDF:VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:SVE_FULL_HSF 2 "register_operand" "w")]
|
||||
(match_operand:SVE_FULL_HSF 2 "register_operand" "0, w")]
|
||||
SVE_COND_FCVT))]
|
||||
"TARGET_SVE && <SVE_FULL_SDF:elem_bits> > <SVE_FULL_HSF:elem_bits>"
|
||||
"fcvt\t%0.<SVE_FULL_SDF:Vetype>, %1/m, %2.<SVE_FULL_HSF:Vetype>"
|
||||
"@
|
||||
fcvt\t%0.<SVE_FULL_SDF:Vetype>, %1/m, %2.<SVE_FULL_HSF:Vetype>
|
||||
movprfx\t%0, %2\;fcvt\t%0.<SVE_FULL_SDF:Vetype>, %1/m, %2.<SVE_FULL_HSF:Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated float-to-float extension with merging.
|
||||
|
@ -1893,10 +1893,10 @@
|
||||
(unspec:SVE_FULL_SDF
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:<VNARROW> 2 "register_operand" "w")]
|
||||
(match_operand:<VNARROW> 2 "register_operand" "0")]
|
||||
SVE2_COND_FP_UNARY_LONG))]
|
||||
"TARGET_SVE2"
|
||||
"<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Ventype>"
|
||||
"<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Ventype>"
|
||||
)
|
||||
|
||||
;; Predicated convert long top with merging.
|
||||
@ -1978,14 +1978,17 @@
|
||||
;; Predicated FCVTX (equivalent to what would be FCVTXNB, except that
|
||||
;; it supports MOVPRFX).
|
||||
(define_insn "@aarch64_pred_<sve_fp_op><mode>"
|
||||
[(set (match_operand:VNx4SF_ONLY 0 "register_operand" "=w")
|
||||
[(set (match_operand:VNx4SF_ONLY 0 "register_operand" "=w, ?&w")
|
||||
(unspec:VNx4SF_ONLY
|
||||
[(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:<VWIDE> 2 "register_operand" "w")]
|
||||
(match_operand:<VWIDE> 2 "register_operand" "0, w")]
|
||||
SVE2_COND_FP_UNARY_NARROWB))]
|
||||
"TARGET_SVE2"
|
||||
"<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vewtype>"
|
||||
"@
|
||||
<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vewtype>
|
||||
movprfx\t%0, %2\;<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vewtype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated FCVTX with merging.
|
||||
@ -2076,15 +2079,18 @@
|
||||
|
||||
;; Predicated integer unary operations.
|
||||
(define_insn "@aarch64_pred_<sve_int_op><mode>"
|
||||
[(set (match_operand:VNx4SI_ONLY 0 "register_operand" "=w")
|
||||
[(set (match_operand:VNx4SI_ONLY 0 "register_operand" "=w, ?&w")
|
||||
(unspec:VNx4SI_ONLY
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(unspec:VNx4SI_ONLY
|
||||
[(match_operand:VNx4SI_ONLY 2 "register_operand" "w")]
|
||||
[(match_operand:VNx4SI_ONLY 2 "register_operand" "0, w")]
|
||||
SVE2_U32_UNARY)]
|
||||
UNSPEC_PRED_X))]
|
||||
"TARGET_SVE2"
|
||||
"<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
"@
|
||||
<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>
|
||||
movprfx\t%0, %2\;<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated integer unary operations with merging.
|
||||
@ -2139,14 +2145,17 @@
|
||||
|
||||
;; Predicated FLOGB.
|
||||
(define_insn "@aarch64_pred_<sve_fp_op><mode>"
|
||||
[(set (match_operand:<V_INT_EQUIV> 0 "register_operand" "=w")
|
||||
[(set (match_operand:<V_INT_EQUIV> 0 "register_operand" "=w, ?&w")
|
||||
(unspec:<V_INT_EQUIV>
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl")
|
||||
[(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
|
||||
(match_operand:SI 3 "aarch64_sve_gp_strictness")
|
||||
(match_operand:SVE_FULL_F 2 "register_operand" "w")]
|
||||
(match_operand:SVE_FULL_F 2 "register_operand" "0, w")]
|
||||
SVE2_COND_INT_UNARY_FP))]
|
||||
"TARGET_SVE2"
|
||||
"<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
"@
|
||||
<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vetype>
|
||||
movprfx\t%0, %2\;<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
|
||||
[(set_attr "movprfx" "*,yes")]
|
||||
)
|
||||
|
||||
;; Predicated FLOGB with merging.
|
||||
|
@ -5390,9 +5390,35 @@ bool
aarch64_maybe_expand_sve_subreg_move (rtx dest, rtx src)
{
gcc_assert (BYTES_BIG_ENDIAN);
if (SUBREG_P (dest))

/* Do not try to optimize subregs that LRA has created for matched
reloads. These subregs only exist as a temporary measure to make
the RTL well-formed, but they are exempt from the usual
TARGET_CAN_CHANGE_MODE_CLASS rules.

For example, if we have:

(set (reg:VNx8HI R1) (foo:VNx8HI (reg:VNx4SI R2)))

and the constraints require R1 and R2 to be in the same register,
LRA may need to create RTL such as:

(set (subreg:VNx4SI (reg:VNx8HI TMP) 0) (reg:VNx4SI R2))
(set (reg:VNx8HI TMP) (foo:VNx8HI (subreg:VNx4SI (reg:VNx8HI TMP) 0)))
(set (reg:VNx8HI R1) (reg:VNx8HI TMP))

which forces both the input and output of the original instruction
to use the same hard register. But for this to work, the normal
rules have to be suppressed on the subreg input, otherwise LRA
would need to reload that input too, meaning that the process
would never terminate. To compensate for this, the normal rules
are also suppressed for the subreg output of the first move.
Ignoring the special case and handling the first move normally
would therefore generate wrong code: we would reverse the elements
for the first subreg but not reverse them back for the second subreg. */
if (SUBREG_P (dest) && !LRA_SUBREG_P (dest))
dest = SUBREG_REG (dest);
if (SUBREG_P (src))
if (SUBREG_P (src) && !LRA_SUBREG_P (src))
src = SUBREG_REG (src);

/* The optimization handles two single SVE REGs with different element
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (abs_f16_x_tied1, svfloat16_t,
|
||||
|
||||
/*
|
||||
** abs_f16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** fabs z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (abs_f32_x_tied1, svfloat32_t,
|
||||
|
||||
/*
|
||||
** abs_f32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** fabs z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (abs_f64_x_tied1, svfloat64_t,
|
||||
|
||||
/*
|
||||
** abs_f64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** fabs z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (abs_s16_x_tied1, svint16_t,
|
||||
|
||||
/*
|
||||
** abs_s16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** abs z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (abs_s32_x_tied1, svint32_t,
|
||||
|
||||
/*
|
||||
** abs_s32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** abs z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (abs_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** abs_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** abs z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (abs_s8_x_tied1, svint8_t,
|
||||
|
||||
/*
|
||||
** abs_s8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** abs z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cls_s16_z, svuint16_t, svint16_t,
|
||||
|
||||
/*
|
||||
** cls_s16_x:
|
||||
** movprfx z0, z4
|
||||
** cls z0\.h, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cls_s32_z, svuint32_t, svint32_t,
|
||||
|
||||
/*
|
||||
** cls_s32_x:
|
||||
** movprfx z0, z4
|
||||
** cls z0\.s, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cls_s64_z, svuint64_t, svint64_t,
|
||||
|
||||
/*
|
||||
** cls_s64_x:
|
||||
** movprfx z0, z4
|
||||
** cls z0\.d, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cls_s8_z, svuint8_t, svint8_t,
|
||||
|
||||
/*
|
||||
** cls_s8_x:
|
||||
** movprfx z0, z4
|
||||
** cls z0\.b, p0/m, z4\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (clz_s16_z, svuint16_t, svint16_t,
|
||||
|
||||
/*
|
||||
** clz_s16_x:
|
||||
** movprfx z0, z4
|
||||
** clz z0\.h, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (clz_s32_z, svuint32_t, svint32_t,
|
||||
|
||||
/*
|
||||
** clz_s32_x:
|
||||
** movprfx z0, z4
|
||||
** clz z0\.s, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (clz_s64_z, svuint64_t, svint64_t,
|
||||
|
||||
/*
|
||||
** clz_s64_x:
|
||||
** movprfx z0, z4
|
||||
** clz z0\.d, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (clz_s8_z, svuint8_t, svint8_t,
|
||||
|
||||
/*
|
||||
** clz_s8_x:
|
||||
** movprfx z0, z4
|
||||
** clz z0\.b, p0/m, z4\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (clz_u16_x_tied1, svuint16_t,
|
||||
|
||||
/*
|
||||
** clz_u16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** clz z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (clz_u32_x_tied1, svuint32_t,
|
||||
|
||||
/*
|
||||
** clz_u32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** clz z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (clz_u64_x_tied1, svuint64_t,
|
||||
|
||||
/*
|
||||
** clz_u64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** clz z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (clz_u8_x_tied1, svuint8_t,
|
||||
|
||||
/*
|
||||
** clz_u8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** clz z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnot_s16_x_tied1, svint16_t,
|
||||
|
||||
/*
|
||||
** cnot_s16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnot z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnot_s32_x_tied1, svint32_t,
|
||||
|
||||
/*
|
||||
** cnot_s32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnot z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnot_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** cnot_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnot z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnot_s8_x_tied1, svint8_t,
|
||||
|
||||
/*
|
||||
** cnot_s8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnot z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnot_u16_x_tied1, svuint16_t,
|
||||
|
||||
/*
|
||||
** cnot_u16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnot z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnot_u32_x_tied1, svuint32_t,
|
||||
|
||||
/*
|
||||
** cnot_u32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnot z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnot_u64_x_tied1, svuint64_t,
|
||||
|
||||
/*
|
||||
** cnot_u64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnot z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnot_u8_x_tied1, svuint8_t,
|
||||
|
||||
/*
|
||||
** cnot_u8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnot z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cnt_bf16_z, svuint16_t, svbfloat16_t,
|
||||
|
||||
/*
|
||||
** cnt_bf16_x:
|
||||
** movprfx z0, z4
|
||||
** cnt z0\.h, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cnt_f16_z, svuint16_t, svfloat16_t,
|
||||
|
||||
/*
|
||||
** cnt_f16_x:
|
||||
** movprfx z0, z4
|
||||
** cnt z0\.h, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cnt_f32_z, svuint32_t, svfloat32_t,
|
||||
|
||||
/*
|
||||
** cnt_f32_x:
|
||||
** movprfx z0, z4
|
||||
** cnt z0\.s, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cnt_f64_z, svuint64_t, svfloat64_t,
|
||||
|
||||
/*
|
||||
** cnt_f64_x:
|
||||
** movprfx z0, z4
|
||||
** cnt z0\.d, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cnt_s16_z, svuint16_t, svint16_t,
|
||||
|
||||
/*
|
||||
** cnt_s16_x:
|
||||
** movprfx z0, z4
|
||||
** cnt z0\.h, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cnt_s32_z, svuint32_t, svint32_t,
|
||||
|
||||
/*
|
||||
** cnt_s32_x:
|
||||
** movprfx z0, z4
|
||||
** cnt z0\.s, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cnt_s64_z, svuint64_t, svint64_t,
|
||||
|
||||
/*
|
||||
** cnt_s64_x:
|
||||
** movprfx z0, z4
|
||||
** cnt z0\.d, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -33,6 +33,7 @@ TEST_DUAL_Z (cnt_s8_z, svuint8_t, svint8_t,
|
||||
|
||||
/*
|
||||
** cnt_s8_x:
|
||||
** movprfx z0, z4
|
||||
** cnt z0\.b, p0/m, z4\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnt_u16_x_tied1, svuint16_t,
|
||||
|
||||
/*
|
||||
** cnt_u16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnt z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnt_u32_x_tied1, svuint32_t,
|
||||
|
||||
/*
|
||||
** cnt_u32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnt z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnt_u64_x_tied1, svuint64_t,
|
||||
|
||||
/*
|
||||
** cnt_u64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnt z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (cnt_u8_x_tied1, svuint8_t,
|
||||
|
||||
/*
|
||||
** cnt_u8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** cnt z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -66,6 +66,7 @@ TEST_DUAL_Z_REV (cvt_bf16_f32_x_tied1, svbfloat16_t, svfloat32_t,
|
||||
|
||||
/*
|
||||
** cvt_bf16_f32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** bfcvt z0\.h, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -421,6 +421,7 @@ TEST_DUAL_Z_REV (cvt_f16_f32_x_tied1, svfloat16_t, svfloat32_t,
|
||||
|
||||
/*
|
||||
** cvt_f16_f32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvt z0\.h, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -439,6 +440,7 @@ TEST_DUAL_Z_REV (cvt_f16_f64_x_tied1, svfloat16_t, svfloat64_t,
|
||||
|
||||
/*
|
||||
** cvt_f16_f64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvt z0\.h, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
@ -457,6 +459,7 @@ TEST_DUAL_Z_REV (cvt_f16_s16_x_tied1, svfloat16_t, svint16_t,
|
||||
|
||||
/*
|
||||
** cvt_f16_s16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** scvtf z0\.h, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
@ -475,6 +478,7 @@ TEST_DUAL_Z_REV (cvt_f16_s32_x_tied1, svfloat16_t, svint32_t,
|
||||
|
||||
/*
|
||||
** cvt_f16_s32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** scvtf z0\.h, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -493,6 +497,7 @@ TEST_DUAL_Z_REV (cvt_f16_s64_x_tied1, svfloat16_t, svint64_t,
|
||||
|
||||
/*
|
||||
** cvt_f16_s64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** scvtf z0\.h, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
@ -511,6 +516,7 @@ TEST_DUAL_Z_REV (cvt_f16_u16_x_tied1, svfloat16_t, svuint16_t,
|
||||
|
||||
/*
|
||||
** cvt_f16_u16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** ucvtf z0\.h, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
@ -529,6 +535,7 @@ TEST_DUAL_Z_REV (cvt_f16_u32_x_tied1, svfloat16_t, svuint32_t,
|
||||
|
||||
/*
|
||||
** cvt_f16_u32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** ucvtf z0\.h, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -547,6 +554,7 @@ TEST_DUAL_Z_REV (cvt_f16_u64_x_tied1, svfloat16_t, svuint64_t,
|
||||
|
||||
/*
|
||||
** cvt_f16_u64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** ucvtf z0\.h, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -319,6 +319,7 @@ TEST_DUAL_Z_REV (cvt_f32_f16_x_tied1, svfloat32_t, svfloat16_t,
|
||||
|
||||
/*
|
||||
** cvt_f32_f16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvt z0\.s, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
@ -337,6 +338,7 @@ TEST_DUAL_Z_REV (cvt_f32_f64_x_tied1, svfloat32_t, svfloat64_t,
|
||||
|
||||
/*
|
||||
** cvt_f32_f64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvt z0\.s, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
@ -355,6 +357,7 @@ TEST_DUAL_Z_REV (cvt_f32_s32_x_tied1, svfloat32_t, svint32_t,
|
||||
|
||||
/*
|
||||
** cvt_f32_s32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** scvtf z0\.s, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -373,6 +376,7 @@ TEST_DUAL_Z_REV (cvt_f32_s64_x_tied1, svfloat32_t, svint64_t,
|
||||
|
||||
/*
|
||||
** cvt_f32_s64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** scvtf z0\.s, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
@ -391,6 +395,7 @@ TEST_DUAL_Z_REV (cvt_f32_u32_x_tied1, svfloat32_t, svuint32_t,
|
||||
|
||||
/*
|
||||
** cvt_f32_u32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** ucvtf z0\.s, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -409,6 +414,7 @@ TEST_DUAL_Z_REV (cvt_f32_u64_x_tied1, svfloat32_t, svuint64_t,
|
||||
|
||||
/*
|
||||
** cvt_f32_u64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** ucvtf z0\.s, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -319,6 +319,7 @@ TEST_DUAL_Z_REV (cvt_f64_f16_x_tied1, svfloat64_t, svfloat16_t,
|
||||
|
||||
/*
|
||||
** cvt_f64_f16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvt z0\.d, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
@ -337,6 +338,7 @@ TEST_DUAL_Z_REV (cvt_f64_f32_x_tied1, svfloat64_t, svfloat32_t,
|
||||
|
||||
/*
|
||||
** cvt_f64_f32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvt z0\.d, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -355,6 +357,7 @@ TEST_DUAL_Z_REV (cvt_f64_s32_x_tied1, svfloat64_t, svint32_t,
|
||||
|
||||
/*
|
||||
** cvt_f64_s32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** scvtf z0\.d, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -373,6 +376,7 @@ TEST_DUAL_Z_REV (cvt_f64_s64_x_tied1, svfloat64_t, svint64_t,
|
||||
|
||||
/*
|
||||
** cvt_f64_s64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** scvtf z0\.d, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
@ -391,6 +395,7 @@ TEST_DUAL_Z_REV (cvt_f64_u32_x_tied1, svfloat64_t, svuint32_t,
|
||||
|
||||
/*
|
||||
** cvt_f64_u32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** ucvtf z0\.d, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -409,6 +414,7 @@ TEST_DUAL_Z_REV (cvt_f64_u64_x_tied1, svfloat64_t, svuint64_t,
|
||||
|
||||
/*
|
||||
** cvt_f64_u64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** ucvtf z0\.d, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -64,6 +64,7 @@ TEST_DUAL_Z_REV (cvt_s16_f16_x_tied1, svint16_t, svfloat16_t,
|
||||
|
||||
/*
|
||||
** cvt_s16_f16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzs z0\.h, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -166,6 +166,7 @@ TEST_DUAL_Z_REV (cvt_s32_f16_x_tied1, svint32_t, svfloat16_t,
|
||||
|
||||
/*
|
||||
** cvt_s32_f16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzs z0\.s, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
@ -184,6 +185,7 @@ TEST_DUAL_Z_REV (cvt_s32_f32_x_tied1, svint32_t, svfloat32_t,
|
||||
|
||||
/*
|
||||
** cvt_s32_f32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzs z0\.s, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -202,6 +204,7 @@ TEST_DUAL_Z_REV (cvt_s32_f64_x_tied1, svint32_t, svfloat64_t,
|
||||
|
||||
/*
|
||||
** cvt_s32_f64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzs z0\.s, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -166,6 +166,7 @@ TEST_DUAL_Z_REV (cvt_s64_f16_x_tied1, svint64_t, svfloat16_t,
|
||||
|
||||
/*
|
||||
** cvt_s64_f16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzs z0\.d, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
@ -184,6 +185,7 @@ TEST_DUAL_Z_REV (cvt_s64_f32_x_tied1, svint64_t, svfloat32_t,
|
||||
|
||||
/*
|
||||
** cvt_s64_f32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzs z0\.d, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -202,6 +204,7 @@ TEST_DUAL_Z_REV (cvt_s64_f64_x_tied1, svint64_t, svfloat64_t,
|
||||
|
||||
/*
|
||||
** cvt_s64_f64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzs z0\.d, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -64,6 +64,7 @@ TEST_DUAL_Z_REV (cvt_u16_f16_x_tied1, svuint16_t, svfloat16_t,
|
||||
|
||||
/*
|
||||
** cvt_u16_f16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzu z0\.h, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -166,6 +166,7 @@ TEST_DUAL_Z_REV (cvt_u32_f16_x_tied1, svuint32_t, svfloat16_t,
|
||||
|
||||
/*
|
||||
** cvt_u32_f16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzu z0\.s, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
@ -184,6 +185,7 @@ TEST_DUAL_Z_REV (cvt_u32_f32_x_tied1, svuint32_t, svfloat32_t,
|
||||
|
||||
/*
|
||||
** cvt_u32_f32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzu z0\.s, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -202,6 +204,7 @@ TEST_DUAL_Z_REV (cvt_u32_f64_x_tied1, svuint32_t, svfloat64_t,
|
||||
|
||||
/*
|
||||
** cvt_u32_f64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzu z0\.s, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -166,6 +166,7 @@ TEST_DUAL_Z_REV (cvt_u64_f16_x_tied1, svuint64_t, svfloat16_t,
|
||||
|
||||
/*
|
||||
** cvt_u64_f16_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzu z0\.d, p0/m, z4\.h
|
||||
** ret
|
||||
*/
|
||||
@ -184,6 +185,7 @@ TEST_DUAL_Z_REV (cvt_u64_f32_x_tied1, svuint64_t, svfloat32_t,
|
||||
|
||||
/*
|
||||
** cvt_u64_f32_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzu z0\.d, p0/m, z4\.s
|
||||
** ret
|
||||
*/
|
||||
@ -202,6 +204,7 @@ TEST_DUAL_Z_REV (cvt_u64_f64_x_tied1, svuint64_t, svfloat64_t,
|
||||
|
||||
/*
|
||||
** cvt_u64_f64_x_untied:
|
||||
** movprfx z0, z4
|
||||
** fcvtzu z0\.d, p0/m, z4\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (extb_s16_x_tied1, svint16_t,
|
||||
|
||||
/*
|
||||
** extb_s16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** sxtb z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (extb_s32_x_tied1, svint32_t,
|
||||
|
||||
/*
|
||||
** extb_s32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** sxtb z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (extb_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** extb_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** sxtb z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (exth_s32_x_tied1, svint32_t,
|
||||
|
||||
/*
|
||||
** exth_s32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** sxth z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (exth_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** exth_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** sxth z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (extw_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** extw_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** sxtw z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (neg_f16_x_tied1, svfloat16_t,
|
||||
|
||||
/*
|
||||
** neg_f16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** fneg z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (neg_f32_x_tied1, svfloat32_t,
|
||||
|
||||
/*
|
||||
** neg_f32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** fneg z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (neg_f64_x_tied1, svfloat64_t,
|
||||
|
||||
/*
|
||||
** neg_f64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** fneg z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (neg_s16_x_tied1, svint16_t,
|
||||
|
||||
/*
|
||||
** neg_s16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** neg z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (neg_s32_x_tied1, svint32_t,
|
||||
|
||||
/*
|
||||
** neg_s32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** neg z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (neg_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** neg_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** neg z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (neg_s8_x_tied1, svint8_t,
|
||||
|
||||
/*
|
||||
** neg_s8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** neg z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (not_s16_x_tied1, svint16_t,
|
||||
|
||||
/*
|
||||
** not_s16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** not z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (not_s32_x_tied1, svint32_t,
|
||||
|
||||
/*
|
||||
** not_s32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** not z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (not_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** not_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** not z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (not_s8_x_tied1, svint8_t,
|
||||
|
||||
/*
|
||||
** not_s8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** not z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (not_u16_x_tied1, svuint16_t,
|
||||
|
||||
/*
|
||||
** not_u16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** not z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (not_u32_x_tied1, svuint32_t,
|
||||
|
||||
/*
|
||||
** not_u32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** not z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (not_u64_x_tied1, svuint64_t,
|
||||
|
||||
/*
|
||||
** not_u64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** not z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (not_u8_x_tied1, svuint8_t,
|
||||
|
||||
/*
|
||||
** not_u8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** not z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rbit_s16_x_tied1, svint16_t,
|
||||
|
||||
/*
|
||||
** rbit_s16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** rbit z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rbit_s32_x_tied1, svint32_t,
|
||||
|
||||
/*
|
||||
** rbit_s32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** rbit z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rbit_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** rbit_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** rbit z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rbit_s8_x_tied1, svint8_t,
|
||||
|
||||
/*
|
||||
** rbit_s8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** rbit z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rbit_u16_x_tied1, svuint16_t,
|
||||
|
||||
/*
|
||||
** rbit_u16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** rbit z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rbit_u32_x_tied1, svuint32_t,
|
||||
|
||||
/*
|
||||
** rbit_u32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** rbit z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rbit_u64_x_tied1, svuint64_t,
|
||||
|
||||
/*
|
||||
** rbit_u64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** rbit z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rbit_u8_x_tied1, svuint8_t,
|
||||
|
||||
/*
|
||||
** rbit_u8_x_untied:
|
||||
** movprfx z0, z1
|
||||
** rbit z0\.b, p0/m, z1\.b
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (recpx_f16_x_tied1, svfloat16_t,
|
||||
|
||||
/*
|
||||
** recpx_f16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** frecpx z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (recpx_f32_x_tied1, svfloat32_t,
|
||||
|
||||
/*
|
||||
** recpx_f32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** frecpx z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (recpx_f64_x_tied1, svfloat64_t,
|
||||
|
||||
/*
|
||||
** recpx_f64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** frecpx z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revb_s16_x_tied1, svint16_t,
|
||||
|
||||
/*
|
||||
** revb_s16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revb z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revb_s32_x_tied1, svint32_t,
|
||||
|
||||
/*
|
||||
** revb_s32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revb z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revb_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** revb_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revb z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revb_u16_x_tied1, svuint16_t,
|
||||
|
||||
/*
|
||||
** revb_u16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revb z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revb_u32_x_tied1, svuint32_t,
|
||||
|
||||
/*
|
||||
** revb_u32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revb z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revb_u64_x_tied1, svuint64_t,
|
||||
|
||||
/*
|
||||
** revb_u64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revb z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revh_s32_x_tied1, svint32_t,
|
||||
|
||||
/*
|
||||
** revh_s32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revh z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revh_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** revh_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revh z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revh_u32_x_tied1, svuint32_t,
|
||||
|
||||
/*
|
||||
** revh_u32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revh z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revh_u64_x_tied1, svuint64_t,
|
||||
|
||||
/*
|
||||
** revh_u64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revh z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revw_s64_x_tied1, svint64_t,
|
||||
|
||||
/*
|
||||
** revw_s64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revw z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (revw_u64_x_tied1, svuint64_t,
|
||||
|
||||
/*
|
||||
** revw_u64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** revw z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rinta_f16_x_tied1, svfloat16_t,
|
||||
|
||||
/*
|
||||
** rinta_f16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** frinta z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rinta_f32_x_tied1, svfloat32_t,
|
||||
|
||||
/*
|
||||
** rinta_f32_x_untied:
|
||||
** movprfx z0, z1
|
||||
** frinta z0\.s, p0/m, z1\.s
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rinta_f64_x_tied1, svfloat64_t,
|
||||
|
||||
/*
|
||||
** rinta_f64_x_untied:
|
||||
** movprfx z0, z1
|
||||
** frinta z0\.d, p0/m, z1\.d
|
||||
** ret
|
||||
*/
|
||||
|
@ -73,6 +73,7 @@ TEST_UNIFORM_Z (rinti_f16_x_tied1, svfloat16_t,
|
||||
|
||||
/*
|
||||
** rinti_f16_x_untied:
|
||||
** movprfx z0, z1
|
||||
** frinti z0\.h, p0/m, z1\.h
|
||||
** ret
|
||||
*/
|
||||
|
Some files were not shown because too many files have changed in this diff.