mirror of
git://gcc.gnu.org/git/gcc.git
synced 2024-11-28 06:10:02 +08:00
1c4d39ada3
This extends forwprop by yet another VEC_PERM optimization: It attempts to blend two isomorphic vector sequences by using the redundancy in the lane utilization in these sequences. This redundancy in lane utilization comes from the way how specific scalar statements end up vectorized: two VEC_PERMs on top, binary operations on both of them, and a final VEC_PERM to create the result. Here is an example of this sequence: v_in = {e0, e1, e2, e3} v_1 = VEC_PERM <v_in, v_in, {0, 2, 0, 2}> // v_1 = {e0, e2, e0, e2} v_2 = VEC_PERM <v_in, v_in, {1, 3, 1, 3}> // v_2 = {e1, e3, e1, e3} v_x = v_1 + v_2 // v_x = {e0+e1, e2+e3, e0+e1, e2+e3} v_y = v_1 - v_2 // v_y = {e0-e1, e2-e3, e0-e1, e2-e3} v_out = VEC_PERM <v_x, v_y, {0, 1, 6, 7}> // v_out = {e0+e1, e2+e3, e0-e1, e2-e3} To remove the redundancy, lanes 2 and 3 can be freed, which allows to change the last statement into: v_out' = VEC_PERM <v_x, v_y, {0, 1, 4, 5}> // v_out' = {e0+e1, e2+e3, e0-e1, e2-e3} The cost of eliminating the redundancy in the lane utilization is that lowering the VEC PERM expression could get more expensive because of tighter packing of the lanes. Therefore this optimization is not done alone, but in only in case we identify two such sequences that can be blended. Once all candidate sequences have been identified, we try to blend them, so that we can use the freed lanes for the second sequence. On success we convert 2x (2x BINOP + 1x VEC_PERM) to 2x VEC_PERM + 2x BINOP + 2x VEC_PERM traded for 4x VEC_PERM + 2x BINOP. The implemented transformation reuses (rewrites) the statements of the first sequence and the last VEC_PERM of the second sequence. The remaining four statements of the second statment are left untouched and will be eliminated by DCE later. This targets x264_pixel_satd_8x4, which calculates the sum of absolute transformed differences (SATD) using Hadamard transformation. We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7% speedup on an AArch64 machine. Bootstrapped and reg-tested on x86-64 and AArch64 (all languages). gcc/ChangeLog: * tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data structure to store analysis results of a vec perm simplify sequence. (get_vect_selector_index_map): Helper to get an index map from the provided vector permute selector. (recognise_vec_perm_simplify_seq): Helper to recognise a vec perm simplify sequence. (narrow_vec_perm_simplify_seq): Helper to pack the lanes more tight. (can_blend_vec_perm_simplify_seqs_p): Test if two vec perm sequences can be blended. (calc_perm_vec_perm_simplify_seqs): Helper to calculate the new permutation indices. (blend_vec_perm_simplify_seqs): Helper to blend two vec perm simplify sequences. (process_vec_perm_simplify_seq_list): Helper to process a list of vec perm simplify sequences. (append_vec_perm_simplify_seq_list): Helper to add a vec perm simplify sequence to the list. (pass_forwprop::execute): Integrate new functionality. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/satd-hadamard.c: New test. * gcc.dg/tree-ssa/vector-10.c: New test. * gcc.dg/tree-ssa/vector-8.c: New test. * gcc.dg/tree-ssa/vector-9.c: New test. * gcc.target/aarch64/sve/satd-hadamard.c: New test. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> |
||
---|---|---|
.forgejo | ||
.github | ||
c++tools | ||
config | ||
contrib | ||
fixincludes | ||
gcc | ||
gnattools | ||
gotools | ||
include | ||
INSTALL | ||
libada | ||
libatomic | ||
libbacktrace | ||
libcc1 | ||
libcody | ||
libcpp | ||
libdecnumber | ||
libffi | ||
libgcc | ||
libgfortran | ||
libgm2 | ||
libgo | ||
libgomp | ||
libgrust | ||
libiberty | ||
libitm | ||
libobjc | ||
libphobos | ||
libquadmath | ||
libsanitizer | ||
libssp | ||
libstdc++-v3 | ||
libvtv | ||
lto-plugin | ||
maintainer-scripts | ||
zlib | ||
.b4-config | ||
.dir-locals.el | ||
.gitattributes | ||
.gitignore | ||
ABOUT-NLS | ||
ar-lib | ||
ChangeLog | ||
ChangeLog.jit | ||
ChangeLog.tree-ssa | ||
compile | ||
config-ml.in | ||
config.guess | ||
config.rpath | ||
config.sub | ||
configure | ||
configure.ac | ||
COPYING | ||
COPYING3 | ||
COPYING3.LIB | ||
COPYING.LIB | ||
COPYING.RUNTIME | ||
depcomp | ||
install-sh | ||
libtool-ldflags | ||
libtool.m4 | ||
lt~obsolete.m4 | ||
ltgcc.m4 | ||
ltmain.sh | ||
ltoptions.m4 | ||
ltsugar.m4 | ||
ltversion.m4 | ||
MAINTAINERS | ||
Makefile.def | ||
Makefile.in | ||
Makefile.tpl | ||
missing | ||
mkdep | ||
mkinstalldirs | ||
move-if-change | ||
multilib.am | ||
README | ||
SECURITY.txt | ||
symlink-tree | ||
test-driver | ||
ylwrap |
This directory contains the GNU Compiler Collection (GCC). The GNU Compiler Collection is free software. See the files whose names start with COPYING for copying permission. The manuals, and some of the runtime libraries, are under different terms; see the individual source files for details. The directory INSTALL contains copies of the installation information as HTML and plain text. The source of this information is gcc/doc/install.texi. The installation information includes details of what is included in the GCC sources and what files GCC installs. See the file gcc/doc/gcc.texi (together with other files that it includes) for usage and porting information. An online readable version of the manual is in the files gcc/doc/gcc.info*. See http://gcc.gnu.org/bugs/ for how to report bugs usefully. Copyright years on GCC source files may be listed using range notation, e.g., 1987-2012, indicating that every year in the range, inclusive, is a copyrightable year that could otherwise be listed individually.