From 8daa1407f25ea782a2e9fd35d6a30807cabfecc9 Mon Sep 17 00:00:00 2001
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Fri, 4 Nov 2011 15:10:01 +0100
Subject: [PATCH] gnat_ugn.texi (Performance Considerations): New sub-section.

2011-11-04  Eric Botcazou  <ebotcazou@adacore.com>

	* gnat_ugn.texi (Performance Considerations) <Vectorization
	of loops>: New sub-section.  <Other Optimization Switches>:
	Minor tweak.

From-SVN: r180956
---
 gcc/ada/gnat_ugn.texi | 106 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 102 insertions(+), 4 deletions(-)
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index 748a1d247bde..1da914346dd6 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -337,6 +337,7 @@ Performance Considerations
 * Optimization Levels::
 * Debugging Optimized Code::
 * Inlining of Subprograms::
+* Vectorization of loops::
 * Other Optimization Switches::
 * Optimization and Strict Aliasing::
 @ifset vms
@@ -10150,6 +10151,7 @@ some guidelines on debugging optimized code.
 * Optimization Levels::
 * Debugging Optimized Code::
 * Inlining of Subprograms::
+* Vectorization of loops::
 * Other Optimization Switches::
 * Optimization and Strict Aliasing::
 
@@ -10595,6 +10597,103 @@ that you should not automatically assume that @option{-O3} is better than
 @option{-O2}, and indeed you should use @option{-O3} only if tests show that
 it actually improves performance.
 
+@node Vectorization of loops
+@subsection Vectorization of loops
+@cindex Optimization Switches
+
+You can take advantage of the auto-vectorizer present in the @command{gcc}
+back end to vectorize loops with GNAT.  The corresponding command line switch
+is @option{-ftree-vectorize} but, as it is enabled by default at @option{-O3}
+and other aggressive optimizations helpful for vectorization are also enabled
+by default at this level, using @option{-O3} directly is recommended.
+
+You also need to make sure that the target architecture features a supported
+SIMD instruction set.  For example, for the x86 architecture, you should at
+least specify @option{-msse2} to get significant vectorization (but you don't
+need to specify it for x86-64 as it is part of the base 64-bit architecture).
+Similarly, for the PowerPC architecture, you should specify @option{-maltivec}.
+
+The preferred loop form for vectorization is the @code{for} iteration scheme.
+Loops with a @code{while} iteration scheme can also be vectorized if they are
+very simple, but the vectorizer will quickly give up otherwise.  With either
+iteration scheme, the flow of control must be straight; in particular, no
+@code{exit} statement may appear in the loop body.  The loop may, however,
+contain a single nested loop if it can be vectorized when considered alone:
+
+@smallexample @c ada
+@cartouche
+   A : array (1 .. 4, 1 .. 4) of Long_Float;
+   S : array (1 .. 4) of Long_Float;
+
+   procedure Sum is
+   begin
+      for I in A'Range (1) loop
+         for J in A'Range (2) loop
+            S (I) := S (I) + A (I, J);
+         end loop;
+      end loop;
+   end Sum;
+@end cartouche
+@end smallexample
+
+The vectorizable operations depend on the targeted SIMD instruction set, but
+the adding and some of the multiplying operators are generally supported, as
+well as the logical operators for modular types.  Note that, for the
+arithmetic operators, enabling overflow checks, for example with
+@option{-gnato}, totally disables vectorization.  The other checks are not
+supposed to have the same definitive effect, although compiling with
+@option{-gnatp} may well reveal cases where some checks thwart vectorization.
+
+Type conversions may also prevent vectorization if they involve semantics that
+are not directly supported by the code generator or the SIMD instruction set.
+A typical example is direct conversion from floating-point to integer types.
+The solution in this case is to use the following idiom:
+
+@smallexample @c ada
+   Integer (S'Truncation (F))
+@end smallexample
+
+@noindent
+if @code{S} is the subtype of floating-point object @code{F}.
+
+In most cases, the vectorizable loops are loops that iterate over arrays.
+All kinds of array types are supported, i.e. constrained array types with
+static bounds:
+
+@smallexample @c ada
+   type Array_Type is array (1 .. 4) of Long_Float;
+@end smallexample
+
+@noindent
+constrained array types with dynamic bounds:
+
+@smallexample @c ada
+   type Array_Type is array (1 .. Q.N) of Long_Float;
+
+   type Array_Type is array (Q.K .. 4) of Long_Float;
+
+   type Array_Type is array (Q.K .. Q.N) of Long_Float;
+@end smallexample
+
+@noindent
+or unconstrained array types:
+
+@smallexample @c ada
+   type Array_Type is array (Positive range <>) of Long_Float;
+@end smallexample
+
+@noindent
+The quality of the generated code decreases when the dynamic aspect of the
+array type increases, the worst code being generated for unconstrained array
+types.  This is because the less information the compiler has about the
+bounds of the array, the more fallback code it needs to generate in order to
+fix things up at run time.
+
+You can obtain information about the vectorization performed by the compiler
+by specifying @option{-ftree-vectorizer-verbose=N}.  For more details on
+this switch, see @ref{Debugging Options,,Options for Debugging Your Program
+or GCC, gcc, Using the GNU Compiler Collection (GCC)}.
+
 @node Other Optimization Switches
 @subsection Other Optimization Switches
 @cindex Optimization Switches
@@ -10602,10 +10701,9 @@ it actually improves performance.
 Since @code{GNAT} uses the @command{gcc} back end, all the specialized
 @command{gcc} optimization switches are potentially usable. These switches
 have not been extensively tested with GNAT but can generally be expected
-to work. Examples of switches in this category are
-@option{-funroll-loops} and
-the various target-specific @option{-m} options (in particular, it has been
-observed that @option{-march=pentium4} can significantly improve performance
+to work. Examples of switches in this category are @option{-funroll-loops}
+and the various target-specific @option{-m} options (in particular, it has
+been observed that @option{-march=xxx} can significantly improve performance
 on appropriate machines). For full details of these switches, see
 @ref{Submodel Options,, Hardware Models and Configurations, gcc, Using
 the GNU Compiler Collection (GCC)}.