Commit Graph

117 Commits

Author SHA1 Message Date
Gael Guennebaud
bf5326c3ca * Added ReferencableBit flag to know whether coeffRef is available.
(needed by the new product implementation)
* Made the packet* members templates to support aligned and unaligned
  access. This makes Block vectorizable. Combined with ReferencableBit,
  we should be able to determine at runtime (in some specific cases) if
  an aligned vectorization is possible or not.
* Improved the new product implementation to robustly handle all cases,
  it now passes all the tests.
* Renamed the packet version ei_predux to ei_preduxp to avoid name collision.
2008-05-08 08:12:52 +00:00
Gael Guennebaud
64c49de7ba * split PacketMath.h to SSE and Altivec specific files
* improved the flexibility of the new product implementation,
  now all sizes seem to be properly handled.
2008-05-05 17:19:47 +00:00
Gael Guennebaud
46fa4c713f * Started support for unaligned vectorization.
* Introduced a new highly optimized matrix-matrix product for large
  matrices. The code is still highly experimental and it is activated
  only if you define EIGEN_WIP_PRODUCT at compile time.
  Currently the third dimension of the product must be a multiple of
  the packet size (x4 for floats) and the right-hand side matrix
  must be column major.
  Moreover, currently c = a*b; actually computes c += a*b !!
  Therefore, the code is provided for experimentation purposes only!
  These limitations will be fixed sooner or later so that this can
  become the default product implementation.
2008-05-05 10:23:29 +00:00
Benoit Jacob
8c6007f80e * Patch by Konstantinos Margaritis: AltiVec vectorization.
* Fix several warnings, temporarily disable determinant test.
2008-05-03 12:21:23 +00:00
Gael Guennebaud
0545df2149 slightly improved the cache friendly product to use mul-add only 2008-05-03 10:01:30 +00:00
Gael Guennebaud
a6655dd91a added packet mul-add function (ei_pmad) and updated Product to use it.
this changes nothing for the current SSE architecture but might be helpful
for AltiVec/Cell and upcoming AMD processors.
2008-05-03 00:45:08 +00:00
Gael Guennebaud
102e029dad Removed ei_pload1, use posix_memalign to allocate aligned memory,
and make Product ok when only one side is vectorizable (and the product
is still vectorized)
2008-05-02 13:30:12 +00:00
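As a rough illustration of the aligned allocation mentioned in 102e029dad, here is a minimal sketch of using posix_memalign to get 16-byte aligned storage, which is what aligned SSE loads and stores expect. The helper names aligned_alloc_floats and is_aligned_16 are hypothetical and are not taken from Eigen; the sketch assumes a POSIX system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign and free (POSIX)

// Hypothetical helper, not Eigen's API: allocate `count` floats on a
// 16-byte boundary. Returns nullptr if the allocation fails.
inline float* aligned_alloc_floats(std::size_t count) {
    assert(count > 0);
    void* ptr = nullptr;
    // posix_memalign returns 0 on success. The alignment must be a power
    // of two and a multiple of sizeof(void*); 16 satisfies both.
    if (posix_memalign(&ptr, 16, count * sizeof(float)) != 0) {
        return nullptr;
    }
    return static_cast<float*>(ptr);
}

// True if the address is a multiple of 16 bytes.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Memory obtained this way is released with free(), as with malloc.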
Benoit Jacob
890a8de962 Make products always eval into expressions. Improves performance
in benchmark. Still not as fast as explicit eval(), strangely.
2008-05-02 08:53:23 +00:00
Gael Guennebaud
ef5b20bc50 fix flag and cost computations for nested expressions 2008-05-01 18:58:30 +00:00
Gael Guennebaud
5588def0cf nullary xpr are now vectorized 2008-05-01 14:28:53 +00:00
Gael Guennebaud
02f1615d2a Enable vectorization of product with dynamic matrices,
extended cache optimal product to work in any row/column
major situations, and a few bugfixes (forgot to add the
Cholesky header, vectorization of CwiseBinary)
2008-05-01 13:53:05 +00:00
Gael Guennebaud
6486991ac3 some cleaning in Cholesky and removed evil ei_sqrt of complex 2008-04-27 18:57:28 +00:00
Gael Guennebaud
64bacf1c3f * added ei_sqrt for complex
* updated Cholesky to support complex
* correct result_type for abs and abs2 functors
2008-04-27 14:05:40 +00:00
Gael Guennebaud
4ffffa670e added Cholesky module 2008-04-27 10:57:32 +00:00
Gael Guennebaud
1ec2d21ca5 Fixed a couple of issues introduced in previous commits.
Added a test for Triangular.
2008-04-26 20:28:27 +00:00
Gael Guennebaud
b4c974d059 Added triangular assignment, e.g.:
m.upper() = a+b;
only updates the upper triangular part of m.
Note that:
 m = (a+b).upper();
updates all coefficients of m (but half of the additions
will be skipped)

Updated back/forward substitution to better use Eigen's capability.
2008-04-26 19:20:26 +00:00
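The difference between the two forms described in b4c974d059 can be sketched in plain C++ without Eigen. The SquareMatrix type and the two function names below are hypothetical stand-ins; they only mimic the behavior the commit message describes, not how Eigen implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major square matrix; not Eigen's Matrix class.
struct SquareMatrix {
    std::size_t n;
    std::vector<double> data;
    explicit SquareMatrix(std::size_t size, double fill = 0.0)
        : n(size), data(size * size, fill) {}
    double& operator()(std::size_t r, std::size_t c) { return data[r * n + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data[r * n + c]; }
};

// Analogue of `m.upper() = a + b;`
// Only coefficients with col >= row are computed and written;
// the strictly lower part of m keeps its previous values.
inline void assign_upper_sum(SquareMatrix& m, const SquareMatrix& a, const SquareMatrix& b) {
    assert(m.n == a.n && m.n == b.n);
    for (std::size_t r = 0; r < m.n; ++r)
        for (std::size_t c = r; c < m.n; ++c)
            m(r, c) = a(r, c) + b(r, c);
}

// Analogue of `m = (a + b).upper();`
// Every coefficient of m is written, but the additions below the
// diagonal are skipped because those coefficients are simply zero.
inline void assign_sum_upper(SquareMatrix& m, const SquareMatrix& a, const SquareMatrix& b) {
    assert(m.n == a.n && m.n == b.n);
    for (std::size_t r = 0; r < m.n; ++r)
        for (std::size_t c = 0; c < m.n; ++c)
            m(r, c) = (c >= r) ? a(r, c) + b(r, c) : 0.0;
}
```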
Gael Guennebaud
4c92150676 Added Triangular expression to extract upper or lower (strictly or not)
part of a matrix. Triangular also provides an optimised method for forward
and backward substitution. Further optimizations regarding assignments and
products might come later.

Updated determinant() to take into account triangular matrices.

Started the QR module with a QR decomposition algorithm.
Help needed to build a QR algorithm (eigen solver) based on it.
2008-04-26 18:26:05 +00:00
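For readers unfamiliar with the substitution methods mentioned in 4c92150676, the following is a minimal, unoptimized sketch of forward substitution for a lower triangular system L x = b. The function name and the nested-vector storage are assumptions made for illustration; this is the textbook algorithm, not the optimized code from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve L x = b where L is lower triangular with a nonzero diagonal.
// Coefficients above the diagonal are never read.
inline std::vector<double> forward_substitute(
        const std::vector<std::vector<double>>& L,
        const std::vector<double>& b) {
    const std::size_t n = b.size();
    assert(L.size() == n);
    std::vector<double> x(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(L[i].size() == n && L[i][i] != 0.0);
        double sum = b[i];
        // Subtract the contribution of the unknowns already solved.
        for (std::size_t j = 0; j < i; ++j)
            sum -= L[i][j] * x[j];
        x[i] = sum / L[i][i];
    }
    return x;
}
```

Backward substitution for an upper triangular system is the mirror image: iterate from the last row up and subtract the terms to the right of the diagonal.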
Gael Guennebaud
62bf0bbd59 fix a bug in determinant of 4x4 matrices and a small type issue in Inverse 2008-04-26 08:56:52 +00:00
Gael Guennebaud
6f2c72fb53 Various fixes in:
 - vector to vector assign
 - PartialRedux
 - Vectorization criteria of Product
 - returned type of normalized
 - SSE integer mul
2008-04-25 23:10:37 +00:00
Gael Guennebaud
a451835bce Make the explicit vectorization much more flexible:
 - support dynamic sizes
 - support arbitrary matrix size when the matrix can be seen as a 1D array
   (except for fixed size matrices where the size in bytes must be a multiple of 16,
    this is to allow compact storage of a vector of matrices)
Note that the explicit vectorization is still experimental and far from being completely tested.
2008-04-25 15:46:18 +00:00
Gael Guennebaud
30d47b5250 forgot to add a file in the previous commit 2008-04-24 20:25:55 +00:00
Gael Guennebaud
9385793f71 Fix a couple of issues with the vectorization. In particular, default ei_p* functions
are provided to handle unsupported types seamlessly.

Added a generic null-ary expression with null-ary functors. They replace
Zero, Ones, Identity and Random.
2008-04-24 18:35:39 +00:00
Benoit Jacob
6ae037dfb5 give up on OpenMP... for now 2008-04-18 07:57:46 +00:00
Benoit Jacob
acfd6f3bda - add _packetCoeff() to Inverse, allowing vectorization.
- let Inverse take template parameter MatrixType instead
  of ExpressionType, in order to reduce executable code size
  when taking inverses of xpr's.
- introduce ei_corrected_matrix_flags : the flags template
  parameter to the Matrix class is only a suggestion. This
  is also useful in ei_eval.
2008-04-16 07:18:27 +00:00
Benoit Jacob
43e2bc14fe +5% optimization in 4x4 inverse:
-only evaluate block expressions for which that is beneficial
-don't check for invertibility unless requested
2008-04-15 20:39:27 +00:00
Benoit Jacob
6747b45ae7 for 4x4 matrices implement the special algorithm that Markos proposed,
falling back to the general algorithm in the bad case.
2008-04-15 20:15:36 +00:00
Benoit Jacob
2a86f052a5 - optimized determinant calculations for small matrices (size <= 4)
(only 30 muls for size 4)
- rework the matrix inversion: now using cofactor technique for size<=3,
  so the ugly unrolling is now only used for size 4, and even there
  I'm looking to get rid of it.
2008-04-14 17:07:12 +00:00
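One well-known way to compute a 4x4 determinant with exactly 30 multiplications is the Laplace expansion along complementary 2x2 minors: 12 multiplications for the six minors of the top two rows, 12 for the six minors of the bottom two rows, and 6 to combine them. Whether 2a86f052a5 uses exactly this scheme is an assumption; the function name det4 and the row-major pointer layout are also chosen here for illustration only.

```cpp
#include <cassert>

// Determinant of a 4x4 matrix stored row-major in m[0..15],
// using complementary 2x2 minors (30 multiplications in total).
inline double det4(const double* m) {
    assert(m != nullptr);
    // 2x2 minors of rows 0 and 1 (12 multiplications).
    const double s0 = m[0] * m[5] - m[4] * m[1];
    const double s1 = m[0] * m[6] - m[4] * m[2];
    const double s2 = m[0] * m[7] - m[4] * m[3];
    const double s3 = m[1] * m[6] - m[5] * m[2];
    const double s4 = m[1] * m[7] - m[5] * m[3];
    const double s5 = m[2] * m[7] - m[6] * m[3];
    // 2x2 minors of rows 2 and 3 (12 multiplications).
    const double c5 = m[10] * m[15] - m[14] * m[11];
    const double c4 = m[9]  * m[15] - m[13] * m[11];
    const double c3 = m[9]  * m[14] - m[13] * m[10];
    const double c2 = m[8]  * m[15] - m[12] * m[11];
    const double c1 = m[8]  * m[14] - m[12] * m[10];
    const double c0 = m[8]  * m[13] - m[12] * m[9];
    // Combine each top minor with its complementary bottom minor
    // (6 multiplications).
    return s0 * c5 - s1 * c4 + s2 * c3 + s3 * c2 - s4 * c1 + s5 * c0;
}
```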
Benoit Jacob
9789c04467 when evaluating an xpr, the result can now be vectorizable
even if the xpr itself wasn't vectorizable.
2008-04-14 08:55:12 +00:00
Benoit Jacob
ea3ccb1e8c * Start of the LU module, with matrix inversion already there and
fully optimized.
* Even if LargeBit is set, only parallelize for large enough objects
  (controlled by EIGEN_PARALLELIZATION_TRESHOLD).
2008-04-14 08:20:24 +00:00
Benoit Jacob
ab4046970b * Add fixed-size template versions of corner(), start(), end().
* Use them to write an unrolled path in echelon.cpp, as an
  experiment before I do this LU module.
* For floating-point types, make ei_random() use an amplitude
  of 1.
2008-04-12 17:37:27 +00:00
Benoit Jacob
dcebc46cdc - cleaner use of OpenMP (no code duplication anymore)
using a macro and _Pragma.
- use OpenMP also in cacheOptimalProduct and in the
  vectorized paths as well
- kill the vector assignment unroller. implement in
  operator= the logic for assigning a row-vector in
  a col-vector.
- CMakeLists support for building tests/examples
  with -fopenmp and/or -msse2
- updates in bench/, especially replace identity()
  by ones() which prevents underflows from perturbing
  bench results.
2008-04-11 14:28:42 +00:00
Benoit Jacob
7bee90a62a Merge Gael's experimental OpenMP parallelization support into Assign.h. 2008-04-11 08:18:47 +00:00
Gael Guennebaud
187b1543ce added a vectorized version of Product::_cacheOptimalProduct,
added the possibility to disable the vectorization using EIGEN_DONT_VECTORIZE
(some architectures have SSE support by default)
2008-04-10 12:34:22 +00:00
Benoit Jacob
613c49b475 * add typedefs for matrices/vectors with LargeBit
* add -pedantic to CXXFLAGS
* clean up intricate expressions with && and ||
  which gave warnings because of "missing" parentheses
* fix compile error in NumTraits, apparently discovered
  by -pedantic
2008-04-10 10:33:50 +00:00
Benoit Jacob
ca448d2537 split those files in util/
some more renaming
2008-04-10 09:41:13 +00:00
Benoit Jacob
9d8876ce82 * rename XprCopy -> Nested
* rename OperatorEquals -> Assign
* move Util.h and FwDecl.h to a util/ subdir
2008-04-10 09:01:28 +00:00
Gael Guennebaud
212da8ffe0 fix operator precedence bugs in the computation
of the VectorizableBit flag, now benchmark.cpp is properly vectorized
2008-04-09 18:24:13 +00:00
Gael Guennebaud
8f957564ec a better bugfix in ei_matrix_operator_equals_packet_unroller 2008-04-09 18:04:26 +00:00
Gael Guennebaud
d95d952e92 bugfix in ei_matrix_operator_equals_packet_unroller 2008-04-09 17:44:59 +00:00
Gael Guennebaud
1985fb0551 Added initial experimental support for explicit vectorization.
Currently only the following platform/operations are supported:
 - SSE2 compatible architecture
 - compiler compatible with Intel's SSE2 intrinsics
 - float, double and int data types
 - fixed size matrices with a storage major dimension multiple of 4 (or 2 for double)
 - scalar-matrix product, component wise: +,-,*,min,max
 - matrix-matrix product only if the left matrix is vectorizable and column major
   or the right matrix is vectorizable and row major, e.g.:
   a.transpose() * b is not vectorized with the default column major storage.
To use it you must define EIGEN_VECTORIZE and EIGEN_INTEL_PLATFORM.
2008-04-09 12:31:55 +00:00
Benoit Jacob
4920f2011e finish making use of CoeffReadCost and the new XprCopy everywhere
it seems appropriate to me.
2008-04-08 14:15:01 +00:00
Benoit Jacob
371d302efb - merge ei_xpr_copy and ei_eval_if_needed_before_nesting
- make use of CoeffReadCost to determine when to unroll the loops,
  for now only in Product.h and in OperatorEquals.h
performance remains the same: generally still not as good as before the
big changes.
2008-04-06 18:01:03 +00:00
Benoit Jacob
30ec34de36 fix compilation (finish removal of EIGEN_UNROLLED_LOOPS) 2008-04-05 14:20:30 +00:00
Benoit Jacob
61e58cf602 fixes as discussed with Gael on IRC. Mainly, in Fuzzy.h, and Dot.h, use
ei_xpr_copy to evaluate args when needed. Had to introduce an ugly
trick with ei_unref as when the XprCopy type is a reference one can't
directly access member typedefs such as Scalar.
2008-04-05 14:15:02 +00:00
Gael Guennebaud
b4a156671f * make use of the EvalBeforeNestingBit and EvalBeforeAssigningBit
  in ei_xpr_copy and operator=, respectively.
* added Matrix::lazyAssign() when EvalBeforeAssigningBit must be skipped
  (mainly internal use only)
* all expressions are now stored by const reference
* added Temporary xpr: .temporary() must be called on any temporary expression
  not directly returned by a function (mainly internal use only)
* moved all functors in the Functors.h header
* added some preliminary stuff for the explicit vectorization
2008-04-05 11:10:54 +00:00
Gael Guennebaud
048910caae * added cwise comparisons
* added "all" and "any" special redux operators
* added support for bool matrices
* added support for cost model of STL functors via ei_functor_traits
  (By default ei_functor_traits queries the functor member Cost)
2008-04-03 18:13:27 +00:00
Benoit Jacob
249dc4f482 current state of the mess. One line fails in the tests, and
useless copies are made when evaluating nested expressions.
Changes:
- kill LazyBit, introduce EvalBeforeNestingBit and EvalBeforeAssigningBit
- product and random don't evaluate immediately anymore
- eval() always evaluates
- change the value of Dynamic to some large positive value,
  in preparation of future simplifications
2008-04-03 16:54:19 +00:00
Benoit Jacob
b8900d0b80 More clever evaluation of arguments: now it occurs earlier, in operator*,
before the Product<> type is constructed. This resets template depth on each
intermediate evaluation, and gives simpler code. Introducing
ei_eval_if_expensive<Derived, n> which evaluates Derived if it's worth it
given that each of its coeffs will be accessed n times. Operator*
uses this with adequate values of n to evaluate args exactly when needed.
2008-04-03 14:17:56 +00:00
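The idea behind ei_eval_if_expensive in b8900d0b80 can be illustrated with a small cost comparison. The rule below is an assumed, simplified model and is not taken from the commit: it charges one unit per plain read or store of a scalar, and the name should_evaluate is hypothetical. Eigen makes this kind of decision at compile time through templates; the sketch uses an ordinary function to keep it short.

```cpp
#include <cassert>

// Hypothetical cost rule: decide whether an argument should be evaluated
// into a temporary before being used in an expression.
//   coeff_read_cost : cost of computing one coefficient of the argument
//   n               : how many times each coefficient will be accessed
//   scalar_read_cost: cost of one plain read or store (assumed to be 1)
inline bool should_evaluate(int coeff_read_cost, int n, int scalar_read_cost = 1) {
    assert(n >= 1 && coeff_read_cost >= 0 && scalar_read_cost >= 0);
    // With a temporary: compute each coefficient once, store it,
    // then perform n plain reads.
    const int cost_with_temporary = coeff_read_cost + (n + 1) * scalar_read_cost;
    // Without a temporary: recompute the coefficient on every access.
    const int cost_without_temporary = n * coeff_read_cost;
    return cost_with_temporary < cost_without_temporary;
}
```

Under this model a cheap argument (cost 1) is never worth copying, while an expensive one (cost 10) read 4 times is, since 10 + 5 < 40.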
Gael Guennebaud
4448f2620d fix a compilation issue with gcc-3.3 and ei_result_of 2008-04-03 12:39:39 +00:00
Benoit Jacob
d1a29d6319 -new: recursive costs system, useful to determine automatically
when to evaluate arguments and when to meta-unroll.
-use it in Product to determine when to eval args. not yet used
 to determine when to unroll. for now, not used anywhere else but
 that'll follow.
-fix badness of my last commit
2008-04-03 11:10:17 +00:00