Commit Graph

5422 Commits

Author SHA1 Message Date
Christoph Hertzberg
84cb1d72b8 Removed IACA-defines
This caused redefinition warnings if IACA headers were included from elsewhere. For a clean solution we should define our own EIGEN_IACA_* macros
2014-05-05 15:06:37 +02:00
Christoph Hertzberg
56de8d3816 Fixed unused variable warnings 2014-05-05 15:03:29 +02:00
Christoph Hertzberg
b4beba72a2 Fix bug #807: Missing scalar type cast in umeyama() 2014-05-05 14:23:52 +02:00
Christoph Hertzberg
b5e3d76aa5 Fixed bug #806: Missing scalar type cast in Quaternion::setFromTwoVectors() 2014-05-05 14:22:27 +02:00
Benjamin Chretien
0b7f95a03f Fix typo in SparseMatrix assert. 2014-05-03 12:41:37 +02:00
Gael Guennebaud
d67aa1549b Add missing add_subdirectory directive 2014-05-03 10:46:11 +02:00
Gael Guennebaud
07986189b7 Fix bug #803: avoid char* to int* conversion 2014-05-01 23:03:54 +02:00
Benoit Steiner
c0f2cb016e Extended support for Tensors:
* Added ability to map a region of the memory to a tensor
  * Added basic support for unary and binary coefficient wise expressions, such as addition or square root
  * Provided an emulation layer to make it possible to compile the code with compilers (such as nvcc) that don't support cxx11.
2014-04-28 10:32:27 -07:00
Gael Guennebaud
2fb64578aa Add a small benchmark to compare dense solvers for small to large problems. 2014-04-28 16:16:29 +02:00
Kolja Brix
ecf1f1d589 Make gdb pretty printer Python3-compatible (bug #800). 2014-04-28 14:10:22 +01:00
Gael Guennebaud
e7ef26fa44 TRMM: Make sure we have enough memory in rhs block to enforce alignment. 2014-04-25 23:36:22 +02:00
Gael Guennebaud
450d0c3de0 Make sure that calls to broadcast4 are 16 bytes aligned 2014-04-25 22:25:48 +02:00
Gael Guennebaud
f9d2f3903e Product kernel: skip loop on columns if there is no remaining rows 2014-04-25 16:54:30 +02:00
Gael Guennebaud
6f64b0b487 Fix sizeof unit test 2014-04-25 14:05:54 +02:00
Gael Guennebaud
c20e3641de Fix for mixed products 2014-04-25 13:22:34 +02:00
Gael Guennebaud
2dbfd83424 Implement pbroadcast4 on altivec 2014-04-25 02:46:57 -07:00
Gael Guennebaud
7388fdf560 pbroadcast4/2 assume aligned memory 2014-04-25 02:46:22 -07:00
Gael Guennebaud
c9788d55b9 Disable 3pX4 kernel on Altivec: despite this platform has 32 registers, this version seems significantly slower. 2014-04-25 11:48:22 +02:00
Gael Guennebaud
ae4d9434e2 Add unit test for pbroadcast4/2 2014-04-25 11:21:18 +02:00
Gael Guennebaud
4def7b1fa5 Fix ptranspose overload prototypes for NEON 2014-04-25 11:15:13 +02:00
Gael Guennebaud
c79bd4b64b Minor optimizations in product kernel:
- use pbroadcast4 (helpful when AVX is not available)
- process all remaining rows at once (significant speedup for small matrices)
2014-04-25 11:06:03 +02:00
Gael Guennebaud
cf7eaed38d Avoid blocking-size mismatch in unit tests calling Eigen's blas interface. 2014-04-25 11:04:02 +02:00
Gael Guennebaud
3d8d0f6269 Enable vectorization of pack_rhs with a column-major RHS.
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf Enable fused madd for Altivec 2014-04-24 23:17:18 +02:00
Gael Guennebaud
8d85ce88e1 Implement ptranspose on altivec and fix pgather/pscatter 2014-04-24 05:47:53 -07:00
Benoit Steiner
4eb92e5647 Fixed the NEON implementation of predux_max<Packet4i>. 2014-04-23 18:23:07 -07:00
Benoit Steiner
ccb4dec719 Created a NEON version of the ptranspose packet primitives 2014-04-23 18:22:10 -07:00
Gael Guennebaud
82b09fcb91 Add Altivec implementation of pgather/pscatter (not tested) 2014-04-23 13:09:26 +02:00
Gael Guennebaud
ecbd67a15a Fix EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT macro 2014-04-22 17:03:57 +02:00
Gael Guennebaud
934ce93886 merge with default branch 2014-04-22 17:00:38 +02:00
Gael Guennebaud
5c5231ab71 Workaround gcc's default ABI not being able to distinghish between vector types of different sizes. 2014-04-22 16:03:19 +02:00
Gael Guennebaud
2606abed53 Fix 128bit packet size assumptions in unit tests. 2014-04-18 21:14:40 +02:00
Gael Guennebaud
a7d20038df Fix alignment assertion. 2014-04-18 17:06:31 +02:00
Gael Guennebaud
3454b4e5f1 Fix calls to lazy products (lazy product does not like matrices with 0 length) 2014-04-18 17:06:03 +02:00
Gael Guennebaud
94684721bd Smarter block size computation 2014-04-18 15:35:34 +02:00
Gael Guennebaud
1388f4f9fd Fix typo (was working with clang\!) 2014-04-18 11:43:13 +02:00
Gael Guennebaud
6d665d446b Fixes for fixed sizes and non vectorizable types 2014-04-17 23:26:34 +02:00
Gael Guennebaud
2c3c95990d merge 2014-04-17 22:50:49 +02:00
Benoit Steiner
6d6df90c9a Implemented the pgather/pscatter packet primitives for the arm/NEON architecture 2014-04-17 12:28:01 -07:00
Gael Guennebaud
c354bd47f7 Make our gemm bench a little more powerful. 2014-04-17 21:03:26 +02:00
Gael Guennebaud
9777a5ca60 Various minor fixes in BTL 2014-04-17 21:01:45 +02:00
Gael Guennebaud
9746396d1b Optimize AVX pset1 for complexes and ploaddup 2014-04-17 20:51:04 +02:00
Benjamin Chretien
e5d0cb54a5 Fix typo in Reductions tutorial. 2014-04-17 18:49:23 +02:00
Gael Guennebaud
1dd015fea6 Reduce block sizes in unit tests. 2014-04-17 16:27:58 +02:00
Gael Guennebaud
45a4aad572 add unit tests for ploadquad and predux4, and split packetmath unit test wrt real/complex 2014-04-17 16:27:22 +02:00
Gael Guennebaud
e1d461352e Extend mixingtype unit test to check transposed cases. 2014-04-17 16:26:35 +02:00
Gael Guennebaud
11fbdcbc38 Fix and optimize mixed products 2014-04-17 16:04:30 +02:00
Gael Guennebaud
0fa8290366 Optimize ploaddup for AVX 2014-04-17 16:02:27 +02:00
Gael Guennebaud
d936ddc3d1 Fallback to lazy products for very small ones. 2014-04-16 23:15:42 +02:00
Gael Guennebaud
de8336a9bc Enable alloca on MAC OSX 2014-04-16 23:14:58 +02:00