Benoit Steiner
8044b00a7f
bug #782: Work around gcc <= 4.4 compilation error in the NEON PacketMath code.
2014-04-03 23:41:47 +02:00
Benoit Steiner
aecc78325a
Pulled the latest updates from the Eigen trunk.
2014-04-01 22:07:05 -07:00
Christoph Hertzberg
1cb8de1250
Add some actual verifications to the autodiff unit test
2014-04-01 17:44:48 +02:00
Florian George
56c4851323
Fixed typo: symmretric -> symmetric
2014-04-01 15:52:25 +02:00
Gael Guennebaud
ceae5b4145
Fix lapack build
2014-04-01 11:52:23 +02:00
Gael Guennebaud
ec65e6648c
bug #775: propagate generator when working around CMake bug #9220
2014-04-01 11:45:43 +02:00
Gael Guennebaud
d992634fbc
Fix bug #776: it seems that MinGW does not support weak linking
2014-04-01 11:31:21 +02:00
Benoit Steiner
5e8622477b
Rename the vector() factories defined in blas/common.h to make_vector() to prevent a possible name conflict with std::vector.
2014-04-01 11:23:28 +02:00
Gael Guennebaud
1221dd90aa
Fix "no newline at end of file" warning
2014-04-01 11:21:14 +02:00
Gael Guennebaud
93870d95b7
BTL: add blaze
2014-03-31 10:59:55 +02:00
Gael Guennebaud
f603823ef3
BTL: fix warnings and extend to 5k matrices, update GotoBLAS to OpenBLAS, etc.
2014-03-31 10:58:30 +02:00
Gael Guennebaud
8d0441052e
Finally, prefetching seems to help achieve more stable performance
2014-03-31 10:42:19 +02:00
Gael Guennebaud
82c8163067
Enable repetition in mixing type unit test
2014-03-31 10:41:40 +02:00
Gael Guennebaud
1c0728043a
Work around alignment warnings
2014-03-30 22:43:47 +02:00
Gael Guennebaud
e497a27ddc
Optimize gebp kernel:
1 - increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000)
2 - improve pipelining when dealing with the last rows of the lhs
2014-03-30 21:57:05 +02:00
Benoit Steiner
ad59ade116
Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size.
2014-03-28 12:11:23 -07:00
Benoit Steiner
39bfbd43f0
Properly align the input data to prevent false failures of the packetmath.cpp test.
2014-03-28 12:00:08 -07:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access half-size packet types
2014-03-28 10:18:04 +01:00
Gael Guennebaud
8d2bb2c20d
merge with default branch
2014-03-28 09:24:18 +01:00
Gael Guennebaud
c94fde118a
Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization)
2014-03-28 09:11:06 +01:00
Benoit Steiner
51e85c936d
Merged latest changes from parent.
2014-03-27 18:32:15 -07:00
Benoit Steiner
8a94cb3edd
Implemented the SSE version of the gather and scatter packet primitives.
2014-03-27 18:29:01 -07:00
Benoit Steiner
7f3162f707
Implemented the AVX version of the gather and scatter packet primitives.
2014-03-27 17:42:25 -07:00
Benoit Steiner
ee86679096
Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel.
2014-03-27 16:03:03 -07:00
Gael Guennebaud
58fe2fc2b2
enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...)
2014-03-27 23:38:50 +01:00
Benoit Steiner
729363114f
Fixed compilation error when FMA instructions are enabled.
2014-03-27 11:20:41 -07:00
Benoit Steiner
1697d7a179
Silenced "unused variable" warnings when compiling with FMA.
2014-03-27 11:00:47 -07:00
Benoit Steiner
3e1fe8e416
Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used; however, this doesn't seem to impact performance.
2014-03-27 10:38:41 -07:00
Benoit Steiner
b776458ccb
Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product.
2014-03-27 10:02:24 -07:00
Benoit Steiner
c4902a3d01
Implemented the AVX version of the ptranspose packet primitive.
2014-03-27 09:34:51 -07:00
Gael Guennebaud
7d73c7f18b
Change ABI version when enabling AVX with GCC
2014-03-27 15:38:40 +01:00
Gael Guennebaud
6f123d209e
Fix geo_* unit tests with respect to AVX
2014-03-27 15:29:56 +01:00
Gael Guennebaud
052aedd394
Implement pcplxflip, palign, predux and the like for AVX complexes
2014-03-27 14:47:00 +01:00
Gael Guennebaud
fb03b56647
Fix warning
2014-03-27 11:38:35 +01:00
Jitse Niesen
6a81594771
Merged in infinitei/eigen (pull request PR-50)
Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls
2014-03-27 10:12:25 +00:00
Mark Borgerding
9ce0d78513
immintrin.h was not available until Intel compiler version 11
2014-03-26 22:26:07 -04:00
Benoit Steiner
a419cea4a0
Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions.
Implemented the primitive using SSE instructions.
2014-03-26 19:03:07 -07:00
Abhijit Kundu
ba3457cab2
Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls
2014-03-26 22:02:48 -04:00
Benoit Steiner
14bc4b9704
Made sure that the version of gemm_pack_rhs specialized for row major matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64bit mode).
2014-03-26 17:35:18 -07:00
Benoit Steiner
e45a6bed45
Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible.
2014-03-26 15:58:13 -07:00
Benoit Steiner
cc73164aa8
Merged latest updates from the parent branch
2014-03-26 15:23:59 -07:00
Gael Guennebaud
f0a4c9d5ab
Update gebp kernel to process a panel of 4 columns at once for the remaining ones.
2014-03-26 23:22:36 +01:00
Gael Guennebaud
8be011e776
Remove remaining bits of the dead working buffer
2014-03-26 23:14:44 +01:00
Benoit Steiner
a078f442a3
Vectorized the multiplication and division of complex numbers using AVX instructions.
2014-03-26 15:11:18 -07:00
Benoit Steiner
cf1a7bfbe1
Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives.
Silenced a few compilation warnings.
2014-03-26 12:03:31 -07:00
Gael Guennebaud
bc401eb6fa
Implement new 1 packet x 8 gebp kernel
2014-03-26 18:53:00 +01:00
Gael Guennebaud
b286a1e75c
add pbroadcast2/4 generic intrinsics
2014-03-26 16:46:36 +01:00
Benoit Steiner
6bf3cc2732
Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf>
2014-03-25 09:00:43 -07:00
Benoit Steiner
7ae9b0805d
Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives.
2014-03-24 13:33:40 -07:00
Benoit Steiner
08f7b3221d
Added proper support for AVX and FMA in the makefiles.
2014-03-24 09:52:45 -07:00