Gael Guennebaud
|
450d0c3de0
|
Make sure that calls to broadcast4 are 16 bytes aligned
|
2014-04-25 22:25:48 +02:00 |
|
Gael Guennebaud
|
f9d2f3903e
|
Product kernel: skip loop on columns if there are no remaining rows
|
2014-04-25 16:54:30 +02:00 |
|
Gael Guennebaud
|
6f64b0b487
|
Fix sizeof unit test
|
2014-04-25 14:05:54 +02:00 |
|
Gael Guennebaud
|
c20e3641de
|
Fix for mixed products
|
2014-04-25 13:22:34 +02:00 |
|
Gael Guennebaud
|
2dbfd83424
|
Implement pbroadcast4 on altivec
|
2014-04-25 02:46:57 -07:00 |
|
Gael Guennebaud
|
7388fdf560
|
pbroadcast4/2 assume aligned memory
|
2014-04-25 02:46:22 -07:00 |
|
Gael Guennebaud
|
c9788d55b9
|
Disable 3pX4 kernel on Altivec: although this platform has 32 registers, this version seems significantly slower.
|
2014-04-25 11:48:22 +02:00 |
|
Gael Guennebaud
|
ae4d9434e2
|
Add unit test for pbroadcast4/2
|
2014-04-25 11:21:18 +02:00 |
|
Gael Guennebaud
|
4def7b1fa5
|
Fix ptranspose overload prototypes for NEON
|
2014-04-25 11:15:13 +02:00 |
|
Gael Guennebaud
|
c79bd4b64b
|
Minor optimizations in product kernel:
- use pbroadcast4 (helpful when AVX is not available)
- process all remaining rows at once (significant speedup for small matrices)
|
2014-04-25 11:06:03 +02:00 |
|
Gael Guennebaud
|
cf7eaed38d
|
Avoid blocking-size mismatch in unit tests calling Eigen's blas interface.
|
2014-04-25 11:04:02 +02:00 |
|
Gael Guennebaud
|
3d8d0f6269
|
Enable vectorization of pack_rhs with a column-major RHS.
Rename and generalize Kernel<*> to PacketBlock<*,N>.
|
2014-04-25 10:56:18 +02:00 |
|
Gael Guennebaud
|
b0e19db1cf
|
Enable fused madd for Altivec
|
2014-04-24 23:17:18 +02:00 |
|
Gael Guennebaud
|
8d85ce88e1
|
Implement ptranspose on altivec and fix pgather/pscatter
|
2014-04-24 05:47:53 -07:00 |
|
Benoit Steiner
|
4eb92e5647
|
Fixed the NEON implementation of predux_max<Packet4i>.
|
2014-04-23 18:23:07 -07:00 |
|
Benoit Steiner
|
ccb4dec719
|
Created a NEON version of the ptranspose packet primitives
|
2014-04-23 18:22:10 -07:00 |
|
Gael Guennebaud
|
82b09fcb91
|
Add Altivec implementation of pgather/pscatter (not tested)
|
2014-04-23 13:09:26 +02:00 |
|
Gael Guennebaud
|
ecbd67a15a
|
Fix EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT macro
|
2014-04-22 17:03:57 +02:00 |
|
Gael Guennebaud
|
934ce93886
|
merge with default branch
|
2014-04-22 17:00:38 +02:00 |
|
Gael Guennebaud
|
5c5231ab71
|
Workaround gcc's default ABI not being able to distinguish between vector types of different sizes.
|
2014-04-22 16:03:19 +02:00 |
|
Gael Guennebaud
|
2606abed53
|
Fix 128bit packet size assumptions in unit tests.
|
2014-04-18 21:14:40 +02:00 |
|
Gael Guennebaud
|
a7d20038df
|
Fix alignment assertion.
|
2014-04-18 17:06:31 +02:00 |
|
Gael Guennebaud
|
3454b4e5f1
|
Fix calls to lazy products (lazy product does not like matrices with 0 length)
|
2014-04-18 17:06:03 +02:00 |
|
Gael Guennebaud
|
94684721bd
|
Smarter block size computation
|
2014-04-18 15:35:34 +02:00 |
|
Gael Guennebaud
|
1388f4f9fd
|
Fix typo (was working with clang!)
|
2014-04-18 11:43:13 +02:00 |
|
Gael Guennebaud
|
6d665d446b
|
Fixes for fixed sizes and non vectorizable types
|
2014-04-17 23:26:34 +02:00 |
|
Gael Guennebaud
|
2c3c95990d
|
merge
|
2014-04-17 22:50:49 +02:00 |
|
Benoit Steiner
|
6d6df90c9a
|
Implemented the pgather/pscatter packet primitives for the arm/NEON architecture
|
2014-04-17 12:28:01 -07:00 |
|
Gael Guennebaud
|
c354bd47f7
|
Make our gemm bench a little more powerful.
|
2014-04-17 21:03:26 +02:00 |
|
Gael Guennebaud
|
9777a5ca60
|
Various minor fixes in BTL
|
2014-04-17 21:01:45 +02:00 |
|
Gael Guennebaud
|
9746396d1b
|
Optimize AVX pset1 for complexes and ploaddup
|
2014-04-17 20:51:04 +02:00 |
|
Benjamin Chretien
|
e5d0cb54a5
|
Fix typo in Reductions tutorial.
|
2014-04-17 18:49:23 +02:00 |
|
Gael Guennebaud
|
1dd015fea6
|
Reduce block sizes in unit tests.
|
2014-04-17 16:27:58 +02:00 |
|
Gael Guennebaud
|
45a4aad572
|
add unit tests for ploadquad and predux4, and split packetmath unit test wrt real/complex
|
2014-04-17 16:27:22 +02:00 |
|
Gael Guennebaud
|
e1d461352e
|
Extend mixingtype unit test to check transposed cases.
|
2014-04-17 16:26:35 +02:00 |
|
Gael Guennebaud
|
11fbdcbc38
|
Fix and optimize mixed products
|
2014-04-17 16:04:30 +02:00 |
|
Gael Guennebaud
|
0fa8290366
|
Optimize ploaddup for AVX
|
2014-04-17 16:02:27 +02:00 |
|
Gael Guennebaud
|
d936ddc3d1
|
Fallback to lazy products for very small ones.
|
2014-04-16 23:15:42 +02:00 |
|
Gael Guennebaud
|
de8336a9bc
|
Enable alloca on MAC OSX
|
2014-04-16 23:14:58 +02:00 |
|
Jitse Niesen
|
ffc995c9e4
|
Implement evaluator<ReturnByValue>.
All supported tests pass apart from Sparse and Geometry,
except test in adjoint_4 that a = a.transpose() raises an assert.
|
2014-04-16 18:16:36 +01:00 |
|
Gael Guennebaud
|
d5a795f673
|
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell.
This changeset also introduces new vector functions: ploadquad and predux4.
|
2014-04-16 17:05:11 +02:00 |
|
Jitse Niesen
|
b30706bd5c
|
Fix typo in Inverse.h
|
2014-04-15 22:51:46 +01:00 |
|
Mark Borgerding
|
e0dbb68c2f
|
Check IMKL version for compatibility with Eigen
|
2014-04-15 13:57:03 -04:00 |
|
Jitse Niesen
|
59f5f155c2
|
Port products with permutation matrices to evaluators.
|
2014-04-15 15:21:38 +01:00 |
|
Gael Guennebaud
|
20c840be15
|
Merged in benoitsteiner/eigen-fixes/nvcc_fixes (pull request PR-56)
Fixed a typo in CXX11Meta.h
|
2014-04-15 10:38:25 +02:00 |
|
Benoit Steiner
|
1afd50e0f3
|
Fixed a typo in CXX11Meta.h
|
2014-04-14 14:26:30 -07:00 |
|
Gael Guennebaud
|
3c66bb136b
|
bug #793: detect NaN and INF in EigenSolver instead of aborting with an assert.
|
2014-04-14 22:00:27 +02:00 |
|
Gael Guennebaud
|
7098e6d976
|
Add isfinite overload for complexes.
|
2014-04-14 21:57:49 +02:00 |
|
Benoit Steiner
|
feaf7c7e6d
|
Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (i.e. gcc 4.8).
|
2014-04-14 10:44:17 -07:00 |
|
Gael Guennebaud
|
d567e3b893
|
Merged in benoitsteiner/eigen-fixes (pull request PR-55)
CUDA fixes
|
2014-04-14 14:33:50 +02:00 |
|