Gael Guennebaud
44cb1e4802
it appears only the "on the left" case was tested
2010-07-20 10:32:56 +02:00
Gael Guennebaud
872523844a
fix trmm and symm wrt lhs packing
2010-07-20 10:06:41 +02:00
Gael Guennebaud
76eb9c9fd9
fix compilation by including file in correct order
2010-07-19 23:32:13 +02:00
Gael Guennebaud
70b1ce11c6
* fix SelfCwiseBinaryOp traits and handling of mixed types
...
* improve compilation error in case of type mismatch
2010-07-19 23:31:08 +02:00
Gael Guennebaud
8b0b121c9e
explicitely disable vectorization for mixed coeff based products
2010-07-19 23:28:57 +02:00
Gael Guennebaud
08c841eb87
fix lhs packing in the case of real * complex products
2010-07-19 23:16:03 +02:00
Gael Guennebaud
1ed4233fd2
port Jacobi to new ei_pset1/ei_pload API
2010-07-19 16:51:38 +02:00
Gael Guennebaud
c2ee454df4
* fix compilation of mixed scalar product
...
* optimize mixed scalar products
2010-07-19 16:49:09 +02:00
Gael Guennebaud
6e157dd7c6
* fix a couple of remaining issues with previous commit,
...
* merge ei_product_blocking_traits into ei_gepb_traits
2010-07-19 15:45:13 +02:00
Gael Guennebaud
f8aae7a908
* _mm_loaddup_pd is slow
...
* optimize SSE ei_ploaddup<Packet4f>
2010-07-19 15:43:27 +02:00
Gael Guennebaud
cd0e5dca9b
wip: extend the gebp kernel to optimize complex and mixed products
2010-07-19 08:50:59 +02:00
Gael Guennebaud
45362f4eae
update mixing type test
2010-07-15 08:40:09 +02:00
Gael Guennebaud
3f532edc6d
update unit test for new API
2010-07-15 08:38:31 +02:00
Gael Guennebaud
1dc9aaaf36
add support for mixing type in trsv
2010-07-13 16:03:49 +02:00
Gael Guennebaud
36d9b51a44
optimize non fused MADD, and add a flatten attribute macro to enforce
...
inlining within a function
2010-07-13 15:16:34 +02:00
Gael Guennebaud
b72b7ab76f
matrix product: move the alpha factor to gebp instead of the packing,
...
clean some temporaries, etc.
2010-07-12 16:31:46 +02:00
Gael Guennebaud
f8678272a4
mixing types step 3:
...
- improve support of colmajor by vector and matrix - matrix
- now all configurations are well handled, but the perf are not always very good
2010-07-11 23:57:23 +02:00
Gael Guennebaud
8e3c4283f5
make colmaj * vector uses pointers only
2010-07-11 16:01:48 +02:00
Gael Guennebaud
ff96c94043
mixing types in product step 2:
...
* pload* and pset1 are now templated on the packet type
* gemv routines are now embeded into a structure with
a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix works fine,
some need more work...
2010-07-11 15:48:30 +02:00
Gael Guennebaud
4161b8be67
sync
2010-07-10 22:58:51 +02:00
Gael Guennebaud
e5bc9526f1
* generalize rowmajor by vector
...
* fix weird compilation error when constructing a matrix with a row by matrix product
2010-07-10 22:53:27 +02:00
Gael Guennebaud
c4ef69b5bd
fix compilation: make the check_coordinates* functions const
2010-07-10 22:37:16 +02:00
Benoit Jacob
6dcd373b9d
let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions by 1 single instruction.
2010-07-09 18:51:17 -04:00
Konstantinos Margaritis
6ad3f1ab1f
Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float>
...
minor fix in AltiVec Complex.h
2010-07-10 00:09:29 +03:00
Gael Guennebaud
96f9015807
disable MSVC optimization when the underlying compiler is ICC
2010-07-09 19:33:43 +02:00
Gael Guennebaud
b2effa2b2c
move ei_conj_if to a more appropriate file
2010-07-09 18:05:57 +02:00
Konstantinos Margaritis
642cc27eb1
forgot to commit ei_p4f_FORWARD;
2010-07-09 18:08:18 +03:00
Konstantinos Margaritis
f6bd508351
forgot to add the Complex.h include for AltiVec.
2010-07-09 17:56:53 +03:00
Konstantinos Margaritis
d9e134c73c
Altivec port of Complex.h.
...
Note: For some reason g++ 4.4 is >200% slower than g++ 4.3 on altivec code.
The same benchmark (bench_gemm) was tested, on the same hardware/OS (G4/Debian testing),
with same CFLAGS. With some code reorganizing I managed to get some minor gain
on 4.4, but I just could not reach 4.3 speed. This is most likely a bug, but I'm waiting
to see if it's fixed on 4.5. I'll look into this a bit more.
2010-07-09 17:54:41 +03:00
Jitse Niesen
26cfe5a958
Be consistent in how the tutorial pages link together.
2010-07-09 11:59:29 +01:00
Jitse Niesen
2c03ca3325
Small changes to tutorial page 2 (matrix arithmetic):
...
* slightly more extensive discussion of aliasing
* layout: put example code and output side-by-side
* add some links, etc
2010-07-09 11:46:07 +01:00
Gael Guennebaud
b1a17dbfe4
fix a few weird issues with gcc 4.3 32bits and complex<float>
2010-07-09 08:27:58 +02:00
Thomas Capricelli
551cb9b7b4
bench: use of Eigen/Array is deprecated + fix includes for iostream
2010-07-09 03:59:36 +02:00
Gael Guennebaud
504d3a3586
fix SliceVectorizedTraversal for packetsize==1
2010-07-08 23:31:14 +02:00
Gael Guennebaud
51ec188da0
extend vectorization_logic
2010-07-08 23:30:16 +02:00
Carlos Becker
951da96f14
Added more redux types/examples in tutorial and fixed some display issues
2010-07-08 18:16:39 +01:00
Carlos Becker
cb3aad1d91
Reductions/Broadcasting/Visitor Tutorial added to index
2010-07-08 17:45:25 +01:00
Carlos Becker
9852e7b9cb
Reductions/Broadcasting/Visitor Tutorial added
2010-07-08 17:42:23 +01:00
Gael Guennebaud
300a226ffa
scalars fitting in a single packet requires more work, step 1
...
* add a, Alignable trait
* update LinearVectorization assignment
2010-07-08 14:27:47 +02:00
Gael Guennebaud
2a1500915a
compilation fix
2010-07-08 14:26:00 +02:00
Gael Guennebaud
2066ed91de
enabling aligned loads/store for complex<double> is much more tricky,
...
so the temporary fix is to always perform unaligned load/store
2010-07-07 22:50:19 +02:00
Gael Guennebaud
d89925e6de
an attempt to fix wrong unaligned store
2010-07-07 22:35:06 +02:00
Gael Guennebaud
02fd3acd81
update to support mixin types
2010-07-07 19:49:48 +02:00
Gael Guennebaud
31a36aa9c4
support for real * complex matrix product - step 1 (works for some special cases)
2010-07-07 19:49:09 +02:00
Gael Guennebaud
fc3fd8ab57
mention that array = matrix is fine too
2010-07-07 18:10:11 +02:00
Gael Guennebaud
861962c55f
sync
2010-07-07 16:44:05 +02:00
Gael Guennebaud
0f2d480af0
add support for complex
2010-07-07 16:41:29 +02:00
Gael Guennebaud
a2415388ef
optimized conjugate products for SSE3
2010-07-07 16:37:20 +02:00
Gael Guennebaud
65257f6b29
optimize for SSE3 => significant speed up !!
2010-07-07 15:34:46 +02:00
Gael Guennebaud
dd18b22f0b
optimize pmul for complex<double>
2010-07-07 15:29:04 +02:00