Gael Guennebaud
bb297abb9e
make sure we use the right eigen version
2016-12-08 12:00:11 +01:00
Gael Guennebaud
8b4b00d277
fix usage of custom compiler
2016-12-08 11:59:39 +01:00
Gael Guennebaud
7105596899
Add missing include and use -O3
2016-12-07 16:56:08 +01:00
Gael Guennebaud
780f3c1adf
Fix call to convert on linux
2016-12-07 16:30:11 +01:00
Gael Guennebaud
3855ab472f
Cleanup file structure
2016-12-07 14:23:49 +01:00
Gael Guennebaud
59a59fa8e7
Update perf monitoring scripts to generate html/svg outputs
2016-12-07 13:36:56 +01:00
Gael Guennebaud
f2c506b03d
Add a script example to run and upload performance tests
2016-12-06 16:46:52 +01:00
Gael Guennebaud
1b4e085a7f
generate png file for web upload
2016-12-06 16:46:22 +01:00
Gael Guennebaud
f725f1cebc
Mention the CMAKE_PREFIX_PATH variable.
2016-12-06 15:23:45 +01:00
Gael Guennebaud
f90c4aebc5
Update monitored changeset lists
2016-12-06 15:07:46 +01:00
Gael Guennebaud
eb621413c1
Revert vec/y to vec*(1/y) in row-major TRSM:
...
- div is extremely costly
- this is consistent with the column-major case
- this is consistent with all other BLAS implementations
2016-12-06 15:04:50 +01:00
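
The rationale in the commit above (one division up front, then multiplications only) can be sketched in plain C++. This is a minimal illustration, not Eigen's TRSM code; the helper name `scale_by_reciprocal` and the use of `std::vector` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Eigen: scales every entry of v by 1/pivot.
// A single division computes the reciprocal; the loop then only multiplies.
inline void scale_by_reciprocal(std::vector<double>& v, double pivot) {
  assert(pivot != 0.0);
  const double inv = 1.0 / pivot;  // the only division
  for (std::size_t i = 0; i < v.size(); ++i) {
    v[i] *= inv;
  }
}
```

Note that `v[i] * (1.0 / pivot)` can differ from `v[i] / pivot` in the last bit, since the reciprocal is rounded before the multiplication.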
Gael Guennebaud
8365c2c941
Fix BLAS backend for symmetric rank K updates.
2016-12-06 14:47:09 +01:00
Gael Guennebaud
0c4d05b009
Explain how to choose your favorite Eigen version
2016-12-06 11:34:06 +01:00
Silvio Traversaro
e049a2a72a
Added relocatable cmake support also for CMake before 3.0 and after 2.8.8
2016-12-06 10:37:34 +01:00
Silvio Traversaro
18481b518f
Make CMake config file relocatable
2016-12-05 10:39:52 +01:00
Gael Guennebaud
c68c8631e7
fix compilation of BTL's blaze interface
2016-12-05 23:02:16 +01:00
Gael Guennebaud
1ff1d4a124
Add performance monitoring for LLT
2016-12-05 23:01:52 +01:00
Angelos Mantzaflaris
18de92329e
use numext::abs
...
(grafted from 0a08d4c60b)
2016-12-02 11:48:06 +01:00
Angelos Mantzaflaris
e8a6aa518e
1. Add explicit template to abs2 (resolves deduction for some arithmetic types)
...
2. Avoid signed-unsigned conversion in comparison (warning in case Scalar is unsigned)
(grafted from 4086187e49)
2016-12-02 11:39:18 +01:00
Gael Guennebaud
a6b971e291
Fix memory leak in Ref<Sparse>
2016-12-05 16:59:30 +01:00
Gael Guennebaud
8640ffac65
Optimize SparseLU::solve for rhs vectors
2016-12-05 15:41:14 +01:00
Gael Guennebaud
62acd67903
remove temporary in SparseLU::solve
2016-12-05 15:11:57 +01:00
Gael Guennebaud
0db6d5b3f4
bug #1356 : fix calls to evaluator::coeffRef(0,0) to get the address of the destination
...
by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).
2016-12-05 15:08:09 +01:00
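
The class of bug fixed above can be shown by analogy with `std::vector`: taking the address of element 0 is undefined when the container is empty, whereas asking the container for its data pointer is always valid. This is only an analogy under assumed names (`first_coeff`, `dst_data_ptr`); it does not reproduce Eigen's evaluator or kernel classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for element access: only meaningful for non-empty storage.
inline double& first_coeff(std::vector<double>& v) {
  assert(!v.empty());
  return v[0];
}

// Hypothetical stand-in for a dedicated data-pointer accessor: safe to call
// on an empty container (the returned pointer may be null and must not be
// dereferenced in that case).
inline double* dst_data_ptr(std::vector<double>& v) {
  return v.data();
}
```

For a non-empty vector both routes yield the same address; only the second is safe when the destination is empty.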
Gael Guennebaud
91003f3b86
typo
2016-12-05 13:51:07 +01:00
Gael Guennebaud
445c015751
Extend monitoring benchmarks with transposed matrix-vector and triangular matrix-vector products.
2016-12-05 13:36:26 +01:00
Gael Guennebaud
e3f613cbd4
Improve performance of row-major-dense-matrix * vector products for recent CPUs.
...
This revised version does not bother about aligned loads/stores,
and rather processes 8 rows at once for better instruction pipelining.
2016-12-05 13:02:01 +01:00
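
The idea of processing several rows at once can be sketched as follows. This is a simplified scalar illustration, not Eigen's kernel: the function name `gemv_row_major`, the flat `std::vector` storage, and the absence of SIMD packets are all assumptions of the example. Each of the 8 accumulators is independent, which is what gives the CPU room to overlap the multiply-adds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: y = A * x for a row-major A of size rows x cols.
std::vector<double> gemv_row_major(const std::vector<double>& A,
                                   std::size_t rows, std::size_t cols,
                                   const std::vector<double>& x) {
  assert(A.size() == rows * cols && x.size() == cols);
  std::vector<double> y(rows, 0.0);
  constexpr std::size_t kBlock = 8;  // rows handled per pass
  std::size_t i = 0;
  for (; i + kBlock <= rows; i += kBlock) {
    double acc[kBlock] = {0.0};  // 8 independent accumulators
    for (std::size_t j = 0; j < cols; ++j) {
      const double xj = x[j];  // loaded once, reused for all 8 rows
      for (std::size_t r = 0; r < kBlock; ++r) {
        acc[r] += A[(i + r) * cols + j] * xj;
      }
    }
    for (std::size_t r = 0; r < kBlock; ++r) y[i + r] = acc[r];
  }
  // Remaining rows (fewer than 8) are handled one at a time.
  for (; i < rows; ++i) {
    double acc = 0.0;
    for (std::size_t j = 0; j < cols; ++j) acc += A[i * cols + j] * x[j];
    y[i] = acc;
  }
  return y;
}
```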
Gael Guennebaud
3abc827354
Clean debugging code
2016-12-05 12:59:32 +01:00
Benoit Steiner
462c28e77a
Merged in srvasude/eigen (pull request PR-265)
...
Add Expm1 support to Eigen.
2016-12-05 02:31:11 +00:00
Gael Guennebaud
4465d20403
Add missing generic load methods.
2016-12-03 21:25:04 +01:00
Gael Guennebaud
6a5fe86098
Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs.
...
The previous code was optimized for Intel core2, for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panel of 4 columns is probably not optimal anymore).
2016-12-03 21:14:14 +01:00
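
A column-major product accumulates scaled columns into the result, and fused multiply-add fits that pattern directly. The sketch below illustrates only that pattern with `std::fma`, combining 4 columns per pass; the function name `gemv_col_major`, the block width, and the flat `std::vector` storage are assumptions of the example and say nothing about the blocking or vectorization used in the actual kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: y = A * x for a column-major A of size rows x cols.
std::vector<double> gemv_col_major(const std::vector<double>& A,
                                   std::size_t rows, std::size_t cols,
                                   const std::vector<double>& x) {
  assert(A.size() == rows * cols && x.size() == cols);
  std::vector<double> y(rows, 0.0);
  std::size_t j = 0;
  for (; j + 4 <= cols; j += 4) {
    const double* c0 = A.data() + (j + 0) * rows;
    const double* c1 = A.data() + (j + 1) * rows;
    const double* c2 = A.data() + (j + 2) * rows;
    const double* c3 = A.data() + (j + 3) * rows;
    const double x0 = x[j], x1 = x[j + 1], x2 = x[j + 2], x3 = x[j + 3];
    for (std::size_t i = 0; i < rows; ++i) {
      // One load and one store of y[i] serve four multiply-adds.
      double t = y[i];
      t = std::fma(c0[i], x0, t);
      t = std::fma(c1[i], x1, t);
      t = std::fma(c2[i], x2, t);
      t = std::fma(c3[i], x3, t);
      y[i] = t;
    }
  }
  // Remaining columns (fewer than 4) are handled one at a time.
  for (; j < cols; ++j) {
    const double* c = A.data() + j * rows;
    for (std::size_t i = 0; i < rows; ++i) y[i] = std::fma(c[i], x[j], y[i]);
  }
  return y;
}
```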
Benoit Steiner
2bfece5cd1
Merged eigen/eigen into default
2016-12-02 16:30:14 -08:00
Srinivas Vasudevan
09ee7f0c80
Fix small nit where I changed name of plog1p to pexpm1.
2016-12-02 15:30:12 -08:00
Srinivas Vasudevan
a0d3ac760f
Sync from Head.
2016-12-02 14:14:45 -08:00
Srinivas Vasudevan
218764ee1f
Added support for expm1 in Eigen.
2016-12-02 14:13:01 -08:00
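
As background on why a dedicated expm1 is useful: for very small arguments, computing `exp(x) - 1` loses all significant digits because `exp(x)` rounds to 1. The sketch below uses the standard library's `std::expm1` rather than Eigen's API; the helper names `expm1_naive` and `relative_error` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Naive formulation: suffers from cancellation when |x| is tiny.
inline double expm1_naive(double x) {
  return std::exp(x) - 1.0;
}

// Hypothetical helper to compare an approximation against a reference value.
inline double relative_error(double approx, double exact) {
  assert(exact != 0.0);
  return std::fabs(approx - exact) / std::fabs(exact);
}
```

For example, with `x = 1e-20` the naive form evaluates to zero in double precision, while `std::expm1(x)` stays close to `x`, which is the leading term of the series.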
Gael Guennebaud
66f65ccc36
Ease the compiler's job of generating clean and efficient code in mat*vec.
2016-12-02 22:41:26 +01:00
Gael Guennebaud
fe696022ec
Operators += and -= do not resize!
2016-12-02 22:40:25 +01:00
Mehdi Goli
592acc5bfa
Making default numeric_list work with sycl.
2016-12-02 17:58:30 +00:00
Gael Guennebaud
8dfb3e00b8
merge
2016-12-02 11:34:21 +01:00
Gael Guennebaud
4c0d5f3c01
Add perf monitoring for gemv
2016-12-02 11:34:12 +01:00
Gael Guennebaud
d2718d662c
Re-enable A^T*A action in BTL
2016-12-02 11:32:03 +01:00
Christoph Hertzberg
22f7d398e2
bug #1355 : Fixed wrong line-endings on two files
2016-12-02 11:22:05 +01:00
Gael Guennebaud
27873008d4
Clean up SparseCore module regarding ReverseInnerIterator
2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris
8c24723a09
typo UIntPtr
...
(grafted from b6f04a2dd4)
2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris
aeba0d8655
fix two warnings (unused typedef, unused variable) and a typo
...
(grafted from a9aa3bcf50)
2016-12-01 21:23:43 +01:00
Gael Guennebaud
181138a1cb
fix member order
2016-12-01 17:06:20 +01:00
Gael Guennebaud
9f297d57ae
Merged in rmlarsen/eigen (pull request PR-256)
...
Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.
2016-12-01 15:27:33 +00:00
Gael Guennebaud
f95e3b84a5
merge
2016-12-01 16:18:57 +01:00
Benoit Steiner
7ff26ddcbb
Merged eigen/eigen into default
2016-12-01 07:13:17 -08:00
Gael Guennebaud
037b46762d
Fix misleading-indentation warnings.
2016-12-01 16:05:42 +01:00
Mehdi Goli
79aa2b784e
Adding sycl backend for TensorPadding.h; disabling __uint128 for sycl in TensorIntDiv.h; disabling cache size for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.
2016-12-01 13:02:27 +00:00