Benoit Steiner
|
e2946d962d
|
Reimplement clamp as a static function.
|
2016-05-27 12:58:43 -07:00 |
|
Benoit Steiner
|
e96d36d4cd
|
Use NULL instead of nullptr to preserve the compatibility with cxx03
|
2016-05-27 12:54:06 -07:00 |
|
Benoit Steiner
|
abc815798b
|
Added a new operation to enable more powerful tensor indexing.
|
2016-05-27 12:22:25 -07:00 |
|
Benoit Steiner
|
5707537592
|
Fixed the warning "option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'" generated by nvcc 7.5
|
2016-05-27 10:47:53 -07:00 |
|
Gael Guennebaud
|
22a035db95
|
Fix compilation when defaulting to row-major
|
2016-05-27 10:31:11 +02:00 |
|
Benoit Steiner
|
1ae2567861
|
Fixed some compilation warnings
|
2016-05-26 15:57:19 -07:00 |
|
Benoit Steiner
|
1a47844529
|
Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g. fp16 to float)
|
2016-05-26 14:37:09 -07:00 |
|
Benoit Steiner
|
36369ab63c
|
Resolved merge conflicts
|
2016-05-26 13:39:39 -07:00 |
|
Benoit Steiner
|
28fcb5ca2a
|
Merged latest reduction improvements
|
2016-05-26 12:19:33 -07:00 |
|
Benoit Steiner
|
c1c7f06c35
|
Improved the performance of inner reductions.
|
2016-05-26 11:53:59 -07:00 |
|
Benoit Steiner
|
22d02c9855
|
Improved the coverage of the fp16 reduction tests
|
2016-05-26 11:12:16 -07:00 |
|
Benoit Steiner
|
8288b0aec2
|
Code cleanup.
|
2016-05-26 09:00:04 -07:00 |
|
Benoit Steiner
|
2d7ed54ba2
|
Made the static storage class qualifier come first.
|
2016-05-25 22:16:15 -07:00 |
|
Benoit Steiner
|
e1fca8866e
|
Deleted unnecessary explicit qualifiers.
|
2016-05-25 22:15:26 -07:00 |
|
Benoit Steiner
|
9b0aaf5113
|
Don't mark inline functions as static since it confuses the ICC compiler
|
2016-05-25 22:10:11 -07:00 |
|
Benoit Steiner
|
037a463fd5
|
Marked unused variables as such
|
2016-05-25 22:07:48 -07:00 |
|
Benoit Steiner
|
3ac4045272
|
Made the IndexPair code compile in non cxx11 mode
|
2016-05-25 15:15:12 -07:00 |
|
Benoit Steiner
|
66556d0e05
|
Made the index pair list code more portable across various compilers
|
2016-05-25 14:34:27 -07:00 |
|
Benoit Steiner
|
034aa3b2c0
|
Improved the performance of tensor padding
|
2016-05-25 11:43:08 -07:00 |
|
Benoit Steiner
|
58026905ae
|
Added support for statically known lists of pairs of indices
|
2016-05-25 11:04:14 -07:00 |
|
Benoit Steiner
|
0835667329
|
There is no need to make the fp16 full reduction kernel a static function.
|
2016-05-24 23:11:56 -07:00 |
|
Benoit Steiner
|
b5d6b52a4d
|
Fixed compilation warning
|
2016-05-24 23:10:57 -07:00 |
|
Benoit Steiner
|
a09cbf9905
|
Merged in rmlarsen/eigen (pull request PR-188)
Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
|
2016-05-23 12:55:12 -07:00 |
|
Christoph Hertzberg
|
718521d5cf
|
Silenced several double-promotion warnings
|
2016-05-22 18:17:04 +02:00 |
|
Christoph Hertzberg
|
b5a7603822
|
fixed macro name
|
2016-05-22 16:49:29 +02:00 |
|
Christoph Hertzberg
|
25a03c02d6
|
Fix some sign-compare warnings
|
2016-05-22 16:42:27 +02:00 |
|
Gael Guennebaud
|
ccaace03c9
|
Make EIGEN_HAS_CONSTEXPR user configurable
|
2016-05-20 15:10:08 +02:00 |
|
Gael Guennebaud
|
c3410804cd
|
Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable
|
2016-05-20 15:05:38 +02:00 |
|
Gael Guennebaud
|
48bf5ec216
|
Make EIGEN_HAS_RVALUE_REFERENCES user configurable
|
2016-05-20 14:54:20 +02:00 |
|
Gael Guennebaud
|
f43ae88892
|
Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES
|
2016-05-20 14:48:51 +02:00 |
|
Gael Guennebaud
|
2f656ce447
|
Remove std:: to enable custom scalar types.
|
2016-05-19 23:13:47 +02:00 |
|
Rasmus Larsen
|
b1e080c752
|
Merged eigen/eigen into default
|
2016-05-18 15:21:50 -07:00 |
|
Rasmus Munk Larsen
|
5624219b6b
|
Merge.
|
2016-05-18 15:16:06 -07:00 |
|
Rasmus Munk Larsen
|
7df811cfe5
|
Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
|
2016-05-18 15:09:48 -07:00 |
|
Benoit Steiner
|
bb3ff8e9d9
|
Advertise the packet api of the tensor reducers iff the corresponding packet primitives are available.
|
2016-05-18 14:52:49 -07:00 |
|
Gael Guennebaud
|
548a487800
|
bug #1229: bypass usage of Derived::Options which is available for plain matrix types only. Better use column-major storage anyway.
|
2016-05-18 16:44:05 +02:00 |
|
Gael Guennebaud
|
43790e009b
|
Pass argument by const ref instead of by value in pow(AutoDiffScalar...)
|
2016-05-18 16:28:02 +02:00 |
|
Gael Guennebaud
|
1fbfab27a9
|
bug #1223: fix compilation of AutoDiffScalar's min/max operators, and add regression unit test.
|
2016-05-18 16:26:26 +02:00 |
|
Gael Guennebaud
|
448d9d943c
|
bug #1222: fix compilation in AutoDiffScalar and add respective unit test
|
2016-05-18 16:00:11 +02:00 |
|
Rasmus Munk Larsen
|
f519fca72b
|
Reduce overhead for small tensors and cheap ops by short-circuiting the cost computation and block size calculation in parallelFor.
|
2016-05-17 16:06:00 -07:00 |
|
Benoit Steiner
|
86ae94462e
|
#if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non-blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly.
|
2016-05-17 14:06:15 -07:00 |
|
Benoit Steiner
|
997c335970
|
Fixed compilation error
|
2016-05-17 12:54:18 -07:00 |
|
Benoit Steiner
|
ebf6ada5ee
|
Fixed compilation error in the tensor thread pool
|
2016-05-17 12:33:46 -07:00 |
|
Rasmus Munk Larsen
|
0bb61b04ca
|
Merge upstream.
|
2016-05-17 10:26:10 -07:00 |
|
Rasmus Munk Larsen
|
0dbd68145f
|
Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h.
|
2016-05-17 10:25:19 -07:00 |
|
Rasmus Larsen
|
00228f2506
|
Merged eigen/eigen into default
|
2016-05-17 09:49:31 -07:00 |
|
Benoit Steiner
|
e7e64c3277
|
Enable the use of the packet api to evaluate tensor broadcasts. This speeds things up quite a bit:
Before:
BM_broadcasting/10 500000 3690 27.10 MFlops/s
BM_broadcasting/80 500000 4014 1594.24 MFlops/s
BM_broadcasting/640 100000 14770 27731.35 MFlops/s
BM_broadcasting/4K 5000 632711 39512.48 MFlops/s
After:
BM_broadcasting/10 500000 4287 23.33 MFlops/s
BM_broadcasting/80 500000 4455 1436.41 MFlops/s
BM_broadcasting/640 200000 10195 40173.01 MFlops/s
BM_broadcasting/4K 5000 423746 58997.57 MFlops/s
|
2016-05-17 09:24:35 -07:00 |
|
Benoit Steiner
|
5fa27574dd
|
Allow vectorized padding on GPU. This helps speed things up a little.
Before:
BM_padding/10 5000000 460 217.03 MFlops/s
BM_padding/80 5000000 460 13899.40 MFlops/s
BM_padding/640 5000000 461 888421.17 MFlops/s
BM_padding/4K 5000000 460 54316322.55 MFlops/s
After:
BM_padding/10 5000000 454 220.20 MFlops/s
BM_padding/80 5000000 455 14039.86 MFlops/s
BM_padding/640 5000000 452 904968.83 MFlops/s
BM_padding/4K 5000000 411 60750049.21 MFlops/s
|
2016-05-17 09:17:26 -07:00 |
|
Benoit Steiner
|
a910bcee43
|
Merged latest updates from trunk
|
2016-05-17 09:14:22 -07:00 |
|
Benoit Steiner
|
8d06c02ffd
|
Allow vectorized padding on GPU. This helps speed things up a little.
Before:
BM_padding/10 5000000 460 217.03 MFlops/s
BM_padding/80 5000000 460 13899.40 MFlops/s
BM_padding/640 5000000 461 888421.17 MFlops/s
BM_padding/4K 5000000 460 54316322.55 MFlops/s
After:
BM_padding/10 5000000 454 220.20 MFlops/s
BM_padding/80 5000000 455 14039.86 MFlops/s
BM_padding/640 5000000 452 904968.83 MFlops/s
BM_padding/4K 5000000 411 60750049.21 MFlops/s
|
2016-05-17 09:13:27 -07:00 |
|