Gael Guennebaud
d142165942
bug #667 : declare several critical functions as FORECE_INLINE to make ICC happier.
...
<g.gael@free.fr> HG: branch 'default' HG: changed Eigen/src/Core/ArrayBase.h HG: changed Eigen/src/Core/AssignEvaluator.h HG: changed
Eigen/src/Core/CoreEvaluators.h HG: changed Eigen/src/Core/CwiseUnaryOp.h HG: changed Eigen/src/Core/DenseBase.h HG: changed Eigen/src/Core/MatrixBase.h
2016-01-31 16:34:10 +01:00
Gael Guennebaud
a4e4542b89
Avoid overflow in unit test.
2016-01-30 22:26:17 +01:00
Gael Guennebaud
3ba8a3ab1a
Disable underflow unit test on the i387 FPU.
2016-01-30 22:14:04 +01:00
Benoit Steiner
483082ef6e
Fixed a few memory leaks in the cuda tests
2016-01-30 11:59:22 -08:00
Benoit Steiner
bd21aba181
Sharded the cxx11_tensor_cuda test and fixed a memory leak
2016-01-30 11:47:09 -08:00
Benoit Steiner
9de155d153
Added a test to cover threaded tensor shuffling
2016-01-30 10:56:47 -08:00
Benoit Steiner
32088c06a1
Made the comparison between single and multithreaded contraction results more resistant to numerical noise to prevent spurious test failures.
2016-01-30 10:51:14 -08:00
Benoit Steiner
2053478c56
Made sure to use a tensor of rank 0 to store the result of a full reduction in the tensor thread pool test
2016-01-30 10:46:36 -08:00
Benoit Steiner
d0db95f730
Sharded the tensor thread pool test
2016-01-30 10:43:57 -08:00
Benoit Steiner
ba27c8a7de
Made the CUDA contract test more robust to numerical noise.
2016-01-30 10:28:43 -08:00
Benoit Steiner
4281eb1e2c
Added 2 benchmarks to the suite of tensor benchmarks running on GPU
2016-01-30 10:20:43 -08:00
Gael Guennebaud
102fa96a96
Extend doc on dense+sparse
2016-01-30 14:58:21 +01:00
Gael Guennebaud
1bc207c528
backout changeset d4a9e61569
...
: the extended SparseView is not needed anymore
2016-01-30 14:43:21 +01:00
Gael Guennebaud
8ed1553d20
bug #632 : implement general coefficient-wise "dense op sparse" operations through specialized evaluators instead of using SparseView.
...
This permits to deal with arbitrary storage order, and to by-pass the more complex iterator of the sparse-sparse case.
2016-01-30 14:39:50 +01:00
Gael Guennebaud
699634890a
bug #946 : generalize Cholmod::solve to handle any rhs expression
2016-01-29 23:02:22 +01:00
Gael Guennebaud
15084cf1ac
bug #632 : add support for "dense +/- sparse" operations. The current implementation is based on SparseView to make the dense subexpression compatible with the sparse one.
2016-01-29 22:09:45 +01:00
Gael Guennebaud
d4a9e61569
Extend SparseView to allow keeping explicit zeros. This is equivalent to sparseView(1,-1) but faster because the test is removed at compile-time.
2016-01-29 22:07:56 +01:00
Gael Guennebaud
d8d37349c3
bug #696 : enable zero-sized block at compile-time by relaxing the respective assertion
2016-01-29 12:44:49 +01:00
Gael Guennebaud
e8ccc06fe5
merge
2016-01-29 09:40:38 +01:00
Benoit Steiner
963f2d2a8f
Marked several methods EIGEN_DEVICE_FUNC
2016-01-28 23:37:48 -08:00
Benoit Steiner
c5d25bf1d0
Fixed a couple of compilation warnings.
2016-01-28 23:15:45 -08:00
Benoit Steiner
e4f83bae5d
Fixed the tensor benchmarks on apple devices
2016-01-28 21:08:07 -08:00
Benoit Steiner
10bea90c4a
Fixed clang related compilation error
2016-01-28 20:52:08 -08:00
Benoit Steiner
d3f533b395
Fixed compilation warning
2016-01-28 20:09:45 -08:00
Abhijit Kundu
3fde202215
Making ceil() functor generic w.r.t packet type
2016-01-28 21:27:00 -05:00
Benoit Steiner
211d350fc3
Fixed a typo
2016-01-28 17:13:04 -08:00
Benoit Steiner
bd2e5a788a
Made sure the number of floating point operations done by a benchmark is computed using 64 bit integers to avoid overflows.
2016-01-28 17:10:40 -08:00
Benoit Steiner
120e13b1b6
Added a readme to explain how to compile the tensor benchmarks.
2016-01-28 17:06:00 -08:00
Benoit Steiner
a68864b6bc
Updated the benchmarking code to print the number of flops processed instead of the number of bytes.
2016-01-28 16:51:40 -08:00
Benoit Steiner
8217281ae4
Merge latest updates from trunk
2016-01-28 16:20:53 -08:00
Benoit Steiner
c8d5f21941
Added extra tensor benchmarks
2016-01-28 16:20:36 -08:00
Benoit Steiner
7b3044d086
Made sure to call nvcc with the relaxed-constexpr flag.
2016-01-28 15:36:34 -08:00
Rasmus Munk Larsen
acce4dd050
Change Eigen's ColPivHouseholderQR to use the numerically stable norm downdate formula from http://www.netlib.org/lapack/lawnspdf/lawn176.pdf , which has been used in LAPACK's xGEQPF and xGEQP3 since 2006. With the old formula, the code chooses the wrong pivots and fails to correctly determine rank on graded matrices.
...
This change also adds additional checks for non-increasing diagonal in R11 to existing unit tests, and adds a new unit test with the Kahan matrix, which consistently fails for the original code.
Benchmark timings on Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz. Code compiled with AVX & FMA. I just ran on square matrices of 3 difference sizes.
Benchmark Time(ns) CPU(ns) Iterations
-------------------------------------------------------
Before:
BM_EigencolPivQR/64 53677 53627 12890
BM_EigencolPivQR/512 15265408 15250784 46
BM_EigencolPivQR/4k 15403556228 15388788368 2
After (non-vectorized version):
Benchmark Time(ns) CPU(ns) Iterations Degradation
--------------------------------------------------------------------
BM_EigencolPivQR/64 63736 63669 10844 18.5%
BM_EigencolPivQR/512 16052546 16037381 43 5.1%
BM_EigencolPivQR/4k 15149263620 15132025316 2 -2.0%
Performance-wise there seems to be a ~18.5% degradation for small (64x64) matrices, probably due to the cost of more O(min(m,n)^2) sqrt operations that are not needed for the unstable formula.
2016-01-28 15:07:26 -08:00
Gael Guennebaud
b908e071a8
bug #178 : get rid of some const_cast in SparseCore
2016-01-28 22:11:18 +01:00
Gael Guennebaud
c1d900af61
bug #178 : remove additional const on nested expression, and remove several const_cast.
2016-01-28 21:43:20 +01:00
Benoit Steiner
12f8bd12a2
Merged in jiayq/eigen (pull request PR-159)
...
Modifications to the tensor benchmarks to allow compilation in a standalone fashion.
2016-01-28 11:28:55 -08:00
Yangqing Jia
270c4e1ecd
bugfix
2016-01-28 11:11:45 -08:00
Yangqing Jia
c4e47630b1
benchmark modifications to make it compilable in a standalone fashion.
2016-01-28 10:35:14 -08:00
Gael Guennebaud
f50bb1e6f3
Fix compilation with gcc
2016-01-28 13:25:26 +01:00
Gael Guennebaud
ddf64babde
merge
2016-01-28 13:21:48 +01:00
Gael Guennebaud
df15fbc452
bug #1158 : PartialReduxExpr is a vector expression, and it thus must expose the LinearAccessBit flag
2016-01-28 13:16:30 +01:00
Gael Guennebaud
9bcadb7fd1
Disable stupid MSVC warning
2016-01-28 12:14:16 +01:00
Gael Guennebaud
b4d87fff4a
Fix MSVC warning.
2016-01-28 12:12:30 +01:00
Gael Guennebaud
2bad3e78d9
bug #96 , bug #1006 : fix by value argument in result_of.
2016-01-28 12:12:06 +01:00
Gael Guennebaud
7802a6bb1c
Fix unit test filename.
2016-01-28 09:35:37 +01:00
Benoit Steiner
4bf9eaf77a
Deleted an invalid assertion that prevented the assignment of empty tensors.
2016-01-27 17:09:30 -08:00
Benoit Steiner
291069e885
Fixed some compilation problems with nvcc + clang
2016-01-27 15:37:03 -08:00
Benoit Steiner
47ca9dc809
Fixed the tensor_cuda test
2016-01-27 14:58:48 -08:00
Benoit Steiner
55a5204319
Fixed the flags passed to nvcc to compile the tensor code.
2016-01-27 14:46:34 -08:00
Gael Guennebaud
4865e1e732
Update link to suitesparse.
2016-01-27 22:48:40 +01:00