Commit Graph

2524 Commits

Author SHA1 Message Date
Joel Holdsworth
9e68977578 test/packetmath: Made negate non-mandatory 2020-03-10 22:46:19 +00:00
Rasmus Munk Larsen
6ac37768a9 Revert "add some static checks for packet-picking logic"
This reverts commit 7769600245
2020-02-25 01:07:04 +00:00
Rasmus Munk Larsen
87cfa4862f Revert "Disable test in test/vectorization_logic.cpp, which is currently failing with AVX."
This reverts commit b625adffd8
2020-02-25 01:04:56 +00:00
Rasmus Munk Larsen
b625adffd8 Disable test in test/vectorization_logic.cpp, which is currently failing with AVX. 2020-02-24 23:28:25 +00:00
Francesco Mazzoli
7769600245 add some static checks for packet-picking logic 2020-02-07 18:16:16 +01:00
Christoph Hertzberg
1d0c45122a Removing executable bit from file mode 2020-01-11 15:02:29 +01:00
Christoph Hertzberg
35219cea68 Bug #1790: Make areApprox check numext::isnan instead of bitwise equality (NaNs don't have to be bitwise equal). 2020-01-11 14:57:22 +01:00
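The distinction behind Bug #1790 can be illustrated with a small standalone sketch. It does not use Eigen: `bitwise_equal`, `equal_or_both_nan`, `positive_nan` and `negative_nan` are hypothetical helper names, and `std::isnan` stands in for `numext::isnan`. Two NaNs that differ only in their sign bit have different bit patterns, yet both are NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstring>
#include <limits>

// Hypothetical helper: true only if both floats have identical bit patterns.
bool bitwise_equal(float a, float b) {
  return std::memcmp(&a, &b, sizeof(float)) == 0;
}

// Hypothetical helper: any two NaNs match; otherwise compare by value.
bool equal_or_both_nan(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) return std::isnan(a) && std::isnan(b);
  return a == b;
}

// Two NaNs whose encodings differ in the sign bit.
float positive_nan() {
  return std::copysign(std::numeric_limits<float>::quiet_NaN(), 1.0f);
}
float negative_nan() {
  return std::copysign(std::numeric_limits<float>::quiet_NaN(), -1.0f);
}
```

A bitwise comparison rejects the pair `positive_nan()`, `negative_nan()`, while the `isnan`-based comparison accepts it, which is the behavior the commit message asks of areApprox.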
Srinivas Vasudevan
2e099e8d8f Added special_packetmath test and tweaked bounds on tests.
Refactor shared packetmath code to header file.
(Squashed from PR !38)
2020-01-11 10:31:21 +00:00
Christoph Hertzberg
8333e03590 Use data.data() instead of &data (since it is not obvious that Array is trivially copyable) 2020-01-09 11:38:19 +01:00
Ilya Tokar
19876ced76 Bug #1785: Introduce numext::rint.
This provides a new op that matches std::rint and previous behavior of
pround. Also adds corresponding unsupported/../Tensor op.
Performance is the same as e. g. floor (tested SSE/AVX).
2020-01-07 21:22:44 +00:00
Everton Constantino
eedb7eeacf Protecting integer_types's long long test with a check to see if we have CXX11 support. 2020-01-07 14:35:35 +00:00
Christoph Hertzberg
870e53c0f2 Bug #1788: Fix rule-of-three violations inside the stable modules.
This fixes deprecated-copy warnings when compiling with GCC>=9
Also protect some additional Base-constructors from getting called by user code (#1587)
2019-12-19 17:30:11 +01:00
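As a reminder of what the rule of three asks for, here is a minimal sketch with a hypothetical `Buffer` class (not an Eigen type). A class that provides one of destructor, copy constructor, or copy assignment operator usually needs all three; GCC 9 and later can warn with deprecated-copy when a class declares one copy operation and relies on the implicitly generated other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer used only for illustration.
class Buffer {
 public:
  explicit Buffer(std::size_t n) : size_(n), data_(new double[n]()) {}

  // Copy constructor: deep-copies the elements.
  Buffer(const Buffer& other)
      : size_(other.size_), data_(new double[other.size_]) {
    std::copy(other.data_, other.data_ + size_, data_);
  }

  // Copy assignment: copy first, then swap, so self-assignment is safe.
  Buffer& operator=(const Buffer& other) {
    if (this != &other) {
      Buffer tmp(other);
      std::swap(size_, tmp.size_);
      std::swap(data_, tmp.data_);
    }
    return *this;
  }

  // Destructor: releases the owned storage.
  ~Buffer() { delete[] data_; }

  double& operator[](std::size_t i) { return data_[i]; }
  const double& operator[](std::size_t i) const { return data_[i]; }
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
  double* data_;
};
```

Copies made through either operation own independent storage, so writing to one does not affect the other.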
Christoph Hertzberg
6965f6de7f Fix unit-test which I broke in previous fix 2019-12-19 13:42:14 +01:00
Christoph Hertzberg
72166d0e6e Fix some maybe-uninitialized warnings 2019-12-18 18:26:20 +01:00
Christoph Hertzberg
5a3eaf88ac Workaround class-memaccess warnings on newer GCC versions 2019-12-18 16:37:26 +01:00
Rasmus Munk Larsen
a566074480 Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in 66f07efeae), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
  - Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
  - Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.

The benchmarks below repeatedly call

   u = v.logistic()  (u = v.tanh(), respectively)

where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        4467           4468         155835  model_time: 4827
AVX
BM_eigen_logistic_float        2347           2347         299135  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1467           1467         476143  model_time: 2926
AVX512
BM_eigen_logistic_float         805            805         858696  model_time: 1463

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        2589           2590         270264  model_time: 4827
AVX
BM_eigen_logistic_float        1428           1428         489265  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1059           1059         662255  model_time: 2926
AVX512
BM_eigen_logistic_float         673            673        1000000  model_time: 1463

Benchmark numbers for tanh:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2391           2391         292624  model_time: 4242
AVX
BM_eigen_tanh_float        1256           1256         554662  model_time: 2633
AVX+FMA
BM_eigen_tanh_float         823            823         866267  model_time: 1609
AVX512
BM_eigen_tanh_float         443            443        1578999  model_time: 805

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2588           2588         273531  model_time: 4242
AVX
BM_eigen_tanh_float        1536           1536         452321  model_time: 2633
AVX+FMA
BM_eigen_tanh_float        1007           1007         694681  model_time: 1609
AVX512
BM_eigen_tanh_float         471            471        1472178  model_time: 805
2019-12-16 21:33:42 +00:00
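The tail approximation mentioned in commit a566074480 follows from the identity 1/(1+exp(-x)) = exp(x)/(1+exp(x)), so for large negative x the logistic function is close to exp(x). The scalar sketch below only illustrates that identity in double precision; the function names are hypothetical and it is not Eigen's vectorized implementation.

```cpp
#include <cassert>
#include <cmath>

// Reference logistic function, evaluated in double precision.
double logistic_reference(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Tail approximation for large negative x: logistic(x) ~= exp(x).
double logistic_tail(double x) { return std::exp(x); }

// Relative difference between the tail approximation and the reference.
double tail_relative_error(double x) {
  const double ref = logistic_reference(x);
  return std::fabs(logistic_tail(x) - ref) / ref;
}
```

Mathematically the relative difference is itself exp(x), so it shrinks rapidly as x moves below the -9 cutoff named in the commit message.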
Ilya Tokar
06e99aaf40 Bug 1785: fix pround on x86 to use the same rounding mode as std::round.
This also adds pset1frombits helper to Packet[24]d.
Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after,
still an order of magnitude faster than the scalar version: 33.8µs ± 2%.
2019-12-12 17:38:53 -05:00
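Commits 19876ced76 and 06e99aaf40 separate two rounding behaviors: `std::round` and `std::rint`. The standalone sketch below shows how the two standard functions differ on halfway cases. The wrapper names are hypothetical, and the `std::rint` results assume the default rounding mode (round to nearest, ties to even) is in effect.

```cpp
#include <cassert>
#include <cmath>

// Rounds halfway cases away from zero, as std::round does.
double round_half_away(double x) { return std::round(x); }

// Rounds using the current floating-point rounding mode, as std::rint does.
// Assumption: the mode is the default FE_TONEAREST (ties go to even).
double round_current_mode(double x) { return std::rint(x); }
```

Under that assumption, 2.5 rounds to 3 with `std::round` and to 2 with `std::rint`; the two agree for inputs that are not exactly halfway between integers.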
Srinivas Vasudevan
88062b7fed Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one. 2019-12-12 01:56:54 +00:00
Joel Holdsworth
1b6e0395e6 Added io test 2019-12-11 18:22:57 +00:00
Gael Guennebaud
6358599ecb Fix QuaternionBase::cast for quaternion map and wrapper. 2019-12-03 14:51:14 +01:00
Gael Guennebaud
7745f69013 bug #1776: fix vector-wise STL iterator's operator-> using a proxy as pointer type.
This changeset fixes also the value_type definition.
2019-12-03 14:40:15 +01:00
Joel Holdsworth
743c925286 test/packetmath: Silence alignment warnings 2019-11-05 19:06:12 +00:00
Hans Johnson
8c8cab1afd STYLE: Convert CMake-language commands to lower case
Ancient CMake versions required upper-case commands.  Later command names
became case-insensitive.  Now the preferred style is lower-case.
2019-10-31 11:36:37 -05:00
Hans Johnson
6fb3e5f176 STYLE: Remove CMake-language block-end command arguments
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
2019-10-31 11:36:27 -05:00
Rasmus Munk Larsen
f1e8307308 1. Fix a bug in psqrt and make it return 0 for +inf arguments.
2. Simplify handling of special cases by taking advantage of the fact that the
   builtin vrsqrt approximation handles negative, zero and +inf arguments correctly.
   This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:

Before: y = y * (1.5 - x/2 * y^2)
After: y = y * (1.5 - y * (x/2) * y)

Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.

4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.

Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o
2019-11-15 17:09:46 -08:00
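The reordering of the Newton-Raphson step in commit f1e8307308 can be tried in isolation with the scalar sketch below. The function names are hypothetical, plain `float` arithmetic stands in for the packet ops, and the sketch assumes denormals are not flushed to zero and that intermediate results are rounded to single precision.

```cpp
#include <cassert>
#include <cmath>

// "Before": one refinement step for y ~= 1/sqrt(x) that forms y^2 explicitly.
float rsqrt_step_before(float x, float y) {
  const float y2 = y * y;  // can overflow to +inf when y is very large
  return y * (1.5f - (0.5f * x) * y2);
}

// "After": the same step with y * (x/2) formed first.
float rsqrt_step_after(float x, float y) {
  const float t = y * (0.5f * x);  // about 1/(2y), since x * y * y ~= 1
  return y * (1.5f - t * y);
}
```

For a denormal input such as x = 1e-42f the estimate y is around 1e21, so y*y exceeds the single-precision range and the "Before" form produces a non-finite result, while the "After" form stays finite and moves closer to 1/sqrt(x).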
Gael Guennebaud
8af045a287 bug #1774: fix VectorwiseOp::begin()/end() return types regarding constness. 2019-11-14 11:45:52 +01:00
Gael Guennebaud
8496f86f84 Enable CompleteOrthogonalDecomposition::pseudoInverse with non-square fixed-size matrices. 2019-11-13 21:16:53 +01:00
Gael Guennebaud
e7d8ba747c bug #1752: make is_convertible equivalent to the std c++11 equivalent and fall back to std::is_convertible when c++11 is enabled. 2019-10-10 17:41:47 +02:00
Gael Guennebaud
fb557aec5c bug #1752: disable some is_convertible tests for recent compilers. 2019-10-10 11:40:21 +02:00
Gael Guennebaud
36da231a41 Disable an expected warning in unit test 2019-10-08 16:28:14 +02:00
Gael Guennebaud
87427d2eaa PR 719: fix real/imag namespace conflict 2019-10-08 09:15:17 +02:00
Rasmus Larsen
d38e6fbc27 Merged in rmlarsen/eigen (pull request PR-704)
Add generic PacketMath implementation of the Error Function (erf).
2019-09-24 23:40:29 +00:00
Christoph Hertzberg
efd9867ff0 bug #1746: Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types 2019-09-24 11:09:58 +02:00
Rasmus Munk Larsen
6de5ed08d8 Add generic PacketMath implementation of the Error Function (erf). 2019-09-19 12:48:30 -07:00
Srinivas Vasudevan
df0816b71f Merging eigen/eigen. 2019-09-16 19:33:29 -04:00
Srinivas Vasudevan
6e215cf109 Add Bessel functions to SpecialFunctions.
- Split SpecialFunctions files into a separate BesselFunctions file.

In particular add:
    - Modified bessel functions of the second kind k0, k1, k0e, k1e
    - Bessel functions of the first kind j0, j1
    - Bessel functions of the second kind y0, y1
2019-09-14 12:16:47 -04:00
Srinivas Vasudevan
facdec5aa7 Add packetized versions of i0e and i1e special functions.
- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
  - Move chebevl to GenericPacketMathFunctions.


A brief benchmark, building Eigen with the FMA, AVX and AVX2 flags:

Before:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            57.3           57.3     10000000
BM_eigen_i0e_double/8           398            398        1748554
BM_eigen_i0e_double/64         3184           3184         218961
BM_eigen_i0e_double/512       25579          25579          27330
BM_eigen_i0e_double/4k       205043         205042           3418
BM_eigen_i0e_double/32k     1646038        1646176            422
BM_eigen_i0e_double/256k   13180959       13182613             53
BM_eigen_i0e_double/1M     52684617       52706132             10
BM_eigen_i0e_float/1             28.4           28.4     24636711
BM_eigen_i0e_float/8             75.7           75.7      9207634
BM_eigen_i0e_float/64           512            512        1000000
BM_eigen_i0e_float/512         4194           4194         166359
BM_eigen_i0e_float/4k         32756          32761          21373
BM_eigen_i0e_float/32k       261133         261153           2678
BM_eigen_i0e_float/256k     2087938        2088231            333
BM_eigen_i0e_float/1M       8380409        8381234             84
BM_eigen_i1e_double/1            56.3           56.3     10000000
BM_eigen_i1e_double/8           397            397        1772376
BM_eigen_i1e_double/64         3114           3115         223881
BM_eigen_i1e_double/512       25358          25361          27761
BM_eigen_i1e_double/4k       203543         203593           3462
BM_eigen_i1e_double/32k     1613649        1613803            428
BM_eigen_i1e_double/256k   12910625       12910374             54
BM_eigen_i1e_double/1M     51723824       51723991             10
BM_eigen_i1e_float/1             28.3           28.3     24683049
BM_eigen_i1e_float/8             74.8           74.9      9366216
BM_eigen_i1e_float/64           505            505        1000000
BM_eigen_i1e_float/512         4068           4068         171690
BM_eigen_i1e_float/4k         31803          31806          21948
BM_eigen_i1e_float/32k       253637         253692           2763
BM_eigen_i1e_float/256k     2019711        2019918            346
BM_eigen_i1e_float/1M       8238681        8238713             86


After:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            15.8           15.8     44097476
BM_eigen_i0e_double/8            99.3           99.3      7014884
BM_eigen_i0e_double/64          777            777         886612
BM_eigen_i0e_double/512        6180           6181         100000
BM_eigen_i0e_double/4k        48136          48140          14678
BM_eigen_i0e_double/32k      385936         385943           1801
BM_eigen_i0e_double/256k    3293324        3293551            228
BM_eigen_i0e_double/1M     12423600       12424458             57
BM_eigen_i0e_float/1             16.3           16.3     43038042
BM_eigen_i0e_float/8             30.1           30.1     23456931
BM_eigen_i0e_float/64           169            169        4132875
BM_eigen_i0e_float/512         1338           1339         516860
BM_eigen_i0e_float/4k         10191          10191          68513
BM_eigen_i0e_float/32k        81338          81337           8531
BM_eigen_i0e_float/256k      651807         651984           1000
BM_eigen_i0e_float/1M       2633821        2634187            268
BM_eigen_i1e_double/1            16.2           16.2     42352499
BM_eigen_i1e_double/8           110            110        6316524
BM_eigen_i1e_double/64          822            822         851065
BM_eigen_i1e_double/512        6480           6481         100000
BM_eigen_i1e_double/4k        51843          51843          10000
BM_eigen_i1e_double/32k      414854         414852           1680
BM_eigen_i1e_double/256k    3320001        3320568            212
BM_eigen_i1e_double/1M     13442795       13442391             53
BM_eigen_i1e_float/1             17.6           17.6     41025735
BM_eigen_i1e_float/8             35.5           35.5     19597891
BM_eigen_i1e_float/64           240            240        2924237
BM_eigen_i1e_float/512         1424           1424         485953
BM_eigen_i1e_float/4k         10722          10723          65162
BM_eigen_i1e_float/32k        86286          86297           8048
BM_eigen_i1e_float/256k      691821         691868           1000
BM_eigen_i1e_float/1M       2777336        2777747            256


This shows anywhere from a 50% to 75% improvement on these operations.

I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).

Also tested packetmath.cpp + special_functions to ensure no regressions.
2019-09-11 18:34:02 -07:00
Gael Guennebaud
747c6a51ca bug #1736: fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i) 2019-09-11 15:40:07 +02:00
Gael Guennebaud
031f17117d bug #1741: fix self-adjoint*matrix, triangular*matrix, and triangular^-1*matrix with a destination having a non-trivial inner-stride 2019-09-11 15:04:25 +02:00
Gael Guennebaud
c06e6fd115 bug #1741: fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride 2019-09-10 23:29:52 +02:00
Gael Guennebaud
ea0d5dc956 bug #1741: fix C.noalias() = A*C; with C.innerStride()!=1 2019-09-10 16:25:24 +02:00
Srinivas Vasudevan
e38dd48a27 PR 681: Add ndtri function, the inverse of the normal distribution function. 2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen
1187bb65ad Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf. 2019-08-28 12:20:21 -07:00
Rasmus Munk Larsen
9aba527405 Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly. 2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen
b021cdea6d Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories. 2019-08-27 11:30:31 -07:00
Rasmus Munk Larsen
a3298b22ec Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)

The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.

Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
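Kahan's formulas, referenced in commit a3298b22ec, correct the rounding error introduced when forming 1 + x or exp(x). The scalar sketch below shows the technique in double precision. The function names are hypothetical, and the explicit branches for +infinity are one possible way to handle infinite arguments, not necessarily how Eigen does it.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// log(1 + x) with Kahan's correction: x * log(u) / (u - 1), where u = 1 + x.
double log1p_kahan(double x) {
  const double inf = std::numeric_limits<double>::infinity();
  const double u = 1.0 + x;
  if (u == 1.0) return x;  // x is too small to change 1.0
  if (u == inf) return u;  // avoid inf / inf for x = +inf
  return x * (std::log(u) / (u - 1.0));
}

// exp(x) - 1 with Kahan's correction: (u - 1) * x / log(u), where u = exp(x).
double expm1_kahan(double x) {
  const double inf = std::numeric_limits<double>::infinity();
  const double u = std::exp(x);
  if (u == 1.0) return x;  // exp(x) rounded to exactly 1
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;  // exp(x) is negligible next to 1
  if (u == inf) return u;        // avoid inf * (x / inf) for large x
  return um1 * (x / std::log(u));
}
```

For small arguments such as 1e-10 both functions should agree closely with `std::log1p` and `std::expm1`, whereas evaluating `std::log(1.0 + x)` directly loses most of the significant digits.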
Rasmus Munk Larsen
85928e5f47 Guard against repeated definition of EIGEN_MPL2_ONLY 2019-08-07 14:19:00 -07:00
Mehdi Goli
16a56b2ddd [SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
* Adding SYCL memory model
* Enabling/Disabling SYCL backend in Core
* Supporting Vectorization
2019-06-27 12:25:09 +01:00
Rasmus Munk Larsen
988f24b730 Various fixes for packet ops.
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
2019-06-20 11:47:49 -07:00
Rasmus Larsen
c1b0aea653 Merged in Artem-B/eigen (pull request PR-654)
Minor build improvements

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-05-31 22:27:04 +00:00
Rasmus Munk Larsen
b08527b0c1 Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures. 2019-05-31 15:26:06 -07:00
tra
b4c49bf00e Minor build improvements
* Allow specifying multiple GPU architectures. E.g.:
  cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
* Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda
which may not be the right location, if cmake was invoked with
-DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
2019-05-31 14:08:34 -07:00
Christoph Hertzberg
4ccd1ece92 bug #1707: Fix deprecation warnings, or disable warnings when testing deprecated functions 2019-05-10 14:57:05 +02:00
Eugene Zhulenev
e9f0eb8a5e Add masked_store_available to unpacket_traits 2019-05-02 14:52:58 -07:00
Eugene Zhulenev
b4010f02f9 Add masked pstoreu to AVX and AVX512 PacketMath 2019-05-02 13:14:18 -07:00
Anuj Rawat
8c7a6feb8e Adding lowlevel APIs for optimized RHS packet load in TensorFlow
SpatialConvolution

Low-level APIs are added in order to optimize packet load in gemm_pack_rhs
in TensorFlow SpatialConvolution. The optimization is for the scenario where a
packet is split across 2 adjacent columns. In this case we read it as two
'partial' packets and then merge these into 1. Currently this only works for
Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other
packet types (such as Packet8d) also.

This optimization shows significant speedup in SpatialConvolution with
certain parameters. Some examples are below.

Benchmark parameters are specified as:
Batch size, Input dim, Depth, Num of filters, Filter dim

Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.

AVX512:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128,   24x24,  3, 64,   5x5 | 2.18X, 2.13X, 1.73X, 1.64X, 1.66X
128,   24x24,  1, 64,   8x8 | 2.00X, 1.98X, 1.93X, 1.91X, 1.91X
 32,   24x24,  3, 64,   5x5 | 2.26X, 2.14X, 2.17X, 2.22X, 2.33X
128,   24x24,  3, 64,   3x3 | 1.51X, 1.45X, 1.45X, 1.67X, 1.57X
 32,   14x14, 24, 64,   5x5 | 1.21X, 1.19X, 1.16X, 1.70X, 1.17X
128, 128x128,  3, 96, 11x11 | 2.17X, 2.18X, 2.19X, 2.20X, 2.18X

AVX2:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128,   24x24,  3, 64,   5x5 | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
 32,   24x24,  3, 64,   5x5 | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
128,   24x24,  1, 64,   5x5 | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
128,   24x24,  3, 64,   3x3 | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
128, 128x128,  3, 96, 11x11 | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X

In the higher level benchmark cifar10, we observe a runtime improvement
of around 6% for AVX512 on Intel Skylake server (8 cores).

On lower level PackRhs micro-benchmarks specified in TensorFlow
tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe
the following runtime numbers:

AVX512:

Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  |  41350                     | 15073                   | 2.74X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  |   7277                     |  7341                   | 0.99X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  |   8675                     |  8681                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  |  24155                     | 16079                   | 1.50X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  |  25052                     | 17152                   | 1.46X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) |  18269                     | 18345                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) |  19468                     | 19872                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 156060                     | 42432                   | 3.68X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 132701                     | 36944                   | 3.59X

AVX2:

Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  | 26233                      | 12393                   | 2.12X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  |  6091                      |  6062                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  |  7427                      |  7408                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  | 23453                      | 20826                   | 1.13X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  | 23167                      | 22091                   | 1.09X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422                      | 23682                   | 0.99X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165                      | 23663                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 72689                      | 44969                   | 1.62X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 61732                      | 39779                   | 1.55X

All benchmarks on Intel Skylake server with 8 cores.
2019-04-20 06:46:43 +00:00
Gael Guennebaud
48898a988a fix unit test in c++03: c++03 does not allow passing local or anonymous enum as template param 2019-03-18 11:38:36 +01:00
Gael Guennebaud
cf7e2e277f bug #1692: enable enum as sizes of Matrix and Array 2019-03-17 21:59:30 +01:00
David Tellenbach
b013176e52 Remove undefined std::complex<int> 2019-03-14 11:40:28 +01:00
David Tellenbach
97f9a46cb9 PR 593: Add variadic ctor for DiagonalMatrix with unit tests 2019-03-14 10:18:24 +01:00
Gael Guennebaud
45ab514fe2 revert debug stuff 2019-03-14 10:08:12 +01:00
Gael Guennebaud
d7d2f0680e bug #1684: partially work around clang 6/7 bug #40815 2019-03-13 10:40:01 +01:00
Gael Guennebaud
b0d406d91c Enable construction of Ref<VectorType> from a runtime vector. 2019-03-03 15:25:25 +01:00
Gael Guennebaud
32502f3c45 bug #1684: add simplified regression test for the respective clang bug (this also reveals the same bug in Apple's clang) 2019-02-22 10:29:06 +01:00
Gael Guennebaud
2a39659d79 Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases. 2019-02-20 15:23:23 +01:00
Gael Guennebaud
44b54fa4a3 Protect c++11 type alias with Eigen's macro, and add respective unit test. 2019-02-20 14:43:05 +01:00
Gael Guennebaud
4e8047cdcf Fix compilation with gcc and remove TR1 stuff. 2019-02-20 13:59:34 +01:00
Gael Guennebaud
edd413c184 bug #1409: make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:
- this helps clang 5 and 6 to support alignas in STL's containers.
- this makes the public API of our (and users') classes cleaner
2019-02-20 13:52:11 +01:00
Gael Guennebaud
3b5deeb546 bug #899: make sparseqr unit test more stable by 1) trying with larger threshold and 2) relax rank computation for rank-deficient problems. 2019-02-19 22:57:51 +01:00
Gael Guennebaud
292d61970a Fix C++17 compilation 2019-02-19 21:59:41 +01:00
Gael Guennebaud
2cfc025bda fix unit compilation in c++17: std::ptr_fun has been removed. 2019-02-19 14:05:22 +01:00
Gael Guennebaud
7d10c78738 bug #1046: add unit tests for correct propagation of alignment through std::alignment_of 2019-02-19 10:31:56 +01:00
Gael Guennebaud
e23bf40dc2 Add unit test for LinSpaced and complex numbers. 2019-02-18 22:03:47 +01:00
Gael Guennebaud
31b6e080a9 Fix regression: .conjugate() was popped out but not re-introduced. 2019-02-18 14:45:55 +01:00
Gael Guennebaud
c69d0d08d0 Set cost of conjugate to 0 (in practice it boils down to a no-op).
This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate
its arguments into temporaries (e.g., if A and B are fixed and small, or * fall back to lazyProduct)
2019-02-18 14:43:07 +01:00
Gael Guennebaud
512b74aaa1 GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product.
Before only s*A*B was caught which was both inconsistent with GEMM, sub-optimal,
and could even lead to compilation-errors (https://stackoverflow.com/questions/54738495).
2019-02-18 11:47:54 +01:00
Gael Guennebaud
dada863d23 Enable unit tests of PartialPivLU on fixed size matrices, and increase tested matrix size (blocking was not tested!) 2019-02-11 17:56:20 +01:00
Gael Guennebaud
8a06c699d0 bug #1669: fix PartialPivLU/inverse with zero-sized matrices. 2019-01-29 10:27:13 +01:00
Gael Guennebaud
f489f44519 bug #1574: implement "sparse_matrix =,+=,-= diagonal_matrix" with smart insertion strategies of missing diagonal coeffs. 2019-01-28 17:29:50 +01:00
Gael Guennebaud
53560f9186 bug #1672: fix unit test compilation with MSVC by adding overloads of test_is* for long long (and factorize copy/paste code through a macro) 2019-01-28 13:47:28 +01:00
Christoph Hertzberg
934b8a1304 Avoid I as an identifier, since it may clash with the C-header complex.h 2019-01-25 14:54:39 +01:00
Gael Guennebaud
6908ce2a15 More thoroughly check variadic template ctor of fixed-size vectors 2019-01-24 10:24:28 +01:00
David Tellenbach
db152b9ee6 PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc)
- {1,2,3,4,5,...} for fixed-size vectors only
- {{1,2,3},{4,5,6}} for the general cases
- {{1,2,3,4,5,....}} is allowed for both row and column-vector
2019-01-21 16:25:57 +01:00
Gael Guennebaud
543529da6a Add more extensive tests of Array ctors, including {} variants 2019-01-22 15:30:50 +01:00
Gael Guennebaud
d18f49cbb3 Fix compilation of unit tests with gcc and c++17 2019-01-18 11:12:42 +01:00
Christoph Hertzberg
d575505d25 After fixing bug #1557, boostmultiprec_7 failed with NumericalIssue instead of NoConvergence (all that matters here is no Success) 2019-01-17 19:14:07 +01:00
Gael Guennebaud
0fe6b7d687 Make nestByValue work again (broken since 3.3) and add unit tests. 2019-01-17 18:27:25 +01:00
Gael Guennebaud
4b7cf7ff82 Extend reshaped unit tests and remove useless const_cast 2019-01-17 17:35:32 +01:00
Gael Guennebaud
b57c9787b1 Cleanup useless const_cast and add missing broadcast assignment tests 2019-01-17 16:55:42 +01:00
Patrick Peltzer
bba2f05064 Boosttest only available for Boost version >= 1.53.0 2019-01-17 11:54:37 +01:00
Patrick Peltzer
15e53d5d93 PR 567: makes all dense solvers inherit SolverBase (LU,Cholesky,QR,SVD).
This changeset also includes:
 * add HouseholderSequence::conjugateIf
 * define int as the StorageIndex type for all dense solvers
 * dedicated unit tests, including assertion checking
 * _check_solve_assertion(): this method can be implemented in derived solver classes to implement custom checks
 * CompleteOrthogonalDecomposition: add applyZOnTheLeftInPlace, fix scalar type in applyZAdjointOnTheLeftInPlace(), add missing assertions
 * Cholesky: add missing assertions
 * FullPivHouseholderQR: Corrected Scalar type in _solve_impl()
 * BDCSVD: Unambiguous return type for ternary operator
 * SVDBase: Corrected Scalar type in _solve_impl()
2019-01-17 01:17:39 +01:00
Gael Guennebaud
7f32109c11 Add conjugateIf<bool> members to DenseBase, TriangularView, SelfadjointView, and make PartialPivLU use it. 2019-01-17 11:33:43 +01:00
Gael Guennebaud
c8e40edac9 Remove Eigen2ToEigen3 migration page (obsolete since 3.3) 2019-01-16 16:27:00 +01:00
Gael Guennebaud
aeffdf909e bug #1617: add unit tests for empty triangular solve. 2019-01-16 15:24:59 +01:00
Gael Guennebaud
502f717980 bug #1646: disable aliasing detection for empty and 1x1 expression 2019-01-16 14:33:45 +01:00
Gael Guennebaud
2b70b2f570 Make Transform::rotation() an alias to Transform::linear() in the case of an Isometry 2019-01-15 22:50:42 +01:00
Gael Guennebaud
6ec6bf0b0d Enable visitor on empty matrices (the visitor is left unchanged), and protect min/maxCoeff(Index*,Index*) on empty matrices by an assertion (+ doc & unit tests) 2019-01-15 15:21:14 +01:00
Gael Guennebaud
027e44ed24 bug #1592: makes partial min/max reductions trigger an assertion on inputs with a zero reduction length (+doc and tests) 2019-01-15 15:13:24 +01:00
Gael Guennebaud
f8bc5cb39e Fix detection of vector-at-time: use Rows/Cols instead of MaxRow/MaxCols.
This fixes VectorXd(n).middleCol(0,0).outerSize() which was equal to 1.
2019-01-15 15:09:49 +01:00
Gael Guennebaud
32d7232aec fix always true warning with gcc 4.7 2019-01-15 11:18:48 +01:00
Gael Guennebaud
e7d4d4f192 cleanup 2019-01-15 10:51:03 +01:00
Rasmus Larsen
5a59452aae Merged eigen/eigen into default 2019-01-14 10:23:23 -08:00
Gael Guennebaud
61b6eb05fe AVX512 (r)sqrt(double) was mistakenly disabled with clang and others 2019-01-14 17:28:47 +01:00
Greg Coombe
9d988a1e1a Initialize isometric transforms like affine transforms.
The isometric transform, like the affine transform, has an implicit last
row of [0, 0, 0, 1]. This was not being properly initialized, as verified
by a new test function.
2019-01-11 23:14:35 -08:00
Gael Guennebaud
f566724023 Fix StorageIndex FIXME in dense LU solvers 2019-01-13 17:54:30 +01:00
Rasmus Munk Larsen
28ba1b2c32 Add support for inverse hyperbolic functions.
Fix cost of division.
2019-01-11 17:45:37 -08:00
Rasmus Munk Larsen
fcfced13ed Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate. 2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
8f04442526 Collapsed revision
* Collapsed revision
* Add packet op "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
cb955df9a6 Add packet op "pones". Write pnot(a) as pxor(pones(a), a). 2019-01-09 16:17:08 -08:00
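The identity behind this commit, pnot(a) == pxor(pones(a), a), can be sketched on a single 32-bit lane. The functions below are hypothetical scalar stand-ins that only borrow the names from the commit message; they are not Eigen's packet implementations, which operate on SIMD registers.

```cpp
#include <cstdint>

// Hypothetical one-lane stand-in for a SIMD packet (an assumption for
// illustration; a real packet holds several lanes).
using Lane = std::uint32_t;

// Returns a lane with every bit set. The argument only selects the type,
// mirroring the pones(a) call shape in the commit message.
inline Lane pones(Lane /*a*/) { return ~Lane{0}; }

// Bitwise XOR of two lanes.
inline Lane pxor(Lane a, Lane b) { return a ^ b; }

// Bitwise NOT written as XOR with all ones: x ^ 1 flips x, bit by bit.
inline Lane pnot(Lane a) { return pxor(pones(a), a); }
```

Writing NOT this way is convenient on instruction sets that provide a vector XOR but no dedicated vector NOT.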
Rasmus Larsen
cb3c059fa4 Merged eigen/eigen into default 2019-01-09 15:04:17 -08:00
Gael Guennebaud
e6b217b8dd bug #1652: implements a much more accurate version of vectorized sin/cos. This new version achieves the same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
  - no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
  - FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
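The guarantees above are stated in ULP (units in the last place). A minimal sketch of how such an error could be measured for float results follows; the helper names (ordered_bits, ulp_distance, sin_error_in_ulp) are hypothetical and this is not the check used in Eigen's test suite.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Maps a float's bit pattern to an integer that increases monotonically
// with the float's value (sign-magnitude to signed integer). NaNs are not handled.
inline std::int64_t ordered_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  const std::int64_t magnitude = static_cast<std::int64_t>(u & 0x7FFFFFFFu);
  return (u & 0x80000000u) ? -magnitude : magnitude;
}

// Number of representable float steps between a and b (0 if they are equal,
// +0.0f and -0.0f count as equal).
inline std::int64_t ulp_distance(float a, float b) {
  const std::int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}

// Error of a candidate sin(x) against a reference computed in double
// precision and rounded to float.
inline std::int64_t sin_error_in_ulp(float x, float candidate) {
  const float reference = static_cast<float>(std::sin(static_cast<double>(x)));
  return ulp_distance(candidate, reference);
}
```

A "1ULP" guarantee in this sense would mean sin_error_in_ulp(x, result) is at most 1 for every x in the stated range.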
Rasmus Munk Larsen
055f0b73db Add support for pcmp_eq and pnot, including for complex types. 2019-01-07 16:53:36 -08:00
Gael Guennebaud
697fba3bb0 Fix unit test 2018-12-27 11:20:47 +01:00
Gael Guennebaud
0f6f75bd8a Implement a faster fix for sin/cos of large entries that also correctly handles INF input. 2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8 Make sure that psin/pcos return number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate) 2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb Fix plog(+INF): it returned ~87 instead of +INF 2018-12-23 15:40:52 +01:00
Gael Guennebaud
cfc70dc13f Add regression test for bug #1174 2018-12-12 18:03:31 +01:00
Gael Guennebaud
2de8da70fd bug #1557: fix RealSchur and EigenSolver for matrices with only zeros on the diagonal. 2018-12-12 17:30:08 +01:00
Gael Guennebaud
72c0bbe2bd Simplify handling of tests that must fail to compile.
Each test is now a normal ctest target, and build properties (compiler+flags) are preserved (instead of starting a new build-dir from scratch).
2018-12-12 15:48:36 +01:00
Gael Guennebaud
81c27325ae bug #1641: fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512 2018-12-08 14:27:48 +01:00
Gael Guennebaud
cd25b538ab Fix noise in sparse_basic_3 (numerical cancellation) 2018-12-08 00:13:37 +01:00
Gael Guennebaud
efaf03bf96 Fix noise in lu unit test 2018-12-08 00:05:03 +01:00
Gael Guennebaud
aab749b1c3 fix test regarding AVX512 vectorization of complexes. 2018-12-06 16:55:00 +01:00
Gael Guennebaud
c53eececb0 Implement AVX512 vectorization of std::complex<float/double> 2018-12-06 15:58:06 +01:00
Christoph Hertzberg
919414b9fe bug #785: Make Cholesky decomposition work for empty matrices 2018-12-03 16:18:15 +01:00
Gael Guennebaud
69ace742be Several improvements regarding packet-bitwise operations:
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
2018-11-30 15:56:08 +01:00
Gael Guennebaud
48fe78c375 bug #1630: fix linspaced when requesting smaller packet size than default one. 2018-11-28 13:15:06 +01:00
Gael Guennebaud
382279eb7f Extend unit test to recursively check half-packet types and non packet types 2018-11-26 14:10:07 +01:00
Gael Guennebaud
e3b22a6bd0 merge 2018-11-23 16:06:21 +01:00
Gael Guennebaud
572d62697d check two ctors 2018-11-23 15:37:09 +01:00
Gael Guennebaud
354f14293b Fix double = bool ! 2018-11-23 15:12:06 +01:00
Christoph Hertzberg
ea60a172cf Add default constructor to Bar to make test compile again with clang-3.8 2018-11-23 14:24:22 +01:00
Gael Guennebaud
c685fe9838 Move regression test to right unit test file 2018-11-21 15:59:47 +01:00
Gael Guennebaud
4b2cebade8 Workaround weird MSVC bug 2018-11-21 15:53:37 +01:00
Gael Guennebaud
43c987b1c1 Add explicit regression test for bug #1622 2018-11-16 11:24:51 +01:00
Mark D Ryan
670d56441c PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
Commit aa110e681b optimised the multiplication of small dynamically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation.  However, it
introduced a mismatch between the packet size and the requestedAlignment.  This
mismatch can lead to crashes when the destination is not aligned.  This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize

This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment

A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Gael Guennebaud
784a3f13cf bug #1619: fix mixing of const and non-const generic iterators 2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba bug #1619: make const and non-const iterators compatible 2018-11-09 16:49:19 +01:00
Gael Guennebaud
f62a0f69c6 Fix max-size in indexed-view 2018-11-08 18:40:22 +01:00
Gael Guennebaud
9d318b92c6 add unit tests for bug #1619 2018-11-01 15:14:50 +01:00
Matthieu Vigne
8d7a73e48e bug #1617: Fix SolveTriangular.solveInPlace crashing for empty matrix.
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
Rasmus Munk Larsen
954b4ca9d0 Suppress compiler warning about unused global variable. 2018-10-22 13:48:56 -07:00
Gael Guennebaud
e3b85771d7 Show call stack in case of failing sparse solving. 2018-10-16 00:43:44 +02:00
Gael Guennebaud
3a33db4de5 merge 2018-10-15 09:22:27 +02:00
Rasmus Munk Larsen
0ed811a9c1 Suppress unused variable compiler warning in sparse subtest 3. 2018-10-12 13:41:57 -07:00
Gael Guennebaud
8214cf1896 Make sparse_basic includable from sparse_extra, but disable it since sparse_basic(DynamicSparseMatrix) does not compile at all anyways 2018-10-11 10:27:23 +02:00
Gael Guennebaud
2ef1b39674 Relaxed fastmath unit test: if std::foo fails, then let's only trigger a warning if numext::foo fails too.
A true error will be triggered only if std::foo works but our numext::foo fails.
2018-10-11 09:45:30 +02:00
Gael Guennebaud
1d5a6363ea relax numerical tests from equal to approx (x87) 2018-10-11 09:29:56 +02:00
Gael Guennebaud
ce243ee45b bug #520: add diagmat +/- diagmat operators. 2018-10-10 23:38:22 +02:00
Gael Guennebaud
5335659c47 Merged in ezhulenev/eigen-02 (pull request PR-525)
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 20:59:00 +00:00
Gael Guennebaud
eec0dfd688 bug #632: add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense.
They are rewritten as two compound assignments to bypass the hybrid dense-sparse iterator.
2018-10-10 22:50:15 +02:00
Eugene Zhulenev
8e6dc2c81d Fix bug in partial reduction of expressions requiring evaluation 2018-10-10 13:23:52 -07:00
Gael Guennebaud
76ceae49c1 bug #1609: add inplace transposition unit test 2018-10-10 21:48:58 +02:00
Christoph Hertzberg
f3130ee1ba Avoid empty macro arguments 2018-10-10 08:23:40 +02:00
Rasmus Munk Larsen
e8918743c1 Merged in ezhulenev/eigen-01 (pull request PR-523)
Compile time detection for unimplemented stl-style iterators
2018-10-09 23:42:01 +00:00
Eugene Zhulenev
befcac883d Hide stl-container detection test under #if 2018-10-09 15:36:01 -07:00
Eugene Zhulenev
c0ca8a9fa3 Compile time detection for unimplemented stl-style iterators 2018-10-09 15:28:23 -07:00
Gael Guennebaud
1dd1f8e454 bug #65: add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean() 2018-10-09 23:36:50 +02:00
Gael Guennebaud
c0c3be26ed Extend unit tests for partial reductions 2018-10-09 22:54:54 +02:00
Gael Guennebaud
c6e2dde714 fix c++11 deprecated warning 2018-10-08 18:26:05 +02:00
Gael Guennebaud
649d4758a6 merge 2018-10-08 17:35:18 +02:00
Gael Guennebaud
c9643f4a6f Disable C++11 deprecated warning when limiting Eigen to C++98 2018-10-08 10:43:43 +02:00
Gael Guennebaud
6c3f6cd52b Fix maybe-uninitialized warning 2018-10-07 23:29:51 +02:00
Gael Guennebaud
16b2001ece Fix gcc 8.1 warning: "maybe use uninitialized" 2018-10-07 21:54:49 +02:00
Gael Guennebaud
409132bb81 Workaround gcc bug making it trigger an invalid warning 2018-10-07 09:23:15 +02:00
Gael Guennebaud
d92f004ab7 Simplify API by removing allCols/allRows and reusing rowwise/colwise to define iterators over rows/columns 2018-10-05 23:11:21 +02:00
Gael Guennebaud
3e64b1fc86 Move iterators to internal, improve doc, make unit test c++03 friendly 2018-10-03 15:13:15 +02:00
Gael Guennebaud
8a1e98240e add unit tests 2018-10-03 11:56:27 +02:00
Gael Guennebaud
5f26f57598 Change the logic of A.reshaped<Order>() to be a simple alias to A.reshaped<Order>(AutoSize,fix<1>).
This means that AutoOrder is now allowed, and it always returns a column-vector.
2018-10-03 11:41:47 +02:00
Gael Guennebaud
0481900e25 Add pointer-based iterator for direct-access expressions 2018-10-02 23:44:36 +02:00
Gael Guennebaud
12487531ce Add templated subVector<Vertical/Horizontal>(Index) aliases to col/row(Index) methods (plus subVectors<>() to retrieve the number of rows/columns) 2018-10-02 14:02:34 +02:00
Gael Guennebaud
37e29fc893 Use Index instead of ptrdiff_t or int, fix random-accessors. 2018-10-02 13:29:32 +02:00
Gael Guennebaud
b0c66adfb1 bug #231: initial implementation of STL iterators for dense expressions 2018-10-01 23:21:37 +02:00
Gael Guennebaud
626942d9dd fix alignment issue in ploaddup for AVX512 2018-09-28 16:57:32 +02:00
Gael Guennebaud
84a1101b36 Merge with default. 2018-09-23 21:52:58 +02:00
Christoph Hertzberg
e3c8289047 Replace unused PREDICATE by corresponding STATIC_ASSERT 2018-09-21 21:15:51 +02:00
Gael Guennebaud
1bf12880ae Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all) 2018-09-21 16:50:04 +02:00
Gael Guennebaud
03a0cb2b72 fix unalignedcount for avx512 2018-09-21 14:40:26 +02:00
Gael Guennebaud
91716f03a7 Fix vectorization logic unit test for AVX512 2018-09-21 14:32:24 +02:00
Gael Guennebaud
b00e48a867 Improve slice-vectorization logic for redux (significant speed-up for reduction of blocks) 2018-09-21 13:45:56 +02:00
Gael Guennebaud
a488d59787 merge with default Eigen 2018-09-21 11:51:49 +02:00
Gael Guennebaud
dfa8439e4d Update reshaped API to use RowMajor/ColMajor directly as integral values instead of introducing RowOrder/ColOrder types.
The API changed from A.reshaped(rows,cols,RowOrder) to A.template reshaped<RowOrder>(rows,cols).
2018-09-19 11:49:26 +02:00
Gael Guennebaud
2014c7ae28 Move all, last, end from Eigen::placeholders namespace to Eigen::, and rename end to lastp1 to avoid conflicts with std::end. 2018-09-15 14:35:10 +02:00
Gael Guennebaud
e0f6d352fb Rename test/array.cpp to test/array_cwise.cpp to avoid conflicts with the array header. 2018-09-20 18:07:32 +02:00
Gael Guennebaud
eeeb18814f Fix warning 2018-09-20 17:48:56 +02:00
Gael Guennebaud
2cf6d3050c Disable ignoring attributes warning 2018-09-20 11:38:19 +02:00
Gael Guennebaud
82772e8d9d Rename Symbolic namespace to symbolic to be consistent with numext namespace 2018-09-15 14:16:20 +02:00
Deven Desai
c64fe9ea1f Updates to fix HIP-clang specific compile errors.
Compiling the eigen unittests with hip-clang (HIP with clang as the underlying compiler instead of hcc or nvcc), results in compile errors. The changes in this commit fix those compile errors. The main change is to convert a few instances of "__device__" to "EIGEN_DEVICE_FUNC"
2018-08-30 20:22:16 +00:00
luz.paz
43fd42a33b Fix doxy and misc. typos
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`
---
 Eigen/src/Core/ProductEvaluators.h |  4 ++--
 Eigen/src/Core/arch/GPU/Half.h     |  2 +-
 Eigen/src/Core/util/Memory.h       |  2 +-
 Eigen/src/Geometry/Hyperplane.h    |  2 +-
 Eigen/src/Geometry/Transform.h     |  2 +-
 Eigen/src/Geometry/Translation.h   | 12 ++++++------
 doc/PreprocessorDirectives.dox     |  2 +-
 doc/TutorialGeometry.dox           |  2 +-
 test/boostmultiprec.cpp            |  2 +-
 test/triangular.cpp                |  2 +-
 10 files changed, 16 insertions(+), 16 deletions(-)
2018-08-01 21:34:47 -04:00
Christoph Hertzberg
a80a290079 Fix 'template argument uses local type'-warnings (when compiled in C++03 mode) 2018-09-10 18:57:28 +02:00
Christoph Hertzberg
73ca600bca Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00
Gael Guennebaud
5747288676 Disable a bonus unit-test which is broken with gcc 4.7 2018-08-27 13:07:34 +02:00
Gael Guennebaud
d5ed64512f bug #1573: workaround gcc 4.7 and 4.8 bug 2018-08-27 10:38:20 +02:00
Rasmus Munk Larsen
8278ae6313 Add thread-local support on platforms that lack it, through emulation using a hash map. 2018-08-13 15:31:23 -07:00
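The emulation idea named in this commit can be sketched as a map from thread id to a per-thread value. EmulatedThreadLocal is a hypothetical class written for illustration under that assumption; it is not the code added by the commit.

```cpp
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical helper: keeps one T per thread in a hash map keyed by
// std::thread::id, for platforms without native thread_local support.
template <typename T>
class EmulatedThreadLocal {
 public:
  // Returns the calling thread's instance, value-initializing it on first use.
  // The reference stays valid when other threads insert, because
  // std::unordered_map never relocates its elements on rehash.
  T& local() {
    std::lock_guard<std::mutex> lock(mutex_);
    return slots_[std::this_thread::get_id()];
  }

 private:
  std::mutex mutex_;                               // guards the map itself
  std::unordered_map<std::thread::id, T> slots_;   // one slot per thread
};
```

The mutex only protects lookups and insertions; each thread then works on its own slot without further locking. Entries are never erased in this sketch, so slots of finished threads linger until the container is destroyed.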
Gael Guennebaud
3ec60215df Merged in rmlarsen/eigen2 (pull request PR-466)
Move sigmoid functor to core and rename it to 'logistic'.
2018-08-13 21:28:20 +00:00
Rasmus Munk Larsen
d6e283ba96 sigmoid -> logistic 2018-08-13 11:14:50 -07:00
Mehdi Goli
908b906d79 Disabling assert inside SYCL kernel. 2018-08-08 10:01:10 +01:00
Rasmus Munk Larsen
fa68342ef8 Move sigmoid functor to core. 2018-08-03 17:31:23 -07:00
Gustavo Lima Chaves
2bf1cc8cf7 Fix 256 bit packet size assumptions in unit tests.
Like in change 2606abed53, we have hit the threshold again. With
AVX512 builds we would never have Vector8f packets aligned at 64
bytes (the new value of EIGEN_MAX_ALIGN_BYTES after change 405859f18d,
for AVX512-enabled builds).

This makes test/dynalloc.cpp pass for those builds.
2018-08-02 15:55:36 -07:00
Gael Guennebaud
723856dec1 bug #1577: fix msvc compilation of unit test, msvc defines ptrdiff_t as long long 2018-07-30 14:52:15 +02:00
Christoph Hertzberg
397b0547e1 Disable static assertions only when necessary and disable double-promotion warnings in that case as well 2018-07-26 00:01:24 +02:00
Gael Guennebaud
c747cde69a Add lastN shortcuts to seq/seqN. 2018-07-23 16:20:25 +02:00
Gael Guennebaud
de70671937 Oopps, EIGEN_COMP_MSVC is not available before including Eigen. 2018-07-20 17:51:17 +02:00
Gael Guennebaud
56a750b6cc Disable optimization for sparse_product unit test with MSVC 2013, otherwise it takes several hours to build. 2018-07-20 08:36:38 -07:00
Gael Guennebaud
2424e3b7ac Pass by const ref. 2018-07-19 18:48:19 +02:00
Gael Guennebaud
6e5a3b898f Add regression for bugs #1573 and #1575 2018-07-18 23:34:34 +02:00
Gael Guennebaud
863580fe88 bug #1432: fix conservativeResize for non-relocatable scalar types. For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable 2018-07-18 23:33:07 +02:00
Gael Guennebaud
053ed97c72 Generalize ScalarWithExceptions to a full non-copyable and throwing scalar type to be used in other unit tests. 2018-07-18 23:27:37 +02:00
Gael Guennebaud
3a2dc3869e Fix weird issue with MSVC 2013 2018-07-18 02:26:43 -07:00
Gael Guennebaud
dff3a92d52 Remove usage of #if EIGEN_TEST_PART_XX in unit tests that do not require them (splitting can thus be avoided for them) 2018-07-17 15:52:58 +02:00
Gael Guennebaud
82f0ce2726 Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }.
This provides several advantages:
- more flexibility in designing unit tests
- unit tests can be glued to speed up compilation
- unit tests are compiled with same predefined macros, which is a requirement for zapcc
2018-07-17 14:46:15 +02:00
Gael Guennebaud
37f4bdd97d Fix VERIFY_EVALUATION_COUNT(EXPR,N) with a complex expression as N 2018-07-17 13:20:49 +02:00
Gael Guennebaud
40797dbea3 bug #1572: use c++11 atomic instead of volatile if c++11 is available, and disable multi-threaded GEMM on non-x86 without c++11. 2018-07-17 00:11:20 +02:00
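Why std::atomic rather than volatile matters for cross-thread signalling can be sketched with a small counter. run_workers is a hypothetical function written for illustration; it is not the synchronization code of Eigen's multi-threaded GEMM.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch: each worker checks in by incrementing a shared
// counter, and the caller waits until all of them have done so.
// std::atomic gives the increments atomicity and inter-thread ordering,
// neither of which a plain volatile int guarantees in standard C++.
inline int run_workers(int num_threads) {
  std::atomic<int> done(0);
  std::vector<std::thread> pool;
  for (int i = 0; i < num_threads; ++i) {
    pool.emplace_back([&done] { done.fetch_add(1, std::memory_order_release); });
  }
  // Wait until every worker has checked in.
  while (done.load(std::memory_order_acquire) < num_threads) {
    std::this_thread::yield();
  }
  for (std::thread& t : pool) t.join();
  return done.load();
}
```

With a volatile int in place of the atomic, concurrent increments would be a data race and could be lost, which is especially visible on weakly ordered (non-x86) hardware.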
Gael Guennebaud
add5757488 Simplify handling of non-split tests and include split_test_helper.h instead of re-generating it. This also allows us to modify it without breaking existing build folders. 2018-07-16 18:55:40 +02:00
Gael Guennebaud
901c7d31f0 Fix usage of EIGEN_SPLIT_LARGE_TESTS=ON: some unit tests, such as indexed_view have to be split unconditionally. 2018-07-16 18:35:05 +02:00
Gael Guennebaud
a87cff20df Fix GeneralizedEigenSolver when requesting for eigenvalues only. 2018-07-14 09:38:49 +02:00
Gael Guennebaud
20991c3203 bug #1571: fix is_convertible<from,to> with "from" a reference. 2018-07-13 17:47:28 +02:00
Gael Guennebaud
195c9c054b Print more debug info in gpu_basic 2018-07-13 16:05:07 +02:00
Gael Guennebaud
12e1ebb68b Remove local Index typedef from unit-tests 2018-07-12 17:16:40 +02:00
Gael Guennebaud
63185be8b2 Disable eigenvalues test for clang-cuda 2018-07-12 17:03:14 +02:00
Gael Guennebaud
bec013b2c9 fix unused warning 2018-07-12 17:02:18 +02:00
Gael Guennebaud
da0c604078 Merged in deven-amd/eigen (pull request PR-402)
Adding support for using Eigen in HIP kernels.
2018-07-12 08:07:16 +00:00
Gael Guennebaud
8a40dda5a6 Add some basic unit-tests 2018-07-12 09:59:00 +02:00
Gael Guennebaud
21cf4a1a8b Make is_convertible more robust and conformant to std::is_convertible 2018-07-12 09:57:19 +02:00
Gael Guennebaud
d193cc87f4 Fix regression in 9357838f94 2018-07-11 17:09:23 +02:00
Deven Desai
876f392c39 Updates corresponding to the latest round of PR feedback
The major changes are

1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
2. Moving CUDA/MathFunctions.h to GPU/MathFunctions.h
3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
    The above three changes effectively enable the Eigen "Packet" layer for the HIP platform

4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
5. Updating the "EIGEN_DEVICE_FUNC" marking in some places

The change has been tested on the HIP and CUDA platforms.
2018-07-11 10:39:54 -04:00
Deven Desai
1fe0b74904 deleting hip specific files that are no longer required 2018-07-11 09:28:44 -04:00
Deven Desai
dec47a6493 renaming CUDA* to GPU* for some header files 2018-07-11 09:26:54 -04:00
Deven Desai
38807a2575 merging updates from upstream 2018-07-11 09:17:33 -04:00
Gael Guennebaud
9357838f94 bug #1543: improve linear indexing for general block expressions 2018-07-10 09:10:15 +02:00
Gael Guennebaud
de9e31a06d Introduce the macro ei_declare_local_nested_eval to help allocating local temporaries on the stack via alloca, and let outer-products make good use of it.
If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
2018-07-09 15:41:14 +02:00
Gael Guennebaud
a937c50208 palign is not used anymore, so let's relax the unit test 2018-07-06 17:41:52 +02:00
Gael Guennebaud
56a33ae57d test product kernel with half-floats. 2018-07-06 17:14:04 +02:00
Gael Guennebaud
f4d623ffa7 Complete Packet8h implementation and test it in packetmath unit test 2018-07-06 17:13:36 +02:00
Gael Guennebaud
a8ab6060df Add unit tests for inverse and selfadjoint-eigenvalues on CUDA 2018-07-06 09:58:45 +02:00
Deven Desai
b6cc0961b1 updates based on PR feedback
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)

1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`

After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.

2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_COMPILE_PHASE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
2018-06-14 10:21:54 -04:00
Deven Desai
d1d22ef0f4 syncing this fork with upstream 2018-06-13 12:09:52 -04:00
Gael Guennebaud
3ae2083e23 Make is_same_dense compatible with different scalar types. 2018-07-03 13:21:43 +02:00
Gael Guennebaud
d428a199ab bug #1562: optimize evaluation of small products of the form s*A*B by rewriting them as: s*(A.lazyProduct(B)) to save a costly temporary. Measured speedup from 2x to 5x... 2018-07-02 11:41:09 +02:00
Gael Guennebaud
a7b313a16c Fix unit test 2018-07-01 22:45:47 +02:00
Gael Guennebaud
ee5864f72e bug #1560 fix product with a 1x1 diagonal matrix 2018-06-25 10:30:12 +02:00
Gael Guennebaud
cb4c9a6a94 bug #1531: make dedicated unit testing for NumDimensions 2018-06-08 17:11:45 +02:00
Gael Guennebaud
89d65bb9d6 bug #1531: expose NumDimensions for compatibility with Tensor 2018-06-08 16:50:17 +02:00
Gael Guennebaud
f4d1461874 Fix the way matrix folder is passed to the tests. 2018-06-08 09:55:46 +02:00
Deven Desai
8fbd47052b Adding support for using Eigen in HIP kernels.
This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.

Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor)


Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
2018-06-06 10:12:58 -04:00
Gael Guennebaud
999b552c16 Search for sequential Pastix. 2018-05-29 20:49:25 +02:00
Gael Guennebaud
eef4b7bd87 Fix handling of path names containing spaces and the likes. 2018-05-29 20:49:06 +02:00
Christoph Hertzberg
750af06362 Add an option to test with external BLAS library 2018-05-22 21:04:32 +02:00
Christoph Hertzberg
d06a753d10 Make qr_fullpivoting unit test run for fixed-sized matrices 2018-05-22 20:29:17 +02:00
Gael Guennebaud
4dd767f455 add some internal checks 2018-05-18 13:59:55 +02:00