Commit Graph

11021 Commits

Author SHA1 Message Date
Gael Guennebaud
d0f5d4bc50 add a banner to advertise the survey 2020-07-29 19:01:38 +02:00
David Tellenbach
5e484fa11d Fix StlDeque for GCC 10
StlDeque extends std::deque by accessing some of its internal members.
Since GCC 10 these are not accessible anymore.
2020-07-29 12:31:13 +00:00
Teng Lu
3ec4f0b641 Fix undefine BF16 union behavior in AVX512. 2020-07-29 02:20:21 +00:00
Rasmus Munk Larsen
b92206676c Inherit alignment trait from argument in TensorBroadcasting to avoid segfault when the argument is unaligned. 2020-07-28 19:19:37 +00:00
David Tellenbach
99da2e1a8d Fix clang-tidy warnings in generic bfloat16 implementation
See !172 for related discussions.
2020-07-27 16:00:24 +02:00
qxxxb
649fd1c2ae Fix CMake install command 2020-07-25 16:35:13 -04:00
David Tellenbach
e48d8e4725 Don't allow failure for CI build stage anymore 2020-07-24 21:12:15 +02:00
David Tellenbach
b8ca93842c Improve CI configuration
- Fix docker Fedora image to Fedora:31
  - Fix gcc version to gcc-9.2.1
  - Use GitLab CI dag
  - Fix usage of build cache
  - Introduce build artificats
2020-07-24 15:58:44 +00:00
Gael Guennebaud
fb0c6868ad Add missing footer declaration 2020-07-24 10:28:44 +02:00
David Tellenbach
c1ffe452fc Fix bfloat16 casts
If we have explicit conversion operators available (C++11) we define
explicit casts from bfloat16 to other types. If not (C++03), we don't
define conversion operators but rely on implicit conversion chains from
bfloat16 over float to other types.
2020-07-23 20:55:06 +00:00
Gael Guennebaud
2ce2f51989 remove piwik tracker 2020-07-23 13:51:39 +02:00
Rasmus Munk Larsen
1b84f21e32 Revert change that made conversion from bfloat16 to {float, double} implicit.
Add roundtrip tests for casting between bfloat16 and complex types.
2020-07-22 18:09:00 -07:00
David Tellenbach
38b91f256b Fix cast of blfoat16 to std::complex<T>
This fixes https://gitlab.com/libeigen/eigen/-/issues/1951
2020-07-22 19:00:17 +00:00
Rasmus Munk Larsen
bed7fbe854 Make sure we take the little-endian path if __BYTE_ORDER__ is not defined. 2020-07-22 18:54:38 +00:00
Niels Dekker
0e1a33a461 Faster conversion from integer types to bfloat16
Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types.

A reduction of more than 20% of the runtime duration of conversion from int to bfloat16 was observed, using Visual C++ 2019 on Windows 10.
2020-07-22 19:25:49 +02:00
Rasmus Munk Larsen
acab22c205 Avoid division by zero in nonZerosEstimate() for empty blocks. 2020-07-22 01:38:30 +00:00
Rasmus Munk Larsen
ac2eca6b11 Update tensor reduction test to avoid undefined division of bfloat16 by int. 2020-07-22 00:35:51 +00:00
Rasmus Munk Larsen
0aeaf5f451 Make numext::as_uint a device function. 2020-07-22 00:33:41 +00:00
Alexander Turkin
60faa9f897 user-defined copy operations removed in favor of compiler-generated ones 2020-07-20 14:59:35 +03:00
Niels Dekker
b11f817bcf Avoid undefined behavior by union type punning in float_to_bfloat16_rtne
Use `numext::as_uint`, instead of union based type punning, to avoid undefined behavior.
See also C++ Core Guidelines: "Don't use a union for type punning"
https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning

`numext::as_uint` was suggested by David Tellenbach
2020-07-14 19:55:20 +02:00
Sheng Yang
56b3e3f3f8 AVX path for BF16 2020-07-14 01:34:03 +00:00
Niels Dekker
4ab32e2de2 Allow implicit conversion from bfloat16 to float and double
Conversion from `bfloat16` to `float` and `double` is lossless. It seems natural to allow the conversion to be implicit, as the C++ language also support implicit conversion from a smaller to a larger floating point type.

Intel's OneDLL bfloat16 implementation also has an implicit `operator float()`: https://github.com/oneapi-src/oneDNN/blob/v1.5/src/common/bfloat16.hpp
2020-07-11 13:32:28 +02:00
Rasmus Munk Larsen
dcf7655b3d Guard operator<< test by EIGEN_NO_IO. 2020-07-09 19:54:48 +00:00
Rasmus Munk Larsen
ed00df445d Guard operator<< by EIGEN_NO_IO. 2020-07-09 19:52:44 +00:00
Rasmus Munk Larsen
fb77b7288c Add operator<< to print a quaternion. 2020-07-09 12:49:58 -07:00
David Tellenbach
ee4715ff48 Fix test basic stuff
- Guard fundamental types that are not available pre C++11
- Separate subsequent angle brackets >> by spaces
- Allow casting of Eigen::half and Eigen::bfloat16 to complex types
2020-07-09 17:24:00 +00:00
Forrest Voight
8889a2c1c6 Add operator==/operator!= to Quaternion. Fixes #1876. 2020-07-07 20:16:54 +00:00
Rasmus Munk Larsen
6964ae8d52 Change the sign operator in Eigen to return NaN for NaN arguments, not zero. 2020-07-07 01:54:04 +00:00
David Tellenbach
cb63153183 Make test packetmath C++98 compliant 2020-07-01 20:41:59 +02:00
Sheng Yang
116c5235ac BF16 for scalar_cmp_with_cast_op 2020-07-01 18:33:42 +00:00
Kan Chen
8731452b97 Delete duplicate test cases in vectorization_logic.cpp 2020-07-01 00:51:15 +00:00
Antonio Sanchez
9cb8771e9c Fix tensor casts for large packets and casts to/from std::complex
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.

We also add casting `Eigen::half` to/from `std::complex<T>`, which
was missing to make it consistent with `Eigen:bfloat16`, and
generalize the overload to work for any complex type.

Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
2020-06-30 18:53:55 +00:00
Antonio Sanchez
145e51516f Fix denormal check pre c++11.
`float_denorm_style` is an old-style `enum`, so the `denorm_present`
symbol only exists in the `std` namespace prior to c++11.
2020-06-30 17:28:30 +00:00
David Tellenbach
689b57070d Report custom C++ flags in CMake testing summary 2020-06-30 17:18:54 +00:00
David Tellenbach
f3b8d441f6 Remote CI tags to enable shared runners 2020-06-29 22:15:41 +02:00
Christoph Grüninger
dc0b81fb1d Pass CMAKE_MAKE_PROGRAM to Fortran language support test
Otherwise the Make (or Ninja) program is used, which is
installed system wide.
2020-06-27 23:52:38 +02:00
David Tellenbach
13d25f5ed8 Add initial CI configuration file.
The initial CI configuration consists of jobs to build and run tests and
to build docs.
2020-06-27 00:03:35 +00:00
Antonio Sanchez
7222f0b6b5 Fix packetmath_1 float tests for arm/aarch64.
Added missing `pmadd<Packet2f>` for NEON. This leads to significant
improvement in precision than previous `pmul+padd`, which was causing
the `pcos` tests to fail. Also added an approx test with
`std::sin`/`std::cos` since otherwise returning any `a^2+b^2=1` would
pass.

Modified `log(denorm)` tests.  Denorms are not always supported by all
systems (returns `::min`), are always flushed to zero on 32-bit arm,
and configurably flush to zero on sse/avx/aarch64. This leads to
inconsistent results across different systems (i.e. `-inf` vs `nan`).
Added a check for existence and exclude ARM.

Removed logistic exactness test, since scalar and vectorized versions
follow different code-paths due to differences in `pexp` and `pmadd`,
which result in slightly different values. For example, exactness always
fails on arm, aarch64, and altivec.
2020-06-24 14:03:35 -07:00
Simon Pfreundschuh
14f84978e8 Replaced call to deprecated 'load' function with appropriate call to 'on'. 2020-06-23 11:23:13 +02:00
Antonio Sanchez
ff4e7a0820 Add missing Packet2l/Packet2ul ops for NEON.
The current multiply (`pmul`) and comparison operators (`pcmp_lt`,
`pcmp_le`, `pcmp_eq`) are missing for packets `Packet2l` and
`Packet2ul`. This leads to compile errors for the `packetmath.cpp` tests
in clang. Here we add and test the missing ops.

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-22 11:24:43 -07:00
Antonio Sanchez
03ebdf6acb Added missing NEON pcasts, update packetmath tests.
The NEON `pcast` operators are all implemented and tested for existing
packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting
between `int64_t` and `int8_t` in `GenericPacketMath.h`.

Removed incorrect `HasHalfPacket`  definition for NEON's
`Packet2l`/`Packet2ul`.

Adjustments were also made to the `packetmath` tests. These include
- minor bug fixes for cast tests (i.e. 4:1 casts, only casting for
  packets that are vectorizable)
- added 8:1 cast tests
- random number generation
  - original had uninteresting 0 to 0 casts for many casts between
    floating-point and integers, and exhibited signed overflow
    undefined behavior

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-21 09:32:31 -07:00
Teng Lu
386d809bde Support BFloat16 in Eigen 2020-06-20 19:16:24 +00:00
Rasmus Munk Larsen
6b9c92fe7e Add Apache 2.0 license text in COPYING.APACHE. 2020-06-18 12:45:27 -07:00
Nicolas Mellado
cf7adf3a5d Update things you can do message using cmake commands
Print cmake commands instead of make commands, which should work for any generator.
2020-06-16 21:04:33 +00:00
Ilya Tokar
231ce21535 Run two independent chains, when reducing tensors.
Running two chains exposes more instruction level parallelism,
by allowing to execute both chains at the same time.

Results are a bit noisy, but for medium length we almost hit
theoretical upper bound of 2x.

BM_fullReduction_16T/3        [using 16 threads]       17.3ns ±11%        17.4ns ± 9%        ~           (p=0.178 n=18+19)
BM_fullReduction_16T/4        [using 16 threads]       17.6ns ±17%        17.0ns ±18%        ~           (p=0.835 n=20+19)
BM_fullReduction_16T/7        [using 16 threads]       18.9ns ±12%        18.2ns ±10%        ~           (p=0.756 n=20+18)
BM_fullReduction_16T/8        [using 16 threads]       19.8ns ±13%        19.4ns ±21%        ~           (p=0.512 n=20+20)
BM_fullReduction_16T/10       [using 16 threads]       23.5ns ±15%        20.8ns ±24%     -11.37%        (p=0.000 n=20+19)
BM_fullReduction_16T/15       [using 16 threads]       35.8ns ±21%        26.9ns ±17%     -24.76%        (p=0.000 n=20+19)
BM_fullReduction_16T/16       [using 16 threads]       38.7ns ±22%        27.7ns ±18%     -28.40%        (p=0.000 n=20+19)
BM_fullReduction_16T/31       [using 16 threads]        146ns ±17%          74ns ±11%     -49.05%        (p=0.000 n=20+18)
BM_fullReduction_16T/32       [using 16 threads]        154ns ±19%          84ns ±30%     -45.79%        (p=0.000 n=20+19)
BM_fullReduction_16T/64       [using 16 threads]        603ns ± 8%         308ns ±12%     -48.94%        (p=0.000 n=17+17)
BM_fullReduction_16T/128      [using 16 threads]       2.44µs ±13%        1.22µs ± 1%     -50.29%        (p=0.000 n=17+17)
BM_fullReduction_16T/256      [using 16 threads]       9.84µs ±14%        5.13µs ±30%     -47.82%        (p=0.000 n=19+19)
BM_fullReduction_16T/512      [using 16 threads]       78.0µs ± 9%        56.1µs ±17%     -28.02%        (p=0.000 n=18+20)
BM_fullReduction_16T/1k       [using 16 threads]        325µs ± 5%         263µs ± 4%     -19.00%        (p=0.000 n=20+16)
BM_fullReduction_16T/2k       [using 16 threads]       1.09ms ± 3%        0.99ms ± 1%      -9.04%        (p=0.000 n=20+20)
BM_fullReduction_16T/4k       [using 16 threads]       7.66ms ± 3%        7.57ms ± 3%      -1.24%        (p=0.017 n=20+20)
BM_fullReduction_16T/10k      [using 16 threads]       65.3ms ± 4%        65.0ms ± 3%        ~           (p=0.718 n=20+20)
2020-06-16 15:55:11 -04:00
Pedro Caldeira
a475bf14d4 Fix pscatter and pgather for Altivec Complex double 2020-06-16 16:41:02 -03:00
David Tellenbach
c6c84ed961 Fix unused variable warning on Arm 2020-06-15 00:14:58 +02:00
Sebastien Boisvert
6228f27234 Fix #1818: SparseLU: add methods nnzL() and nnzU()
Now this compiles without errors:

$ clang++ -I ../../ test_sparseLU.cpp -std=c++03
2020-06-11 23:49:49 +00:00
Sebastien Boisvert
39cbd6578f Fix #1911: add benchmark for move semantics with fixed-size matrix
$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
        -o bench_move_semantics

$ ./bench_move_semantics
float copy semantics: 1755.97 ms
float move semantics: 55.063 ms
double copy semantics: 2457.65 ms
double move semantics: 55.034 ms
2020-06-11 23:43:25 +00:00
Antonio Sanchez
a7d2552af8 Remove HasCast and fix packetmath cast tests.
The use of the `packet_traits<>::HasCast` field is currently inconsistent with
`type_casting_traits<>`, and is unused apart from within
`test/packetmath.cpp`. In addition, those packetmath cast tests do not
currently reflect how casts are performed in practice: they ignore the
`SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio.

Here we remove the unsed `HasCast`, and modify the packet cast tests to
better reflect their usage.
2020-06-11 17:26:56 +00:00