12689 Commits

Author SHA1 Message Date
Antonio Sanchez
3580a38298 Use native _Float16 for AVX512FP16 and update vectorization.
This allows us to do faster native scalar operations.  Also
updated half/quarter packets to use the native type if available.

Benchmark improvement:
```
Comparing ./2910_without_float16 to ./2910_with_float16
Benchmark                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------
BM_CalcMat<float>/10000/768/500                      -0.0041         -0.0040      58276392      58039442      58273420      58039582
BM_CalcMat<_Float16>/10000/768/500                   +0.0073         +0.0073     642506339     647214446     642481384     647188303
BM_CalcMat<Eigen::half>/10000/768/500                -0.3170         -0.3170      92511115      63182101      92506771      63179258
BM_CalcVec<float>/10000/768/500                      +0.0022         +0.0022       5198157       5209469       5197913       5209334
BM_CalcVec<_Float16>/10000/768/500                   +0.0025         +0.0026      10133324      10159111      10132641      10158507
BM_CalcVec<Eigen::half>/10000/768/500                -0.7760         -0.7760      45337937      10156952      45336532      10156389
OVERALL_GEOMEAN                                      -0.2677         -0.2677             0             0             0             0
```

Fixes #2910.
2025-03-18 10:46:32 -07:00
Markus Vieth
0259a52b0e Use more .noalias() 2025-03-17 19:41:00 +01:00
Antonio Sánchez
14f845a1a8 Fix givens rotation. 2025-03-14 17:15:57 +00:00
Guilhem Saurel
33b04fe518 CMake: add install-doc target 2025-03-14 00:35:00 +00:00
Charles Schlosser
10e62ccd22 Fix x86 complex vectorized fma 2025-03-12 17:06:32 +00:00
Rasmus Munk Larsen
464c1d0978 Format TensorDeviceThreadPool.h & use if constexpr for c++20. 2025-03-08 01:09:36 +00:00
Rasmus Munk Larsen
21223f6bb6 Fix addition of different enum types. 2025-03-07 22:18:00 +00:00
Rasmus Munk Larsen
350544eb01 Clean up TensorDeviceThreadPool.h 2025-03-07 18:14:17 +00:00
Kevin
43810fc1be Fix extra semicolon in DeviceWrapper 2025-03-07 01:07:23 +00:00
Charles Schlosser
d28041ed5a refactor AssignmentFunctors.h, unify with existing scalar_op 2025-03-06 01:28:39 +00:00
Gopinath Vasalamarri
9a86214039 Optimize division operations in TensorVolumePatch.h 2025-02-28 22:34:13 +00:00
Antonio Sánchez
be5147b090 Fix STL feature detection for c++20. 2025-02-28 19:52:37 +00:00
Antonio Sanchez
179a49684a Fix CMake BOOST warning 2025-02-28 07:33:26 -08:00
Antonio Sanchez
dd56367554 Fix docs job for nightlies 2025-02-26 16:01:33 +00:00
Antonio Sánchez
d79bac0d3c Fix boolean scatter and random generation for tensors. 2025-02-25 21:37:09 +00:00
Tyler Veness
9935396b15 Specify constructor template arguments for ConstexprTest struct 2025-02-25 19:38:47 +00:00
Rasmus Munk Larsen
72adf891d5 Slightly simplify ForkJoin code, and make sure the test is actually run. 2025-02-25 17:22:43 +00:00
Antonio Sanchez
6aebfa9acc Build docs on push, and don't expire 2025-02-24 08:29:21 -08:00
Markus Vieth
bddaa99e15 Fix bitwise operation error when compiling as C++26 2025-02-23 02:30:55 +00:00
C. Antonio Sanchez
e42dceb3a1 Fix implicit copy-constructor warning in TensorRef. 2025-02-22 08:37:56 -08:00
Antonio Sanchez
5fc6fc9881 Initialize matrix in bicgstab test 2025-02-21 10:27:29 -08:00
Tyler Veness
0ae7b59018 Make assignment constexpr 2025-02-21 18:16:46 +00:00
Charles Schlosser
4dda5b927a fix Warray-bounds in inner product 2025-02-20 22:40:55 +00:00
C. Antonio Sanchez
66f7f51b7e Disable fno-check-new on clang. 2025-02-18 21:24:47 -08:00
Charles Schlosser
151f6127df Fix Warray-bounds warning for fixed-size assignments 2025-02-18 19:23:14 +00:00
C. Antonio Sanchez
1d8b82b074 Fix power builds for no VSX and no POWER8. 2025-02-15 13:56:47 -08:00
Charles Schlosser
eb3f9f443d refactor AssignmentEvaluator 2025-02-15 00:39:41 +00:00
Antonio Sánchez
9c211430b5 Fix TensorRef details 2025-02-14 18:33:26 +00:00
Antonio Sanchez
22cd7307dd Remove assumption of std::complex for complex scalar types. 2025-02-12 15:44:32 -08:00
Antonio Sánchez
6b4881ad48 Eliminate type-punning UB in Eigen::half. 2025-02-12 21:12:33 +00:00
Antonio Sánchez
420d891de7 Add missing mathjax/latex configuration. 2025-02-12 21:11:50 +00:00
Antonio Sánchez
becefd59e2 Returns condition number of zero if matrix is not invertible. 2025-02-12 07:09:20 +00:00
Antonio Sánchez
809d266b49 Fix numerical issues with BiCGSTAB. 2025-02-11 19:41:59 +00:00
Antonio Sánchez
ef475f2770 Add missing graphviz to doc build. 2025-02-11 16:03:41 +00:00
Antonio Sánchez
a0591cbc93 Fix doxygen-generated pages 2025-02-11 01:20:27 +00:00
Antonio Sánchez
715deac188 Add EIGEN_CI_CTEST_ARGS to allow for custom timeout. 2025-02-06 21:32:38 +00:00
Antonio Sánchez
4c38131a16 Fix android hardware_destructive_interference_size issue. 2025-02-05 23:53:55 +00:00
Antonio Sánchez
4c2611d27c Update check for std::hardware_destructive_interference_size 2025-02-05 19:41:07 +00:00
Antonio Sánchez
c079ee5e44 Fix tensor documentation. 2025-02-05 17:36:00 +00:00
Antonio Sanchez
74264c391a Add missing return statements for ppc. 2025-02-05 08:12:27 -08:00
Antonio Sánchez
3ebe898b5f Build and deploy nightly docs. 2025-02-05 00:35:34 +00:00
Antonio Sánchez
b73bb766a5 Increase max alignment to 256. 2025-02-04 20:06:28 +00:00
Antonio Sánchez
b1e74b1ccd Fix all the doxygen warnings. 2025-02-01 00:00:31 +00:00
Antonio Sánchez
9589cc4e7f Fix loongarch64 emulated tests. 2025-01-31 19:30:42 +00:00
Johannes Zipfel
2926b2e0a9 added functions to fetch L and U Factors from IncompleteLUT 2025-01-31 18:32:38 +00:00
William Kong
b6849f675d Change the midpoint chosen in Eigen::ForkJoinScheduler. 2025-01-30 20:21:30 +00:00
William Kong
1b2e84e55a Fix minor typos in ForkJoin.h 2025-01-29 20:12:04 +00:00
Tyler Veness
872c179f58 Fix UTF-8 encoding errors introduced by #1801 2025-01-28 16:52:46 -08:00
Rasmus Munk Larsen
2a35a917be Fix syntax error in NonBlockingThreadPool.h 2025-01-28 18:34:31 +00:00
Charles Schlosser
a056b93114 improve Simplicial Cholesky analyzePattern 2025-01-28 17:53:43 +00:00
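
Note on the benchmark table in commit 3580a38298: the `Time` and `CPU` columns appear to be relative changes, (new - old) / old, and `OVERALL_GEOMEAN` appears to be the geometric mean of the new/old ratios minus one. This reading is an assumption based on the usual layout of benchmark comparison output; the commit message does not state it. The helper names below (`relative_change`, `geomean_change`, `Sample`) are hypothetical and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One benchmark row: the old and new measurement (same unit for both).
struct Sample {
  double old_value;
  double new_value;
};

// Relative change of a measurement: (new - old) / old.
// Negative values mean the new build is faster.
double relative_change(double old_value, double new_value) {
  assert(old_value > 0.0);
  return (new_value - old_value) / old_value;
}

// Geometric mean of the new/old ratios, expressed as a relative change.
// Computed in log space to avoid overflow when multiplying many ratios.
double geomean_change(const std::vector<Sample>& samples) {
  assert(!samples.empty());
  double log_sum = 0.0;
  for (const Sample& s : samples) {
    assert(s.old_value > 0.0 && s.new_value > 0.0);
    log_sum += std::log(s.new_value / s.old_value);
  }
  return std::exp(log_sum / static_cast<double>(samples.size())) - 1.0;
}
```

Under this reading, the `BM_CalcMat<Eigen::half>` row (92511115 old, 63182101 new) gives a relative change of about -0.317, and the six `Time Old`/`Time New` pairs give a geometric-mean change of about -0.268, both matching the values printed in the table.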