Antonio Sanchez
3580a38298
Use native _Float16 for AVX512FP16 and update vectorization.
This allows us to do faster native scalar operations. Also
updated half/quarter packets to use the native type if available.
Benchmark improvement:
```
Comparing ./2910_without_float16 to ./2910_with_float16
Benchmark                                     Time       CPU    Time Old    Time New     CPU Old     CPU New
------------------------------------------------------------------------------------------------------------
BM_CalcMat<float>/10000/768/500            -0.0041   -0.0040    58276392    58039442    58273420    58039582
BM_CalcMat<_Float16>/10000/768/500         +0.0073   +0.0073   642506339   647214446   642481384   647188303
BM_CalcMat<Eigen::half>/10000/768/500      -0.3170   -0.3170    92511115    63182101    92506771    63179258
BM_CalcVec<float>/10000/768/500            +0.0022   +0.0022     5198157     5209469     5197913     5209334
BM_CalcVec<_Float16>/10000/768/500         +0.0025   +0.0026    10133324    10159111    10132641    10158507
BM_CalcVec<Eigen::half>/10000/768/500      -0.7760   -0.7760    45337937    10156952    45336532    10156389
OVERALL_GEOMEAN                            -0.2677   -0.2677           0           0           0           0
```
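For readers checking the table, the Time and CPU columns appear to be relative changes, (new - old) / old, and the OVERALL_GEOMEAN row is consistent with the geometric mean of the new/old ratios minus one. The sketch below reproduces that arithmetic from the Time Old and Time New columns. The function and constant names (`relative_change`, `overall_geomean_change`, `kTimeOld`, `kTimeNew`) are hypothetical and are not part of Eigen or of the benchmark tooling.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// "Time Old" and "Time New" values copied from the six benchmark rows above,
// in table order (CalcMat float, _Float16, Eigen::half, then CalcVec likewise).
const std::vector<double> kTimeOld = {58276392, 642506339, 92511115,
                                      5198157,  10133324,  45337937};
const std::vector<double> kTimeNew = {58039442, 647214446, 63182101,
                                      5209469,  10159111,  10156952};

// Relative change of one measurement: negative means the new run is faster.
double relative_change(double old_time, double new_time) {
  return (new_time - old_time) / old_time;
}

// Geometric mean of the new/old ratios, minus one.
// Assumes both vectors are non-empty, equally sized, and strictly positive.
double overall_geomean_change(const std::vector<double>& old_times,
                              const std::vector<double>& new_times) {
  double log_sum = 0.0;
  for (std::size_t i = 0; i < old_times.size(); ++i) {
    log_sum += std::log(new_times[i] / old_times[i]);
  }
  return std::exp(log_sum / static_cast<double>(old_times.size())) - 1.0;
}
```

Under these assumptions, `relative_change(kTimeOld[2], kTimeNew[2])` rounds to the -0.3170 reported for `BM_CalcMat<Eigen::half>`, and `overall_geomean_change(kTimeOld, kTimeNew)` rounds to the -0.2677 in the last row.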
Fixes #2910.
2025-03-18 10:46:32 -07:00
Markus Vieth
0259a52b0e
Use more .noalias()
2025-03-17 19:41:00 +01:00
Charles Schlosser
10e62ccd22
Fix x86 complex vectorized fma
2025-03-12 17:06:32 +00:00
Rasmus Munk Larsen
21223f6bb6
Fix addition of different enum types.
2025-03-07 22:18:00 +00:00
Kevin
43810fc1be
Fix extra semicolon in DeviceWrapper
2025-03-07 01:07:23 +00:00
Charles Schlosser
d28041ed5a
refactor AssignmentFunctors.h, unify with existing scalar_op
2025-03-06 01:28:39 +00:00
Antonio Sánchez
be5147b090
Fix STL feature detection for c++20.
2025-02-28 19:52:37 +00:00
Antonio Sánchez
d79bac0d3c
Fix boolean scatter and random generation for tensors.
2025-02-25 21:37:09 +00:00
Rasmus Munk Larsen
72adf891d5
Slightly simplify ForkJoin code, and make sure the test is actually run.
2025-02-25 17:22:43 +00:00
Markus Vieth
bddaa99e15
Fix bitwise operation error when compiling as C++26
2025-02-23 02:30:55 +00:00
Tyler Veness
0ae7b59018
Make assignment constexpr
2025-02-21 18:16:46 +00:00
Charles Schlosser
4dda5b927a
fix Warray-bounds in inner product
2025-02-20 22:40:55 +00:00
Charles Schlosser
151f6127df
Fix Warray-bounds warning for fixed-size assignments
2025-02-18 19:23:14 +00:00
C. Antonio Sanchez
1d8b82b074
Fix power builds for no VSX and no POWER8.
2025-02-15 13:56:47 -08:00
Charles Schlosser
eb3f9f443d
refactor AssignmentEvaluator
2025-02-15 00:39:41 +00:00
Antonio Sanchez
22cd7307dd
Remove assumption of std::complex for complex scalar types.
2025-02-12 15:44:32 -08:00
Antonio Sánchez
6b4881ad48
Eliminate type-punning UB in Eigen::half.
2025-02-12 21:12:33 +00:00
Antonio Sánchez
becefd59e2
Returns condition number of zero if matrix is not invertible.
2025-02-12 07:09:20 +00:00
Antonio Sánchez
809d266b49
Fix numerical issues with BiCGSTAB.
2025-02-11 19:41:59 +00:00
Antonio Sánchez
4c38131a16
Fix Android hardware_destructive_interference_size issue.
2025-02-05 23:53:55 +00:00
Antonio Sánchez
4c2611d27c
Update check for std::hardware_destructive_interference_size
2025-02-05 19:41:07 +00:00
Antonio Sanchez
74264c391a
Add missing return statements for ppc.
2025-02-05 08:12:27 -08:00
Antonio Sánchez
b73bb766a5
Increase max alignment to 256.
2025-02-04 20:06:28 +00:00
Antonio Sánchez
b1e74b1ccd
Fix all the doxygen warnings.
2025-02-01 00:00:31 +00:00
Johannes Zipfel
2926b2e0a9
added functions to fetch L and U Factors from IncompleteLUT
2025-01-31 18:32:38 +00:00
William Kong
b6849f675d
Change the midpoint chosen in Eigen::ForkJoinScheduler.
2025-01-30 20:21:30 +00:00
William Kong
1b2e84e55a
Fix minor typos in ForkJoin.h
2025-01-29 20:12:04 +00:00
Tyler Veness
872c179f58
Fix UTF-8 encoding errors introduced by #1801
2025-01-28 16:52:46 -08:00
Rasmus Munk Larsen
2a35a917be
Fix syntax error in NonBlockingThreadPool.h
2025-01-28 18:34:31 +00:00
Charles Schlosser
a056b93114
improve Simplicial Cholesky analyzePattern
2025-01-28 17:53:43 +00:00
William Kong
5d866a7a78
Fix potential data race on spin_count_ NonBlockingThreadPool member variable
2025-01-28 17:22:15 +00:00
William Kong
bc67025ba7
Clean up and fix the documentation of ForkJoin.h
2025-01-27 23:12:17 +00:00
Antonio Sánchez
dc1126e762
Fix threadpool for c++14.
2025-01-27 21:57:23 +00:00
Rasmus Munk Larsen
cd511a09aa
Fix initialization order and remove unused variables in NonBlockingThreadPool.h.
2025-01-27 19:35:49 +00:00
William Kong
f9705adabb
Fix typo introduced in the refactor of NonBlockingThreadPool
2025-01-25 17:13:24 +00:00
William Kong
4a6ac97d13
Add a ForkJoin-based ParallelFor algorithm to the ThreadPool module
2025-01-24 22:12:05 +00:00
Pengzhou0810
e986838464
Add LoongArch64 architecture LSX support (build/test).
2025-01-20 18:37:44 +00:00
Markus Vieth
c486af5ad3
Change Eigen::aligned_allocator to not inherit from std::allocator
2025-01-20 16:04:43 +00:00
Antonio Sánchez
abac563f5d
Update documentation to clarify cross product for complex numbers.
2025-01-16 00:52:40 +00:00
Antonio Sánchez
ad13df7ea4
Fix std::fill_n reference.
2025-01-14 00:43:00 +00:00
Frédéric Simonis
9836e8d035
Fix read of uninitialized threshold in SparseQR
2025-01-08 23:40:58 +00:00
xsjk
7bb8c58e7c
Fix the missing CUDA device qualifier
2024-12-28 15:17:55 +00:00
Joerg Buchwald
24e0c2a125
use omp_get_max_threads if setNbThreads is not set
2024-12-20 21:16:15 +00:00
Jordan Rupprecht
a32db43966
Add missing #include <new>
2024-12-19 11:06:08 +00:00
Charles Schlosser
c01ff45312
Enable fill_n and memset optimizations for construction and assignment
2024-12-14 14:25:04 +00:00
Charles Schlosser
4a9e32ae0b
matrix equality operator
2024-12-10 12:40:39 +00:00
Charles Schlosser
5e8916050b
move constructor / move assignment doc strings
2024-12-04 17:42:20 +00:00
Charles Schlosser
41e46ed243
fix IOFormat alignment
2024-12-04 01:13:48 +00:00
Charles Schlosser
a0d32e40d9
fix map fill logic
2024-11-30 13:39:02 +00:00
Charles Schlosser
d34b100c13
Fix UB in setZero
2024-11-27 19:32:14 +00:00