eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	2f6ddaa25c	Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x. Benchmark numbers for the logical and of two NxN tensors:	2020-04-20 20:16:28 +00:00
		name                                     old time/op    new time/op    delta
		BM_booleanAnd_1T/3 [using 1 threads]     14.6ns ± 0%    14.4ns ± 0%    -0.96%
		BM_booleanAnd_1T/4 [using 1 threads]     20.5ns ±12%    9.0ns ± 0%     -56.07%
		BM_booleanAnd_1T/7 [using 1 threads]     41.7ns ± 0%    10.5ns ± 0%    -74.87%
		BM_booleanAnd_1T/8 [using 1 threads]     52.1ns ± 0%    10.1ns ± 0%    -80.59%
		BM_booleanAnd_1T/10 [using 1 threads]    76.3ns ± 0%    13.8ns ± 0%    -81.87%
		BM_booleanAnd_1T/15 [using 1 threads]    167ns ± 0%     16ns ± 0%      -90.45%
		BM_booleanAnd_1T/16 [using 1 threads]    188ns ± 0%     16ns ± 0%      -91.57%
		BM_booleanAnd_1T/31 [using 1 threads]    667ns ± 0%     34ns ± 0%      -94.83%
		BM_booleanAnd_1T/32 [using 1 threads]    710ns ± 0%     35ns ± 0%      -95.01%
		BM_booleanAnd_1T/64 [using 1 threads]    2.80µs ± 0%    0.11µs ± 0%    -95.93%
		BM_booleanAnd_1T/128 [using 1 threads]   11.2µs ± 0%    0.4µs ± 0%     -96.11%
		BM_booleanAnd_1T/256 [using 1 threads]   44.6µs ± 0%    2.5µs ± 0%     -94.31%
		BM_booleanAnd_1T/512 [using 1 threads]   178µs ± 0%     10µs ± 0%      -94.35%
		BM_booleanAnd_1T/1k [using 1 threads]    717µs ± 0%     78µs ± 1%      -89.07%
		BM_booleanAnd_1T/2k [using 1 threads]    2.87ms ± 0%    0.31ms ± 1%    -89.08%
		BM_booleanAnd_1T/4k [using 1 threads]    11.7ms ± 0%    1.9ms ± 4%     -83.55%
		BM_booleanAnd_1T/10k [using 1 threads]   70.3ms ± 0%    17.2ms ± 4%    -75.48%
Aaron Franke	5c22c7a7de	Make file formatting comply with POSIX and Unix standards: UTF-8, LF, no BOM, and newlines at the end of files	2020-03-23 18:09:02 +00:00
Srinivas Vasudevan	f6c6de5d63	Ensure Igamma does not NaN or Inf for large values.	2020-01-14 21:32:48 +00:00
Srinivas Vasudevan	2e099e8d8f	Added special_packetmath test and tweaked bounds on tests. Refactor shared packetmath code to header file. (Squashed from PR !38)	2020-01-11 10:31:21 +00:00
Christoph Hertzberg	1e9664b147	Bug #1796 : Make matrix squareroot usable for Map and Ref types	2019-12-20 18:10:22 +01:00
Christoph Hertzberg	c21771ac04	Use double-braces initialization (as everywhere else in the test-suite).	2019-12-19 19:20:48 +01:00
Eugene Zhulenev	ae07801dd8	Tensor block evaluation cost model	2019-12-18 20:07:00 +00:00
Eugene Zhulenev	1c879eb010	Remove V2 suffix from TensorBlock	2019-12-10 15:40:23 -08:00
Eugene Zhulenev	dbca11e880	Remove TensorBlock.h and old TensorBlock/BlockMapper	2019-12-10 14:31:44 -08:00
Janek Kozicki	11d6465326	fix AlignedVector3 inconsistent interface with other Vector classes, default constructor and operator- were missing.	2019-12-06 21:07:39 +01:00
Rasmus Munk Larsen	366cf005b0	Add missing initialization in cxx11_tensor_trace.cpp.	2019-12-04 23:56:37 +00:00
Mehdi Goli	00f32752f7	[SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. * Unifying all loadLocalTile from lhs and rhs to an extract_block function. * Adding get_tensor operation which was missing in TensorContractionMapper. * Adding the -D method missing from cmake for Disable_Skinny Contraction operation. * Wrapping all the indices in TensorScanSycl into Scan parameter struct. * Fixing typo in Device SYCL * Unifying load to private register for tall/skinny no shared * Unifying load to vector tile for tensor-vector/vector-tensor operation * Removing all the LHS/RHS class for extracting data from global * Removing Outputfunction from TensorContractionSkinnyNoshared. * Combining the local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining General Tensor-Vector and VectorTensor contraction into one kernel. * Making double buffering optional for Tensor contraction when the local memory version is used. * Modifying benchmark to accept custom Reduction Sizes * Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host * Adding Test for SYCL * Modifying SYCL CMake	2019-11-28 10:08:54 +00:00
Hans Johnson	6fb3e5f176	STYLE: Remove CMake-language block-end command arguments. Ancient versions of CMake required else(), endif(), and similar block termination commands to have arguments matching the command starting the block. This is no longer the preferred style.	2019-10-31 11:36:27 -05:00
Gael Guennebaud	b9837ca9ae	bug #1281 : fix AutoDiffScalar's make_coherent for nested expression of constant ADs.	2019-11-14 14:58:08 +01:00
Eugene Zhulenev	13c3327f5c	Remove legacy block evaluation support	2019-11-12 10:12:28 -08:00
Rasmus Munk Larsen	ebf04fb3e8	Fix data race in cxx11_tensor_notification test.	2019-11-08 17:44:50 -08:00
Rasmus Munk Larsen	97c0c5d485	Add block evaluation V2 to TensorAsyncExecutor. Add async evaluation to a number of ops.	2019-10-22 12:42:44 -07:00
Rasmus Munk Larsen	668ab3fc47	Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers.	2019-10-18 16:42:00 -07:00
Eugene Zhulenev	0d2a14ce11	Cleanup Tensor block destination and materialized block storage allocation	2019-10-16 17:14:37 -07:00
Eugene Zhulenev	02431cbe71	TensorBroadcasting support for random/uniform blocks	2019-10-16 13:26:28 -07:00
Eugene Zhulenev	d380c23b2c	Block evaluation for TensorGenerator/TensorReverse/TensorShuffling	2019-10-14 14:31:59 -07:00
Eugene Zhulenev	a411e9f344	Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op	2019-10-10 10:56:58 -07:00
Eugene Zhulenev	33e1746139	Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing	2019-10-09 12:45:31 -07:00
Gael Guennebaud	f0a4642bab	Implement c++03 compatible fix for changeset `7a43af1a33`	2019-10-09 16:00:57 +02:00
Gael Guennebaud	7a43af1a33	Fix compilation of FFTW unit test	2019-10-08 08:58:35 +02:00
Eugene Zhulenev	f74ab8cb8d	Add block evaluation to TensorEvalTo and fix few small bugs	2019-10-07 15:34:26 -07:00
Eugene Zhulenev	98bdd7252e	Fix compilation warnings and errors with clang in TensorBlockV2 code and tests	2019-10-04 10:15:33 -07:00
Eugene Zhulenev	60ae24ee1a	Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect	2019-10-02 12:44:06 -07:00
Eugene Zhulenev	7c8bc0d928	Fix cxx11_tensor_block_io test	2019-09-25 11:48:11 -07:00
Eugene Zhulenev	71d5bedf72	Fix compilation warnings and errors with clang in TensorBlockV2	2019-09-25 11:25:22 -07:00
Eugene Zhulenev	c97b208468	Add new TensorBlock api implementation + tests	2019-09-24 15:17:35 -07:00
Eugene Zhulenev	ef9dfee7bd	Tensor block evaluation V2 support for unary/binary/broadcasting	2019-09-24 12:52:45 -07:00
Rasmus Munk Larsen	1d5af0693c	Add support for asynchronous evaluation of tensor casting expressions.	2019-09-19 13:54:49 -07:00
Srinivas Vasudevan	df0816b71f	Merging eigen/eigen.	2019-09-16 19:33:29 -04:00
Srinivas Vasudevan	6e215cf109	Add Bessel functions to SpecialFunctions. - Split SpecialFunctions files into a separate BesselFunctions file. In particular add: - Modified Bessel functions of the second kind k0, k1, k0e, k1e - Bessel functions of the first kind j0, j1 - Bessel functions of the second kind y0, y1	2019-09-14 12:16:47 -04:00
Deven Desai	cdb377d0cb	Fix for the HIP build+test errors introduced by the ndtri support. The fixes needed are * adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs) * switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC * removing an errant "j" from a testcase (don't know how that made it in to begin with!)	2019-09-06 16:03:49 +00:00
Eugene Zhulenev	d918bd9a8b	Update ThreadLocal to use separate Initialize/Release callables	2019-09-10 16:13:32 -07:00
Eugene Zhulenev	e3dec4dcc1	ThreadLocal container that does not rely on thread local storage	2019-09-09 15:18:14 -07:00
Srinivas Vasudevan	e38dd48a27	PR 681: Add ndtri function, the inverse of the normal distribution function.	2019-08-12 19:26:29 -04:00
Eugene Zhulenev	47fefa235f	Allow move-only done callback in TensorAsyncDevice	2019-09-03 17:20:56 -07:00
Eugene Zhulenev	a8d264fa9c	Add test for const TensorMap underlying data mutation	2019-09-03 11:38:39 -07:00
Eugene Zhulenev	f0b36fb9a4	evalSubExprsIfNeededAsync + async TensorContractionThreadPool	2019-08-30 15:13:38 -07:00
Eugene Zhulenev	66665e7e76	Asynchronous expression evaluation with TensorAsyncDevice	2019-08-30 14:49:40 -07:00
Eugene Zhulenev	bc40d4522c	Const correctness in TensorMap<const Tensor<T, ...>> expressions	2019-08-28 17:46:05 -07:00
Eugene Zhulenev	071311821e	Remove XSMM support from Tensor module	2019-08-19 11:44:25 -07:00
Rasmus Munk Larsen	facc4e4536	Disable tests for contraction with output kernels when using libxsmm, which does not support this.	2019-08-07 14:11:15 -07:00
Eugene Zhulenev	6e7c76481a	Merge with Eigen head	2019-06-28 11:22:46 -07:00
Eugene Zhulenev	878845cb25	Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred	2019-06-28 11:13:44 -07:00
Mehdi Goli	7d08fa805a	[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. * Abstracting the pointer type so that both SYCL memory and pointer can be captured. * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class. * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node. * Adding SYCL macro for controlling loop unrolling. * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.	2019-06-28 10:08:23 +01:00
tra	b4c49bf00e	Minor build improvements * Allow specifying multiple GPU architectures. E.g.: cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70" * Pass CUDA SDK path to clang. Without it, clang will default to /usr/local/cuda, which may not be the right location if cmake was invoked with -DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path	2019-05-31 14:08:34 -07:00

1115 Commits