eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	f6c51d9209	Fix missing header inclusion and colliding definitions for half type casting, which broke build with -march=native on Haswell/Skylake.	2019-08-30 14:03:29 -07:00
Eugene Zhulenev	bc40d4522c	Const correctness in TensorMap<const Tensor<T, ...>> expressions	2019-08-28 17:46:05 -07:00
Rasmus Munk Larsen	1187bb65ad	Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf.	2019-08-28 12:20:21 -07:00
Eugene Zhulenev	6e77f9bef3	Remove shadow warnings in TensorDeviceThreadPool	2019-08-28 10:32:19 -07:00
Rasmus Munk Larsen	9aba527405	Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.	2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen	b021cdea6d	Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.	2019-08-27 11:30:31 -07:00
Rasmus Larsen	84fefdf321	Merged in ezhulenev/eigen-01 (pull request PR-683) Asynchronous parallelFor in Eigen ThreadPoolDevice	2019-08-26 21:49:17 +00:00
maratek	8b5ab0e4dd	Fix get_random_seed on Native Client Newlib in Native Client SDK does not provide ::random function. Implement get_random_seed for NaCl using ::rand, similarly to Windows version.	2019-08-23 15:25:56 -07:00
Eugene Zhulenev	6901788013	Asynchronous parallelFor in Eigen ThreadPoolDevice	2019-08-22 10:50:51 -07:00
Christoph Hertzberg	2fb24384c9	Merged in jaopaulolc/eigen (pull request PR-679) Fixes for Altivec/VSX and compilation with clang on PowerPC	2019-08-22 15:57:33 +00:00
Rasmus Larsen	57f6b62597	Merged in rmlarsen/eigen (pull request PR-680) Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.	2019-08-22 00:25:29 +00:00
Eugene Zhulenev	071311821e	Remove XSMM support from Tensor module	2019-08-19 11:44:25 -07:00
João P. L. de Carvalho	5ac7984ffa	Fix debug macros in p{load,store}u	2019-08-14 11:59:12 -06:00
João P. L. de Carvalho	db9147ae40	Add missing pcmp_XX methods for double/Packet2d This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.	2019-08-14 10:37:39 -06:00
Rasmus Munk Larsen	a3298b22ec	Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments. Depending on instruction set, significant speedups are observed for the vectorized path: log1p wall time is reduced 60-93% (2.5x - 15x speedup) expm1 wall time is reduced 0-85% (1x - 7x speedup) The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly. Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM	2019-08-12 13:53:28 -07:00
João P. L. de Carvalho	787f6ef025	Fix packed load/store for PowerPC's VSX The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts. For double/Packet2d, vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when cast to float/Packet4f. Silencing some weird warnings with throw by some GCC versions. Such warnings are not thrown by Clang.	2019-08-09 16:02:55 -06:00
João P. L. de Carvalho	4d29aa0294	Fix offset argument of ploadu/pstoreu for Altivec If no offset is given, then it should be zero. Also passes full address to vec_vsx_ld/st builtins. Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT. Removes unnecessary casts.	2019-08-09 15:59:26 -06:00
João P. L. de Carvalho	66d073c38e	bug #1718 : Add cast to successfully compile with clang on PowerPC Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h	2019-08-09 15:56:26 -06:00
Rasmus Munk Larsen	d55d392e7b	Fix bugs in log1p and expm1 where repeated using statements would clobber each other. Add specializations for complex types since std::log1p and std::expm1 do not support complex.	2019-08-08 16:27:32 -07:00
Rasmus Munk Larsen	85928e5f47	Guard against repeated definition of EIGEN_MPL2_ONLY	2019-08-07 14:19:00 -07:00
Rasmus Munk Larsen	facc4e4536	Disable tests for contraction with output kernels when using libxsmm, which does not support this.	2019-08-07 14:11:15 -07:00
Rasmus Munk Larsen	eab7e52db2	[Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX. A few benchmark numbers: name old time/op new time/op delta BM_Xent_16_10000_cpu 448µs ± 3% 389µs ± 2% -13.21% (p=0.008 n=5+5) BM_Xent_32_10000_cpu 575µs ± 6% 454µs ± 3% -21.00% (p=0.008 n=5+5) BM_Xent_64_10000_cpu 933µs ± 4% 712µs ± 1% -23.71% (p=0.008 n=5+5)	2019-08-07 12:57:42 -07:00
Rasmus Munk Larsen	0987126165	Clean up unnecessary namespace specifiers in TensorBlock.h.	2019-08-07 12:12:52 -07:00
Gael Guennebaud	0050644b23	Fix doc regarding alignment and c++17	2019-08-04 01:09:41 +02:00
Rasmus Munk Larsen	e2999d4c38	Fix performance regressions due to https://bitbucket.org/eigen/eigen/pull-requests/662 . The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU: Benchmark Time(ns) CPU(ns) Iterations ---------------------------------------------------------------------- BM_Multinomial_gpu_1_100000_4 128173 231326 2922 1.610G items/s VS Benchmark Time(ns) CPU(ns) Iterations ---------------------------------------------------------------------- BM_Multinomial_gpu_1_100000_4 146683 246914 2719 1.509G items/s	2019-08-02 11:18:13 -07:00
Kyle Vedder	f22b7283a3	Added leading asterisk for Doxygen to consume as it was removing asterisk intended to be part of the code.	2019-07-18 18:12:14 +00:00
Michael Grupp	6e17491f45	Fix typo in Umeyama method documentation	2019-07-17 11:20:41 +00:00
Christoph Hertzberg	e0f5a2a456	Remove {} accidentally added in previous commit	2019-07-18 20:22:17 +02:00
Christoph Hertzberg	ea6d7eb32f	Move variadic constructors outside `#ifndef EIGEN_PARSED_BY_DOXYGEN` block, to make it actually appear in the generated documentation.	2019-07-12 19:46:37 +02:00
Christoph Hertzberg	9237883ff1	Escape \# inside doxygen docu	2019-07-12 19:45:13 +02:00
Christoph Hertzberg	c2671e5315	Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING Also, document LinSpaced only where it is implemented	2019-07-12 19:43:32 +02:00
Eugene Zhulenev	3cd148f983	Fix expression evaluation heuristic for TensorSliceOp	2019-07-09 12:10:26 -07:00
Rasmus Munk Larsen	23b958818e	Fix compiler for unsigned integers.	2019-07-09 11:18:25 -07:00
Eugene Zhulenev	6083014594	Add outer/inner chipping optimization for chipping dimension specified at runtime	2019-07-03 11:35:25 -07:00
Deven Desai	7eb2e0a95b	adding the EIGEN_DEVICE_FUNC attribute to the constCast routine. Not having this attribute results in the following failures in the `--config=rocm` TF build. ``` In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20: In file included from ./tensorflow/core/framework/register_types.h:20: In file included from ./tensorflow/core/framework/numeric_types.h:20: In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1: In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140: external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data' typename Storage::Type result = constCast(m_impl.data()); ^ external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data' external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::Gpu\ Device>::data' requested here return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data()); ``` Adding the EIGEN_DEVICE_FUNC attribute resolves those errors	2019-07-02 20:02:46 +00:00
Gael Guennebaud	ef8aca6a89	Merged in codeplaysoftware/eigen (pull request PR-667) [SYCL] : Approved-by: Gael Guennebaud <g.gael@free.fr> Approved-by: Rasmus Larsen <rmlarsen@google.com>	2019-07-02 12:45:23 +00:00
Eugene Zhulenev	4ac93f8edc	Allocate non-const scalar buffer for block evaluation with DefaultDevice	2019-07-01 10:55:19 -07:00
Mehdi Goli	9ea490c82c	[SYCL] : * Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`. * Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro. * Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`. * Fixing the SYCL device macro in SpecialFunctionsImpl.h.	2019-07-01 16:27:28 +01:00
Eugene Zhulenev	81a03bec75	Fix TensorReverse on GPU with m_stride[i]==0	2019-06-28 15:50:39 -07:00
Rasmus Munk Larsen	8053eeb51e	Fix CUDA compilation error for pselect<half>.	2019-06-28 12:07:29 -07:00
Rasmus Munk Larsen	74a9dd1102	Fix preprocessor condition to only generate a warning when calling eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit.	2019-06-28 11:56:21 -07:00
Rasmus Munk Larsen	70d4020ad9	Remove comma causing warning in c++03 mode.	2019-06-28 11:39:45 -07:00
Eugene Zhulenev	6e7c76481a	Merge with Eigen head	2019-06-28 11:22:46 -07:00
Eugene Zhulenev	878845cb25	Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred	2019-06-28 11:13:44 -07:00
Rasmus Munk Larsen	1f61aee5ca	[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. * Abstracting the pointer type so that both SYCL memory and pointer can be captured. * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class. * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node. * Adding SYCL macro for controlling loop unrolling. * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.	2019-06-28 10:11:56 -07:00
Mehdi Goli	7d08fa805a	[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. * Abstracting the pointer type so that both SYCL memory and pointer can be captured. * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class. * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node. * Adding SYCL macro for controlling loop unrolling. * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.	2019-06-28 10:08:23 +01:00
Mehdi Goli	16a56b2ddd	[SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL. * Adding SYCL memory model * Enabling/Disabling SYCL backend in Core * Supporting Vectorization	2019-06-27 12:25:09 +01:00
Christoph Hertzberg	adec097c61	Remove extra comma (causes warnings in C++03)	2019-06-26 16:14:28 +02:00
Eugene Zhulenev	229db81572	Optimize evaluation strategy for TensorSlicingOp and TensorChippingOp	2019-06-25 15:41:37 -07:00
Deven Desai	ba506d5bd2	fix for a ROCm/HIP-specific compile error introduced by a recent commit.	2019-06-22 00:06:05 +00:00

10647 Commits