eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Mehdi Goli	0b24e1cb5c	[SYCL] Adding the SYCL memory model. The SYCL memory model provides : * an interface for SYCL buffers to behave as a non-dereferenceable pointer * an interface for placeholder accessor to behave like a pointer on both host and device	2019-07-01 16:02:30 +01:00
Eugene Zhulenev	81a03bec75	Fix TensorReverse on GPU with m_stride[i]==0	2019-06-28 15:50:39 -07:00
Rasmus Munk Larsen	8053eeb51e	Fix CUDA compilation error for pselect<half>.	2019-06-28 12:07:29 -07:00
Rasmus Munk Larsen	74a9dd1102	Fix preprocessor condition to only generate a warning when calling eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit.	2019-06-28 11:56:21 -07:00
Rasmus Munk Larsen	70d4020ad9	Remove comma causing warning in c++03 mode.	2019-06-28 11:39:45 -07:00
Eugene Zhulenev	6e7c76481a	Merge with Eigen head	2019-06-28 11:22:46 -07:00
Eugene Zhulenev	878845cb25	Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred	2019-06-28 11:13:44 -07:00
Rasmus Munk Larsen	1f61aee5ca	[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. * Abstracting the pointer type so that both SYCL memory and pointer can be captured. * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class. * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node. * Adding SYCL macro for controlling loop unrolling. * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.	2019-06-28 10:11:56 -07:00
Mehdi Goli	7d08fa805a	[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. * Abstracting the pointer type so that both SYCL memory and pointer can be captured. * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class. * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node. * Adding SYCL macro for controlling loop unrolling. * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.	2019-06-28 10:08:23 +01:00
Mehdi Goli	16a56b2ddd	[SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL. * Adding SYCL memory model * Enabling/Disabling SYCL backend in Core * Supporting Vectorization	2019-06-27 12:25:09 +01:00
Christoph Hertzberg	adec097c61	Remove extra comma (causes warnings in C++03)	2019-06-26 16:14:28 +02:00
Eugene Zhulenev	229db81572	Optimize evaluation strategy for TensorSlicingOp and TensorChippingOp	2019-06-25 15:41:37 -07:00
Deven Desai	ba506d5bd2	Fix for a ROCm/HIP-specific compile error introduced by a recent commit.	2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen	c9394d7a0e	Remove extra "one" in comment.	2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen	b8f8dac4eb	Update comment as suggested by tra@google.com.	2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen	e5e63c2cad	Fix grammar.	2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen	302a404b7e	Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause.	2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen	b5237f53b1	Fix CUDA build on Mac.	2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen	988f24b730	Various fixes for packet ops. 1. Fix buggy pcmp_eq and unit test for half types. 2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types. 3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.	2019-06-20 11:47:49 -07:00
Christoph Hertzberg	e0be7f30e1	bug #1724 : Mask buggy warnings with g++-7 (grafted from `427f2f66d6` )	2019-06-14 14:57:46 +02:00
Rasmus Munk Larsen	6d432eae5d	Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off.	2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen	f715f6e816	Add workaround for choosing the right include files with FP16C support with clang.	2019-06-05 13:36:37 -07:00
Rasmus Larsen	c1b0aea653	Merged in Artem-B/eigen (pull request PR-654) Minor build improvements Approved-by: Rasmus Larsen <rmlarsen@google.com>	2019-05-31 22:27:04 +00:00
Rasmus Munk Larsen	b08527b0c1	Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures.	2019-05-31 15:26:06 -07:00
tra	b4c49bf00e	Minor build improvements * Allow specifying multiple GPU architectures. E.g.: cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70" * Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda which may not be the right location, if cmake was invoked with -DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path	2019-05-31 14:08:34 -07:00
Christoph Hertzberg	5614400581	digits10() needs to return an integer Problem reported on https://stackoverflow.com/questions/56395899	2019-05-31 15:45:41 +02:00
Rasmus Larsen	36e0a2b93f	Merged in deven-amd/eigen-hip-fix-190524 (pull request PR-649) fix for HIP build errors that were introduced by a commit earlier this week	2019-05-24 16:05:31 +00:00
Deven Desai	2c38930161	fix for HIP build errors that were introduced by a commit earlier this week	2019-05-24 14:25:32 +00:00
Gustavo Lima Chaves	56bc4974fb	GEMV: remove double declaration of constant. That was hurting users with compilers that would object to proceed with that: """ ./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow] LhsPacketSize = Traits::LhsPacketSize, ^ ./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here static const Index LhsPacketSize = Traits::LhsPacketSize; """	2019-05-23 14:50:29 -07:00
Christoph Hertzberg	ac21a08c13	Cast Index to RealScalar This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types). Reported by Peter Anderson-Sprecher via eMail	2019-05-23 15:31:12 +02:00
Rasmus Munk Larsen	3eb5ad0ed0	Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177 and are part of LLVM 3.8.0.	2019-05-20 17:19:20 -07:00
Rasmus Larsen	e92486b8c3	Merged in rmlarsen/eigen (pull request PR-643) Make Eigen build with cuda 10 and clang. Approved-by: Justin Lebar <justin.lebar@gmail.com>	2019-05-20 17:02:39 +00:00
Rasmus Munk Larsen	fd595d42a7	Merge	2019-05-20 09:39:11 -07:00
Gael Guennebaud	cc7ecbb124	Merged in scramsby/eigen (pull request PR-646) Eigen: Fix MSVC C++17 language standard detection logic	2019-05-20 07:19:10 +00:00
Eugene Zhulenev	01654d97fa	Prevent potential division by zero in TensorExecutor	2019-05-17 14:02:25 -07:00
Rasmus Larsen	78d3015722	Merged in ezhulenev/eigen-01 (pull request PR-644) Always evaluate Tensor expressions with broadcasting via tiled evaluation code path	2019-05-17 19:44:25 +00:00
Rasmus Larsen	bf9cbed8d0	Merged in glchaves/eigen (pull request PR-635) Speed up GEMV on AVX-512 builds, just as done for GEBP previously. Approved-by: Rasmus Larsen <rmlarsen@google.com>	2019-05-17 19:40:50 +00:00
Eugene Zhulenev	96a276803c	Always evaluate Tensor expressions with broadcasting via tiled evaluation code path	2019-05-16 16:15:45 -07:00
Rasmus Munk Larsen	ab0a30e429	Make Eigen build with cuda 10 and clang.	2019-05-15 13:32:15 -07:00
Rasmus Munk Larsen	734a50dc60	Make Eigen build with cuda 10 and clang.	2019-05-15 13:32:15 -07:00
Rasmus Larsen	c8d8d5c0fc	Merged in rmlarsen/eigen_threadpool (pull request PR-640) Fix deadlocks in thread pool. Approved-by: Eugene Zhulenev <ezhulenev@google.com>	2019-05-13 20:04:35 +00:00
Christoph Hertzberg	5f32b79edc	Collapsed revision from PR-641 * SparseLU.h - corrected example, it didn't compile * Changed encoding back to UTF8	2019-05-13 19:02:30 +02:00
Anuj Rawat	ad372084f5	Removing unused API to fix compile error in TensorFlow due to AVX512VL, AVX512BW usage	2019-05-12 14:43:10 +00:00
Christoph Hertzberg	4ccd1ece92	bug #1707 : Fix deprecation warnings, or disable warnings when testing deprecated functions	2019-05-10 14:57:05 +02:00
Rasmus Munk Larsen	d3ef7cf03e	Fix build with clang on Windows.	2019-05-09 11:07:04 -07:00
Rasmus Munk Larsen	e5ac8cbd7a	A) fix deadlocks in thread pool caused by EventCount This fixed 2 deadlocks caused by sloppiness in the EventCount logic. Both most likely were introduced by cl/236729920 which includes the new EventCount algorithm: `01da8caf00` bug #1 (Prewait): Prewait must not consume existing signals. Consider the following scenario. There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty. Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait. Thread 2 checks the queue and now is going to call Prewait. Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded). Now thread 2 resumes and calls Prewait and takes away the signal. Thread 1 resumes and calls CommitWait, there are no pending signals anymore, so it blocks. As a result we have 2 tasks, but only 1 thread is running. bug #2 (CancelWait): CancelWait must not take away a signal if it's not sure that the signal was meant for this thread. When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to Dekker's algorithm): (a) the registered waiter notices presence of the new task and does not block (b) the signaler notices presence of the waiter and wakes it (c) both the waiter notices presence of the new task and the signaler notices presence of the waiter [the only case that must not be possible is that neither of them notices the other, because it would lead to a deadlock] CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a) because nobody queued a signal for us and we take away a signal meant for somebody else. Consider: Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait.
Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded). Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1). Now Thread 1 resumes and calls CommitWait, since there are no signals it blocks. As a result we have 2 tasks, but only 1 thread is running. Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2. This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running. B) fix deadlock in thread pool caused by RunQueue This fixed a deadlock caused by sloppiness in the RunQueue logic. Most likely this was introduced with the non-blocking thread pool. The deadlock only affects workloads that require parallelism. Most computational tasks don't require parallelism. PopBack must not fail spuriously. If it does, it can effectively lead to a single thread consuming several wake up signals. Consider 2 worker threads are blocked. External thread submits a task. One of the threads is woken. It tries to steal the task, but fails due to a spurious failure in PopBack (external thread submits another task and holds the lock). The thread executes blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait). Now external thread submits another task and signals EventCount again. The signal is consumed by the first thread again.
But now we have 2 tasks pending but only 1 worker thread running. It may be possible to fix this in a different way: make EventCount::CancelWait forward the wakeup signal to a blocked thread rather than consuming it. But this looks more complex and I am not 100% sure that it will fix the bug. It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.	2019-05-08 10:16:46 -07:00
Michael Tesch	c5019f722b	Use pade for matrix exponential also for complex values.	2019-05-08 17:04:55 +02:00
Eugene Zhulenev	45b40d91ca	Fix AVX512 & GCC 6.3 compilation	2019-05-07 16:44:55 -07:00
Christoph Hertzberg	e6667a7060	Fix stupid shadow-warnings (with old clang versions)	2019-05-07 18:32:19 +02:00
Christoph Hertzberg	e54dc24d62	Restore C++03 compatibility	2019-05-07 18:30:44 +02:00
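
The 988f24b730 entry above mentions negating half values "by XOR'ing with a sign bit mask". A minimal scalar sketch of that idea follows, assuming the IEEE 754 binary16 layout in which bit 15 is the sign bit. The name negate_half_bits is hypothetical and works on the raw 16-bit pattern; it is not Eigen's packet implementation, which may differ.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Eigen): negate a half-precision value
// given as its raw 16-bit pattern by flipping the sign bit.
// Assumes IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 mantissa bits.
inline std::uint16_t negate_half_bits(std::uint16_t bits) {
  const std::uint16_t kSignMask = 0x8000u;  // only bit 15 set
  // XOR toggles the sign and leaves exponent and mantissa untouched.
  return static_cast<std::uint16_t>(bits ^ kSignMask);
}
```

Because only the sign bit changes, the same operation applies to zeros, infinities, and NaN payloads without any branching, which is presumably why a mask-based version would suit a vectorized packet op.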


10608 Commits