Commit Graph

1400 Commits

Author SHA1 Message Date
Benoit Steiner
97605c7b27 New multithreaded contraction that doesn't rely on the thread pool to run the closures in the order in which they are enqueued. This is needed in order to switch to the new non-blocking thread pool, since that thread pool can execute closures in any order. 2016-05-13 17:11:29 -07:00
Benoit Steiner
c4fc8b70ec Removed unnecessary thread synchronization 2016-05-13 10:49:38 -07:00
Benoit Steiner
7aa3557d31 Fixed compilation errors triggered by old versions of gcc 2016-05-12 18:59:04 -07:00
Benoit Steiner
ae9688f313 Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel. 2016-05-12 12:06:51 -07:00
Benoit Steiner
2a54b70d45 Fixed potential race condition in the non-blocking thread pool 2016-05-12 11:45:48 -07:00
Benoit Steiner
a071629fec Replace implicit cast with an explicit one 2016-05-12 10:40:07 -07:00
Benoit Steiner
2f9401b061 Worked around compilation errors with older versions of gcc 2016-05-11 23:39:20 -07:00
Benoit Steiner
09653e1f82 Improved the portability of the tensor code 2016-05-11 23:29:09 -07:00
Benoit Steiner
b6a517c47d Added the ability to load fp16 using the texture path.
Improved the performance of some reductions on fp16
2016-05-11 21:26:48 -07:00
Christoph Hertzberg
1a1ce6ff61 Removed deprecated flag (which apparently was ignored anyway) 2016-05-11 23:05:37 +02:00
Christoph Hertzberg
2150f13d65 fixed some double-promotion and sign-compare warnings 2016-05-11 23:02:26 +02:00
Benoit Steiner
217d984abc Fixed a typo in my previous commit 2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48 Fix potential race condition in the CUDA reduction code. 2016-05-11 10:08:51 -07:00
Benoit Steiner
6a5717dc74 Explicitly initialize all the atomic variables. 2016-05-11 10:04:41 -07:00
Benoit Steiner
4ede059de1 Properly gate the use of half2. 2016-05-10 17:04:01 -07:00
Benoit Steiner
661e710092 Added support for fp16 to the sigmoid functor. 2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552 Small improvement to the full reduction of fp16 2016-05-10 11:58:18 -07:00
Benoit Steiner
4013b8feca Simplified the reduction code a little. 2016-05-10 09:40:42 -07:00
Benoit Steiner
4670d7d5ce Improved the performance of full reductions on GPU:
Before:
BM_fullReduction/10       200000      11751     8.51 MFlops/s
BM_fullReduction/80         5000     523385    12.23 MFlops/s
BM_fullReduction/640          50   36179326    11.32 MFlops/s
BM_fullReduction/4K            1 2173517195    11.50 MFlops/s

After:
BM_fullReduction/10       500000       5987    16.70 MFlops/s
BM_fullReduction/80       200000      10636   601.73 MFlops/s
BM_fullReduction/640       50000      58428  7010.31 MFlops/s
BM_fullReduction/4K         1000    2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58 Added the ability to use a scratch buffer in cuda kernels 2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2 Added a new parallelFor API to the thread pool device. 2016-05-09 10:45:12 -07:00
Benoit Steiner
dc7dbc2df7 Optimized the non-blocking thread pool:
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
* Directly pop from a non-empty queue when we are waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to re-discover the non-empty queue.
* Steal only 1 task from a remote queue instead of half of the tasks.
2016-05-09 10:17:17 -07:00
Benoit Steiner
c54ae65c83 Marked a few tensor operations as read only 2016-05-05 17:18:47 -07:00
Benoit Steiner
910e013506 Relaxed an assertion that was tighter than necessary. 2016-05-05 15:38:16 -07:00
Benoit Steiner
28d5572658 Fixed some incorrect assertions 2016-05-05 10:02:26 -07:00
Benoit Steiner
a4d6e8fef0 Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. 2016-05-05 09:25:55 -07:00
Benoit Steiner
f363e533aa Added tests for full contractions using thread pools and gpu devices.
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58 Updated the contraction code to ensure that full contractions return a tensor of rank 0 2016-05-05 08:37:47 -07:00
Christoph Hertzberg
dacb469bc9 Enable and fix -Wdouble-conversion warnings 2016-05-05 13:35:45 +02:00
Benoit Steiner
dd2b45feed Removed extraneous 'explicit' keywords 2016-05-04 16:57:52 -07:00
Benoit Steiner
968ec1c2ae Use numext::isfinite instead of std::isfinite 2016-05-03 19:56:40 -07:00
Benoit Steiner
aad9a04da4 Deleted superfluous explicit keyword. 2016-05-03 09:37:19 -07:00
Benoit Steiner
8a9228ed9b Fixed compilation error 2016-05-01 14:48:01 -07:00
Benoit Steiner
d6c9596fd8 Added missing accessors to fixed sized tensors 2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e Deleted trailing commas 2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2 Deleted useless trailing commas 2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb Deleted unnecessary trailing commas. 2016-04-29 18:33:46 -07:00
Benoit Steiner
f100d1494c Return the proper size (i.e. 1) for tensors of rank 0 2016-04-29 18:14:33 -07:00
Benoit Steiner
a8c0405cf5 Deleted unused default values for template parameters 2016-04-29 16:34:43 -07:00
Benoit Steiner
c07404f6a1 Restore Tensor support for non-C++11 compilers 2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021 Fixed include path 2016-04-29 15:11:09 -07:00
Gael Guennebaud
318e65e0ae Fix missing inclusion of Eigen/Core 2016-04-27 23:05:40 +02:00
Rasmus Munk Larsen
463738ccbe Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases. 2016-04-27 12:26:18 -07:00
Gael Guennebaud
3dddd34133 Refactor the unsupported CXX11/Core module to internal headers only. 2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46 Fixed the partial evaluation of non vectorizable tensor subexpressions 2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260 Refined the cost of the striding operation. 2016-04-25 09:16:08 -07:00
Benoit Steiner
4bbc97be5e Provide access to the base threadpool classes 2016-04-21 17:59:33 -07:00
Benoit Steiner
33adce5c3a Added the ability to switch to the new thread pool with a #define 2016-04-21 11:59:58 -07:00
Benoit Steiner
f670613e4b Fixed several compilation warnings 2016-04-21 11:03:02 -07:00
Benoit Steiner
2dde1b1028 Don't crash when attempting to reduce empty tensors. 2016-04-20 18:08:20 -07:00