Commit Graph

1821 Commits

Author SHA1 Message Date
Benoit Steiner
217d984abc Fixed a typo in my previous commit 2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48 Fix potential race condition in the CUDA reduction code. 2016-05-11 10:08:51 -07:00
Benoit Steiner
cbb14ed47e Added a few tests to validate the generation of random tensors on GPU. 2016-05-11 10:05:56 -07:00
Benoit Steiner
6a5717dc74 Explicitly initialize all the atomic variables. 2016-05-11 10:04:41 -07:00
Benoit Steiner
4ede059de1 Properly gate the use of half2. 2016-05-10 17:04:01 -07:00
Benoit Steiner
661e710092 Added support for fp16 to the sigmoid functor. 2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552 Small improvement to the full reduction of fp16 2016-05-10 11:58:18 -07:00
Benoit Steiner
6bf8273bc0 Added a test to validate the new non blocking thread pool 2016-05-10 10:49:34 -07:00
Benoit Steiner
4013b8feca Simplified the reduction code a little. 2016-05-10 09:40:42 -07:00
Benoit Steiner
75bd2bd32d Fixed compilation warning 2016-05-09 19:24:41 -07:00
Benoit Steiner
4670d7d5ce Improved the performance of full reductions on GPU:
Before:
BM_fullReduction/10       200000      11751     8.51 MFlops/s
BM_fullReduction/80         5000     523385    12.23 MFlops/s
BM_fullReduction/640          50   36179326    11.32 MFlops/s
BM_fullReduction/4K            1 2173517195    11.50 MFlops/s

After:
BM_fullReduction/10       500000       5987    16.70 MFlops/s
BM_fullReduction/80       200000      10636   601.73 MFlops/s
BM_fullReduction/640       50000      58428  7010.31 MFlops/s
BM_fullReduction/4K         1000    2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58 Added the ability to use a scratch buffer in cuda kernels 2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2 Added a new parallelFor api to the thread pool device. 2016-05-09 10:45:12 -07:00
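For context on ba95e43ea2, here is a minimal usage sketch of a parallelFor-style call on the thread pool device: the index range is split into sub-ranges that the pool processes in parallel. The signature assumed here (range size, per-coefficient TensorOpCost, functor taking a [first, last) range) is taken from the unsupported CXX11 Tensor headers and may differ in detail from the version added in this commit.

    // Sketch only: assumes EIGEN_USE_THREADS exposes ThreadPoolDevice and that
    // parallelFor(n, cost, f) invokes f on disjoint [first, last) sub-ranges.
    #define EIGEN_USE_THREADS
    #include <unsupported/Eigen/CXX11/Tensor>
    #include <vector>

    int main() {
      Eigen::ThreadPool pool(4);                 // 4 worker threads
      Eigen::ThreadPoolDevice device(&pool, 4);  // device backed by the pool

      std::vector<float> data(1 << 20, 1.0f);

      // Per-coefficient cost estimate: bytes loaded, bytes stored, compute cycles.
      Eigen::TensorOpCost cost(sizeof(float), sizeof(float), /*compute_cycles=*/1);

      // Double every element; each invocation handles one sub-range.
      device.parallelFor(static_cast<Eigen::Index>(data.size()), cost,
                         [&data](Eigen::Index first, Eigen::Index last) {
                           for (Eigen::Index i = first; i < last; ++i) data[i] *= 2.0f;
                         });
      return 0;
    }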
Benoit Steiner
dc7dbc2df7 Optimized the non blocking thread pool:
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
* Directly pop from a non-empty queue when we are waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to re-discover the non-empty queue.
* Steal only 1 task from a remote queue instead of half of the tasks.
2016-05-09 10:17:17 -07:00
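A self-contained sketch (not Eigen's actual code) of the first idea in dc7dbc2df7: walk the victim queues in a pseudo-random permutation by combining a random start with a stride that is coprime to the queue count, so every queue is visited exactly once per stealing round.

    #include <cstdio>
    #include <functional>
    #include <numeric>

    // Visit every queue index exactly once, in a pseudo-random order, by stepping
    // with a stride coprime to the number of queues. Illustrative only; the real
    // non-blocking thread pool differs in detail.
    void ForEachVictim(unsigned num_queues, unsigned rnd,
                       const std::function<void(unsigned)>& visit) {
      if (num_queues == 1) { visit(0); return; }
      unsigned start = rnd % num_queues;
      unsigned step = 1 + rnd % (num_queues - 1);
      while (std::gcd(step, num_queues) != 1) ++step;  // make the walk a permutation
      unsigned idx = start;
      for (unsigned i = 0; i < num_queues; ++i) {
        visit(idx);                      // e.g. try to steal one task from queue idx
        idx = (idx + step) % num_queues;
      }
    }

    int main() {
      ForEachVictim(7, 12345u, [](unsigned q) { std::printf("try queue %u\n", q); });
      return 0;
    }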
Benoit Steiner
691614bd2c Worked around a bug in nvcc on tegra x1 2016-05-07 13:28:53 -07:00
Benoit Steiner
c54ae65c83 Marked a few tensor operations as read only 2016-05-05 17:18:47 -07:00
Benoit Steiner
69a8a4e1f3 Added a test to validate full reduction on tensor of half floats 2016-05-05 16:52:50 -07:00
Benoit Steiner
678a17ba79 Made the testing of contractions on fp16 more robust 2016-05-05 16:36:39 -07:00
Benoit Steiner
e3d053e14e Refined the testing of log and exp on fp16 2016-05-05 16:24:15 -07:00
Benoit Steiner
9a48688d37 Further improved the testing of fp16 2016-05-05 15:58:05 -07:00
Benoit Steiner
910e013506 Relaxed an assertion that was tighter than necessary. 2016-05-05 15:38:16 -07:00
Benoit Steiner
28d5572658 Fixed some incorrect assertions 2016-05-05 10:02:26 -07:00
Benoit Steiner
2aba40d208 Avoid unnecessary type promotion 2016-05-05 09:26:57 -07:00
Benoit Steiner
a4d6e8fef0 Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. 2016-05-05 09:25:55 -07:00
Benoit Steiner
7875437ca0 Avoided unnecessary type promotion 2016-05-05 09:08:42 -07:00
Benoit Steiner
f363e533aa Added tests for full contractions using thread pools and gpu devices.
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58 Updated the contraction code to ensure that full contractions return a tensor of rank 0 2016-05-05 08:37:47 -07:00
Christoph Hertzberg
b300a84989 Fixed some signed/unsigned comparison warnings 2016-05-05 13:36:28 +02:00
Christoph Hertzberg
dacb469bc9 Enable and fix -Wdouble-conversion warnings 2016-05-05 13:35:45 +02:00
Benoit Steiner
62b710072e Reduced the memory footprint of the cxx11_tensor_image_patch test 2016-05-04 21:08:22 -07:00
Benoit Steiner
dd2b45feed Removed extraneous 'explicit' keywords 2016-05-04 16:57:52 -07:00
Benoit Steiner
968ec1c2ae Use numext::isfinite instead of std::isfinite 2016-05-03 19:56:40 -07:00
Benoit Steiner
2c5568a757 Added a test to validate the computation of exp and log on 16bit floats 2016-05-03 12:06:07 -07:00
Benoit Steiner
aad9a04da4 Deleted superfluous explicit keyword. 2016-05-03 09:37:19 -07:00
Benoit Steiner
8a9228ed9b Fixed compilation error 2016-05-01 14:48:01 -07:00
Benoit Steiner
d6c9596fd8 Added missing accessors to fixed sized tensors 2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e Deleted trailing commas 2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2 Deleted useless trailing commas 2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb Deleted unnecessary trailing commas. 2016-04-29 18:33:46 -07:00
Benoit Steiner
2b890ae618 Fixed compilation errors generated by clang 2016-04-29 18:30:40 -07:00
Benoit Steiner
d217217842 Added a few tests to ensure that the dimensions of rank 0 tensors are correctly computed 2016-04-29 18:15:34 -07:00
Benoit Steiner
f100d1494c Return the proper size (i.e. 1) for tensors of rank 0 2016-04-29 18:14:33 -07:00
Benoit Steiner
d14105f158 Made several tensor tests compatible with cxx03 2016-04-29 17:22:37 -07:00
Benoit Steiner
c0882ef4d9 Moved a number of tensor tests that don't require cxx11 to work properly outside the EIGEN_TEST_CXX11 test section 2016-04-29 17:13:51 -07:00
Benoit Steiner
9d1dbd1ec0 Fixed the cxx11_tensor_empty test to compile without requiring cxx11 support 2016-04-29 16:53:55 -07:00
Benoit Steiner
a8c0405cf5 Deleted unused default values for template parameters 2016-04-29 16:34:43 -07:00
Benoit Steiner
4f53178e62 Made a couple of tensor tests compile without requiring c++11 support. 2016-04-29 16:09:54 -07:00
Benoit Steiner
1131a984a6 Made the cxx11_tensor_forced_eval compile without c++11. 2016-04-29 15:48:59 -07:00
Benoit Steiner
c07404f6a1 Restore Tensor support for non c++11 compilers 2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021 Fixed include path 2016-04-29 15:11:09 -07:00
Benoit Steiner
a524a26fdc Fixed a few memory leaks 2016-04-28 18:55:53 -07:00
Gael Guennebaud
318e65e0ae Fix missing inclusion of Eigen/Core 2016-04-27 23:05:40 +02:00
Rasmus Munk Larsen
463738ccbe Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases. 2016-04-27 12:26:18 -07:00
Gael Guennebaud
3dddd34133 Refactor the unsupported CXX11/Core module to internal headers only. 2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46 Fixed the partial evaluation of non vectorizable tensor subexpressions 2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260 Refined the cost of the striding operation. 2016-04-25 09:16:08 -07:00
Benoit Steiner
4bbc97be5e Provide access to the base threadpool classes 2016-04-21 17:59:33 -07:00
Benoit Steiner
33adce5c3a Added the ability to switch to the new thread pool with a #define 2016-04-21 11:59:58 -07:00
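A hedged configuration sketch for 33adce5c3a: opting into the new pool via a preprocessor define before including the Tensor module. The macro name EIGEN_USE_NONBLOCKING_THREAD_POOL is an assumption here; check the ThreadPool headers of this revision for the actual switch.

    // Assumed macro name; the actual #define introduced in 33adce5c3a may differ.
    #define EIGEN_USE_NONBLOCKING_THREAD_POOL
    #define EIGEN_USE_THREADS
    #include <unsupported/Eigen/CXX11/Tensor>

    int main() {
      // With the define in place, the thread pool type is expected to resolve to
      // the new non-blocking implementation instead of the simple one.
      Eigen::ThreadPool pool(/*num_threads=*/8);
      return 0;
    }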
Benoit Steiner
f670613e4b Fixed several compilation warnings 2016-04-21 11:03:02 -07:00
Benoit Steiner
32ffce04fc Use EIGEN_THREAD_YIELD instead of std::this_thread::yield to make the code more portable. 2016-04-21 08:47:28 -07:00
Benoit Steiner
2dde1b1028 Don't crash when attempting to reduce empty tensors. 2016-04-20 18:08:20 -07:00
Benoit Steiner
a792cd357d Added more tests 2016-04-20 17:33:58 -07:00
Benoit Steiner
c7c2054bb5 Started to implement a portable way to yield. 2016-04-19 17:59:58 -07:00
Benoit Steiner
2b72163028 Implemented a more portable version of thread local variables 2016-04-19 15:56:02 -07:00
Benoit Steiner
04f954956d Fixed a few typos 2016-04-19 15:27:09 -07:00
Benoit Steiner
5b1106c56b Fixed a compilation error with nvcc 7. 2016-04-19 14:57:57 -07:00
Benoit Steiner
7129d998db Simplified the code that launches cuda kernels. 2016-04-19 14:55:21 -07:00
Benoit Steiner
b9ea40c30d Don't take the address of a kernel on CUDA devices that don't support this feature. 2016-04-19 14:35:11 -07:00
Benoit Steiner
884c075058 Use numext::ceil instead of std::ceil 2016-04-19 14:33:30 -07:00
Benoit Steiner
a278414d1b Avoid an unnecessary copy of the evaluator. 2016-04-19 13:54:28 -07:00
Benoit Steiner
f953c60705 Fixed 2 recent regression tests 2016-04-19 12:57:39 -07:00
Benoit Steiner
50968a0a3e Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors. 2016-04-19 11:53:58 -07:00
Benoit Steiner
84543c8be2 Worked around the lack of a rand_r function on windows systems 2016-04-17 19:29:27 -07:00
Benoit Steiner
5fbcfe5eb4 Worked around the lack of a rand_r function on windows systems 2016-04-17 18:42:31 -07:00
Benoit Steiner
c8e8f93d6c Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators. 2016-04-15 16:48:10 -07:00
Benoit Steiner
7cff898e0a Deleted unnecessary variable 2016-04-15 15:46:14 -07:00
Benoit Steiner
6c43c49e4a Fixed a few compilation warnings 2016-04-15 15:34:34 -07:00
Benoit Steiner
eb669f989f Merged in rmlarsen/eigen (pull request PR-178)
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
2016-04-15 14:53:15 -07:00
Rasmus Munk Larsen
3718bf654b Get rid of void* casting when calling EvalRange::run. 2016-04-15 12:51:33 -07:00
Benoit Steiner
40c9923a8a Fixed compilation errors with msvc 2016-04-15 11:27:52 -07:00
Benoit Steiner
a62e924656 Added ability to access the cache sizes from the tensor devices 2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426 Added support for exclusive or 2016-04-14 20:37:46 -07:00
Rasmus Munk Larsen
07ac4f7e02 Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. 2016-04-14 18:28:23 -07:00
Benoit Steiner
9624a1ea3d Added missing definition of PacketSize in the gpu evaluator of convolution 2016-04-14 17:16:58 -07:00
Benoit Steiner
6fbedf5a4e Merged in rmlarsen/eigen (pull request PR-177)
Eigen Tensor cost model part 1.
2016-04-14 17:13:19 -07:00
Benoit Steiner
bebb89acfa Enabled the new threadpool tests 2016-04-14 16:44:10 -07:00
Benoit Steiner
9c064b5a97 Cleanup 2016-04-14 16:41:31 -07:00
Benoit Steiner
1372156c41 Prepared the migration to the new non blocking thread pool 2016-04-14 16:16:42 -07:00
Rasmus Munk Larsen
aeb5494a0b Improvements to cost model. 2016-04-14 15:52:58 -07:00
Benoit Steiner
a8e8837ba7 Added tests for the non blocking thread pool 2016-04-14 15:23:49 -07:00
Benoit Steiner
78a51abc12 Added a more scalable non blocking thread pool 2016-04-14 15:23:10 -07:00
Rasmus Munk Larsen
d2e95492e7 Merge upstream updates. 2016-04-14 13:59:50 -07:00
Rasmus Munk Larsen
235e83aba6 Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions. 2016-04-14 13:57:35 -07:00
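To make 235e83aba6 concrete: a rough, illustrative sketch (not the actual Eigen::TensorOpCost class) of attributing a per-coefficient cost to each expression node and summing the costs recursively up the expression tree.

    #include <cstdio>

    // Illustrative per-coefficient cost record, loosely modeled on the idea of
    // bytes loaded/stored plus compute cycles; not Eigen's real TensorOpCost.
    struct OpCost {
      double bytes_loaded, bytes_stored, compute_cycles;
      OpCost operator+(const OpCost& o) const {
        return {bytes_loaded + o.bytes_loaded, bytes_stored + o.bytes_stored,
                compute_cycles + o.compute_cycles};
      }
      // Crude total "cycles per coefficient" estimate given a memory cost factor.
      double total(double cycles_per_byte) const {
        return compute_cycles + cycles_per_byte * (bytes_loaded + bytes_stored);
      }
    };

    int main() {
      // cost(a + b) = cost(load a) + cost(load b) + cost of one add per coefficient.
      OpCost load_a{4.0, 0.0, 0.0};  // 4 bytes loaded per float coefficient
      OpCost load_b{4.0, 0.0, 0.0};
      OpCost add{0.0, 0.0, 1.0};
      OpCost expr = load_a + load_b + add;
      std::printf("estimated cycles/coeff: %f\n", expr.total(0.25));
      return 0;
    }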
Benoit Steiner
5912ad877c Silenced a compilation warning 2016-04-14 11:40:14 -07:00
Benoit Steiner
2b6e3de02f Added tests to validate flooring and ceiling of fp16 2016-04-14 11:39:18 -07:00
Benoit Steiner
6f23e945f6 Added simple test for numext::sqrt and numext::pow on fp16 2016-04-14 10:32:52 -07:00
Benoit Steiner
72510c80e1 Added basic test for trigonometric functions on fp16 2016-04-14 10:27:24 -07:00
Benoit Steiner
c7167fee0e Added support for fp16 to the sigmoid function 2016-04-14 10:08:33 -07:00
Benoit Steiner
f6003f0873 Made the test msvc friendly 2016-04-14 09:47:26 -07:00
Gael Guennebaud
7d1391d049 Turn a convergence check into a warning 2016-04-13 22:50:54 +02:00