glibc

mirror of git://sourceware.org/git/glibc.git synced 2025-01-06 12:00:24 +08:00


Adhemerval Zanella Netto 34b9f8bc17 math: Improve fmod		2023-04-03 16:36:24 -03:00

This uses a new algorithm similar to the one proposed earlier [1]. With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers), the simplest implementation relies on the identity

    mx * 2^ex == 2 * mx * 2^(ex - 1)

and reduces one bit per step:

    while (ex > ey)
      {
        mx *= 2;
        --ex;
        mx %= my;
      }

With mx/my being the mantissas of double floating-point values, the argument reduction can advance by 11 bits per step (the width of uint64_t minus MANTISSA_WIDTH and the sign bit):

    while (ex > ey)
      {
        mx <<= 11;
        ex -= 11;
        mx %= my;
      }

The implementation uses builtin clz and ctz, along with shifts, to convert hx/hy back to doubles. Unlike the original patch, this patch assumes the modulo/divide operation is slow, so it uses multiplication by inverse values.

I see the following performance improvements using the fmod benchtests (only the 'mean' result is shown):

    Architecture     | Input           | master   | patch
    -----------------|-----------------|----------|---------
    x86_64 (Ryzen 9) | subnormals      | 19.1584  | 12.5049
    x86_64 (Ryzen 9) | normal          | 1016.51  | 296.939
    x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.0244
    aarch64 (N1)     | subnormal       | 11.153   | 6.81778
    aarch64 (N1)     | normal          | 528.649  | 155.62
    aarch64 (N1)     | close-exponents | 11.4517  | 8.21306

I also see similar improvements on arm-linux-gnueabihf when running on the N1 aarch64 chips, where it uses a lot of soft-fp implementations (for modulo, clz, ctz, and multiplication):

    Architecture     | Input           | master   | patch
    -----------------|-----------------|----------|---------
    armhf (N1)       | subnormal       | 15.908   | 15.1083
    armhf (N1)       | normal          | 837.525  | 244.833
    armhf (N1)       | close-exponents | 16.2111  | 21.8182

Instead of the math_private.h definitions, I used math_config.h, which is used by the newer math implementations.

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html

Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
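The two reduction loops described in the commit message can be tried out on plain integers. The following is only a minimal sketch under stated assumptions, not the code that landed in dbl-64: the function names `reduce_bitwise` and `reduce_chunked` are hypothetical, the inputs are assumed to be already decomposed into nonnegative integer mantissas below 2^53 with integer exponents, and sign handling, subnormals, NaN/infinity, the clz/ctz-based conversion back to double, and the multiplication-by-inverse trick are all left out. The chunked variant also clamps its last step so that it never shifts past ey, a detail the pseudocode in the message glosses over.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: return
   (mx * 2^(ex - ey)) mod my by doubling mx one bit per step.
   Assumes 0 < my < 2^53, so 2 * mx never overflows 64 bits.  */
uint64_t
reduce_bitwise (uint64_t mx, uint64_t my, int ex, int ey)
{
  assert (my != 0 && my < (UINT64_C (1) << 53));
  mx %= my;
  while (ex > ey)
    {
      mx *= 2;		/* mx < my < 2^53, so this fits easily.  */
      --ex;
      mx %= my;
    }
  return mx;
}

/* Hypothetical helper, same result as above, but consuming up to
   11 exponent bits per modulo.  Since mx < 2^53 after each
   reduction, mx << 11 still fits in a uint64_t.  */
uint64_t
reduce_chunked (uint64_t mx, uint64_t my, int ex, int ey)
{
  assert (my != 0 && my < (UINT64_C (1) << 53));
  int remaining = ex > ey ? ex - ey : 0;
  mx %= my;
  while (remaining > 0)
    {
      /* Clamp the final step so we stop exactly at ey.  */
      int step = remaining < 11 ? remaining : 11;
      mx <<= step;
      remaining -= step;
      mx %= my;
    }
  return mx;
}
```

In this model the returned value is the integer mantissa of the remainder at exponent ey. For example, with mx = 5, ex = 4, my = 3, ey = 0 (that is, x = 80 and y = 3), both helpers return 2. The chunked version performs roughly one modulo per 11 exponent bits instead of one per bit, which is consistent with the largest speedup in the table appearing for the "normal" inputs.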
..
dbl-64	math: Improve fmod	2023-04-03 16:36:24 -03:00
float128	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
flt-32	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
ldbl-64-128	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
ldbl-96	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
ldbl-128	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
ldbl-128ibm	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
ldbl-128ibm-compat	Move libc_freeres_ptrs and libc_subfreeres to hidden/weak functions	2023-03-27 13:57:55 -03:00
ldbl-opt	C2x scanf binary constant handling	2023-03-02 19:10:37 +00:00
soft-fp	math: Suppress -O0 warnings for soft-fp fsqrt [BZ #19444]	2023-01-11 17:50:51 -03:00
ieee754.h	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
k_standard.c
k_standardf.c	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
k_standardl.c	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
libm-alias-finite.h	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
Makefile
s_lib_version.c
s_matherr.c
s_signgam.c