math: Use log10p1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic log10p1f.
The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).
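
For readers unfamiliar with the function, the following is a minimal
reference sketch of what log10p1f computes, namely log10(1 + x) for a
binary32 input. It is not the CORE-MATH algorithm: it evaluates in
binary64 and rounds the result to binary32, so it is subject to double
rounding and is not guaranteed to be correctly rounded. The names
ref_log10p1f and to_float32 are hypothetical, and the error cases in the
comments assume the usual C domain-error and pole-error conventions.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ref_log10p1f(x: float) -> float:
    """Hypothetical reference for log10(1 + x); not correctly rounded."""
    if math.isnan(x):
        return x
    if x < -1.0:
        # Domain error: a C implementation would set errno to EDOM (assumed).
        return math.nan
    if x == -1.0:
        # Pole error: a C implementation would set errno to ERANGE (assumed).
        return -math.inf
    if math.isinf(x):
        return x
    # log1p avoids the cancellation of computing 1 + x for tiny x.
    return to_float32(math.log1p(x) / math.log(10.0))
```

A sketch like this is only useful as a sanity check against the real
implementation, for example ref_log10p1f(9.0) should give 1.0.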
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):
Latency                   master     patched  improvement
x86_64                   68.5251     32.2627       52.92%
x86_64v2                 68.8912     32.7887       52.41%
x86_64v3                 59.3427     27.0521       54.41%
i686                     162.026     103.383       36.19%
aarch64                  26.8513     14.5695       45.74%
power10                  12.7426      8.4929       33.35%
powerpc                  16.6768     9.29135       44.29%
reciprocal-throughput     master     patched  improvement
x86_64                   26.0969     12.4023       52.48%
x86_64v2                 25.0045     11.0748       55.71%
x86_64v3                 20.5610     10.2995       49.91%
i686                     89.8842     78.5211       12.64%
aarch64                  17.1200      9.4832       44.61%
power10                   6.7814      6.4258        5.24%
powerpc                   15.769      7.6825       51.28%
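
The improvement column in both tables is consistent with the relative
reduction (master - patched) / master. A small sketch for reproducing
it is below; the helper name improvement is hypothetical and the two
rows are copied from the latency table.

```python
def improvement(master: float, patched: float) -> float:
    """Relative reduction of the patched timing versus master, in percent."""
    return (master - patched) / master * 100.0


# Two rows from the latency table: (master, patched).
rows = {
    "x86_64": (68.5251, 32.2627),
    "power10": (12.7426, 8.4929),
}

for name, (master, patched) in rows.items():
    print(f"{name:10s} {improvement(master, patched):.2f}%")
```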
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>