mirror of
git://sourceware.org/git/glibc.git
synced 2024-11-27 03:41:23 +08:00
master
1 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Maciej W. Rozycki
|
7ec4d7e3d1 |
stdio-common: Add tests for formatted printf output specifiers
This is a collection of tests for formatted printf output specifiers covering the d, i, o, u, x, and X integer conversions, the e, E, f, F, g, and G floating-point conversions, the c character conversion, and the s string conversion. Also the hh, h, l, and ll length modifiers are covered with the integer conversions, as is the L length modifier with the floating-point conversions.

The -, +, space, #, and 0 flags are iterated over, as permitted by the conversion handled, in tuples of 1..5, including tuples with repetitions of 2, and combined with field width and/or precision, again as permitted by the conversion.

The resulting format string is then used to produce output from respective sets of input data corresponding to the specific conversion under test. POSIX extensions beyond ISO C are not used.

Output is produced in the form of records which include both the format string (and width and/or precision where given in the form of separate arguments) and the conversion result, and is verified with GNU AWK using the format obtained from each such record against the reference value also supplied, relying on the fact that GNU AWK has its own independent implementation of format processing, striving to be ISO C compatible.

In the course of implementation I have determined that in the non-bignum mode GNU AWK uses system sprintf(3) for the floating-point conversions, defeating the objective of doing the verification against an independent implementation. Additionally the bignum mode (using MPFR) is required to correctly output wider integer and floating-point data. Therefore for the conversions affected the relevant shell scripts sanity-check AWK and terminate with unsupported status if the bignum mode is unavailable for floating-point data or where data is output incorrectly.

The f and F floating-point conversions are build-time options for GNU AWK, depending on the environment, so they are probed for before being used.
The same applies to the a and A floating-point conversions; however, they are currently not used, see below. Also GNU AWK does not handle the b or B integer conversions at all at the moment, as of 5.3.0. Support for the a, A, b, and B conversions can however be easily added following the approach taken for the f and F conversions.

Output produced by gawk for the a and A floating-point conversions does not match that produced by us: insufficient precision is used where one hasn't been explicitly given, e.g. for the negated maximum finite IEEE 754 64-bit value of -1.79769313486231570814527423731704357e+308 and "%a" format we produce -0x1.fffffffffffffp+1023 vs gawk's -0x1.000000p+1024, and a different exponent is chosen otherwise, such as with "%.a" where we output -0x2p+1023 vs gawk's -0x1p+1024 for the same value, or "%.20a" where -0x1.fffffffffffff0000000p+1023 is our output, but gawk produces -0xf.ffffffffffff80000000p+1020 instead. Consequently I chose not to include a and A conversions in testing at this time.

And last but not least there are numerous corner cases that GNU AWK does not handle correctly, which are worked around by explicit handling in the AWK script. These are in particular:

- extraneous leading 0 produced for the alternative form with the o conversion, e.g. { printf "%#.2o", 1 } produces "001" rather than "01",

- unexpected 0 produced where no characters are expected for the input of 0 and the alternative form with the precision of 0 and the integer hexadecimal conversions, e.g. { printf "%#.x", 0 } produces "0" rather than "",

- missing + character in the non-bignum mode only for the input of 0 with the + flag, precision of 0 and the signed integer conversions, e.g. { printf "%+.i", 0 } produces "" rather than "+",

- missing space character in the non-bignum mode only for the input of 0 with the space flag, precision of 0 and the signed integer conversions, e.g. { printf "% .i", 0 } produces "" rather than " ",

- for released gawk versions of up to 4.2.1 missing - character for the input of -NaN with the floating-point conversions, e.g. { printf "%e", "-nan" } produces "nan" rather than "-nan",

- for released gawk versions from 5.0.0 onwards + character output for the input of -NaN with the floating-point conversions, e.g. { printf "%e", "-nan" } produces "+nan" rather than "-nan",

- for released gawk versions from 5.0.0 onwards + character output for the input of Inf or NaN in the absence of the + or space flags with the floating-point conversions, e.g. { printf "%e", "inf" } produces "+inf" rather than "inf",

- for released gawk versions of up to 4.2.1 missing + character for the input of Inf or NaN with the + flag and the floating-point conversions, e.g. { printf "%+e", "inf" } produces "inf" rather than "+inf",

- for released gawk versions of up to 4.2.1 missing space character for the input of Inf or NaN with the space flag and the floating-point conversions, e.g. { printf "% e", "nan" } produces "nan" rather than " nan",

- for released gawk versions from 5.0.0 onwards + character output for the input of Inf or NaN with the space flag and the floating-point conversions, e.g. { printf "% e", "inf" } produces "+inf" rather than " inf",

- for released gawk versions from 5.0.0 onwards the field width is ignored for the input of Inf or NaN and the floating-point conversions, e.g. { printf "%20e", "-inf" } produces "-inf" rather than "                -inf".

NB for released gawk versions of up to 4.2.1 floating-point conversion issues apply to the bignum mode only, as in the non-bignum mode system sprintf(3) is used. As from version 5.0.0 specialized handling has been added for [-]Inf and [-]NaN inputs and the issues listed apply to both modes.
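
For reference, the ISO C results quoted as correct in several of the corner cases above can be reproduced with a small C program. This is a minimal sketch and not code from the commit: the helper names fmt_int, fmt_double, and check_corner_cases are hypothetical, and the floating-point checks assume a C library that spells infinity as "inf" (ISO C also permits "infinity").

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one int with FMT into BUF and return BUF.  */
static const char *
fmt_int (char *buf, size_t size, const char *fmt, int value)
{
  int n = snprintf (buf, size, fmt, value);
  assert (n >= 0 && (size_t) n < size);  /* Output must fit in BUF.  */
  return buf;
}

/* Hypothetical helper: the same for one double.  */
static const char *
fmt_double (char *buf, size_t size, const char *fmt, double value)
{
  int n = snprintf (buf, size, fmt, value);
  assert (n >= 0 && (size_t) n < size);  /* Output must fit in BUF.  */
  return buf;
}

/* Return the number of corner cases whose output differs from the
   reference values quoted in the commit message.  */
int
check_corner_cases (void)
{
  char buf[64];
  int failures = 0;

  /* Integer conversions with the alternative form or a zero precision.  */
  failures += strcmp (fmt_int (buf, sizeof buf, "%#.2o", 1), "01") != 0;
  failures += strcmp (fmt_int (buf, sizeof buf, "%#.x", 0), "") != 0;
  failures += strcmp (fmt_int (buf, sizeof buf, "%+.i", 0), "+") != 0;
  failures += strcmp (fmt_int (buf, sizeof buf, "% .i", 0), " ") != 0;

  /* Assumption: the C library prints infinity as "inf".  */
  failures += strcmp (fmt_double (buf, sizeof buf, "%+e", INFINITY),
                      "+inf") != 0;
  failures += strcmp (fmt_double (buf, sizeof buf, "% e", INFINITY),
                      " inf") != 0;

  /* The field width applies to infinities too: 16 spaces, then "-inf".  */
  fmt_double (buf, sizeof buf, "%20e", -INFINITY);
  failures += !(strlen (buf) == 20 && strcmp (buf + 16, "-inf") == 0);

  return failures;
}
```

Calling check_corner_cases from a test driver and comparing its result against 0 gives a quick check of the C library side; the signed NaN cases are left out because the sign and spelling of a formatted NaN are implementation-defined in ISO C.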

The '--posix' flag makes gawk versions from 5.0.0 onwards avoid the issue with field width and the + character unconditionally output for the input of Inf or NaN, however not the remaining issues, and then the 'gensub' function is not supported in the POSIX mode, so I deemed this path not worth pursuing.

Each test completes within a few seconds except for the long double one. There the F/f formats produce a large number of digits, which appears to be computationally intensive and CPU-bound. Standalone execution time for 'tst-printf-format-p-ldouble --direct f' is in the range of 00m36s for POWER9@2.166GHz and 09m52s for FU740@1.2GHz with output redirected locally to /dev/null, and 10m11s for FU740 with output redirected over 100Mbps network via SSH to /dev/null, so the throughput of the network adds very little (~3.2% in this case) to the processing time. This is with IEEE 754 quad. So I have scaled the timeout for 'tst-printf-format-skeleton-ldouble' accordingly.

Regardless, following recent practice the test has been added to the standard rather than extended set. However, unlike most of the remaining tests it has been split by the conversion specifier, so as to allow better parallelization of this long-running test. As a side effect this lets the test report the unsupported status for the F/f conversions where applicable, so 'tst-printf-format-p-double' has been split for consistency as well.

Only printf itself is handled at the moment, but the infrastructure provides for all the printf family functions to be verified, with the changes for those to be supplied separately.

The complication around having some tests iterating over all the relevant conversion specifiers and others verifying conversion specifiers individually, combined with iterating over printf family functions, has hit a peculiarity in GNU make where the use of multiple targets with a pattern rule is handled differently from such use with an ordinary rule.
Consequently it seems impossible to bulk-define a pattern rule using '$(foreach ...)', where each target would simply trigger the recipe according to the pattern and matching dependencies individually (such a rule does work, but implies that all targets are updated with a single recipe execution).

Therefore as a compromise a single single-target pattern rule has been defined that lists all the conversion-specific scripts and all the test executables as dependencies. Consequently tests will be rerun in the absence of changes to their actual sources or scripts whenever an unrelated file that has been listed has changed. Also all the formatted printf output tests will always be built whenever any single one is to be run. This only affects test development and not test runs in the field, though it does change the order of execution of the individual steps and also acts as a Makefile barrier in parallel runs. As the execution time dominates the compilation time for these tests it is not seen as a serious shortcoming.

As pointed out by Florian Weimer <fweimer@redhat.com> the malloc tracing facility can take a substantial amount of time in calling dladdr(3) to determine the caller's location. This is not needed by the verification made with these tests, so I chose to interpose the symbol with a stub implementation that always fails in the shared skeleton. We have total control over the test environment, so I think it is a safe and minimal impact approach. If there's ever anything else added to the tests that would actually rely on dladdr(3) returning usable results, only then can we think of a different approach.

Reviewed-by: DJ Delorie <dj@redhat.com> |
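
The dladdr(3) interposition mentioned in the commit message can be pictured roughly as follows. This is a minimal sketch written from the description alone and may differ from the code actually placed in the shared skeleton; it assumes a glibc-style <dlfcn.h> where Dl_info and dladdr are exposed under _GNU_SOURCE, and relies on a definition in the executable taking precedence over the one in the C library.

```c
#define _GNU_SOURCE  /* Dl_info and dladdr are GNU extensions.  */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Stub that interposes dladdr(3) and always reports failure, so that
   callers such as the malloc tracing facility skip the costly lookup
   of the caller's location.  dladdr returns 0 on failure.  */
int
dladdr (const void *addr, Dl_info *info)
{
  (void) addr;  /* Unused: no lookup is performed.  */
  (void) info;  /* Left untouched, as on a genuine failure.  */
  return 0;
}
```

With this definition linked into a test executable, any dladdr call resolved against the executable returns 0 at once, which callers treat as "no symbol information available".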