glibc

mirror/glibc

mirror of git://sourceware.org/git/glibc.git synced 2024-11-27 03:41:23 +08:00

Commit Graph

Author	SHA1	Message	Date

Maciej W. Rozycki

7ec4d7e3d1

stdio-common: Add tests for formatted printf output specifiers

This is a collection of tests for formatted printf output specifiers
covering the d, i, o, u, x, and X integer conversions, the e, E, f, F,
g, and G floating-point conversions, the c character conversion, and the
s string conversion.  Also the hh, h, l, and ll length modifiers are
covered with the integer conversions as is the L length modifier with
the floating-point conversions.

The -, +, space, #, and 0 flags are iterated over, as permitted by the
conversion handled, in tuples of 1..5, including tuples with repetitions
of 2, and combined with field width and/or precision, again as permitted
by the conversion.  The resulting format string is then used to produce
output from respective sets of input data corresponding to the specific
conversion under test.  POSIX extensions beyond ISO C are not used.
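
As an illustration of how flag combinations can be turned into format
strings, here is a minimal sketch in C.  It is not taken from the tests:
the helper name build_format and the bitmask encoding are assumptions
made for this example, and it only enumerates subsets of the five flags
in one fixed order, without the repetitions or the per-conversion
filtering described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The five ISO C printf flags named in the text above.  */
static const char flags[] = "-+ #0";

/* Build "%<flags><conv>" into BUF for the flag subset selected by MASK
   (bit N selects flags[N]).  Returns the length of the string written,
   or 0 if BUF is too small.  Hypothetical helper, for illustration.  */
static size_t
build_format (unsigned int mask, char conv, char *buf, size_t size)
{
  size_t n = 0;

  assert (buf != NULL);
  /* Worst case: '%', five flags, the conversion and the NUL.  */
  if (size < sizeof flags + 2)
    return 0;

  buf[n++] = '%';
  for (size_t i = 0; i < sizeof flags - 1; i++)
    if (mask & (1u << i))
      buf[n++] = flags[i];
  buf[n++] = conv;
  buf[n] = '\0';
  return n;
}
```

Iterating MASK from 1 to 31 yields every non-empty subset of the flags.
A caller still has to skip combinations that ISO C does not define for
a given conversion, such as # with d, before passing the string to
printf.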

Output is produced in the form of records which include both the format
string (and width and/or precision where given in the form of separate
arguments) and the conversion result, and is verified with GNU AWK using
the format obtained from each such record against the reference value
also supplied, relying on the fact that GNU AWK has its own independent
implementation of format processing, striving to be ISO C compatible.

In the course of implementation I have determined that in the non-bignum
mode GNU AWK uses system sprintf(3) for the floating-point conversions,
defeating the objective of doing the verification against an independent
implementation.  Additionally the bignum mode (using MPFR) is required
to correctly output wider integer and floating-point data.  Therefore
for the conversions affected the relevant shell scripts sanity-check AWK
and terminate with unsupported status if the bignum mode is unavailable
for floating-point data or where data is output incorrectly.

The f and F floating-point conversions are build-time options for GNU
AWK, depending on the environment, so they are probed for before being
used.  The a and A floating-point conversions are likewise build-time
options, however they are currently not used, see below.  Also GNU AWK
does not handle the b or B integer conversions at all at the moment, as
of 5.3.0.  Support for the a, A, b, and B conversions can however be
easily added following the approach taken for the f and F conversions.

Output produced by gawk for the a and A floating-point conversions does
not match that produced by us.  Insufficient precision is used where one
hasn't been explicitly given, e.g. for the negated maximum finite IEEE
754 64-bit value of -1.79769313486231570814527423731704357e+308 and the
"%a" format we produce -0x1.fffffffffffffp+1023 vs gawk's
-0x1.000000p+1024.  A different exponent is chosen otherwise, such as
with "%.a" where we output -0x2p+1023 vs gawk's -0x1p+1024 for the same
value, or "%.20a" where -0x1.fffffffffffff0000000p+1023 is our output,
but gawk produces -0xf.ffffffffffff80000000p+1020 instead.  Consequently
I chose not to include the a and A conversions in testing at this time.
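
The glibc side of this comparison can be reproduced with a few lines of
C.  This is a minimal sketch, not part of the tests: the helper name
fmt_neg_dbl_max is an assumption made for this example, and the exact
digits chosen for the a conversion are implementation-defined, so the
results quoted below are those reported above for glibc only.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <string.h>

/* Format the negated maximum finite double with FMT into BUF and
   return the result of snprintf.  Hypothetical helper.  */
static int
fmt_neg_dbl_max (char *buf, size_t size, const char *fmt)
{
  assert (buf != NULL && fmt != NULL);
  return snprintf (buf, size, fmt, -DBL_MAX);
}
```

With glibc, "%a" gives -0x1.fffffffffffffp+1023, "%.a" gives -0x2p+1023
and "%.20a" gives -0x1.fffffffffffff0000000p+1023, matching the values
listed above as our output.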

And last but not least there are numerous corner cases that GNU AWK does
not handle correctly, which are worked around by explicit handling in
the AWK script.  These are in particular:

- extraneous leading 0 produced for the alternative form with the o
  conversion, e.g. { printf "%#.2o", 1 } produces "001" rather than
  "01",

- unexpected 0 produced where no characters are expected for the input
  of 0 and the alternative form with the precision of 0 and the integer
  hexadecimal conversions, e.g. { printf "%#.x", 0 } produces "0" rather
  than "",

- missing + character in the non-bignum mode only for the input of 0
  with the + flag, precision of 0 and the signed integer conversions,
  e.g. { printf "%+.i", 0 } produces "" rather than "+",

- missing space character in the non-bignum mode only for the input of 0
  with the space flag, precision of 0 and the signed integer
  conversions, e.g. { printf "% .i", 0 } produces "" rather than " ",

- for released gawk versions of up to 4.2.1 missing - character for the
  input of -NaN with the floating-point conversions, e.g. { printf "%e",
  "-nan" } produces "nan" rather than "-nan",

- for released gawk versions from 5.0.0 onwards + character output for
  the input of -NaN with the floating-point conversions, e.g. { printf
  "%e", "-nan" } produces "+nan" rather than "-nan",

- for released gawk versions from 5.0.0 onwards + character output for
  the input of Inf or NaN in the absence of the + or space flags with
  the floating-point conversions, e.g. { printf "%e", "inf" } produces
  "+inf" rather than "inf",

- for released gawk versions of up to 4.2.1 missing + character for the
  input of Inf or NaN with the + flag and the floating-point
  conversions, e.g. { printf "%+e", "inf" } produces "inf" rather than
  "+inf",

- for released gawk versions of up to 4.2.1 missing space character for
  the input of Inf or NaN with the space flag and the floating-point
  conversions, e.g. { printf "% e", "nan" } produces "nan" rather than
  " nan",

- for released gawk versions from 5.0.0 onwards + character output for
  the input of Inf or NaN with the space flag and the floating-point
  conversions, e.g. { printf "% e", "inf" } produces "+inf" rather than
  " inf",

- for released gawk versions from 5.0.0 onwards the field width is
  ignored for the input of Inf or NaN and the floating-point
  conversions, e.g. { printf "%20e", "-inf" } produces "-inf" rather
  than "                -inf".
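
For reference, the results expected in the list above can be checked
against the C library directly.  The following is a minimal sketch and
not part of the tests: the helper names fmt_int and fmt_double and the
checker function are assumptions made for this example.  The NaN cases
are left out, as the sign printed for a NaN is less portable.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Format VALUE with FMT into BUF.  The o and x conversions formally
   take an unsigned int; small non-negative int values are
   representable in both types.  Hypothetical helper.  */
static int
fmt_int (char *buf, size_t size, const char *fmt, int value)
{
  assert (buf != NULL && fmt != NULL);
  return snprintf (buf, size, fmt, value);
}

/* Likewise for a double argument.  Hypothetical helper.  */
static int
fmt_double (char *buf, size_t size, const char *fmt, double value)
{
  assert (buf != NULL && fmt != NULL);
  return snprintf (buf, size, fmt, value);
}

/* Check the results that the list above gives as correct.  */
static void
check_corner_cases (void)
{
  char buf[64];

  fmt_int (buf, sizeof buf, "%#.2o", 1);
  assert (strcmp (buf, "01") == 0);
  fmt_int (buf, sizeof buf, "%#.x", 0);
  assert (strcmp (buf, "") == 0);
  fmt_int (buf, sizeof buf, "%+.i", 0);
  assert (strcmp (buf, "+") == 0);
  fmt_int (buf, sizeof buf, "% .i", 0);
  assert (strcmp (buf, " ") == 0);
  fmt_double (buf, sizeof buf, "%e", INFINITY);
  assert (strcmp (buf, "inf") == 0);
  fmt_double (buf, sizeof buf, "% e", INFINITY);
  assert (strcmp (buf, " inf") == 0);
  fmt_double (buf, sizeof buf, "%20e", -INFINITY);
  assert (strcmp (buf, "                -inf") == 0);
}
```

Calling check_corner_cases () from a test program aborts on the first
mismatch and returns silently otherwise.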

NB for released gawk versions of up to 4.2.1 the floating-point
conversion issues apply to the bignum mode only, as in the non-bignum
mode system sprintf(3) is used.  As from version 5.0.0 specialized
handling has been added for [-]Inf and [-]NaN inputs and the issues
listed apply to both modes.  The '--posix' flag makes gawk versions from
5.0.0 onwards avoid the issues with the field width and with the +
character unconditionally output for the input of Inf or NaN, but not
the remaining issues.  Moreover the 'gensub' function is not supported
in the POSIX mode, so I deemed this path not worth taking.

Each test completes within a few seconds except for the long double
one.  There the F/f formats produce a large number of digits, which
appears to be computationally intensive and CPU-bound.  Standalone
execution time for 'tst-printf-format-p-ldouble --direct f' is in the
range of 00m36s for POWER9@2.166GHz and 09m52s for FU740@1.2GHz with
output redirected locally to /dev/null, and 10m11s for FU740 with output
redirected over 100Mbps network via SSH to /dev/null, so the throughput
of the network adds very little (~3.2% in this case) to the processing
time.  This is with IEEE 754 quad.
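
The ~3.2% figure follows from the two FU740 timings quoted above.  A
minimal sketch of the arithmetic in C, with the helper names to_seconds
and overhead_percent being assumptions made for this example:

```c
#include <assert.h>

/* Convert a time given as minutes and seconds to seconds.  */
static double
to_seconds (int minutes, int seconds)
{
  assert (minutes >= 0 && seconds >= 0 && seconds < 60);
  return minutes * 60.0 + seconds;
}

/* Relative slowdown of the REMOTE run over the LOCAL run, in percent.  */
static double
overhead_percent (double local, double remote)
{
  assert (local > 0.0);
  return (remote - local) / local * 100.0;
}
```

Here overhead_percent (to_seconds (9, 52), to_seconds (10, 11)) works
out as (611 - 592) / 592 * 100, which is approximately 3.2.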

So I have scaled the timeout for 'tst-printf-format-skeleton-ldouble'
accordingly.  Regardless, following recent practice the test has been
added to the standard rather than extended set.  However, unlike most
of the remaining tests it has been split by the conversion specifier,
so as to allow better parallelization of this long-running test.  As
a side effect this lets the test report the unsupported status for the
F/f conversions where applicable, so 'tst-printf-format-p-double' has
been split for consistency as well.

Only printf itself is handled at the moment, but the infrastructure
provides for all the printf family functions to be verified, changes
for which are to be supplied separately.  The complication around
having some tests iterate over all the relevant conversion specifiers
and others verify conversion specifiers individually, combined with
iterating over printf family functions, has hit a peculiarity in GNU
make where the use of multiple targets with a pattern rule is handled
differently from such use with an ordinary rule.  Consequently it
seems impossible to bulk-define a pattern rule using '$(foreach ...)',
where each target would simply trigger the recipe according to the
pattern and matching dependencies individually (such a rule does work,
but implies that all targets are updated with a single recipe
execution).

Therefore as a compromise a single single-target pattern rule has been
defined that lists all the conversion-specific scripts and all the test
executables as dependencies.  Consequently tests will be rerun, in the
absence of changes to their actual sources or scripts, whenever an
unrelated file among those listed has changed.  Also all the formatted
printf output tests will always be built whenever any single one is to
be run.  This only affects test development and not test runs in the
field, though it does change the order of execution of the individual
steps and also acts as a Makefile barrier in parallel runs.  As the
execution time dominates the compilation time for these tests it is not
seen as a serious shortcoming.

As pointed out by Florian Weimer <fweimer@redhat.com> the malloc tracing
facility can take a substantial amount of time in calling dladdr(3) to
determine the caller's location.  This is not needed by the verification
made with these tests, so I chose to interpose the symbol with a stub
implementation that always fails in the shared skeleton.  We have total
control over the test environment, so I think it is a safe,
minimal-impact approach.  If anything is ever added to the tests that
actually relies on dladdr(3) returning usable results, we can think of
a different approach then.
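
A stub of this kind can be sketched as follows.  This is a minimal
illustration of the interposition technique under the assumption of a
glibc environment, not a copy of the code in the shared skeleton, and
the checker function name is an assumption made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Defining dladdr in the test executable makes calls resolve here
   rather than to the C library's implementation.  A return value of
   0 is how dladdr(3) reports failure.  */
int
dladdr (const void *addr, Dl_info *info)
{
  (void) addr;
  (void) info;
  return 0;
}

/* Sanity check: with the stub in place a lookup fails even for an
   address that would otherwise be resolvable.  */
static void
check_dladdr_stubbed (void)
{
  Dl_info info;

  assert (dladdr ((const void *) &info, &info) == 0);
}
```

A caller that only uses the result for diagnostics then falls back to
whatever it does when no symbol information is available, without
paying for the lookup.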

Reviewed-by: DJ Delorie <dj@redhat.com>

2024-11-07 06:14:24 +00:00
