The buffer used by snprintf might not be large enough for all possible
inputs, as indicated by gcc with -O1:
../locale/programs/linereader.c: In function ‘utf8_sequence_error’:
../locale/programs/linereader.c:713:58: error: ‘%02x’ directive output
may be truncated writing between 2 and 8 bytes into a region of size
between 1 and 13 [-Werror=format-truncation=]
713 | snprintf (buf, sizeof (buf), "0x%02x 0x%02x 0x%02x 0x%02x",
| ^~~~
../locale/programs/linereader.c:713:34: note: directive argument in the
range [0, 2147483647]
713 | snprintf (buf, sizeof (buf), "0x%02x 0x%02x 0x%02x 0x%02x",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../locale/programs/linereader.c:713:5: note: ‘snprintf’ output between
20 and 38 bytes into a destination of size 30
713 | snprintf (buf, sizeof (buf), "0x%02x 0x%02x 0x%02x 0x%02x",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
714 | ch1, ch2, ch3, ch4);
| ~~~~~~~~~~~~~~~~~~~
Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Previously, they were assumed to be in ISO-8859-1, and that the output
charset overlapped with ISO-8859-1 for the characters actually used.
However, this did not work as intended on many architectures even for
an ISO-8859-1 output encoding because of the char signedness bug in
lr_getc. Therefore, this commit switches to UTF-8 without making
provisions for backwards compatibility.
The following Elisp code can be used to convert locale definition files
to UTF-8:
(defun glibc/convert-localedef (from to)
(interactive "r")
(save-excursion
(save-restriction
(narrow-to-region from to)
(goto-char (point-min))
(save-match-data
(while (re-search-forward "<U\\([0-9a-fA-F]+\\)>" nil t)
(let* ((codepoint (string-to-number (match-string 1) 16))
(converted
(cond
((memq codepoint '(?/ ?\ ?< ?>))
(string ?/ codepoint))
((= codepoint ?\") "<U0022>")
(t (string codepoint)))))
(replace-match converted t)))))))
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
This will permit reusing the Unicode character processing for
different character encodings, not just the current <U...> encoding.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
And introduce struct lr_buffer. The functions addc and adds can
be called from functions, enabling subsequent refactoring.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dchttps://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
In "Is it OK to write ASCII strings directly into locale source files?"
https://sourceware.org/ml/libc-alpha/2017-07/msg00807.html there is
universal consensus that we do not have to keep writing <Uxxxx> symbolic
characters in locale files.
Ulrich Drepper's historical comment was that symbolic characters were
used for the eventuality of converting the source files to any encoding
system. Fast forward to today and UTF-8 is the standard. So the
requirement of <Uxxxx> is hard to justify.
Zack Weinberg's excellent scripts are coming along we can use these to
find instances of human errors in the scripts:
https://sourceware.org/ml/libc-alpha/2017-07/msg00860.htmlhttps://sourceware.org/ml/libc-alpha/2017-08/msg00136.html
It still won't be easy to distinguish from i for í, but that's still the
case for <Uxxxx> characters which humans can't read either.
Since we all agreed that we should be able to use non-symbolic (<Uxxxx>)
characters in locale files, the following change removes the verbose
warning that is raised if you use non-symbolic characters in the locale
file.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
into a macro. Use preprocessor to decide how to initialize
attempts [Coverity CID 67].
* io/fts.c (fts_build): Comment out dead code [Coverity CID 68].
* sunrpc/rpc_parse.c (def_union): Comment out dead code
[Coverity CID 70].
* locale/programs/linereader.c (lr_token): Remove duplicate
handling of EOF [Coverity CID 71].
* locale/programs/ld-numeric.c (numeric_read) [case tok_grouping]:
We bail out early if ignore_content is set, so there is no need to
check it later again [Coverity CID 72].
* inet/inet6_option.c (inet6_option_find): Check *tptrp for NULL,
not tptrp [Coverity CID 73].
* inet/inet6_option.c (inet6_option_next): Check *tptrp for NULL,
not tptrp [Coverity CID 74].
* misc/tsearch.c (__tsearch): Don't rotate tree if memory
allocation failed [Coverity CID 78].
* intl/dcigettext.c (_nl_find_msg): Add a cast.
* nis/nis_clone_dir.c (nis_clone_directory): Use char * for ADDR.
* nis/nis_clone_obj.c (nis_clone_object): Likewise.
* nis/nis_clone_res.c (nis_clone_result): Likewise.
* resolv/nss_dns/dns-network.c (getanswer_r): Use const unsigned char *
for END_OF_MESSAGE and CP.
* resolv/res_send.c (send_dg): Add else branch for case impossible
unless `poll' is buggy.
* crypt/crypt_util.c (__setkey_r): Add a cast.
* locale/programs/linereader.c (get_toplvl_escape): Use size_t for
NBYTES, and unsigned char * for BYTES.
* locale/programs/charmap.c (charmap_new_char): Use size_t and
unsighed char * for NBYTES, BYTES parameters.
* sysdeps/generic/dl-hash.h (_dl_elf_hash): Take const char * argument
and cast it.
* sysdeps/i386/i686/dl-hash.h (_dl_elf_hash): Likewise.
* sunrpc/create_xid.c (_create_xid): Don't use unsigned long for RES.
* sunrpc/svcauth_des.c (_svcauth_des): Fix cast type.
* sunrpc/auth_des.c (authdes_create): Don't use u_char for PKEY_DATA.
(authdes_marshal): Don't use unsigned int for LEN.
* sunrpc/xdr.c (xdr_hyper): Don't use unsigned long for T2.
(xdr_u_hyper): Likewise.
(xdr_u_short): Don't use u_long for L.
* sunrpc/xdr_intXX_t.c (xdr_int64_t): Don't use uint32_t for T2.
* inet/rexec.c (rexec_af): Use socklen_t.
* sunrpc/key_call.c (getkeyserv_handle): Likewise.
* sunrpc/rtime.c (rtime): Likewise.
* resolv/res_send.c (send_vc, send_dg): Likewise.
* nis/nis_callback.c (__nis_create_callback): Likewise.
* sysdeps/generic/libc-start.c: Use unsigned int for nthreads ptr.
* sysdeps/posix/getaddrinfo.c (gaih_inet): Fix type of ADDR local.
* libio/libio.h (_IO_BE): Add parenthesis around EXPR.
* intl/dcigettext.c (INTVARDEF, INTUSE): Macros removed.
(_nl_default_dirname): Use libc_hidden_data_def instead of INTVARDEF.
(libc_freeres_fn, DCIGETTEXT): Don't use INTUSE.
* intl/bindtextdom.c (INTUSE): Macro removed.
(_nl_default_dirname): Use libc_hidden_proto.
(set_binding_values): Don't use INTUSE.
* include/libintl.h (_libc_intl_domainname_internal): Decl removed.
(_libc_intl_domainname): Use libc_hidden_proto.
* posix/regex_internal.h (gettext): Remove INTUSE on it.
* locale/SYS_libc.c (_libc_intl_domainname): Use libc_hidden_data_def
rather than INTDEF.
* include/libintl.h (_): Don't use *_internal name.
* ctype/ctype-extn.c (__ctype_tolower, __ctype_toupper): Use int32_t,
not uint32_t.
* locale/lc-ctype.c (_nl_postload_ctype): Likewise for assignments.
* iconv/gconv_open.c (__gconv_open): Remove useless cast.
[BZ #721]
* sysdeps/i386/dl-machine.h (ELF_MACHINE_NO_RELA): Define this outside
of [RESOLVE_MAP].
* sysdeps/sh/dl-machine.h (ELF_MACHINE_NO_REL): Likewise.
* sysdeps/powerpc/powerpc32/dl-machine.h
(elf_machine_rel, elf_machine_rel_relative): Removed.
* sysdeps/powerpc/powerpc64/dl-machine.h
(elf_machine_rel, elf_machine_rel_relative): Removed.
2005-02-03 Alexandre Oliva <aoliva@redhat.com>
[BZ #721]
* elf/dynamic-link.h: Don't declare nested auto functions that are
not going to be defined.
2004-07-23 Jakub Jelinek <jakub@redhat.com>
[BZ #284]
* include/features.h (_POSIX_SOURCE, _POSIX_C_SOURCE): Define
if _XOPEN_SOURCE >= 500 even if __STRICT_ANSI__ is defined.
2005-02-16 Roland McGrath <roland@redhat.com>
circular dependencies only if the locale in question hasn't
been finished.
* locale/programs/linereader.c (get_string): Pass LC_CTYPE not
CTYPE_LOCALE to load_locale.
* locale/programs/locfile.c (locfile_read): Don't include
unneeded but available locales in locale_mask.
* locale/programs/locarchive.c (enlarge_archive): If quiet, don't
print any messages about enlarging archive.
2001-07-06 Paul Eggert <eggert@twinsun.com>
* manual/argp.texi: Remove ignored LGPL copyright notice; it's
not appropriate for documentation anyway.
* manual/libc-texinfo.sh: "Library General Public License" ->
"Lesser General Public License".
2001-07-06 Andreas Jaeger <aj@suse.de>
* All files under GPL/LGPL version 2: Place under LGPL version
2.1.
2001-02-09 Ulrich Drepper <drepper@redhat.com>
* locale/programs/linereader.c (get_ident): Stop loop if EOF. Use
lr_ungetc to push back last read character.
* locale/programs/linereader.h (lr_ungetc): Don't push back is
character is EOF.
(lr_ignore_rest): Don't warn about garbage if it is really the end
of the file.
* manual/Makefile: Use ifnottext and not ifinfo to protect Top node
definition.
2001-02-04 Ulrich Drepper <drepper@redhat.com>
* iconv/Makefile (iconv_prog-modules): Define. Add vpath to find
files in locale/programs. Add CFLAGS definition to allow compiling
localedef files.
* iconv/dummy-repertoire.c: New file.
* iconv/iconv_charmap.c: New file.
* iconv/iconv_prog.h: New file.
* iconv/iconv_prog.c: Make verbose and omit_invalid global.
(main): If parameter for -f and -t contain slashes try first to resolve
the strings as filenames of charmap files. Use them for conversion
in this case.
* iconvdata/run-iconv-test.sh: If charmaps exist also run tests with
iconv getting charmap names as parameters.
* locale/programs/linereader.c (lr_token): Take extra parameters
verbose and pass it to get_string.
(get_string): Take extra parameters verbose.
* locale/programs/charmap.c (parse_charmap): Take extra parameters
verbose and be_quiet. Change all callers of lr_token and
parse_charmap.
* locale/programs/charmap.h: Likewise.
* locale/programs/ld-address.c: Likewise.
* locale/programs/ld-collate.c: Likewise.
* locale/programs/ld-ctype.c: Likewise.
* locale/programs/ld-identification.c: Likewise.
* locale/programs/ld-measurement.c: Likewise.
* locale/programs/ld-messages.c: Likewise.
* locale/programs/ld-monetary.c: Likewise.
* locale/programs/ld-name.c: Likewise.
* locale/programs/ld-numeric.c: Likewise.
* locale/programs/ld-paper.c: Likewise.
* locale/programs/ld-telephone.c: Likewise.
* locale/programs/ld-time.c: Likewise.
* locale/programs/linereader.c: Likewise.
* locale/programs/linereader.h: Likewise.
* locale/programs/localedef.c: Likewise.
* locale/programs/locfile.c: Likewise.
* locale/programs/locfile.h: Likewise.
* locale/programs/repertoire.c: Likewise.
2000-04-06 Ulrich Drepper <drepper@redhat.com>
* locale/programs/charmap.c (charmap_new_char): Add parameter step.
Support ..(2).. ellipsis.
(parse_charmap): Recognize ..(2).. etc and pass step down.
Correctly generate names for UCS4 characters.
* locale/programs/ld-ctype.c (struct translit_ignore_t): Add step.
(ctype_finish): We know the wide character value for <SP>,
don't search.
(charclass_symbolic_ellipsis): Handle ..(2).. ellipsis.
(charclass_ucs4_ellipsis): Likewise.
(read_translit_ignore_entry): Store ellipsis step.
(ctype_read): Recognize ..(2).. etc and pass step down.
* locale/programs/linereader.c (lr_token): When seeing comment
character ignore only rest of line in sources but stop at escaped
newline.
Recognize ..(2).. and ....(2).....
* locale/programs/locfile-token.h (enum token_t): Add tok_ellipsis2_2
and tok_ellipsis4_2.
* locale/programs/ld-collate.c (collate_output): Don't start with empty
extrapool and indirectpool obstacks since we need the offsets to be
nonzero.
(collate_read): Call load_locale, not find_locale.
* locale/programs/ld-ctype.c (ctype_finish): If LC_CTYPE category
wasn't defined in the file also initialize repertoire if possible.
* locale/programs/ld-time.c (time_finish): Fix message string.
* locale/programs/linereader.c: Cast parameters of lr_error to
correct type to prevnet warning.
* locale/programs/localedef.c (load_locale): New file.
* locale/programs/localedef.h: Add its prototype.
* locale/programs/repertoire.c (repertoire_new_char): Add missing
parameters to lr_error call.
* localedata/Makefile: Enable running tests again.
* localedata/tests/test2.def: Adjust syntax to new specification.
* localedata/tests/test3.def: Likewise.
* localedata/tst-trans.sh: Redirect output of program into file.
* string/strcoll.c: Fix many error in new implementation to make it
pass (at least) the test suite.
* locale/Makefile: Don't link localedef statically anymore.
* locale/ld-collate.c (struct element_t): Add field is_character and
use it to distinguish real character from collating elements and
symbols.
* locale/programs/ld-time.c: Likewise.
1999-11-05 Ulrich Drepper <drepper@cygnus.com>
* sysdeps/unix/sysv/linux/bits/resource.h (RLIM_INFINITY): Adjust
for kernel changes.
* sysdeps/unix/sysv/linux/bits/types.h (__rlim_t, __rlim64_t): Make
unsigned.
1999-10-04 Tim Waugh <twaugh@redhat.com>
* posix/wordexp-test.c: More tests.
* posix/wordexp.c (wordexp): Explicit null words should be kept.
1999-11-04 Shinya Hanataka <hanataka@abyss.rim.or.jp>
* locale/programs/linereader.c (get_string): Correct type of buf2
variable.
* locale/programs/ld-ctype.c (ctype_output): Store index correctly
for _NL_CTYPE_INDIGITS_MB_LEN, _NL_CTYPE_INDIGITS_WC_LEN,
_NL_CTYPE_INDIGITS*_MB, _NL_CTYPE_OUTDIGIT*_MB, and
_NL_CTYPE_OUTDIGIT*_WC.
(allocate_arrays): Completely initialize mapping tables.
* locale/programs/ld-time.c (time_startup): We need the wide car
string.
(time_finish): Correct handling of era.
(time_output): Fix a few array indeces.
(time_read): Pass the repertoire map to lr_token.