Jakub Jelinek 572f5e1bc6 libcpp: Named universal character escapes and delimited escape sequence tweaks
On Tue, Aug 30, 2022 at 09:10:37PM +0000, Joseph Myers wrote:
> I'm seeing build failures of glibc for powerpc64, as illustrated by the
> following C code:
>
> #if 0
> \NARG
> #endif
>
> (the actual sysdeps/powerpc/powerpc64/sysdep.h code is inside #ifdef
> __ASSEMBLER__).
>
> This shows some problems with this feature - and with delimited escape
> sequences - as it affects C.  It's fine to accept it as an extension
> inside string and character literals, because \N or \u{...} would be
> invalid in the absence of the feature (i.e. the syntax for such literals
> fails to match, meaning that the rule about undefined behavior for a
> single ' or " as a pp-token applies).  But outside string and character
> literals, the usual lexing rules apply, the \ is a pp-token on its own and
> the code is valid at the preprocessing level, and with expansion of macros
> appearing before or after the \ (e.g. u defined as a macro in the \u{...}
> case) it may be valid code at the language level as well.  I don't know
> what older C++ versions say about this, but for C this means e.g.
>
> #define z(x) 0
> #define a z(
> int x = a\NARG);
>
> needs to be accepted as expanding to "int x = 0;", not interpreted as
> using the \N feature in an identifier and produce an error.

The following patch changes this, so that:
1) outside of string/character literals, \N without a following { is never
   treated as an error or a warning; it is silently treated as a separate \
   token followed by whatever comes after it
2) \u{123} and \N{LATIN SMALL LETTER A WITH ACUTE} are not handled as an
   extension at all outside of string/character literals in the strict
   standard modes (-std=c*) except for -std=c++{23,2b}, only in the
   -std=gnu* modes, because doing so changes behavior on valid sources, e.g.
   #define z(x) 0
   #define a z(
   int x = a\u{123});
   int y = a\N{LATIN SMALL LETTER A WITH ACUTE});
3) it introduces a -Wunicode warning (on by default), which warns about
   what looks like an invalid delimited escape sequence or named
   universal character escape outside of string/character literals
   that is instead treated as separate tokens
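
A minimal sketch of case 1, reusing the macros from the report above.  The
enumerator name `expanded` is hypothetical and exists only for illustration.
It assumes a compiler that lexes \N outside of literals as a separate \
token, as described in this patch; NARG is then swallowed together with the \
as the discarded argument of z.

```c
/* Macros taken from the report quoted above. */
#define z(x) 0
#define a z(

/* `a` expands to `z(`; the rest of the line supplies the argument
   `\NARG` and the closing parenthesis.  z ignores its argument, so
   the initializer is expected to expand to plain 0.
   `expanded` is a hypothetical name used only in this sketch. */
enum { expanded = a\NARG) };

/* Compile-time check (C11): fails to build if the expansion differs. */
_Static_assert(expanded == 0, "a\\NARG) should expand to 0");
```

Using an enumerator rather than `int x` keeps the result a constant
expression, so the check happens at compile time without running anything.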

2022-09-07  Jakub Jelinek  <jakub@redhat.com>

libcpp/
	* include/cpplib.h (struct cpp_options): Add cpp_warn_unicode member.
	(enum cpp_warning_reason): Add CPP_W_UNICODE.
	* init.cc (cpp_create_reader): Initialize cpp_warn_unicode.
	* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
	handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
	In possible identifier contexts, don't emit an error and punt
	if \N isn't followed by {, or if \N{} surrounds some lower case
	letters or _.  In possible identifier contexts when not C++23, don't
	emit an error but warning about unknown character names and treat as
	separate tokens.  When treating as separate tokens \u{ or \N{, emit
	warnings.
gcc/
	* doc/invoke.texi (-Wno-unicode): Document.
gcc/c-family/
	* c.opt (Winvalid-utf8): Use ObjC instead of objC.  Remove
	" in comments" from description.
	(Wunicode): New option.
gcc/testsuite/
	* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-6.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-7.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-7.c: New test.
	* g++.dg/cpp23/named-universal-char-escape1.C: New test.
	* g++.dg/cpp23/named-universal-char-escape2.C: New test.
2022-09-07 08:44:38 +02:00

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.