mirror of
git://sourceware.org/git/glibc.git
synced 2024-11-21 01:12:26 +08:00
a7b5eb821d
Unicode 16.0.0 Support: Character encoding, character type info, and transliteration tables are all updated to Unicode 16.0.0, using the generator scripts contributed by Mike FABIAN (Red Hat). Changes in CHARMAP and WIDTH: Total added characters in newly generated CHARMAP: 5185 Total removed characters in newly generated WIDTH: 1 Total added characters in newly generated WIDTH: 170 The removed character from WIDTH is U+1171E AHOM CONSONANT SIGN MEDIAL RA. It changed like this: UnicodeData.txt 15.1.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;; UnicodeData.txt 16.0.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;; EastAsianWidth.txt 15.1.0: 1171D..1171F ; N # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA EastAsianWidth.txt 16.0.0: 1171E ; N # Mc AHOM CONSONANT SIGN MEDIAL RA I.e., it changed from Mn (Mark, nonspacing) to Mc (Mark, spacing combining). It should therefore now have width 1 instead of 0, so it is fine that it was removed from WIDTH: characters not in WIDTH get width 1 by default. Nothing suspicious when browsing the list of the 170 added characters. 
Changes in ctype: alpha: Added 4452 characters in new ctype which were not in old ctype combining: Added 51 characters in new ctype which were not in old ctype combining_level3: Added 43 characters in new ctype which were not in old ctype graph: Added 5185 characters in new ctype which were not in old ctype lower: Added 25 characters in new ctype which were not in old ctype print: Added 5185 characters in new ctype which were not in old ctype punct: Missing 33 characters of old ctype in new ctype punct: Added 766 characters in new ctype which were not in old ctype tolower: Added 27 characters in new ctype which were not in old ctype totitle: Added 27 characters in new ctype which were not in old ctype toupper: Added 27 characters in new ctype which were not in old ctype upper: Added 27 characters in new ctype which were not in old ctype Nothing suspicious in the additions. About the 33 characters removed from `punct`: U+0363 - U+036F are identical in UnicodeData.txt. Difference in DerivedCoreProperties.txt: DerivedCoreProperties.txt 15.1.0: not there. DerivedCoreProperties.txt 16.0.0: 0363..036F ; Alphabetic # Mn [13] COMBINING LATIN SMALL LETTER A..COMBINING LATIN SMALL LETTER X So that’s the reason why they are added to `alpha` and removed from `punct`. Same for U+1DD3 - U+1DE6: they are identical in UnicodeData.txt, but there is a difference in DerivedCoreProperties.txt: DerivedCoreProperties.txt 15.1.0: 1DE7..1DF4 ; Alphabetic # Mn [14] COMBINING LATIN SMALL LETTER ALPHA..COMBINING LATIN SMALL LETTER U WITH DIAERESIS DerivedCoreProperties.txt 16.0.0: 1DD3..1DF4 ; Alphabetic # Mn [34] COMBINING LATIN SMALL LETTER FLATTENED OPEN A ABOVE..COMBINING LATIN SMALL LETTER U WITH DIAERESIS So they became `Alphabetic` and were thus added to `alpha` and removed from `punct`. Resolves: BZ #32168 Reviewed-by: Carlos O'Donell <carlos@redhat.com> |
||
---|---|---|
.. | ||
ANSI_X3.4-1968 | ||
ANSI_X3.110-1983 | ||
ARMSCII-8 | ||
ASMO_449 | ||
BIG5 | ||
BIG5-HKSCS | ||
BRF | ||
BS_4730 | ||
BS_VIEWDATA | ||
CP737 | ||
CP770 | ||
CP771 | ||
CP772 | ||
CP773 | ||
CP774 | ||
CP775 | ||
CP949 | ||
CP1125 | ||
CP1250 | ||
CP1251 | ||
CP1252 | ||
CP1253 | ||
CP1254 | ||
CP1255 | ||
CP1256 | ||
CP1257 | ||
CP1258 | ||
CP10007 | ||
CSA_Z243.4-1985-1 | ||
CSA_Z243.4-1985-2 | ||
CSA_Z243.4-1985-GR | ||
CSN_369103 | ||
CWI | ||
DEC-MCS | ||
DIN_66003 | ||
DS_2089 | ||
EBCDIC-AT-DE | ||
EBCDIC-AT-DE-A | ||
EBCDIC-CA-FR | ||
EBCDIC-DK-NO | ||
EBCDIC-DK-NO-A | ||
EBCDIC-ES | ||
EBCDIC-ES-A | ||
EBCDIC-ES-S | ||
EBCDIC-FI-SE | ||
EBCDIC-FI-SE-A | ||
EBCDIC-FR | ||
EBCDIC-IS-FRISS | ||
EBCDIC-IT | ||
EBCDIC-PT | ||
EBCDIC-UK | ||
EBCDIC-US | ||
ECMA-CYRILLIC | ||
ES | ||
ES2 | ||
EUC-JISX0213 | ||
EUC-JP | ||
EUC-JP-MS | ||
EUC-KR | ||
EUC-TW | ||
GB2312 | ||
GB18030 | ||
GB_1988-80 | ||
GBK | ||
GEORGIAN-ACADEMY | ||
GEORGIAN-PS | ||
GOST_19768-74 | ||
GREEK7 | ||
GREEK7-OLD | ||
GREEK-CCITT | ||
HP-GREEK8 | ||
HP-ROMAN8 | ||
HP-ROMAN9 | ||
HP-THAI8 | ||
HP-TURKISH8 | ||
IBM037 | ||
IBM038 | ||
IBM256 | ||
IBM273 | ||
IBM274 | ||
IBM275 | ||
IBM277 | ||
IBM278 | ||
IBM280 | ||
IBM281 | ||
IBM284 | ||
IBM285 | ||
IBM290 | ||
IBM297 | ||
IBM420 | ||
IBM423 | ||
IBM424 | ||
IBM437 | ||
IBM500 | ||
IBM850 | ||
IBM851 | ||
IBM852 | ||
IBM855 | ||
IBM856 | ||
IBM857 | ||
IBM858 | ||
IBM860 | ||
IBM861 | ||
IBM862 | ||
IBM863 | ||
IBM864 | ||
IBM865 | ||
IBM866 | ||
IBM866NAV | ||
IBM868 | ||
IBM869 | ||
IBM870 | ||
IBM871 | ||
IBM874 | ||
IBM875 | ||
IBM880 | ||
IBM891 | ||
IBM903 | ||
IBM904 | ||
IBM905 | ||
IBM918 | ||
IBM922 | ||
IBM1004 | ||
IBM1026 | ||
IBM1047 | ||
IBM1124 | ||
IBM1129 | ||
IBM1132 | ||
IBM1133 | ||
IBM1160 | ||
IBM1161 | ||
IBM1162 | ||
IBM1163 | ||
IBM1164 | ||
IEC_P27-1 | ||
INIS | ||
INIS-8 | ||
INIS-CYRILLIC | ||
INVARIANT | ||
ISIRI-3342 | ||
ISO_646.BASIC | ||
ISO_646.IRV | ||
ISO_2033-1983 | ||
ISO_5427 | ||
ISO_5427-EXT | ||
ISO_5428 | ||
ISO_6937 | ||
ISO_6937-2-25 | ||
ISO_6937-2-ADD | ||
ISO_8859-1,GL | ||
ISO_8859-SUPP | ||
ISO_10367-BOX | ||
ISO_10646 | ||
ISO_11548-1 | ||
ISO-8859-1 | ||
ISO-8859-2 | ||
ISO-8859-3 | ||
ISO-8859-4 | ||
ISO-8859-5 | ||
ISO-8859-6 | ||
ISO-8859-7 | ||
ISO-8859-8 | ||
ISO-8859-9 | ||
ISO-8859-9E | ||
ISO-8859-10 | ||
ISO-8859-11 | ||
ISO-8859-13 | ||
ISO-8859-14 | ||
ISO-8859-15 | ||
ISO-8859-16 | ||
ISO-IR-90 | ||
ISO-IR-197 | ||
ISO-IR-209 | ||
IT | ||
JIS_C6220-1969-JP | ||
JIS_C6220-1969-RO | ||
JIS_C6229-1984-A | ||
JIS_C6229-1984-B | ||
JIS_C6229-1984-B-ADD | ||
JIS_C6229-1984-HAND | ||
JIS_C6229-1984-HAND-ADD | ||
JIS_C6229-1984-KANA | ||
JIS_X0201 | ||
JOHAB | ||
JUS_I.B1.002 | ||
JUS_I.B1.003-MAC | ||
JUS_I.B1.003-SERB | ||
KOI8-R | ||
KOI8-RU | ||
KOI8-T | ||
KOI8-U | ||
KOI-8 | ||
KSC5636 | ||
LATIN-GREEK | ||
LATIN-GREEK-1 | ||
MAC-CENTRALEUROPE | ||
MAC-CYRILLIC | ||
MAC-IS | ||
MAC-SAMI | ||
MAC-UK | ||
MACINTOSH | ||
MIK | ||
MSZ_7795.3 | ||
NATS-DANO | ||
NATS-DANO-ADD | ||
NATS-SEFI | ||
NATS-SEFI-ADD | ||
NC_NC00-10 | ||
NEXTSTEP | ||
NF_Z_62-010 | ||
NF_Z_62-010_1973 | ||
NS_4551-1 | ||
NS_4551-2 | ||
PT | ||
PT2 | ||
PT154 | ||
RK1048 | ||
SAMI | ||
SAMI-WS2 | ||
SEN_850200_B | ||
SEN_850200_C | ||
SHIFT_JIS | ||
SHIFT_JISX0213 | ||
T.61-7BIT | ||
T.61-8BIT | ||
T.101-G2 | ||
TCVN5712-1 | ||
TIS-620 | ||
TSCII | ||
UTF-8 | ||
VIDEOTEX-SUPPL | ||
VISCII | ||
WINDOWS-31J |
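
The reasoning in the commit message about U+1171E (width 0 becomes width 1) and about U+0363..U+036F (moved from `punct` to `alpha`) can be sketched in a few lines of Python. This is only a simplified illustration, not the code of the generator scripts: the helper names `general_category`, `simplified_width`, and `parse_alphabetic_range` are hypothetical, and the width rule assumed here (0 for `Mn`/`Me`, otherwise the default of 1) ignores East Asian wide characters and any other special cases the real scripts may handle.

```python
# Illustrative sketch only; the helper names are hypothetical and are not
# taken from glibc's generator scripts.

def general_category(unicode_data_line):
    """Return the General_Category (third field) of a UnicodeData.txt line."""
    return unicode_data_line.split(";")[2]


def simplified_width(category):
    """Assumed, simplified rule: nonspacing (Mn) and enclosing (Me) marks get
    width 0; everything else falls back to the default width of 1.
    East Asian wide characters (width 2) are deliberately ignored."""
    return 0 if category in ("Mn", "Me") else 1


def parse_alphabetic_range(line):
    """Parse a DerivedCoreProperties.txt line such as
    '0363..036F ; Alphabetic # ...' into an inclusive (first, last) pair."""
    code_points = line.split(";")[0].strip()
    first, _, last = code_points.partition("..")
    return int(first, 16), int(last or first, 16)


# UnicodeData.txt entries for U+1171E, as quoted in the commit message.
OLD_1171E = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;"
NEW_1171E = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;"

old_width = simplified_width(general_category(OLD_1171E))  # Mn
new_width = simplified_width(general_category(NEW_1171E))  # Mc

# DerivedCoreProperties.txt 16.0.0 entry, as quoted in the commit message.
ALPHABETIC_16 = ("0363..036F ; Alphabetic # Mn [13] COMBINING LATIN SMALL "
                 "LETTER A..COMBINING LATIN SMALL LETTER X")
first, last = parse_alphabetic_range(ALPHABETIC_16)
is_alpha_0363 = first <= 0x0363 <= last

print(old_width, new_width, is_alpha_0363)
```

Under these assumptions the sketch gives width 0 for the 15.1.0 entry and width 1 for the 16.0.0 entry, and it reports U+0363 as `Alphabetic` in 16.0.0, which matches the explanation given in the commit message.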