glibc/localedata/charmaps
Mike FABIAN a7b5eb821d Update to Unicode 16.0.0 [BZ #32168]
Unicode 16.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 16.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).

Changes in CHARMAP and WIDTH:

    Total added characters in newly generated CHARMAP: 5185
    Total removed characters in newly generated WIDTH: 1
    Total added characters in newly generated WIDTH: 170

The character removed from WIDTH is U+1171E AHOM CONSONANT SIGN MEDIAL RA.
Its entries changed as follows:

UnicodeData.txt 15.1.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;
UnicodeData.txt 16.0.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;

EastAsianWidth.txt 15.1.0: 1171D..1171F   ; N  # Mn     [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
EastAsianWidth.txt 16.0.0: 1171E          ; N  # Mc         AHOM CONSONANT SIGN MEDIAL RA

I.e., it changed from Mn (Mark, Nonspacing) to Mc (Mark, Spacing
Combining). It should therefore now have width 1 instead of 0, so its
removal from WIDTH is correct: characters not listed in WIDTH get
width 1 by default.
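
The reasoning above can be sketched in a few lines of Python. This is only
an illustration, not the generator's actual code: the helper names
`general_category` and `sketch_width` are hypothetical, and the width rule
is deliberately reduced to the single case discussed here (the real scripts
also handle other zero-width categories and East Asian wide characters).

```python
# The two UnicodeData.txt records quoted above, copied verbatim.
OLD = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;"
NEW = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;"


def general_category(record):
    """Return the General_Category (third field) of a UnicodeData.txt record."""
    return record.split(";")[2]


def sketch_width(record):
    """Simplified assumption: nonspacing marks (Mn) get width 0,
    everything else falls back to the default width 1."""
    return 0 if general_category(record) == "Mn" else 1


print(general_category(OLD), sketch_width(OLD))  # Mn 0
print(general_category(NEW), sketch_width(NEW))  # Mc 1
```

Under this simplified rule the character no longer needs an explicit WIDTH
entry in 16.0.0, which matches the single removal reported above.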

Browsing the list of the 170 added characters shows nothing suspicious.

Changes in ctype:

    alpha: Added 4452 characters in new ctype which were not in old ctype
    combining: Added 51 characters in new ctype which were not in old ctype
    combining_level3: Added 43 characters in new ctype which were not in old ctype
    graph: Added 5185 characters in new ctype which were not in old ctype
    lower: Added 25 characters in new ctype which were not in old ctype
    print: Added 5185 characters in new ctype which were not in old ctype
    punct: Missing 33 characters of old ctype in new ctype
    punct: Added 766 characters in new ctype which were not in old ctype
    tolower: Added 27 characters in new ctype which were not in old ctype
    totitle: Added 27 characters in new ctype which were not in old ctype
    toupper: Added 27 characters in new ctype which were not in old ctype
    upper: Added 27 characters in new ctype which were not in old ctype

Nothing suspicious in the additions.
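
A summary in the format above could be produced by a plain set comparison
per character class. The following is a minimal sketch under that
assumption, not the comparison script actually used; `compare_class` is a
hypothetical helper and the code points in the example are made-up toy data.

```python
def compare_class(name, old, new):
    """Report code points that disappeared from or were added to one class."""
    lines = []
    missing = old - new  # in the old ctype only
    added = new - old    # in the new ctype only
    if missing:
        lines.append(
            f"{name}: Missing {len(missing)} characters of old ctype in new ctype")
    if added:
        lines.append(
            f"{name}: Added {len(added)} characters in new ctype which were not in old ctype")
    return lines


# Toy data for illustration only, not the real class contents.
old_punct = {0x0021, 0x0363, 0x0364}
new_punct = {0x0021, 0x2E5E}

for line in compare_class("punct", old_punct, new_punct):
    print(line)
```

A class with changes in both directions, such as `punct`, gets two lines,
one for each direction of the set difference.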

About the 33 characters removed from `punct`:

U+0363 - U+036F are identical in UnicodeData.txt in both versions. The difference is in DerivedCoreProperties.txt:

DerivedCoreProperties.txt 15.1.0: not there.
DerivedCoreProperties.txt 16.0.0: 0363..036F    ; Alphabetic # Mn  [13] COMBINING LATIN SMALL LETTER A..COMBINING LATIN SMALL LETTER X

That is why they were added to `alpha` and removed from `punct`.

The same holds for U+1DD3 - U+1DE6: they are identical in UnicodeData.txt, but DerivedCoreProperties.txt differs:

DerivedCoreProperties.txt 15.1.0: 1DE7..1DF4    ; Alphabetic # Mn  [14] COMBINING LATIN SMALL LETTER ALPHA..COMBINING LATIN SMALL LETTER U WITH DIAERESIS
DerivedCoreProperties.txt 16.0.0: 1DD3..1DF4    ; Alphabetic # Mn  [34] COMBINING LATIN SMALL LETTER FLATTENED OPEN A ABOVE..COMBINING LATIN SMALL LETTER U WITH DIAERESIS

So they became `Alphabetic` and were thus added to `alpha` and removed from `punct`.
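
As a cross-check, the two ranges account for exactly the 33 characters
reported as missing from `punct`. The sketch below recomputes that from the
DerivedCoreProperties.txt ranges quoted above; `parse_range` is a
hypothetical helper. It assumes that `punct` is derived as "graphic and not
alphanumeric", so a character that becomes `Alphabetic` leaves `punct`.

```python
def parse_range(spec):
    """Turn 'XXXX..YYYY' into an inclusive range of code points."""
    first, last = (int(part, 16) for part in spec.split(".."))
    return range(first, last + 1)


# Alphabetic ranges quoted above (15.1.0 has no entry for 0363..036F).
alphabetic_15_1 = set(parse_range("1DE7..1DF4"))
alphabetic_16_0 = set(parse_range("0363..036F")) | set(parse_range("1DD3..1DF4"))

newly_alphabetic = alphabetic_16_0 - alphabetic_15_1

print(len(parse_range("0363..036F")))                       # 13
print(len(newly_alphabetic & set(parse_range("1DD3..1DF4"))))  # 20
print(len(newly_alphabetic))                                # 33
```

The total of 13 + 20 = 33 agrees with the `punct` line in the ctype summary.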

Resolves: BZ #32168

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-09-27 14:43:38 +02:00
ANSI_X3.4-1968
ANSI_X3.110-1983
ARMSCII-8
ASMO_449
BIG5
BIG5-HKSCS
BRF
BS_4730
BS_VIEWDATA
CP737
CP770
CP771
CP772
CP773
CP774
CP775
CP949
CP1125
CP1250
CP1251
CP1252
CP1253
CP1254
CP1255
CP1256
CP1257
CP1258
CP10007
CSA_Z243.4-1985-1
CSA_Z243.4-1985-2
CSA_Z243.4-1985-GR
CSN_369103
CWI
DEC-MCS
DIN_66003
DS_2089
EBCDIC-AT-DE
EBCDIC-AT-DE-A
EBCDIC-CA-FR
EBCDIC-DK-NO
EBCDIC-DK-NO-A
EBCDIC-ES
EBCDIC-ES-A
EBCDIC-ES-S
EBCDIC-FI-SE
EBCDIC-FI-SE-A
EBCDIC-FR
EBCDIC-IS-FRISS
EBCDIC-IT
EBCDIC-PT
EBCDIC-UK
EBCDIC-US
ECMA-CYRILLIC
ES
ES2
EUC-JISX0213
EUC-JP
EUC-JP-MS
EUC-KR
EUC-TW
GB2312
GB18030 add GB18030-2022 charmap and test the entire GB18030 charmap [BZ #30243] 2023-08-29 19:02:30 +02:00
GB_1988-80
GBK
GEORGIAN-ACADEMY
GEORGIAN-PS
GOST_19768-74
GREEK7
GREEK7-OLD
GREEK-CCITT
HP-GREEK8
HP-ROMAN8
HP-ROMAN9
HP-THAI8
HP-TURKISH8
IBM037
IBM038
IBM256 localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882] 2021-05-18 07:21:45 +02:00
IBM273
IBM274
IBM275
IBM277 localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882] 2021-05-18 07:21:45 +02:00
IBM278 localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882] 2021-05-18 07:21:45 +02:00
IBM280 localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882] 2021-05-18 07:21:45 +02:00
IBM281
IBM284 localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882] 2021-05-18 07:21:45 +02:00
IBM285
IBM290
IBM297 localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882] 2021-05-18 07:21:45 +02:00
IBM420
IBM423
IBM424 localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882] 2021-05-18 07:21:45 +02:00
IBM437
IBM500
IBM850
IBM851
IBM852
IBM855
IBM856
IBM857
IBM858
IBM860
IBM861
IBM862
IBM863
IBM864
IBM865
IBM866
IBM866NAV
IBM868
IBM869
IBM870
IBM871
IBM874
IBM875
IBM880
IBM891
IBM903
IBM904
IBM905
IBM918
IBM922
IBM1004
IBM1026
IBM1047
IBM1124
IBM1129
IBM1132
IBM1133
IBM1160
IBM1161
IBM1162
IBM1163
IBM1164
IEC_P27-1
INIS
INIS-8
INIS-CYRILLIC
INVARIANT
ISIRI-3342
ISO_646.BASIC
ISO_646.IRV
ISO_2033-1983
ISO_5427
ISO_5427-EXT
ISO_5428
ISO_6937
ISO_6937-2-25
ISO_6937-2-ADD
ISO_8859-1,GL
ISO_8859-SUPP
ISO_10367-BOX
ISO_10646
ISO_11548-1
ISO-8859-1
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
ISO-8859-9E
ISO-8859-10
ISO-8859-11
ISO-8859-13
ISO-8859-14
ISO-8859-15
ISO-8859-16
ISO-IR-90
ISO-IR-197
ISO-IR-209
IT
JIS_C6220-1969-JP
JIS_C6220-1969-RO
JIS_C6229-1984-A
JIS_C6229-1984-B
JIS_C6229-1984-B-ADD
JIS_C6229-1984-HAND
JIS_C6229-1984-HAND-ADD
JIS_C6229-1984-KANA
JIS_X0201
JOHAB
JUS_I.B1.002
JUS_I.B1.003-MAC
JUS_I.B1.003-SERB
KOI8-R
KOI8-RU
KOI8-T
KOI8-U
KOI-8
KSC5636
LATIN-GREEK
LATIN-GREEK-1
MAC-CENTRALEUROPE
MAC-CYRILLIC
MAC-IS
MAC-SAMI
MAC-UK
MACINTOSH
MIK
MSZ_7795.3
NATS-DANO
NATS-DANO-ADD
NATS-SEFI
NATS-SEFI-ADD
NC_NC00-10
NEXTSTEP
NF_Z_62-010
NF_Z_62-010_1973
NS_4551-1
NS_4551-2
PT
PT2
PT154
RK1048
SAMI
SAMI-WS2
SEN_850200_B
SEN_850200_C
SHIFT_JIS
SHIFT_JISX0213
T.61-7BIT
T.61-8BIT
T.101-G2
TCVN5712-1
TIS-620
TSCII
UTF-8 Update to Unicode 16.0.0 [BZ #32168] 2024-09-27 14:43:38 +02:00
VIDEOTEX-SUPPL
VISCII
WINDOWS-31J