1711 Commits

Author SHA1 Message Date
Mike FABIAN
e171ad7d59 localedata: dz_BT, bo_CN: convert to UTF-8 2024-01-08 17:02:09 +01:00
Valery Ushakov
4c2b356be5 localedata: dz_BT, bo_CN: Fix spelling of "phur bu" in both Tibetan and Dzongkha
Resolves: BZ # 31086
2024-01-08 16:44:28 +01:00
Valery Ushakov
6b8419ba5f localedata: bo_CN: Fix spelling errors in Tibetan data
Resolves: BZ # 31086
2024-01-08 16:39:31 +01:00
Valery Ushakov
c4f648ed4d localedata: bo_CN: Fix incomplete edit in Tibetan yesexpr
Resolves: BZ # 31086
2024-01-08 16:08:07 +01:00
Valery Ushakov
460f26e51b localedata: dz_BT: Fix spelling errors in Dzongkha data
Resolves: BZ # 31086
2024-01-08 16:04:59 +01:00
Mike FABIAN
d333a2e0fb localedata: unicode-gen: Remove redundant \s* from regexp, fix comments 2024-01-08 10:06:42 +01:00
Mike FABIAN
6f87f46bf4 localedata: convert the remaining *_RU locales to UTF-8 2024-01-08 10:06:42 +01:00
Mike FABIAN
e9f5dc7e4a localedata: ru_RU, ru_UA: convert to UTF-8 2024-01-04 16:32:44 +01:00
Mike FABIAN
d61a2bd782 localedata: es_??: convert to UTF-8 2024-01-04 16:03:08 +01:00
Mike FABIAN
734abeda98 localedata: miq_NI: convert to UTF-8 2024-01-04 16:03:08 +01:00
Mike FABIAN
b31a01909c localedata: fy_DE: make this "Western Frisian" to agree with the language code "fy"
Resolves: BZ # 14522
2024-01-03 20:55:44 +01:00
Mike FABIAN
3c173c1f63 localedata: fy_DE, fy_NL: convert to UTF-8 2024-01-03 20:07:21 +01:00
Mike FABIAN
bec492c1da localedata: ast_ES: convert to UTF-8 2024-01-03 17:44:52 +01:00
Mike FABIAN
521e96c13f localedata: ast_ES: Remove wrong copyright text
Resolves: BZ # 27601
2024-01-03 17:43:55 +01:00
Mike FABIAN
5448a127e4 localedata: de_{AT,BE,CH,IT,LU}: convert to UTF-8 2024-01-03 13:54:34 +01:00
Mike FABIAN
a8f7f742be localedata: lv_LV, it_IT, it_CH: convert to UTF-8 2024-01-03 13:54:34 +01:00
Mike FABIAN
61171bb2b9 localedata: it_IT, lv_LV: currency symbol should follow the amount
Resolves: BZ # 28558
2024-01-03 13:54:34 +01:00
Mike FABIAN
fe316dad7c localedata: ms_MY should not use 12-hour format
Resolves: BZ # 29504
2024-01-03 11:07:27 +01:00
Mike FABIAN
b5b558ab4b localedata: es_ES: convert to UTF-8 2024-01-02 21:30:42 +01:00
Mike FABIAN
e3e98b0327 localedata: es_ES: Add am_pm strings
Resolves: BZ # 24013

Use <U202F> instead of a plain space because CLDR also uses that.
2024-01-02 21:30:42 +01:00
Mike FABIAN
67f371e882 localedata: convert uz_UZ and uz_UZ@cyrillic to UTF-8 2024-01-02 16:36:43 +01:00
Mike FABIAN
cdce63a767 localedata: uz_UZ and uz_UZ@cyrillic: Fix decimal point and thousands separator
Resolves: BZ # 31204
2024-01-02 16:36:43 +01:00
Paul Eggert
dff8da6b3e Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
Mike FABIAN
fce5528fcb localedata: yo_NG: remove redundant comments
See: https://sourceware.org/pipermail/libc-alpha/2023-December/153538.html
2023-12-26 13:27:07 +01:00
Mike FABIAN
6b3ace3a1d localedata: convert en_AU, en_NZ, mi_NZ, niu_NZ to UTF-8 2023-12-26 10:05:50 +01:00
Mike FABIAN
89d727efd7 localedata: First day of the week in AU is Monday; LC_TIME in en_NZ is then identical to LC_TIME in en_AU
Resolves: BZ # 24877
2023-12-26 09:59:10 +01:00
Mike FABIAN
e65ca11515 localedata: convert yo_NG to UTF-8, check that language name in Yoruba agrees with CLDR
Related: BZ # 24878
2023-12-25 21:04:38 +01:00
Mike FABIAN
1e70252508 localedata: id_ID: change first weekday to Sunday
Resolves: BZ # 30412

See: https://sourceware.org/bugzilla/show_bug.cgi?id=30412#c7

CLDR also has ID in the list of territories which have Sunday as the
first day of the week.
2023-12-19 11:23:19 +01:00
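
The first-weekday change above only affects how a week is laid out, not the dates themselves. A minimal sketch of that effect, using Python's `calendar` module rather than glibc: the weekday numbering here (0 for Monday, 6 for Sunday) is Python's own convention and is not the encoding used in the locale source files.

```python
import calendar

# Python numbers weekdays 0 (Monday) through 6 (Sunday); this is Python's
# convention, not the numbering used by first_weekday in locale sources.
monday_first = calendar.Calendar(firstweekday=calendar.MONDAY)
sunday_first = calendar.Calendar(firstweekday=calendar.SUNDAY)

# Order in which weekday columns appear for each convention.
monday_columns = list(monday_first.iterweekdays())
sunday_columns = list(sunday_first.iterweekdays())

# First calendar row of December 2023 (1 December 2023 was a Friday).
# Zeros are padding for days that belong to the previous month.
monday_row = monday_first.monthdayscalendar(2023, 12)[0]
sunday_row = sunday_first.monthdayscalendar(2023, 12)[0]

print(monday_columns, monday_row)
print(sunday_columns, sunday_row)
```

The same month yields a different first row depending on the convention, which is why calendar applications are sensitive to this locale field.
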
RushingAlien
12ab77e893 id_ID: Update Time Locales
Hello! I am Indonesian, was born and raised in Indonesia and still do live in
Indonesia.

This patch brings a few changes to the time locales of id_ID, which
include:
- Defining am_pm and time_fmt_ampm
- Changing time_fmt and d_t_fmt to use the 24-hour format
- Changing first_weekday to Monday
This is a squashed version of what was previously a 5-patch set.

Here are the reasons for and details of the changes:

Change 1 part 1

id_ID: Define `am_pm` string

The current formatting does not define the am_pm string, so AM and PM
are not shown in the 12-hour time format. This change defines the string
by changing it from an empty string to "AM";"PM".

output of `date +%r`:
before commit: 01:23
after commit: 01:23 PM

Change 1 part 2

id_ID: Define time_fmt_ampm, change from an empty string

Currently, time_fmt_ampm is set to an empty string, causing some
programs to not be able to display time in the 12-hour format, for
example, glib: https://gitlab.gnome.org/GNOME/glib/-/issues/2967.
This commit changes it from an empty string to "%I:%M:%S %p".

Change 2 part 1

id_ID: Use 24-hour format for time_fmt

The standard, formal Indonesian time format uses the 24-hour format
instead of the 12-hour format. This commit changes the id_ID locale's
time_fmt to match.

Change 2 part 2

id_ID: Use 24-hour format for d_t_fmt.

The standard, formal Indonesian time format uses the 24-hour format
instead of the 12-hour format. This commit changes the id_ID locale's
d_t_fmt to match.

Change 3

id_ID: Change first_weekday to Monday

The Indonesian calendar starts the week on Monday, so let's comply.

Message-ID: <20230821035530.9075-1-rushing27alien@gmail.com>
Resolves: BZ # 30412
Reviewed-by: Mike Fabian <mfabian@redhat.com>
2023-12-18 09:57:33 +01:00
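
The `time_fmt_ampm` value "%I:%M:%S %p" quoted in the id_ID commit above uses ordinary strftime conversions, so its effect can be sketched outside glibc. The example below is only an illustration with Python's `datetime.strftime`. The 24-hour string "%H:%M:%S" is an assumption made for comparison, since the commit message does not quote the new `time_fmt` value, and the text produced by `%p` depends on the active locale.

```python
from datetime import datetime

# Format quoted in the commit message for time_fmt_ampm.
TIME_FMT_AMPM = "%I:%M:%S %p"
# Assumed 24-hour format for comparison; not quoted in the commit message.
TIME_FMT_24H = "%H:%M:%S"

sample = datetime(2023, 12, 18, 13, 23, 0)

# %I is the hour on a 12-hour clock; %p is the locale's AM/PM string.
# If the locale's am_pm strings are empty, %p contributes nothing.
twelve_hour = sample.strftime(TIME_FMT_AMPM)

# %H is the hour on a 24-hour clock and needs no AM/PM marker.
twenty_four_hour = sample.strftime(TIME_FMT_24H)

print(twelve_hour)
print(twenty_four_hour)
```

This also shows why an empty `am_pm` is a problem for 12-hour output: without the marker, 01:23 in the morning and 01:23 in the afternoon print identically.
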
Mike FABIAN
73d92c4b73 localedata: Convert el_GR and el_CY locales to UTF-8 2023-12-15 21:08:44 +01:00
Mike FABIAN
14a94f2e35 localedata: el_GR: Greece now uses the 24h format for time
Resolves: BZ # 23012
2023-12-15 21:08:44 +01:00
Mike FABIAN
958478889c localedata: Convert day names in nn_NO locale to UTF-8 2023-12-07 08:28:25 +01:00
Mike FABIAN
ff25f355af localedata: Remove trailing whitespace in weekday names in nn_NO locale
Resolves: BZ # 25868
2023-12-07 08:28:25 +01:00
Mike FABIAN
dae3cf4134 localedata: Convert oc_FR locale to UTF-8 2023-11-16 23:58:17 +01:00
Mike FABIAN
70246b8495 localedata: Add information for Occitan
Resolves: BZ # 28787
2023-11-16 23:58:17 +01:00
Mike FABIAN
3fddfe3c5d New Zealand locales (en_NZ & mi_NZ) first day of week should be Monday
Resolves: BZ #29486
2023-11-16 13:59:00 +01:00
Mike FABIAN
d2d797a49b Remove unused localedata/th_TH.in 2023-09-21 10:34:35 +02:00
Mike FABIAN
aceda10bd5 Adapt collation in th_TH locale to use the iso14651_t1_common file and sync the collation with CLDR
I made it agree as much as possible with the rules from CLDR (see:
https://github.com/unicode-org/cldr/blob/main/common/collation/th.xml).

It seems to be impossible to follow the CLDR rules

  &[before 1]๚<ฯ # should be "variable"

and

  &๛<ๆ # should be "variable"

exactly though. These ask for a primary difference in punctuation
characters whose primary weight should be "IGNORE". But using a
secondary difference instead still sorts the test data correctly and
the previously used collation in th_TH used tertiary differences for
these characters.

There was old localedata/th_TH.in test data in TIS-620 encoding which
was not used (it was not in the localedata/Makefile). I converted this
to UTF-8 and moved it to localedata/th_TH.UTF-8.in and added it to
localedata/Makefile.

Using the existing collation rules in the th_TH locale did not sort that
test file completely correctly. I think my new collation rules based on
iso14651_t1 are better.
2023-09-21 10:34:35 +02:00
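
The th_TH commit above distinguishes primary, secondary and tertiary differences. A toy sketch of multi-level comparison may help: it is not the iso14651_t1 table or the CLDR Thai rules, and the choice of levels (base letter, then combining marks, then case) is a simplifying assumption for Latin text.

```python
import unicodedata


def sort_key(word):
    """Build a three-level key: base letters, then accents, then case."""
    decomposed = unicodedata.normalize("NFD", word)
    base = [ch for ch in decomposed if not unicodedata.combining(ch)]
    marks = [ch for ch in decomposed if unicodedata.combining(ch)]
    primary = tuple(ch.lower() for ch in base)    # strongest difference
    secondary = tuple(marks)                      # consulted on primary ties
    tertiary = tuple(ch.isupper() for ch in base)  # consulted last
    return (primary, secondary, tertiary)


words = ["rule", "rôle", "Role", "role"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

A lower level is consulted only when all higher levels tie, which is why replacing a primary difference with a secondary one can still sort a given test file correctly.
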
Mike FABIAN
bb5bbc2070 Update to Unicode 15.1.0 [BZ #30854]
Unicode 15.1.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 15.1.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).

    Total removed characters in newly generated CHARMAP: 0
    Total changed characters in newly generated CHARMAP: 0
    Total added characters in newly generated CHARMAP: 627
    Total removed characters in newly generated WIDTH: 0
    Total changed characters in newly generated WIDTH: 0
    Total added characters in newly generated WIDTH: 627

    alpha: Added 622 characters in new ctype which were not in old ctype
    graph: Added 627 characters in new ctype which were not in old ctype
    print: Added 627 characters in new ctype which were not in old ctype
    punct: Added 5 characters in new ctype which were not in old ctype
        The five characters added to punct are:
        2FFC;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT;So;0;ON;;;;;N;;;;;
        2FFD;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER RIGHT;So;0;ON;;;;;N;;;;;
        2FFE;IDEOGRAPHIC DESCRIPTION CHARACTER HORIZONTAL REFLECTION;So;0;ON;;;;;N;;;;;
        2FFF;IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION;So;0;ON;;;;;N;;;;;
        31EF;IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION;So;0;ON;;;;;N;;;;;

    The Unicode announcement blog entry says "[...] adds 627
    characters, [...] additions include 622 CJK unified ideographs in
    a new block, [...]", so that looks OK. The Unicode
    blog mentions "six completely new emoji" but they don't appear here as
    they are all sequences and not single code points.

Resolves: BZ #30854

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-09-16 08:37:03 +02:00
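
The Unicode 15.1.0 commit above notes that the new emoji are sequences rather than single code points, so they add nothing to CHARMAP. A small sketch of what that means, using the bird, zero width joiner, fire sequence (assumed here to be the phoenix, one of the six new emoji): every code point in it already existed before 15.1.0.

```python
import unicodedata

# A ZWJ sequence: two existing emoji joined by U+200D ZERO WIDTH JOINER.
# Assumed here to be the "phoenix" emoji sequence.
phoenix = "\U0001F426\u200D\U0001F525"

# The string renders as one glyph on supporting systems, but it consists
# of three code points, none of which is new.
code_points = [f"U+{ord(ch):04X}" for ch in phoenix]
names = [unicodedata.name(ch) for ch in phoenix]

for cp, name in zip(code_points, names):
    print(cp, name)
```
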
Mike FABIAN
71de3aead9 localedata/unicode-gen/utf8_gen.py: adapt regexp to get relevant lines from EastAsianWidth.txt
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-09-16 08:37:02 +02:00
Mike FABIAN
ba017b4f9d Fix regexp syntax warnings in localedata/unicode-gen/ctype_compatibility.py
Fix these:

$ python -m py_compile ./ctype_compatibility.py
./ctype_compatibility.py:146: SyntaxWarning: invalid escape sequence '\)'

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-09-16 08:37:02 +02:00
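
The SyntaxWarning quoted in the commit above appears when a regular expression such as `\)` is written in an ordinary string literal, where Python tries to interpret the backslash itself. A common fix is a raw string. The pattern below is a hypothetical example, not the expression used in ctype_compatibility.py.

```python
import re

# In a raw string the backslash is passed through unchanged, so the regex
# engine receives \( and \) and treats them as literal parentheses.
# Hypothetical pattern: capture the text between literal parentheses.
PAREN_GROUP = re.compile(r"\(([^)]*)\)")


def parenthesized(text):
    """Return every substring that appears between literal parentheses."""
    return PAREN_GROUP.findall(text)


found = parenthesized("alpha (beta) gamma (delta)")
print(found)
```

Doubling the backslashes in a normal string would also work, but raw strings keep the pattern readable.
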
lijianglin
e1d3312015 add GB18030-2022 charmap and test the entire GB18030 charmap [BZ #30243]
Support GB18030-2022 by adding and changing some GB18030-2022
transcoding relationships. Details are as follows:
Add 25 transcoding relationships:
  UE81E 0x82359037
  UE826 0x82359038
  UE82B 0x82359039
  UE82C 0x82359130
  UE832 0x82359131
  UE843 0x82359132
  UE854 0x82359133
  UE864 0x82359134
  UE78D 0x84318236
  UE78F 0x84318237
  UE78E 0x84318238
  UE790 0x84318239
  UE791 0x84318330
  UE792 0x84318331
  UE793 0x84318332
  UE794 0x84318333
  UE795 0x84318334
  UE796 0x84318335
  UE816 0xfe51
  UE817 0xfe52
  UE818 0xfe53
  UE831 0xfe6c
  UE83B 0xfe76
  UE855 0xfe91
Change 6 transcoding relationships:
  U20087 0x95329031
  U20089 0x95329033
  U200CC 0x95329730
  U215D7 0x9536b937
  U2298F 0x9630ba35
  U241FE 0x9635b630
Test the entire GB18030 charmap, not only the Unicode BMP part.

Co-authored-by: yangyanchao <yangyanchao6@huawei.com>
Co-authored-by: liqingqing <liqingqing3@huawei.com>
Co-authored-by: Bruno Haible <bruno@clisp.org>
Reviewed-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Mike FABIAN <mfabian@redhat.com>
2023-08-29 19:02:30 +02:00
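
Each mapping line in the GB18030-2022 commit above pairs a code point (`U` followed by hex digits) with its encoded bytes (`0x` followed by hex digits). A minimal sketch of reading that notation follows. `parse_mapping` is a hypothetical helper for illustration and not part of glibc. It deliberately avoids Python's built-in `gb18030` codec, which may not reflect the 2022 revision.

```python
def parse_mapping(line):
    """Split a line such as 'UE81E 0x82359037' into (code point, bytes)."""
    code_point_text, byte_text = line.split()
    if not code_point_text.startswith("U") or not byte_text.startswith("0x"):
        raise ValueError(f"unexpected mapping line: {line!r}")
    code_point = int(code_point_text[1:], 16)
    encoded = bytes.fromhex(byte_text[2:])
    return code_point, encoded


# Lines copied from the commit message above.
four_byte = parse_mapping("UE81E 0x82359037")
two_byte = parse_mapping("UE816 0xfe51")
supplementary = parse_mapping("U20087 0x95329031")

for code_point, encoded in (four_byte, two_byte, supplementary):
    print(f"U+{code_point:04X} -> {encoded.hex()} ({len(encoded)} bytes)")
```

The list mixes two-byte and four-byte encodings and includes code points outside the BMP, which is consistent with the commit testing the entire charmap rather than only the BMP part.
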
Colin Leroy-Mira
dfe8c44588 localedata: Translit common emojis to smileys [BZ #30649]
Add common emojis to the translit-able characters (mostly
faces and hearts), and translit them to old-fashioned
smileys.

Signed-off-by: Colin Leroy-Mira <colin@colino.net>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-08-29 09:31:23 +02:00
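
The transliteration idea in the commit above can be sketched as a lookup from single emoji code points to ASCII smileys. The two entries below are illustrative assumptions and are not copied from the glibc translit tables. `to_ascii_smileys` is likewise a hypothetical helper.

```python
# Hypothetical mapping for illustration only; the real table lives in the
# glibc locale sources and may use different replacements.
SMILEYS = {
    "\U0001F642": ":-)",  # SLIGHTLY SMILING FACE
    "\U0001F641": ":-(",  # SLIGHTLY FROWNING FACE
}

# str.maketrans accepts single-character keys mapped to replacement strings.
TRANSLIT_TABLE = str.maketrans(SMILEYS)


def to_ascii_smileys(text):
    """Replace known emoji with ASCII smileys; leave other text unchanged."""
    return text.translate(TRANSLIT_TABLE)


converted = to_ascii_smileys("ok \U0001F642 / not ok \U0001F641")
print(converted)
```
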
Florian Weimer
4dc6b2dfb0 localedata: de_DE should not use Fräulein
This honorific fell out of use quite some time ago.
2023-02-27 16:54:22 +01:00
Joseph Myers
6d7e8eda9b Update copyright dates with scripts/update-copyrights 2023-01-06 21:14:39 +00:00
Mike FABIAN
7fe6734d28 Update to Unicode 15.0.0 [BZ #29604]
Unicode 15.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 15.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).

    Total added characters in newly generated CHARMAP: 4489
    Total removed characters in newly generated WIDTH: 0
    Total changed characters in newly generated WIDTH: 0
    Total added characters in newly generated WIDTH: 4257

    alpha: Added 4389 characters in new ctype which were not in old ctype
    combining: Added 42 characters in new ctype which were not in old ctype
    combining_level3: Added 34 characters in new ctype which were not in old ctype
    graph: Added 4489 characters in new ctype which were not in old ctype
    lower: Added 73 characters in new ctype which were not in old ctype
    print: Added 4489 characters in new ctype which were not in old ctype
    punct: Missing 5 characters of old ctype in new ctype
        punct: Missing: ఄ 0xc04 TELUGU SIGN COMBINING ANUSVARA ABOVE
        punct: Missing: ྂ 0xf82 TIBETAN SIGN NYI ZLA NAA DA
        punct: Missing: ྃ 0xf83 TIBETAN SIGN SNA LDAN
        punct: Missing: 𑂀 0x11080 KAITHI SIGN CANDRABINDU
        punct: Missing: 𑂁 0x11081 KAITHI SIGN ANUSVARA
            That’s OK, because these are now Alphabetic in DerivedCoreProperties.txt
    punct: Added 105 characters in new ctype which were not in old ctype

Resolves: BZ #29604
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-10-06 08:58:33 +02:00
Adhemerval Zanella Netto
de477abcaa Use '%z' instead of '%Z' on printf functions
The Z modifier is a nonstandard synonym for z (that predates z
itself), and the compiler might issue a warning for an invalid
conversion specifier.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-09-22 08:48:04 -03:00
Florian Weimer
1d78299911 localedata: Convert French language locales (fr_*) to UTF-8 2022-08-17 11:07:00 +02:00
Florian Weimer
01441ae333 de_DE: Convert to UTF-8
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05 09:07:02 +02:00
Emil Soleyman-Zomalan
3e29dc5233 Add locale for syr_SY 2022-04-21 13:05:40 +02:00