postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2025-01-06 15:24:56 +08:00

Author	SHA1	Message	Date
Peter Eisentraut	f85a485f89	Add support for automatically updating Unicode derived files We currently have several sets of files generated from data provided by Unicode. These all have ad hoc rules and instructions for updating when new Unicode versions appear, and it's not done consistently. This patch centralizes and automates the process and makes it part of the release checklist. The Unicode and CLDR versions are specified in Makefile.global.in. There is a new make target "update-unicode" that downloads all the relevant files and runs the generation script. There is also a new script for generating the table of combining characters for ucs_wcwidth(). That table is now in a separate include file rather than hardcoded into the middle of other code. This is based on the script that was used for generating `d8594d123c`, but the script itself wasn't committed at that time. Reviewed-by: John Naylor <john.naylor@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c8d05f42-443e-6c23-819b-05b31759a37c@2ndquadrant.com	2020-01-09 10:08:14 +01:00
Peter Eisentraut	bdb839cbde	Update unicode.org URLs Use https, consistent host name, remove references to ftp. Also update the URLs for CLDR, which has moved from Trac to GitHub.	2019-10-13 22:10:38 +02:00
Alvaro Herrera	0afc0a7841	Fix unaccent generation script in Windows As originally coded, the script would fail on Windows 10 and Python 3 because stdout was switched to UTF-8 only for Python 2. This patch makes that apply to both versions. Also add Python 2 compatibility markers so that we know what to remove once we drop support for that. Also use a "with" clause to ensure file descriptor is closed promptly. Author: Hugh Ranalli, Ramanarayana Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CAKm4Xs7_61XMyOWmHs3n0mmkS0O4S0pvfWk=7cQ5P0gs177f7A@mail.gmail.com Discussion: https://postgr.es/m/15548-cef1b3f8de190d4f@postgresql.org	2019-09-10 18:15:15 -03:00
Thomas Munro	456e3718e7	Add combining characters to unaccent.rules. Strip certain classes of combining characters, so that accents encoded this way are removed. Author: Hugh Ranalli Discussion: https://postgr.es/m/15548-cef1b3f8de190d4f%40postgresql.org	2019-02-01 15:23:01 +01:00
Michael Paquier	e1c1d5444e	Update unaccent rules with release 34 of CLDR for Latin-ASCII.xml This has required an update of the Python script generating the rules, as its format has changed in release 29. This release has also added new punctuation and symbols, and a new set of rules has been generated to include them. The way to find the newest versions of Latin-ASCII is also documented more clearly. Author: Hugh Ranalli, Michael Paquier Discussion: https://postgr.es/m/15548-cef1b3f8de190d4f@postgresql.org	2019-01-10 14:10:21 +09:00
Peter Eisentraut	3d59da9ccd	unaccent: Make generate_unaccent_rules.py Python 3 compatible Python 2 is still supported. Author: Hugh Ranalli <hugh@whtc.ca> Discussion: https://www.postgresql.org/message-id/CAAhbUMNyZ+PhNr_mQ=G161K0-hvbq13Tz2is9M3WK+yX9cQOCw@mail.gmail.com	2019-01-04 11:12:31 +01:00
Thomas Munro	5e8d670c31	Add Greek characters to unaccent.rules. Author: Tasos Maschalidis Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/153495048900.1368.11566580687623014380%40wrigleys.postgresql.org Discussion: https://postgr.es/m/VI1PR01MB38537EBD529FE5EE3FE9A5FEB5370%40VI1PR01MB3853.eurprd01.prod.exchangelabs.com	2018-09-02 07:12:24 +12:00
Tom Lane	ec0a69e49b	Extend the default rules file for contrib/unaccent with Vietnamese letters. Improve generate_unaccent_rules.py to handle composed characters whose base is another composed character rather than a plain letter. The net effect of this is to add a bunch of multi-accented Vietnamese characters to unaccent.rules. Original complaint from Kha Nguyen, diagnosis of the script's shortcoming by Thomas Munro. Dang Minh Huong and Michael Paquier Discussion: https://postgr.es/m/CALo3sF6EC8cy1F2JUz=GRf5h4LMUJTaG3qpdoiLrNbWEXL-tRg@mail.gmail.com	2017-08-16 16:51:56 -04:00
Teodor Sigaev	ce91b9209f	fix typo in comment	2016-03-16 17:18:14 +03:00
Teodor Sigaev	9a206d063c	Improve script generating unaccent rules Script now uses the standard Unicode transliterator Latin-ASCII. Author: Leonard Benedetti	2016-03-16 16:47:03 +03:00
Teodor Sigaev	1bbd52cb9a	Make unaccent handle all diacritics known to Unicode, and expand ligatures correctly Add Python script for building unaccent.rules from Unicode data. Don't backpatch because unaccent changes may require tsvector/index rebuild. Thomas Munro <thomas.munro@enterprisedb.com>	2015-09-04 12:51:53 +03:00

11 Commits