mirror of
https://git.postgresql.org/git/postgresql.git
synced 2025-01-12 18:34:36 +08:00
cb7cbc16fa
I have implemented a framework for encoding translation between the backend and the frontend. I have also added a new variable-setting command: SET CLIENT_ENCODING TO 'encoding'; Other features include Latin1 support and improved 8-bit cleanness. See doc/README.mb for more details. Note that the patches are against the May 30 snapshot. Tatsuo Ishii
143 lines
4.1 KiB
Plaintext
PostgreSQL 6.4 multi-byte (MB) support README        Jun 5 1998

        Tatsuo Ishii
        t-ishii@sra.co.jp
        http://www.sra.co.jp/people/t-ishii/PostgreSQL/

0. Introduction

The MB support is intended to allow PostgreSQL to handle multi-byte
character sets such as EUC (Extended Unix Code), Unicode and Mule
internal code. With MB enabled you can use multi-byte character sets
in regexp, LIKE and some functions. The encoding system is chosen at
compile time.

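To illustrate why multi-byte awareness matters, here is a minimal
Python sketch. It is not PostgreSQL code: it assumes Python's built-in
euc_jp codec as a stand-in for an EUC_JP backend, and the helper names
char_length and octet_length are hypothetical. It shows that the number
of characters and the number of bytes in a string differ under EUC_JP.

```python
# Illustration only: Python's "euc_jp" codec stands in for the backend
# encoding. The helper names below are hypothetical.

def char_length(data: bytes, encoding: str = "euc_jp") -> int:
    """Count characters by decoding the bytes first (multi-byte aware)."""
    return len(data.decode(encoding))

def octet_length(data: bytes) -> int:
    """Count raw bytes, ignoring character boundaries."""
    return len(data)

sample = "日本語".encode("euc_jp")  # three kanji, two bytes each in EUC_JP
print(char_length(sample))   # 3
print(octet_length(sample))  # 6
```

A function that is not multi-byte aware would treat the six bytes as
six characters, which is the kind of problem the MB support addresses.
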
MB also fixes some problems concerning 8-bit single-byte character
sets, including ISO 8859. (I would not say that all of the problems
have been fixed. I have only confirmed that the regression tests ran
fine and that a few French characters could be used with the patch.
Please let me know if you find any problems while using 8-bit
characters.)

1. How to use

create src/Makefile.custom with a line including:

        MB=encoding_system

or run configure with the mb option:

        % configure --with-mb=encoding_system

where encoding_system is one of:

        EUC_JP          Japanese EUC
        EUC_CN          Chinese EUC
        EUC_KR          Korean EUC
        EUC_TW          Taiwan EUC
        UNICODE         Unicode (UTF-8)
        MULE_INTERNAL   Mule internal
        LATIN1          ISO 8859-1 English and some European languages

Example:

        % cat Makefile.custom
        MB=EUC_JP

or

        % configure --with-mb=EUC_JP

If MB is disabled, nothing changes except for better support of
8-bit single-byte character sets.

2. PGCLIENTENCODING

If the environment variable PGCLIENTENCODING is defined on the
frontend, the backend performs automatic encoding translation. For
example, if the backend has been compiled with MB=EUC_JP and
PGCLIENTENCODING=SJIS (Shift JIS: yet another Japanese encoding
system), then any SJIS strings coming from the frontend are
translated to EUC_JP before going into the parser. Output from the
backend is, of course, translated back to SJIS.

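The round trip described above can be sketched in Python. This is an
illustration under stated assumptions, not the backend's code: Python's
shift_jis and euc_jp codecs stand in for the translation routines, the
conversion goes through Unicode (which may differ from what the backend
does internally), and the translate helper is hypothetical.

```python
# Illustration only: Python's codecs stand in for the backend's translation.
# The translate() helper is hypothetical.

def translate(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from the source encoding into the target encoding."""
    return data.decode(source).encode(target)

# A frontend with PGCLIENTENCODING=SJIS sends this string...
from_frontend = "日本語".encode("shift_jis")
# ...and a backend built with MB=EUC_JP would work with it as EUC_JP.
in_backend = translate(from_frontend, "shift_jis", "euc_jp")
# Output travels the other way before it reaches the frontend.
to_frontend = translate(in_backend, "euc_jp", "shift_jis")

print(from_frontend.hex(), in_backend.hex())
```

The two hex dumps differ even though both byte strings spell the same
text, which is why the translation step is needed at all.
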
Supported encodings for PGCLIENTENCODING are:

        EUC_JP          Japanese EUC
        SJIS            Yet another Japanese encoding
        EUC_CN          Chinese EUC
        EUC_KR          Korean EUC
        EUC_TW          Taiwan EUC
        MULE_INTERNAL   Mule internal
        LATIN1          ISO 8859-1 English and some European languages

Note that UNICODE is not supported (yet). Also note that translation
is not always possible. Suppose you choose EUC_JP for the backend and
LATIN1 for the frontend; then some Japanese characters cannot be
translated into Latin. In this case, a letter that cannot be
represented in the Latin character set is transformed as:

        (HEXA DECIMAL)

3. SET CLIENT_ENCODING TO command

The frontend-side encoding is actually set by a new command:

        SET CLIENT_ENCODING TO 'encoding';

where encoding is one of the encodings that can be set for
PGCLIENTENCODING. To query the current frontend encoding:

        SHOW CLIENT_ENCODING;

To return to the default encoding:

        RESET CLIENT_ENCODING;

This resets the frontend encoding to be the same as the backend
encoding, so no encoding translation is performed.

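As a small sketch of how a client program might build these commands,
the Python helper below checks a name against the encodings listed for
PGCLIENTENCODING in this README before producing the SET statement. The
helper set_client_encoding and its upper-casing of the name are
assumptions made for illustration; it is not part of any PostgreSQL
client library, and it only builds strings without sending them to a
backend.

```python
# Illustration only: a hypothetical helper, not part of any PostgreSQL client.

# Names taken from the PGCLIENTENCODING list in this README.
SUPPORTED = ("EUC_JP", "SJIS", "EUC_CN", "EUC_KR", "EUC_TW",
             "MULE_INTERNAL", "LATIN1")

def set_client_encoding(name: str) -> str:
    """Return the SET command for a supported encoding name."""
    upper = name.upper()  # assumption: accept lower-case input for convenience
    if upper not in SUPPORTED:
        raise ValueError("unsupported client encoding: " + name)
    return "SET CLIENT_ENCODING TO '" + upper + "';"

# The three commands from this section, in the order a session might use them.
session = [
    set_client_encoding("sjis"),
    "SHOW CLIENT_ENCODING;",
    "RESET CLIENT_ENCODING;",
]
print("\n".join(session))
```

Rejecting unknown names up front (for example UNICODE, which is not yet
supported on the frontend side) avoids sending a command that the
backend cannot honour.
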
4. References

These are good sources for starting to learn about the various kinds
of encoding systems.

ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
        Detailed explanations of EUC_JP, EUC_CN, EUC_KR, EUC_TW
        appear in section 3.2.

Unicode: http://www.unicode.org/
        The Unicode home page.

RFC 2044
        UTF-8 is defined here.

5. History

Jun 5, 1998
        * add support for the encoding translation between the backend
          and the frontend
        * new command SET CLIENT_ENCODING etc. added
        * add support for LATIN1 character set
        * enhance 8-bit cleanness

April 21, 1998 some enhancements/fixes
        * character_length(), position(), substring() are now aware of
          multi-byte characters
        * add octet_length()
        * add --with-mb option to configure
        * new regression tests for EUC_KR
          (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
        * add some test cases to the EUC_JP regression test
        * fix problem in regress/regress.sh in case of System V
        * fix toupper(), tolower() to handle 8bit chars

Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1

Mar 10, 1998 PL2 released
        * add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
        * add an English document (this file)
        * fix problems concerning 8-bit single byte characters

Mar 1, 1998 PL1 released