Add a README file for multi-byte. This file is contributed by

Chih-Chang Hsieh <cch@cc.kmu.edu.tw>, written in traditional Chinese (Big5).
2025-02-17 19:30:00 +08:00 · 2001-01-09 09:54:11 +00:00
commit eea348b72b
parent 7edff1618e
1 changed files with 326 additions and 0 deletions
--- a/doc/README.mb.big5
+++ b/doc/README.mb.big5
@@ -0,0 +1,326 @@
+PostgreSQL 7.0.1 multi-byte (MB) support README	  May 20 2000
+
+						Tatsuo Ishii
+						ishii@postgresql.org
+		  http://www.sra.co.jp/people/t-ishii/PostgreSQL/
+
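+[Translator's note] 1. Thanks to Mr. Tatsuo Ishii!
+     2. The translator's notes are not part of the original text. If you
+        find errors in the Chinese translation, please contact
+        cch@cc.kmu.edu.tw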
+
+
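+0. Introduction
+
+MB support is intended to let PostgreSQL handle multi-byte characters,
+for example EUC (Extended Unix Code), Unicode, and Mule internal code.
+With MB support you can use multi-byte characters in regular expressions
+(regexp), LIKE, and some other functions. The default encoding system is
+chosen by the initdb(1) command when you install PostgreSQL, and can also
+be set by the createdb(1) command or by the SQL command that creates a
+database. You can therefore have several databases with different encoding
+systems.
+
+MB support also fixes some problems with 8-bit single-byte character sets,
+including ISO-8859-1. (I am not saying that all such problems are fixed.
+I have only confirmed that the regression tests run successfully and that
+some French characters can be used with the MB patch. If you find any
+problem while using 8-bit characters, please let me know.)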
+
+1. How to use
+
+Before compiling PostgreSQL, run configure with the multibyte option:
+
+	% ./configure --enable-multibyte[=encoding_system]
+
+where encoding_system is one of the following:
+
+	SQL_ASCII		ASCII
+	EUC_JP			Japanese EUC
+	EUC_CN			Chinese EUC
+	EUC_KR			Korean EUC
+	EUC_TW			Taiwan EUC
+	UNICODE			Unicode(UTF-8)
+	MULE_INTERNAL		Mule internal
+	LATIN1			ISO 8859-1 Western European languages
+	LATIN2			ISO 8859-2 Central European languages
+	LATIN3			ISO 8859-3 South European languages
+	LATIN4			ISO 8859-4 North European languages
+	LATIN5			ISO 8859-5 Cyrillic
+	KOI8			KOI8-R
+	WIN			Windows CP1251
+	ALT			Windows CP866
+
+For example:
+
+	% ./configure --enable-multibyte=EUC_JP
+
+If the encoding system is omitted, the default is SQL_ASCII.
+
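+2. How to set the encoding
+
+The initdb command defines the default encoding for a PostgreSQL
+installation. For example:
+
+	% initdb -E EUC_JP
+
+sets the default encoding to EUC_JP (Extended Unix Code for Japanese). If
+you prefer a longer option string, you can use "--encoding" instead of "-E".
+If neither -E nor --encoding is given, the encoding chosen at compile time
+becomes the default.
+
+You can create a database that uses a different encoding:
+
+	% createdb -E EUC_KR korean
+
+This command creates a database named "korean" that uses the EUC_KR
+encoding. Another way to achieve the same result is to use an SQL command:
+
+	CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
+
+The pg_database system catalog has an "encoding" column that records the
+encoding of each database. You can see which encoding each database uses
+with psql -l, or with the \l command inside psql: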
+
+$ psql -l
+            List of databases
+   Database    |  Owner  |   Encoding    
+---------------+---------+---------------
+ euc_cn        | t-ishii | EUC_CN
+ euc_jp        | t-ishii | EUC_JP
+ euc_kr        | t-ishii | EUC_KR
+ euc_tw        | t-ishii | EUC_TW
+ mule_internal | t-ishii | MULE_INTERNAL
+ regression    | t-ishii | SQL_ASCII
+ template1     | t-ishii | EUC_JP
+ test          | t-ishii | EUC_JP
+ unicode       | t-ishii | UNICODE
+(9 rows)
+
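+3. Automatic encoding conversion between frontend and backend
+
+[Translator's note: "frontend" means any client program. It may be the psql
+command interpreter, a C program using libpq, a Perl program, or a Windows
+application going through ODBC. "Backend" means the PostgreSQL database
+server program.]
+
+PostgreSQL supports automatic conversion between frontend and backend for
+some encodings. [Translator's note: "automatic conversion" here means that
+the encodings declared for the frontend and the backend differ, but as long
+as PostgreSQL supports conversion between the two, it converts the data for
+you when it is stored and retrieved.]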
+
+  encoding of backend			available encoding of frontend
+  --------------------------------------------------------------------
+	EUC_JP				EUC_JP, SJIS
+  
+	EUC_TW				EUC_TW, BIG5
+  
+  	LATIN2				LATIN2, WIN1250
+  
+	LATIN5				LATIN5, WIN, ALT
+  
+	MULE_INTERNAL			EUC_JP, SJIS, EUC_KR, EUC_CN, 
+					EUC_TW, BIG5, LATIN1 to LATIN5, 
+					WIN, ALT, WIN1250
+
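+Before automatic encoding conversion can take place, you must tell
+PostgreSQL which encoding you want to use in the frontend. There are
+several ways to do this:
+
+o Using the \encoding command in the psql command interpreter
+
+\encoding lets you switch the frontend encoding on the fly. For example,
+to switch the frontend encoding to SJIS, type:
+
+	\encoding SJIS
+
+o Using libpq functions [Translator's note: libpq is the C API library
+  for PostgreSQL]
+
+The psql \encoding command simply calls PQsetClientEncoding() to do its job.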
+
+  int PQsetClientEncoding(PGconn *conn, const char *encoding)
+
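+Here conn is a connection to the backend, and encoding is the encoding you
+want to use. The function returns 0 if it sets the encoding successfully,
+and -1 otherwise. The encoding of the current connection can be obtained
+with:
+
+  int PQclientEncoding(const PGconn *conn)
+
+Note that this function returns the encoding id (an integer value), not
+the encoding name string (such as "EUC_JP"). To get the encoding name from
+an encoding id, call:
+
+  char *pg_encoding_to_char(int encoding_id)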
+
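+o Using the PGCLIENTENCODING environment variable
+
+If the PGCLIENTENCODING environment variable is set in the frontend's
+environment, the backend performs automatic encoding conversion.
+
+[Translator's note] PostgreSQL 7.0.0 through 7.0.3 have a bug: they do not
+recognize this environment variable.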
+
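+o Using the SQL command SET CLIENT_ENCODING TO
+
+To set the frontend encoding, use this SQL command:
+
+	SET CLIENT_ENCODING TO 'encoding';
+
+You can also use the SQL92 syntax "SET NAMES" for the same purpose:
+
+	SET NAMES 'encoding';
+
+To query the current frontend encoding, use this SQL command:
+
+	SHOW CLIENT_ENCODING;
+
+To return to the default encoding, use this SQL command:
+
+	RESET CLIENT_ENCODING;
+
+[Translator's note] When using the psql command interpreter, this method is
+not recommended. Use \encoding instead.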
+
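+4. About Unicode
+
+Conversion between Unicode and other encodings will probably not be
+available until after version 7.1.
+
+5. What happens if conversion is not possible?
+
+Suppose you choose EUC_JP for the backend and LATIN1 for the frontend
+(some Japanese characters cannot be converted to LATIN1). In this case, a
+character that cannot be represented in the LATIN1 character set is
+converted to the following form:
+
+	(hexadecimal value)
+
+6. References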
+
+These are good sources for learning about the various kinds of encoding
+systems.
+
+ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
+	Detailed explanations of EUC_JP, EUC_CN, EUC_KR, EUC_TW
+	appear in section 3.2.
+
+Unicode: http://www.unicode.org/
+	The homepage of UNICODE.
+
+	RFC 2044
+	UTF-8 is defined here.
+
+7. History
+
+May 20, 2000
+	* SJIS UDC (NEC selection IBM kanji) support contributed
+	  by Eiji Tokuya
+	* Changes above will appear in 7.0.1
+
+Mar 22, 2000
+	* Add new libpq functions PQsetClientEncoding, PQclientEncoding
+	* ./configure --with-mb=EUC_JP
+	  now deprecated. use 
+	  ./configure --enable-multibyte=EUC_JP
+	  instead
+  	* Add SQL_ASCII regression test case
+	* Add SJIS User Defined Character (UDC) support
+	* All of above will appear in 7.0
+
+July 11, 1999
+	* Add support for WIN1250 (Windows Czech) as a client encoding
+	  (contributed by Pavel Behal)
+	* fix some compiler warnings (contributed by Tomoaki Nishiyama)
+
+Mar 23, 1999
+	* Add support for KOI8(KOI8-R), WIN(CP1251), ALT(CP866)
+	  (thanks Oleg Broytmann for testing)
+	* Fix problem with MB and locale
+
+Jan 26, 1999
+	* Add support for Big5 as a frontend encoding
+	  (you need to create a database with EUC_TW to use Big5)
+	* Add regression test case for EUC_TW
+	  (contributed by Jonah Kuo <jonahkuo@mail.ttn.com.tw>)
+
+Dec 15, 1998
+	* Bugs related to SQL_ASCII support fixed
+
+Nov 5, 1998
+	* 6.4 release. In this version, pg_database has "encoding"
+	  column that represents the database encoding
+
+Jul 22, 1998
+	* determine encoding at initdb/createdb rather than compile time
+	* support for PGCLIENTENCODING when issuing COPY command
+	* support for SQL92 syntax "SET NAMES"
+	* support for LATIN2-5
+	* add UNICODE regression test case
+	* new test suite for MB
+	* clean up source files
+
+Jun 5, 1998
+	* add support for the encoding translation between the backend
+	  and the frontend
+	* new command SET CLIENT_ENCODING etc. added
+	* add support for LATIN1 character set
+	* enhance 8-bit cleanness
+
+April 21, 1998 some enhancements/fixes
+	* character_length(), position(), substring() are now aware of 
+	  multi-byte characters
+	* add octet_length()
+	* add --with-mb option to configure
+	* new regression tests for EUC_KR
+  	  (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
+	* add some test cases to the EUC_JP regression test
+	* fix problem in regress/regress.sh in case of System V
+	* fix toupper(), tolower() to handle 8bit chars
+
+Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1
+
+Mar 10, 1998 PL2 released
+	* add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
+	* add an English document (this file)
+	* fix problems concerning 8-bit single byte characters
+
+Mar 1, 1998 PL1 released
+
+Appendix:
+
+[Here is good documentation from Pavel Behal explaining how to use WIN1250
+on Windows/ODBC. Please note that Installation step 1) is not necessary
+in 6.5.1 -- Tatsuo]
+
+Version: 0.91 for PgSQL 6.5
+Author: Pavel Behal
+Revised by: Tatsuo Ishii
+Email: behal@opf.slu.cz
+Licence: The Same as PostgreSQL
+
+Sorry for my English and C code, I'm not native :-)
+
+!!!!!!!!!!!!!!!!!!!!!!!!! NO WARRANTY !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+Installation:
+-------------
+1) Change three affected files in source directories 
+    (I don't have time to create proper patch diffs, I don't know how)
+2) Compile with locale enabled and multibyte set to LATIN2
+3) Set up your installation properly; do not forget to create locale
+   variables in your profile (environment). Ex. (may not be exactly right):
+	LC_ALL=cs_CZ.ISO8859-2
+	LC_COLLATE=cs_CZ.ISO8859-2
+	LC_CTYPE=cs_CZ.ISO8859-2
+	LC_MONETARY=cs_CZ.ISO8859-2
+	LC_NUMERIC=cs_CZ.ISO8859-2
+	LC_TIME=cs_CZ.ISO8859-2
+4) You have to start the postmaster with locales set!
+5) Try it with the Czech language; it has to sort correctly
+6) Install the ODBC driver for PgSQL on your M$ Windows machine
+7) Set up your data source properly. Include this line in your ODBC
+   configuration dialog in the field "Connect Settings:":
+	SET CLIENT_ENCODING = 'WIN1250';
+8) Now try it again, but in Windows with ODBC.
+
+Description:
+------------
+- Depends on proper system locales, tested with RH6.0 and Slackware 3.6,
+  with the cs_CZ.iso8859-2 locale
+- Never try to set the server multibyte database encoding to WIN1250,
+  always use LATIN2 instead. There is no WIN1250 locale in Unix
+- WIN1250 encoding is usable only for M$W ODBC clients. The characters are
+  re-coded on the fly, to be displayed and stored back properly
+ 
+Important:
+----------
+- it changes your sort order depending on your LC_... setting, so don't be
+  confused by the regression tests, they don't use locale
+- "ch" is correctly sorted only in some newer locales (Ex. RH6.0)
+- you have to insert money as '162,50' (with a comma, in apostrophes!)
+- not tested properly
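
Supplementary sketch for section 5 of the README above. That section says a character that cannot be converted to the frontend encoding is replaced by its hexadecimal value in parentheses. The C function below illustrates that output form only. The name format_unconvertible is hypothetical and is not a PostgreSQL or libpq function, and the exact digit format (lower-case, two digits per byte, no separators) is an assumption that the README does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of PostgreSQL): write the bytes of one
 * unconvertible multi-byte character into dst as "(hex digits)".
 *
 * Returns the number of characters written, not counting the trailing
 * NUL, or -1 if dst is too small to hold the result.
 */
int
format_unconvertible(const unsigned char *src, size_t len,
                     char *dst, size_t dstsize)
{
    /* '(' + two hex digits per byte + ')' + terminating NUL */
    size_t need = 2 * len + 3;
    size_t i;
    char  *p = dst;

    assert(src != NULL && dst != NULL);

    if (dstsize < need)
        return -1;

    *p++ = '(';
    for (i = 0; i < len; i++)
    {
        /* Assumption: lower-case hex, two digits per byte, no separators. */
        sprintf(p, "%02x", (unsigned int) src[i]);
        p += 2;
    }
    *p++ = ')';
    *p = '\0';

    return (int) (p - dst);
}
```

Under these assumptions, passing the two EUC_JP bytes 0xa4 0xa2 (hiragana "a", which has no LATIN1 equivalent) with a sufficiently large buffer leaves the string "(a4a2)" in dst.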