Mirror of https://git.postgresql.org/git/postgresql.git (synced 2024-12-03 08:00:21 +08:00)
Convert more charset/locale documentation to DocBook
parent 333cbc2dab
commit 0ba77c14aa
@@ -1,113 +0,0 @@
-
-PostgreSQL Charsets README
-Josef Balatka, <balatka@email.cz>
-Draft v0.1, Tue Jul 20 15:49:07 CEST 1999
-
-This document is a brief overview of the national charsets support
-that PostgreSQL ver. 6.5 has implemented. Various compilation options
-and setup tips are mentioned here to be helpful in the particular use.
-
----------------------------------------------------------------------------
-
-Table of Contents
-
-1. Locale awareness
-
-2. Single-byte charsets recoding
-
-3. Multi-byte support/recoding
-
-4. Credits
-
----------------------------------------------------------------------------
-
-1. Locale awareness
-
-PostgreSQL server supports both locale aware and locale not aware
-(default) operational modes. You can determine this mode during the
-configuration stage of the installation with --enable-locale option.
-
-If you don't use --enable-locale, the multi-language code will not be
-compiled and PostgreSQL will behave as an ASCII compliant application.
-This mode is useful for its speed but only provided that you don't
-have to consider national specific chars.
-
-With --enable-locale you will get a locale aware server using LC_*
-environment variables to determine how to process national specifics.
-In this case strcoll(3) and similar functions are used internally
-so speed is somewhat lower.
-
-Notice here that --enable-locale is sufficient when all your clients
-use the same single-byte encoding as the database server does.
-
-When your clients use encoding different from the server than you have
-to use, moreover, --enable-recode or --with-mb=<encoding> options on
-the server side or a particular client that does recoding itself (e.g.
-there exists a PostgreSQL ODBC driver for Win32 with various Cyrillic
-encoding capability). Option --with-mb=<encoding> is necessary for the
-multi-byte charsets support.
-
-
-2. Single-byte charsets recoding
-
-You can set up this feature with --enable-recode option. This option
-is described as 'enable Cyrillic recode support' which doesn't express
-all its power. It can be used for *any* single-byte charset recoding.
-
-This method uses charset.conf file located in the $PGDATA directory.
-It's a typical configuration text file where spaces and newlines
-separate items and records and # specifies comments. Three keywords
-with the following syntax are recognized here:
-
-BaseCharset <server_charset>
-RecodeTable <from_charset> <to_charset> <file_name>
-HostCharset <host_spec> <host_charset>
-
-BaseCharset defines encoding of the database server. All charset
-names are only used for mapping inside the charset.conf so you can
-freely use typing-friendly names.
-
-RecodeTable records specify translation table between server and client.
-The file name is relative to the $PGDATA directory. Table file format
-is very simple. There are no keywords and characters are represented by
-a pair of decimal or hexadecimal (0x prefixed) values on single lines:
-
-<char_value> <translated_char_value>
-
-HostCharset records define IP address and charset. You can use a single
-IP address, an IP mask range starting from the given address or an IP
-interval (e.g. 127.0.0.1, 192.168.1.100/24, 192.168.1.20-192.168.1.40)
-
-The charset.conf is always processed up to the end, so you can easily
-specify exceptions from the previous rules. In the src/data you will
-find charset.conf example and a few recoding tables.
-
-As this solution is based on the client's IP address / charset mapping
-there are obviously some restrictions as well. You can't use different
-encoding on the same host at the same time. It's also inconvenient when
-you boot your client hosts into more operating systems.
-Nevertheless, when these restrictions are not limiting and you don't
-need multi-byte chars than it's a simple and effective solution.
-
-
-3. Multi-byte support/recoding
-
-It's a new generation of charset encoding in PostgreSQL designed as a
-more complex solution supporting both single-byte and multi-byte chars.
-You can set up this feature with --with-mb=<encoding> option.
-
-There is no IP mapping file and recoding is controlled through the new
-SQL statements. Recoding tables are included in the code. Many national
-charsets are already supported and further will follow.
-
-See doc/README.mb, doc/README.mb.jp to get detailed instruction on how
-to use the multibyte support. In the file doc/README.locale there is
-a particular instruction on usage of the multibyte support with Cyrillic.
-
-
-4. Credits
-
-I'd like to thank the PostgreSQL development team and all contributors
-for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and
-Tatsuo Ishii for opening the door into the multi-language world.
-
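The RecodeTable file format in the removed Charsets README above (one pair of decimal or 0x-prefixed hexadecimal byte values per line) maps naturally onto a 256-entry lookup table. The C sketch below is illustrative only and is not the server's own parser: the type and function names are hypothetical, and skipping lines that start with # and leaving unlisted bytes unchanged are assumptions.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical single-byte translation table: map[b] is the byte sent for b. */
typedef struct {
    unsigned char map[256];
} RecodeTable;

/* Start from the identity mapping; bytes not listed in the table file are
 * assumed to pass through unchanged. */
void recode_table_init(RecodeTable *t)
{
    for (int i = 0; i < 256; i++)
        t->map[i] = (unsigned char) i;
}

/* Parse one "<char_value> <translated_char_value>" line.
 * strtol with base 0 accepts decimal and 0x-prefixed hexadecimal.
 * Returns 1 if a pair was stored, 0 for a blank or '#' line (assumed
 * comment syntax), and -1 for a malformed line or an out-of-range value. */
int recode_table_parse_line(RecodeTable *t, const char *line)
{
    const char *p = line;
    char *end;
    long from, to;

    while (isspace((unsigned char) *p))
        p++;
    if (*p == '\0' || *p == '#')
        return 0;

    from = strtol(p, &end, 0);
    if (end == p)
        return -1;
    p = end;
    to = strtol(p, &end, 0);      /* strtol skips the separating blanks */
    if (end == p)
        return -1;
    if (from < 0 || from > 255 || to < 0 || to > 255)
        return -1;

    t->map[from] = (unsigned char) to;
    return 1;
}

/* Translate a buffer in place through the table. */
void recode_buffer(const RecodeTable *t, unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = t->map[buf[i]];
}
```

Because strtol is called with base 0, one call handles both the decimal and the 0x forms; as a side effect it also accepts 0-prefixed octal, which a stricter parser might reject.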
@@ -1,107 +0,0 @@
-===========
-1999 Jul 21
-===========
-
-Josef Balatka, <balatka@email.cz> asked us not to remove RECODE and sent me
-Czech ISO-8859-2 -> WIN-1250 translation table.
-RECODE is no longer contains just Cyrillic RECODE and will stay in
-PostgreSQL.
-
-He also created some bits of documentation, mostly concerning RECODE -
-see README.Charsets.
-
-
-===========
-1999 Apr 14
-===========
-
-Tatsuo Ishii <t-ishii@sra.co.jp> updated Multibyte support extending it
-to Cyrillic language. Now PostgreSQL supports KOI8-R, WIN-1251, ISO8859-5
-and CP866 (ALT) encodings.
-
-Short instruction on using this feature follows. Longer discussion of
-Multibyte support is in README.mb.
-
-WARNING! Now with Multibyte support Cyrillic RECODE declared obsolete
-and will be removed from Postgres. If you are using RECODE consider
-switching to Multibyte support.
-
-Instructions on how to prepare Postgres for Cyrillic Multibyte support.
-----------------------------------------------------------------------
-
-First, you need to backup all your databases. I recommend to backup the
-entire Postgres directory, including binaries and libraries - thus you can
-easily restore if something goes wrong.
-
-Dump you data: pg_dumpall > dump.db
-
-Stop postmaster.
-
-Configure, compile and install Postgres. (I'll mostly talk about KOI8-R
-encoding, this is just to make examples a little more clear; you can use
-any supported encoding.)
-
-cd src
-./configure --enable-locale --with-mb=KOI8
-make
-make install
-
-Make sure you've backed up your databases. Doublecheck your backup. I
-really mean it - make regular backups and test your backups sometimes by
-fake restore.
-
-Remove your data directory (better, rename or move it).
-
-Run initdb saying your primary encoding: initdb -e KOI8. If you omit
-encoding, primary encoding from configure will be taken.
-
-Start postmaster.
-
-Create databases: createdb -e KOI8. Again, you can omit encoding -
-default encoding will be used. You are not forced to use the same encoding
-for all your databases - you can create different databases with different
-encodings.
-
-Load your data from the dump you've created: psql < dump.db
-
-That's all! Now you are ready to enjoy the full power of Multibyte
-support.
-
-To use Multibyte support you do not need to do something special - just
-execute your queries. If client program does not set encoding, it will get
-the data in database encoding. But client may ask Postgres to do automatic
-server-to-client and client-to-server conversions. There are 2 (two) ways
-client program declares its encoding:
-1) client explicitly executes the query SET CLIENT_ENCODING TO 'win';
-2) client started with environment variable set. Examples -
-using sh syntax:
-PGCLIENTENCODING='win'; export PGCLIENTENCODING
-using csh syntax:
-setenv PGCLIENTENCODING 'win'
-
-Setting PGCLIENTENCODING even if you use same client encding as the
-database would omit an overhead of asking the database encoding while
-initiating the connection, so it is good idea to set it in any case.
-
-Now you may run test suite and see Multibyte support in action. Go to
-.../src/test/locale and run
-make clean all test-koi2win
-
-
-===========
-1998 Nov 20
-===========
-
-I extended locale support, originally written by Oleg Bartunov
-<oleg@sai.msu.su>. Now ORDER BY (if PostgreSQL configured with
---enable-locale) uses strcoll() for all text fields: char(n), varchar(n),
-text.
-
-I included test suite .../src/test/locale. I didn't include this in
-the regression test because not so much people require locale support. Read
-.../src/test/locale/README for details on the test suite.
-
-Many thanks to Oleg Bartunov (oleg@sai.msu.su) and Thomas G. Lockhart
-(lockhart@alumni.caltech.edu) for hints, tips, help and discussion.
-
-Oleg.
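The removed locale README above describes two ways for a client to declare its encoding (SET CLIENT_ENCODING or the PGCLIENTENCODING variable) and a fallback to the database encoding when neither is used. A hypothetical C helper expressing that decision is sketched below. It assumes that an explicit SET issued during the session overrides whatever the environment variable selected at connection time, and it is not libpq code.

```c
#include <stddef.h>

/* Nonzero if the value is present and not empty. */
static int is_set(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Hypothetical helper: which encoding does the client receive data in?
 *   explicit_set - value given in SET CLIENT_ENCODING, or NULL
 *   env_value    - value of PGCLIENTENCODING, or NULL
 *   db_encoding  - the database's own encoding */
const char *effective_client_encoding(const char *explicit_set,
                                      const char *env_value,
                                      const char *db_encoding)
{
    if (is_set(explicit_set))
        return explicit_set;   /* assumed to take precedence */
    if (is_set(env_value))
        return env_value;      /* declared via the environment */
    return db_encoding;        /* no declaration: database encoding */
}
```

A caller would pass the environment value in, for example effective_client_encoding(NULL, getenv("PGCLIENTENCODING"), "KOI8"). Taking the value as a parameter, rather than reading the variable inside the helper, keeps the function easy to test.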
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.26 2000/09/12 05:37:07 thomas Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.27 2000/09/30 16:58:20 petere Exp $
 
 Postgres Administrator's Guide.
 Derived from postgres.sgml.
@@ -98,9 +98,9 @@ Derived from postgres.sgml.
 &intro-ag;
 &installation;
 &installw;
-&charset;
 &runtime;
 &client-auth;
+&charset;
 &manage-ag;
 &user-manag;
 &backup;
@@ -1,44 +1,235 @@
-<chapter id="charset">
-<title>Character Sets</title>
-
-<abstract>
-<para>
-Describes the available language and character set support in
-<productname>Postgres</productname>.
-</para>
-</abstract>
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.3 2000/09/30 16:58:20 petere Exp $ -->
+
+<chapter id="charset">
+<title>Localization</>
+
+<abstract>
+<para>
+Describes the available localization features from the point of
+view of the administrator.
+</para>
+</abstract>
 
 <para>
-<productname>Postgres</productname> supports non-ASCII character
-sets with two approaches:
+<productname>Postgres</productname> supports localization with
+three approaches:
 
 <itemizedlist>
 <listitem>
 <para>
-Using locale features in underlying
-system libraries. This allows single-byte character sets to be
-configured with a locale-specific collation order, provided that
-the underlying system supports the required locale. This
-technique supports only one character set per server, and can
-not support multi-byte character sets.
+Using the locale features of the operating system to provide
+locale-specific collation order, number formatting, and other
+aspects.
 </para>
 </listitem>
 
 <listitem>
 <para>
 Using explicit multiple-byte character sets defined in the
-<productname>Postgres</productname> server. These character sets
-are also known to some client libraries. The number of character
-sets is fixed at the time the server is compiled, and internal
-operations such as string comparisons require expansion of each
-character into a 32-bit word.
+<productname>Postgres</productname> server to support languages
+that require more characters than will fit into a single byte,
+and to provide character set recoding between client and server.
+The number of supported character sets is fixed at the time the
+server is compiled, and internal operations such as string
+comparisons require expansion of each character into a 32-bit
+word.
+</para>
+</listitem>
+
+<listitem>
+<para>
+Single byte character recoding provides a more light-weight
+solution for users of multiple, yet single-byte character sets.
 </para>
 </listitem>
 </itemizedlist>
 </para>
 
+
+<sect1 id="locale">
+<title>Locale Support</title>
+
+<para>
+<firstterm>Locale</> support refers to an application respecting
+cultural preferences regarding alphabets, sorting, number
+formatting, etc. <productname>PostgreSQL</> uses the standard ISO
+C and POSIX-like locale facilities provided by the server operating
+system. For additional information refer the documentation of your
+system.
+</para>
+
+<sect2>
+<title>Overview</>
+
+<para>
+Locale support is not build into <productname>PostgreSQL</> by
+default; to enable it, supply the <option>--enable-locale</> option
+to the <filename>configure</> script:
+<informalexample>
+<screen>
+<prompt>$ </><userinput>./configure --enable-locale</>
+</screen>
+</informalexample>
+Locale support only affects the server; all clients are compatible
+with servers with or without locale support.
+</para>
+
+<para>
+The information about which particular cultural rules to use is
+determined by standard environment variables. If you are getting
+localized behavior from other programs you probably have them set
+up already. The simplest way to set the localization information
+is the <envar>LANG</> variable, for example:
+<programlisting>
+export LANG=sv_SE
+</programlisting>
+This sets the locale to Swedish (<literal>sv</>) as spoken in
+Sweden (<literal>SE</>). Other possibilities might be
+<literal>en_US</> (U.S. English) and <literal>fr_CA</> (Canada,
+French). If more than one character set can be useful for a locale
+then the specifications look like this:
+<literal>cs_CZ.ISO8859-2</>. What locales are available under what
+names on your system depends on what was provided by the operating
+system vendor and what was installed.
+</para>
+
+<para>
+Occasionally it is useful to mix rules from several locales, e.g.,
+use U.S. rules but Spanish messages. To do that a set of
+environment variables exist that override the default of
+<envar>LANG</> for a particular category:
+
+<informaltable>
+<tgroup cols="2">
+<tbody>
+<row>
+<entry>LC_COLLATE</>
+<entry>String sort order</>
+</row>
+<row>
+<entry>LC_CTYPE</>
+<entry>Character classification (What is a letter? What is the upper-case equivalent of this letter?)</>
+</row>
+<row>
+<entry>LC_MESSAGES</>
+<entry>Language of messages</>
+</row>
+<row>
+<entry>LC_MONETARY</>
+<entry>Formatting of currency amounts</>
+</row>
+<row>
+<entry>LC_NUMERIC</>
+<entry>Formatting of numbers</>
+</row>
+<row>
+<entry>LC_TIME</>
+<entry>Formatting of dates and times</>
+</row>
+</tbody>
+</tgroup>
+</informaltable>
+
+<envar>LC_MESSAGES</> only affects the messages that come from the
+operating system, not <productname>PostgreSQL</>.
+</para>
+
+<para>
+If you want the system to behave as if it had no locale support,
+use the special locale <literal>C</> or <literal>POSIX</>, or
+simply unset all locale related variables.
+</para>
+
+<para>
+Once you have chosen a set of localization rules this way you must
+keep them fixed for any particular database cluster. That means
+that the locales that were active when you ran <filename>initdb</>
+must be kept the same when you start the postmaster. Otherwise,
+the changed sort order can corrupt indexes or make your data
+disappear mysteriously. It is currently not possible to change the
+locales after database initialization or to use more than one set
+of locales for a given database cluster.
+</para>
+</sect2>
+
+<sect2>
+<title>Benefits</>
+
+<para>
+Locale support influences in particular the following features:
+
+<itemizedlist>
+<listitem>
+<para>
+Sort order in <command>ORDER BY</> queries.
+</para>
+</listitem>
+
+<listitem>
+<para>
+The <function>to_char</> family of functions
+</para>
+</listitem>
+
+<listitem>
+<para>
+The <literal>LIKE</> and <literal>~</> operators for pattern
+matching
+</para>
+</listitem>
+</itemizedlist>
+</para>
+
+<para>
+The only severe drawback of using the locale support in
+<productname>PostgreSQL</> is its speed. So use locale only if you
+actually need it.
+</para>
+</sect2>
+
+<sect2>
+<title>Problems</>
+
+<para>
+If locale support doesn't work in spite of the explanation above,
+check that the locale support in your operating system is okay.
+To check whether a given locale is installed and functional you
+can use <application>Perl</>, for example. Perl has also support
+for locales and if a locale is broken <command>perl -v</> will
+complain something like this:
+<screen>
+<prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
+<prompt>$</> <userinput>perl -v</>
+<computeroutput>
+perl: warning: Setting locale failed.
+perl: warning: Please check that your locale settings:
+LC_ALL = (unset),
+LC_CTYPE = "not_exist",
+LANG = (unset)
+are supported and installed on your system.
+perl: warning: Falling back to the standard locale ("C").
+</computeroutput>
+</screen>
+</para>
+
+<para>
+Check that your locale files are in the right location. Possible
+locations include: <filename>/usr/lib/locale</filename> (Linux,
+Solaris), <filename>/usr/share/locale</filename> (Linux),
+<filename>/usr/lib/nls/loc</filename> (DUX 4.0). Check the locale
+man page of your system if you are not sure.
+</para>
+
+<para>
+The directory <filename>src/test/locale</> contains a test suite
+for <productname>PostgreSQL</>'s locale support.
+</para>
+</sect2>
+</sect1>
+
+
 <sect1 id="multibyte">
-<title>Multi-byte Support</title>
+<title>Multibyte Support</title>
 
 <note>
 <title>Author</title>
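The new Locale Support section says that sort order follows the LC_COLLATE setting, and the removed locale README notes that ORDER BY uses strcoll(). The mechanism can be sketched with the standard C library alone: setlocale(LC_COLLATE, "") adopts the collation named by the environment, and qsort with a strcoll comparator orders strings under it. The function names below are hypothetical; this is not the server's comparison code.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: each element is a pointer to a C string. */
static int collate_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Adopt the collation named by LC_ALL, LC_COLLATE or LANG.
 * Returns the locale name now in effect, or NULL if it is unavailable. */
const char *use_environment_collation(void)
{
    return setlocale(LC_COLLATE, "");
}

/* Sort an array of strings by the current LC_COLLATE rules. */
void sort_by_locale(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, collate_cmp);
}
```

In the C or POSIX locale the result is plain byte order, so "B" sorts before "a"; under a locale such as sv_SE (if installed) the order follows that language's rules instead, which is also why the text insists on keeping the initdb-time locale fixed.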
@@ -53,7 +244,7 @@
 </note>
 
 <para>
-Multi-byte (<acronym>MB</acronym>) support is intended to allow
+Multibyte (<acronym>MB</acronym>) support is intended to allow
 <productname>Postgres</productname> to handle
 multiple-byte character sets such as EUC (Extended Unix Code), Unicode and
 Mule internal code. With <acronym>MB</acronym> enabled you can use multi-byte
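The chapter notes that, with multibyte support, internal operations such as string comparisons expand each character into a 32-bit word. A minimal C sketch of such an expansion follows, using UTF-8 as the example input encoding. That choice is an assumption for illustration: the server's own conversion routines, and its other encodings such as EUC and Mule internal code, are not shown. The function name mb_to_wide is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a UTF-8 byte string into 32-bit code points.
 * Writes at most out_cap values and returns the count written, or -1 on
 * a malformed sequence or if out is too small. Only the lead/continuation
 * byte structure is checked; overlong forms are not rejected. */
long mb_to_wide(const unsigned char *s, size_t len,
                uint32_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char c = s[i];
        uint32_t cp;
        size_t extra;

        if (c < 0x80) {                   /* 0xxxxxxx: 1 byte */
            cp = c;
            extra = 0;
        } else if ((c & 0xE0) == 0xC0) {  /* 110xxxxx: 2 bytes */
            cp = c & 0x1F;
            extra = 1;
        } else if ((c & 0xF0) == 0xE0) {  /* 1110xxxx: 3 bytes */
            cp = c & 0x0F;
            extra = 2;
        } else if ((c & 0xF8) == 0xF0) {  /* 11110xxx: 4 bytes */
            cp = c & 0x07;
            extra = 3;
        } else {
            return -1;                    /* stray continuation byte */
        }

        if (len - i <= extra)             /* sequence runs past the end */
            return -1;
        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)      /* must be 10xxxxxx */
                return -1;
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (n == out_cap)
            return -1;
        out[n++] = cp;
        i += extra + 1;
    }
    return (long) n;
}
```

Once both strings are arrays of fixed-width values, character-level operations become simple indexed loops, at the cost of four bytes of working memory per character.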
@@ -680,7 +871,78 @@ SET CLIENT_ENCODING = 'WIN1250';
 </procedure>
 </sect2>
 </sect1>
-</chapter>
+
 
+<sect1 id="recode">
+<title>Single-byte character set recoding</>
+<!-- formerly in README.charsets, by Josef Balatka, <balatka@email.cz> -->
+
+<para>
+You can set up this feature with the <option>--enable-recode</> option
+to <filename>configure</>. This option was formerly described as
+<quote>Cyrillic recode support</> which doesn't express all its
+power. It can be used for <emphasis>any</> single-byte character
+set recoding.
+</para>
+
+<para>
+This method uses a file <filename>charset.conf</> file located in
+the database directory (<envar>PGDATA</>). It's a typical
+configuration text file where spaces and newlines separate items
+and records and # specifies comments. Three keywords with the
+following syntax are recognized here:
+<synopsis>
+BaseCharset <replaceable>server_charset</>
+RecodeTable <replaceable>from_charset</> <replaceable>to_charset</> <replaceable>file_name</>
+HostCharset <replaceable>host_spec</> <replaceable>host_charset</>
+</synopsis>
+</para>
+
+<para>
+<token>BaseCharset</> defines the encoding of the database server.
+All character set names are only used for mapping inside of
+<filename>charset.conf</> so you can freely use typing-friendly
+names.
+</para>
+
+<para>
+<token>RecodeTable</> records specify translation tables between
+server and client. The file name is relative to the
+<envar>PGDATA</> directory. The table file format is very
+simple. There are no keywords and characters are represented by a
+pair of decimal or hexadecimal (0x prefixed) values on single
+lines:
+<synopsis>
+<replaceable>char_value</> <replaceable>translated_char_value</>
+</synopsis>
+</para>
+
+<para>
+<token>HostCharset</> records define the client character set by IP
+address. You can use a single IP address, an IP mask range starting
+from the given address or an IP interval (e.g., 127.0.0.1,
+192.168.1.100/24, 192.168.1.20-192.168.1.40).
+</para>
+
+<para>
+The <filename>charset.conf</> file is always processed up to the
+end, so you can easily specify exceptions from the previous
+rules. In the src/data you will find charset.conf example and a few
+recoding tables.
+</para>
+
+<para>
+As this solution is based on the client's IP address and character
+set mapping there are obviously some restrictions as well. You
+cannot use different encodings on the same host at the same
+time. It is also inconvenient when you boot your client hosts into
+more operating systems. Nevertheless, when these restrictions are
+not limiting and you do not need multi-byte characters than it is a
+simple and effective solution.
+</para>
+</sect1>
+
+</chapter>
+
 <!-- Keep this comment at the end of the file
 Local variables:
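The HostCharset rules and the "processed up to the end" behavior described in the new recode section can be sketched in C as follows. Several assumptions are labeled here: an a.b.c.d/N spec is treated as a standard prefix mask (every address sharing the first N bits with the given one), which may not be exactly how the original code reads "a mask range starting from the given address"; only IPv4 is handled; and the last matching rule wins, which is one way to realize exceptions to earlier rules. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse dotted-quad "a.b.c.d" into a host-order value; 0 on success. */
static int parse_ipv4(const char *s, uint32_t *out)
{
    unsigned a, b, c, d;
    char tail;

    if (sscanf(s, "%u.%u.%u.%u%c", &a, &b, &c, &d, &tail) != 4)
        return -1;                       /* trailing junk or too few parts */
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *out = ((uint32_t) a << 24) | ((uint32_t) b << 16) |
           ((uint32_t) c << 8) | (uint32_t) d;
    return 0;
}

/* Does a host_spec ("addr", "addr/bits" or "lo-hi") cover ip? */
int host_spec_matches(const char *spec, uint32_t ip)
{
    char buf[64];
    char *sep;
    uint32_t lo, hi;

    if (strlen(spec) >= sizeof buf)
        return 0;
    strcpy(buf, spec);

    if ((sep = strchr(buf, '/')) != NULL) {      /* assumed prefix mask */
        unsigned bits;
        uint32_t mask;
        char tail;

        *sep = '\0';
        if (parse_ipv4(buf, &lo) != 0 ||
            sscanf(sep + 1, "%u%c", &bits, &tail) != 1 || bits > 32)
            return 0;
        mask = bits == 0 ? 0 : 0xFFFFFFFFu << (32 - bits);
        return (ip & mask) == (lo & mask);
    }
    if ((sep = strchr(buf, '-')) != NULL) {      /* inclusive interval */
        *sep = '\0';
        if (parse_ipv4(buf, &lo) != 0 || parse_ipv4(sep + 1, &hi) != 0)
            return 0;
        return ip >= lo && ip <= hi;
    }
    return parse_ipv4(buf, &lo) == 0 && ip == lo; /* single address */
}

typedef struct {
    const char *host_spec;
    const char *charset;
} HostCharsetRule;

/* Scan every rule to the end; a later match overrides an earlier one. */
const char *lookup_host_charset(const HostCharsetRule *rules, size_t n,
                                uint32_t ip, const char *fallback)
{
    const char *result = fallback;
    for (size_t i = 0; i < n; i++)
        if (host_spec_matches(rules[i].host_spec, ip))
            result = rules[i].charset;
    return result;
}
```

With rules for 192.168.1.0/24, then 192.168.1.20-192.168.1.40, then 192.168.1.30, a client at 192.168.1.30 gets the third charset even though all three rules match, which mirrors the "exceptions from the previous rules" wording. Stopping at the first match would invert that behavior.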
@@ -1,4 +1,4 @@
-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.21 2000/09/29 20:21:34 petere Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.22 2000/09/30 16:58:20 petere Exp $ -->
 
 <chapter id="installation">
 <title><![%flattext-install-include[<productname>PostgreSQL</> ]]>Installation Instructions</title>
@@ -447,8 +447,9 @@ su - postgres
 <term>--enable-recode</term>
 <listitem>
 <para>
-Enables character set recode support. See
-<filename>doc/README.Charsets</> for details on this feature.
+Enables single-byte character set recode support. See
+<![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]>
+<![%flattext-install-ignore[<xref linkend="recode">]]> about this feature.
 </para>
 </listitem>
 </varlistentry>
@@ -459,7 +460,10 @@ su - postgres
 <para>
 Allows the use of multibyte character encodings. This is
 primarily for languages like Japanese, Korean, and Chinese.
-Read <filename>doc/README.mb</> for details.
+Read
+<![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]>
+<![%flattext-install-ignore[<xref linkend="multibyte">]]>
+for details.
 </para>
 </listitem>
 </varlistentry>
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 thomas Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.42 2000/09/30 16:58:20 petere Exp $
 -->
 
 <!doctype set PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
@@ -173,9 +173,9 @@ $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 th
 -->
 &installation;
 &installw;
-&charset;
 &runtime;
 &client-auth;
+&charset;
 &manage-ag;
 &user-manag;
 &backup;
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.25 2000/09/29 20:21:34 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.26 2000/09/30 16:58:20 petere Exp $
 -->
 
 <Chapter Id="runtime">
@@ -1553,126 +1553,6 @@ set semsys:seminfo_semmsl=32
 </sect1>
 
 
-<sect1 id="locale">
-<title>Locale Support</title>
-
-<note>
-<title>Acknowledgement</title>
-<para>
-Written by Oleg Bartunov. See <ulink
-url="http://www.sai.msu.su/~megera/postgres/">Oleg's web
-page</ulink> for additional information on locale and Russian
-language support.
-</para>
-</note>
-
-<para>
-While doing a project for a company in Moscow, Russia, I
-encountered the problem that <productname>Postgres</> had no
-support of national alphabets. After looking for possible
-workarounds I decided to develop support of locale myself. I'm not
-a C programmer but already had some experience with locale
-programming when I work with <productname>Perl</> (debugging) and
-<productname>Glimpse</>. After several days of digging through the
-<productname>Postgres</> source tree I made very minor corections
-to <filename>src/backend/utils/adt/varlena.c</> and
-<filename>src/backend/main/main.c</> and got what I needed! I did
-support only for <envar>LC_CTYPE</envar> and
-<envar>LC_COLLATE</envar>, but later <envar>LC_MONETARY</envar> was
-added by others. I got many messages from people about this patch
-so I decided to send it to developers and (to my surprise) it was
-incorporated into the <productname>Postgres</> distribution.
-</para>
-
-<para>
-People often complain that locale doesn't work for them. There are
-several common mistakes:
-
-<itemizedlist>
-<listitem>
-<para>
-Didn't properly configure <productname>Postgres</> before
-compilation. You must run <filename>configure</> with the
-<option>--enable-locale</> option to enable locale support.
-</para>
-</listitem>
-
-<listitem>
-<para>
-Didn't setup environment correctly when starting postmaster. You
-must define environment variables <envar>LC_CTYPE</envar> and
-<envar>LC_COLLATE</envar> before running postmaster because
-backend gets information about locale from environment. I use
-following shell script:
-<programlisting>
-#!/bin/sh
-
-export LC_CTYPE=koi8-r
-export LC_COLLATE=koi8-r
-postmaster -B 1024 -S -D/usr/local/pgsql/data/ -o '-Fe'
-</programlisting>
-</para>
-</listitem>
-
-<listitem>
-<para>
-Broken locale support in the operating system (for example,
-locale support in libc under Linux several times has changed and
-this caused a lot of problems). Perl has also support of locale
-and if locale is broken <command>perl -v</> will complain
-something like:
-<screen>
-<prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
-<prompt>$</> <userinput>perl -v</>
-<computeroutput>
-perl: warning: Setting locale failed.
-perl: warning: Please check that your locale settings:
-LC_ALL = (unset),
-LC_CTYPE = "not_exist",
-LANG = (unset)
-are supported and installed on your system.
-perl: warning: Falling back to the standard locale ("C").
-</computeroutput>
-</screen>
-</para>
-</listitem>
-
-<listitem>
-<para>
-Wrong location of locale files. Possible locations include:
-<filename>/usr/lib/locale</filename> (Linux, Solaris),
-<filename>/usr/share/locale</filename> (Linux),
-<filename>/usr/lib/nls/loc</filename> (DUX 4.0).
-
-Check <command>man locale</command> to find the correct
-location. Under Linux I made a symbolic link between
-<filename>/usr/lib/locale</filename> and
-<filename>/usr/share/locale</filename> to be sure that the next
-libc will not break my locale.
-</para>
-</listitem>
-</itemizedlist>
-</para>
-
-<formalpara>
-<title>What are the Benefits?</title>
-<para>
-You can use ~* and order by operators for strings contain
-characters from national alphabets. Non-english users definitely
-need that.
-</para>
-</formalpara>
-
-<formalpara>
-<title>What are the Drawbacks?</title>
-<para>
-There is one evident drawback of using locale - its speed! So, use
-locale only if you really need it.
-</para>
-</formalpara>
-</sect1>
-
-
 <sect1 id="postmaster-shutdown">
 <title>Shutting down the server</title>
 
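The removed runtime.sgml text stresses exporting LC_CTYPE and LC_COLLATE before starting the postmaster, and the new Localization chapter explains that the LC_* variables override LANG for their category. The lookup those settings feed into follows the POSIX rule: LC_ALL first, then the category variable, then LANG, then an implementation default, assumed here to be "C". The C sketch below only restates that rule and its function name is hypothetical; real programs simply call setlocale(category, ""), which applies the rule inside the C library.

```c
#include <stddef.h>

/* Nonzero if the variable is set and not empty. */
static int locale_var_set(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Locale name that applies to one category, given the values of
 * LC_ALL, the category variable (e.g. LC_COLLATE) and LANG. */
const char *effective_locale(const char *lc_all,
                             const char *lc_category,
                             const char *lang)
{
    if (locale_var_set(lc_all))
        return lc_all;        /* LC_ALL overrides every category */
    if (locale_var_set(lc_category))
        return lc_category;   /* e.g. LC_COLLATE or LC_CTYPE */
    if (locale_var_set(lang))
        return lang;          /* LANG is the general default */
    return "C";               /* assumed implementation default */
}
```

For example, with LANG=sv_SE and LC_COLLATE=C exported and LC_ALL unset, effective_locale(getenv("LC_ALL"), getenv("LC_COLLATE"), getenv("LANG")) yields "C" for collation while categories without their own variable still follow sv_SE.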
|