From a612b17120fc011cefcdec6948b1cc8543529d06 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Tue, 8 Mar 2011 17:10:34 -0500
Subject: [PATCH] Assorted editing for collation documentation.

I made a pass over this to familiarize myself with the feature, and found
some things that could be improved.
---
 doc/src/sgml/catalogs.sgml             |  57 +++++++------
 doc/src/sgml/charset.sgml              | 108 +++++++++++++++----------
 doc/src/sgml/ref/create_collation.sgml |  39 ++++-----
 doc/src/sgml/ref/drop_collation.sgml   |   2 +-
 4 files changed, 118 insertions(+), 88 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index cc0cbe134c..297ad53208 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1128,8 +1128,8 @@
 oid pg_collation.oid
- The defined collation of the column, zero if the column does
- not have a collatable type.
+ The defined collation of the column, or zero if the column is
+ not of a collatable datatype.
@@ -2088,7 +2088,7 @@
 The catalog pg_collation describes the available collations, which are essentially mappings from an SQL name to operating system locale categories.
- See for more information.
+ See for more information.
@@ -2132,38 +2132,48 @@
 collencodingint4
-
- Encoding to which the collation is applicable. SQL-level
- commands such as ALTER COLLATION only
- operate on the collation belonging to the current database
- encoding. But this field is necessary because when this
- catalog is initialized, the encoding of future databases is not
- yet known. For practical purposes, collations that do not
- match the current database encoding should be considered
- invalid or invisible. It could be useful, however, to create
- collations whose encoding does not match the database encoding
- in template databases. This would currently have to be done
- manually.
-
+ Encoding to which the collation is applicable
 collcollate name
- LC_COLLATE for this collation object
 collctype name
- LC_CTYPE for this collation object
+
+ Note that the unique key on this catalog is (collname,
+ collencoding, collnamespace) not just
+ (collname, collnamespace).
+ PostgreSQL generally ignores all
+ collations not belonging to the current database's encoding; therefore
+ it is sufficient to use a qualified SQL name
+ (schema.name) to identify a collation,
+ even though this is not unique according to the catalog definition.
+ The current database's encoding is automatically used as an additional
+ lookup key. The reason for defining the catalog this way is that
+ initdb fills it in at cluster initialization time with
+ entries for all locales available on the system, so it must be able to
+ hold entries for all encodings that might ever be used in the cluster.
+
+
+
+ In the template0 database, it could be useful to create
+ collations whose encoding does not match the database encoding,
+ since they could match the encodings of databases later cloned from
+ template0. This would currently have to be done manually.
+
@@ -6123,12 +6133,11 @@
 pg_collation.oid typcollation specifies the collation
- of the type. If a type does not support collations, this will
- be zero, collation analysis at parse time is skipped, and
- the use of COLLATE clauses with the type is
- invalid. A base type that supports collations will have
- DEFAULT_COLLATION_OID here. A domain can have
- another collation OID, if one was defined for the domain.
+ of the type. If the type does not support collations, this will
+ be zero. A base type that supports collations will have
+ DEFAULT_COLLATION_OID here. A domain over a
+ collatable type can have some other collation OID, if one was defined
+ for the domain.
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 046c3d1416..dd96d00950 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -15,6 +15,8 @@
 Using the locale features of the operating system to provide locale-specific collation order, number formatting, translated messages, and other aspects.
+ This is covered in and
+ .
@@ -23,6 +25,7 @@
 Providing a number of different character sets to support storing text in all kinds of languages, and providing character set translation between client and server.
+ This is covered in .
@@ -138,9 +141,12 @@ initdb --locale=sv_SE
 fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE
- and LC_CTYPE are these type of categories. They affect
+ and LC_CTYPE are these categories. They affect
 the sort order of indexes, so they must be kept fixed, or indexes on
- text columns would become corrupt. The default values for these
+ text columns would become corrupt.
+ (But you can alleviate this restriction using collations, as discussed
+ in .)
+ The default values for these
 categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.
@@ -153,7 +159,7 @@ initdb --locale=sv_SE
 linkend="runtime-config-client-format"> for details). The values that are chosen by initdb are actually only written into the configuration file postgresql.conf to
- serve as defaults when the server is started. If you disable these
+ serve as defaults when the server is started. If you remove these
 assignments from postgresql.conf then the server will inherit the settings from its execution environment.
@@ -308,17 +314,17 @@ initdb --locale=sv_SE
 Collation Support
- The collation support allows specifying the sort order and certain
- other locale aspects of data per column or per operation at run
- time. This alleviates the problem that the
+ The collation feature allows specifying the sort order and certain
+ other locale aspects of data per-column, or even per-operation.
+ This alleviates the restriction that the
 LC_COLLATE and LC_CTYPE settings of a database cannot be changed after its creation.
- The collation support feature is currently only known to work on
- Linux/glibc and Mac OS X platforms.
+ Collation support is currently only known to work on
+ Linux (glibc) and Mac OS X platforms.
@@ -326,48 +332,51 @@ initdb --locale=sv_SE
 Concepts
- Conceptually, every datum of a collatable data type has a
- collation. (Collatable data types in the base system are
+ Conceptually, every expression of a collatable data type has a
+ collation. (The built-in collatable data types are
 text, varchar, and char. User-defined base types can also be marked collatable.) If the
- datum is a column reference, the collation of the datum is the
- defined collation of the column. If the datum is a constant, the
+ expression is a column reference, the collation of the expression is the
+ defined collation of the column. If the expression is a constant, the
 collation is the default collation of the data type of the
- constant. The collation of more complex expressions is derived
- from the input collations as described below.
+ constant. The collation of a more complex expression is derived
+ from the collations of its inputs, as described below.
- The collation of a datum can also be the default
- collation, which reverts to the locale settings defined for the
- database. In some cases, a datum can also have no known
+ The collation of an expression can be the default
+ collation, which means the locale settings defined for the
+ database.
In some cases, an expression can also have no known
 collation. In such cases, ordering operations and other operations that need to know the collation will fail. When the database system has to perform an ordering or a
- comparison, it considers the collation of the input data. This
- happens in two situations: an ORDER BY clause
- and a function or operator call such as <.
- The collation to apply for the performance of the ORDER
- BY clause is simply the collation of the sort key. The
- collation to apply for a function or operator call is derived from
- the arguments, as described below. Additionally, collations are
- taken into account by functions that convert between lower and
- upper case letters, that is, lower,
- upper, and initcap.
+ comparison, it uses the collation of the input expression. This
+ happens, for example, with ORDER BY clauses
+ and function or operator calls such as <.
+ The collation to apply for an ORDER BY clause
+ is simply the collation of the sort key. The collation to apply for a
+ function or operator call is derived from the arguments, as described
+ below. In addition to comparison operators, collations are taken into
+ account by functions that convert between lower and upper case
+ letters, such as lower, upper, and
+ initcap.
- For a function call, the collation that is derived from combining
- the argument collations is both used for performing any
- comparisons or ordering and for the collation of the function
- result, if the result type is collatable.
+ For a function or operator call, the collation that is derived by
+ examining the argument collations is used at run time for performing
+ the specified operation. If the result of the function or operator
+ call is of a collatable data type, the collation is also used at parse
+ time as the defined collation of the function or operator expression,
+ in case there is a surrounding expression that requires knowledge of
+ its collation.
- The collation derivation of a datum can be
+ The collation derivation of an expression can be
 implicit or explicit. This distinction affects how collations are combined when multiple different collations appear in an expression. An explicit collation derivation arises when a
@@ -379,9 +388,9 @@ initdb --locale=sv_SE
- If any input item has an explicit collation derivation, then
- all explicitly derived collations among the input items must be
- the same, otherwise an error is raised. If an explicitly
+ If any input expression has an explicit collation derivation, then
+ all explicitly derived collations among the input expressions must be
+ the same, otherwise an error is raised. If any explicitly
 derived collation is present, that is the result of the collation combination.
@@ -389,8 +398,8 @@ initdb --locale=sv_SE
- Otherwise, all input items must have the same implicit
- collation derivation or the default collation. If an
+ Otherwise, all input expressions must have the same implicit
+ collation derivation or the default collation. If any
 implicitly derived collation is present, that is the result of the collation combination. Otherwise, the result is the default collation.
@@ -428,19 +437,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
 A collation is an SQL schema object that maps an SQL name to operating system locales. In particular, it maps to a combination of LC_COLLATE and LC_CTYPE. (As
- the name would indicate, the main purpose of a collation is to set
+ the name would suggest, the main purpose of a collation is to set
 LC_COLLATE, which controls the sort order. But it is rarely necessary in practice to have an LC_CTYPE setting that is different from LC_COLLATE, so it is more convenient to collect these under one concept than to create another infrastructure for
- setting LC_CTYPE per datum.) Also, a collation
- is tied to a character encoding. The same collation name may
- exist for different encodings.
+ setting LC_CTYPE per expression.)
Also, a collation
+ is tied to a character set encoding (see ).
+ The same collation name may exist for different encodings.
- When a database system is initialized, initdb
+ When a database cluster is initialized, initdb
 populates the system catalog pg_collation with collations based on all the locales it finds on the operating system at the time. For example, the operating system might
@@ -463,8 +472,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
 collation may be created using the command. That command can also be used to create a new collation from an existing
- collation, which can be useful to be able to use operating-system
- independent collation names in applications.
+ collation, which can be useful to be able to use
+ operating-system-independent collation names in applications.
+
+
+
+ Within any particular database, only collations that use that
+ database's encoding are of interest. Other entries in
+ pg_collation are ignored. Thus, a stripped collation
+ name such as de_DE can be considered unique
+ within a given database even though it would not be unique globally.
+ Use of the stripped collation names is recommendable, since it will
+ make one less thing you need to change if you decide to change to
+ another database encoding.
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 9d03ca5a4e..fc79225001 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -21,7 +21,7 @@
 CREATE COLLATION name (
     [ LOCALE = locale, ]
     [ LC_COLLATE = lc_collate, ]
-    [ LC_CTYPE = lc_ctype, ]
+    [ LC_CTYPE = lc_ctype ]
 )
 CREATE COLLATION name FROM existing_collation
@@ -32,7 +32,8 @@ CREATE COLLATION name FROM existing_coll
 CREATE COLLATION defines a new collation using
- the specified operating system locales or from an existing collation.
+ the specified operating system locale settings,
+ or by copying an existing collation.
@@ -53,26 +54,14 @@ CREATE COLLATION name FROM existing_coll
 The name of the collation. The collation name can be schema-qualified. If it is not, the collation is defined in the
- current schema. The collation name must be unique within a
+ current schema. The collation name must be unique within that
 schema. (The system catalogs can contain collations with the
- same name for other encodings, but these are not usable if the
+ same name for other encodings, but these are ignored if the
 database encoding does not match.)
-
- existing_collation
-
-
-
- The name of an existing collation to copy. The new collation
- will have the same properties as the existing one, but they
- will become independent objects.
-
-
-
-
 locale
@@ -80,7 +69,7 @@ CREATE COLLATION name FROM existing_coll
 This is a shortcut for setting LC_COLLATE and LC_CTYPE at once. If you specify this,
- you cannot specify either of the other parameters.
+ you cannot specify either of those parameters.
@@ -112,6 +101,18 @@ CREATE COLLATION name FROM existing_coll
+
+
+ existing_collation
+
+
+
+ The name of an existing collation to copy. The new collation
+ will have the same properties as the existing one, but they
+ will become independent objects.
+
+
+
@@ -145,8 +146,8 @@ CREATE COLLATION french (LOCALE = 'fr_FR.utf8');
 CREATE COLLATION german FROM "de_DE";
- This can be convenient to be able to use operating-system
- independent collation names in applications.
+ This can be convenient to be able to use operating-system-independent
+ collation names in applications.
diff --git a/doc/src/sgml/ref/drop_collation.sgml b/doc/src/sgml/ref/drop_collation.sgml
index 7be9317932..0afcaaf2de 100644
--- a/doc/src/sgml/ref/drop_collation.sgml
+++ b/doc/src/sgml/ref/drop_collation.sgml
@@ -94,7 +94,7 @@ DROP COLLATION german;
 The DROP COLLATION command conforms to the SQL standard, apart from the IF
- EXISTS option, which is a PostgreSQL extension..
+ EXISTS option, which is a PostgreSQL extension.
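
Reviewer note, not part of the patch: the collation combination rules reworded in the charset.sgml hunks (explicit derivations win and must agree; otherwise implicit collations must agree or be the default) can be read as a small algorithm. The Python below is a minimal sketch of that reading only, not PostgreSQL's implementation. The names `Collated`, `combine_collations`, `CollationConflict`, and `DEFAULT` are hypothetical, and raising an error on conflicting implicit collations is an assumption: the text only says the inputs "must have" the same one.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical stand-in for the database's default collation.
DEFAULT = "default"


class CollationConflict(Exception):
    """Raised when the input collations cannot be combined."""


@dataclass(frozen=True)
class Collated:
    """Hypothetical model of one input expression's collation."""
    collation: str          # collation name, or DEFAULT
    explicit: bool = False  # True if it came from a COLLATE clause


def combine_collations(inputs: Iterable[Collated]) -> str:
    """Return the collation name resulting from combining the inputs."""
    inputs = list(inputs)

    # Rule 1: any explicit derivation present -> all explicit ones must match,
    # and that collation is the result.
    explicit = {c.collation for c in inputs if c.explicit}
    if len(explicit) > 1:
        raise CollationConflict(f"conflicting explicit collations: {sorted(explicit)}")
    if explicit:
        return explicit.pop()

    # Rule 2: otherwise every input is either the default or one shared
    # implicit collation. Treating a mismatch as an error is an assumption.
    implicit = {c.collation for c in inputs if c.collation != DEFAULT}
    if len(implicit) > 1:
        raise CollationConflict(f"conflicting implicit collations: {sorted(implicit)}")
    if implicit:
        return implicit.pop()

    # Rule 3: nothing but default collations.
    return DEFAULT
```

For the hunk-context query `SELECT a || ('foo' COLLATE "y") FROM test1;`, and assuming purely for illustration that column `a` was declared with a collation named `x`, `combine_collations([Collated("x"), Collated("y", explicit=True)])` picks `y`, because the explicit derivation overrides the column's implicit one.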