Document collation handling in SQL and plpgsql functions.

This is pretty minimal but covers the bare facts.
This commit is contained in:
Tom Lane 2011-03-25 18:21:25 -04:00
parent a4425e3200
commit 9b19c12e1d
3 changed files with 132 additions and 0 deletions

View File

@ -316,6 +316,8 @@ initdb --locale=sv_SE
<sect1 id="collation">
<title>Collation Support</title>
<indexterm zone="collation"><primary>collation</></>
<para>
The collation feature allows specifying the sort order and certain
other locale aspects of data per-column, or even per-operation.

View File

@ -706,6 +706,81 @@ SELECT merge_fields(t.*) FROM table1 t WHERE ... ;
structure on-the-fly.
</para>
</sect2>
<sect2 id="plpgsql-declaration-collation">
<title>Collation of <application>PL/pgSQL</application> Variables</title>
<indexterm>
<primary>collation</>
<secondary>in PL/pgSQL</>
</indexterm>
<para>
When a <application>PL/pgSQL</application> function has one or more
parameters of collatable data types, a collation is identified for each
function call depending on the collations assigned to the actual
arguments, as described in <xref linkend="collation">. If a collation is
successfully identified (i.e., there are no conflicts of implicit
collations among the arguments) then all the collatable parameters are
treated as having that collation implicitly. This will affect the
behavior of collation-sensitive operations within the function.
For example, consider
<programlisting>
CREATE FUNCTION less_than(a text, b text) RETURNS boolean AS $$
BEGIN
RETURN a &lt; b;
END;
$$ LANGUAGE plpgsql;
SELECT less_than(text_field_1, text_field_2) FROM table1;
SELECT less_than(text_field_1, text_field_2 COLLATE "C") FROM table1;
</programlisting>
The first use of <function>less_than</> will use the common collation
of <structfield>text_field_1</> and <structfield>text_field_2</> for
the comparison, while the second use will use <literal>C</> collation.
</para>
<para>
Furthermore, the identified collation is also assumed as the collation of
any local variables that are of collatable types. Thus this function
would not work any differently if it were written as
<programlisting>
CREATE FUNCTION less_than(a text, b text) RETURNS boolean AS $$
DECLARE
local_a text := a;
local_b text := b;
BEGIN
RETURN local_a &lt; local_b;
END;
$$ LANGUAGE plpgsql;
</programlisting>
</para>
<para>
If there are no parameters of collatable data types, or no common
collation can be identified for them, then parameters and local variables
use the default collation of their data type (which is usually the
database's default collation, but could be different for variables of
domain types).
</para>
<para>
Explicit <literal>COLLATE</> clauses can be written inside a function
if it is desired to force a particular collation to be used regardless
of what the function is called with. For example,
<programlisting>
CREATE FUNCTION less_than_c(a text, b text) RETURNS boolean AS $$
BEGIN
RETURN a &lt; b COLLATE "C";
END;
$$ LANGUAGE plpgsql;
</programlisting>
</para>
</sect2>
</sect1>
<sect1 id="plpgsql-expressions">

View File

@ -1100,6 +1100,61 @@ SELECT concat_values('|', 1, 4, 2);
</screen>
</para>
</sect2>
<sect2>
<title><acronym>SQL</acronym> Functions with Collations</title>
<indexterm>
<primary>collation</>
<secondary>in SQL functions</>
</indexterm>
<para>
When a SQL function has one or more parameters of collatable data types,
a collation is identified for each function call depending on the
collations assigned to the actual arguments, as described in <xref
linkend="collation">. If a collation is successfully identified
(i.e., there are no conflicts of implicit collations among the arguments)
then all the collatable parameters are treated as having that collation
implicitly. This will affect the behavior of collation-sensitive
operations within the function. For example, using the
<function>anyleast</> function described above, the result of
<programlisting>
SELECT anyleast('abc'::text, 'ABC');
</programlisting>
will depend on the database's default collation. In <literal>C</> locale
the result will be <literal>ABC</>, but in many other locales it will
be <literal>abc</>. The collation to use can be forced by adding
a <literal>COLLATE</> clause to any of the arguments, for example
<programlisting>
SELECT anyleast('abc'::text, 'ABC' COLLATE "C");
</programlisting>
Alternatively, if you wish a function to operate with a particular
collation regardless of what it is called with, insert
<literal>COLLATE</> clauses as needed in the function definition.
This version of <function>anyleast</> would always use <literal>en_US</>
locale to compare strings:
<programlisting>
CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$
SELECT min($1[i] COLLATE "en_US") FROM generate_subscripts($1, 1) g(i);
$$ LANGUAGE SQL;
</programlisting>
But note that this will throw an error if applied to a non-collatable
data type.
</para>
<para>
If no common collation can be identified among the actual arguments,
then a SQL function treats its parameters as having their data types'
default collation (which is usually the database's default collation,
but could be different for parameters of domain types).
</para>
<para>
The behavior of collatable parameters can be thought of as a limited
form of polymorphism, applicable only to textual data types.
</para>
</sect2>
</sect1>
<sect1 id="xfunc-overload">