mirror of
https://git.postgresql.org/git/postgresql.git
synced 2025-01-12 18:34:36 +08:00
1788 lines
63 KiB
Plaintext
1788 lines
63 KiB
Plaintext
<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.53 2009/02/07 20:11:16 momjian Exp $ -->
|
|
|
|
<chapter id="queries">
|
|
<title>Queries</title>
|
|
|
|
<indexterm zone="queries">
|
|
<primary>query</primary>
|
|
</indexterm>
|
|
|
|
<indexterm zone="queries">
|
|
<primary>SELECT</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The previous chapters explained how to create tables, how to fill
|
|
them with data, and how to manipulate that data. Now we finally
|
|
discuss how to retrieve the data out of the database.
|
|
</para>
|
|
|
|
|
|
<sect1 id="queries-overview">
|
|
<title>Overview</title>
|
|
|
|
<para>
|
|
The process of retrieving or the command to retrieve data from a
|
|
database is called a <firstterm>query</firstterm>. In SQL the
|
|
<xref linkend="sql-select" endterm="sql-select-title"> command is
|
|
used to specify queries. The general syntax of the
|
|
<command>SELECT</command> command is
|
|
<synopsis>
|
|
<optional>WITH <replaceable>with_queries</replaceable></optional> SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
|
|
</synopsis>
|
|
The following sections describe the details of the select list, the
|
|
table expression, and the sort specification. <literal>WITH</>
|
|
queries are treated last since they are an advanced feature.
|
|
</para>
|
|
|
|
<para>
|
|
A simple kind of query has the form:
|
|
<programlisting>
|
|
SELECT * FROM table1;
|
|
</programlisting>
|
|
Assuming that there is a table called <literal>table1</literal>,
|
|
this command would retrieve all rows and all columns from
|
|
<literal>table1</literal>. (The method of retrieval depends on the
|
|
client application. For example, the
|
|
<application>psql</application> program will display an ASCII-art
|
|
table on the screen, while client libraries will offer functions to
|
|
extract individual values from the query result.) The select list
|
|
specification <literal>*</literal> means all columns that the table
|
|
expression happens to provide. A select list can also select a
|
|
subset of the available columns or make calculations using the
|
|
columns. For example, if
|
|
<literal>table1</literal> has columns named <literal>a</>,
|
|
<literal>b</>, and <literal>c</> (and perhaps others) you can make
|
|
the following query:
|
|
<programlisting>
|
|
SELECT a, b + c FROM table1;
|
|
</programlisting>
|
|
(assuming that <literal>b</> and <literal>c</> are of a numerical
|
|
data type).
|
|
See <xref linkend="queries-select-lists"> for more details.
|
|
</para>
|
|
|
|
<para>
|
|
<literal>FROM table1</literal> is a particularly simple kind of
|
|
table expression: it reads just one table. In general, table
|
|
expressions can be complex constructs of base tables, joins, and
|
|
subqueries. But you can also omit the table expression entirely and
|
|
use the <command>SELECT</command> command as a calculator:
|
|
<programlisting>
|
|
SELECT 3 * 4;
|
|
</programlisting>
|
|
This is more useful if the expressions in the select list return
|
|
varying results. For example, you could call a function this way:
|
|
<programlisting>
|
|
SELECT random();
|
|
</programlisting>
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="queries-table-expressions">
|
|
<title>Table Expressions</title>
|
|
|
|
<indexterm zone="queries-table-expressions">
|
|
<primary>table expression</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A <firstterm>table expression</firstterm> computes a table. The
|
|
table expression contains a <literal>FROM</> clause that is
|
|
optionally followed by <literal>WHERE</>, <literal>GROUP BY</>, and
|
|
<literal>HAVING</> clauses. Trivial table expressions simply refer
|
|
to a table on disk, a so-called base table, but more complex
|
|
expressions can be used to modify or combine base tables in various
|
|
ways.
|
|
</para>
|
|
|
|
<para>
|
|
The optional <literal>WHERE</>, <literal>GROUP BY</>, and
|
|
<literal>HAVING</> clauses in the table expression specify a
|
|
pipeline of successive transformations performed on the table
|
|
derived in the <literal>FROM</> clause. All these transformations
|
|
produce a virtual table that provides the rows that are passed to
|
|
the select list to compute the output rows of the query.
|
|
</para>
|
|
|
|
<sect2 id="queries-from">
|
|
<title>The <literal>FROM</literal> Clause</title>
|
|
|
|
<para>
|
|
The <xref linkend="sql-from" endterm="sql-from-title"> derives a
|
|
table from one or more other tables given in a comma-separated
|
|
table reference list.
|
|
<synopsis>
|
|
FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
|
|
</synopsis>
|
|
|
|
A table reference can be a table name (possibly schema-qualified),
|
|
or a derived table such as a subquery, a table join, or complex
|
|
combinations of these. If more than one table reference is listed
|
|
in the <literal>FROM</> clause they are cross-joined (see below)
|
|
to form the intermediate virtual table that can then be subject to
|
|
transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
|
|
and <literal>HAVING</> clauses and is finally the result of the
|
|
overall table expression.
|
|
</para>
|
|
|
|
<indexterm>
|
|
<primary>ONLY</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
When a table reference names a table that is the parent of a
|
|
table inheritance hierarchy, the table reference produces rows of
|
|
not only that table but all of its descendant tables, unless the
|
|
key word <literal>ONLY</> precedes the table name. However, the
|
|
reference produces only the columns that appear in the named table
|
|
— any columns added in subtables are ignored.
|
|
</para>
|
|
|
|
<sect3 id="queries-join">
|
|
<title>Joined Tables</title>
|
|
|
|
<indexterm zone="queries-join">
|
|
<primary>join</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A joined table is a table derived from two other (real or
|
|
derived) tables according to the rules of the particular join
|
|
type. Inner, outer, and cross-joins are available.
|
|
</para>
|
|
|
|
<variablelist>
|
|
<title>Join Types</title>
|
|
|
|
<varlistentry>
|
|
<term>Cross join</term>
|
|
|
|
<indexterm>
|
|
<primary>join</primary>
|
|
<secondary>cross</secondary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>cross join</primary>
|
|
</indexterm>
|
|
|
|
<listitem>
|
|
<synopsis>
|
|
<replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
|
|
</synopsis>
|
|
|
|
<para>
|
|
For each combination of rows from
|
|
<replaceable>T1</replaceable> and
|
|
<replaceable>T2</replaceable>, the derived table will contain a
|
|
row consisting of all columns in <replaceable>T1</replaceable>
|
|
followed by all columns in <replaceable>T2</replaceable>. If
|
|
the tables have N and M rows respectively, the joined
|
|
table will have N * M rows.
|
|
</para>
|
|
|
|
<para>
|
|
<literal>FROM <replaceable>T1</replaceable> CROSS JOIN
|
|
<replaceable>T2</replaceable></literal> is equivalent to
|
|
<literal>FROM <replaceable>T1</replaceable>,
|
|
<replaceable>T2</replaceable></literal>. It is also equivalent to
|
|
<literal>FROM <replaceable>T1</replaceable> INNER JOIN
|
|
<replaceable>T2</replaceable> ON TRUE</literal> (see below).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Qualified joins</term>
|
|
|
|
<indexterm>
|
|
<primary>join</primary>
|
|
<secondary>outer</secondary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>outer join</primary>
|
|
</indexterm>
|
|
|
|
<listitem>
|
|
<synopsis>
|
|
<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean_expression</replaceable>
|
|
<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
|
|
<replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable>
|
|
</synopsis>
|
|
|
|
<para>
|
|
The words <literal>INNER</literal> and
|
|
<literal>OUTER</literal> are optional in all forms.
|
|
<literal>INNER</literal> is the default;
|
|
<literal>LEFT</literal>, <literal>RIGHT</literal>, and
|
|
<literal>FULL</literal> imply an outer join.
|
|
</para>
|
|
|
|
<para>
|
|
The <firstterm>join condition</firstterm> is specified in the
|
|
<literal>ON</> or <literal>USING</> clause, or implicitly by
|
|
the word <literal>NATURAL</>. The join condition determines
|
|
which rows from the two source tables are considered to
|
|
<quote>match</quote>, as explained in detail below.
|
|
</para>
|
|
|
|
<para>
|
|
The <literal>ON</> clause is the most general kind of join
|
|
condition: it takes a Boolean value expression of the same
|
|
kind as is used in a <literal>WHERE</> clause. A pair of rows
|
|
from <replaceable>T1</> and <replaceable>T2</> match if the
|
|
<literal>ON</> expression evaluates to true for them.
|
|
</para>
|
|
|
|
<para>
|
|
<literal>USING</> is a shorthand notation: it takes a
|
|
comma-separated list of column names, which the joined tables
|
|
must have in common, and forms a join condition specifying
|
|
equality of each of these pairs of columns. Furthermore, the
|
|
output of a <literal>JOIN USING</> has one column for each of
|
|
the equated pairs of input columns, followed by all of the
|
|
other columns from each table. Thus, <literal>USING (a, b,
|
|
c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
|
|
t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
|
|
if <literal>ON</> is used there will be two columns
|
|
<literal>a</>, <literal>b</>, and <literal>c</> in the result,
|
|
whereas with <literal>USING</> there will be only one of each.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm>
|
|
<primary>join</primary>
|
|
<secondary>natural</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>natural join</primary>
|
|
</indexterm>
|
|
Finally, <literal>NATURAL</> is a shorthand form of
|
|
<literal>USING</>: it forms a <literal>USING</> list
|
|
consisting of exactly those column names that appear in both
|
|
input tables. As with <literal>USING</>, these columns appear
|
|
only once in the output table.
|
|
</para>
|
|
|
|
<para>
|
|
The possible types of qualified join are:
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><literal>INNER JOIN</></term>
|
|
|
|
<listitem>
|
|
<para>
|
|
For each row R1 of T1, the joined table has a row for each
|
|
row in T2 that satisfies the join condition with R1.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><literal>LEFT OUTER JOIN</></term>
|
|
|
|
<indexterm>
|
|
<primary>join</primary>
|
|
<secondary>left</secondary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>left join</primary>
|
|
</indexterm>
|
|
|
|
<listitem>
|
|
<para>
|
|
First, an inner join is performed. Then, for each row in
|
|
T1 that does not satisfy the join condition with any row in
|
|
T2, a joined row is added with null values in columns of
|
|
T2. Thus, the joined table unconditionally has at least
|
|
one row for each row in T1.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><literal>RIGHT OUTER JOIN</></term>
|
|
|
|
<indexterm>
|
|
<primary>join</primary>
|
|
<secondary>right</secondary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>right join</primary>
|
|
</indexterm>
|
|
|
|
<listitem>
|
|
<para>
|
|
First, an inner join is performed. Then, for each row in
|
|
T2 that does not satisfy the join condition with any row in
|
|
T1, a joined row is added with null values in columns of
|
|
T1. This is the converse of a left join: the result table
|
|
will unconditionally have a row for each row in T2.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><literal>FULL OUTER JOIN</></term>
|
|
|
|
<listitem>
|
|
<para>
|
|
First, an inner join is performed. Then, for each row in
|
|
T1 that does not satisfy the join condition with any row in
|
|
T2, a joined row is added with null values in columns of
|
|
T2. Also, for each row of T2 that does not satisfy the
|
|
join condition with any row in T1, a joined row with null
|
|
values in the columns of T1 is added.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>
|
|
Joins of all types can be chained together or nested: either or
|
|
both of <replaceable>T1</replaceable> and
|
|
<replaceable>T2</replaceable> might be joined tables. Parentheses
|
|
can be used around <literal>JOIN</> clauses to control the join
|
|
order. In the absence of parentheses, <literal>JOIN</> clauses
|
|
nest left-to-right.
|
|
</para>
|
|
|
|
<para>
|
|
To put this together, assume we have tables <literal>t1</literal>:
|
|
<programlisting>
|
|
num | name
|
|
-----+------
|
|
1 | a
|
|
2 | b
|
|
3 | c
|
|
</programlisting>
|
|
and <literal>t2</literal>:
|
|
<programlisting>
|
|
num | value
|
|
-----+-------
|
|
1 | xxx
|
|
3 | yyy
|
|
5 | zzz
|
|
</programlisting>
|
|
then we get the following results for the various joins:
|
|
<screen>
|
|
<prompt>=></> <userinput>SELECT * FROM t1 CROSS JOIN t2;</>
|
|
num | name | num | value
|
|
-----+------+-----+-------
|
|
1 | a | 1 | xxx
|
|
1 | a | 3 | yyy
|
|
1 | a | 5 | zzz
|
|
2 | b | 1 | xxx
|
|
2 | b | 3 | yyy
|
|
2 | b | 5 | zzz
|
|
3 | c | 1 | xxx
|
|
3 | c | 3 | yyy
|
|
3 | c | 5 | zzz
|
|
(9 rows)
|
|
|
|
<prompt>=></> <userinput>SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;</>
|
|
num | name | num | value
|
|
-----+------+-----+-------
|
|
1 | a | 1 | xxx
|
|
3 | c | 3 | yyy
|
|
(2 rows)
|
|
|
|
<prompt>=></> <userinput>SELECT * FROM t1 INNER JOIN t2 USING (num);</>
|
|
num | name | value
|
|
-----+------+-------
|
|
1 | a | xxx
|
|
3 | c | yyy
|
|
(2 rows)
|
|
|
|
<prompt>=></> <userinput>SELECT * FROM t1 NATURAL INNER JOIN t2;</>
|
|
num | name | value
|
|
-----+------+-------
|
|
1 | a | xxx
|
|
3 | c | yyy
|
|
(2 rows)
|
|
|
|
<prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;</>
|
|
num | name | num | value
|
|
-----+------+-----+-------
|
|
1 | a | 1 | xxx
|
|
2 | b | |
|
|
3 | c | 3 | yyy
|
|
(3 rows)
|
|
|
|
<prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 USING (num);</>
|
|
num | name | value
|
|
-----+------+-------
|
|
1 | a | xxx
|
|
2 | b |
|
|
3 | c | yyy
|
|
(3 rows)
|
|
|
|
<prompt>=></> <userinput>SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;</>
|
|
num | name | num | value
|
|
-----+------+-----+-------
|
|
1 | a | 1 | xxx
|
|
3 | c | 3 | yyy
|
|
| | 5 | zzz
|
|
(3 rows)
|
|
|
|
<prompt>=></> <userinput>SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;</>
|
|
num | name | num | value
|
|
-----+------+-----+-------
|
|
1 | a | 1 | xxx
|
|
2 | b | |
|
|
3 | c | 3 | yyy
|
|
| | 5 | zzz
|
|
(4 rows)
|
|
</screen>
|
|
</para>
|
|
|
|
<para>
|
|
The join condition specified with <literal>ON</> can also contain
|
|
conditions that do not relate directly to the join. This can
|
|
prove useful for some queries but needs to be thought out
|
|
carefully. For example:
|
|
<screen>
|
|
<prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';</>
|
|
num | name | num | value
|
|
-----+------+-----+-------
|
|
1 | a | 1 | xxx
|
|
2 | b | |
|
|
3 | c | |
|
|
(3 rows)
|
|
</screen>
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3 id="queries-table-aliases">
|
|
<title>Table and Column Aliases</title>
|
|
|
|
<indexterm zone="queries-table-aliases">
|
|
<primary>alias</primary>
|
|
<secondary>in the FROM clause</secondary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>label</primary>
|
|
<see>alias</see>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A temporary name can be given to tables and complex table
|
|
references to be used for references to the derived table in
|
|
the rest of the query. This is called a <firstterm>table
|
|
alias</firstterm>.
|
|
</para>
|
|
|
|
<para>
|
|
To create a table alias, write
|
|
<synopsis>
|
|
FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
|
|
</synopsis>
|
|
or
|
|
<synopsis>
|
|
FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
|
|
</synopsis>
|
|
The <literal>AS</literal> key word is optional noise.
|
|
<replaceable>alias</replaceable> can be any identifier.
|
|
</para>
|
|
|
|
<para>
|
|
A typical application of table aliases is to assign short
|
|
identifiers to long table names to keep the join clauses
|
|
readable. For example:
|
|
<programlisting>
|
|
SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
The alias becomes the new name of the table reference for the
|
|
current query — it is no longer possible to refer to the table
|
|
by the original name. Thus:
|
|
<programlisting>
|
|
SELECT * FROM my_table AS m WHERE my_table.a > 5;
|
|
</programlisting>
|
|
is not valid according to the SQL standard. In
|
|
<productname>PostgreSQL</productname> this will draw an error if the
|
|
<xref linkend="guc-add-missing-from"> configuration variable is
|
|
<literal>off</> (as it is by default). If it is <literal>on</>,
|
|
an implicit table reference will be added to the
|
|
<literal>FROM</literal> clause, so the query is processed as if
|
|
it were written as:
|
|
<programlisting>
|
|
SELECT * FROM my_table AS m, my_table AS my_table WHERE my_table.a > 5;
|
|
</programlisting>
|
|
That will result in a cross join, which is usually not what you want.
|
|
</para>
|
|
|
|
<para>
|
|
Table aliases are mainly for notational convenience, but it is
|
|
necessary to use them when joining a table to itself, e.g.:
|
|
<programlisting>
|
|
SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
|
|
</programlisting>
|
|
Additionally, an alias is required if the table reference is a
|
|
subquery (see <xref linkend="queries-subqueries">).
|
|
</para>
|
|
|
|
<para>
|
|
Parentheses are used to resolve ambiguities. In the following example,
|
|
the first statement assigns the alias <literal>b</literal> to the second
|
|
instance of <literal>my_table</>, but the second statement assigns the
|
|
alias to the result of the join:
|
|
<programlisting>
|
|
SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
|
|
SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Another form of table aliasing gives temporary names to the columns of
|
|
the table, as well as the table itself:
|
|
<synopsis>
|
|
FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
|
|
</synopsis>
|
|
If fewer column aliases are specified than the actual table has
|
|
columns, the remaining columns are not renamed. This syntax is
|
|
especially useful for self-joins or subqueries.
|
|
</para>
|
|
|
|
<para>
|
|
When an alias is applied to the output of a <literal>JOIN</>
|
|
clause, using any of these forms, the alias hides the original
|
|
names within the <literal>JOIN</>. For example:
|
|
<programlisting>
|
|
SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
|
|
</programlisting>
|
|
is valid SQL, but:
|
|
<programlisting>
|
|
SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
|
|
</programlisting>
|
|
is not valid: the table alias <literal>a</> is not visible
|
|
outside the alias <literal>c</>.
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3 id="queries-subqueries">
|
|
<title>Subqueries</title>
|
|
|
|
<indexterm zone="queries-subqueries">
|
|
<primary>subquery</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Subqueries specifying a derived table must be enclosed in
|
|
parentheses and <emphasis>must</emphasis> be assigned a table
|
|
alias name. (See <xref linkend="queries-table-aliases">.) For
|
|
example:
|
|
<programlisting>
|
|
FROM (SELECT * FROM table1) AS alias_name
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
This example is equivalent to <literal>FROM table1 AS
|
|
alias_name</literal>. More interesting cases, which cannot be
|
|
reduced to a plain join, arise when the subquery involves
|
|
grouping or aggregation.
|
|
</para>
|
|
|
|
<para>
|
|
A subquery can also be a <command>VALUES</> list:
|
|
<programlisting>
|
|
FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
|
|
AS names(first, last)
|
|
</programlisting>
|
|
Again, a table alias is required. Assigning alias names to the columns
|
|
of the <command>VALUES</> list is optional, but is good practice.
|
|
For more information see <xref linkend="queries-values">.
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3 id="queries-tablefunctions">
|
|
<title>Table Functions</title>
|
|
|
|
<indexterm zone="queries-tablefunctions"><primary>table function</></>
|
|
|
|
<indexterm zone="queries-tablefunctions">
|
|
<primary>function</>
|
|
<secondary>in the FROM clause</>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Table functions are functions that produce a set of rows, made up
|
|
of either base data types (scalar types) or composite data types
|
|
(table rows). They are used like a table, view, or subquery in
|
|
the <literal>FROM</> clause of a query. Columns returned by table
|
|
functions can be included in <literal>SELECT</>,
|
|
<literal>JOIN</>, or <literal>WHERE</> clauses in the same manner
|
|
as a table, view, or subquery column.
|
|
</para>
|
|
|
|
<para>
|
|
If a table function returns a base data type, the single result
|
|
column is named like the function. If the function returns a
|
|
composite type, the result columns get the same names as the
|
|
individual attributes of the type.
|
|
</para>
|
|
|
|
<para>
|
|
A table function can be aliased in the <literal>FROM</> clause,
|
|
but it also can be left unaliased. If a function is used in the
|
|
<literal>FROM</> clause with no alias, the function name is used
|
|
as the resulting table name.
|
|
</para>
|
|
|
|
<para>
|
|
Some examples:
|
|
<programlisting>
|
|
CREATE TABLE foo (fooid int, foosubid int, fooname text);
|
|
|
|
CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
|
|
SELECT * FROM foo WHERE fooid = $1;
|
|
$$ LANGUAGE SQL;
|
|
|
|
SELECT * FROM getfoo(1) AS t1;
|
|
|
|
SELECT * FROM foo
|
|
WHERE foosubid IN (select foosubid from getfoo(foo.fooid) z
|
|
where z.fooid = foo.fooid);
|
|
|
|
CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
|
|
|
|
SELECT * FROM vw_getfoo;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
In some cases it is useful to define table functions that can
|
|
return different column sets depending on how they are invoked.
|
|
To support this, the table function can be declared as returning
|
|
the pseudotype <type>record</>. When such a function is used in
|
|
a query, the expected row structure must be specified in the
|
|
query itself, so that the system can know how to parse and plan
|
|
the query. Consider this example:
|
|
<programlisting>
|
|
SELECT *
|
|
FROM dblink('dbname=mydb', 'select proname, prosrc from pg_proc')
|
|
AS t1(proname name, prosrc text)
|
|
WHERE proname LIKE 'bytea%';
|
|
</programlisting>
|
|
The <literal>dblink</> function executes a remote query (see
|
|
<filename>contrib/dblink</>). It is declared to return
|
|
<type>record</> since it might be used for any kind of query.
|
|
The actual column set must be specified in the calling query so
|
|
that the parser knows, for example, what <literal>*</> should
|
|
expand to.
|
|
</para>
|
|
</sect3>
|
|
</sect2>
|
|
|
|
<sect2 id="queries-where">
|
|
<title>The <literal>WHERE</literal> Clause</title>
|
|
|
|
<indexterm zone="queries-where">
|
|
<primary>WHERE</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The syntax of the <xref linkend="sql-where"
|
|
endterm="sql-where-title"> is
|
|
<synopsis>
|
|
WHERE <replaceable>search_condition</replaceable>
|
|
</synopsis>
|
|
where <replaceable>search_condition</replaceable> is any value
|
|
expression (see <xref linkend="sql-expressions">) that
|
|
returns a value of type <type>boolean</type>.
|
|
</para>
|
|
|
|
<para>
|
|
After the processing of the <literal>FROM</> clause is done, each
|
|
row of the derived virtual table is checked against the search
|
|
condition. If the result of the condition is true, the row is
|
|
kept in the output table, otherwise (that is, if the result is
|
|
false or null) it is discarded. The search condition typically
|
|
references at least some column of the table generated in the
|
|
<literal>FROM</> clause; this is not required, but otherwise the
|
|
<literal>WHERE</> clause will be fairly useless.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
The join condition of an inner join can be written either in
|
|
the <literal>WHERE</> clause or in the <literal>JOIN</> clause.
|
|
For example, these table expressions are equivalent:
|
|
<programlisting>
|
|
FROM a, b WHERE a.id = b.id AND b.val > 5
|
|
</programlisting>
|
|
and:
|
|
<programlisting>
|
|
FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5
|
|
</programlisting>
|
|
or perhaps even:
|
|
<programlisting>
|
|
FROM a NATURAL JOIN b WHERE b.val > 5
|
|
</programlisting>
|
|
Which one of these you use is mainly a matter of style. The
|
|
<literal>JOIN</> syntax in the <literal>FROM</> clause is
|
|
probably not as portable to other SQL database management systems. For
|
|
outer joins there is no choice in any case: they must be done in
|
|
the <literal>FROM</> clause. An <literal>ON</>/<literal>USING</>
|
|
clause of an outer join is <emphasis>not</> equivalent to a
|
|
<literal>WHERE</> condition, because it determines the addition
|
|
of rows (for unmatched input rows) as well as the removal of rows
|
|
from the final result.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
Here are some examples of <literal>WHERE</literal> clauses:
|
|
<programlisting>
|
|
SELECT ... FROM fdt WHERE c1 > 5
|
|
|
|
SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)
|
|
|
|
SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)
|
|
|
|
SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)
|
|
|
|
SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100
|
|
|
|
SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)
|
|
</programlisting>
|
|
<literal>fdt</literal> is the table derived in the
|
|
<literal>FROM</> clause. Rows that do not meet the search
|
|
condition of the <literal>WHERE</> clause are eliminated from
|
|
<literal>fdt</literal>. Notice the use of scalar subqueries as
|
|
value expressions. Just like any other query, the subqueries can
|
|
employ complex table expressions. Notice also how
|
|
<literal>fdt</literal> is referenced in the subqueries.
|
|
Qualifying <literal>c1</> as <literal>fdt.c1</> is only necessary
|
|
if <literal>c1</> is also the name of a column in the derived
|
|
input table of the subquery. But qualifying the column name adds
|
|
clarity even when it is not needed. This example shows how the column
|
|
naming scope of an outer query extends into its inner queries.
|
|
</para>
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="queries-group">
|
|
<title>The <literal>GROUP BY</literal> and <literal>HAVING</literal> Clauses</title>
|
|
|
|
<indexterm zone="queries-group">
|
|
<primary>GROUP BY</primary>
|
|
</indexterm>
|
|
|
|
<indexterm zone="queries-group">
|
|
<primary>grouping</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
After passing the <literal>WHERE</> filter, the derived input
|
|
table might be subject to grouping, using the <literal>GROUP BY</>
|
|
clause, and elimination of group rows using the <literal>HAVING</>
|
|
clause.
|
|
</para>
|
|
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable>
|
|
FROM ...
|
|
<optional>WHERE ...</optional>
|
|
GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
|
|
</synopsis>
|
|
|
|
<para>
|
|
The <xref linkend="sql-groupby" endterm="sql-groupby-title"> is
|
|
used to group together those rows in a table that share the same
|
|
values in all the columns listed. The order in which the columns
|
|
are listed does not matter. The effect is to combine each set
|
|
of rows sharing common values into one group row that is
|
|
representative of all rows in the group. This is done to
|
|
eliminate redundancy in the output and/or compute aggregates that
|
|
apply to these groups. For instance:
|
|
<screen>
|
|
<prompt>=></> <userinput>SELECT * FROM test1;</>
|
|
x | y
|
|
---+---
|
|
a | 3
|
|
c | 2
|
|
b | 5
|
|
a | 1
|
|
(4 rows)
|
|
|
|
<prompt>=></> <userinput>SELECT x FROM test1 GROUP BY x;</>
|
|
x
|
|
---
|
|
a
|
|
b
|
|
c
|
|
(3 rows)
|
|
</screen>
|
|
</para>
|
|
|
|
<para>
|
|
In the second query, we could not have written <literal>SELECT *
|
|
FROM test1 GROUP BY x</literal>, because there is no single value
|
|
for the column <literal>y</> that could be associated with each
|
|
group. The grouped-by columns can be referenced in the select list since
|
|
they have a single value in each group.
|
|
</para>
|
|
|
|
<para>
|
|
In general, if a table is grouped, columns that are not
|
|
used in the grouping cannot be referenced except in aggregate
|
|
expressions. An example with aggregate expressions is:
|
|
<screen>
|
|
<prompt>=></> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x;</>
|
|
x | sum
|
|
---+-----
|
|
a | 4
|
|
b | 5
|
|
c | 2
|
|
(3 rows)
|
|
</screen>
|
|
Here <literal>sum</literal> is an aggregate function that
|
|
computes a single value over the entire group. More information
|
|
about the available aggregate functions can be found in <xref
|
|
linkend="functions-aggregate">.
|
|
</para>
|
|
|
|
<tip>
|
|
<para>
|
|
Grouping without aggregate expressions effectively calculates the
|
|
set of distinct values in a column. This can also be achieved
|
|
using the <literal>DISTINCT</> clause (see <xref
|
|
linkend="queries-distinct">).
|
|
</para>
|
|
</tip>
|
|
|
|
<para>
|
|
Here is another example: it calculates the total sales for each
|
|
product (rather than the total sales on all products):
|
|
<programlisting>
|
|
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
|
|
FROM products p LEFT JOIN sales s USING (product_id)
|
|
GROUP BY product_id, p.name, p.price;
|
|
</programlisting>
|
|
In this example, the columns <literal>product_id</literal>,
|
|
<literal>p.name</literal>, and <literal>p.price</literal> must be
|
|
in the <literal>GROUP BY</> clause since they are referenced in
|
|
the query select list. (Depending on how exactly the products
|
|
table is set up, name and price might be fully dependent on the
|
|
product ID, so the additional groupings could theoretically be
|
|
unnecessary, but this is not implemented yet.) The column
|
|
<literal>s.units</> does not have to be in the <literal>GROUP
|
|
BY</> list since it is only used in an aggregate expression
|
|
(<literal>sum(...)</literal>), which represents the sales
|
|
of a product. For each product, the query returns a summary row about
|
|
all sales of the product.
|
|
</para>
|
|
|
|
<para>
|
|
In strict SQL, <literal>GROUP BY</> can only group by columns of
|
|
the source table but <productname>PostgreSQL</productname> extends
|
|
this to also allow <literal>GROUP BY</> to group by columns in the
|
|
select list. Grouping by value expressions instead of simple
|
|
column names is also allowed.
|
|
</para>
|
|
|
|
<indexterm>
|
|
<primary>HAVING</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
If a table has been grouped using a <literal>GROUP BY</literal>
|
|
clause, but then only certain groups are of interest, the
|
|
<literal>HAVING</literal> clause can be used, much like a
|
|
<literal>WHERE</> clause, to eliminate groups from a grouped
|
|
table. The syntax is:
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
|
|
</synopsis>
|
|
Expressions in the <literal>HAVING</> clause can refer both to
|
|
grouped expressions and to ungrouped expressions (which necessarily
|
|
involve an aggregate function).
|
|
</para>
|
|
|
|
<para>
|
|
Example:
|
|
<screen>
|
|
<prompt>=></> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;</>
|
|
x | sum
|
|
---+-----
|
|
a | 4
|
|
b | 5
|
|
(2 rows)
|
|
|
|
<prompt>=></> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';</>
|
|
x | sum
|
|
---+-----
|
|
a | 4
|
|
b | 5
|
|
(2 rows)
|
|
</screen>
|
|
</para>
|
|
|
|
<para>
|
|
Again, a more realistic example:
|
|
<programlisting>
|
|
SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
|
|
FROM products p LEFT JOIN sales s USING (product_id)
|
|
WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
|
|
GROUP BY product_id, p.name, p.price, p.cost
|
|
HAVING sum(p.price * s.units) > 5000;
|
|
</programlisting>
|
|
In the example above, the <literal>WHERE</> clause is selecting
|
|
rows by a column that is not grouped (the expression is only true for
|
|
sales during the last four weeks), while the <literal>HAVING</>
|
|
clause restricts the output to groups with total gross sales over
|
|
5000. Note that the aggregate expressions do not necessarily need
|
|
to be the same in all parts of the query.
|
|
</para>
|
|
|
|
<para>
|
|
If a query contains aggregate function calls, but no <literal>GROUP BY</>
|
|
clause, grouping still occurs: the result is a single group row (or
|
|
perhaps no rows at all, if the single row is then eliminated by
|
|
<literal>HAVING</>).
|
|
The same is true if it contains a <literal>HAVING</> clause, even
|
|
without any aggregate function calls or <literal>GROUP BY</> clause.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="queries-window">
|
|
<title>Window Function Processing</>
|
|
|
|
<indexterm zone="queries-window">
|
|
<primary>window function</primary>
|
|
<secondary>order of execution</>
|
|
</indexterm>
|
|
|
|
<para>
|
|
If the query contains any window functions (see
|
|
<xref linkend="tutorial-window"> and
|
|
<xref linkend="syntax-window-functions">), these functions are evaluated
|
|
after any grouping, aggregation, and <literal>HAVING</> filtering is
|
|
performed. That is, if the query uses any aggregates, <literal>GROUP
|
|
BY</>, or <literal>HAVING</>, then the rows seen by the window functions
|
|
are the group rows instead of the original table rows from
|
|
<literal>FROM</>/<literal>WHERE</>.
|
|
</para>
|
|
|
|
<para>
|
|
When multiple window functions are used, all the window functions having
|
|
syntactically equivalent <literal>PARTITION BY</> and <literal>ORDER BY</>
|
|
clauses in their window definitions are guaranteed to be evaluated in a
|
|
single pass over the data. Therefore they will see the same sort ordering,
|
|
even if the <literal>ORDER BY</> does not uniquely determine an ordering.
|
|
However, no guarantees are made about the evaluation of functions having
|
|
different <literal>PARTITION BY</> or <literal>ORDER BY</> specifications.
|
|
(In such cases a sort step is typically required between the passes of
|
|
window function evaluations, and the sort is not guaranteed to preserve
|
|
ordering of rows that its <literal>ORDER BY</> sees as equivalent.)
|
|
</para>
|
|
|
|
<para>
|
|
Currently, window functions always require presorted data, and so the
|
|
query output will be ordered according to one or another of the window
|
|
functions' <literal>PARTITION BY</>/<literal>ORDER BY</> clauses.
|
|
It is not recommendable to rely on this, however. Use an explicit
|
|
top-level <literal>ORDER BY</> clause if you want to be sure the
|
|
results are sorted in a particular way.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="queries-select-lists">
|
|
<title>Select Lists</title>
|
|
|
|
<indexterm>
|
|
<primary>SELECT</primary>
|
|
<secondary>select list</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
As shown in the previous section,
|
|
the table expression in the <command>SELECT</command> command
|
|
constructs an intermediate virtual table by possibly combining
|
|
tables, views, eliminating rows, grouping, etc. This table is
|
|
finally passed on to processing by the <firstterm>select list</firstterm>. The select
|
|
list determines which <emphasis>columns</emphasis> of the
|
|
intermediate table are actually output.
|
|
</para>
|
|
|
|
<sect2 id="queries-select-list-items">
|
|
<title>Select-List Items</title>
|
|
|
|
<indexterm>
|
|
<primary>*</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The simplest kind of select list is <literal>*</literal> which
|
|
emits all columns that the table expression produces. Otherwise,
|
|
a select list is a comma-separated list of value expressions (as
|
|
defined in <xref linkend="sql-expressions">). For instance, it
|
|
could be a list of column names:
|
|
<programlisting>
|
|
SELECT a, b, c FROM ...
|
|
</programlisting>
|
|
The columns names <literal>a</>, <literal>b</>, and <literal>c</>
|
|
are either the actual names of the columns of tables referenced
|
|
in the <literal>FROM</> clause, or the aliases given to them as
|
|
explained in <xref linkend="queries-table-aliases">. The name
|
|
space available in the select list is the same as in the
|
|
<literal>WHERE</> clause, unless grouping is used, in which case
|
|
it is the same as in the <literal>HAVING</> clause.
|
|
</para>
|
|
|
|
<para>
|
|
If more than one table has a column of the same name, the table
|
|
name must also be given, as in:
|
|
<programlisting>
|
|
SELECT tbl1.a, tbl2.a, tbl1.b FROM ...
|
|
</programlisting>
|
|
When working with multiple tables, it can also be useful to ask for
|
|
all the columns of a particular table:
|
|
<programlisting>
|
|
SELECT tbl1.*, tbl2.a FROM ...
|
|
</programlisting>
|
|
(See also <xref linkend="queries-where">.)
|
|
</para>
|
|
|
|
<para>
|
|
If an arbitrary value expression is used in the select list, it
|
|
conceptually adds a new virtual column to the returned table. The
|
|
value expression is evaluated once for each result row, with
|
|
the row's values substituted for any column references. But the
|
|
expressions in the select list do not have to reference any
|
|
columns in the table expression of the <literal>FROM</> clause;
|
|
they could be constant arithmetic expressions as well, for
|
|
instance.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="queries-column-labels">
|
|
<title>Column Labels</title>
|
|
|
|
<indexterm zone="queries-column-labels">
|
|
<primary>alias</primary>
|
|
<secondary>in the select list</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The entries in the select list can be assigned names for further
|
|
processing. The <quote>further processing</quote> in this case is
|
|
an optional sort specification and the client application (e.g.,
|
|
column headers for display). For example:
|
|
<programlisting>
|
|
SELECT a AS value, b + c AS sum FROM ...
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
If no output column name is specified using <literal>AS</>,
|
|
the system assigns a default column name. For simple column references,
|
|
this is the name of the referenced column. For function
|
|
calls, this is the name of the function. For complex expressions,
|
|
the system will generate a generic name.
|
|
</para>
|
|
|
|
<para>
|
|
The <literal>AS</> keyword is optional, but only if the new column
|
|
name does not match any
|
|
<productname>PostgreSQL</productname> keyword (see <xref
|
|
linkend="sql-keywords-appendix">). To avoid an accidental match to
|
|
a keyword, you can double-quote the column name. For example,
|
|
<literal>VALUE</> is a keyword, so this does not work:
|
|
<programlisting>
|
|
SELECT a value, b + c AS sum FROM ...
|
|
</programlisting>
|
|
but this does:
|
|
<programlisting>
|
|
SELECT a "value", b + c AS sum FROM ...
|
|
</programlisting>
|
|
For protection against possible
|
|
future keyword additions, it is recommended that you always either
|
|
write <literal>AS</literal> or double-quote the output column name.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
The naming of output columns here is different from that done in
|
|
the <literal>FROM</> clause (see <xref
|
|
linkend="queries-table-aliases">). This pipeline will in fact
|
|
allow you to rename the same column twice, but the name chosen in
|
|
the select list is the one that will be passed on.
|
|
</para>
|
|
</note>
|
|
</sect2>
|
|
|
|
<sect2 id="queries-distinct">
|
|
<title><literal>DISTINCT</literal></title>
|
|
|
|
<indexterm zone="queries-distinct">
|
|
<primary>DISTINCT</primary>
|
|
</indexterm>
|
|
|
|
<indexterm zone="queries-distinct">
|
|
<primary>duplicates</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
After the select list has been processed, the result table can
|
|
optionally be subject to the elimination of duplicate rows. The
|
|
<literal>DISTINCT</literal> key word is written directly after
|
|
<literal>SELECT</literal> to specify this:
|
|
<synopsis>
|
|
SELECT DISTINCT <replaceable>select_list</replaceable> ...
|
|
</synopsis>
|
|
(Instead of <literal>DISTINCT</> the key word <literal>ALL</literal>
|
|
can be used to specify the default behavior of retaining all rows.)
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>null value</><secondary sortas="DISTINCT">in
|
|
DISTINCT</></indexterm>
|
|
Obviously, two rows are considered distinct if they differ in at
|
|
least one column value. Null values are considered equal in this
|
|
comparison.
|
|
</para>
|
|
|
|
<para>
|
|
Alternatively, an arbitrary expression can determine what rows are
|
|
to be considered distinct:
|
|
<synopsis>
|
|
SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
|
|
</synopsis>
|
|
Here <replaceable>expression</replaceable> is an arbitrary value
|
|
expression that is evaluated for all rows. A set of rows for
|
|
which all the expressions are equal are considered duplicates, and
|
|
only the first row of the set is kept in the output. Note that
|
|
the <quote>first row</quote> of a set is unpredictable unless the
|
|
query is sorted on enough columns to guarantee a unique ordering
|
|
of the rows arriving at the <literal>DISTINCT</> filter.
|
|
(<literal>DISTINCT ON</> processing occurs after <literal>ORDER
|
|
BY</> sorting.)
|
|
</para>
|
|
|
|
<para>
|
|
The <literal>DISTINCT ON</> clause is not part of the SQL standard
|
|
and is sometimes considered bad style because of the potentially
|
|
indeterminate nature of its results. With judicious use of
|
|
<literal>GROUP BY</> and subqueries in <literal>FROM</> the
|
|
construct can be avoided, but it is often the most convenient
|
|
alternative.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="queries-union">
|
|
<title>Combining Queries</title>
|
|
|
|
<indexterm zone="queries-union">
|
|
<primary>UNION</primary>
|
|
</indexterm>
|
|
<indexterm zone="queries-union">
|
|
<primary>INTERSECT</primary>
|
|
</indexterm>
|
|
<indexterm zone="queries-union">
|
|
<primary>EXCEPT</primary>
|
|
</indexterm>
|
|
<indexterm zone="queries-union">
|
|
<primary>set union</primary>
|
|
</indexterm>
|
|
<indexterm zone="queries-union">
|
|
<primary>set intersection</primary>
|
|
</indexterm>
|
|
<indexterm zone="queries-union">
|
|
<primary>set difference</primary>
|
|
</indexterm>
|
|
<indexterm zone="queries-union">
|
|
<primary>set operation</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The results of two queries can be combined using the set operations
|
|
union, intersection, and difference. The syntax is
|
|
<synopsis>
|
|
<replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
|
|
<replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
|
|
<replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
|
|
</synopsis>
|
|
<replaceable>query1</replaceable> and
|
|
<replaceable>query2</replaceable> are queries that can use any of
|
|
the features discussed up to this point. Set operations can also
|
|
be nested and chained, for example
|
|
<synopsis>
|
|
<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
|
|
</synopsis>
|
|
which really says
|
|
<synopsis>
|
|
(<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
|
|
</synopsis>
|
|
</para>
|
|
|
|
<para>
|
|
<literal>UNION</> effectively appends the result of
|
|
<replaceable>query2</replaceable> to the result of
|
|
<replaceable>query1</replaceable> (although there is no guarantee
|
|
that this is the order in which the rows are actually returned).
|
|
Furthermore, it eliminates duplicate rows from its result, in the same
|
|
way as <literal>DISTINCT</>, unless <literal>UNION ALL</> is used.
|
|
</para>
|
|
|
|
<para>
|
|
<literal>INTERSECT</> returns all rows that are both in the result
|
|
of <replaceable>query1</replaceable> and in the result of
|
|
<replaceable>query2</replaceable>. Duplicate rows are eliminated
|
|
unless <literal>INTERSECT ALL</> is used.
|
|
</para>
|
|
|
|
<para>
|
|
<literal>EXCEPT</> returns all rows that are in the result of
|
|
<replaceable>query1</replaceable> but not in the result of
|
|
<replaceable>query2</replaceable>. (This is sometimes called the
|
|
<firstterm>difference</> between two queries.) Again, duplicates
|
|
are eliminated unless <literal>EXCEPT ALL</> is used.
|
|
</para>
|
|
|
|
<para>
|
|
In order to calculate the union, intersection, or difference of two
|
|
queries, the two queries must be <quote>union compatible</quote>,
|
|
which means that they return the same number of columns and
|
|
the corresponding columns have compatible data types, as
|
|
described in <xref linkend="typeconv-union-case">.
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="queries-order">
|
|
<title>Sorting Rows</title>
|
|
|
|
<indexterm zone="queries-order">
|
|
<primary>sorting</primary>
|
|
</indexterm>
|
|
|
|
<indexterm zone="queries-order">
|
|
<primary>ORDER BY</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
After a query has produced an output table (after the select list
|
|
has been processed) it can optionally be sorted. If sorting is not
|
|
chosen, the rows will be returned in an unspecified order. The actual
|
|
order in that case will depend on the scan and join plan types and
|
|
the order on disk, but it must not be relied on. A particular
|
|
output ordering can only be guaranteed if the sort step is explicitly
|
|
chosen.
|
|
</para>
|
|
|
|
<para>
|
|
The <literal>ORDER BY</> clause specifies the sort order:
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable>
|
|
FROM <replaceable>table_expression</replaceable>
|
|
ORDER BY <replaceable>sort_expression1</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional>
|
|
<optional>, <replaceable>sort_expression2</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional> ...</optional>
|
|
</synopsis>
|
|
The sort expression(s) can be any expression that would be valid in the
|
|
query's select list. An example is:
|
|
<programlisting>
|
|
SELECT a, b FROM table1 ORDER BY a + b, c;
|
|
</programlisting>
|
|
When more than one expression is specified,
|
|
the later values are used to sort rows that are equal according to the
|
|
earlier values. Each expression can be followed by an optional
|
|
<literal>ASC</> or <literal>DESC</> keyword to set the sort direction to
|
|
ascending or descending. <literal>ASC</> order is the default.
|
|
Ascending order puts smaller values first, where
|
|
<quote>smaller</quote> is defined in terms of the
|
|
<literal><</literal> operator. Similarly, descending order is
|
|
determined with the <literal>></literal> operator.
|
|
<footnote>
|
|
<para>
|
|
Actually, <productname>PostgreSQL</> uses the <firstterm>default B-tree
|
|
operator class</> for the expression's data type to determine the sort
|
|
ordering for <literal>ASC</> and <literal>DESC</>. Conventionally,
|
|
data types will be set up so that the <literal><</literal> and
|
|
<literal>></literal> operators correspond to this sort ordering,
|
|
but a user-defined data type's designer could choose to do something
|
|
different.
|
|
</para>
|
|
</footnote>
|
|
</para>
|
|
|
|
<para>
|
|
The <literal>NULLS FIRST</> and <literal>NULLS LAST</> options can be
|
|
used to determine whether nulls appear before or after non-null values
|
|
in the sort ordering. By default, null values sort as if larger than any
|
|
non-null value; that is, <literal>NULLS FIRST</> is the default for
|
|
<literal>DESC</> order, and <literal>NULLS LAST</> otherwise.
|
|
</para>
|
|
|
|
<para>
|
|
Note that the ordering options are considered independently for each
|
|
sort column. For example <literal>ORDER BY x, y DESC</> means
|
|
<literal>ORDER BY x ASC, y DESC</>, which is not the same as
|
|
<literal>ORDER BY x DESC, y DESC</>.
|
|
</para>
|
|
|
|
<para>
|
|
For backwards compatibility with the SQL92 version of the standard,
|
|
a <replaceable>sort_expression</> can instead be the name or number
|
|
of an output column, as in:
|
|
<programlisting>
|
|
SELECT a + b AS sum, c FROM table1 ORDER BY sum;
|
|
SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
|
|
</programlisting>
|
|
both of which sort by the first output column. Note that an output
|
|
column name has to stand alone, it's not allowed as part of an expression
|
|
— for example, this is <emphasis>not</> correct:
|
|
<programlisting>
|
|
SELECT a + b AS sum, c FROM table1 ORDER BY sum + c; -- wrong
|
|
</programlisting>
|
|
This restriction is made to reduce ambiguity. There is still
|
|
ambiguity if an <literal>ORDER BY</> item is a simple name that
|
|
could match either an output column name or a column from the table
|
|
expression. The output column is used in such cases. This would
|
|
only cause confusion if you use <literal>AS</> to rename an output
|
|
column to match some other table column's name.
|
|
</para>
|
|
|
|
<para>
|
|
<literal>ORDER BY</> can be applied to the result of a
|
|
<literal>UNION</>, <literal>INTERSECT</>, or <literal>EXCEPT</>
|
|
combination, but in this case it is only permitted to sort by
|
|
output column names or numbers, not by expressions.
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="queries-limit">
|
|
<title><literal>LIMIT</literal> and <literal>OFFSET</literal></title>
|
|
|
|
<indexterm zone="queries-limit">
|
|
<primary>LIMIT</primary>
|
|
</indexterm>
|
|
|
|
<indexterm zone="queries-limit">
|
|
<primary>OFFSET</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
<literal>LIMIT</> and <literal>OFFSET</> allow you to retrieve just
|
|
a portion of the rows that are generated by the rest of the query:
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable>
|
|
FROM <replaceable>table_expression</replaceable>
|
|
<optional> ORDER BY ... </optional>
|
|
<optional> LIMIT { <replaceable>number</replaceable> | ALL } </optional> <optional> OFFSET <replaceable>number</replaceable> </optional>
|
|
</synopsis>
|
|
</para>
|
|
|
|
<para>
|
|
If a limit count is given, no more than that many rows will be
|
|
returned (but possibly less, if the query itself yields less rows).
|
|
<literal>LIMIT ALL</> is the same as omitting the <literal>LIMIT</>
|
|
clause.
|
|
</para>
|
|
|
|
<para>
|
|
<literal>OFFSET</> says to skip that many rows before beginning to
|
|
return rows. <literal>OFFSET 0</> is the same as omitting the
|
|
<literal>OFFSET</> clause, and <literal>LIMIT NULL</> is the same
|
|
as omitting the <literal>LIMIT</> clause. If both <literal>OFFSET</>
|
|
and <literal>LIMIT</> appear, then <literal>OFFSET</> rows are
|
|
skipped before starting to count the <literal>LIMIT</> rows that
|
|
are returned.
|
|
</para>
|
|
|
|
<para>
|
|
When using <literal>LIMIT</>, it is important to use an
|
|
<literal>ORDER BY</> clause that constrains the result rows into a
|
|
unique order. Otherwise you will get an unpredictable subset of
|
|
the query's rows. You might be asking for the tenth through
|
|
twentieth rows, but tenth through twentieth in what ordering? The
|
|
ordering is unknown, unless you specified <literal>ORDER BY</>.
|
|
</para>
|
|
|
|
<para>
|
|
The query optimizer takes <literal>LIMIT</> into account when
|
|
generating a query plan, so you are very likely to get different
|
|
plans (yielding different row orders) depending on what you give
|
|
for <literal>LIMIT</> and <literal>OFFSET</>. Thus, using
|
|
different <literal>LIMIT</>/<literal>OFFSET</> values to select
|
|
different subsets of a query result <emphasis>will give
|
|
inconsistent results</emphasis> unless you enforce a predictable
|
|
result ordering with <literal>ORDER BY</>. This is not a bug; it
|
|
is an inherent consequence of the fact that SQL does not promise to
|
|
deliver the results of a query in any particular order unless
|
|
<literal>ORDER BY</> is used to constrain the order.
|
|
</para>
|
|
|
|
<para>
|
|
The rows skipped by an <literal>OFFSET</> clause still have to be
|
|
computed inside the server; therefore a large <literal>OFFSET</>
|
|
might be inefficient.
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="queries-values">
|
|
<title><literal>VALUES</literal> Lists</title>
|
|
|
|
<indexterm zone="queries-values">
|
|
<primary>VALUES</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
<literal>VALUES</> provides a way to generate a <quote>constant table</>
|
|
that can be used in a query without having to actually create and populate
|
|
a table on-disk. The syntax is
|
|
<synopsis>
|
|
VALUES ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) [, ...]
|
|
</synopsis>
|
|
Each parenthesized list of expressions generates a row in the table.
|
|
The lists must all have the same number of elements (i.e., the number
|
|
of columns in the table), and corresponding entries in each list must
|
|
have compatible data types. The actual data type assigned to each column
|
|
of the result is determined using the same rules as for <literal>UNION</>
|
|
(see <xref linkend="typeconv-union-case">).
|
|
</para>
|
|
|
|
<para>
|
|
As an example:
|
|
<programlisting>
|
|
VALUES (1, 'one'), (2, 'two'), (3, 'three');
|
|
</programlisting>
|
|
|
|
will return a table of two columns and three rows. It's effectively
|
|
equivalent to:
|
|
<programlisting>
|
|
SELECT 1 AS column1, 'one' AS column2
|
|
UNION ALL
|
|
SELECT 2, 'two'
|
|
UNION ALL
|
|
SELECT 3, 'three';
|
|
</programlisting>
|
|
|
|
By default, <productname>PostgreSQL</productname> assigns the names
|
|
<literal>column1</>, <literal>column2</>, etc. to the columns of a
|
|
<literal>VALUES</> table. The column names are not specified by the
|
|
SQL standard and different database systems do it differently, so
|
|
it's usually better to override the default names with a table alias
|
|
list.
|
|
</para>
|
|
|
|
<para>
|
|
Syntactically, <literal>VALUES</> followed by expression lists is
|
|
treated as equivalent to
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable>
|
|
</synopsis>
|
|
and can appear anywhere a <literal>SELECT</> can. For example, you can
|
|
use it as an arm of a <literal>UNION</>, or attach a
|
|
<replaceable>sort_specification</replaceable> (<literal>ORDER BY</>,
|
|
<literal>LIMIT</>, and/or <literal>OFFSET</>) to it. <literal>VALUES</>
|
|
is most commonly used as the data source in an <command>INSERT</> command,
|
|
and next most commonly as a subquery.
|
|
</para>
|
|
|
|
<para>
|
|
For more information see <xref linkend="sql-values"
|
|
endterm="sql-values-title">.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="queries-with">
|
|
<title><literal>WITH</literal> Queries</title>
|
|
|
|
<indexterm zone="queries-with">
|
|
<primary>WITH</primary>
|
|
<secondary>in SELECT</secondary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>common table expression</primary>
|
|
<see>WITH</see>
|
|
</indexterm>
|
|
|
|
<para>
|
|
<literal>WITH</> provides a way to write subqueries for use in a larger
|
|
<literal>SELECT</> query. The subqueries can be thought of as defining
|
|
temporary tables that exist just for this query. One use of this feature
|
|
is to break down complicated queries into simpler parts. An example is:
|
|
|
|
<programlisting>
|
|
WITH regional_sales AS (
|
|
SELECT region, SUM(amount) AS total_sales
|
|
FROM orders
|
|
GROUP BY region
|
|
), top_regions AS (
|
|
SELECT region
|
|
FROM regional_sales
|
|
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
|
|
)
|
|
SELECT region,
|
|
product,
|
|
SUM(quantity) AS product_units,
|
|
SUM(amount) AS product_sales
|
|
FROM orders
|
|
WHERE region IN (SELECT region FROM top_regions)
|
|
GROUP BY region, product;
|
|
</programlisting>
|
|
|
|
which displays per-product sales totals in only the top sales regions.
|
|
This example could have been written without <literal>WITH</>,
|
|
but we'd have needed two levels of nested sub-SELECTs. It's a bit
|
|
easier to follow this way.
|
|
</para>
|
|
|
|
<para>
|
|
The optional <literal>RECURSIVE</> modifier changes <literal>WITH</>
|
|
from a mere syntactic convenience into a feature that accomplishes
|
|
things not otherwise possible in standard SQL. Using
|
|
<literal>RECURSIVE</>, a <literal>WITH</> query can refer to its own
|
|
output. A very simple example is this query to sum the integers from 1
|
|
through 100:
|
|
|
|
<programlisting>
|
|
WITH RECURSIVE t(n) AS (
|
|
VALUES (1)
|
|
UNION ALL
|
|
SELECT n+1 FROM t WHERE n < 100
|
|
)
|
|
SELECT sum(n) FROM t;
|
|
</programlisting>
|
|
|
|
The general form of a recursive <literal>WITH</> query is always a
|
|
<firstterm>non-recursive term</>, then <literal>UNION</> (or
|
|
<literal>UNION ALL</>), then a
|
|
<firstterm>recursive term</>, where only the recursive term can contain
|
|
a reference to the query's own output. Such a query is executed as
|
|
follows:
|
|
</para>
|
|
|
|
<procedure>
|
|
<title>Recursive Query Evaluation</title>
|
|
|
|
<step performance="required">
|
|
<para>
|
|
Evaluate the non-recursive term. For <literal>UNION</> (but not
|
|
<literal>UNION ALL</>), discard duplicate rows. Include all remaining
|
|
rows in the result of the recursive query, and also place them in a
|
|
temporary <firstterm>working table</>.
|
|
</para>
|
|
</step>
|
|
|
|
<step performance="required">
|
|
<para>
|
|
So long as the working table is not empty, repeat these steps:
|
|
</para>
|
|
<substeps>
|
|
<step performance="required">
|
|
<para>
|
|
Evaluate the recursive term, substituting the current contents of
|
|
the working table for the recursive self-reference.
|
|
For <literal>UNION</> (but not <literal>UNION ALL</>), discard
|
|
duplicate rows and rows that duplicate any previous result row.
|
|
Include all remaining rows in the result of the recursive query, and
|
|
also place them in a temporary <firstterm>intermediate table</>.
|
|
</para>
|
|
</step>
|
|
|
|
<step performance="required">
|
|
<para>
|
|
Replace the contents of the working table with the contents of the
|
|
intermediate table, then empty the intermediate table.
|
|
</para>
|
|
</step>
|
|
</substeps>
|
|
</step>
|
|
</procedure>
|
|
|
|
<note>
|
|
<para>
|
|
Strictly speaking, this process is iteration not recursion, but
|
|
<literal>RECURSIVE</> is the terminology chosen by the SQL standards
|
|
committee.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
In the example above, the working table has just a single row in each step,
|
|
and it takes on the values from 1 through 100 in successive steps. In
|
|
the 100th step, there is no output because of the <literal>WHERE</>
|
|
clause, and so the query terminates.
|
|
</para>
|
|
|
|
<para>
|
|
Recursive queries are typically used to deal with hierarchical or
|
|
tree-structured data. A useful example is this query to find all the
|
|
direct and indirect sub-parts of a product, given only a table that
|
|
shows immediate inclusions:
|
|
|
|
<programlisting>
|
|
WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
|
|
SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
|
|
UNION ALL
|
|
SELECT p.sub_part, p.part, p.quantity
|
|
FROM included_parts pr, parts p
|
|
WHERE p.part = pr.sub_part
|
|
)
|
|
SELECT sub_part, SUM(quantity) as total_quantity
|
|
FROM included_parts
|
|
GROUP BY sub_part
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
When working with recursive queries it is important to be sure that
|
|
the recursive part of the query will eventually return no tuples,
|
|
or else the query will loop indefinitely. Sometimes, using
|
|
<literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
|
|
by discarding rows that duplicate previous output rows. However, often a
|
|
cycle does not involve output rows that are completely duplicate: it may be
|
|
necessary to check just one or a few fields to see if the same point has
|
|
been reached before. The standard method for handling such situations is
|
|
to compute an array of the already-visited values. For example, consider
|
|
the following query that searches a table <structname>graph</> using a
|
|
<structfield>link</> field:
|
|
|
|
<programlisting>
|
|
WITH RECURSIVE search_graph(id, link, data, depth) AS (
|
|
SELECT g.id, g.link, g.data, 1
|
|
FROM graph g
|
|
UNION ALL
|
|
SELECT g.id, g.link, g.data, sg.depth + 1
|
|
FROM graph g, search_graph sg
|
|
WHERE g.id = sg.link
|
|
)
|
|
SELECT * FROM search_graph;
|
|
</programlisting>
|
|
|
|
This query will loop if the <structfield>link</> relationships contain
|
|
cycles. Because we require a <quote>depth</> output, just changing
|
|
<literal>UNION ALL</> to <literal>UNION</> would not eliminate the looping.
|
|
Instead we need to recognize whether we have reached the same row again
|
|
while following a particular path of links. We add two columns
|
|
<structfield>path</> and <structfield>cycle</> to the loop-prone query:
|
|
|
|
<programlisting>
|
|
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
|
|
SELECT g.id, g.link, g.data, 1,
|
|
ARRAY[g.id],
|
|
false
|
|
FROM graph g
|
|
UNION ALL
|
|
SELECT g.id, g.link, g.data, sg.depth + 1,
|
|
path || g.id,
|
|
g.id = ANY(path)
|
|
FROM graph g, search_graph sg
|
|
WHERE g.id = sg.link AND NOT cycle
|
|
)
|
|
SELECT * FROM search_graph;
|
|
</programlisting>
|
|
|
|
Aside from preventing cycles, the array value is often useful in its own
|
|
right as representing the <quote>path</> taken to reach any particular row.
|
|
</para>
|
|
|
|
<para>
|
|
In the general case where more than one field needs to be checked to
|
|
recognize a cycle, use an array of rows. For example, if we needed to
|
|
compare fields <structfield>f1</> and <structfield>f2</>:
|
|
|
|
<programlisting>
|
|
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
|
|
SELECT g.id, g.link, g.data, 1,
|
|
ARRAY[ROW(g.f1, g.f2)],
|
|
false
|
|
FROM graph g
|
|
UNION ALL
|
|
SELECT g.id, g.link, g.data, sg.depth + 1,
|
|
path || ROW(g.f1, g.f2),
|
|
ROW(g.f1, g.f2) = ANY(path)
|
|
FROM graph g, search_graph sg
|
|
WHERE g.id = sg.link AND NOT cycle
|
|
)
|
|
SELECT * FROM search_graph;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<tip>
|
|
<para>
|
|
Omit the <literal>ROW()</> syntax in the common case where only one field
|
|
needs to be checked to recognize a cycle. This allows a simple array
|
|
rather than a composite-type array to be used, gaining efficiency.
|
|
</para>
|
|
</tip>
|
|
|
|
<tip>
|
|
<para>
|
|
The recursive query evaluation algorithm produces its output in
|
|
breadth-first search order. You can display the results in depth-first
|
|
search order by making the outer query <literal>ORDER BY</> a
|
|
<quote>path</> column constructed in this way.
|
|
</para>
|
|
</tip>
|
|
|
|
<para>
|
|
A helpful trick for testing queries
|
|
when you are not certain if they might loop is to place a <literal>LIMIT</>
|
|
in the parent query. For example, this query would loop forever without
|
|
the <literal>LIMIT</>:
|
|
|
|
<programlisting>
|
|
WITH RECURSIVE t(n) AS (
|
|
SELECT 1
|
|
UNION ALL
|
|
SELECT n+1 FROM t
|
|
)
|
|
SELECT n FROM t LIMIT 100;
|
|
</programlisting>
|
|
|
|
This works because <productname>PostgreSQL</productname>'s implementation
|
|
evaluates only as many rows of a <literal>WITH</> query as are actually
|
|
fetched by the parent query. Using this trick in production is not
|
|
recommended, because other systems might work differently. Also, it
|
|
usually won't work if you make the outer query sort the recursive query's
|
|
results or join them to some other table.
|
|
</para>
|
|
|
|
<para>
|
|
A useful property of <literal>WITH</> queries is that they are evaluated
|
|
only once per execution of the parent query, even if they are referred to
|
|
more than once by the parent query or sibling <literal>WITH</> queries.
|
|
Thus, expensive calculations that are needed in multiple places can be
|
|
placed within a <literal>WITH</> query to avoid redundant work. Another
|
|
possible application is to prevent unwanted multiple evaluations of
|
|
functions with side-effects.
|
|
However, the other side of this coin is that the optimizer is less able to
|
|
push restrictions from the parent query down into a <literal>WITH</> query
|
|
than an ordinary sub-query. The <literal>WITH</> query will generally be
|
|
evaluated as stated, without suppression of rows that the parent query
|
|
might discard afterwards. (But, as mentioned above, evaluation might stop
|
|
early if the reference(s) to the query demand only a limited number of
|
|
rows.)
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|