<!--
$PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.32 2005/04/12 03:16:50 tgl Exp $
Genetic Optimizer
-->

<chapter id="geqo">
 <chapterinfo>
  <author>
   <firstname>Martin</firstname>
   <surname>Utesch</surname>
   <affiliation>
    <orgname>
     University of Mining and Technology
    </orgname>
    <orgdiv>
     Institute of Automatic Control
    </orgdiv>
    <address>
     <city>
      Freiberg
     </city>
     <country>
      Germany
     </country>
    </address>
   </affiliation>
  </author>
  <date>1997-10-02</date>
 </chapterinfo>

<title id="geqo-title">Genetic Query Optimizer</title>

 <para>
  <note>
   <title>Author</title>
   <para>
    Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>)
    for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
   </para>
  </note>
 </para>

<sect1 id="geqo-intro">
  <title>Query Handling as a Complex Optimization Problem</title>

  <para>
   Among all relational operators, the most difficult one to process
   and optimize is the <firstterm>join</firstterm>. The number of
   alternative plans to answer a query grows exponentially with the
   number of joins included in it. The optimization effort is
   increased further by the variety of <firstterm>join
   methods</firstterm> (e.g., nested loop, hash join, merge join in
   <productname>PostgreSQL</productname>) available to process individual joins
   and the diversity of <firstterm>indexes</firstterm> (e.g., R-tree,
   B-tree, hash in <productname>PostgreSQL</productname>) available as access
   paths for relations.
</para>
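
  <para>
   To give a feel for this growth, the following Python sketch counts
   join orders under two simplifying assumptions that are not part of
   this chapter: every pair of relations can be joined, and plans are
   either left-deep (n! orders for n relations) or arbitrary ordered
   binary trees ((2n-2)!/(n-1)! trees). The function names are
   illustrative only and do not correspond to anything in the
   <productname>PostgreSQL</productname> source.
  </para>

```python
from math import factorial


def left_deep_orders(n):
    """Number of left-deep join orders for n relations: n!."""
    return factorial(n)


def bushy_trees(n):
    """Number of ordered binary join trees over n relations: (2n-2)!/(n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


if __name__ == "__main__":
    # Print how quickly the two counts grow with the number of relations.
    for n in (2, 4, 8, 12):
        print(n, left_deep_orders(n), bushy_trees(n))
```

  <para>
   Both counts ignore the additional choice of join method and access
   path at every step, so the real search space is larger still.
  </para>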

  <para>
   The current <productname>PostgreSQL</productname> optimizer
   implementation performs a <firstterm>near-exhaustive
   search</firstterm> over the space of alternative strategies. This
   algorithm, first introduced in the <quote>System R</quote>
   database, produces a near-optimal join order, but can take an
   enormous amount of time and memory space when the number of joins
   in the query grows large. This makes the ordinary
   <productname>PostgreSQL</productname> query optimizer
   inappropriate for queries that join a large number of tables.
  </para>

  <para>
   The Institute of Automatic Control at the University of Mining and
   Technology, in Freiberg, Germany, encountered these problems when its
   staff wanted to use the <productname>PostgreSQL</productname> DBMS as the backend for a
   knowledge-based decision support system for the maintenance of an electrical
   power grid. The DBMS needed to handle large join queries for the
   inference machine of the knowledge-based system.
  </para>

  <para>
   Performance difficulties in exploring the space of possible query
   plans created the demand for a new optimization technique.
  </para>

  <para>
   In the following we describe the implementation of a
   <firstterm>Genetic Algorithm</firstterm> to solve the join
   ordering problem in a manner that is efficient for queries
   involving large numbers of joins.
  </para>
 </sect1>

<sect1 id="geqo-intro2">
  <title>Genetic Algorithms</title>

  <para>
   The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method that
   operates through
   nondeterministic, randomized search. The set of possible solutions for the
   optimization problem is considered as a
   <firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
   The degree of adaptation of an individual to its environment is specified
   by its <firstterm>fitness</firstterm>.
  </para>

  <para>
   The coordinates of an individual in the search space are represented
   by <firstterm>chromosomes</firstterm>, in essence a set of character
   strings. A <firstterm>gene</firstterm> is a
   subsection of a chromosome that encodes the value of a single parameter
   being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or
   <firstterm>integer</firstterm>.
  </para>

  <para>
   Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
   <firstterm>mutation</firstterm>, and
   <firstterm>selection</firstterm>, new generations of search points are found
   that show a higher average fitness than their ancestors.
  </para>

  <para>
   According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too
   strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
   problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
   non-random (better than random).
  </para>

  <figure id="geqo-diagram">
   <title>Structured Diagram of a Genetic Algorithm</title>

   <informaltable frame="none">
    <tgroup cols="2">
     <tbody>
      <row>
       <entry>P(t)</entry>
       <entry>generation of ancestors at time t</entry>
      </row>

      <row>
       <entry>P''(t)</entry>
       <entry>generation of descendants at time t</entry>
      </row>
     </tbody>
    </tgroup>
   </informaltable>

<literallayout class="monospaced">
+=========================================+
|>>>>>>>>>>>  Algorithm GA  <<<<<<<<<<<<<<|
+=========================================+
| INITIALIZE t := 0                       |
+=========================================+
| INITIALIZE P(t)                         |
+=========================================+
| evaluate FITNESS of P(t)                |
+=========================================+
| while not STOPPING CRITERION do         |
|   +-------------------------------------+
|   | P'(t)  := RECOMBINATION{P(t)}       |
|   +-------------------------------------+
|   | P''(t) := MUTATION{P'(t)}           |
|   +-------------------------------------+
|   | P(t+1) := SELECTION{P''(t) + P(t)}  |
|   +-------------------------------------+
|   | evaluate FITNESS of P''(t)          |
|   +-------------------------------------+
|   | t := t + 1                          |
+===+=====================================+
</literallayout>
</figure>
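
  <para>
   The loop in the diagram can be sketched in a few lines of Python.
   This is a minimal illustration on a toy problem, not the
   <acronym>GEQO</acronym> code: the objective (count the 1 bits in a
   binary chromosome), the one-point crossover, the mutation rate, and
   all function names are assumptions made for the example. The
   stopping criterion is simply a fixed number of generations.
  </para>

```python
import random


def fitness(individual):
    # Toy objective (an assumption for illustration): number of 1 bits.
    return sum(individual)


def recombine(population, rng):
    # One-point crossover between randomly chosen pairs of parents.
    children = []
    for _ in range(len(population)):
        a, b = rng.sample(population, 2)
        cut = rng.randrange(1, len(a))
        children.append(a[:cut] + b[cut:])
    return children


def mutate(population, rng, rate=0.02):
    # Flip each bit independently with a small probability.
    return [[bit ^ 1 if rng.random() < rate else bit for bit in ind]
            for ind in population]


def select(candidates, size):
    # Keep the fittest individuals among ancestors and descendants.
    return sorted(candidates, key=fitness, reverse=True)[:size]


def run_ga(pool_size=20, length=16, generations=40, seed=0):
    rng = random.Random(seed)
    t = 0                                                # INITIALIZE t := 0
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pool_size)]             # INITIALIZE P(t)
    history = [max(map(fitness, population))]            # best FITNESS of P(t)
    while t < generations:                               # STOPPING CRITERION
        p1 = recombine(population, rng)                  # P'(t)
        p2 = mutate(p1, rng)                             # P''(t)
        population = select(p2 + population, pool_size)  # P(t+1)
        history.append(max(map(fitness, population)))
        t += 1
    return population, history


if __name__ == "__main__":
    _, best_per_generation = run_ga()
    print(best_per_generation)
```

  <para>
   Because selection draws from both the ancestors and the descendants,
   the best fitness in this sketch can never decrease from one
   generation to the next.
  </para>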
 </sect1>

<sect1 id="geqo-pg-intro">
  <title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title>

  <para>
   The <acronym>GEQO</acronym> module approaches the query
   optimization problem as though it were the well-known traveling salesman
   problem (<acronym>TSP</acronym>).
   Possible query plans are encoded as integer strings. Each string
   represents the join order from one relation of the query to the next.
   For example, the join tree
<literallayout class="monospaced">
   /\
  /\ 2
 /\ 3
4  1
</literallayout>
   is encoded by the integer string '4-1-3-2',
   which means, first join relation '4' and '1', then '3', and
   then '2', where 1, 2, 3, 4 are relation IDs within the
   <productname>PostgreSQL</productname> optimizer.
</para>
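
  <para>
   The encoding above can be made concrete with a short Python sketch
   that turns an integer string back into a nested join tree. The
   function name and the use of nested tuples are assumptions for
   illustration; the actual module works on the planner's own data
   structures rather than on strings.
  </para>

```python
def decode_tour(tour):
    """Turn an integer string such as '4-1-3-2' into a left-deep join tree.

    Each inner tuple (left, right) stands for one join.
    """
    relids = [int(part) for part in tour.split("-")]
    tree = relids[0]
    for relid in relids[1:]:
        # Join everything built so far with the next relation in the string.
        tree = (tree, relid)
    return tree


if __name__ == "__main__":
    print(decode_tour("4-1-3-2"))  # prints (((4, 1), 3), 2)
```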

  <para>
   Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
   algorithm.
  </para>

  <para>
   Specific characteristics of the <acronym>GEQO</acronym>
   implementation in <productname>PostgreSQL</productname>
   are:

   <itemizedlist spacing="compact" mark="bullet">
    <listitem>
     <para>
      Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
      individuals in a population, not whole-generational replacement)
      allows fast convergence towards improved query plans. This is
      essential for query handling in reasonable time;
     </para>
    </listitem>

    <listitem>
     <para>
      Usage of <firstterm>edge recombination crossover</firstterm>,
      which is especially suited to keeping edge losses low for the
      solution of the <acronym>TSP</acronym> by means of a
      <acronym>GA</acronym>;
     </para>
    </listitem>

    <listitem>
     <para>
      Mutation as a genetic operator is deprecated so that no repair
      mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
     </para>
    </listitem>
   </itemizedlist>
</para>
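
  <para>
   The first two characteristics can be sketched together in Python.
   This is a simplified textbook version of edge recombination
   crossover combined with a steady-state replacement step, not a
   transcription of the <acronym>GEQO</acronym> source: parents are
   picked uniformly at random here (Genitor itself biases the choice
   by rank), and <literal>toy_cost</literal> is a made-up stand-in for
   the planner's cost estimate. All names are hypothetical.
  </para>

```python
import random


def edge_map(parent_a, parent_b):
    # For every gene, collect its neighbors in both (cyclic) parent tours.
    edges = {gene: set() for gene in parent_a}
    for tour in (parent_a, parent_b):
        n = len(tour)
        for i, gene in enumerate(tour):
            edges[gene].add(tour[i - 1])
            edges[gene].add(tour[(i + 1) % n])
    return edges


def edge_recombination(parent_a, parent_b, rng):
    """Build a child tour that reuses as many parental edges as possible."""
    edges = edge_map(parent_a, parent_b)
    current = rng.choice([parent_a[0], parent_b[0]])
    child = [current]
    while len(child) < len(parent_a):
        # The current gene is used up: remove it from every neighbor set.
        for neighbors in edges.values():
            neighbors.discard(current)
        candidates = edges.pop(current)
        if candidates:
            # Prefer the neighbor that has the fewest neighbors left.
            fewest = min(len(edges[c]) for c in candidates)
            current = rng.choice(
                sorted(c for c in candidates if len(edges[c]) == fewest))
        else:
            # Dead end: fall back to any gene not yet in the child.
            current = rng.choice(sorted(edges))
        child.append(current)
    return child


def steady_state_step(pool, cost, rng):
    """Replace only the least fit individual, and only if the child is better.

    The pool is kept sorted by cost, cheapest first.
    """
    mom, dad = rng.sample(pool, 2)
    child = edge_recombination(mom, dad, rng)
    if cost(child) < cost(pool[-1]):
        pool[-1] = child
        pool.sort(key=cost)
    return pool


def toy_cost(tour):
    # Hypothetical cost: sum of differences between adjacent relation IDs.
    return sum(abs(a - b) for a, b in zip(tour, tour[1:]))


if __name__ == "__main__":
    rng = random.Random(0)
    relids = [1, 2, 3, 4, 5, 6]
    pool = sorted((rng.sample(relids, len(relids)) for _ in range(8)),
                  key=toy_cost)
    for _ in range(50):
        steady_state_step(pool, toy_cost, rng)
    print(pool[0], toy_cost(pool[0]))
```

  <para>
   Because the child is assembled from genes that have not been used
   yet, it is always a legal tour, which is why no repair step is
   needed in this scheme.
  </para>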

  <para>
   The <acronym>GEQO</acronym> module allows
   the <productname>PostgreSQL</productname> query optimizer to
   support large join queries effectively through
   non-exhaustive search.
  </para>

  <sect2 id="geqo-future">
   <title>Future Implementation Tasks for
    <productname>PostgreSQL</> <acronym>GEQO</acronym></title>

   <para>
    Work is still needed to improve the genetic algorithm parameter
    settings.
    In file <filename>src/backend/optimizer/geqo/geqo_main.c</filename>,
    routines
    <function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
    we have to find a compromise for the parameter settings
    to satisfy two competing demands:
    <itemizedlist spacing="compact">
     <listitem>
      <para>
       Optimality of the query plan
      </para>
     </listitem>
     <listitem>
      <para>
       Computing time
      </para>
     </listitem>
    </itemizedlist>
   </para>

   <para>
    At a more basic level, it is not clear that solving query optimization
    with a <acronym>GA</acronym> designed for <acronym>TSP</acronym> is appropriate. In the <acronym>TSP</acronym> case,
    the cost associated with any substring (partial tour) is independent
    of the rest of the tour, but this is certainly not true for query
    optimization. Thus it is questionable whether edge recombination
    crossover is the most effective recombination procedure.
</para>
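
   <para>
    A small Python sketch illustrates the dependence. The cardinalities,
    the fixed selectivity, and the nested-loop style cost formula are
    all invented for this example and are far simpler than the
    planner's real cost model. The point is only that the cost of
    joining two given relations depends on what was joined before
    them, whereas the distance between two cities in a
    <acronym>TSP</acronym> tour does not.
   </para>

```python
# Hypothetical base relation sizes, keyed by relation ID.
ROWS = {1: 1000, 2: 10, 3: 500, 4: 20}

# Hypothetical selectivity: every join keeps 1 row in 100 of the cross product.
SELECTIVITY_DIVISOR = 100


def step_costs(tour):
    """Return the cost of each join step for a left-deep join order."""
    rows = ROWS[tour[0]]
    costs = []
    for relid in tour[1:]:
        # Toy nested-loop cost: rows of the left input times rows of the right.
        costs.append(rows * ROWS[relid])
        # Size of the intermediate result that feeds the next join.
        rows = rows * ROWS[relid] // SELECTIVITY_DIVISOR
    return costs


if __name__ == "__main__":
    # Relation 2 directly follows relation 3 in both orders, but the cost of
    # that step differs because the input on the left-hand side differs.
    print(step_costs([4, 1, 3, 2]))
    print(step_costs([3, 2, 4, 1]))
```

   <para>
    In the first order, relation 2 is joined to the result of three
    earlier relations; in the second, it is joined to relation 3
    alone. An edge-preserving crossover treats both as the same edge
    from 3 to 2.
   </para>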

  </sect2>
 </sect1>

<sect1 id="geqo-biblio">
  <title>Further Reading</title>

  <para>
   The following resources contain additional information about
   genetic algorithms:

   <itemizedlist>
    <listitem>
     <para>
      <ulink url="http://surf.de.uu.net/encore/www/">
      The Hitch-Hiker's Guide to Evolutionary Computation</ulink>, (FAQ for <ulink
      url="news://comp.ai.genetic"></ulink>)
     </para>
    </listitem>

    <listitem>
     <para>
      <ulink url="http://www.red3d.com/cwr/evolve.html">
      Evolutionary Computation and its application to art and design</ulink>, by
      Craig Reynolds
     </para>
    </listitem>

    <listitem>
     <para>
      <xref linkend="ELMA04">
     </para>
    </listitem>

    <listitem>
     <para>
      <xref linkend="FONG">
     </para>
    </listitem>
   </itemizedlist>
  </para>

 </sect1>
</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->