<!-- Convert body of chapter to SGML. Was embedded text from original doc. (commit c452ddcac6, parent e01d442174) -->
<Author>
<FirstName>Martin</FirstName>
<SurName>Utesch</SurName>
<Affiliation>
<Orgname>
University of Mining and Technology
</Orgname>
<Orgdiv>
Institute of Automatic Control
</Orgdiv>
<Address>
<City>
Freiberg
</City>
<Country>
Germany
</Country>
</Address>
</Affiliation>
</Author>
<Date>1997-10-02</Date>
</DocInfo>

<Title>Genetic Query Optimization in Database Systems</Title>

<Para>
<ProgramListing>

<Note>
<Title>Author</Title>
<Para>
Written by <ULink url="mailto:utesch@aut.tu-freiberg.de">Martin Utesch</ULink>
for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
</Para>
</Note>

<Sect1>
<Title>Query Handling as a Complex Optimization Problem</Title>

<Para>
Among all relational operators the most difficult one to process and
optimize is the <FirstTerm>join</FirstTerm>. The number of alternative plans to answer a query
grows exponentially with the number of <Command>join</Command>s included in it. Further
optimization effort is caused by the support of a variety of <FirstTerm>join methods</FirstTerm>
(e.g., nested loop, index scan, merge join in <ProductName>Postgres</ProductName>) to
process individual <Command>join</Command>s and a diversity of <FirstTerm>indices</FirstTerm> (e.g., r-tree,
b-tree, hash in <ProductName>Postgres</ProductName>) as access paths for relations.

<Para>
The current <ProductName>Postgres</ProductName> optimizer implementation performs a <FirstTerm>near-exhaustive
search</FirstTerm> over the space of alternative strategies. This query
optimization technique is inadequate for database application
domains that require extensive queries, such as artificial
intelligence.

<Para>
The Institute of Automatic Control at the University of Mining and
Technology in Freiberg, Germany, encountered these problems when its
staff wanted to use the <ProductName>Postgres</ProductName> DBMS as the backend for a decision
support knowledge-based system for the maintenance of an electrical
power grid. The DBMS needed to handle large <Command>join</Command> queries for the
inference machine of the knowledge-based system.

<Para>
Performance difficulties in exploring the space of possible query
plans created the demand for a new optimization technique.

<Para>
In the following we propose the implementation of a <FirstTerm>Genetic Algorithm</FirstTerm>
as an option for the database query optimization problem.

<Sect1>
<Title>Genetic Algorithms (<Acronym>GA</Acronym>)</Title>

<Para>
The <Acronym>GA</Acronym> is a heuristic optimization method which operates through
nondeterministic, randomized search. The set of possible solutions for the
optimization problem is considered as a <FirstTerm>population</FirstTerm> of <FirstTerm>individuals</FirstTerm>.
The degree of adaptation of an individual to its environment is specified
by its <FirstTerm>fitness</FirstTerm>.

<Para>
The coordinates of an individual in the search space are represented
by <FirstTerm>chromosomes</FirstTerm>, in essence a set of character strings. A <FirstTerm>gene</FirstTerm> is a
subsection of a chromosome which encodes the value of a single parameter
being optimized. Typical encodings for a gene could be <FirstTerm>binary</FirstTerm> or
<FirstTerm>integer</FirstTerm>.

<Para>
Through simulation of the evolutionary operations <FirstTerm>recombination</FirstTerm>,
<FirstTerm>mutation</FirstTerm>, and <FirstTerm>selection</FirstTerm>, new generations of search points are found
that show a higher average fitness than their ancestors.

<Para>
According to the "comp.ai.genetic" <Acronym>FAQ</Acronym>, it cannot be stressed too
strongly that a <Acronym>GA</Acronym> is not a pure random search for a solution to a
problem. A <Acronym>GA</Acronym> uses stochastic processes, but the result is distinctly
non-random (better than random).

<ProgramListing>
Structured Diagram of a <Acronym>GA</Acronym>:
---------------------------

P(t)    generation of ancestors at a time t
P''(t)  generation of descendants at a time t

|   +-------------------------------------+
|   | t := t + 1                          |
+===+=====================================+
</ProgramListing>

<Sect1>
<Title>Genetic Query Optimization (<Acronym>GEQO</Acronym>) in Postgres</Title>

<Para>
The <Acronym>GEQO</Acronym> module approaches the query optimization problem as if it were
a traveling salesman problem (<Acronym>TSP</Acronym>).
Possible query plans are encoded as integer strings. Each string
represents the <Command>join</Command> order from one relation of the query to the next.
For example, the query tree
<ProgramListing>
     /\
    /\ 2
   /\ 3
  4  1
</ProgramListing>
is encoded by the integer string '4-1-3-2',
which means: first join relations '4' and '1', then '3', and
then '2', where 1, 2, 3, 4 are relids (relation identifiers) in <ProductName>Postgres</ProductName>.

<Para>
Parts of the <Acronym>GEQO</Acronym> module are adapted from D. Whitley's Genitor
algorithm.

<Para>
Specific characteristics of the <Acronym>GEQO</Acronym> implementation in <ProductName>Postgres</ProductName>
are:

<ItemizedList Mark="bullet" Spacing="compact">
<ListItem>
<Para>
Usage of a <FirstTerm>steady state</FirstTerm> <Acronym>GA</Acronym> (replacement of the least fit
individuals in a population, not whole-generational replacement)
allows fast convergence towards improved query plans. This is
essential for query handling in reasonable time;
</Para>
</ListItem>

<ListItem>
<Para>
Usage of <FirstTerm>edge recombination crossover</FirstTerm>, which is especially suited
to keeping edge losses low when solving the <Acronym>TSP</Acronym> with a <Acronym>GA</Acronym>;
</Para>
</ListItem>

<ListItem>
<Para>
Mutation as a genetic operator is deprecated, so that no repair
mechanisms are needed to generate legal <Acronym>TSP</Acronym> tours.
</Para>
</ListItem>
</ItemizedList>

<Para>
The <Acronym>GEQO</Acronym> module gives the following benefits to the <ProductName>Postgres</ProductName> DBMS
compared to the near-exhaustive <ProductName>Postgres</ProductName> query optimizer:

<ItemizedList Mark="bullet" Spacing="compact">
<ListItem>
<Para>
Handling of large <Command>join</Command> queries through non-exhaustive search;
</Para>
</ListItem>

<ListItem>
<Para>
Improved cost size approximation of query plans, since plan merging is no longer
needed (the <Acronym>GEQO</Acronym> module evaluates the cost for a
query plan as an individual).
</Para>
</ListItem>
</ItemizedList>

</Sect1>

<Sect1>
<Title>Future Implementation Tasks for <ProductName>Postgres</ProductName> <Acronym>GEQO</Acronym></Title>

<Sect2>
<Title>Basic Improvements</Title>

<Sect3>
<Title>Improve freeing of memory when a query has been processed</Title>

<Para>
With large <Command>join</Command> queries the computing time spent for the genetic query
optimization seems to be a mere <Emphasis>fraction</Emphasis> of the time
<ProductName>Postgres</ProductName>
needs for freeing memory via routine <Function>MemoryContextFree</Function>,
file <FileName>backend/utils/mmgr/mcxt.c</FileName>.
Debugging showed that it gets stuck in a loop of routine
<Function>OrderedElemPop</Function>, file <FileName>backend/utils/mmgr/oset.c</FileName>.
The same problems arise with long queries when using the normal
<ProductName>Postgres</ProductName> query optimization algorithm.

<Sect3>
<Title>Improve genetic algorithm parameter settings</Title>

<Para>
In file <FileName>backend/optimizer/geqo/geqo_params.c</FileName>, routines
<Function>gimme_pool_size</Function> and <Function>gimme_number_generations</Function>,
we have to find a compromise for the parameter settings
to satisfy two competing demands:
<ItemizedList Spacing="compact">
<ListItem>
<Para>
Optimality of the query plan
</Para>
</ListItem>
<ListItem>
<Para>
Computing time
</Para>
</ListItem>
</ItemizedList>

<Sect3>
<Title>Find better solution for integer overflow</Title>

<Para>
In file <FileName>backend/optimizer/geqo/geqo_eval.c</FileName>, routine
<Function>geqo_joinrel_size</Function>,
the present hack for MAXINT overflow is to set the <ProductName>Postgres</ProductName> integer
value of <StructField>rel->size</StructField> to its logarithm.
Modifications of <StructName>Rel</StructName> in <FileName>backend/nodes/relation.h</FileName> will
surely have severe impacts on the whole <ProductName>Postgres</ProductName> implementation.

<Sect3>
<Title>Find solution for exhausted memory</Title>

<Para>
Memory exhaustion may occur with more than 10 relations involved in a query.
In file <FileName>backend/optimizer/geqo/geqo_eval.c</FileName>, routine
<Function>gimme_tree</Function> is recursively called.
Perhaps something is not freed correctly, but I do not know what.
Of course the <StructName>rel</StructName> data structure of the <Command>join</Command> keeps growing and
growing the more relations are packed into it.
Suggestions are welcome :-(

<Sect2>
<Title>Further Improvements</Title>

<Para>
Enable bushy query tree processing within <ProductName>Postgres</ProductName>;
that may improve the quality of query plans.

</ProgramListing>
<BIBLIOGRAPHY>
<TITLE>
References
</TITLE>
<PARA>Reference information for <Acronym>GEQO</Acronym> algorithms.
</PARA>
<BIBLIOENTRY>

<BOOKBIBLIO>
<TITLE>
The Hitch-Hiker's Guide to Evolutionary Computation
</TITLE>
<AUTHORGROUP>
<AUTHOR>
<FIRSTNAME>Jörg</FIRSTNAME>
<SURNAME>Heitkötter</SURNAME>
</AUTHOR>
<AUTHOR>
<FIRSTNAME>David</FIRSTNAME>
<SURNAME>Beasley</SURNAME>
</AUTHOR>
</AUTHORGROUP>
<PUBLISHER>
<PUBLISHERNAME>
Internet resource
</PUBLISHERNAME>
</PUBLISHER>
<ABSTRACT>
<Para>
FAQ in <ULink url="news://comp.ai.genetic">comp.ai.genetic</ULink>
is available at <ULink url="ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html">Encore</ULink>.
</Para>
</ABSTRACT>
</BOOKBIBLIO>

<BOOKBIBLIO>
<TITLE>
The Design and Implementation of the Postgres Query Optimizer
</TITLE>
<AUTHORGROUP>
<AUTHOR>
<FIRSTNAME>Z.</FIRSTNAME>
<SURNAME>Fong</SURNAME>
</AUTHOR>
</AUTHORGROUP>
<PUBLISHER>
<PUBLISHERNAME>
University of California, Berkeley Computer Science Department
</PUBLISHERNAME>
</PUBLISHER>
<ABSTRACT>
<Para>
File <FileName>planner/Report.ps</FileName> in the 'postgres-papers' distribution.
</Para>
</ABSTRACT>
</BOOKBIBLIO>

<BOOKBIBLIO>
<TITLE>
Fundamentals of Database Systems
</TITLE>
<AUTHORGROUP>
<AUTHOR>
<FIRSTNAME>R.</FIRSTNAME>
<SURNAME>Elmasri</SURNAME>
</AUTHOR>
<AUTHOR>
<FIRSTNAME>S.</FIRSTNAME>
<SURNAME>Navathe</SURNAME>
</AUTHOR>
</AUTHORGROUP>
<PUBLISHER>
<PUBLISHERNAME>
The Benjamin/Cummings Pub., Inc.
</PUBLISHERNAME>
</PUBLISHER>
</BOOKBIBLIO>

</BIBLIOENTRY>
</BIBLIOGRAPHY>

</Chapter>