diff --git a/doc/src/sgml/geqo.sgml b/doc/src/sgml/geqo.sgml
index 725504c28f..61abf13ca4 100644
--- a/doc/src/sgml/geqo.sgml
+++ b/doc/src/sgml/geqo.sgml
@@ -3,78 +3,103 @@
Martin
Utesch
+
+
+University of Mining and Technology
+
+
+Institute of Automatic Control
+
+
+
+Freiberg
+
+
+Germany
+
+
+
+1997-10-02
Genetic Query Optimization in Database Systems
-
-Martin Utesch
+
+Author
+
+Written by Martin Utesch
+for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
+
+
- Institute of Automatic Control
- University of Mining and Technology
- Freiberg, Germany
-
- 02/10/1997
-
-
-1.) Query Handling as a Complex Optimization Problem
-====================================================
+
+Query Handling as a Complex Optimization Problem
+
Among all relational operators the most difficult one to process and
-optimize is the JOIN. The number of alternative plans to answer a query
-grows exponentially with the number of JOINs included in it. Further
-optimization effort is caused by the support of a variety of *JOIN
-methods* (e.g., nested loop, index scan, merge join in Postgres) to
-process individual JOINs and a diversity of *indices* (e.g., r-tree,
-b-tree, hash in Postgres) as access paths for relations.
+optimize is the join. The number of alternative plans to answer a query
+grows exponentially with the number of joins included in it. Further
+optimization effort is caused by the support of a variety of join methods
+ (e.g., nested loop, index scan, merge join in Postgres) to
+process individual joins and a diversity of indices (e.g., r-tree,
+b-tree, hash in Postgres) as access paths for relations.
- The current Postgres optimizer implementation performs a *near-
-exhaustive search* over the space of alternative strategies. This query
+
+ The current Postgres optimizer implementation performs a
+near-exhaustive search over the space of alternative strategies. This query
optimization technique is inadequate to support database application
domains that involve the need for extensive queries, such as artificial
intelligence.
+
The Institute of Automatic Control at the University of Mining and
Technology, in Freiberg, Germany, encountered the described problems as its
-folks wanted to take the Postgres DBMS as the backend for a decision
+staff wanted to use the Postgres DBMS as the backend for a decision
support knowledge based system for the maintenance of an electrical
-power grid. The DBMS needed to handle large JOIN queries for the
+power grid. The DBMS needed to handle large join queries for the
inference machine of the knowledge based system.
+
Performance difficulties within exploring the space of possible query
plans arose the demand for a new optimization technique being developed.
- In the following we propose the implementation of a *Genetic
-Algorithm* as an option for the database query optimization problem.
+
+ In the following we propose the implementation of a Genetic Algorithm
+ as an option for the database query optimization problem.
-2.) Genetic Algorithms (GA)
-===========================
+
+Genetic Algorithms (GA)
- The GA is a heuristic optimization method which operates through
+
+ The GA is a heuristic optimization method which operates through
determined, randomized search. The set of possible solutions for the
-optimization problem is considered as a *population* of *individuals*.
+optimization problem is considered as a population of individuals.
The degree of adaption of an individual to its environment is specified
-by its *fitness*.
+by its fitness.
+
The coordinates of an individual in the search space are represented
-by *chromosomes*, in essence a set of character strings. A *gene* is a
+by chromosomes, in essence a set of character strings. A gene is a
subsection of a chromosome which encodes the value of a single parameter
-being optimized. Typical encodings for a gene could be *binary* or
-*integer*.
+being optimized. Typical encodings for a gene could be binary or
+integer.
- Through simulation of the evolutionary operations *recombination*,
-*mutation*, and *selection* new generations of search points are found
+
+ Through simulation of the evolutionary operations recombination,
+mutation, and selection, new generations of search points are found
that show a higher average fitness than their ancestors.
- According to the "comp.ai.genetic" FAQ it cannot be stressed too
-strongly that a GA is not a pure random search for a solution to a
-problem. A GA uses stochastic processes, but the result is distinctly
+
+ According to the "comp.ai.genetic" FAQ it cannot be stressed too
+strongly that a GA is not a pure random search for a solution to a
+problem. A GA uses stochastic processes, but the result is distinctly
non-random (better than random).
-Structured Diagram of a GA:
+
+Structured Diagram of a GA:
---------------------------
P(t) generation of ancestors at a time t
@@ -101,128 +126,233 @@ P''(t) generation of descendants at a time t
| +-------------------------------------+
| | t := t + 1 |
+===+=====================================+
+
+
+Genetic Query Optimization (GEQO) in Postgres
-3.) Genetic Query Optimization (GEQO) in PostgreSQL
-===================================================
-
- The GEQO module is intended for the solution of the query
-optimization problem similar to a traveling salesman problem (TSP).
+
+ The GEQO module is intended for the solution of the query
+optimization problem similar to a traveling salesman problem (TSP).
Possible query plans are encoded as integer strings. Each string
-represents the JOIN order from one relation of the query to the next.
-E. g., the query tree /\
- /\ 2
- /\ 3
- 4 1 is encoded by the integer string '4-1-3-2',
+represents the join order from one relation of the query to the next.
+For example, the query tree
+
+    /\
+   /\ 2
+  /\ 3
+ 4  1
+
+is encoded by the integer string '4-1-3-2',
which means, first join relation '4' and '1', then '3', and
-then '2', where 1, 2, 3, 4 are relids in PostgreSQL.
+then '2', where 1, 2, 3, 4 are relids in Postgres.
- Parts of the GEQO module are adapted from D. Whitley's Genitor
+
+ Parts of the GEQO module are adapted from D. Whitley's Genitor
algorithm.
- Specific characteristics of the GEQO implementation in PostgreSQL
+
+ Specific characteristics of the GEQO implementation in Postgres
are:
-o usage of a *steady state* GA (replacement of the least fit
+
+
+
+Usage of a steady state GA (replacement of the least fit
individuals in a population, not whole-generational replacement)
allows fast convergence towards improved query plans. This is
essential for query handling with reasonable time;
+
+
-o usage of *edge recombination crossover* which is especially suited
- to keep edge losses low for the solution of the TSP by means of a GA;
+
+
+Usage of edge recombination crossover which is especially suited
+ to keep edge losses low for the solution of the TSP by means of a GA;
+
+
-o mutation as genetic operator is deprecated so that no repair
- mechanisms are needed to generate legal TSP tours.
+
+
+Mutation as a genetic operator is deprecated so that no repair
+ mechanisms are needed to generate legal TSP tours.
+
+
+
- The GEQO module gives the following benefits to the PostgreSQL DBMS
-compared to the Postgres query optimizer implementation:
+
+ The GEQO module gives the following benefits to the Postgres DBMS
+compared to the Postgres query optimizer implementation:
-o handling of large JOIN queries through non-exhaustive search;
+
+
+
+Handling of large join queries through non-exhaustive search;
+
+
-o improved cost size approximation of query plans since no longer
- plan merging is needed (the GEQO module evaluates the cost for a
+
+
+Improved cost size approximation of query plans since plan merging is
+ no longer needed (the GEQO module evaluates the cost for a
query plan as an individual).
+
+
+
+
-References
-==========
+
+Future Implementation Tasks for Postgres GEQO
-J. Heitk"otter, D. Beasley:
----------------------------
- "The Hitch-Hicker's Guide to Evolutionary Computation",
- FAQ in 'comp.ai.genetic',
- 'ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html'
+
+Basic Improvements
-Z. Fong:
---------
- "The Design and Implementation of the Postgres Query Optimizer",
- file 'planner/Report.ps' in the 'postgres-papers' distribution
+
+Improve freeing of memory when a query has already been processed
-R. Elmasri, S. Navathe:
------------------------
- "Fundamentals of Database Systems",
- The Benjamin/Cummings Pub., Inc.
+
+With large join queries the computing time spent for the genetic query
+optimization seems to be a mere fraction of the time
+ Postgres
+needs for freeing memory via routine MemoryContextFree,
+file backend/utils/mmgr/mcxt.c.
+Debugging showed that it gets stuck in a loop of routine
+OrderedElemPop, file backend/utils/mmgr/oset.c.
+The same problems arise with long queries when using the normal
+Postgres query optimization algorithm.
+
+Improve genetic algorithm parameter settings
-=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
-* Things left to done for the PostgreSQL *
-= Genetic Query Optimization (GEQO) =
-* module implementation *
-=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
-* Martin Utesch * Institute of Automatic Control *
-= = University of Mining and Technology =
-* utesch@aut.tu-freiberg.de * Freiberg, Germany *
-=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
-
-
-1.) Basic Improvements
-===============================================================
-
-a) improve freeing of memory when query is already processed:
--------------------------------------------------------------
-with large JOIN queries the computing time spent for the genetic query
-optimization seems to be a mere *fraction* of the time Postgres
-needs for freeing memory via routine 'MemoryContextFree',
-file 'backend/utils/mmgr/mcxt.c';
-debugging showed that it get stucked in a loop of routine
-'OrderedElemPop', file 'backend/utils/mmgr/oset.c';
-the same problems arise with long queries when using the normal
-Postgres query optimization algorithm;
-
-b) improve genetic algorithm parameter settings:
-------------------------------------------------
-file 'backend/optimizer/geqo/geqo_params.c', routines
-'gimme_pool_size' and 'gimme_number_generations';
+
+In file backend/optimizer/geqo/geqo_params.c, routines
+gimme_pool_size and gimme_number_generations,
we have to find a compromise for the parameter settings
to satisfy two competing demands:
-1. optimality of the query plan
-2. computing time
+
+
+
+Optimality of the query plan
+
+
+
+
+Computing time
+
+
+
-c) find better solution for integer overflow:
----------------------------------------------
-file 'backend/optimizer/geqo/geqo_eval.c', routine
-'geqo_joinrel_size';
-the present hack for MAXINT overflow is to set the Postgres integer
-value of 'rel->size' to its logarithm;
-modifications of 'struct Rel' in 'backend/nodes/relation.h' will
-surely have severe impacts on the whole PostgreSQL implementation.
+
+Find better solution for integer overflow
-d) find solution for exhausted memory:
---------------------------------------
-that may occur with more than 10 relations involved in a query,
-file 'backend/optimizer/geqo/geqo_eval.c', routine
-'gimme_tree' which is recursively called;
-maybe I forgot something to be freed correctly, but I dunno what;
-of course the 'rel' data structure of the JOIN keeps growing and
-growing the more relations are packed into it;
-suggestions are welcome :-(
+
+In file backend/optimizer/geqo/geqo_eval.c, routine
+geqo_joinrel_size,
+the present hack for MAXINT overflow is to set the Postgres integer
+value of rel->size to its logarithm.
+Modifications of Rel in backend/nodes/relation.h will
+surely have severe impacts on the whole Postgres implementation.
+
+
+Find solution for exhausted memory
+
+
+Memory exhaustion may occur with more than 10 relations involved in a query.
+In file backend/optimizer/geqo/geqo_eval.c, routine
+gimme_tree is recursively called.
+Maybe I forgot to free something correctly, but I do not know what.
+Of course the rel data structure of the join keeps growing and
+growing the more relations are packed into it.
+Suggestions are welcome :-(
-2.) Further Improvements
-===============================================================
-Enable bushy query tree processing within PostgreSQL;
+
+Further Improvements
+
+
+Enable bushy query tree processing within Postgres;
that may improve the quality of query plans.
-
+
+
+References
+
+Reference information for GEQO algorithms.
+
+
+
+
+
+The Hitch-Hiker's Guide to Evolutionary Computation
+
+
+
+Jörg
+Heitkötter
+
+
+David
+Beasley
+
+
+
+
+Internet resource
+
+
+
+
+FAQ in comp.ai.genetic
+is available at Encore.
+
+
+
+
+
+The Design and Implementation of the Postgres Query Optimizer
+
+
+
+Z.
+Fong
+
+
+
+
+University of California, Berkeley Computer Science Department
+
+
+
+
+File planner/Report.ps in the 'postgres-papers' distribution.
+
+
+
+
+
+
+Fundamentals of Database Systems
+
+
+
+R.
+Elmasri
+
+
+S.
+Navathe
+
+
+
+
+The Benjamin/Cummings Pub., Inc.
+
+
+
+
+
+
+