Fix thinko in estimate_num_groups

The code for the reworked n-distinct estimation on commit 7b504eb282 was
written differently in a previous version of the patch, prior to commit;
on rewriting it, we missed updating an initializer.  This caused the
code to (mistakenly) apply a fudge factor even in the case where a
single value is applied, leading to incorrect results.

This means that the 'relvarcount' variable name is now wrong.  Add a
comment to try and make the situation clearer, and remove an incorrect
comment I added.

Problem noticed, and code patch, by Tomas Vondra.  Additional commentary
by Álvaro.
This commit is contained in:
Alvaro Herrera 2017-03-27 12:52:50 -03:00
parent 827d6f9779
commit 1f171a1803

View File

@ -3404,7 +3404,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
RelOptInfo *rel = varinfo1->rel;
double reldistinct = 1;
double relmaxndistinct = reldistinct;
int relvarcount = 1;
int relvarcount = 0;
List *newvarinfos = NIL;
List *relvarinfos = NIL;
@ -3436,6 +3436,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
* we multiply them together. Any remaining relvarinfos after
* no more multivariate matches are found are assumed independent too,
* so their individual ndistinct estimates are multiplied also.
*
* While iterating, count how many separate numdistinct values we
* apply. We apply a fudge factor below, but only if we multiplied
* more than one such values.
*/
while (relvarinfos)
{
@ -3447,7 +3451,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
reldistinct *= mvndistinct;
if (relmaxndistinct < mvndistinct)
relmaxndistinct = mvndistinct;
relvarcount++; /* inaccurate, but doesn't matter */
relvarcount++;
}
else
{