postgresql/src/include
Tom Lane 20a8595043 Fix misestimation of n_distinct for a nearly-unique column with many nulls.
If ANALYZE found no repeated non-null entries in its sample, it set the
column's stadistinct value to -1.0, intending to indicate that the entries
are all distinct.  But what this value actually means is that the number
of distinct values is 100% of the table's rowcount, and thus it was
overestimating the number of distinct values by however many nulls there
are.  This could lead to very poor selectivity estimates, as for example
in a recent report from Andreas Joseph Krogh.  We should discount the
stadistinct value by whatever we've estimated the nulls fraction to be.
(That is what will happen if we choose to use a negative stadistinct for
a column that does have repeated entries, so this code path was just
inconsistent.)
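
As a rough illustration of the arithmetic described above (a sketch, not the code from this commit): the helper names below are hypothetical, and the only facts assumed are the ones stated in the message, namely that a negative stadistinct is read as a fraction of the table's rowcount and that the "all distinct" value should be discounted by the estimated nulls fraction.

```c
#include <assert.h>

/*
 * Hypothetical helper (name is an assumption): the stadistinct value to
 * store when the sample contained no repeated non-null entries.
 * The old behavior corresponds to returning -1.0 unconditionally.
 */
static double
all_distinct_stadistinct(double nullfrac)
{
    assert(nullfrac >= 0.0 && nullfrac <= 1.0);
    return -1.0 * (1.0 - nullfrac);
}

/*
 * Hypothetical helper (name is an assumption): convert a stored
 * stadistinct into an absolute number of distinct values.
 */
static double
distinct_count(double stadistinct, double ntuples)
{
    if (stadistinct >= 0.0)
        return stadistinct;         /* non-negative: an absolute count */
    return -stadistinct * ntuples;  /* negative: fraction of the rowcount */
}
```

For a table of 1,000,000 rows whose column is 90% null, the old value of -1.0 reads as 1,000,000 distinct values, while the discounted value of about -0.1 reads as roughly 100,000, which matches the number of non-null rows.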

In addition to fixing the stadistinct entries stored by several different
ANALYZE code paths, adjust the logic where get_variable_numdistinct()
forces an "all distinct" estimate on the basis of finding a relevant unique
index.  Unique indexes don't reject nulls, so there's no reason to assume
that the null fraction doesn't apply.
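
The unique-index case can be sketched the same way. This is a simplified stand-in, not the actual body of get_variable_numdistinct(); the function name, its parameters, and the has_unique_index flag are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified estimate of the number of distinct values.
 * When a relevant unique index exists, the non-null entries are assumed
 * to be all distinct, but the nulls are still discounted because a
 * unique index does not reject them.
 */
static double
numdistinct_estimate(bool has_unique_index, double stadistinct,
                     double nullfrac, double ntuples)
{
    assert(nullfrac >= 0.0 && nullfrac <= 1.0);

    if (has_unique_index)
        stadistinct = -1.0 * (1.0 - nullfrac);  /* previously a flat -1.0 */

    if (stadistinct < 0.0)
        return -stadistinct * ntuples;  /* fraction of the rowcount */
    return stadistinct;                 /* absolute count */
}
```

With a unique index on a 1,000-row table where half the entries are null, this sketch yields about 500 distinct values instead of 1,000.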

Back-patch to all supported branches.  Back-patching is a bit of a judgment
call, but this problem seems to affect only a few users (else we'd have
identified it long ago), and it's bad enough when it does happen that
destabilizing plan choices in a worse direction seems unlikely.

Patch by me, with documentation wording suggested by Dean Rasheed.

Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena>
Discussion: <16143.1470350371@sss.pgh.pa.us>
2016-08-07 18:52:02 -04:00
access Fix handling of multixacts predating pg_upgrade 2016-06-24 18:29:28 -04:00
bootstrap Fix off-by-one loop count in MapArrayTypeName, and get rid of static array. 2014-12-16 15:35:40 -05:00
catalog Fix misestimation of n_distinct for a nearly-unique column with many nulls. 2016-08-07 18:52:02 -04:00
commands Rework internals of changing a type's ownership 2015-12-21 19:49:15 -03:00
common Add pg_string_endswith as the start of a string helper library in src/common. 2015-01-03 20:54:13 +01:00
datatype Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
executor Fix latent crash in do_text_output_multiline(). 2016-05-23 14:16:41 -04:00
foreign Improve updatability checking for views and foreign tables. 2013-06-12 17:53:33 -04:00
lib Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
libpq Fix incorrect order of lock file removal and failure to close() sockets. 2015-08-02 14:54:44 -04:00
mb Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
nodes Fix assorted fallout from IS [NOT] NULL patch. 2016-07-28 16:09:15 -04:00
optimizer Fix mishandling of equivalence-class tests in parameterized plans. 2016-04-29 20:19:38 -04:00
parser Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
port Improve TranslateSocketError() to handle more Windows error codes. 2016-04-21 16:59:08 -04:00
portability Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
postmaster Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
regex Suppress compiler warnings about useless comparison of unsigned to zero. 2016-02-15 17:11:52 -05:00
replication Make SyncRepWakeQueue to a static function 2015-03-26 10:39:18 +09:00
rewrite Avoid getting more than AccessShareLock when deparsing a query. 2014-03-06 19:31:09 -05:00
snowball Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
storage Fix --disable-spinlocks in 9.2 and 9.3 branches. 2016-04-18 13:19:52 -04:00
tcop Be more careful to not lose sync in the FE/BE protocol. 2015-02-02 17:09:40 +02:00
tsearch Predict integer overflow to avoid buffer overruns. 2014-02-17 09:33:32 -05:00
utils Fix GiST index build for NaN values in geometric types. 2016-07-14 18:46:00 -04:00
.gitignore Autoconfiscate selection of 64-bit int type for 64-bit large object API. 2012-10-07 21:52:43 -04:00
c.h Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
fmgr.h Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
funcapi.h Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
getaddrinfo.h Fix assorted issues in client host name lookup. 2014-04-02 17:11:27 -04:00
getopt_long.h Update copyrights for 2013 2013-01-01 17:15:01 -05:00
Makefile Install headers from the new src/include/common subdirectory. 2013-02-26 15:27:30 -05:00
miscadmin.h Perform an immediate shutdown if the postmaster.pid file is removed. 2015-10-06 17:15:27 -04:00
pg_config_ext.h.in Autoconfiscate selection of 64-bit int type for 64-bit large object API. 2012-10-07 21:52:43 -04:00
pg_config_ext.h.win32 Autoconfiscate selection of 64-bit int type for 64-bit large object API. 2012-10-07 21:52:43 -04:00
pg_config_manual.h Further reduce the number of semaphores used under --disable-spinlocks. 2016-04-18 13:33:07 -04:00
pg_config.h.in Cope if platform declares mbstowcs_l(), but not locale_t, in <xlocale.h>. 2016-03-15 13:19:58 -04:00
pg_config.h.win32 Stamp 9.3.13. 2016-05-09 16:53:56 -04:00
pg_trace.h Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pgstat.h Don't reset changes_since_analyze after a selective-columns ANALYZE. 2016-06-06 17:44:17 -04:00
pgtar.h Adopt the GNU convention for handling tar-archive members exceeding 8GB. 2015-11-21 20:21:32 -05:00
pgtime.h Support timezone abbreviations that sometimes change. 2014-10-16 15:22:17 -04:00
port.h Revert error-throwing wrappers for the printf family of functions. 2015-05-19 18:16:58 -04:00
postgres_ext.h Remove tabs after spaces in C comments 2014-05-06 11:26:28 -04:00
postgres_fe.h Create libpgcommon, and move pg_malloc et al to it 2013-02-12 11:21:05 -03:00
postgres.h Adjust DatumGetBool macro, this time for sure. 2016-04-28 11:51:17 -04:00
rusagestub.h Update copyrights for 2013 2013-01-01 17:15:01 -05:00
windowapi.h Update copyrights for 2013 2013-01-01 17:15:01 -05:00