postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2025-02-17 19:30:00 +08:00

Andres Freund b30d3ea824 Add a macro templatized hashtable.		2016-10-14 16:07:38 -07:00

dynahash.c hash tables aren't quite fast enough for some use-cases. There are several reasons for the lacking performance:

- the use of chaining for collision handling makes them cache inefficient, which is especially an issue when the tables get bigger
- as the element sizes for dynahash are only determined at runtime, offset computations are somewhat expensive
- hash and element comparisons are indirect function calls, causing unnecessary pipeline stalls
- its two-level structure has some benefits (somewhat natural partitioning), but increases the number of indirections

To fix several of these, the hash tables have to be adjusted to the individual use-case at compile time. C unfortunately doesn't provide a good way to do compile-time code generation (like e.g. C++'s templates, for all their weaknesses, do). Thus the somewhat ugly approach taken here is to allow for code generation using a macro-templatized header file, which generates functions and types based on a prefix and other parameters.

Later patches use this infrastructure to use such hash tables for tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation, ...). In queries where these use up a large fraction of the time, this has been measured to lead to performance improvements of over 100%. There are other cases where this could be useful (e.g. catcache.c).

The hash table design chosen is a variant of linear open-addressing. The biggest disadvantages of simple linear addressing schemes are highly variable lookup times due to clustering, and deletions leaving a lot of tombstones around. To address these issues a variant of "robin hood" hashing is employed. Robin hood hashing optimizes chaining lengths by moving elements that are close to their optimal bucket ("rich" elements) out of the way if a to-be-inserted element is further away from its optimal position (i.e. it's "poor"). While that can make insertions slower, the average lookup performance is a lot better, and higher fill factors can be used in a still performant manner. To avoid tombstones - which normally solve the issue that a deleted node's presence is relevant to determine whether a lookup needs to continue looking or is done - buckets following a deleted element are shifted backwards, unless they're empty or already at their optimal position.

There are further possible improvements that can be made to this implementation. Amongst others:

- Use distance as a termination criterion during searches. This is generally a good idea, but I've been able to see the overhead of distance calculations in some cases.
- Consider combining the 'empty' status into the hashvalue, and enforce storing the hashvalue. That could, in some cases, increase memory density and remove a few instructions.
- Experiment further with the, very conservatively chosen, fillfactor.
- Make the maximum size of the hashtable configurable, to allow storing very very large tables. That'd require 64bit hash values to be more common than now, though.
- Some smaller memcpy calls could be optimized to copy larger chunks.

But since the new implementation is already considerably faster than dynahash it seems sensible to start using it.

Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
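
The robin hood insertion and tombstone-free deletion described in the commit message can be illustrated with a small standalone sketch. This is not the code added by the commit: the rh_* names, the fixed 16-bucket table, the integer mixing function and the swap-based insertion loop are all assumptions chosen for brevity. As in the commit message, lookups here stop at the first empty bucket rather than using distance as a termination criterion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RH_SIZE 16u             /* hypothetical bucket count, power of two */
#define RH_MASK (RH_SIZE - 1u)

typedef struct { bool used; uint32_t key; int value; } rh_entry;
typedef struct { rh_entry slot[RH_SIZE]; uint32_t members; } rh_table;

/* Illustrative integer mixer, not PostgreSQL's hash function. */
static uint32_t rh_hash(uint32_t key)
{
    key ^= key >> 16;
    key *= 0x85ebca6bu;
    key ^= key >> 13;
    return key;
}

/* Bucket a key would occupy if there were no collisions. */
static uint32_t rh_optimal(uint32_t key) { return rh_hash(key) & RH_MASK; }

/* How far bucket pos is from the optimal bucket, with wraparound. */
static uint32_t rh_distance(uint32_t optimal, uint32_t pos)
{
    return (pos - optimal) & RH_MASK;
}

/* Linear probe from the optimal bucket until the key or an empty bucket. */
static rh_entry *rh_lookup(rh_table *t, uint32_t key)
{
    uint32_t pos;

    assert(t != NULL);
    pos = rh_optimal(key);
    while (t->slot[pos].used)
    {
        if (t->slot[pos].key == key)
            return &t->slot[pos];
        pos = (pos + 1) & RH_MASK;
    }
    return NULL;
}

/* Insert or update; returns false when the table is considered full. */
static bool rh_insert(rh_table *t, uint32_t key, int value)
{
    rh_entry *found = rh_lookup(t, key);
    rh_entry cur;
    uint32_t pos, dist = 0;

    if (found) { found->value = value; return true; }
    if (t->members >= RH_SIZE - 1)  /* keep one bucket empty for lookups */
        return false;

    cur.used = true; cur.key = key; cur.value = value;
    pos = rh_optimal(key);
    while (t->slot[pos].used)
    {
        uint32_t resident = rh_distance(rh_optimal(t->slot[pos].key), pos);

        if (resident < dist)
        {
            /* Resident is "richer": it gives up its bucket and moves on. */
            rh_entry tmp = t->slot[pos];
            t->slot[pos] = cur;
            cur = tmp;
            dist = resident;
        }
        pos = (pos + 1) & RH_MASK;
        dist++;
    }
    t->slot[pos] = cur;
    t->members++;
    return true;
}

/* Delete without tombstones by shifting following buckets backwards. */
static bool rh_delete(rh_table *t, uint32_t key)
{
    rh_entry *found = rh_lookup(t, key);
    uint32_t pos, next;

    if (!found)
        return false;
    pos = (uint32_t) (found - t->slot);
    for (;;)
    {
        next = (pos + 1) & RH_MASK;
        /* Stop at an empty bucket or one already at its optimal position. */
        if (!t->slot[next].used ||
            rh_distance(rh_optimal(t->slot[next].key), next) == 0)
            break;
        t->slot[pos] = t->slot[next];
        pos = next;
    }
    t->slot[pos].used = false;
    t->members--;
    return true;
}
```

A table is ready for use once it has been zeroed, for example with memset(&t, 0, sizeof(t)). A fixed-size table keeps the sketch short; growing the table and tuning the fill factor are left out.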
config	Fix python shlib probe for Cygwin.	2016-10-07 11:27:34 -04:00
contrib	Fix further hash table order dependent tests.	2016-10-12 18:31:45 -07:00
doc	Fix typo.	2016-10-14 09:07:33 +09:00
src	Add a macro templatized hashtable.	2016-10-14 16:07:38 -07:00
.dir-locals.el	emacs: Set indent-tabs-mode in perl-mode	2015-04-12 23:53:23 -04:00
.gitattributes	Fix whitespace and remove obsolete gitattributes entry	2016-03-13 16:03:13 -04:00
.gitignore	Allow .so minor version numbers above 9 in .gitignore.	2016-08-15 17:35:35 -04:00
aclocal.m4	Replace our hacked version of ax_pthread.m4 with latest upstream version.	2015-07-08 20:36:06 +03:00
configure	Remove "sco" and "unixware" ports.	2016-10-11 11:26:04 -04:00
configure.in	Remove "sco" and "unixware" ports.	2016-10-11 11:26:04 -04:00
COPYRIGHT	Update copyright for 2016	2016-01-02 13:33:40 -05:00
GNUmakefile.in	Have "make coverage" recurse into contrib as well	2016-09-05 18:44:36 -03:00
HISTORY	Improve text of stub HISTORY file.	2014-02-12 18:16:17 -05:00
Makefile	Allow make check in PL directories	2011-02-15 06:52:12 +02:00
README	Don't generate plain-text HISTORY and src/test/regress/README anymore.	2014-02-10 20:48:04 -05:00
README.git	Don't generate plain-text HISTORY and src/test/regress/README anymore.	2014-02-10 20:48:04 -05:00

README

PostgreSQL Database Management System
=====================================

This directory contains the source code distribution of the PostgreSQL
database management system.

PostgreSQL is an advanced object-relational database management system
that supports an extended subset of the SQL standard, including
transactions, foreign keys, subqueries, triggers, user-defined types
and functions.  This distribution also contains C language bindings.

PostgreSQL has many language interfaces, many of which are listed here:

	http://www.postgresql.org/download

See the file INSTALL for instructions on how to build and install
PostgreSQL.  That file also lists supported operating systems and
hardware platforms and contains information regarding any other
software packages that are required to build or run the PostgreSQL
system.  Copyright and license information can be found in the
file COPYRIGHT.  A comprehensive documentation set is included in this
distribution; it can be read as described in the installation
instructions.

The latest version of this software may be obtained at
http://www.postgresql.org/download/.  For more information look at our
web site located at http://www.postgresql.org/.