When estimating the number of groups using extended statistics, the code
was discarding information about system attributes. This led to strange
situation that
SELECT 1 FROM t GROUP BY ctid;
could have produced higher estimate (equal to pg_class.reltuples) than
SELECT 1 FROM t GROUP BY a, b, ctid;
with extended statistics on (a,b). Fixed by retaining information about
the system attribute.
Backpatch all the way to 10, where extended statistics were introduced.
Author: Tomas Vondra
Backpatch-through: 10
Membership consists, implicitly, of the current database owner. Expect
use in template databases. Once pg_database_owner has rights within a
template, each owner of a database instantiated from that template will
exercise those rights.
Reviewed by John Naylor.
Discussion: https://postgr.es/m/20201228043148.GA1053024@rfd.leadboat.com
The deserialization failed to ensure correct alignment, as it assumed it
can simply point into the serialized value. The serialization however
ignores alignment and copies just the significant bytes in order to make
the result as small as possible. This caused failures on systems that
are sensitive to mialigned addresses, like sparc, or with address
sanitizer enabled.
Fixed by copying the serialized data to ensure proper alignment. While
at it, fix an issue with serialization on big endian machines, using the
same store_att_byval/fetch_att trick as extended statistics.
Discussion: https://postgr.es/0c8c3304-d3dd-5e29-d5ac-b50589a23c8c%40enterprisedb.com
Adds BRIN opclasses similar to the existing minmax, except that instead
of summarizing the page range into a single [min,max] range, the summary
consists of multiple ranges and/or points, allowing gaps. This allows
more efficient handling of data with poor correlation to physical
location within the table and/or outlier values, for which the regular
minmax opclassed tend to work poorly.
It's possible to specify the number of values kept for each page range,
either as a single point or an interval boundary.
CREATE TABLE t (a int);
CREATE INDEX ON t
USING brin (a int4_minmax_multi_ops(values_per_range=16));
When building the summary, the values are combined into intervals with
the goal to minimize the "covering" (sum of interval lengths), using a
support procedure computing distance between two values.
Bump catversion, due to various catalog changes.
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com
Adds a BRIN opclass using a Bloom filter to summarize the range. Indexes
using the new opclasses allow only equality queries (similar to hash
indexes), but that works fine for data like UUID, MAC addresses etc. for
which range queries are not very common. This also means the indexes
work for data that is not well correlated to physical location within
the table, or perhaps even entirely random (which is a common issue with
existing BRIN minmax opclasses).
It's possible to specify opclass parameters with the usual Bloom filter
parameters, i.e. the desired false-positive rate and the expected number
of distinct values per page range.
CREATE TABLE t (a int);
CREATE INDEX ON t
USING brin (a int4_bloom_ops(false_positive_rate = 0.05,
n_distinct_per_range = 100));
The opclasses do not operate on the indexed values directly, but compute
a 32-bit hash first, and the Bloom filter is built on the hash value.
Collisions should not be a huge issue though, as the number of distinct
values in a page ranges is usually fairly small.
Bump catversion, due to various catalog changes.
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru>
Reviewed-by: Nico Williams <nico@cryptonector.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com
Commit a1c649d889 changed the signature of the BRIN consistent function
by adding a new required parameter. Treating the parameter as optional,
which would make the change backwards incompatibile, was rejected with
the justification that there are few out-of-core extensions, so it's not
worth adding making the code more complex, and it's better to deal with
that in the extension.
But after further thought, that would be rather problematic, because
pg_upgrade simply dumps catalog contents and the same version of an
extension needs to work on both PostgreSQL versions. Supporting both
variants of the consistent function (with 3 or 4 arguments) makes that
possible.
The signature is not the only thing that changed, as commit 72ccf55cb9
moved handling of IS [NOT] NULL keys from the support procedures. But
this change is backward compatible - handling the keys in exension is
unnecessary, but harmless. The consistent function will do a bit of
unnecessary work, but it should be very cheap.
This also undoes most of the changes to the existing opclasses (minmax
and inclusion), making them use the old signature again. This should
make backpatching simpler.
Catversion bump, because of changes in pg_amproc.
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
The BRIN minmax opclasses included amproc entries with mismatching left
and right types, but those happen to be unnecessary. The opclasses only
need cross-type operators, not cross-type support procedures. Discovered
when trying to define equivalent BRIN operator families in an extension.
Catversion bump, because of pg_amproc changes.
Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/78c357ab-3395-8433-e7b3-b2cfcc9fdc23%40enterprisedb.com
Allow a partition be detached from its partitioned table without
blocking concurrent queries, by running in two transactions and only
requiring ShareUpdateExclusive in the partitioned table.
Because it runs in two transactions, it cannot be used in a transaction
block. This is the main reason to use dedicated syntax: so that users
can choose to use the original mode if they need it. But also, it
doesn't work when a default partition exists (because an exclusive lock
would still need to be obtained on it, in order to change its partition
constraint.)
In case the second transaction is cancelled or a crash occurs, there's
ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final
steps.
The main trick to make this work is the addition of column
pg_inherits.inhdetachpending, initially false; can only be set true in
the first part of this command. Once that is committed, concurrent
transactions that use a PartitionDirectory will include or ignore
partitions so marked: in optimizer they are ignored if the row is marked
committed for the snapshot; in executor they are always included. As a
result, and because of the way PartitionDirectory caches partition
descriptors, queries that were planned before the detach will see the
rows in the detached partition and queries that are planned after the
detach, won't.
A CHECK constraint is created that duplicates the partition constraint.
This is probably not strictly necessary, and some users will prefer to
remove it afterwards, but if the partition is re-attached to a
partitioned table, the constraint needn't be rechecked.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
On partition detach, we acquire a SHARE lock on all tables that
reference the partitioned table that we're detaching a partition from,
but failed to document this fact. My oversight in commit f56f8f8da6.
Repair. Backpatch to 12.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20210325180244.GA12738@alvherre.pgsql
Struct AlteredRelationInfo gains a new Relation member, to be used only
by Phase 2 (ATRewriteCatalogs); this allows ATExecCmd() subroutines open
and close the relation internally.
A future commit will use this facility to implement an ALTER TABLE
subcommand that closes and reopens the relation across transaction
boundaries.
(It is possible to keep the relation open past phase 2 to be used by
phase 3 instead of having to reopen it that point, but there are some
minor complications with that; it's not clear that there is much to be
won from doing that, though.)
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
I introduced this duplicate code in commit 8b08f7d482 for no good
reason. Remove it, and backpatch to 11 where it was introduced.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Previously the WAL replay of COMMIT_TS_SETTS record called
TransactionTreeSetCommitTsData() with the argument write_xlog=true,
which generated and wrote new COMMIT_TS_SETTS record.
This should not be acceptable because it's during recovery.
This commit fixes the WAL replay of COMMIT_TS_SETTS record
so that it calls TransactionTreeSetCommitTsData() with write_xlog=false
and doesn't generate new WAL during recovery.
Back-patch to all supported branches.
Reported-by: lx zou <zoulx1982@163.com>
Author: Fujii Masao
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/16931-620d0f2fdc6108f1@postgresql.org
Previously when an archive recovery or a standby was starting and
reached the consistent recovery state but hot_standby was configured
to off, the error message when a client connectted was "the database
system is starting up", which was needless confusing and not really
all that accurate either.
This commit improves the connection denied error message during
recovery, as follows, so that the users immediately know that their
servers are configured to deny those connections.
- If hot_standby is disabled, the error message "the database system
is not accepting connections" and the detail message "Hot standby
mode is disabled." are output when clients connect while an archive
recovery or a standby is running.
- If hot_standby is enabled, the error message "the database system
is not yet accepting connections" and the detail message
"Consistent recovery state has not been yet reached." are output
when clients connect until the consistent recovery state is reached
and postmaster starts accepting read only connections.
This commit doesn't change the connection denied error message of
"the database system is starting up" during normal server startup and
crash recovery. Because it's still suitable for those situations.
Author: James Coleman
Reviewed-by: Alvaro Herrera, Andres Freund, David Zhang, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/CAAaqYe8h5ES_B=F_zDT+Nj9XU7YEwNhKhHA2RE4CFhAQ93hfig@mail.gmail.com
Currently instances of PostgresNode find their Postgres executables in
the PATH of the caller. This modification allows for instances that know
the installation path they are supposed to use, and the module adjusts
the environment of methods that call Postgres executables appropriately.
This facility is activated by passing the installation path to the
constructor:
my $node = PostgresNode->get_new_node('mynode',
installation_path => '/path/to/installation');
This makes a number of things substantially easier, including
. testing third party modules
. testing different versions of postgres together
. testing different builds of postgres together
Discussion: https://postgr.es/m/a94c74f9-6b71-1957-7973-a734ea3cbef1@dunslane.net
Reviewed-By: Alvaro Herrera, Michael Paquier, Dagfinn Ilmari Mannsåker
Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long. This
change updates the default to 0.9 (from 0.5) to match that
recommendation.
There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.
Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele,
Nathan Bossart
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
Change the default_toast_compression GUC to be an enum rather than
a string. Earlier, uncommitted versions of the patch supported using
CREATE ACCESS METHOD to add new compression methods to a running
system, but that idea was dropped before commit. So, we can simplify
the GUC handling as well, which has the nice side effect of improving
the error messages.
While updating the documentation to reflect the new GUC type, also
move it back to the right place in the list. I moved this while
revising what became commit 24f0e395ac,
but apparently the intended ordering is "alphabetical" rather than
"whatever Robert thinks looks nice."
Rejigger things to avoid having access/toast_compression.h depend on
utils/guc.h, so that we don't end up with every file that includes
it also depending on something largely unrelated. Move a few
inline functions back into the C source file partly to help reduce
dependencies and partly just to avoid clutter. A few very minor
cosmetic fixes.
Original patch by Justin Pryzby, but very heavily edited by me,
and reverse reviewed by him and also reviewed by by Tom Lane.
Discussion: http://postgr.es/m/CA+TgmoYp=GT_ztUCeZg2i4hkHAQv8o=-nVJ1-TKWTG1zQOmOpg@mail.gmail.com
Similar to date_trunc, but allows binning by an arbitrary interval
rather than just full units.
Author: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Artur Zakirov <zaartur@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com
To allow inserts in parallel-mode this feature has to ensure that all the
constraints, triggers, etc. are parallel-safe for the partition hierarchy
which is costly and we need to find a better way to do that. Additionally,
we could have used existing cached information in some cases like indexes,
domains, etc. to determine the parallel-safety.
List of commits reverted, in reverse chronological order:
ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
Commit de829ddf23 added wait event WalrcvExit. But its name is not
consistent with other wait events like WalReceiverMain or
WalReceiverWaitStart, etc. So this commit renames WalrcvExit to
WalReceiverExit.
Author: Fujii Masao
Reviewed-by: Thomas Munro
Discussion: https://postgr.es/m/cced9995-8fa2-7b22-9d91-3f22a2b8c23c@oss.nttdata.com
GetNewOidWithIndex() generates a new OID one by one until it finds
one not in the relation. If there are very long runs of consecutive
existing OIDs, GetNewOidWithIndex() needs to iterate many times
in the loop to find unused OID. Since TOAST table can have a large
number of entries and there can be such long runs of OIDs, there is
the case where it takes so many iterations to find new OID not in
TOAST table. Furthermore if all (i.e., 2^32) OIDs are already used,
GetNewOidWithIndex() enters something like busy loop and repeats
the iterations until at least one OID is marked as unused.
There are some reported troubles caused by a large number of
iterations in GetNewOidWithIndex(). For example, when inserting
a billion of records into the table, all the backends doing that
insertion operation got hang with 100% CPU usage at some point.
Previously there was no easy way to detect that GetNewOidWithIndex()
failed to find unused OID many times. So, for example, gdb full
backtrace of hanged backends needed to be taken, in order to
investigate that trouble. This is inconvenient and may not be
available in some production environments.
To provide easy way for that, this commit makes GetNewOidWithIndex()
log that it iterates more than GETNEWOID_LOG_THRESHOLD but have
not yet found OID unused in the relation. Also this commit makes
it repeat logging with exponentially increasing intervals until
it iterates more than GETNEWOID_LOG_MAX_INTERVAL, and makes it
finally repeat logging every GETNEWOID_LOG_MAX_INTERVAL unless
an unused OID is found. Those macro variables are used not to
fill up the server log with the similar messages.
In the discusion at pgsql-hackers, there was another idea to report
the lots of iterations in GetNewOidWithIndex() via wait event.
But since GetNewOidWithIndex() traverses indexes to find unused
OID and which will do I/O, acquire locks, etc, which will overwrite
the wait event and reset it to nothing once done. So that idea
doesn't work well, and we didn't adopt it.
Author: Tomohiro Hiramitsu
Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/16722-93043fb459a41073@postgresql.org
When resolving types during catalog bootstrap, try to reload the pg_type
contents if a type is not found. That allows catalogs to contain
composite types, e.g. row types for other catalogs.
Author: Justin Pryzby
Reviewed-by: Dean Rasheed, Tomas Vondra
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
It's a bit easier and more convenient to free and reload a List,
compared to a plain array. This will be helpful when allowing catalogs
to contain composite types.
Author: Justin Pryzby
Reviewed-by: Dean Rasheed, Tomas Vondra
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
Teach nbtree VACUUM to press on with vacuuming in the event of a page
deletion attempt that fails to "re-find" a downlink for its child/target
page.
There is no good reason to treat this as an irrecoverable error. But
there is a good reason not to: pressing on at this point removes any
question of VACUUM not making progress solely due to misbehavior from
user-defined operator class code.
Discussion: https://postgr.es/m/CAH2-Wzma5G9CTtMjbrXTwOym+U=aWg-R7=-htySuztgoJLvZXg@mail.gmail.com
Disable autovacuum, because we don't want it to run against
intentionally corrupted tables. Also, before corrupting the tables,
run pg_amcheck and ensure that it passes. Otherwise, if something
unexpected happens when we check the corrupted tables, it's not so
clear whether it would have also happened before we corrupted
them.
Mark Dilger
Discussion: http://postgr.es/m/AA5506CE-7D2A-42E4-A51D-358635E3722D@enterprisedb.com
Jasen Betts reported yet another unintended side effect of commit
85c54287a: reconnecting with "\c service=whatever" did not have the
expected results. The reason is that starting from the output of
PQconndefaults() effectively allows environment variables (such
as PGPORT) to override entries in the service file, whereas the
normal priority is the other way around.
Not using PQconndefaults at all would require yet a third main code
path in do_connect's parameter setup, so I don't really want to fix
it that way. But we can have the logic effectively ignore all the
default values for just a couple more lines of code.
This patch doesn't change the behavior for "\c -reuse-previous=on
service=whatever". That remains significantly different from before
85c54287a, because many more parameters will be re-used, and thus
not be possible for service entries to replace. But I think this
is (mostly?) intentional. In any case, since libpq does not report
where it got parameter values from, it's hard to do differently.
Per bug #16936 from Jasen Betts. As with the previous patches,
back-patch to all supported branches. (9.5 is unfortunately now
out of support, so this won't get fixed there.)
Discussion: https://postgr.es/m/16936-3f524322a53a29f0@postgresql.org
end_heap_rewrite was not careful to ensure that the target relation
is open at the smgr level before performing its final smgrimmedsync.
In ordinary cases this is no problem, because it would have been
opened earlier during the rewrite. However a crash can be reproduced
by re-clustering an empty table with CLOBBER_CACHE_ALWAYS enabled.
Although that exact scenario does not crash in v13, I think that's
a chance result of unrelated planner changes, and the problem is
likely still reachable with other test cases. The true proximate
cause of this failure is commit c6b92041d, which replaced a call to
heap_sync (which was careful about opening smgr) with a direct call
to smgrimmedsync. Hence, back-patch to v13.
Amul Sul, per report from Neha Sharma; cosmetic changes
and test case by me.
Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com
This has previously not been a problem (that anyone ever reported),
but in future OpenSSL versions (3.0.0), where legacy ciphers are/can
be disabled, this is the place where this is reported. So we need to
catch the error here, otherwise the higher-level functions would
return garbage. The nearby encryption code already handled errors
similarly.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/9e9c431c-0adc-7a6d-9b1a-915de1ba3fe7@enterprisedb.com
This function for bit and bytea counts the set bits in the bit or byte
string. Internally, we use the existing popcount functionality.
For the name, after some discussion, we settled on bit_count, which
also exists with this meaning in MySQL, Java, and Python.
Author: David Fetter <david@fetter.org>
Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org
Once a relation's autovacuum is completed, the logs include more
information about this relation state if the threshold of
log_autovacuum_min_duration (or its relation option) is reached, with
for example contents about the statistics of the VACUUM operation for
the relation, WAL and system usage.
This commit adds more information about the statistics of the relation's
indexes, with one line of logs generated for each index. The index
stats were already calculated, but not printed in the context of
autovacuum yet. While on it, some refactoring is done to keep track of
the index statistics directly within LVRelStats, simplifying some
routines related to parallel VACUUMs.
Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Euler Taveira
Discussion: https://postgr.es/m/CAD21AoAy6SxHiTivh5yAPJSUE4S=QRPpSZUdafOSz0R+fRcM6Q@mail.gmail.com
A couple error messages and comments used 'statistic kind', not the
correct 'statistics kind'. Fix and backpatch all the way back to 10,
where extended statistics were introduced.
Backpatch-through: 10
Previously the type of this wait event was Client. But while this
wait event is being reported, walreceiver process is waiting for
the startup process to set initial data for streaming replication.
It's not waiting for any activity on a socket connected to a user
application or walsender. So this commit changes the type for
WalReceiverWaitStart wait event to IPC.
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/cdacc27c-37ff-f1a4-20e2-ce19933abfcc@oss.nttdata.com
pg_waldump --stats=record identifies a record by a combination
of the RmgrId and the four bits of the xl_info field of the record.
But XACT records use the first bit of those four bits for an optional
flag variable, and the following three bits for the opcode to
identify a record. So previously the same type of XACT record
could have different four bits (three bits are the same but the
first one bit is different), and which could cause
pg_waldump --stats=record to show two lines of per-record statistics
for the same XACT record. This is a bug.
This commit changes pg_waldump --stats=record so that it processes
only XACT record differently, i.e., filters the opcode out of xl_info
and uses a combination of the RmgrId and those three bits as
the identifier of a record, only for XACT record. For other records,
the four bits of the xl_info field are still used.
Back-patch to all supported branches.
Author: Kyotaro Horiguchi
Reviewed-by: Shinya Kato, Fujii Masao
Discussion: https://postgr.es/m/2020100913412132258847@highgo.ca
Previously, to check relation permanence, the Relation's Form_pg_class
structure member relpersistence was compared to the value
RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro
RelationIsPermanent() and is used in appropirate places to simplify the
code. This matches other RelationIs* macros.
This macro will be used in more places in future cluster file encryption
patches.
Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net
The bringetbitmap function allocates memory for various purposes, which
may be quite expensive, depending on the number of scan keys. Instead of
allocating them separately, allocate one bit chunk of memory an carve it
into smaller pieces as needed - all the pieces have the same lifespan,
and it saves quite a bit of CPU and memory overhead.
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
The handling of IS [NOT] NULL clauses is independent of an opclass, and
most of the code was exactly the same in both minmax and inclusion. So
instead move the code from support procedures to the AM.
This simplifies the code - especially the support procedures - quite a
bit, as they don't need to care about NULL values and flags at all. It
also means the IS [NOT] NULL clauses can be evaluated without invoking
the support procedure.
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
This commit changes how we pass scan keys to BRIN consistent function.
Instead of passing them one by one, we now pass all scan keys for a
given attribute at once. That makes the consistent function a bit more
complex, as it has to loop through the keys, but it does allow more
elaborate opclasses that can use multiple keys to eliminate ranges much
more effectively.
The existing BRIN opclasses (minmax, inclusion) don't really benefit
from this change. The primary purpose is to allow future opclases to
benefit from seeing all keys at once.
This does change the BRIN API, because the signature of the consistent
function changes (a new parameter with number of scan keys). So this
breaks existing opclasses, and will require supporting two variants of
the code for different PostgreSQL versions. We've considered supporting
two variants of the consistent, but we've decided not to do that.
Firstly, there's another patch that moves handling of NULL values from
the opclass, which means the opclasses need to be updated anyway.
Secondly, we're not aware of any out-of-core BRIN opclasses, so it does
not seem worth the extra complexity.
Bump catversion, because of pg_proc changes.
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com