4904 Commits

Peter Eisentraut
c55df7c6ea Fix incorrect format placeholders
BlockNumber is unsigned int.  Fix for commit 14ffaece0fb.
2025-04-14 08:56:33 +02:00
Tom Lane
e708ffe79d Fix GIN's shimTriConsistentFn to not corrupt its input.
Commit 0f21db36d made an assumption that GIN triConsistentFns
would not modify their input entryRes[] arrays.  But in fact,
the "shim" triConsistentFn that we use for opclasses that don't
supply their own did exactly that, potentially leading to wrong
answers from a GIN index search.  Through bad luck, none of the
test cases that we have for such opclasses exposed the bug.

One response to this could be that the assumption of consistency check
functions not modifying entryRes[] arrays is a bad one, but it still
seems reasonable to me.  Notably, shimTriConsistentFn is itself
assuming that with respect to the underlying boolean consistentFn,
so it's sure being self-centered in supposing that it gets to do so.

Fortunately, it's quite simple to fix shimTriConsistentFn to restore
the entry-time state of entryRes[], so let's do that instead.

This issue doesn't affect any core GIN opclasses, since they all
supply their own triConsistentFns.  It does affect contrib modules
btree_gin, hstore, and intarray.

Along the way, I (tgl) noticed that shimTriConsistentFn failed to
pick up on a "recheck" flag returned by its first call to the boolean
consistentFn.  This may be only a latent problem, since it would be
unlikely for a consistentFn to set recheck for the all-false case
and not any other cases.  (Indeed, none of our contrib modules do
that.)  Nonetheless, it's formally wrong.

Reported-by: Vinod Sridharan <vsridh90@gmail.com>
Author: Vinod Sridharan <vsridh90@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFMdLD7XzsXfi1+DpTqTgrD8XU0i2C99KuF=5VHLWjx4C1pkcg@mail.gmail.com
Backpatch-through: 13
2025-04-12 12:28:02 -04:00
Peter Geoghegan
a6cab6a78e Harmonize function parameter names for Postgres 18.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  These
inconsistencies were all introduced during Postgres 18 development.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
2025-04-12 12:07:36 -04:00
David Rowley
928394b664 Improve various new-to-v18 appendStringInfo calls
Similar to 8461424fd, here we adjust a few new locations which were not
using the most suitable appendStringInfo* function for the intended
purpose.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqJnNjueb=Eoj8K+8n0g7nj_AcPWSiCj5RNV4fDejAfqA@mail.gmail.com
2025-04-11 10:07:22 +12:00
Amit Kapila
4909b38af0 Fix data loss in logical replication.
Data loss can happen when DDLs like ALTER PUBLICATION ... ADD TABLE ...
or ALTER TYPE ... that don't take a strong lock on the table run
concurrently with DMLs on the tables involved in the DDL. This happens
because logical decoding doesn't distribute invalidations to concurrent
transactions, and those transactions use stale cache data to decode the
changes. The problem is made worse because we keep using the stale cache
even after those in-progress transactions have finished, and so skip
changes that need to be sent to the client.
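
A hypothetical interleaving that illustrates the hazard (publication and
table names are purely illustrative):

-- Session 2 (publisher): start a transaction on a table not yet published
BEGIN;
INSERT INTO tbl VALUES (1);
-- Session 1 (publisher): add the table without taking a strong lock on it
ALTER PUBLICATION pub ADD TABLE tbl;
-- Session 2 again:
INSERT INTO tbl VALUES (2);
COMMIT;
-- Before this fix, decoding of session 2's transaction (and later ones)
-- could keep using the stale publication cache and silently skip changes
-- that should have been sent to the subscriber.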

This commit fixes the issue by distributing invalidation messages from
catalog-modifying transactions to all concurrent in-progress transactions.
This allows the necessary rebuild of the catalog cache when decoding new
changes after concurrent DDL.

We observed performance regression primarily during frequent execution of
*publication DDL* statements that modify the published tables. The
regression is minor or nearly nonexistent for DDLs that do not affect the
published tables or occur infrequently, making this a worthwhile cost to
resolve a longstanding data loss issue.

An alternative approach considered was to take a strong lock on each
affected table during publication modification. However, this would only
address issues related to publication DDLs (but not the ALTER TYPE ...)
and require locking every relation in the database for publications
created as FOR ALL TABLES, which is impractical.

The bug exists in all supported branches, but we are back-patching only
down to 14.  The fix for 13 requires somewhat bigger changes than this
one, so the fix for that branch is still under discussion.

Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
2025-04-10 13:14:40 +05:30
Tomas Vondra
3887d0cfeb Cleanup of pg_numa.c
This moves/renames some of the functions defined in pg_numa.c:

* pg_numa_get_pagesize() is renamed to pg_get_shmem_pagesize(), and
  moved to src/backend/storage/ipc/shmem.c. The new name better reflects
  that the page size is not related to NUMA, and it's specifically about
  the page size used for the main shared memory segment.

* move pg_numa_available() to src/backend/storage/ipc/shmem.c, i.e. into
  the backend (which is more appropriate for functions callable from SQL).
  While at it, improve the comment to explain what page size it returns.

* remove unnecessary includes from src/port/pg_numa.c that added
  unneeded dependencies (src/port should be suitable for frontend code).
  These were either leftovers or became unnecessary thanks to the other
  changes in this commit.

This eliminates unnecessary dependencies on backend symbols, which we
don't want in src/port.

Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com
2025-04-09 21:50:17 +02:00
Tom Lane
b1720fe63f Move contrib/spi testing from core regression tests to contrib/spi.
It's weird to have the core regression tests depending on contrib
code, and coverage testing shows that those test queries add nothing
to the core-code coverage of the core tests.  So pull those test bits
out and put them into ordinary test scripts inside contrib/spi/,
making that more like other contrib modules.

Aside from being structurally nicer, anything we can take out of the
core tests (which are executed multiple times per check-world run)
and put into tests executed only once should be a win.  It doesn't
look like this change will buy a whole lot of milliseconds, but a
cycle saved is a cycle earned.

Also, there is some discussion around possibly removing refint and/or
autoinc altogether.  I don't know if that will happen, but we'd
certainly need to decouple them from the core tests to do so.

The tests for autoinc were quite intertwined with the undocumented
"ttdummy" trigger in regress.c.  That made the tests very hard to
understand and contributed nothing to autoinc's testing either.
So I just deleted ttdummy and rewrote the autoinc tests without it.

I realized while doing this that the description of autoinc in
the SGML docs is not a great description of what the function
actually does, so the patch includes some updates to those docs.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/3872677.1744077559@sss.pgh.pa.us
2025-04-08 19:12:03 -04:00
Peter Eisentraut
8969194b73 Fix incorrect format placeholder
for commit 749a9e20c97
2025-04-08 19:12:03 +02:00
Tomas Vondra
91f1fe90c7 pg_buffercache: Change page_num type to bigint
The page_num was defined as integer, which should be sufficient for the
near future (with 4K pages it's 8TB). But it's virtually free to return
bigint, and get a wider range. This was agreed on the thread, but I
forgot to tweak this in ba2a3c2302f1.

While at it, make the data types in CREATE VIEW a bit more consistent.

Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-08 12:38:42 +02:00
Andres Freund
dcf7e1697b Add pg_buffercache_evict_{relation,all} functions
In addition to the added functions, the pg_buffercache_evict() function now
shows whether the buffer was flushed.

pg_buffercache_evict_relation(): Evicts all shared buffers in a
relation at once.
pg_buffercache_evict_all(): Evicts all shared buffers at once.

Both functions provide a mechanism to evict multiple shared buffers at
once. They are designed to address the inefficiency of repeatedly calling
pg_buffercache_evict() for each individual buffer, which can be
time-consuming when dealing with large shared buffer pools (e.g., ~477ms
vs. ~2576ms for 16GB of fully populated shared buffers).

These functions are intended for developer testing and debugging
purposes and are available to superusers only.
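
A minimal usage sketch (run as superuser); the argument type and result
columns shown here are assumptions based on the description above:

SELECT * FROM pg_buffercache_evict_relation('some_table'::regclass);
SELECT * FROM pg_buffercache_evict_all();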

Minimal tests for the new functions are included. Also, pg_buffercache_evict()
previously had no test, so one is added for it too.

No new extension version is needed, as it was already increased this release
by ba2a3c2302f.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com
2025-04-08 02:19:32 -04:00
David Rowley
d69d45a5a9 Speedup child EquivalenceMember lookup in planner
When planning queries to partitioned tables, we clone all
EquivalenceMembers belonging to the partitioned table into em_is_child
EquivalenceMembers for each non-pruned partition.  For partitioned tables
with large numbers of partitions, this meant the ec_members list could
become large and code searching that list would become slow.  Effectively,
the more partitions present, the more of these searches were needed for
operations such as find_ec_member_matching_expr() during create_plan(),
and the longer each search took, i.e., a quadratic slowdown.

To fix this, here we adjust how we store EquivalenceMembers for
em_is_child members.  Instead of storing these directly in ec_members,
these are now stored in a new array of Lists in the EquivalenceClass,
which is indexed by the relid.  When we want to find EquivalenceMembers
belonging to a certain child relation, we can narrow the search to the
array element for that relation.

To make EquivalenceMember lookup easier and to reduce the amount of code
change, this commit provides a pair of functions to allow iteration over
the EquivalenceMembers of an EC which also handles finding the child
members, if required.  Callers that never need to look at child members
can remain using the foreach loop over ec_members, which will now often
be faster due to only parent-level members being stored there.

The actual performance increases here are highly dependent on the number
of partitions and the query being planned.  Performance increases can be
visible with as few as 8 partitions, but the speedup is marginal for
such low numbers of partitions.  The speedups become much more visible
with a few dozen to hundreds of partitions.  With some tested queries
using 56 partitions, the planner was around 3x faster than before.  For
use cases with thousands of partitions, these are likely to become
significantly faster.  Some testing has shown planner speedups of 60x or
more with 8192 partitions.

Author: Yuya Watari <watari.yuya@gmail.com>
Co-authored-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Tested-by: Thom Brown <thom@linux.com>
Tested-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com>
Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com
2025-04-08 18:09:57 +12:00
Tomas Vondra
ba2a3c2302 Add pg_buffercache_numa view with NUMA node info
Introduces a new view pg_buffercache_numa, showing NUMA memory nodes
for individual buffers. For each buffer the view returns an entry for
each memory page, with the associated NUMA node.

The database blocks and OS memory pages may have different sizes - the
default block size is 8kB, while the memory page is 4kB (on x86). But
other combinations are possible, depending on configure parameters,
platform, etc. This means buffers may overlap with multiple memory
pages, each associated with a different NUMA node.

To determine the NUMA node for a buffer, we first need to touch the
memory pages using pg_numa_touch_mem_if_required, otherwise we might get
status -2 (ENOENT = The page is not present), indicating the page is
either unmapped or unallocated.

The view may be relatively expensive, especially when accessed for the
first time in a backend, as it touches all memory pages to get reliable
information about the NUMA node. This may also force allocation of the
shared memory.
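
A hypothetical query against the new view, assuming it exposes bufferid,
os_page_num and numa_node columns:

SELECT numa_node, count(*)
  FROM pg_buffercache_numa
 GROUP BY numa_node
 ORDER BY numa_node;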

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-07 23:08:17 +02:00
Tom Lane
8cfbdf8f4d Fix some issues in contrib/spi/refint.c.
check_foreign_key incorrectly used a single cache entry for its saved
plans for a 'c' (cascade) trigger, although there are two different
queries to execute depending on whether it fires for an update or a
delete.  This caused the wrong things to be done if both types of
event occur in one session.  (This was indeed visible in the triggers
regression test, but apparently nobody ever questioned it.)  To fix,
add the operation type to the cache key.

Its debug log output failed to distinguish update from delete
events, too.

Also, change the intended trigger usage from BEFORE ROW to AFTER ROW,
and add checks insisting on that usage.  BEFORE is really rather
unsafe, since if there are other BEFORE triggers they might change or
cancel the operation we are trying to check.  AFTER triggers are the
standard way to propagate changes to other rows, so we should follow
that way here.

In passing, remove a useless duplicate lookup of the cache entry.

This code is mostly intended as a documentation example, so we
won't consider a back-patch.

Author: Dmitrii Bondar <d.bondar@postgrespro.ru>
Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Lilian Ontowhee <ontowhee@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/79755a2b18ed4fe5e29da6a87a1e00d1@postgrespro.ru
2025-04-07 15:54:16 -04:00
Tom Lane
969ab9d4f5 Follow-up fixes for SHA-2 patch (commit 749a9e20c).
This changes the check for valid characters in the salt string to
only allow plain ASCII letters and digits.  The previous coding was
locale-dependent which doesn't really seem like a great idea here;
moreover it could not work correctly in multibyte encodings.

This fixes a careless pointer-use-after-pfree, too.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Andres Freund <andres@anarazel.de>
Author: Bernd Helmle <mailings@oopsware.de>
Discussion: https://postgr.es/m/6fab35422df6b6b9727fdcc243c5fa1c667dd3b5.camel@oopsware.de
2025-04-07 14:14:28 -04:00
Tom Lane
b73e6d71a8 Fix erroneous construction of functions' dependencies on transforms.
The list of transform objects that a function should use is specified
in CREATE FUNCTION's TRANSFORM clause, and then represented indirectly
in pg_proc.protrftypes.  However, ProcedureCreate completely ignored
that for purposes of constructing pg_depend entries, and instead made
the function depend on any transforms that exist for its parameter or
return data types.  This is bad in both directions: the function could
be made dependent on a transform it does not actually use, or it
could try to use a transform that's since been dropped.  (The latter
scenario would require use of a transform that's not for any of the
parameter or return types, but that seems legit for cases where the
function performs SQL operations internally.)

To fix, pass in the list of transform objects that CreateFunction
identified, and build pg_depend entries from that not from the
parameter/return types.  This results in changes in the expected
test outputs in contrib/bool_plperl, which I guess are due to
different ordering of pg_depend entries -- that test case is
surely not exercising either of the problem scenarios.

This fix is not back-patchable as-is: changing the signature of
ProcedureCreate seems too risky in stable branches.  We could
do something like making ProcedureCreate a wrapper around
ProcedureCreateExt or so.  However, I'm more inclined to do
nothing in the back branches.  We had no field complaints up to
now, so the hazards don't seem to be a big issue in practice.
And we couldn't do anything about existing pg_depend entries,
so a back-patched fix would result in a mishmash of dependencies
created according to different rules.  That cure could be worse
than the disease, perhaps.

I bumped catversion just to lay down a marker that the expected
contents of pg_depend are a bit different than before.

Reported-by: Chapman Flack <jcflack@acm.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3112950.1743984111@sss.pgh.pa.us
2025-04-07 13:31:37 -04:00
Tom Lane
8ab6ef2bb8 Fix memory leaks in px_crypt_shacrypt().
Per Coverity.  I don't think these are of any actual significance
since the function ought to be invoked in a short-lived context.
Still, if it's trying to be neat it should get it right.

Also const-ify a constant and fix up typedef formatting.
2025-04-06 11:57:22 -04:00
Álvaro Herrera
749a9e20c9
Add modern SHA-2 based password hashes to pgcrypto.
This adapts the publicly available reference implementation from
https://www.akkadia.org/drepper/SHA-crypt.txt and adds the new hash
algorithms sha256crypt and sha512crypt to both crypt() and gen_salt().
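
A usage sketch, assuming gen_salt() accepts the new algorithm names
directly:

SELECT crypt('my secret password', gen_salt('sha256crypt'));
SELECT crypt('my secret password', gen_salt('sha512crypt'));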

Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/c763235a2757e2f5f9e3e27268b9028349cef659.camel@oopsware.de
2025-04-05 19:17:13 +02:00
Melanie Plageman
d9c7911e1a Use streaming read I/O in autoprewarm
Make a read stream for each valid fork of each valid relation
represented in the autoprewarm dump file and prewarm those blocks
through the read stream API instead of by directly invoking
ReadBuffer().

Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru> (earlier versions)
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>  (earlier versions)
Reviewed-by: Matheus Alcantara <mths.dev@pm.me> (earlier versions)
Discussion: https://postgr.es/m/flat/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com
2025-04-04 15:28:54 -04:00
Melanie Plageman
6acab8bdbc Refactor autoprewarm_database_main() in preparation for read stream
Autoprewarm prewarms blocks from a dump file representing the contents
of shared buffers at the time it was dumped. It uses a sorted array of
BlockInfoRecords, each representing a block from one of the cluster's
databases and tables.

autoprewarm_database_main() prewarms all the blocks from a single
database. It is optimized to ensure we don't try to open the same
relation or fork over and over again if it has been dropped or is
invalid. The main loop handled this by carefully setting various local
variables to sentinel values when a run of blocks should be skipped.

This method won't work with the read stream API. The read stream
callback must be able to advance the current position in the
BlockInfoRecord array to allow reading ahead additional blocks; however,
a read stream maps 1-to-1 with a relation and fork combination. So, the
main loop in autoprewarm_database_main() must also advance the position
in the array of BlockInfoRecords to skip invalid relations and forks.
This split of control doesn't fit well with the current flow control in
autoprewarm_database_main().

To make it compatible with the read stream API, change
autoprewarm_database_main() to explicitly fast-forward in the
BlockInfoRecords array past the blocks belonging to an invalid relation
or fork.

This commit only implements the new control flow -- it does not use the
read stream API.

Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/flat/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com
2025-04-04 15:28:49 -04:00
Melanie Plageman
7f848cb788 Remove superfluous autoprewarm check
autoprewarm_database_main() prewarms blocks from the same database. It
is passed an array of sorted BlockInfoRecords and a start and stop index
into the array. The range represented should include only blocks
belonging to global objects or blocks from a single database. Remove an
unnecessary check that the current block is from the same database and
add an assert to ensure this invariant remains. Doing so removes a
special case that makes future refactoring to accommodate read
streamifying autoprewarm easier.

Noticed off-list by Andres Freund
2025-04-04 15:28:39 -04:00
Melanie Plageman
64e7fa43a9 Fix autoprewarm neglect of tablespaces
While prewarming blocks from a dump file, autoprewarm_database_main()
mistakenly ignored tablespace when detecting the beginning of the next
relation to prewarm. Because RelFileNumbers are only unique within a
tablespace, autoprewarm could miss prewarming blocks from a
relation with the same RelFileNumber in a different tablespace.

Though this situation is likely rare in practice, it's best to make the
code correct. Do so by explicitly checking the tablespace as well as the
RelFileNumber when detecting a new relation.

Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/97c36982-603b-494a-95f4-aaf2a12ac27e%40iki.fi
2025-04-04 11:34:06 -04:00
Peter Eisentraut
8123e91f5a Convert PathKey to use CompareType
Change the PathKey struct to use CompareType to record the sort
direction instead of hardcoding btree strategy numbers.  The
CompareType is then converted to the index-type-specific strategy when
the plan is created.

This reduces the number of places btree strategy numbers are
hardcoded, and it's a self-contained subset of a larger effort to
allow non-btree indexes to behave like btrees.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-04-04 11:22:20 +02:00
Melanie Plageman
54a3615f15 Remove misleading read stream asserts in a few users
Several read stream users asserted that the read stream was exhausted
after looping on that very condition. It was pointed out in a review of
an as-yet-uncommitted read stream user [1] that this was confusing and
could lead the reader to think there was a possibility of some kind of
race condition. Remove these asserts.

[1] https://postgr.es/m/F9ACE8D0-B807-4A17-B6BD-87EF0717983D%40yesql.se
2025-04-03 18:22:37 -04:00
Heikki Linnakangas
e4309f73f6 Add support for sorted gist index builds to btree_gist
This enables sortsupport in the btree_gist extension for faster builds
of gist indexes.

The sorted gist index build strategy is now the default. Regression
tests are unchanged (except for one small change in the 'enum' test to
add coverage for enum values added later), but they now exercise the
sorted build strategy.

One version of this was committed a long time ago already, in commit
9f984ba6d2, but it was quickly reverted because of buildfarm
failures. The failures were presumably caused by some small bugs, but
we never got around to debugging them and committing it again. This patch was
written from scratch, implementing the same idea, with some fragments
and ideas from the original patch.

Author: Bernd Helmle <mailings@oopsware.de>
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/64d324ce2a6d535d3f0f3baeeea7b25beff82ce4.camel@oopsware.de
2025-04-03 13:46:35 +03:00
Heikki Linnakangas
9370978da8 Fix boilerplate comments in btree_gist
A few of these were copy-pasted incorrectly, like the comment "Bytea ops"
in btree_numeric.c. Instead of fixing just the incorrect ones, replace
them all with the generic comment "GiST support functions".

Also tidy up the inconsistent newlines between various functions while
we're at it.
2025-04-03 13:39:33 +03:00
Andres Freund
ae3df4b341 read_stream: Introduce and use optional batchmode support
Submitting IO in larger batches can be more efficient than doing so
one-by-one, particularly for many small reads. It does, however, require
the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
batching (cf. pgaio_enter_batchmode()). Basically, the callback may not:
a) block without first calling pgaio_submit_staged(), unless a
   to-be-waited-on lock cannot be part of a deadlock, e.g. because it is
   never held while waiting for IO.

b) directly or indirectly start another batch with pgaio_enter_batchmode()

As this requires care and is nontrivial in some cases, batching is only
used with explicit opt-in.

This patch adds an explicit flag (READ_STREAM_USE_BATCHING) to read_stream and
uses it where appropriate.

There are two cases where batching would likely be beneficial, but where we
aren't using it yet:

1) bitmap heap scans, because the callback reads the VM

   This should soon be solved, because we are planning to remove that
   use of the VM, as it is not sound.

2) The first phase of heap vacuum

   This could be made to support batchmode, but would require some care.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-30 18:36:41 -04:00
Tomas Vondra
49b82522f1 Remove incidental md5() function use from test
Replace md5() with sha256() in tests introduced in 14ffaece0fb5, to
allow the tests to pass in OpenSSL FIPS mode.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3518736.1743307492@sss.pgh.pa.us
2025-03-30 13:22:39 +02:00
Tomas Vondra
68f97aeadb amcheck: Add a GIN index to the CREATE INDEX CONCURRENTLY tests
The existing CREATE INDEX CONCURRENTLY tests check only B-Tree, but they
can be cheaply extended to also check GIN. This helps increase test
coverage for GIN amcheck, especially for the handling of concurrent page
splits and posting list trees.

This already helped to identify several issues during development of the
GIN amcheck support.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com
2025-03-29 16:47:44 +01:00
Tomas Vondra
ca738bdc4c amcheck: Add a test with GIN index on JSONB data
Extend the existing test of GIN checks to also include an index on JSONB
data, using the jsonb_path_ops opclass. This is a common enough usage of
GIN that it makes sense to have better test coverage for it.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com
2025-03-29 16:47:44 +01:00
Tomas Vondra
ec4327d106 amcheck: Fix indentation in verify_gin.c
I forgot to reindent the code after a couple last-minute adjustments
just before committing 14ffaece0fb53fed8ddbc46d2b353e1c4834863a.

Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 16:47:44 +01:00
Tomas Vondra
14ffaece0f amcheck: Add gin_index_check() to verify GIN index
Adds a new function, validating two kinds of invariants on a GIN index:

- parent-child consistency: Paths in a GIN graph have to contain
  consistent keys. Tuples on parent pages consistently include tuples
  from child pages; parent tuples do not require any adjustments.

- balanced-tree / graph: Each internal page has at least one downlink,
  and can reference either only leaf pages or only internal pages.
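
A hypothetical invocation, assuming the function takes the GIN index as a
regclass argument like the existing B-Tree checks:

SELECT gin_index_check('my_gin_index'::regclass);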

The GIN verification is based on work by Grigory Kryachko, reworked by
Heikki Linnakangas and with various improvements by Andrey Borodin.
Investigation and fixes for multiple bugs by Kirill Reshke.

Author: Grigory Kryachko <GSKryachko@gmail.com>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Andrey Borodin <amborodin@acm.org>
Reviewed-By: José Villanova <jose.arthur@gmail.com>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 15:44:29 +01:00
Tomas Vondra
d70b17636d amcheck: Move common routines into a separate module
Before performing checks on an index, we need to take some safety
measures that apply to all index AMs. This includes:

* verifying that the index can be checked - Only selected AMs are
supported by amcheck (right now only B-Tree). The index has to be
valid and not a temporary index from another session.

* changing (and then restoring) the user's security context

* obtaining proper locks on the index (and table, if needed)

* discarding GUC changes from the index functions

Until now this was implemented in the B-Tree amcheck module, but it's
something every AM will have to do. So relocate the code into a new
module verify_common for reuse.

The shared steps are implemented by amcheck_lock_relation_and_check(),
receiving the AM-specific verification as a callback. Custom parameters
may be supplied using a pointer.

Author: Andrey Borodin <amborodin@acm.org>
Reviewed-By: José Villanova <jose.arthur@gmail.com>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 15:14:49 +01:00
Peter Eisentraut
a0ed19e0a9 Use PRI?64 instead of "ll?" in format strings (continued).
Continuation of work started in commit 15a79c73, after initial trial.

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
2025-03-29 10:43:57 +01:00
Robert Haas
83ccc85859 pg_overexplain: Use PG_MODULE_MAGIC_EXT.
I committed this contrib module just after Tom committed
55527368bd07248e91e3d37a782bf66b76f06865; adjust it to match.

Author: Man Zeng <zengman@halodbtech.com>
Discussion: http://postgr.es/m/174313513707.60295.16516085012903412705.pgcf@coridan.postgresql.org
2025-03-28 09:16:29 -04:00
Robert Haas
9f0c36aea0 pg_overexplain: Call previous hooks as appropriate.
It makes no sense to remember the previous values of the hook variables
and then never bother calling those functions. Thanks to Andrei for
spotting my goof.

Author: Andrei Lepikhov <lepihov@gmail.com>
Discussion: http://postgr.es/m/41a344e3-ffb1-4296-8ba7-801f1e9642e5@gmail.com
2025-03-28 09:02:37 -04:00
Melanie Plageman
043799fa08 Use streaming read I/O in heap amcheck
Instead of directly invoking ReadBuffer() for each unskippable block in
the heap relation, verify_heapam() now uses the read stream API to
acquire the next buffer to check for corruption.

Author: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/flat/CAFY6G8eLyz7%2BsccegZYFj%3D5tAUR-GZ9uEq4Ch5gvwKqUwb_hCA%40mail.gmail.com
2025-03-27 14:04:14 -04:00
Tom Lane
4623d71443 Prevent assertion failure in contrib/pg_freespacemap.
Applying pg_freespacemap() to a relation lacking storage (such as a
view) caused an assertion failure, although there was no ill effect
in non-assert builds.  Add an error check for that case.
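
A sketch of the now-rejected call, assuming the SQL-callable function is
pg_freespace(regclass) as provided by the pg_freespacemap extension, and
using an illustrative view name:

CREATE VIEW some_view AS SELECT 1 AS x;
SELECT * FROM pg_freespace('some_view');  -- now reports an error instead
                                           -- of hitting an assertion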

Bug: #18866
Reported-by: Robins Tharakan <tharakan@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/18866-d68926d0f1c72d44@postgresql.org
Backpatch-through: 13
2025-03-27 13:20:23 -04:00
Robert Haas
081ec08e6a pg_overexplain: Filter out actual row count from test result.
Per buildfarm, these are not stable. In particular, 1/8 is sometimes
0.12 and sometimes 0.13.
2025-03-27 09:00:46 -04:00
Álvaro Herrera
9fbd53dea5
Remove the query_id_squash_values GUC
Commit 62d712ecfd94 introduced the capability to calculate the same
queryId for queries that differ only in the number of constants in an
IN-clause list.  This behavior was originally enabled with the GUC
query_id_squash_values.  After a discussion about the value of such a
GUC, it was decided to back out of the use of a GUC and make the
squashing behavior the only available option.
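
For example, with squashing now always on, statements like these would be
expected to share a single pg_stat_statements entry (table and column
names are illustrative):

SELECT * FROM t WHERE id IN (1, 2, 3);
SELECT * FROM t WHERE id IN (1, 2, 3, 4, 5, 6);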

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/Z-LZyygkkNyA8-kR@msg.df7cb.de
Discussion: https://postgr.es/m/CA+q6zcVTK-3C-8NWV1oY2NZrvtnMCDqnyYYyk1T7WMUG65MeOQ@mail.gmail.com
2025-03-27 13:33:37 +01:00
David Rowley
f31aad9b07 Fix query jumbling to account for NULL nodes
Previously NULL nodes were ignored.  This could cause the computed query
ID to match for queries in which, for two fields next to each other in
their Node struct, one query had the first field NULL and the other field
non-NULL while the other query had the reverse.  For example, the Query
struct had distinctClause and sortClause next to each other.  If someone
wrote:

SELECT DISTINCT c1 FROM t;

and then:

SELECT c1 FROM t ORDER BY c1;

these would produce the same query ID since, in the first query, we
ignored the NULL sortClause and appended the jumble bytes for the
distinctClause.  In the latter query, we did nothing for the NULL
distinctClause and then jumbled the non-NULL sortClause; since the node
representation stored is the same in both cases, the query IDs were
identical.

Here we fix this by always accounting for NULL nodes by recording that
we saw a NULL in the jumble buffer.  This fixes the issue because the
position at which the NULL is recorded differs between the above two
queries.

Author: Bykov Ivan <i.bykov@modernsys.ru>
Author: Michael Paquier <michael@paquier.xyz>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain
2025-03-27 18:23:00 +13:00
Robert Haas
47a1f076a7 pg_overexplain: SET jit=off when running tests.
Per buildfarm.
2025-03-26 15:43:25 -04:00
Robert Haas
de65c4dade Fix oversights in commit 8d5ceb113e3f7ddb627bd40b26438a9d2fa05512
It added bogus whitespace at the end of a line in the documentation.
It should not have done that.

The pg_overexplain tests must SET debug_parallel_query = false,
not just RESET debug_parallel_query, or we get failures on test
machines that make debug_parallel_query = true the default.
2025-03-26 14:22:45 -04:00
Robert Haas
8d5ceb113e pg_overexplain: Additional EXPLAIN options for debugging.
There's a fair amount of information in the Plan and PlanState trees
that isn't printed by any existing EXPLAIN option. This means that,
when working on the planner, it's often necessary to rely on facilities
such as debug_print_plan, which produce excessively voluminous
output. Hence, use the new EXPLAIN extension facilities to implement
EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE) as extensions to the core
EXPLAIN facility.

A great deal more could be done here, and the specific choices about
what to print and how are definitely arguable, but this is at least
a starting point for discussion and a jumping-off point for possible
future improvements.
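
A hypothetical session, assuming the module can be loaded with LOAD and
the new options combined with existing EXPLAIN options:

LOAD 'pg_overexplain';
EXPLAIN (DEBUG, COSTS OFF) SELECT 1;
EXPLAIN (RANGE_TABLE, COSTS OFF) SELECT * FROM pg_class;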

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> (who didn't like it)
Discussion: http://postgr.es/m/CA+TgmoZfvQUBWQ2P8iO30jywhfEAKyNzMZSR+uc2xr9PZBw6eQ@mail.gmail.com
2025-03-26 13:52:21 -04:00
Tom Lane
55527368bd Use PG_MODULE_MAGIC_EXT in our installable shared libraries.
It seems potentially useful to label our shared libraries with version
information, now that a facility exists for retrieving that.  This
patch labels them with the PG_VERSION string.  There was some
discussion about using semantic versioning conventions, but that
doesn't seem terribly helpful for modules with no SQL-level presence;
and for those that do have SQL objects, we typically expect them
to support multiple revisions of the SQL definitions, so it'd still
not be very helpful.

I did not label any of src/test/modules/.  It seems unnecessary since
we don't install those, and besides there ought to be someplace that
still provides test coverage for the original PG_MODULE_MAGIC macro.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:11:02 -04:00
Tom Lane
9324c8c580 Introduce PG_MODULE_MAGIC_EXT macro.
This macro allows dynamically loaded shared libraries (modules) to
provide a wired-in module name and version, and possibly other
compile-time-constant fields in future.  This information can be
retrieved with the new pg_get_loaded_modules() function.

This feature is expected to be particularly useful for modules
that do not have any exposed SQL functionality and thus are
not associated with a SQL-level extension object.  But even for
modules that do belong to extensions, being able to verify the
actual code version can be useful.
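
A usage sketch; the exact shape of the output is an assumption here:

SELECT * FROM pg_get_loaded_modules();
-- expected to report, for each loaded shared library built with
-- PG_MODULE_MAGIC_EXT, information such as its module name and version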

Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Yurii Rashkovskii <yrashk@omnigres.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:06:12 -04:00
Daniel Gustafsson
e92c0632c1 Move GSSAPI includes into its own header
Due to a conflict in macro names on Windows between <wincrypt.h>
and <openssl/ssl.h>, these headers need to be included using a
predictable pattern with an undef to handle that. The GSSAPI
header <gssapi.h> does include <wincrypt.h>, which causes problems
when compiling PostgreSQL with MSVC with OpenSSL and GSSAPI both
enabled in the tree. Rather than fixing this piecemeal for each
file including gssapi headers, move the includes and undef
to a new file which should be used to centralize the logic.

This patch is a reworked version of a patch by Imran Zaheer
proposed earlier in the thread. Once this has proven effective
in master we should look at backporting this, as the problem has
existed since at least v16.

Author: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Imran Zaheer <imran.zhir@gmail.com>
Reported-by: Dave Page <dpage@pgadmin.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/20240708173204.3f3xjilglx5wuzx6@awork3.anarazel.de
2025-03-26 15:31:46 +01:00
Peter Eisentraut
3642df265d dblink: SCRAM authentication pass-through
This enables SCRAM authentication for dblink (using dblink_fdw) when
connecting to a foreign server, without having to store a plain-text
password in the user mapping options.
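
A hypothetical setup, assuming the user mapping option mirrors
postgres_fdw's use_scram_passthrough (the option name is not confirmed by
this log entry):

CREATE SERVER loopback FOREIGN DATA WRAPPER dblink_fdw
  OPTIONS (host 'localhost', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback
  OPTIONS (use_scram_passthrough 'true');  -- assumed option name
SELECT * FROM dblink('loopback', 'SELECT 1') AS t(x int);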

This uses the same approach as it was implemented for postgres_fdw in
commit 761c79508e7.  (It also contains the equivalent of the
subsequent fixes 76563f88cfb and d2028e9bbc1.)

Author: Matheus Alcantara <mths.dev@pm.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFY6G8ercA1KES%3DE_0__R9QCTR805TTyYr1No8qF8ZxmMg8z2Q%40mail.gmail.com
2025-03-26 10:49:23 +01:00
Michael Paquier
787514b30b Use relation name instead of OID in query jumbling for RangeTblEntry
custom_query_jumble (introduced in 5ac462e2b7ac as a node field
attribute) is now assigned to the expanded reference name "eref" of
RangeTblEntry, adding the non-qualified, aliased relation name (without
the list of column names) to the query jumble computation.  The relation
OID is removed from the query jumbling.

The effects of this change can be seen in the tests added by
3430215fe35f, where pg_stat_statements (PGSS) entries are now grouped
using the relation name, regardless of which relation search_path points at.
For example, these two relations are different, but are now grouped in a
single PGSS entry as they are assigned the same query ID:
CREATE TABLE foo1.tab (a int);
CREATE TABLE foo2.tab (b int);
SET search_path = 'foo1';
SELECT count(*) FROM tab;
SET search_path = 'foo2';
SELECT count(*) FROM tab;
SELECT count(*) FROM foo1.tab;
SELECT count(*) FROM foo2.tab;
SELECT query, calls FROM pg_stat_statements WHERE query ~ 'FROM tab';
          query           | calls
--------------------------+-------
 SELECT count(*) FROM tab |     4
(1 row)

It is still possible to use an alias in the FROM clause to split these.
This behavior is useful for relations re-created with the same name,
where queries based on such relations would be grouped in the same
PGSS entry.  For permanent schemas, it should not really matter in
practice.  The main benefit is for workloads that use a lot of temporary
relations, which are usually re-created with the same name continuously.
These can be a heavy source of bloat in PGSS depending on the workload.
Such entries can now be grouped together, improving the user experience.

The original idea from Christoph Berg used catalog lookups to find
temporary relations, something that the query jumble has never done, and
it could cause some performance regressions.  The idea to use
RangeTblEntry.eref and the relation name, applying the same rules for
all relations, temporary and not temporary, has been proposed by Tom
Lane.  The documentation additions have been suggested by Sami Imseih.

Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de
2025-03-26 15:21:05 +09:00
Peter Eisentraut
d2028e9bbc postgres_fdw: Fix tests on some Windows variants
The tests introduced by commit 76563f88cfb only work when Unix-domain
sockets are available.  This is optional on Windows, and buildfarm
member drongo runs without them.  To fix, skip the test if Unix-domain
sockets are not enabled.
2025-03-26 07:00:00 +01:00
Michael Paquier
3430215fe3 pg_stat_statements: Add more tests with temp tables and namespaces
These tests provide coverage for RangeTblEntry and how query jumbling
works with search_path, as well as the case where relations are
re-created, generating a different query ID as the relation OID is used
in the computation.

A patch is under discussion to switch to a different approach based on
the relation name, and there was no test coverage for this area,
including how queries are currently grouped with search_path.  This is
useful to track how the situation changes between HEAD and any patches
proposed.

Christoph has proposed the test with ON COMMIT DROP temporary tables,
and I have written the second part.

Author: Christoph Berg <myon@debian.org>
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de
2025-03-26 07:25:23 +09:00