postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2025-04-18 20:50:44 +08:00

Author	SHA1	Message	Date
Nathan Bossart	b0a4c3e88b	Prevent 006_transfer_modes.pl from leaving files behind. This test was leaving files like delete_old_cluster.{sh,bat} in the source directory for VPATH and meson builds. To fix, change the directory to tmp_check before running the test, as was done in commits 15b6d21553, 8af917be6b, and c462b054ba. Oversight in commit af0d4901c1. Reported-by: Andrew Dunstan <andrew@dunslane.net> (on Discord) Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/Z_RHkG770w3SE0yU%40nathan	2025-04-08 10:57:31 -05:00
Daniel Gustafsson	88edd661c8	ci: Add MBUILD_TARGET for NetBSD and OpenBSD Commit b2bdb972c0 added MBUILD_TARGET to ensure that meson builds the tests before running them, this adds MBUILD_TARGET to OpenBSD and NetBSD builds as well where it was missing. No backpatching since OpenBSD and NetBSD support does not exist in the backbranch CI. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAN55FZ2LNnRrtL+cpSdEg44fQcLPq_GjJjfNa0vz+xqEdq=ZHw@mail.gmail.com	2025-04-08 15:28:29 +02:00
Tomas Vondra	91f1fe90c7	pg_buffercache: Change page_num type to bigint The page_num was defined as integer, which should be sufficient for the near future (with 4K pages it's 8TB). But it's virtually free to return bigint, and get a wider range. This was agreed on the thread, but I forgot to tweak this in ba2a3c2302f1. While at it, make the data types in CREATE VIEW a bit more consistent. Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.co	2025-04-08 12:38:42 +02:00
Tomas Vondra	b8a6078ca8	doc: Correct pg_shmem_allocations_numa.size data type The code in pg_get_shmem_allocations_numa() returned 'size' as int64, but the docs said int32. Report and fix by Noriyoshi Shinoda. Reported-by: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com> Discussion: https://postgr.es/m/DM4PR84MB1734308EB741A6ECFF040C27EEAA2@DM4PR84MB1734.NAMPRD84.PROD.OUTLOOK.COM	2025-04-08 12:36:36 +02:00
Amit Kapila	12eece5fd5	Fix uninitialized index information access during apply. The issue happens when building conflict information during apply of INSERT or UPDATE operations that violate unique constraints on leaf partitions. The problem was introduced in commit 9ff68679b5, which removed the redundant calls to ExecOpenIndices/ExecCloseIndices. The previous code was relying on the redundant ExecOpenIndices call in apply_handle_tuple_routing() to build the index information required for unique key conflict detection. The fix is to delay building the index information until a conflict is detected instead of relying on ExecOpenIndices to do the same. The additional benefit of this approach is that it avoids building index information when there is no conflict. Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by:Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/TYAPR01MB57244ADA33DDA57119B9D26494A62@TYAPR01MB5724.jpnprd01.prod.outlook.com	2025-04-08 15:35:42 +05:30
Thomas Munro	7ea21f4ee2	Fix typo in docs. Typo in previous commit.	2025-04-08 22:02:45 +12:00
Thomas Munro	f78ca6f3eb	Introduce file_copy_method setting. It can be set to either COPY (the default) or CLONE if the system supports it. CLONE causes callers of copydir(), currently CREATE DATABASE ... STRATEGY=FILE_COPY and ALTER DATABASE ... SET TABLESPACE = ..., to use copy_file_range (Linux, FreeBSD) or copyfile (macOS) to copy files instead of a read-write loop over the contents. CLONE gives the kernel the opportunity to share block ranges on copy-on-write file systems and push copying down to storage on others, depending on configuration. On some systems CLONE can be used to clone large databases quickly with CREATE DATABASE ... TEMPLATE=source STRATEGY=FILE_COPY. Other operating systems could be supported; patches welcome. Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com	2025-04-08 21:35:38 +12:00
Daniel Gustafsson	042a66291b	Add function to get memory context stats for processes This adds a function for retrieving memory context statistics and information from backends as well as auxiliary processes. The intended usecase is cluster debugging when under memory pressure or unanticipated memory usage characteristics. When calling the function it sends a signal to the specified process to submit statistics regarding its memory contexts into dynamic shared memory. Each memory context is returned in detail, followed by a cumulative total in case the number of contexts exceed the max allocated amount of shared memory. Each process is limited to use at most 1Mb memory for this. A summary can also be explicitly requested by the user, this will return the TopMemoryContext and a cumulative total of all lower contexts. In order to not block on busy processes the caller specifies the number of seconds during which to retry before timing out. In the case where no statistics are published within the set timeout, the last known statistics are returned, or NULL if no previously published statistics exist. This allows dash- board type queries to continually publish even if the target process is temporarily congested. Context records contain a timestamp to indicate when they were submitted. Author: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com	2025-04-08 11:06:56 +02:00
Andres Freund	15f0cb26b5	Increase BAS_BULKREAD based on effective_io_concurrency Before, BAS_BULKREAD was always of size 256kB. With the default io_combine_limit of 16, that only allowed 1-2 IOs to be in flight - insufficient even on very low latency storage. We don't just want to increase the size to a much larger hardcoded value, as very large rings (10s of MBs of of buffers), appear to have negative performance effects when reading in data that the OS has cached (but not when actually needing to do IO). To address this, increase the size of BAS_BULKREAD to allow for io_combine_limit * effective_io_concurrency buffers getting read in. To prevent the ring being much larger than useful, limit the increased size with GetPinLimit(). The formula outlined above keeps the ring size to sizes for which we have not observed performance regressions, unless very large effective_io_concurrency values are used together with large shared_buffers setting. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/lqwghabtu2ak4wknzycufqjm5ijnxhb4k73vzphlt2a3wsemcd@gtftg44kdim6 Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah@brqs62irg4dt	2025-04-08 02:41:03 -04:00
Andres Freund	dcf7e1697b	Add pg_buffercache_evict_{relation,all} functions In addition to the added functions, the pg_buffercache_evict() function now shows whether the buffer was flushed. pg_buffercache_evict_relation(): Evicts all shared buffers in a relation at once. pg_buffercache_evict_all(): Evicts all shared buffers at once. Both functions provide mechanism to evict multiple shared buffers at once. They are designed to address the inefficiency of repeatedly calling pg_buffercache_evict() for each individual buffer, which can be time-consuming when dealing with large shared buffer pools. (e.g., ~477ms vs. ~2576ms for 16GB of fully populated shared buffers). These functions are intended for developer testing and debugging purposes and are available to superusers only. Minimal tests for the new functions are included. Also, there was no test for pg_buffercache_evict(), test for this added too. No new extension version is needed, as it was already increased this release by ba2a3c2302f. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru> Reviewed-by: Joseph Koshakow <koshy44@gmail.com> Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com	2025-04-08 02:19:32 -04:00
David Rowley	d69d45a5a9	Speedup child EquivalenceMember lookup in planner When planning queries to partitioned tables, we clone all EquivalenceMembers belonging to the partitioned table into em_is_child EquivalenceMembers for each non-pruned partition. For partitioned tables with large numbers of partitions, this meant the ec_members list could become large and code searching that list would become slow. Effectively, the more partitions which were present, the more searches needed to be performed for operations such as find_ec_member_matching_expr() during create_plan() and the more partitions present, the longer these searches would take, i.e., a quadratic slowdown. To fix this, here we adjust how we store EquivalenceMembers for em_is_child members. Instead of storing these directly in ec_members, these are now stored in a new array of Lists in the EquivalenceClass, which is indexed by the relid. When we want to find EquivalenceMembers belonging to a certain child relation, we can narrow the search to the array element for that relation. To make EquivalenceMember lookup easier and to reduce the amount of code change, this commit provides a pair of functions to allow iteration over the EquivalenceMembers of an EC which also handles finding the child members, if required. Callers that never need to look at child members can remain using the foreach loop over ec_members, which will now often be faster due to only parent-level members being stored there. The actual performance increases here are highly dependent on the number of partitions and the query being planned. Performance increases can be visible with as few as 8 partitions, but the speedup is marginal for such low numbers of partitions. The speedups become much more visible with a few dozen to hundreds of partitions. With some tested queries using 56 partitions, the planner was around 3x faster than before. For use cases with thousands of partitions, these are likely to become significantly faster. Some testing has shown planner speedups of 60x or more with 8192 partitions. Author: Yuya Watari <watari.yuya@gmail.com> Co-authored-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Tested-by: Thom Brown <thom@linux.com> Tested-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com> Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com	2025-04-08 18:09:57 +12:00
Amit Kapila	105b2cb336	Stabilize 035_standby_logical_decoding.pl. Some tests try to invalidate logical slots on the standby server by running VACUUM on the primary. The problem is that xl_running_xacts was getting generated and replayed before the VACUUM command, leading to the advancement of the active slot's catalog_xmin. Due to this, active slots were not getting invalidated, leading to test failures. We fix it by skipping the generation of xl_running_xacts for the required tests with the help of injection points. As the required interface for injection points was not present in back branches, we fixed the failing tests in them by disallowing the slot to become active for the required cases (where rows_removed conflict could be generated). Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 16, where it was introduced Discussion: https://postgr.es/m/Z6oQXc8LmiTLfwLA@ip-10-97-1-34.eu-west-3.compute.internal	2025-04-08 09:38:02 +05:30
Bruce Momjian	46b4ba533c	Fix PG 17 [NOT] NULL optimization bug for domains A PG 17 optimization allowed columns with NOT NULL constraints to skip table scans for IS NULL queries, and to skip IS NOT NULL checks for IS NOT NULL queries. This didn't work for domain types, since domain types don't follow the IS NULL/IS NOT NULL constraint logic. To fix, disable this optimization for domains for PG 17+. Reported-by: Jan Behrens Diagnosed-by: Tom Lane Discussion: https://postgr.es/m/Z37p0paENWWUarj-@momjian.us Backpatch-through: 17	2025-04-07 21:33:42 -04:00
Michael Paquier	039549d70f	Flush the IO statistics of active WAL senders more frequently WAL senders do not flush their statistics until they exit, limiting the monitoring possible for live processes. This is penalizing when WAL senders are running for a long time, like in streaming or logical replication setups, because it is not possible to know the amount of IO they generate while running. This commit makes WAL senders more aggressive with their statistics flush, using an internal of 1 second, with the flush timing calculated based on the existing GetCurrentTimestamp() done before the sleeps done to wait for some activity. Note that the sleep done for logical and physical WAL senders happens in two different code paths, so the stats flushes need to happen in these two places. One test is added for the physical WAL sender case, and one for the logical WAL sender case. This can be done in a stable fashion by relying on the WAL generated by the TAP tests in combination with a stats reset while a server is running, but only on HEAD as WAL data has been added to pg_stat_io in a051e71e28a1. This issue exists since a9c70b46dbe and the introduction of pg_stat_io, so backpatch down to v16. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/Z73IsKBceoVd4t55@ip-10-97-1-34.eu-west-3.compute.internal Backpatch-through: 16	2025-04-08 07:57:19 +09:00
Tomas Vondra	ba2a3c2302	Add pg_buffercache_numa view with NUMA node info Introduces a new view pg_buffercache_numa, showing NUMA memory nodes for individual buffers. For each buffer the view returns an entry for each memory page, with the associated NUMA node. The database blocks and OS memory pages may have different size - the default block size is 8KB, while the memory page is 4K (on x86). But other combinations are possible, depending on configure parameters, platform, etc. This means buffers may overlap with multiple memory pages, each associated with a different NUMA node. To determine the NUMA node for a buffer, we first need to touch the memory pages using pg_numa_touch_mem_if_required, otherwise we might get status -2 (ENOENT = The page is not present), indicating the page is either unmapped or unallocated. The view may be relatively expensive, especially when accessed for the first time in a backend, as it touches all memory pages to get reliable information about the NUMA node. This may also force allocation of the shared memory. Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com	2025-04-07 23:08:17 +02:00
Tomas Vondra	8cc139bec3	Introduce pg_shmem_allocations_numa view Introduce new pg_shmem_alloctions_numa view with information about how shared memory is distributed across NUMA nodes. For each shared memory segment, the view returns one row for each NUMA node backing it, with the total amount of memory allocated from that node. The view may be relatively expensive, especially when executed for the first time in a backend, as it has to touch all memory pages to get reliable information about the NUMA node. This may also force allocation of the shared memory. Unlike pg_shmem_allocations, the view does not show anonymous shared memory allocations. It also does not show memory allocated using the dynamic shared memory infrastructure. Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com	2025-04-07 23:08:17 +02:00
Tomas Vondra	65c298f61f	Add support for basic NUMA awareness Add basic NUMA awareness routines, using a minimal src/port/pg_numa.c portability wrapper and an optional build dependency, enabled by --with-libnuma configure option. For now this is Linux-only, other platforms may be supported later. A built-in SQL function pg_numa_available() allows checking NUMA support, i.e. that the server was built/linked with the NUMA library. The main function introduced is pg_numa_query_pages(), which allows determining the NUMA node for individual memory pages. Internally the function uses move_pages(2) syscall, as it allows batching, and is more efficient than get_mempolicy(2). Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com	2025-04-07 23:08:17 +02:00
Álvaro Herrera	17bcf4f545	Use specific collation where needed in new test Oversight in commit a379061a22a8. Per Czech buildfarm members jay and hippopotamus.	2025-04-07 21:58:06 +02:00
Tom Lane	8cfbdf8f4d	Fix some issues in contrib/spi/refint.c. check_foreign_key incorrectly used a single cache entry for its saved plans for a 'c' (cascade) trigger, although there are two different queries to execute depending on whether it fires for an update or a delete. This caused the wrong things to be done if both types of event occur in one session. (This was indeed visible in the triggers regression test, but apparently nobody ever questioned it.) To fix, add the operation type to the cache key. Its debug log output failed to distinguish update from delete events, too. Also, change the intended trigger usage from BEFORE ROW to AFTER ROW, and add checks insisting on that usage. BEFORE is really rather unsafe, since if there are other BEFORE triggers they might change or cancel the operation we are trying to check. AFTER triggers are the standard way to propagate changes to other rows, so we should follow that way here. In passing, remove a useless duplicate lookup of the cache entry. This code is mostly intended as a documentation example, so we won't consider a back-patch. Author: Dmitrii Bondar <d.bondar@postgrespro.ru> Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Lilian Ontowhee <ontowhee@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/79755a2b18ed4fe5e29da6a87a1e00d1@postgrespro.ru	2025-04-07 15:54:16 -04:00
Andres Freund	8e293e689b	aio: Make AIO more compatible with valgrind In some edge cases valgrind flags issues with the memory referenced by IOs. All of the cases addressed in this change are false positives. Most of the false positives are caused by UnpinBuffer[NoOwner] marking buffer data as inaccessible. This happens even though the AIO subsystem still holds a pin. That's good, there shouldn't be accesses to the buffer outside of AIO related code until it is pinned by "user" code again. But it requires some explicit work - if the buffer is not pinned by the current backend, we need to explicitly mark the buffer data accessible/inaccessible while executing completion callbacks. That however causes a cascading issue in IO workers: After the completion callbacks for a buffer is executed, the page is marked as inaccessible. If subsequently the same worker is executing IO targeting the same buffer, we would get an error, as the memory is still marked inaccessible. To avoid that, we need to explicitly mark the memory as accessible in IO workers. Another issue is that IO executed in workers or via io_uring will not mark memory as DEFINED. In the case of workers that is because valgrind does not track memory definedness across processes. For io_uring that is because valgrind does not understand io_uring, and therefore its IOs never mark memory as defined, whether the completions are processed in the defining process or in another context. It's not entirely clear how to best solve that. The current user of AIO is not affected, as it explicitly marks buffers as DEFINED & NOACCESS anyway. Defer solving this issue until we have a user with different needs. Per buildfarm animal skink. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/3pd4322mogfmdd5nln3zphdwhtmq3rzdldqjwb2sfqzcgs22lf@ok2gletdaoe6	2025-04-07 15:20:30 -04:00
Andres Freund	8ab4241b9f	localbuf: Add Valgrind buffer access instrumentation This mirrors 1e0dfd166b3 (+ 46ef520b9566), for temporary table buffers. This is mainly interesting right now because the AIO work currently triggers spurious valgrind errors, and the fix for that is cleaner if temp buffers behave the same as shared buffers. This requires one change beyond the annotations themselves, namely to pin local buffers while writing them out in FlushRelationBuffers(). Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/3pd4322mogfmdd5nln3zphdwhtmq3rzdldqjwb2sfqzcgs22lf@ok2gletdaoe6	2025-04-07 15:20:30 -04:00
Masahiko Sawada	a13d49014d	doc: Fix a typo in pg_recvlogical documentation. Oversight in cf2655a9029a. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Discussion: https://postgr.es/m/OS3PR01MB5718DD1466E2B9043448AE5094AA2@OS3PR01MB5718.jpnprd01.prod.outlook.com	2025-04-07 12:13:08 -07:00
Tom Lane	969ab9d4f5	Follow-up fixes for SHA-2 patch (commit 749a9e20c). This changes the check for valid characters in the salt string to only allow plain ASCII letters and digits. The previous coding was locale-dependent which doesn't really seem like a great idea here; moreover it could not work correctly in multibyte encodings. This fixes a careless pointer-use-after-pfree, too. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Andres Freund <andres@anarazel.de> Author: Bernd Helmle <mailings@oopsware.de> Discussion: https://postgr.es/m/6fab35422df6b6b9727fdcc243c5fa1c667dd3b5.camel@oopsware.de	2025-04-07 14:14:28 -04:00
Tom Lane	b73e6d71a8	Fix erroneous construction of functions' dependencies on transforms. The list of transform objects that a function should use is specified in CREATE FUNCTION's TRANSFORM clause, and then represented indirectly in pg_proc.protrftypes. However, ProcedureCreate completely ignored that for purposes of constructing pg_depend entries, and instead made the function depend on any transforms that exist for its parameter or return data types. This is bad in both directions: the function could be made dependent on a transform it does not actually use, or it could try to use a transform that's since been dropped. (The latter scenario would require use of a transform that's not for any of the parameter or return types, but that seems legit for cases where the function performs SQL operations internally.) To fix, pass in the list of transform objects that CreateFunction identified, and build pg_depend entries from that not from the parameter/return types. This results in changes in the expected test outputs in contrib/bool_plperl, which I guess are due to different ordering of pg_depend entries -- that test case is surely not exercising either of the problem scenarios. This fix is not back-patchable as-is: changing the signature of ProcedureCreate seems too risky in stable branches. We could do something like making ProcedureCreate a wrapper around ProcedureCreateExt or so. However, I'm more inclined to do nothing in the back branches. We had no field complaints up to now, so the hazards don't seem to be a big issue in practice. And we couldn't do anything about existing pg_depend entries, so a back-patched fix would result in a mishmash of dependencies created according to different rules. That cure could be worse than the disease, perhaps. I bumped catversion just to lay down a marker that the expected contents of pg_depend are a bit different than before. Reported-by: Chapman Flack <jcflack@acm.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3112950.1743984111@sss.pgh.pa.us	2025-04-07 13:31:37 -04:00
Álvaro Herrera	a379061a22	Allow NOT NULL constraints to be added as NOT VALID This allows them to be added without scanning the table, and validating them afterwards without holding access exclusive lock on the table after any violating rows have been deleted or fixed. Doing ALTER TABLE ... SET NOT NULL for a column that has an invalid not-null constraint validates that constraint. ALTER TABLE .. VALIDATE CONSTRAINT is also supported. There are various checks on whether an invalid constraint is allowed in a child table when the parent table has a valid constraint; this should match what we do for enforced/not enforced constraints. pg_attribute.attnotnull is now only an indicator for whether a not-null constraint exists for the column; whether it's valid or invalid must be queried in pg_constraint. Applications can continue to query pg_attribute.attnotnull as before, but now it's possible that NULL rows are present in the column even when that's set to true. For backend internal purposes, we cache the nullability status in CompactAttribute->attnullability that each tuple descriptor carries (replacing CompactAttribute.attnotnull, which was a mirror of Form_pg_attribute.attnotnull). During the initial tuple descriptor creation, based on the pg_attribute scan, we set this to UNRESTRICTED if pg_attribute.attnotnull is false, or to UNKNOWN if it's true; then we update the latter to VALID or INVALID depending on the pg_constraint scan. This flag is also copied when tupledescs are copied. Comparing tuple descs for equality must also compare the CompactAttribute.attnullability flag and return false in case of a mismatch. pg_dump deals with these constraints by storing the OIDs of invalid not-null constraints in a separate array, and running a query to obtain their properties. The regular table creation SQL omits them entirely. They are then dealt with in the same way as "separate" CHECK constraints, and dumped after the data has been loaded. Because no additional pg_dump infrastructure was required, we don't bump its version number. I decided not to bump catversion either, because the old catalog state works perfectly in the new world. (Trying to run with new catalog state and the old server version would likely run into issues, however.) System catalogs do not support invalid not-null constraints (because commit 14e87ffa5c54 didn't allow them to have pg_constraint rows anyway.) Author: Rushabh Lathia <rushabh.lathia@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Tested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAGPqQf0KitkNack4F5CFkFi-9Dqvp29Ro=EpcWt=4_hs-Rt+bQ@mail.gmail.com	2025-04-07 19:19:50 +02:00
Andrew Dunstan	b52a4a5f28	Clean up error messages from 1495eff7bdb Quote file names, and mostly avoid hard coded file names. Along the way make a few other minor improvements. Discussion: https://postgr.es/m/20250407.152721.1397761902317499205.horikyota.ntt@gmail.com	2025-04-07 12:22:41 -04:00
Tom Lane	3516ea768c	Add local-address escape "%L" to log_line_prefix. This escape shows the numeric server IP address that the client has connected to. Unix-socket connections will show "[local]". Non-client processes (e.g. background processes) will show "[none]". We expect that this option will be of interest to only a fairly small number of users. Therefore the implementation is optimized for the case where it's not used (that is, we don't do the string conversion until we have to), and we've not added the field to csvlog or jsonlog formats. Author: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Cary Huang <cary.huang@highgo.ca> Reviewed-by: David Steele <david@pgmasters.net> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAKAnmmK-U+UicE-qbNU23K--Q5XTLdM6bj+gbkZBZkjyjrd3Ow@mail.gmail.com	2025-04-07 11:06:05 -04:00
Andrew Dunstan	8f5e419484	Revert "Use workaround of __builtin_setjmp only on MINGW on MSVCRT" This reverts commit c313fa4602defe1be947370ab5b217ca163a1e3c. This is found to cause issues on x86_64 Windows even when using UCRT. Discussion: https://postgr.es/m/3312149.1744001936@sss.pgh.pa.us	2025-04-07 11:01:15 -04:00
Andres Freund	8ce79483dc	read_stream: Fix overflow hazard with large shared buffers If the limit returned by GetAdditionalPinLimit() is large, the buffer_limit variable in read_stream_start_pending_read() can overflow. While the code is careful to limit buffer_limit PG_INT16_MAX, we subsequently add the number of forwarded buffers. The overflow can lead to assertion failures, crashes or wrong query results when using large shared buffers. It seems easier to avoid this if we make the buffer_limit variable an int, instead of an int16. Do so, and clamp buffer_limit after adding the number of forwarded buffers. It's possible we might want to address this and related issues more widely by changing to int instead of int16 more widely, but since the consequences of this bug can be confusing, it seems better to fix it now. This bug was introduced in ed0b87caaca. Discussion: https://postgr.es/m/ewvz3cbtlhrwqk7h6ca6cctiqh7r64ol3pzb3iyjycn2r5nxk5@tnhw3a5zatlr	2025-04-07 09:45:00 -04:00
Alexander Korotkov	717d0e8dd9	Remove GUC_NOT_IN_SAMPLE from enable_self_join_elimination fc069a3a6319 implements Self-Join Elimination (SJE) and provides a new GUC variable: enable_self_join_elimination. This new GUC variable was marked as GUC_NOT_IN_SAMPLE. However, enable_self_join_elimination is documented and is not different from any other enable_* GUCs. Thus, remove GUC_NOT_IN_SAMPLE from it and add it to the postgresql.conf.sample. Discussion: https://postgr.es/m/CAPpHfdsqMTEsmxk3aQwt6xPz%2BKpUELO%3D6fzmER9ZRGrbs4uMfA%40mail.gmail.com Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2025-04-07 16:28:54 +03:00
Daniel Gustafsson	ae60947643	psql: Clarify help message for WATCH_INTERVAL The help message for WATCH_INTERVAL was hard to interpret and didn't follow the style of other messages, this updates it to nake it fit in better and be easier to interpret. Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Discussion: https://postgr.es/m/20250326.120732.1167093737847500721.horikyota.ntt@gmail.com	2025-04-07 13:44:58 +02:00
Michael Paquier	d6f118444d	Fix grammar in log message of pg_restore.c Introduced by 1495eff7bdb0. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20250407.151359.72428746612514925.horikyota.ntt@gmail.com	2025-04-07 15:37:34 +09:00
Michael Paquier	2c7bd2ba50	libpq: Fix some issues in TAP tests for service files The valid service file was not correctly shaped, as append_to_file() was called with an array as input. This is changed so as the parameter and value pairs from the valid connection string are appended to the valid service file one by one. Even with the first issue fixed, the tests should fail. However, they have been passing because all the connection attempts relied on the default values given to PGPORT and PGHOST from the node when using Cluster.pm's connect_ok() and connect_fails(), rather than the data in the service file. The test is updated to use an interesting trick: a dummy node is initialized but not started, and all the connection attempts are done through it. This ensures that the data inside the service file is used for all the connection tests. Note that breaking the contents of the valid service file on purpose makes all the tests that rely on it fail. Issues introduced by 72c2f36d5727. Author: Andrew Jackson <andrewjackson947@gmail.com> Discussion: https://postgr.es/m/CAKK5BkG_6_YSaebM6gG=8EuKaY7_VX1RFgYeySuwFPh8FZY73g@mail.gmail.com	2025-04-07 12:55:09 +09:00
Michael Paquier	c36eda2591	Clarify comment for worst-case allocation in quote_literal_cstr() palloc() is invoked with a specific formula for its allocation size in quote_literal_cstr(). This wastes some memory, but the size is large enough to cover even the worst-case scenarios. No explanations were given about the reasons behind these numbers. This commit adds more documentation about all that. Author: Steve Chavez <steve@supabase.io> Discussion: https://postgr.es/m/CAGRrpzZ9bToRWS+fAnjxDJrxwZN1QcJ-y1Pn2yg=Hst6rydLtw@mail.gmail.com	2025-04-07 10:02:12 +09:00
Michael Paquier	3191a593d6	Fix use-after-free in pgstat_fetch_stat_backend_by_pid() stats_fetch_consistency set to "snapshot" causes the backend entry "beentry" retrieved by pgstat_get_beentry_by_proc_number() to be reset at the beginning of pgstat_fetch_stat_backend() when fetching the backend pgstats entry. As coded, "beentry" was being accessed after being freed. This commit moves all the accesses to "beentry" to happen before calling pgstat_fetch_stat_backend(), fixing the problem. This problem could be reached by calling the SQL functions pg_stat_get_backend_io() or pg_stat_get_backend_wal(). Issue caught by valgrind. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/f1788cc0-253a-4a3a-aee0-1b8ab9538736@gmail.com	2025-04-07 09:51:40 +09:00
Fujii Masao	173c97812f	Use XLOG_CONTROL_FILE macro consistently for control file name. The XLOG_CONTROL_FILE macro (defined in access/xlog_internal.h) represents the control file name. While some parts of the codebase already use this macro, others previously hardcoded the file name as a string. This commit replaces those hardcoded strings with the macro, ensuring consistent usage throughout the code. This makes future maintenance easier and improves searchability, for example when grepping for control file usage. Author: Anton A. Melnikov <a.melnikov@postgrespro.ru> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Masao Fujii <masao.fujii@gmail.com> Discussion: https://postgr.es/m/0841ec77-47e5-452a-adb4-c6fa55d605fc@postgrespro.ru	2025-04-07 09:27:33 +09:00
Daniel Gustafsson	a233a603ba	doc: Clarify project naming Clarify the project naming in the history section of the docs to match the recent license preamble changes. Backpatch to all supported versions. Author: Dave Page <dpage@pgadmin.org> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA+OCxozLzK2+Jc14XZyWXSp6L9Ot+3efwXUE35FJG=fsbib2EA@mail.gmail.com Backpatch-through: 13	2025-04-07 00:03:18 +02:00
Andrew Dunstan	643a1a6198	Clean up checking for pg_dumpall output directory Coverity objected to the original code, and in any case this is much cleaner, using the existing routine pg_check_dir() instead of rolling its own test. Per suggestion from Tom Lane.	2025-04-06 17:04:58 -04:00
Tom Lane	218ab68275	Doc: fix PDF "contents ... exceed the available area" warnings. Tweak column widths in a new table, similarly to some previous fixes such as b62381d9a. Per buildfarm.	2025-04-06 16:27:39 -04:00
Nathan Bossart	de48056ec7	pg_upgrade: Fix memory leak in check_for_unicode_update(). This function was initializing the "task" variable before a couple of early returns. To fix, postpone the initialization until just before it's needed. Per Coverity. Discussion: https://postgr.es/m/Z_KMsUH2-FEbiNjC%40nathan	2025-04-06 15:11:41 -05:00
Andres Freund	57dec20fd4	aio: Avoid spurious coverity warning PgAioResult.result is never accessed in the relevant path, but coverity complains about an uninitialized access anyway. So just zero-initialize the whole thing. While at it, reduce the scope of the variable. Reported-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/CAEudQApsKqd-s+fsUQ0OmxJAMHmBSXxrAz3dCs+uvqb3iRtjSw@mail.gmail.com	2025-04-06 12:07:02 -04:00
Tom Lane	8ab6ef2bb8	Fix memory leaks in px_crypt_shacrypt(). Per Coverity. I don't think these are of any actual significance since the function ought to be invoked in a short-lived context. Still, if it's trying to be neat it should get it right. Also const-ify a constant and fix up typedef formatting.	2025-04-06 11:57:22 -04:00
Tom Lane	2e4ccf1b45	Use "(void)" to mark pgstat_lock_entry(..., false) calls. This should silence Coverity's complaints about the result being sometimes ignored. I'm inclined to think that these routines are simply misdesigned, because sometimes it's okay to ignore the result and sometimes it isn't, and we have no way to enforce the latter. But for now I just added a comment.	2025-04-06 11:37:09 -04:00
Andrew Dunstan	5e19154390	Avoid unnecessary copying of a string in pg_restore.c Coverity complained about a possible overrun in the copy, but there is no actual need to copy the string at all.	2025-04-06 09:21:09 -04:00
Andrew Dunstan	6d5417e634	Fix a couple of memory leaks in pg_restore.c per complaint from Coverity.	2025-04-06 09:09:25 -04:00
Peter Eisentraut	a8025f5448	Relax ordering-related hardcoded btree requirements in planning There were several places in ordering-related planning where a requirement for btree was hardcoded but an amcanorder index could suffice. This fixes that. We just need to do the necessary mapping between strategy numbers and compare types and adjust some related APIs so that this works independent of btree strategy numbers. For instance, non-btree amcanorder indexes can now be used to support sorting and merge joins. Also, predtest.c works independent of btree strategy numbers now. To avoid performance regressions, some details on btree and other built-in index types are still hardcoded as shortcuts, but other index types now have access to the same features by providing the required flags and callbacks. Author: Mark Dilger <mark.dilger@enterprisedb.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-04-06 14:43:51 +02:00
Alexander Korotkov	3a1a7c5a70	Revert "Put enable_self_join_elimination into postgresql.conf.sample" This reverts commit c2d329260cd8. Reported-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/D292EB44-806E-439A-82A4-491A1BA59E7A%40yesql.se	2025-04-06 14:30:20 +03:00
Alexander Korotkov	c2d329260c	Put enable_self_join_elimination into postgresql.conf.sample fc069a3a6319 implements Self-Join Elimination (SJE) and provides a new GUC variable: enable_self_join_elimination. This commit adds enable_self_join_elimination to the postgresql.conf.sample, as it was forgotten in the original commit. Discussion: https://postgr.es/m/CAHewXN%3D%2Bghd6O6im46q7j2u6c3H6vkXtXmF%3D_v4CfGSnjje8PA%40mail.gmail.com Author: Tender Wang <tndrwang@gmail.com>	2025-04-06 13:24:16 +03:00
John Naylor	3c6e8c1238	Compute CRC32C using AVX-512 instructions where available The previous implementation of CRC32C on x86 relied on the native CRC32 instruction from the SSE 4.2 extension, which operates on up to 8 bytes at a time. We can get a substantial speedup by using carryless multiplication on SIMD registers, processing 64 bytes per loop iteration. Shorter inputs fall back to ordinary CRC instructions. On Intel Tiger Lake hardware (2020), CRC is now 50% faster for inputs between 64 and 112 bytes, and 3x faster for 256 bytes. The VPCLMULQDQ instruction on 512-bit registers has been available on Intel hardware since 2019 and AMD since 2022. There is an older variant for 128-bit registers, but at least on Zen 2 it performs worse than normal CRC instructions for short inputs. We must now do a runtime check, even for builds that target SSE 4.2. This doesn't matter in practice for WAL (arguably the most critical case), because since commit e2809e3a1 the final computation with the 20-byte WAL header is inlined and unrolled when targeting that extension. Compared with two direct function calls, testing showed equal or slightly faster performance in performing an indirect function call on several dozen bytes followed by inlined instructions on constant input of 20 bytes. The MIT-licensed implementation was generated with the "generate" program from https://github.com/corsix/fast-crc32/ Based on: "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction" V. Gopal, E. Ozturk, et al., 2009 Co-authored-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> Co-authored-by: Paul Amonson <paul.d.amonson@intel.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Matthew Sterrett <matthewsterrett2@gmail.com> (earlier version) Tested-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> Tested-by: David Rowley <<dgrowleyml@gmail.com>> (earlier version) Discussion: https://postgr.es/m/BL1PR11MB530401FA7E9B1CA432CF9DC3DC192@BL1PR11MB5304.namprd11.prod.outlook.com Discussion: https://postgr.es/m/PH8PR11MB82869FF741DFA4E9A029FF13FBF72@PH8PR11MB8286.namprd11.prod.outlook.com	2025-04-06 14:04:30 +07:00
Daniel Gustafsson	683df3f4de	Quote filename in error message Project standard is to quote filenames in error and log messages, which commit 2da74d8d640 missed in two error messages. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/20250404.120328.103562371975971823.horikyota.ntt@gmail.com	2025-04-05 22:10:28 +02:00

1 2 3 4 5 ...

61148 Commits