postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-12-27 08:39:28 +08:00

Author	SHA1	Message	Date
Peter Geoghegan	b4af70cb21	Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com	2021-04-05 13:21:44 -07:00
Stephen Frost	6c3ffd697e	Add pg_read_all_data and pg_write_all_data roles A commonly requested use-case is to have a role who can run an unfettered pg_dump without having to explicitly GRANT that user access to all tables, schemas, et al, without that role being a superuser. This address that by adding a "pg_read_all_data" role which implicitly gives any member of this role SELECT rights on all tables, views and sequences, and USAGE rights on all schemas. As there may be cases where it's also useful to have a role who has write access to all objects, pg_write_all_data is also introduced and gives users implicit INSERT, UPDATE and DELETE rights on all tables, views and sequences. These roles can not be logged into directly but instead should be GRANT'd to a role which is able to log in. As noted in the documentation, if RLS is being used then an administrator may (or may not) wish to set BYPASSRLS on the login role which these predefined roles are GRANT'd to. Reviewed-by: Georgios Kokolatos Discussion: https://postgr.es/m/20200828003023.GU29590@tamriel.snowman.net	2021-04-05 13:42:52 -04:00
Fujii Masao	ad8b674922	Shut down transaction tracking at startup process exit. Maxim Orlov reported that the shutdown of standby server could result in the following assertion failure. The cause of this issue was that, when the shutdown caused the startup process to exit, recovery-time transaction tracking was not shut down even if it's already initialized, and some locks the tracked transactions were holding could not be released. At this situation, if other process was invoked and the PGPROC entry that the startup process used was assigned to it, it found such unreleased locks and caused the assertion failure, during the initialization of it. TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))" This commit fixes this issue by making the startup process shut down transaction tracking and release all locks, at the exit of it. Back-patch to all supported branches. Reported-by: Maxim Orlov Author: Fujii Masao Reviewed-by: Maxim Orlov Discussion: https://postgr.es/m/ad4ce692cc1d89a093b471ab1d969b0b@postgrespro.ru	2021-04-06 02:25:37 +09:00
Alvaro Herrera	6734e80695	Align some terms in arch-dev.sgml to glossary This mostly adds links to the glossary to the existing text, instead of using <firstterm>. Heikki left this out of `29ad6595ef` out of stylistic concerns; these have since been addressed. Author: Jürgen Purtz <juergen@purtz.de> Discussion: https://postgr.es/m/67d7240f-8596-83fc-5e15-af06c128a0f5@purtz.de	2021-04-05 11:46:13 -04:00
Peter Eisentraut	a63dd8afe2	Renumber cursor option flags Move the planner-control flags up so that there is more room for parse options. Some pending patches need some room there, so do this renumbering separately so that there is less potential for conflicts.	2021-04-05 09:10:27 +02:00
Michael Paquier	9f6f1f9b8e	Fix typo in collationcmds.c Introduced by `51e225d`. Author: Anton Voloshin Discussion: https://postgr.es/m/05477da0-703c-7de7-998c-5879738e8f39@postgrespro.ru	2021-04-05 11:18:12 +09:00
Michael Paquier	c50624cdd2	Refactor all TAP test suites doing connection checks This commit refactors more TAP tests to adapt with the recent introduction of connect_ok() and connect_fails() in PostgresNode, introduced by `0d1a3343`. This changes the following test suites to use the same code paths for connection checks: - Kerberos - LDAP - SSL - Authentication Those routines are extended to be able to handle optional parameters that are set depending on each suite's needs, as of: - custom SQL query. - expected stderr matching pattern. - expected stdout matching pattern. The new design is extensible with more parameters, and there are some plans for those routines in the future with checks based on the contents of the backend logs. Author: Jacob Champion, Michael Paquier Discussion: https://postgr.es/m/d17b919e27474abfa55d97786cb9cfadfe2b59e9.camel@vmware.com	2021-04-05 10:13:57 +09:00
Tom Lane	dfc843d465	Fix more confusion in SP-GiST. spg_box_quad_leaf_consistent unconditionally returned the leaf datum as leafValue, even though in its usage for poly_ops that value is of completely the wrong type. In versions before 12, that was harmless because the core code did nothing with leafValue in non-index-only scans ... but since commit `2a6368343`, if we were doing a KNN-style scan, spgNewHeapItem would unconditionally try to copy the value using the wrong datatype parameters. Said copying is a waste of time and space if we're not going to return the data, but it accidentally failed to fail until I fixed the datatype confusion in `ac9099fc1`. Hence, change spgNewHeapItem to not copy the datum unless we're actually going to return it later. This saves cycles and dodges the question of whether lossy opclasses are returning the right type. Also change spg_box_quad_leaf_consistent to not return data that might be of the wrong type, as insurance against somebody introducing a similar bug into the core code in future. It seems like a good idea to back-patch these two changes into v12 and v13, although I'm afraid to change spgNewHeapItem's mistaken idea of which datatype to use in those branches. Per buildfarm results from `ac9099fc1`. Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us	2021-04-04 17:57:07 -04:00
Tom Lane	ac9099fc1d	Fix confusion in SP-GiST between attribute type and leaf storage type. According to the documentation, the attType passed to the opclass config function (and also relied on by the core code) is the type of the heap column or expression being indexed. But what was actually being passed was the type stored for the index column. This made no difference for user-defined SP-GiST opclasses, because we weren't allowing the STORAGE clause of CREATE OPCLASS to be used, so the two types would be the same. But it's silly not to allow that, seeing that the built-in poly_ops opclass has a different value for opckeytype than opcintype, and that if you want to do lossy storage then the types must really be different. (Thus, user-defined opclasses doing lossy storage had to lie about what type is in the index.) Hence, remove the restriction, and make sure that we use the input column type not opckeytype where relevant. For reasons of backwards compatibility with existing user-defined opclasses, we can't quite insist that the specified leafType match the STORAGE clause; instead just add an amvalidate() warning if they don't match. Also fix some bugs that would only manifest when trying to return index entries when attType is different from attLeafType. It's not too surprising that these have not been reported, because the only usual reason for such a difference is to store the leaf value lossily, rendering index-only scans impossible. Add a src/test/modules module to exercise cases where attType is different from attLeafType and yet index-only scan is supported. Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us	2021-04-04 14:28:57 -04:00
Tomas Vondra	d9c5b9a9ee	Fix bug in brin_minmax_multi_union When calling sort_expanded_ranges() we need to remember the return value, because the function sorts and also deduplicates the ranges. So the number of ranges may decrease. brin_minmax_multi_union failed to do that, which resulted in crashes due to bogus ranges (equal minval/maxval but not marked as compacted). Reported-by: Jaime Casanova Discussion: https://postgr.es/m/20210404052550.GA4376%40ahch-to	2021-04-04 19:36:12 +02:00
Tomas Vondra	4908684dda	Add regression test for minmax-multi macaddr8 type The regression test for BRIN minmax-multi opclasses tested almost all supported data types, with the exception of macaddr8. So this adds it.	2021-04-04 19:26:55 +02:00
Tomas Vondra	1dad2a5ea3	Fix order of parameters in BRIN minmax-multi calls The BRIN minmax-multi consistent function incorrectly assumed it can lookup an operator, and then swap the arguments to get the commutator. For example <(a,b) would be called as <(b,a) to get >(a,b). This works when the arguments are of the same type, but with cross-type opclasses this fails. We can't swap <(float4,float8) arguments, for example. Fixed by passing arguments in the right order. Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com	2021-04-04 19:25:41 +02:00
Tomas Vondra	e1fbe1181c	Fix BRIN minmax-multi distance for inet type The distance calculation ignored the mask, unlike the inet comparator, which resulted in negative distance in some cases. Fixed by applying the mask in brin_minmax_multi_distance_inet. I've considered simply calling inetmi() to calculate the delta, but that does not consider mask either. Reviewed-by: Zhihong Yu Discussion: https://postgr.es/m/1a0a7b9d-9bda-e3a2-7fa4-88f15042a051%40enterprisedb.com	2021-04-04 19:23:32 +02:00
Tomas Vondra	7262f2421a	Fix BRIN minmax-multi distance for timetz type The distance calculation ignored the time zone, so the result of (b-a) might have ended negative even if (b > a). Fixed by considering the time zone difference. Reported-by: Jaime Casanova Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com	2021-04-04 19:22:23 +02:00
Tomas Vondra	2b10e0e3c2	Fix BRIN minmax-multi distance for interval type The distance calculation for interval type was treating months as having 31 days, which is inconsistent with the interval comparator (using 30 days). Due to this it was possible to get negative distance (b-a) when (a<b), trigerring an assert. Fixed by adopting the same logic as interval_cmp_value. Reported-by: Jaime Casanova Discussion: https://postgr.es/m/CAJKUy5jKH0Xhneau2mNftNPtTy-BVgQfXc8zQkEvRvBHfeUThQ%40mail.gmail.com	2021-04-04 19:19:51 +02:00
Tom Lane	55873a00e3	Improve psql's behavior when the editor is exited without saving. When editing the previous query buffer, if the editor is exited without modifying the temp file then clear the query buffer, rather than re-loading (and probably re-executing) the previous query buffer. This reduces the probability of accidentally re-executing something you didn't intend to. Similarly, in "\e file", if the file isn't actually modified then don't load it into the query buffer. And in "\ef" and "\ev", if no changes are made then clear the query buffer instead of loading the function or view definition into it. Cases where we fail to invoke the editor at all, or it returns a nonzero status, are treated like the no-file-modification case. Laurenz Albe, reviewed by Jacob Champion Discussion: https://postgr.es/m/0ba3f2a658bac6546d9934ab6ba63a805d46a49b.camel@cybertec.at	2021-04-03 17:38:31 -04:00
Andres Freund	225a22b19e	Improve efficiency of wait event reporting, remove proc.h dependency. pgstat_report_wait_start() and pgstat_report_wait_end() required two conditional branches so far. One to check if MyProc is NULL, the other to check if pgstat_track_activities is set. As wait events are used around comparatively lightweight operations, and are inlined (reducing branch predictor effectiveness), that's not great. The dependency on MyProc has a second disadvantage: Low-level subsystems, like storage/file/fd.c, report wait events, but architecturally it is preferable for them to not depend on inter-process subsystems like proc.h (defining PGPROC). After this change including pgstat.h (nor obviously its sub-components like backend_status.h, wait_event.h, ...) does not pull in IPC related headers anymore. These goals, efficiency and abstraction, are achieved by having pgstat_report_wait_start/end() not interact with MyProc, but instead a new my_wait_event_info variable. At backend startup it points to a local variable, removing the need to check for MyProc being NULL. During process initialization my_wait_event_info is redirected to MyProc->wait_event_info. At shutdown this is reversed. Because wait event reporting now does not need to know about where the wait event is stored, it does not need to know about PGPROC anymore. The removal of the branch for checking pgstat_track_activities is simpler: Don't check anymore. The cost due to the branch are often higher than the store - and even if not, pgstat_track_activities is rarely disabled. The main motivator to commit this work now is that removing the (indirect) pgproc.h include from pgstat.h simplifies a patch to move statistics reporting to shared memory (which still has a chance to get into 14). Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de	2021-04-03 12:03:45 -07:00
Andres Freund	e1025044cd	Split backend status and progress related functionality out of pgstat.c. Backend status (supporting pg_stat_activity) and command progress (supporting pg_stat_progress*) related code is largely independent from the rest of pgstat.[ch] (supporting views like pg_stat_all_tables that accumulate data over time). See also `a333476b92`. This commit doesn't rename the function names to make the distinction from the rest of pgstat_ clearer - that'd be more invasive and not clearly beneficial. If we were to decide to do such a rename at some point, it's better done separately from moving the code as well. Robert's review was of an earlier version. Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de	2021-04-03 11:42:52 -07:00
Michael Paquier	8d3a4c3eae	Use more verbose matching patterns for errors in SSL TAP tests The TAP tests of src/test/ssl/ have been using rather generic matching patterns to check some failure scenarios, like "SSL error" or just "FATAL". These have been introduced in `081bfc1`. Those messages are not wrong per se, but when working on the integration of new SSL libraries it becomes hard to know if those errors are legit or not, and existing scenarios may fail in incorrect ways. This commit makes all those messages more verbose by adding the information generated by OpenSSL. Fortunately, the same error messages are used for all the versions supported on HEAD (checked that after running the tests from 1.0.1 to 1.1.1), so the change is straight-forward. Reported-by: Jacob Champion, Álvaro Herrera Discussion: https://postgr.es/m/YGU3AxQh0zBMMW8m@paquier.xyz	2021-04-03 20:49:08 +09:00
Michael Paquier	e6bdfd9700	Refactor HMAC implementations Similarly to the cryptohash implementations, this refactors the existing HMAC code into a single set of APIs that can be plugged with any crypto libraries PostgreSQL is built with (only OpenSSL currently). If there is no such libraries, a fallback implementation is available. Those new APIs are designed similarly to the existing cryptohash layer, so there is no real new design here, with the same logic around buffer bound checks and memory handling. HMAC has a dependency on cryptohashes, so all the cryptohash types supported by cryptohash{_openssl}.c can be used with HMAC. This refactoring is an advantage mainly for SCRAM, that included its own implementation of HMAC with SHA256 without relying on the existing crypto libraries even if PostgreSQL was built with their support. This code has been tested on Windows and Linux, with and without OpenSSL, across all the versions supported on HEAD from 1.1.1 down to 1.0.1. I have also checked that the implementations are working fine using some sample results, a custom extension of my own, and doing cross-checks across different major versions with SCRAM with the client and the backend. Author: Michael Paquier Reviewed-by: Bruce Momjian Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz	2021-04-03 17:30:49 +09:00
Andres Freund	1d9c5d0ce2	Do not rely on pgstat.h to indirectly include storage/ headers. An upcoming patch might remove the (now indirect) proc.h include (which in turn includes other headers), and it's cleaner for the modified files to include their dependencies directly anyway... Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de	2021-04-02 20:02:47 -07:00
Andres Freund	a333476b92	Split wait event related code from pgstat.[ch] into wait_event.[ch]. The wait event related code is independent from the rest of the pgstat.[ch] code, of nontrivial size and changes on a regular basis. Put it into its own set of files. As there doesn't seem to be a good pre-existing directory for code like this, add src/backend/utils/activity. Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de	2021-04-02 20:02:26 -07:00
David Rowley	1267d9862f	Remove useless Asserts in Result Cache code Testing if an unsigned variable is >= 0 is pretty pointless. There's likely enough code in remove_cache_entry() to verify the cache memory accounting is correct in assert enabled builds. These Asserts were not adding much extra cover, even if they had been checking >= 0 on a signed variable. Reported-by: Andres Freund Discussion: https://postgr.es/m/20210402204734.6mo3nfacnljlicgn@alap3.anarazel.de	2021-04-03 10:41:43 +13:00
Bruce Momjian	84bc2b1752	Use macro MONTHS_PER_YEAR instead of '12' in /ecpg/pgtypeslib All other places already use MONTHS_PER_YEAR appropriately. Backpatch-through: 9.6	2021-04-02 16:42:38 -04:00
Thomas Munro	c30f54ad73	Detect POLLHUP/POLLRDHUP while running queries. Provide a new GUC check_client_connection_interval that can be used to check whether the client connection has gone away, while running very long queries. It is disabled by default. For now this uses a non-standard Linux extension (also adopted by at least one other OS). POLLRDHUP is not defined by POSIX, and other OSes don't have a reliable way to know if a connection was closed without actually trying to read or write. In future we might consider trying to send a no-op/heartbeat message instead, but that could require protocol changes. Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru> Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Maksim Milyutin <milyutinma@gmail.com> Reviewed-by: Tsunakawa, Takayuki/綱川貴之 <tsunakawa.takay@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version) Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru	2021-04-03 09:02:41 +13:00
Joe Conway	174edbe9f9	Clarify documentation of RESET ROLE Command-line options, or previous "ALTER (ROLE\|DATABASE) ... SET ROLE ..." commands, can change the value of the default role for a session. In the presence of one of these, RESET ROLE will change the current user identifier to the default role rather than the session user identifier. Fix the documentation to reflect this reality. Backpatch to all supported versions. Author: Nathan Bossart Reviewed-By: Laurenz Albe, David G. Johnston, Joe Conway Reported by: Nathan Bossart Discussion: https://postgr.es/m/flat/925134DB-8212-4F60-8AB1-B1231D750CB4%40amazon.com Backpatch-through: 9.6	2021-04-02 13:48:42 -04:00
Fujii Masao	2eb1fc8b1a	pg_checksums: Fix progress reporting. pg_checksums uses two counters, total size and current size, to calculate the progress. Previously the progress that pg_checksums reported could not reach 100% at the end. The cause of this issue was that the sizes of only pages excluding new ones in each file were counted as the current size while the size of each file is counted as the total size. That is, the total size of all new pages could be reported as the difference between the total size and current size. This commit fixes this issue by making pg_checksums count the sizes of all pages including new ones in each file as the current size. Back-patch to v12 where progress reporting was added to pg_checksums. Reported-by: Shinya Kato Author: Shinya Kato Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/TYAPR01MB289656B1ACA0A5E7CAD07BE3C47A9@TYAPR01MB2896.jpnprd01.prod.outlook.com	2021-04-03 00:07:00 +09:00
Tom Lane	53aafdb9ff	Strip file names reported in error messages on Windows, too. Commit `dd136052b` established a policy that error message FILE items should include only the base name of the reporting source file, for uniformity and succinctness. We now observe that some Windows compilers use backslashes in __FILE__ strings, so truncate at backslashes as well. This is expected to fix some platform variation in the results of the new libpq_pipeline test module. Discussion: https://postgr.es/m/3650140.1617372290@sss.pgh.pa.us	2021-04-02 10:43:54 -04:00
Andrew Dunstan	1877c9ac3a	Fix typo in `6d7a6feac4` Per gripe from Daniel Gustafsson	2021-04-02 10:29:58 -04:00
Fujii Masao	b1be3074ac	postgres_fdw: Add option to control whether to keep connections open. This commit adds a new option keep_connections that controls whether postgres_fdw keeps the connections to the foreign server open so that the subsequent queries can re-use them. This option can only be specified for a foreign server. The default is on. If set to off, all connections to the foreign server will be discarded at the end of transaction. Closed connections will be re-established when they are necessary by future queries using a foreign table. This option is useful, for example, when users want to prevent the connections from eating up the foreign servers connections capacity. Author: Bharath Rupireddy Reviewed-by: Alexey Kondratov, Vignesh C, Fujii Masao Discussion: https://postgr.es/m/CALj2ACVvrp5=AVp2PupEm+nAC8S4buqR3fJMmaCoc7ftT0aD2A@mail.gmail.com	2021-04-02 19:45:42 +09:00
Peter Eisentraut	9c5f67fd62	Add support for NullIfExpr in eval_const_expressions Author: Hou Zhijie <houzj.fnst@cn.fujitsu.com> Discussion: https://www.postgresql.org/message-id/flat/7ea5ce773bbc4eea9ff1a381acd3b102@G08CNEXMBPEKD05.g08.fujitsu.local	2021-04-02 11:01:49 +02:00
Fujii Masao	96bdb7e19d	Fix pgstat_report_replslot() to use proper data types for its arguments. The caller of pgstat_report_replslot() passes int64 values to the function. Also the function stores those values in PgStat_Counter (i.e., int64) fields of PgStat_MsgReplSlot struct. But previously the function used "int" as the data types of some arguments for those values, which could lead to the overflow of values. To avoid this risk, this commit fixes pgstat_report_replslot() to use PgStat_Counter type for the arguments. Since they are the statistics counters, PgStat_Counter, the data type used for counters, is used for them instead of int64. Reported-by: Vignesh C Author: Vignesh C Reviewed-by: Jeevan Ladhe, Fujii Masao Discussion: https://postgr.es/m/CALDaNm080OpG=ZwOb0i8EyChH5SyHAMFWJCKaKTXmrfvJLbgaA@mail.gmail.com	2021-04-02 17:27:31 +09:00
Michael Paquier	6fb66c268d	doc: Clarify how to generate backup files with non-exclusive backups The current instructions describing how to write the backup_label and tablespace_map files are confusing. For example, opening a file in text mode on Windows and copy-pasting the file's contents would result in a failure at recovery because of the extra CRLF characters generated. The documentation was not stating that clearly, and per discussion this is not considered as a supported scenario. This commit extends a bit the documentation to mention that it may be required to open the file in binary mode before writing its data. Reported-by: Wang Shenhao Author: David Steele Reviewed-by: Andrew Dunstan, Magnus Hagander Discussion: https://postgr.es/m/8373f61426074f2cb6be92e02f838389@G08CNEXMBPEKD06.g08.fujitsu.local Backpatch-through: 9.6	2021-04-02 16:37:00 +09:00
Fujii Masao	98e5bd103f	Fix typos in comments. Author: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoA1YL7t0nzVSEySx6zOaE7xO3r0jyu8hkitGL2_XbaMxQ@mail.gmail.com	2021-04-02 16:26:36 +09:00
David Rowley	a4fac4ffe8	Attempt to fix unstable Result Cache regression tests force_parallel_mode = regress is causing a few more problems than I thought. It seems that both the leader and the single worker can contribute to the execution. I had mistakenly thought that only the worker process would do any work. Since it's not deterministic as to which of the two processes will get a chance to work on the plan, it seems just better to disable force_parallel_mode for these tests. At least doing this seems better than changing to EXPLAIN only rather than EXPLAIN ANALYZE. Additionally, I overlooked the fact that the number of executions of the sub-plan below a Result Cache will execute a varying number of times depending on cache eviction. 32-bit machines will use less memory and evict fewer tuples from the cache. That results in the subnode being executed fewer times on 32-bit machines. Let's just blank out the number of loops in each node.	2021-04-02 15:25:38 +13:00
Bruce Momjian	2bda93f813	doc: mention that intervening major releases can be skipped Also mention that you should read the intervening major releases notes. This change was also applied to the website. Discussion: https://postgr.es/m/20210330144949.GA8259@momjian.us Backpatch-through: 9.6	2021-04-01 21:17:33 -04:00
David Rowley	9eacee2e62	Add Result Cache executor node (take 2) Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com	2021-04-02 14:10:56 +13:00
Michael Paquier	fe246d1c11	Improve stability of test with vacuum_truncate in reloptions.sql This test has been using a simple VACUUM with pg_relation_size() to check if a relation gets physically truncated or not, but forgot the fact that some concurrent activity, like checkpoint buffer writes, could cause some pages to be skipped. The second test enabling vacuum_truncate could fail, seeing a non-empty relation. The first test would not have failed, but could finish by testing a behavior different than the one aimed for. Both tests gain a FREEZE option, to make the vacuums more aggressive and prevent page skips. This is similar to the issues fixed in `c2dc1a7`. Author: Arseny Sher Reviewed-by: Masahiko Sawada Discussion: https://postgr.es/m/87tuotr2hh.fsf@ars-thinkpad backpatch-through: 12	2021-04-02 09:44:42 +09:00
Tom Lane	1ebdec8c03	Rethink handling of pass-by-value leaf datums in SP-GiST. The existing convention in SP-GiST is that any pass-by-value datatype is stored in Datum representation, i.e. it's of width sizeof(Datum) even when typlen is less than that. This is okay, or at least it's too late to change it, for prefix datums and node-label datums in inner (upper) tuples. But it's problematic for leaf datums, because we'd prefer those to be stored in Postgres' standard on-disk representation so that we can easily extend leaf tuples to carry additional "included" columns. I believe, however, that we can get away with just up and changing that. This would be an unacceptable on-disk-format break, but there are two big mitigating factors: 1. It seems quite unlikely that there are any SP-GiST opclasses out there that use pass-by-value leaf datatypes. Certainly none of the ones in core do, nor has codesearch.debian.net heard of any. Given what SP-GiST is good for, it's hard to conceive of a use-case where the leaf-level values would be both small and fixed-width. (As an example, if you wanted to index text values with the leaf level being just a byte, then every text string would have to be represented with one level of inner tuple per preceding byte, which would be horrendously space-inefficient and slow to access. You always want to use as few inner-tuple levels as possible, leaving as much as possible in the leaf values.) 2. Even granting that you have such an index, this change only breaks things on big-endian machines. On little-endian, the high order bytes of the Datum format will now just appear to be alignment padding space. So, change the code to store pass-by-value leaf datums in their usual on-disk form. Inner-tuple datums are not touched. This is extracted from a larger patch that intends to add support for "included" columns. I'm committing it separately for visibility in our commit logs. Pavel Borisov and Tom Lane, reviewed by Andrey Borodin Discussion: https://postgr.es/m/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com	2021-04-01 17:55:17 -04:00
Stephen Frost	c9c41c7a33	Rename Default Roles to Predefined Roles The term 'default roles' wasn't quite apt as these roles aren't able to be modified or removed after installation, so rename them to be 'Predefined Roles' instead, adding an entry into the newly added Obsolete Appendix to help users of current releases find the new documentation. Bruce Momjian and Stephen Frost Discussion: https://postgr.es/m/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org and https://www.postgresql.org/message-id/20201120211304.GG16415@tamriel.snowman.net	2021-04-01 15:32:06 -04:00
Alvaro Herrera	a68a894f01	Fix setvbuf()-induced crash in libpq_pipeline Windows doesn't like setvbuf(..., _IOLBF) and crashes if you use it, which has been causing the libpq_pipeline failures all along ... and our own port.h has known about it for a long time: it offers PG_IOLBF that's defined to _IONBF on that platform. Follow its advice. While at it, get rid of a bogus bitshift that used a constant of the wrong size. Decorate the constant as LL to fix. While at it, remove a pointless addition that only confused matters. All as diagnosed by Tom Lane. Discussion: https://postgr.es/m/3458958.1617302154@sss.pgh.pa.us	2021-04-01 16:25:51 -03:00
Robert Haas	3b6c1259f9	amcheck: Fix verify_heapam's tuple visibility checking rules. We now follow the order of checks from HeapTupleSatisfies* more closely to avoid coming to erroneous conclusions. Mark Dilger and Robert Haas Discussion: http://postgr.es/m/CA+Tgmob6sii0yTvULYJ0Vq4w6ZBmj7zUhddL3b+SKDi9z9NA7Q@mail.gmail.com	2021-04-01 13:36:28 -04:00
Tom Lane	ec03f2df17	Fix pg_restore's misdesigned code for detecting archive file format. Despite the clear comments pointing out that the duplicative code segments in ReadHead() and _discoverArchiveFormat() needed to be in sync, they were not: the latter did not bother to apply any of the sanity checks in the former. We'd missed noticing this partly because none of those checks would fail in scenarios we customarily test, and partly because the oversight would be masked if both segments execute, which they would in cases other than needing to autodetect the format of a non-seekable stdin source. However, in a case meeting all these requirements --- for example, trying to read a newer-than-supported archive format from non-seekable stdin --- pg_restore missed applying the version check and would likely dump core or otherwise misbehave. The whole thing is silly anyway, because there seems little reason to duplicate the logic beyond the one-line verification that the file starts with "PGDMP". There seems to have been an undocumented assumption that multiple major formats (major enough to require separate reader modules) would nonetheless share the first half-dozen fields of the custom-format header. This seems unlikely, so let's fix it by just nuking the duplicate logic in _discoverArchiveFormat(). Also get rid of the pointless attempt to seek back to the start of the file after successful autodetection. That wastes cycles and it means we have four behaviors to verify not two. Per bug #16951 from Sergey Koposov. This has been broken for decades, so back-patch to all supported versions. Discussion: https://postgr.es/m/16951-a4dd68cf0de23048@postgresql.org	2021-04-01 13:34:16 -04:00
Peter Eisentraut	91e7c90329	Fix internal extract(timezone_minute) formulas Through various refactorings over time, the extract(timezone_minute from time with time zone) and extract(timezone_minute from timestamp with time zone) implementations ended up with two different but equally nonsensical formulas by using SECS_PER_MINUTE and MINS_PER_HOUR interchangeably. Since those two are of course both the same number, the formulas do work, but for readability, fix them to be semantically correct.	2021-04-01 16:12:53 +02:00
Alvaro Herrera	dde1a35aee	libpq_pipeline: Must strdup(optarg) to avoid crash I forgot to strdup() when processing argv[]. Apparently many platforms hide this mistake from users, but in those that don't you may get a program crash. Repair. Per buildfarm member drongo, which is the only one in all the buildfarm manifesting a problem here. While at it, move "numrows" processing out of the line of special cases, and make it getopt's -r instead. (A similar thing could be done to 'conninfo', but current use of the program doesn't warrant spending time on that -- nowhere else we use conninfo in so simplistic a manner.) Discussion: https://postgr.es/m/20210401124850.GA19247@alvherre.pgsql	2021-04-01 10:26:20 -03:00
Heikki Linnakangas	f82de5c46b	Do COPY FROM encoding conversion/verification in larger chunks. This gives a small performance gain, by reducing the number of calls to the conversion/verification function, and letting it work with larger inputs. Also, reorganizing the input pipeline makes it easier to parallelize the input parsing: after the input has been converted to the database encoding, the next stage of finding the newlines can be done in parallel, because there cannot be any newline chars "embedded" in multi-byte characters in the encodings that we support as server encodings. This changes behavior in one corner case: if client and server encodings are the same single-byte encoding (e.g. latin1), previously the input would not be checked for zero bytes ('\0'). Any fields containing zero bytes would be truncated at the zero. But if encoding conversion was needed, the conversion routine would throw an error on the zero. After this commit, the input is always checked for zeros. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi	2021-04-01 12:23:40 +03:00
Heikki Linnakangas	ea1b99a661	Add 'noError' argument to encoding conversion functions. With the 'noError' argument, you can try to convert a buffer without knowing the character boundaries beforehand. The functions now need to return the number of input bytes successfully converted. This is is a backwards-incompatible change, if you have created a custom encoding conversion with CREATE CONVERSION. This adds a check to pg_upgrade for that, refusing the upgrade if there are any user-defined encoding conversions. Custom conversions are very rare, there are no commonly used extensions that I know of that uses that feature. No other objects can depend on conversions, so if you do have one, you can fairly easily drop it before upgrading, and recreate it after the upgrade with an updated version. Add regression tests for built-in encoding conversions. This doesn't cover every conversion, but it covers all the internal functions in conv.c that are used to implement the conversions. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi	2021-04-01 11:45:22 +03:00
Peter Eisentraut	e2639a767b	Make extract(timetz) tests a bit more interesting Use a time zone offset with nonzero minutes to make the timezone_minute test meaningful.	2021-04-01 09:52:03 +02:00
Michael Paquier	ffd3391ea9	doc: Clarify use of ACCESS EXCLUSIVE lock in various sections Some sections of the documentation used "exclusive lock" to describe that an ACCESS EXCLUSIVE lock is taken during a given operation. This can be confusing to the reader as ACCESS SHARE is allowed with an EXCLUSIVE lock is used, but that would not be the case with what is described on those parts of the documentation. Author: Greg Rychlewski Discussion: https://postgr.es/m/CAKemG7VptD=7fNWckFMsMVZL_zzvgDO6v2yVmQ+ZiBfc_06kCQ@mail.gmail.com Backpatch-through: 9.6	2021-04-01 15:28:37 +09:00
Amit Kapila	4778826532	Ensure to send a prepare after we detect concurrent abort during decoding. It is possible that while decoding a prepared transaction, it gets aborted concurrently via a ROLLBACK PREPARED command. In that case, we were skipping all the changes and directly sending Rollback Prepared when we find the same in WAL. However, the downstream has no idea of the GID of such a transaction. So, ensure to send prepare even when a concurrent abort is detected. Author: Ajin Cherian Reviewed-by: Markus Wanner, Amit Kapila Discussion: https://postgr.es/m/f82133c6-6055-b400-7922-97dae9f2b50b@enterprisedb.com	2021-04-01 07:57:34 +05:30

1 2 3 4 5 ...

51167 Commits