postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-11-21 03:13:05 +08:00

Author	SHA1	Message	Date
Peter Geoghegan	5bf748b86b	Enhance nbtree ScalarArrayOp execution. Commit `9e8da0f7` taught nbtree to handle ScalarArrayOpExpr quals natively. This works by pushing down the full context (the array keys) to the nbtree index AM, enabling it to execute multiple primitive index scans that the planner treats as one continuous index scan/index path. This earlier enhancement enabled nbtree ScalarArrayOp index-only scans. It also allowed scans with ScalarArrayOp quals to return ordered results (with some notable restrictions, described further down). Take this general approach a lot further: teach nbtree SAOP index scans to decide how to execute ScalarArrayOp scans (when and where to start the next primitive index scan) based on physical index characteristics. This can be far more efficient. All SAOP scans will now reliably avoid duplicative leaf page accesses (just like any other nbtree index scan). SAOP scans whose array keys are naturally clustered together now require far fewer index descents, since we'll reliably avoid starting a new primitive scan just to get to a later offset from the same leaf page. The scan's arrays now advance using binary searches for the array element that best matches the next tuple's attribute value. Required scan key arrays (i.e. arrays from scan keys that can terminate the scan) ratchet forward in lockstep with the index scan. Non-required arrays (i.e. arrays from scan keys that can only exclude non-matching tuples) "advance" without the process ever rolling over to a higher-order array. Naturally, only required SAOP scan keys trigger skipping over leaf pages (non-required arrays cannot safely end or start primitive index scans). Consequently, even index scans of a composite index with a high-order inequality scan key (which we'll mark required) and a low-order SAOP scan key (which we won't mark required) now avoid repeating leaf page accesses -- that benefit isn't limited to simpler equality-only cases. In general, all nbtree index scans now output tuples as if they were one continuous index scan -- even scans that mix a high-order inequality with lower-order SAOP equalities reliably output tuples in index order. This allows us to remove a couple of special cases that were applied when building index paths with SAOP clauses during planning. Bugfix commit `807a40c5` taught the planner to avoid generating unsafe path keys: path keys on a multicolumn index path, with a SAOP clause on any attribute beyond the first/most significant attribute. These cases are now all safe, so we go back to generating path keys without regard for the presence of SAOP clauses (just like with any other clause type). Affected queries can now exploit scan output order in all the usual ways (e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early). Also undo changes from follow-up bugfix commit `a4523c5a`, which taught the planner to produce alternative index paths, with path keys, but without low-order SAOP index quals (filter quals were used instead). We'll no longer generate these alternative paths, since they can no longer offer any meaningful advantages over standard index qual paths. Affected queries thereby avoid all of the disadvantages that come from using filter quals within index scan nodes. They can avoid extra heap page accesses from using filter quals to exclude non-matching tuples (index quals will never have that problem). They can also skip over irrelevant sections of the index in more cases (though only when nbtree determines that starting another primitive scan actually makes sense). There is a theoretical risk that removing restrictions on SAOP index paths from the planner will break compatibility with amcanorder-based index AMs maintained as extensions. Such an index AM could have the same limitations around ordered SAOP scans as nbtree had up until now. Adding a pro forma incompatibility item about the issue to the Postgres 17 release notes seems like a good idea. Author: Peter Geoghegan <pg@bowt.ie> Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com	2024-04-06 11:47:10 -04:00
Tom Lane	ddd9e43a92	Remove obsolete comment in CopyReadLineText(). When this bit of commentary was written, it was alluding to the fact that we looked for newlines and EOD markers in the raw (not yet encoding-converted) input data. We don't do that anymore, preferring to batch the conversion of larger chunks of input and split it into lines later. Hence there's no longer any need for assumptions about the relevant characters being encoding-invariant, and we should remove this comment saying we assume that. Discussion: https://postgr.es/m/1461688.1712347668@sss.pgh.pa.us	2024-04-06 11:16:27 -04:00
John Naylor	a365d9e2e8	Speed up tail processing when hashing aligned C strings, take two After encountering the NUL terminator, the word-at-a-time loop exits and we must hash the remaining bytes. Previously we calculated the terminator's position and re-loaded the remaining bytes from the input string. This was slower than the unaligned case for very short strings. We already have all the data we need in a register, so let's just mask off the bytes we need and hash them immediately. In addition to endianness issues, the previous attempt upset valgrind in the way it computed the mask. Whether by accident or by wisdom, the author's proposed method passes locally with valgrind 3.22. Ants Aasma, with cosmetic adjustments by me Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com	2024-04-06 17:14:28 +07:00
John Naylor	0c25fee359	Teach fasthash_accum to use platform endianness for bytewise loads This function previously used a mix of word-wise loads and bytewise loads. The bytewise loads happened to be little-endian regardless of platform. This in itself is not a problem. However, a future commit will require the same result whether A) the input is loaded as a word with the relevent bytes masked-off, or B) the input is loaded one byte at a time. While at it, improve debuggability of the internal hash state. Discussion: https://postgr.es/m/CANWCAZZpuV1mES1mtSpAq8tWJewbrv4gEz6R_k4gzNG8GZ5gag%40mail.gmail.com	2024-04-06 17:14:28 +07:00
Thomas Munro	98f320eb2e	Increase default vacuum_buffer_usage_limit to 2MB. The BAS_VACUUM ring size has been 256kB since commit `d526575f` introduced the mechanism 17 years ago. Commit `1cbbee03` recently made it configurable but retained the traditional default. The correct default size has been debated for years, but 256kB is certainly very small. VACUUM soon needs to write back data it dirtied only 32 blocks ago, which usually requires flushing the WAL. New experiments in prefetching pages for VACUUM exacerbated the problem by crashing into dirty data even sooner. Let's make the default 2MB. That's 1.6% of the default toy buffer pool size, and 0.2% of 1GB, which would be a considered a small shared_buffers setting for a real system these days. Users are still free to set the GUC to a different value. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de Discussion: https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com	2024-04-06 23:12:03 +13:00
Thomas Munro	3bd8439ed6	Allow BufferAccessStrategy to limit pin count. While pinning extra buffers to look ahead, users of strategies are in danger of using too many buffers. For some strategies, that means "escaping" from the ring, and in others it means forcing dirty data to disk very frequently with associated WAL flushing. Since external code has no insight into any of that, allow individual strategy types to expose a clamp that should be applied when deciding how many buffers to pin at once. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com	2024-04-06 23:11:45 +13:00
John Naylor	f956ecd035	Convert uses of hash_string_pointer to fasthash equivalent Remove duplicate hash_string_pointer() function definitions by creating a new inline function hash_string() for this purpose. This has the added advantage of avoiding strlen() calls when doing hash lookup. It's not clear how many of these are perfomance-sensitive enough to benefit from that, but the simplification is worth it on its own. Reviewed by Jeff Davis Discussion: https://postgr.es/m/CANWCAZbg_XeSeY0a_PqWmWqeRATvzTzUNYRLeT%2Bbzs%2BYQdC92g%40mail.gmail.com	2024-04-06 12:20:40 +07:00
John Naylor	db17594ad7	Add macro to disable address safety instrumentation fasthash_accum_cstring_aligned() uses a technique, found in various strlen() implementations, to detect a string's NUL terminator by reading a word at at time. That triggers failures when testing with "-fsanitize=address", at least with frontend code. To enable using this function anywhere, add a function attribute macro to disable such testing. Reviewed by Jeff Davis Discussion: https://postgr.es/m/CANWCAZbwvp7oUEkbw-xP4L0_S_WNKq-J-ucP4RCNDPJnrakUPw%40mail.gmail.com	2024-04-06 12:20:40 +07:00
John Naylor	4b968e2027	Fix incorrect return type fasthash32() calculates a 32-bit hashcode, but the return type was uint64. Change to uint32. Noted by Jeff Davis Discussion: https://postgr.es/m/b16c93e6c736a422d4de668343515375664eb05d.camel%40j-davis.com	2024-04-06 12:20:40 +07:00
Thomas Munro	aa1e8c2064	Improve read_stream.c's fast path. The "fast path" for well cached scans that don't do any I/O was accidentally coded in a way that could only be triggered by pg_prewarm's usage pattern, which starts out with a higher distance because of the flags it passes in. We want it to work for streaming sequential scans too, once that patch is committed. Adjust. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com	2024-04-06 18:00:35 +13:00
Andres Freund	9e7386924e	Fix headerscheck violation introduced in `f8ce4ed78c` Per ci.	2024-04-05 14:27:45 -07:00
Andrew Dunstan	c3e60f3d7e	Silence some compiler warnings in commit `3311ea86ed` Per report from Nathan Bossart	2024-04-05 16:08:40 -04:00
Robert Haas	55a5ee30cd	Fix incorrect calculation in BlockRefTableEntryGetBlocks. The previous formula was incorrect in the case where the function's nblocks argument was a multiple of BLOCKS_PER_CHUNK, which happens whenever a relation segment file is exactly 512MB or exactly 1GB in length. In such cases, the formula would calculate a stop_offset of 0 rather than 65536, resulting in modified blocks in the second half of a 1GB file, or all the modified blocks in a 512MB file, being omitted from the incremental backup. Reported off-list by Tomas Vondra and Jakub Wartak. Discussion: http://postgr.es/m/CA+TgmoYwy_KHp1-5GYNmVa=zdeJWhNH1T0SBmEuvqQNJEHj1Lw@mail.gmail.com	2024-04-05 13:41:19 -04:00
Tomas Vondra	079d94ab34	Check HAVE_COPY_FILE_RANGE before calling copy_file_range Fix a mistake in `ac81101551` - write_reconstructed_file() called copy_file_range() without properly checking HAVE_COPY_FILE_RANGE. Reported by several macOS machines. Also reported by cfbot, but I missed that issue before commit.	2024-04-05 19:38:20 +02:00
Tomas Vondra	ac81101551	Allow using copy_file_range in write_reconstructed_file This commit allows using copy_file_range() for efficient combining of data from multiple files, instead of simply reading/writing the blocks. Depending on the filesystem and other factors (size of the increment, distribution of modified blocks etc.) this may be faster than the block-by-block copy, but more importantly it enables various features provided by CoW filesystems. If a checksum needs to be calculated for the file, the same strategy as when copying whole files is used - copy_file_range is used to copy the blocks, but the file is also read for the checksum calculation. While the checksum calculation is rarely needed when cloning whole files, when reconstructing the files from multiple backups it needs to happen almost always (the only exception is when the user specified --no-manifest). Author: Tomas Vondra Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com	2024-04-05 19:19:36 +02:00
Alvaro Herrera	b8b37e41ba	Make libpqsrv_cancel's return const char , not char Per headerscheck's C++ check. Discussion: https://postgr.es/m/372769.1712179784@sss.pgh.pa.us	2024-04-05 18:23:10 +02:00
Tomas Vondra	8e392595e5	Remove unused variable in checksum_file() The 'offset' variable was set but otherwise unused. Per buildfarm animals with clang, e.g. sifaka and longlin.	2024-04-05 18:13:15 +02:00
Tomas Vondra	f8ce4ed78c	Allow copying files using clone/copy_file_range Adds --clone/--copy-file-range options to pg_combinebackup, to allow copying files using file cloning or copy_file_range(). These methods may be faster than the standard block-by-block copy, but the main advantage is that they enable various features provided by CoW filesystems. This commit only uses these copy methods for files that did not change and can be copied as a whole from a single backup. These new copy methods may not be available on all platforms, in which case the command throws an error (immediately, even if no files would be copied as a whole). This early failure seems better than failing later when trying to copy the first file, after performing a lot of work on earlier files. If the requested copy method is available, but a checksum needs to be recalculated (e.g. because of a different checksum type), the file is still copied using the requested method, but it is also read for the checksum calculation. Depending on the filesystem this may be more expensive than just performing the simple copy, but it does enable the CoW benefits. Initial patch by Jakub Wartak, various reworks and improvements by me. Author: Tomas Vondra, Jakub Wartak Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com	2024-04-05 18:01:32 +02:00
Tom Lane	3c5ff36aba	Suppress "variable may be used uninitialized" warning. Buildfarm member caiman is showing this, which surprises me because it's very late-model gcc (14.0.1) and ought to be smart enough to know that elog(ERROR) doesn't return. But we're likely to see the same from stupider compilers too, so add a dummy initialization in our usual style.	2024-04-05 10:58:30 -04:00
Robert Haas	fe8eaa5442	docs: Merge separate chapters on built-in index AMs into one. The documentation index is getting very long, which makes it hard to find things. Since these chapters are all very similar in structure and content, merging them is a natural way of reducing the size of the toplevel index. Rather than actually combining all of the SGML into a single file, keep one file per <sect1>, and add a glue file that includes all of them. Discussion: http://postgr.es/m/CA+Tgmob7_uoYuS2=rVwpVXaRwP-UXz+++saYTC-BCZ42QzSNKQ@mail.gmail.com	2024-04-05 10:34:04 -04:00
Tomas Vondra	10e3226ba1	Align blocks in incremental backups to BLCKSZ Align blocks stored in incremental files to BLCKSZ, so that the incremental backups work well with CoW filesystems. The header of the incremental file is padded with \0 to a multiple of BLCKSZ, so that the block data (also BLCKSZ) is aligned to BLCKSZ. The padding is added only to files containing block data, so files with just the header remain small. This adds a bit of extra space, but as the number of blocks increases the overhead gets negligible very quickly. And as the padding is \0 bytes, it does compress extremely well. The alignment is important for CoW filesystems that usually require the blocks to be aligned to filesystem page size for features like block sharing, deduplication etc. to work well. With the variable sized header the blocks in the increments were not aligned at all, negating the benefits of the CoW filesystems. This matters even for non-CoW filesystems, for example when placed on a RAID array. If the block is not aligned, it may easily span multiple devices, causing read and write amplification. It might be better to align the blocks to the filesystem page, not BLCKSZ, but we have no good way to determine that. Even if we determine the page size at the time of taking the backup, the backup may move. For now the BLCKSZ seems sufficient - the filesystem page is usually 4K, so the default BLCKSZ (8K by default) is aligned to that. Author: Tomas Vondra Reviewed-by: Robert Haas, Jakub Wartak Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com	2024-04-05 16:30:01 +02:00
Alvaro Herrera	ee1cbe806d	Operate XLogCtl->log{Write,Flush}Result with atomics This removes the need to hold both the info_lck spinlock and WALWriteLock to update them. We use stock atomic write instead, with WALWriteLock held. Readers can use atomic read, without any locking. This allows for some code to be reordered: some places were a bit contorted to avoid repeated spinlock acquisition, but that's no longer a concern, so we can turn them to more natural coding. Some further changes are possible (maybe to performance wins), but in this commit I did rather minimal ones only, to avoid increasing the blast radius. Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Jeff Davis <pgsql@j-davis.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql	2024-04-05 16:14:39 +02:00
Amit Kapila	6f132ed693	Allow synced slots to have their inactive_since. This commit does two things: 1) Maintains inactive_since for sync slots whenever the slot is released just like any other regular slot. 2) Ensures the value is set to the current timestamp during the promotion of standby to help correctly interpret the time after promotion. We don't want the slots to appear inactive for a long time after promotion if they haven't been synchronized recently. This would also avoid the invalidation of such slots immediately after promotion if tomorrow we have a feature that invalidates slots based on their inactivity time. Whoever acquires the slot i.e. makes the slot active will reset it to NULL. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com	2024-04-05 09:48:49 +05:30
Michael Paquier	f98dbdeb51	Add "ABI_compatibility" regions to wait_event_names.txt The current design behind the automatic generation of the C code and documentation related to wait events introduced in `fa88928470` does not offer a way to attach new wait events without breaking ABI compatibility, as all the events are forcibly reordered for each section in the input file wait_event_names.txt. Adding new wait events to stable branches is something that has happened in the past, `0b6517a3b7` being a recent example of that with VERSION_FILE_SYNC, so we need a way to generate any C code for wait events while maintaining compatibility on stable branches already released. This commit solves this issue by adding a new region called "ABI_compatibility" (keyword could be updated to something else if someone had a better idea) to each section of wait_event_names.txt, so as one can add new wait events to stable branches in wait_event_names.txt while keeping the code ABI-compatible. "ABI_compatibility" has no impact on the documentation generated: all the wait events of one section are still alphabetically ordered. LWLock and Lock sections generate their C code elsewhere, so they do not need an "ABI_compatibility" region. For example, let's imagine a wait_event_names.txt like that: Section: ClassName - Foo FOO_1 "Waiting in Foo 1" FOO_2 "Waiting in Foo 2" ABI_compatibility: NEW_FOO_1 "Waiting in New Foo 1" NEW_BAR_1 "Waiting in New Bar 1" This results in the following enum, where the events in the ABI region are listed last with the same ordering as in wait_event_names.txt: typedef enum { WAIT_EVENT_FOO_1, WAIT_EVENT_FOO_2, WAIT_EVENT_NEW_FOO_1, WAIT_EVENT_NEW_BAR_1 } WaitEventFoo; New wait events added in stable branches should be added at the end of each ABI_compatibility region, and ABI_compatibility should remain empty on HEAD and unreleased stable branches. This design has been suggested by Noah Misch and me. Reported-by: Noah Misch Author: Bertrand Drouvot Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/20240317183114.16@rfd.leadboat.com	2024-04-05 08:56:52 +09:00
Jeff Davis	e2a2357671	Fix test failures when language environment is not UTF-8. For tests that depend on UTF-8 encoding, force LC_COLLATE=C and LC_CTYPE=C to avoid an encoding mismatch. Reported-by: Thomas Munro Discussion: https://postgr.es/m/CA+hUKGK-ZqV1njkG_=xcCqXh2fcMkz85FTMnhS2opm4ZerH=xw@mail.gmail.com	2024-04-04 16:10:12 -07:00
Robert Haas	e57fe3824e	Fix old, misleading comment for PGRES_POLLING_ACTIVE. The comment implies that we can eventually remove this, but per discussion, we actually don't want to do that ever, in order to maintain compatibility. Jelte Fennema-Nio, reviewed by Tristan Partin Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com	2024-04-04 16:22:11 -04:00
Robert Haas	12b964d781	Remove reachable call to pg_unreachable(). The loop just before this uses break, not return, so this line is reachable. Commit `cafe105655` introduced this issue. Jelte Fennema-Nio, reviewed by Tristan Partin Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com	2024-04-04 16:22:11 -04:00
Tom Lane	096a761d68	Fix ecpg's mechanism for detecting unsupported cases in the grammar. ecpg wants to emit a warning if it parses a SQL construct that the backend can parse but will immediately throw a FEATURE_NOT_SUPPORTED error for. The way it was testing for this was to see if the string ERRCODE_FEATURE_NOT_SUPPORTED appeared anywhere in the gram.y code. This is, of course, not nearly good enough, as there are plenty of rules in gram.y that throw that error only conditionally. There was a hack dating to 2008 to suppress the warning in one rule that doesn't even exist anymore, but nothing for other cases we've created since then. End result was that you could get "unsupported feature will be passed to server" warnings while compiling perfectly good SQL code in ecpg. Somehow we'd not heard complaints about this, but it was exposed by the recent addition of an ecpg test for a SQL/JSON construct. To fix, suppress the warning if the rule contains any "if" statement. Manual comparison of gram.y with the generated preproc.y file shows that the warning is now emitted only in rules where it's sensible. This problem has existed for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/603615.1712245382@sss.pgh.pa.us	2024-04-04 15:31:53 -04:00
Tom Lane	332d406140	Further cleanup for recent JSON-related commits. The link commands in test_json_parser/Makefile were a long way shy of a load, as evidenced by buildfarm failures. Model them on pgxs.mk's PROGRAM rule. (Probably we should have put these two test programs in different subdirectories so we could actually use the PROGRAM rule. But I won't question that decision today.)	2024-04-04 13:39:12 -04:00
Tom Lane	2497a669ef	Further cleanup for recent JSON-related commits. Add overlooked .gitignore entries. Fix test_json_parser/Makefile to use the pgxs.mk clean rule instead of fighting it. Suppresses a warning from make, at least for me.	2024-04-04 13:21:25 -04:00
Andrew Dunstan	88620824c2	Tidy up after incremental JSON parser patch Remove junk left over from non-vpath builds. Try to remedy gettext error on some platforms.	2024-04-04 12:41:55 -04:00
Andrew Dunstan	1b00fe30a6	Fix warnings re typedef redefinition in `ea7b4e9a2a` and `3311ea86ed` Per gripe from Tom Lane and the buildfarm	2024-04-04 11:36:26 -04:00
Amit Langote	6f4d63e989	Add missing initialization in transformJsonFuncExpr() `de3600452b` added some code for the new JSON_TABLE_OP to that function but missed to initialize the default_format variable. Reported-by: Erik Rijkers <er@xs4all.nl> Discussion: https://postgr.es/m/254b2fa2-2f6b-a30a-20ee-21f8a2c12a50@xs4all.nl	2024-04-04 22:01:13 +09:00
Amit Langote	2f6e78b061	Fix typo introduced in `6185c9737` Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxGHiU0p0usjh5hnR0_ByZn4tq1FC3eKAtrQgJeKU6W9kw@mail.gmail.com	2024-04-04 20:53:23 +09:00
Amit Langote	de3600452b	Add basic JSON_TABLE() functionality JSON_TABLE() allows JSON data to be converted into a relational view and thus used, for example, in a FROM clause, like other tabular data. Data to show in the view is selected from a source JSON object using a JSON path expression to get a sequence of JSON objects that's called a "row pattern", which becomes the source to compute the SQL/JSON values that populate the view's output columns. Column values themselves are computed using JSON path expressions applied to each of the JSON objects comprising the "row pattern", for which the SQL/JSON query functions added in `6185c9737c` are used. To implement JSON_TABLE() as a table function, this augments the TableFunc and TableFuncScanState nodes that are currently used to support XMLTABLE() with some JSON_TABLE()-specific fields. Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN clauses, which are required to provide more flexibility to extract data out of nested JSON objects, but they are not implemented here to keep this commit of manageable size. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2024-04-04 20:20:15 +09:00
Peter Eisentraut	a9d6c38684	pg_upgrade: Fix typo in message	2024-04-04 12:58:57 +02:00
Andrew Dunstan	222e11a10a	Use incremental parsing of backup manifests. This changes the three callers to json_parse_manifest() to use json_parse_manifest_incremental_chunk() if appropriate. In the case of the backend caller, since we don't know the size of the manifest in advance we always call the incremental parser. Author: Andrew Dunstan Reviewed-By: Jacob Champion Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net	2024-04-04 06:46:40 -04:00
Andrew Dunstan	ea7b4e9a2a	Add support for incrementally parsing backup manifests This adds the infrastructure for using the new non-recursive JSON parser in processing manifests. It's important that callers make sure that the last piece of json handed to the incremental manifest parser contains the entire last few lines of the manifest, including the checksum. Author: Andrew Dunstan Reviewed-By: Jacob Champion Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net	2024-04-04 06:46:40 -04:00
Andrew Dunstan	3311ea86ed	Introduce a non-recursive JSON parser This parser uses an explicit prediction stack, unlike the present recursive descent parser where the parser state is represented on the call stack. This difference makes the new parser suitable for use in incremental parsing of huge JSON documents that cannot be conveniently handled piece-wise by the recursive descent parser. One potential use for this will be in parsing large backup manifests associated with incremental backups. Because this parser is somewhat slower than the recursive descent parser, it is not replacing that parser, but is an additional parser available to callers. For testing purposes, if the build is done with -DFORCE_JSON_PSTACK, all JSON parsing is done with the non-recursive parser, in which case only trivial regression differences in error messages should be observed. Author: Andrew Dunstan Reviewed-By: Jacob Champion Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net	2024-04-04 06:46:40 -04:00
Peter Eisentraut	585df02b44	Silence meson warning Commit `619bc23a1a` introduced WARNING: Project targets '>=0.54' but uses feature introduced in '0.55.0': Passing executable/found program object to script parameter of add_dist_script Work around that by wrapping the offending line in a meson version check. Author: Tristan Partin <tristan@neon.tech> Discussion: https://www.postgresql.org/message-id/flat/D096Q3NFFVH1.1T5RE4MOO9ZFH%40neon.tech	2024-04-04 11:22:07 +02:00
Etsuro Fujita	dd24098cd6	postgres_fdw: Remove useless ternary expression. There is no case where we would call pgfdw_exec_cleanup_query or pgfdw_exec_cleanup_query_{begin,end} with a NULL query string, so this expression is pointless; remove it and instead add to the latter functions an assertion ensuring the given query string is not NULL. Thinko in commit `815d61fcd`. Discussion: https://postgr.es/m/CAPmGK14mm%2B%3DUjyjoWj_Hu7c%2BQqX-058RFfF%2BqOkcMZ_Nj52v-A%40mail.gmail.com	2024-04-04 17:55:00 +09:00
David Rowley	3a4a3537a9	Secondary refactor of heap scanning functions Similar to `44086b097`, refactor heap scanning functions to be more suitable for the read stream API. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com	2024-04-04 19:22:45 +13:00
Michael Paquier	2a217c3717	Coordinate emit_log_hook and all log destinations to share the same timeval This would cause the timestamp values used by emit_log_hook and all the other log destinations to differ, because the timestamps are reset before sending the logs to the server and after calling the hook. This change matters for emit_log_hook when generating log information with 'n' or 'm' in log_line_prefix through log_status_format(), or when doing direct calls to get_formatted_log_time() like in the JSON or CSV log formats. While on it, this commit fixes a couple of comments related to the formatted timestamps where the JSON was not mentioned. Oversight in `dc686681e0`, that I have noticed while reviewing this patch. Author: Kambam Vinay, Michael Paquier Discussion: https://postgr.es/m/CANiRfmsK36A0i8mnQtzaxhSm3CUCimPwJPp4WQNq53OdSNkgWg@mail.gmail.com	2024-04-04 14:15:22 +09:00
David Rowley	44086b0975	Preliminary refactor of heap scanning functions To allow the use of the read stream API added in `b5a9b18cd` for sequential scans on heap tables, here we make some adjustments to make that change less invasive and perhaps make the code easier to follow in the process. Here heapgetpage() gets broken into two functions: 1) The part which reads the block has now been moved into a function named heapfetchbuf(). 2) The part which performed pruning and populated the scan's rs_vistuples[] array is now moved into a new function named heap_prepare_pagescan(). The functionality provided by heap_prepare_pagescan() was only ever required by SO_ALLOW_PAGEMODE scans, so the branching that was previously done in heapgetpage() is no longer needed as we simply just don't call heap_prepare_pagescan() from heapgettup() in the refactored code. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com	2024-04-04 16:41:13 +13:00
Michael Paquier	85230a247c	pg_regress: Save errno in emit_tap_output_v() and switch to %m emit_tap_output_v() includes some fprintf() calls for some output related to the TAP protocol, that may clobber errno and break %m. This commit makes the logging of pg_regress smarter by saving errno before restoring it in vfprintf() where the input strings are used, removing the need for strerror(). All logs are switched to %m rather than strerror(), shaving some code. This was not a problem until now as pg_regress.c did not use %m, but the change is simple enough that we have no reason to not support this placeholder, and that will avoid future mistakes if new logs that include %m are added. Author: Dagfinn Ilmari Mannsåker Reviewed-by: Peter Eisentraunt, Michael Paquier Discussion: https://postgr.es/m/87sf13jhuw.fsf@wibble.ilmari.org	2024-04-04 11:33:07 +09:00
Jeff Davis	71b66171d0	CREATE INDEX: do not update stats during binary upgrade. During binary upgrade, indexes are created before the data is moved into place, so it will always be zero. This is not currently a major problem, but will be when we try to preserve statistics during upgrade. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=daPdFB8V0tgFxK-dLowFsAEzWRWJHyxij7BG3kBjcouA@mail.gmail.com	2024-04-03 16:12:45 -07:00
Tom Lane	06286709ee	Invent SERIALIZE option for EXPLAIN. EXPLAIN (ANALYZE, SERIALIZE) allows collection of statistics about the volume of data emitted by a query, as well as the time taken to convert the data to the on-the-wire format. Previously there was no way to investigate this without actually sending the data to the client, in which case network transmission costs might swamp what you wanted to see. In particular this feature allows investigating the costs of de-TOASTing compressed or out-of-line data during formatting. Stepan Rutz and Matthias van de Meent, reviewed by Tomas Vondra and myself Discussion: https://postgr.es/m/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1@gmx.de	2024-04-03 17:41:57 -04:00
Alexander Korotkov	97ce821e3e	Fix the parameters order for TableAmRoutine.relation_copy_for_cluster() Specify OldTable first, NewTable second as used by table_relation_copy_for_cluster() and as implemented in heapam_relation_copy_for_cluster(). Backpatch to PostgreSQL 12, where TableAmRoutine was introduced. Discussion: https://postgr.es/m/ME3P282MB3166860D4911AE82F92DF7C5B63F2%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM Author: Japin Li Reviewed-by: Pavel Borisov Backpatch-through: 12	2024-04-04 00:34:28 +03:00
Robert Haas	f470b5c679	docs: Demote "Monitoring Disk Usage" from chapter to section. This chapter is very short, and the immediately preceding chapter is called "Monitoring Database Activity". So, instead of having a separate chapter for this, make it the last section of the preceding chapter instead. Discussion: http://postgr.es/m/CA+Tgmob7_uoYuS2=rVwpVXaRwP-UXz+++saYTC-BCZ42QzSNKQ@mail.gmail.com	2024-04-03 16:09:41 -04:00
Alvaro Herrera	c9920a9068	Split XLogCtl->LogwrtResult into separate struct members After this change we have XLogCtl->logWriteResult and ->logFlushResult. There's no functional change, other than the fact that the assignment from shared memory to local is no longer done via struct assignment, but instead using a macro that copies each member separately. The current representation is inconvenient going forward; notably, we would like to add a new member "Copy" (to keep track of the last position copied into WAL buffers), so the symmetry between the values in shared memory vs. those in local would be lost. This also gives us freedom to later change the concurrency model for the values in shared memory: we can make them use atomics instead of relying on the info_lck spinlock. Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/202404031119.cd2kugjk2vho@alvherre.pgsql	2024-04-03 19:55:11 +02:00

1 2 3 4 5 ...

58267 Commits