postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2025-03-31 20:20:44 +08:00

Author	SHA1	Message	Date
Álvaro Herrera	9fbd53dea5	Remove the query_id_squash_values GUC Commit 62d712ecfd94 introduced the capability to calculate the same queryId for queries with different lengths of constants in a list for an IN clause. This behavior was originally enabled with a GUC query_id_squash_values. After a discussion about the value of such a GUC, it was decided to back out of the use of a GUC and make the squashing behavior the only available option. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/Z-LZyygkkNyA8-kR@msg.df7cb.de Discussion: https://postgr.es/m/CA+q6zcVTK-3C-8NWV1oY2NZrvtnMCDqnyYYyk1T7WMUG65MeOQ@mail.gmail.com	2025-03-27 13:33:37 +01:00
David Rowley	f31aad9b07	Fix query jumbling to account for NULL nodes Previously NULL nodes were ignored. This could cause issues where the computed query ID could match for queries where fields that are next to each other in their Node struct where one field was NULL and the other non-NULL. For example, the Query struct had distinctClause and sortClause next to each other. If someone wrote; SELECT DISTINCT c1 FROM t; and then; SELECT c1 FROM t ORDER BY c1; these would produce the same query ID since, in the first query, we ignored the NULL sortClause and appended the jumble bytes for the distictClause. In the latter query, since we did nothing for the NULL distinctClause then jumble the non-NULL sortClause, and since the node representation stored is the same in both cases, the query IDs were identical. Here we fix this by always accounting for NULL nodes by recording that we saw a NULL in the jumble buffer. This fixes the issue as the order that the NULL is recorded isn't the same in the above two queries. Author: Bykov Ivan <i.bykov@modernsys.ru> Author: Michael Paquier <michael@paquier.xyz> Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain	2025-03-27 18:23:00 +13:00
Michael Paquier	3430215fe3	pg_stat_statements: Add more tests with temp tables and namespaces These tests provide coverage for RangeTblEntry and how query jumbling works with search_path, as well as the case where relations are re-created, generating a different query ID as the relation OID is used in the computation. A patch is under discussion to switch to a different approach based on the relation name, and there was no test coverage for this area, including how queries are currently grouped with search_path. This is useful to track how the situation changes between HEAD and any patches proposed. Christoph has proposed the test with ON COMMIT DROP temporary tables, and I have written the second part. Author: Christoph Berg <myon@debian.org> Author: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de	2025-03-26 07:25:23 +09:00
Álvaro Herrera	62d712ecfd	Introduce squashing of constant lists in query jumbling pg_stat_statements produces multiple entries for queries like SELECT something FROM table WHERE col IN (1, 2, 3, ...) depending on the number of parameters, because every element of ArrayExpr is individually jumbled. Most of the time that's undesirable, especially if the list becomes too large. Fix this by introducing a new GUC query_id_squash_values which modifies the node jumbling code to only consider the first and last element of a list of constants, rather than each list element individually. This affects both the query_id generated by query jumbling, as well as pg_stat_statements query normalization so that it suppresses printing of the individual elements of such a list. The default value is off, meaning the previous behavior is maintained. Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Sergey Dudoladov (mysterious, off-list) Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Sutou Kouhei <kou@clear-code.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Marcos Pegoraro <marcos@f10.com.br> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Tested-by: Yasuo Honda <yasuo.honda@gmail.com> Tested-by: Sergei Kornilov <sk@zsrv.org> Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com> Tested-by: Chengxi Sun <sunchengxi@highgo.com> Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com	2025-03-18 18:56:11 +01:00
David Rowley	89988ac589	Fix further fallout from EXPLAIN ANALYZE BUFFERS change c2a4078eb adjusted EXPLAIN ANALYZE to default the BUFFERS to ON. This (hopefully) fixes the last remaining issue with regression test failures with -D RELCACHE_FORCE_RELEASE -D CATCACHE_FORCE_RELEASE builds, where the planner accesses more buffers due to the cold caches. Discussion: https://postgr.es/m/CAApHDvqLdzgz77JsE-yTki3w9UiKQ-uTMLRctazcu+99-ips3g@mail.gmail.com	2024-12-12 09:50:00 +13:00
Michael Paquier	499edb0974	Track more precisely query locations for nested statements Previously, a Query generated through the transform phase would have unset stmt_location, tracking the starting point of a query string. Extensions relying on the statement location to extract its relevant parts in the source text string would fallback to use the whole statement instead, leading to confusing results like in pg_stat_statements for queries relying on nested queries, like: - EXPLAIN, with top-level and nested query using the same query string, and a query ID coming from the nested query when the non-top-level entry. - Multi-statements, with only partial portions of queries being normalized. - COPY TO with a query, SELECT or DMLs. This patch improves things by keeping track of the statement locations and propagate it to Query during transform, allowing PGSS to only show the relevant part of the query for nested query. This leads to less bloat in entries for non-top-level entries, as queries can now be grouped within the same (toplevel, queryid) duos in pg_stat_statements. The result gives a stricter one-one mapping between query IDs and its query strings. The regression tests introduced in 45e0ba30fc40 produce differences reflecting the new logic. Author: Anthonin Bonnefoy Reviewed-by: Michael Paquier, Jian He Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com	2024-10-24 09:29:54 +09:00
Tom Lane	14e5680eee	Improve parser's reporting of statement start locations. Up to now, the parser's reporting of a statement's stmt_location included any preceding whitespace or comments. This isn't really desirable but was done to avoid accounting honestly for nonterminals that reduce to empty. It causes problems for pg_stat_statements, which partially compensates by manually stripping whitespace, but is not bright enough to strip /*-style comments. There will be more problems with an upcoming patch to improve reporting of errors in extension scripts, so it's time to do something about this. The thing we have to do to make it work right is to adjust YYLLOC_DEFAULT to scan the inputs of each production to find the first one that has a valid location (i.e., did not reduce to empty). In theory this adds a little bit of per-reduction overhead, but in practice it's negligible. I checked by measuring the time to run raw_parser() on the contents of information_schema.sql, and there was basically no change. Having done that, we can rely on any nonterminal that didn't reduce to completely empty to have a correct starting location, and we don't need the kluges the stmtmulti production formerly used. This should have a side benefit of allowing parse error reports to include an error position in some cases where they formerly failed to do so, due to trying to report the position of an empty nonterminal. I did not go looking for an example though. The one previously known case where that could happen (OptSchemaEltList) no longer needs the kluge it had; but I rather doubt that that was the only case. Discussion: https://postgr.es/m/ZvV1ClhnbJLCz7Sm@msg.df7cb.de	2024-10-22 11:26:05 -04:00
Michael Paquier	45e0ba30fc	pg_stat_statements: Add tests for nested queries with level tracking There have never been any regression tests in PGSS for various query patterns for nested queries combined with level tracking, like: - Multi-statements. - CREATE TABLE AS - CREATE/REFRESH MATERIALIZED VIEW - DECLARE CURSOR - EXPLAIN, with a subset of the above supported. - COPY. All the tests added here track historical, sometimes confusing, existing behaviors. For example, EXPLAIN stores two PGSS entries with the same top-level query string but two different query IDs as one is calculated for the top-level EXPLAIN (this part is right) and a second one for the inner query in the EXPLAIN (this part is not right). A couple of patches are under discussion to improve the situation, and all the tests added here will prove useful to evaluate the changes discussed. Author: Anthonin Bonnefoy Reviewed-by: Michael Paquier, Jian He Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com	2024-10-22 13:05:51 +09:00
Michael Paquier	cf54a2c002	pg_stat_statements: Add columns to track parallel worker activity The view pg_stat_statements gains two columns: - parallel_workers_to_launch, the number of parallel workers planned to be launched. - parallel_workers_launched, the number of parallel workers actually launched. The ratio of both columns offers hints that parallel workers are lacking on a per-statement basis, requiring some tuning, in coordination with "calls", the number of times a query is executed. As of now, these numbers are tracked within Gather and GatherMerge nodes. They could be extended to utilities that make use of parallel workers (parallel btree and brin, VACUUM). The module is bumped to 1.12. Author: Guillaume Lelarge Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com	2024-10-09 08:30:45 +09:00
Michael Paquier	ba90eac7a9	pg_stat_statements: Expand tests for SET statements There are many grammar flavors that depend on the parse node VariableSetStmt. This closes the gap in pg_stat_statements by providing test coverage for what should be a large majority of them, improving more the work begun in de2aca288569. This will be used to ease the evaluation of a path towards more normalization of SET queries with query jumbling. Note that SET NAMES (grammar from the standard, synonym of SET client_encoding) is omitted on purpose, this could use UTF8 with a conditional script where UTF8 is supported, but that does not seem worth the maintenance cost for the sake of these tests. The author has submitted most of these in a TAP test (filled in any holes I could spot), still queries in a SQL file of pg_stat_statements is able to achieve the same goal while being easier to look at when testing normalization patterns. Author: Greg Sabino Mullane, Michael Paquier Discussion: https://postgr.es/m/CAKAnmmJtJY2jzQN91=2QAD2eAJAA-Per61eyO48-TyxEg-q0Rg@mail.gmail.com	2024-09-25 10:04:44 +09:00
Michael Paquier	933848d16d	Add missing query ID reporting in extended query protocol This commit adds query ID reports for two code paths when processing extended query protocol messages: - When receiving a bind message, setting it to the first Query retrieved from a cached cache. - When receiving an execute message, setting it to the first PlannedStmt stored in a portal. An advantage of this method is that this is able to cover all the types of portals handled in the extended query protocol, particularly these two when the report done in ExecutorStart() is not enough (neither is an addition in ExecutorRun(), actually, for the second point): - Multiple execute messages, with multiple ExecutorRun(). - Portal with execute/fetch messages, like a query with a RETURNING clause and a fetch size that stores the tuples in a first execute message going though ExecutorStart() and ExecuteRun(), followed by one or more execute messages doing only fetches from the tuplestore created in the first message. This corresponds to the case where execute_is_fetch is set, for example. Note that the query ID reporting done in ExecutorStart() is still necessary, as an EXECUTE requires it. Query ID reporting is optimistic and more calls to pgstat_report_query_id() don't matter as the first report takes priority except if the report is forced. The comment in ExecutorStart() is adjusted to reflect better the reality with the extended query protocol. The test added in pg_stat_statements is a courtesy of Robert Haas. This uses psql's \bind metacommand, hence this part is backpatched down to v16. Reported-by: Kaido Vaikla, Erik Wienhold Author: Sami Imseih Reviewed-by: Jian He, Andrei Lepikhov, Michael Paquier Discussion: https://postgr.es/m/CA+427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ@mail.gmail.com Discussion: https://postgr.es/m/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com Discussion: https://postgr.es/m/1391613709.939460.1684777418070@office.mailbox.org Backpatch-through: 14	2024-09-18 09:59:09 +09:00
Michael Paquier	7b1ddbae36	pg_stat_statements: Add tests with extended query protocol There are currently no tests in the tree checking that queries using the extended query protocol are able to map with their query ID. This can be achieved for some paths of the extended query protocol with the psql meta-commands \bind or \bind_named, so let's add some tests based on both. I have found that to be a useful addition while working on a different issue. Discussion: https://postgr.es/m/ZuEt6MOEBSlifBfn@paquier.xyz	2024-09-13 09:41:06 +09:00
Fujii Masao	97f2bc5aa5	pg_stat_statements: Add regression test for privilege handling. This commit adds a regression test to verify that pg_stat_statements correctly handles privileges, improving its test coverage. Author: Keisuke Kuroda Reviewed-by: Michael Paquier, Fujii Masao Discussion: https://postgr.es/m/2224ccf2e12c41ccb81702ef3303d5ac@nttcom.co.jp	2024-07-24 20:54:51 +09:00
Michael Paquier	c145f321b6	Propagate query IDs of utility statements in functions For utility statements defined within a function, the query tree is copied to a PlannedStmt as utility commands do not require planning. However, the query ID was missing from the information passed down. This leads to plugins relying on the query ID like pg_stat_statements to not be able to track utility statements within function calls. Tests are added to check this behavior, depending on pg_stat_statements.track. This is an old bug. Now, query IDs for utilities are compiled using their parsed trees rather than the query string since v16 (3db72ebcbe20), leading to less bloat with utilities, so backpatch down only to this version. Author: Anthonin Bonnefoy Discussion: https://postgr.es/m/CAO6_XqrGp-uwBqi3vBPLuRULKkddjC7R5QZCgsFren=8E+m2Sg@mail.gmail.com Backpatch-through: 16	2024-07-19 10:21:01 +09:00
Peter Eisentraut	1141e29b61	Revert "pg_stat_statements: Add coverage for entry_dealloc()" This reverts commit 742f6b3e6df980d7dafa4a18a165d285483d5f0e. The new test failed on big-endian platforms. Discussion: https://www.postgresql.org/message-id/flat/40d1e4f2-835f-448f-a541-8ff5db75bf3d@eisentraut.org	2023-12-31 15:30:57 +01:00
Peter Eisentraut	742f6b3e6d	pg_stat_statements: Add coverage for entry_dealloc() This involves creating more than pg_stat_statements.max entries and checking that the limit is kept and the least used entries are kicked out. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/40d1e4f2-835f-448f-a541-8ff5db75bf3d@eisentraut.org	2023-12-30 20:18:23 +01:00
Peter Eisentraut	3e527aeeed	pg_stat_statements: Add test coverage for pg_stat_statements_reset_1_7 Run pg_stat_statements_reset() once while the appropriate extension version is installed. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/40d1e4f2-835f-448f-a541-8ff5db75bf3d@eisentraut.org	2023-12-27 10:48:01 +01:00
Peter Eisentraut	3727b8d0e3	pg_stat_statements: Add test coverage for pg_stat_statements_1_8() This requires reading pg_stat_statements at least once while the 1.8 version of the extension is installed. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/40d1e4f2-835f-448f-a541-8ff5db75bf3d@eisentraut.org	2023-12-27 10:48:01 +01:00
Alexander Korotkov	dc9f8a7983	Track statement entry timestamp in contrib/pg_stat_statements This patch adds 'stats_since' and 'minmax_stats_since' columns to the pg_stat_statements view and pg_stat_statements() function. The new min/max reset mode for the pg_stat_stetments_reset() function is controlled by the parameter minmax_only. 'stat_since' column is populated with the current timestamp when a new statement is added to the pg_stat_statements hashtable. It provides clean information about statistics collection time intervals for each statement. Besides it can be used by sampling solutions to detect situations when a statement was evicted and stored again between samples. Such a sampling solution could derive any pg_stat_statements statistic values for an interval between two samples with the exception of all min/max statistics. To address this issue this patch adds the ability to reset min/max statistics independently of the statement reset using the new minmax_only parameter of the pg_stat_statements_reset(userid oid, dbid oid, queryid bigint, minmax_only boolean) function. The timestamp of such reset is stored in the minmax_stats_since field for each statement. pg_stat_statements_reset() function now returns the timestamp of a reset as the result. Discussion: https://postgr.es/m/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru Author: Andrei Zubkov Reviewed-by: Julien Rouhaud, Hayato Kuroda, Yuki Seino, Chengxi Sun Reviewed-by: Anton Melnikov, Darren Rush, Michael Paquier, Sergei Kornilov Reviewed-by: Alena Rybakina, Andrei Lepikhov	2023-11-27 02:52:17 +02:00
Alexander Korotkov	6ab1dbd26b	Add NOT NULL checking of pg_stat_statements_reset() in tests This is preliminary patch. It adds NOT NULL checking for the result of pg_stat_statements_reset() function. It is needed for upcoming patch "Track statement entry timestamp" that will change the result type of this function to the timestamp of a reset performed. Discussion: https://postgr.es/m/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru Author: Andrei Zubkov Reviewed-by: Julien Rouhaud, Hayato Kuroda, Yuki Seino, Chengxi Sun Reviewed-by: Anton Melnikov, Darren Rush, Michael Paquier, Sergei Kornilov Reviewed-by: Alena Rybakina, Andrei Lepikhov	2023-11-27 02:52:17 +02:00
Michael Paquier	108161bcb9	pg_stat_statements: Remove duplicated tests for SET statements This looks like a copy-paste mistake introduced in de2aca288569, that has added checks for more patterns of SET statements while ignoring the original test block that existed. Backpatch down to where this has been introduced, as this shaves some cycles. Author: Sergei Kornilov Discussion: https://postgr.es/m/5689421699428803@mail-sendbernar-production-main-46.myt.yp-c.yandex.net Backpatch-through: 16	2023-11-09 10:04:31 +09:00
Tom Lane	76db9cb636	Fix some issues with tracking nesting level in pg_stat_statements. When we decide that we don't want to track execution time of a specific planner or ProcessUtility call, we still have to increment the nesting depth, or we'll make the wrong determination of whether we are at top level when considering nested statements. (PREPARE and EXECUTE are exceptions, for reasons explained in the code.) Counting planner nesting depth separately from executor nesting depth was a mistake: it causes us to make the wrong determination of whether we are at top level when considering nested statements that get executed during planning (as a result of constant-folding of functions, for example). Merge those counters into one. In passing, get rid of the PGSS_HANDLED_UTILITY macro in favor of explicitly listing statement types. It seems somewhat coincidental that PREPARE and EXECUTE are handled alike in each of the places where that was used: the reasoning tends to be different for each one. Thus, the macro seems as likely to encourage future bugs as prevent them, since it's quite unclear whether any future statement type that might need special-casing here would also need the same choices at each spot. Sergei Kornilov, Julien Rouhaud, and Tom Lane, per bug #17552 from Maxim Boguk. This is pretty clearly a bug fix, but it's also a behavioral change that might surprise somebody, so no back-patch. Discussion: https://postgr.es/m/17552-213b534c56ab5d02@postgresql.org	2023-11-08 12:01:28 -05:00
Daniel Gustafsson	c5a032e518	Update oldextversions test for pg_stat_statements 1.11 Commit 5a3423ad8e updated pg_stat_statements to 1.11 due to added columns in the view for jit deform counters. Fixing oldextversion was however missed in that commit. This adds a test for the 1.11 version view definition. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CAN55FZ2Y7c2VEg4S1KDXFcaHjyF8=KZa_rMV6WOe+KwE-KE97g@mail.gmail.com	2023-10-13 13:50:18 +02:00
Michael Paquier	11c34b342b	Show parameters of CALL as constants in pg_stat_statements This commit changes the query jumbling of CallStmt so as its IN/OUT parameters are able to show up as constants with a parameter symbol in pg_stat_statements, like: CALL proc1($1, $2); CALL proc2($1, $2, $3); The transformed FuncExpr is used in the query ID computation instead of the FuncCall generated by the parser, so as it is sensitive to the OID of the procedure and its list of input arguments. The output arguments are handled in a separate list in CallStmt, which is also included in the computation. Tests are added to pg_stat_statements to show how this affects CALL with IN/OUT parameters as well as overloaded functions. Like 638d42a3c520 or 31de7e60da34, this improves the monitoring of workloads with a lot of CALL statements, preventing unnecessary bloat when these use different input (or event output) values. Author: Sami Imseih Discussion: https://postgr.es/m/B44FA29D-EBD0-4DD9-ABC2-16F1CB087074@amazon.com	2023-09-28 15:17:55 +09:00
Andres Freund	7369798a83	Fix tracking of temp table relation extensions as writes Karina figured out that I (Andres) confused BufferUsage.temp_blks_written with BufferUsage.local_blks_written in fcdda1e4b5. Tests in core PG can't easily test this, as BufferUsage is just used for EXPLAIN (ANALYZE, BUFFERS) and pg_stat_statements. Thus this commit adds tests for this to pg_stat_statements. Reported-by: Karina Litskevich <litskevichkarina@gmail.com> Author: Karina Litskevich <litskevichkarina@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CACiT8ibxXA6+0amGikbeFhm8B84XdQVo6D0Qfd1pQ1s8zpsnxQ@mail.gmail.com Backpatch: 16-, where fcdda1e4b5 was merged	2023-09-13 19:14:09 -07:00
Michael Paquier	bb45156f34	Show names of DEALLOCATE as constants in pg_stat_statements This commit switches query jumbling so as prepared statement names are treated as constants in DeallocateStmt. A boolean field is added to DeallocateStmt to make a distinction between ALL and named prepared statements, as "name" was used to make this difference before, NULL meaning DEALLOCATE ALL. Prior to this commit, DEALLOCATE was not tracked in pg_stat_statements, for the reason that it was not possible to treat its name parameter as a constant. Now that query jumbling applies to all the utility nodes, this reason does not apply anymore. Like 638d42a3c520, this can be a huge advantage for monitoring where prepared statement names are randomly generated, preventing bloat in pg_stat_statements. A couple of tests are added to track the new behavior. Author: Dagfinn Ilmari Mannsåker, Michael Paquier Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz	2023-08-27 17:27:44 +09:00
Michael Paquier	638d42a3c5	Show GIDs of two-phase commit commands as constants in pg_stat_statements This relies on the "location" field added to TransactionStmt in 31de7e6, now applied to the "gid" field used by 2PC commands. These commands are now reported like: COMMIT PREPARED $1 PREPARE TRANSACTION $1 ROLLBACK PREPARED $1 Applying constants for these commands is a huge advantage for workloads that rely a lot on 2PC commands with different GIDs. Some tests are added to track the new behavior. Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz	2023-08-12 10:44:15 +09:00
Michael Paquier	31de7e60da	Show savepoint names as constants in pg_stat_statements In pg_stat_statements, savepoint names now show up as constants with a parameter symbol, using as base query string the one added as a new entry to the PGSS hash table, leading to: RELEASE $1 ROLLBACK TO $1 SAVEPOINT $1 Applying constants to these query parts is a huge advantage for workloads that generate randomly savepoint points, like ORMs (Django is at the origin of this patch). The ODBC driver is a second layer that likes a lot savepoints, though it does not use a random naming pattern. A "location" field is added to TransactionStmt, now set only for savepoints. The savepoint name is ignored by the query jumbling. The location can be extended to other query patterns, if required, like 2PC commands. Some tests are added to pg_stat_statements for all the query patterns supported by the parser. ROLLBACK, ROLLBACK TO SAVEPOINT and ROLLBACK TRANSACTION TO SAVEPOINT have the same Node representation, so all these are equivalents. The same happens for RELEASE and RELEASE SAVEPOINT. Author: Greg Sabino Mullane Discussion: https://postgr.es/m/CAKAnmm+2s9PA4OaumwMJReWHk8qvJ_-g1WqxDRDAN1BSUfxyTw@mail.gmail.com	2023-07-27 09:42:33 +09:00
Michael Paquier	9a714b9d6e	Improve cleanup phases in regression tests of pg_stat_statements As shaped, two DROP ROLE queries included in "user_activity" were showing in the reports for "wal". The intention is to keep each test isolated and independent, so this is incorrect. This commit adds some calls to pg_stat_statements_reset() to clean up the statistics once each test finishes, so as there are no risks of overlap in the reports for individial scenarios. The addition in "user_activity" fixes the output of "wal". The new resets done in "level_tracking" and "utility" are added for consistency with the rest, though they do not affect the stats generated in the other tests. Oversight in d0028e3. Reported-by: Andrei Zubkov Discussion: https://postgr.es/m/7beb722dd016bf54f1c78bfd6d44a684e28da624.camel@moonset.ru	2023-03-07 08:58:13 +09:00
Michael Paquier	d0028e35a0	Refactor more the regression tests of pg_stat_statements This commit expands more the refactoring of the regression tests of pg_stat_statements, with tests moved out of pg_stat_statements.sql into separate files. The following file structure is now used: - select is mostly the former pg_stat_statements.sql, renamed. - dml for INSERT/UPDATE/DELETE and MERGE - user_activity, to test role-level checks and stat resets. - wal, to check the WAL generation after some queries. Like e8dbdb1, there is no change in terms of code coverage or results, and this finishes the split I was aiming for in these tests. Most of the tests used "test" of "pgss_test" as names for the tables used, these are renamed to less generic names. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Y/7Y9U/y/keAW3qH@paquier.xyz	2023-03-03 08:46:11 +09:00
Michael Paquier	de2aca2885	Expand regression tests of pg_stat_statements for utility queries This commit adds more coverage for utility statements so as it is possible to track down all the effects of query normalization done for all the queries that use either Const or A_Const nodes, which are the nodes where normalization makes the most sense as they apply to constants (well, most of the time, really). This set of queries is extracted from an analysis done while looking at full dumps of the regression database when applying different levels of normalization to either Const or A_Const nodes for utilities, as of a minimal set of these, for: - All relkinds (CREATE, ALTER, DROP) - Policies - Cursors - Triggers - Types - Rules - Statistics - CALL - Transaction statements (isolation level, options) - EXPLAIN - COPY Note that pg_stat_statements is not switched yet to show any normalization for utilities, still it improves the default coverage of the query jumbling code (not by as much as enabling query jumbling on the main regression test suite, though): - queryjumblefuncs.funcs.c: 36.8% => 48.5% - queryjumblefuncs.switch.c: 33.2% => 43.1% Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Y+MRdEq9W9XVa2AB@paquier.xyz	2023-02-20 10:16:51 +09:00
Michael Paquier	e8dbdb15db	Refactor tests of pg_stat_statements for planning, utility and level tracking pg_stat_statements.sql acts as the main file for all the core tests of the module, but things have become complicated to follow over the years as some of the sub-scenarios tested in this file rely on assumptions that come from completely different areas of it, like a GUC setup or a relation created previously. For example, row tracking for CTAS/COPY was looking at the number of plans, which was not necessary, or level tracking was mixed with checks on planner counts. This commit refactors the tests of pg_stat_statements, by moving test cases out of pg_stat_statements.sql into their own file, as of: - Planning-related tests in planning.sql, for [re]plan counts and top-level handling. These depend on pg_stat_statements.track_planning. - Utilities in utility.sql (pg_stat_statements.track_utility), that includes now the tests for: -- Row tracking for CTAS, CREATE MATERIALIZED VIEW, COPY. -- Basic utility statements. -- SET statements. - Tracking level, depending on pg_stat_statements.track. This part has been looking at scenarios with DO blocks, PL functions and SQL functions. pg_stat_statements.sql (still named the same for now) still includes some checks for role-level tracking and WAL generation metrics, that ought to become independent in the long term for clarity. While on it, this switches the order of the attributes when querying pg_stat_statements, the query field becoming last. This makes much easier the tracking of changes related to normalization, as queries are the only variable-length attributes queried (unaligned mode would be one extra choice, but that reduces the checks on the other fields). Test scenarios and their results match exactly with what was happening before this commit in terms of calls, number of plans, number of rows, cached data or level tracking, so this has no effect on the coverage in terms of what is produced by the reports in the table pg_stat_statements. A follow-up patch will extend more the tests of pg_stat_statements around utilities, so this split creates a foundation for this purpose, without complicating more pg_stat_statements.sql. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Y+MRdEq9W9XVa2AB@paquier.xyz	2023-02-20 09:28:29 +09:00
Michael Paquier	9ba37b2cb6	Include values of A_Const nodes in query jumbling Like the implementation for node copy, write and read, this node requires a custom implementation so as the query jumbling is able to consider the correct value assigned to it, depending on its type (int, float, bool, string, bitstring). Based on a dump of pg_stat_statements from the regression database, this would confuse the query jumbling of the following queries: - SET. - COPY TO with SELECT queries. - START TRANSACTION with different isolation levels. - ALTER TABLE with default expressions. - CREATE TABLE with partition bounds. Note that there may be a long-term argument in tracking the location of such nodes so as query strings holding such nodes could be normalized, but this is left as a separate discussion. Oversight in 3db72eb. Discussion: https://postgr.es/m/Y9+HuYslMAP6yyPb@paquier.xyz	2023-02-07 09:03:54 +09:00
Michael Paquier	3db72ebcbe	Generate code for query jumbling through gen_node_support.pl This commit changes the query jumbling code in queryjumblefuncs.c to be generated automatically based on the information of the nodes in the headers of src/include/nodes/ by using gen_node_support.pl. This approach offers many advantages: - Support for query jumbling for all the utility statements, based on the state of their parsed Nodes and not only their query string. This will greatly ease the switch to normalize the information of some DDLs, like SET or CALL for example (this is left unchanged and should be part of a separate discussion). With this feature, the number of entries stored for utilities in pg_stat_statements is reduced (for example now "CHECKPOINT" and "checkpoint" mean the same thing with the same query ID). - Documentation of query jumbling directly in the structure definition of the nodes. Since this code has been introduced in pg_stat_statements and then moved to code, the reasons behind the choices of what should be included in the jumble are rather sparse. Note that some explanation is added for the most relevant parts, as a start. - Overall code reduction and more consistency with the other parts generating read, write and copy depending on the nodes. The query jumbling is controlled by a couple of new node attributes, documented in nodes/nodes.h: - custom_query_jumble, to mark a Node as having a custom implementation. - no_query_jumble, to ignore entirely a Node. - query_jumble_ignore, to ignore a field in a Node. - query_jumble_location, to mark a location in a Node, for normalization. This can apply only to int fields, with "location" in their name (only Const as of this commit). There should be no compatibility impact on pg_stat_statements, as the new code applies the jumbling to the same fields for each node (its regression tests have no modification, for one). Some benchmark of the query jumbling between HEAD and this commit for SELECT and DMLs has proved that this new code does not cause a performance regression, with computation times close for both methods. For utility queries, the new method is slower than the previous method of calculating a hash of the query string, though we are talking about extra ns-level changes based on what I measured, which is unnoticeable even for OLTP workloads as a query ID is calculated once per query post-parse analysis. Author: Michael Paquier Reviewed-by: Peter Eisentraut Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz	2023-01-31 15:24:05 +09:00
Alvaro Herrera	249b0409b1	Fix pg_stat_statements for MERGE We weren't jumbling the merge action list, so wildly different commands would be considered to use the same query ID. Add that, mention it in the docs, and some test lines. Backpatch to 15. Author: Tatsu <bt22nakamorit@oss.nttdata.com> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://postgr.es/m/d87e391694db75a038abc3b2597828e8@oss.nttdata.com	2022-09-27 10:44:42 +02:00
Robert Haas	2d7ead8526	pg_stat_statements: Fix test that assumes wal_records = rows. It's not very robust to assume that each inserted row will produce exactly one WAL record and that no other WAL records will be generated in the process, because for example a particular transaction could always be the one that has to extend clog. Because these tests are not run by 'make installcheck' but only by 'make check', it may be that in our current testing infrastructure this can't be hit, but it doesn't seem like a good idea to rely on that, since unrelated changes to the regression tests or the way write-ahead logging is done could easily cause it to start happening, and debugging such failures is a pain. Adjust the regression test to be less sensitive. Anton Melnikov, reviewed by Julien Rouhaud Discussion: http://postgr.es/m/1ccd00d9-1723-6b68-ae56-655aab00d406@inbox.ru	2022-07-06 13:05:51 -04:00
Michael Paquier	76cbf7edb6	pg_stat_statements: Track I/O timing for temporary file blocks This commit adds two new columns to pg_stat_statements, called temp_blk_read_time and temp_blk_write_time. Those columns respectively show the time spent to read and write temporary file blocks on disk, whose tracking has been added in efb0ef9. This information is available when track_io_timing is enabled, like blk_read_time and blk_write_time. pg_stat_statements is updated to version to 1.10 as an effect of the newly-added columns. Tests for the upgrade path 1.9->1.10 are added. PGSS_FILE_HEADER is bumped for the new stats file format. Author: Masahiko Sawada Reviewed-by: Georgios Kokolatos, Melanie Plageman, Julien Rouhaud, Ranier Vilela Discussion: https://postgr.es/m/CAD21AoAJgotTeP83p6HiAGDhs_9Fw9pZ2J=_tYTsiO5Ob-V5GQ@mail.gmail.com	2022-04-08 13:12:07 +09:00
Michael Paquier	2b0da0365b	pg_stat_statements: Add some tests for older versions still usable When the newest version is loaded, the backend would load objects from the oldest complete SQL file (here 1.4) and then update to the latest version with transition scripts (up to 1.9 currently). This provides some coverage for upgrades of pg_stat_statements, but there is no test to show how things have changed across each version. This adds a couple of tests for the upgrade paths using objects from each version supported, stressing the objects whose behaviors have changed across each version supported. Author: Erica Zhang Reviewed-by: Julien Rouhaud, Michael Paquier Discussion: https://postgr.es/m/tencent_BBA974AFF61379F2345E782FD6C55891950A@qq.com	2021-10-02 17:40:13 +09:00
Tom Lane	0806d08d46	Harden pg_stat_statements tests against CLOBBER_CACHE_ALWAYS. Turns out the buildfarm hasn't been testing this, which will soon change. Julien Rouhaud, per report from me Discussion: https://postgr.es/m/42557.1627229005@sss.pgh.pa.us	2021-07-25 23:25:15 -04:00
Bruce Momjian	f7a97b6ec3	Update query_id computation Properly fix: - the "ONLY" in FROM [ONLY] isn't hashed - the agglevelsup field in GROUPING isn't hashed - WITH TIES not being hashed (new in PG 13) - "DISTINCT" in "GROUP BY [DISTINCT]" isn't hashed (new in PG 14) Reported-by: Julien Rouhaud Discussion: https://postgr.es/m/20210425081119.ulyzxqz23ueh3wuj@nol	2021-05-03 14:59:39 -04:00
Magnus Hagander	6b4d23feef	Track identical top vs nested queries independently in pg_stat_statements Changing pg_stat_statements.track between 'all' and 'top' would control if pg_stat_statements tracked just top level statements or also statements inside functions, but when tracking all it would not differentiate between the two. Being table to differentiate this is useful both to track where the actual query is coming from, and to see if there are differences in executions between the two. To do this, add a boolean to the hash key indicating if the statement was top level or not. Experience from the pg_stat_kcache module shows that in at least some "reasonable worloads" only <5% of the queries show up both top level and nested. Based on this, admittedly small, dataset, this patch does not try to de-duplicate those query texts, and will just store one copy for the top level and one for the nested. Author: Julien Rohaud Reviewed-By: Magnus Hagander, Masahiro Ikeda Discussion: https://postgr.es/m/20201202040516.GA43757@nol	2021-04-08 10:30:34 +02:00
Fujii Masao	9fbc3f318d	pg_stat_statements: Track number of times pgss entries were deallocated. If more distinct statements than pg_stat_statements.max are observed, pg_stat_statements entries about the least-executed statements are deallocated. This commit enables us to track the total number of times those entries were deallocated. That number can be viewed in the pg_stat_statements_info view that this commit adds. It's useful when tuning pg_stat_statements.max parameter. If it's high, i.e., the entries are deallocated very frequently, which might cause the performance regression and we can increase pg_stat_statements.max to avoid those frequent deallocations. The pg_stat_statements_info view is intended to display the statistics of pg_stat_statements module itself. Currently it has only one column "dealloc" indicating the number of times entries were deallocated. But an upcoming patch will add other columns (for example, the time at which pg_stat_statements statistics were last reset) into the view. Author: Katsuragi Yuta, Yuki Seino Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/0d9f1107772cf5c3f954e985464c7298@oss.nttdata.com	2020-11-26 21:18:05 +09:00
Fujii Masao	b62e6056a0	pg_stat_statements: track number of rows processed by REFRESH MATERIALIZED VIEW. Commit 6023b7ea71 allowed pg_stat_statements to track the number of rows retrieved or affected by some utility commands including CREATE MATERIALIZED VIEW. However it did not track the rowcount of REFRESH MATERIALIZED VIEW. This commit allows pg_stat_statements to track that. To track that, this commit changes the query completion for REFRESH MATERIALIZED VIEW so that it saves the rowcount. But note that the rowcount is still not displayed in the command completion tag output. That is, the display_rowcount flag of CMDTAG_REFRESH_MATERIALIZED_VIEW command tag is left false in cmdtaglist.h. Otherwise, the change of completion tag output might break applications using it. Author: Katsuragi Yuta, Seino Yuki Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/71f6bc72f8bbaa06e701f8bd2562c347@oss.nttdata.com Discussion: https://postgr.es/m/aadbfba9-e4bb-9531-6b3a-d13c31c8f4fe@oss.nttdata.com	2020-11-12 11:26:55 +09:00
Fujii Masao	6023b7ea71	pg_stat_statements: track number of rows processed by some utility commands. This commit makes pg_stat_statements track the total number of rows retrieved or affected by CREATE TABLE AS, SELECT INTO, CREATE MATERIALIZED VIEW and FETCH commands. Suggested-by: Pascal Legrand Author: Fujii Masao Reviewed-by: Asif Rehman Discussion: https://postgr.es/m/1584293755198-0.post@n3.nabble.com	2020-07-29 23:21:55 +09:00
Fujii Masao	d1763ea8c9	Change default of pg_stat_statements.track_planning to off. Since v13 pg_stat_statements is allowed to track the planning time of statements when track_planning option is enabled. Its default was on. But this feature could cause more terrible spinlock contentions in pg_stat_statements. As a result of this, Robins Tharakan reported that v13 beta1 showed ~45% performance drop at high DB connection counts (when compared with v12.3) during fully-cached SELECT-only test using pgbench. To avoid this performance regression by the default setting, this commit changes default of pg_stat_statements.track_planning to off. Back-patch to v13 where pg_stat_statements.track_planning was introduced. Reported-by: Robins Tharakan Author: Fujii Masao Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/2895b53b033c47ccb22972b589050dd9@EX13D05UWC001.ant.amazon.com	2020-07-03 11:35:22 +09:00
Amit Kapila	6b466bf5f2	Allow pg_stat_statements to track WAL usage statistics. This commit adds three new columns in pg_stat_statements output to display WAL usage statistics added by commit df3b181499. This commit doesn't bump the version of pg_stat_statements as the same is done for this release in commit 17e0328224. Author: Kirill Bychik and Julien Rouhaud Reviewed-by: Julien Rouhaud, Fujii Masao, Dilip Kumar and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-05 07:34:04 +05:30
Fujii Masao	17e0328224	Allow pg_stat_statements to track planning statistics. This commit makes pg_stat_statements support new GUC pg_stat_statements.track_planning. If this option is enabled, pg_stat_statements tracks the planning statistics of the statements, e.g., the number of times the statement was planned, the total time spent planning the statement, etc. This feature is useful to check the statements that it takes a long time to plan. Previously since pg_stat_statements tracked only the execution statistics, we could not use that for the purpose. The planning and execution statistics are stored at the end of each phase separately. So there are not always one-to-one relationship between them. For example, if the statement is successfully planned but fails in the execution phase, only its planning statistics are stored. This may cause the users to be able to see different pg_stat_statements results from the previous version. To avoid this, pg_stat_statements.track_planning needs to be disabled. This commit bumps the version of pg_stat_statements to 1.8 since it changes the definition of pg_stat_statements function. Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane Discussion: https://postgr.es/m/CAHGQGwFx_=DO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw@mail.gmail.com Discussion: https://postgr.es/m/CAEepm=0e59Y_6Q_YXYCTHZkqOc6H2pJ54C_Xe=VFu50Aqqp_sA@mail.gmail.com Discussion: https://postgr.es/m/DB6PR0301MB21352F6210E3B11934B0DCC790B00@DB6PR0301MB2135.eurprd03.prod.outlook.com	2020-04-02 11:20:19 +09:00
Andrew Gierth	6e74c64bcf	Teach pg_stat_statements not to ignore FOR UPDATE clauses Performance of a SELECT FOR UPDATE may be quite distinct from the non-UPDATE version of the query, so treat all of the FOR UPDATE clause as being significant for distinguishing queries. Andrew Gierth and Vik Fearing, reviewed by Sergei Kornilov, Thomas Munro, Tom Lane Discussion: https://postgr.es/m/87h8e4hfwv.fsf@news-spur.riddles.org.uk	2019-07-14 12:07:40 +01:00
Amit Kapila	43cbedab8f	Extend pg_stat_statements_reset to reset statistics specific to a particular user/db/query. The function pg_stat_statements_reset() is extended to accept userid, dbid, and queryid as input parameters. Now, it can discard the statistics gathered so far by pg_stat_statements corresponding to the specified userid, dbid, and queryid. If no parameter is specified or all the specified parameters have default value aka 0, it will discard all statistics as per the old behavior. The new behavior is useful to get the fresh statistics for a specific user/database/query without resetting all the existing statistics. Author: Haribabu Kommi, with few additional changes by me Reviewed-by: Michael Paquier, Amit Kapila and Fujii Masao Discussion: https://postgr.es/m/CAJrrPGcyh-gkFswyc6C661K6cknL0XkNqVT0sQt2mFNMR4HRKA@mail.gmail.com	2019-01-11 08:50:09 +05:30
Tom Lane	a6f22e8356	Show ignored constants as "$N" rather than "?" in pg_stat_statements. The trouble with the original choice here is that "?" is a valid (and indeed used) operator name, so that you could end up with ambiguous statement texts like "SELECT ? ? ?". With this patch, you instead see "SELECT $1 ? $2", which seems significantly more readable. The numbers used for this purpose begin after the last actual $N parameter in the particular query. The conflict with external parameters has its own potential for confusion of course, but it was agreed to be an improvement over the previous behavior. Lukas Fittl Discussion: https://postgr.es/m/CAP53PkxeaCuwYmF-A4J5z2-qk5fYFo5_NH3gpXGJJBxv1DMwEw@mail.gmail.com	2017-03-27 20:14:36 -04:00

1 2

52 Commits