52 Commits

Author SHA1 Message Date
Álvaro Herrera
9fbd53dea5
Remove the query_id_squash_values GUC
Commit 62d712ecfd94 introduced the capability to calculate the same
queryId for queries with different lengths of constants in a list for an
IN clause.  This behavior was originally enabled with a GUC
query_id_squash_values.  After a discussion about the value of such a
GUC, it was decided to back out of the use of a GUC and make the
squashing behavior the only available option.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/Z-LZyygkkNyA8-kR@msg.df7cb.de
Discussion: https://postgr.es/m/CA+q6zcVTK-3C-8NWV1oY2NZrvtnMCDqnyYYyk1T7WMUG65MeOQ@mail.gmail.com
2025-03-27 13:33:37 +01:00
David Rowley
f31aad9b07 Fix query jumbling to account for NULL nodes
Previously NULL nodes were ignored.  This could cause issues where the
computed query ID could match for queries where fields that are next to
each other in their Node struct where one field was NULL and the other
non-NULL.  For example, the Query struct had distinctClause and sortClause
next to each other.  If someone wrote;

SELECT DISTINCT c1 FROM t;

and then;

SELECT c1 FROM t ORDER BY c1;

these would produce the same query ID since, in the first query, we
ignored the NULL sortClause and appended the jumble bytes for the
distictClause.  In the latter query, since we did nothing for the NULL
distinctClause then jumble the non-NULL sortClause, and since the node
representation stored is the same in both cases, the query IDs were
identical.

Here we fix this by always accounting for NULL nodes by recording that
we saw a NULL in the jumble buffer.  This fixes the issue as the order that
the NULL is recorded isn't the same in the above two queries.

Author: Bykov Ivan <i.bykov@modernsys.ru>
Author: Michael Paquier <michael@paquier.xyz>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain
2025-03-27 18:23:00 +13:00
Michael Paquier
3430215fe3 pg_stat_statements: Add more tests with temp tables and namespaces
These tests provide coverage for RangeTblEntry and how query jumbling
works with search_path, as well as the case where relations are
re-created, generating a different query ID as the relation OID is used
in the computation.

A patch is under discussion to switch to a different approach based on
the relation name, and there was no test coverage for this area,
including how queries are currently grouped with search_path.  This is
useful to track how the situation changes between HEAD and any patches
proposed.

Christoph has proposed the test with ON COMMIT DROP temporary tables,
and I have written the second part.

Author: Christoph Berg <myon@debian.org>
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de
2025-03-26 07:25:23 +09:00
Álvaro Herrera
62d712ecfd
Introduce squashing of constant lists in query jumbling
pg_stat_statements produces multiple entries for queries like
    SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is individually jumbled.  Most of the time that's undesirable,
especially if the list becomes too large.

Fix this by introducing a new GUC query_id_squash_values which modifies
the node jumbling code to only consider the first and last element of a
list of constants, rather than each list element individually.  This
affects both the query_id generated by query jumbling, as well as
pg_stat_statements query normalization so that it suppresses printing of
the individual elements of such a list.

The default value is off, meaning the previous behavior is maintained.

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Sergey Dudoladov (mysterious, off-list)
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Marcos Pegoraro <marcos@f10.com.br>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Tested-by: Yasuo Honda <yasuo.honda@gmail.com>
Tested-by: Sergei Kornilov <sk@zsrv.org>
Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com>
Tested-by: Chengxi Sun <sunchengxi@highgo.com>
Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
2025-03-18 18:56:11 +01:00
David Rowley
89988ac589 Fix further fallout from EXPLAIN ANALYZE BUFFERS change
c2a4078eb adjusted EXPLAIN ANALYZE to default the BUFFERS to ON.  This
(hopefully) fixes the last remaining issue with regression test failures
with -D RELCACHE_FORCE_RELEASE -D CATCACHE_FORCE_RELEASE builds, where
the planner accesses more buffers due to the cold caches.

Discussion: https://postgr.es/m/CAApHDvqLdzgz77JsE-yTki3w9UiKQ-uTMLRctazcu+99-ips3g@mail.gmail.com
2024-12-12 09:50:00 +13:00
Michael Paquier
499edb0974 Track more precisely query locations for nested statements
Previously, a Query generated through the transform phase would have
unset stmt_location, tracking the starting point of a query string.

Extensions relying on the statement location to extract its relevant
parts in the source text string would fallback to use the whole
statement instead, leading to confusing results like in
pg_stat_statements for queries relying on nested queries, like:
- EXPLAIN, with top-level and nested query using the same query string,
and a query ID coming from the nested query when the non-top-level
entry.
- Multi-statements, with only partial portions of queries being
normalized.
- COPY TO with a query, SELECT or DMLs.

This patch improves things by keeping track of the statement locations
and propagate it to Query during transform, allowing PGSS to only show
the relevant part of the query for nested query.  This leads to less
bloat in entries for non-top-level entries, as queries can now be
grouped within the same (toplevel, queryid) duos in pg_stat_statements.
The result gives a stricter one-one mapping between query IDs and its
query strings.

The regression tests introduced in 45e0ba30fc40 produce differences
reflecting the new logic.

Author: Anthonin Bonnefoy
Reviewed-by: Michael Paquier, Jian He
Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com
2024-10-24 09:29:54 +09:00
Tom Lane
14e5680eee Improve parser's reporting of statement start locations.
Up to now, the parser's reporting of a statement's stmt_location
included any preceding whitespace or comments.  This isn't really
desirable but was done to avoid accounting honestly for nonterminals
that reduce to empty.  It causes problems for pg_stat_statements,
which partially compensates by manually stripping whitespace, but
is not bright enough to strip /*-style comments.  There will be
more problems with an upcoming patch to improve reporting of errors
in extension scripts, so it's time to do something about this.

The thing we have to do to make it work right is to adjust
YYLLOC_DEFAULT to scan the inputs of each production to find the
first one that has a valid location (i.e., did not reduce to
empty).  In theory this adds a little bit of per-reduction overhead,
but in practice it's negligible.  I checked by measuring the time
to run raw_parser() on the contents of information_schema.sql, and
there was basically no change.

Having done that, we can rely on any nonterminal that didn't reduce
to completely empty to have a correct starting location, and we don't
need the kluges the stmtmulti production formerly used.

This should have a side benefit of allowing parse error reports to
include an error position in some cases where they formerly failed to
do so, due to trying to report the position of an empty nonterminal.
I did not go looking for an example though.  The one previously known
case where that could happen (OptSchemaEltList) no longer needs the
kluge it had; but I rather doubt that that was the only case.

Discussion: https://postgr.es/m/ZvV1ClhnbJLCz7Sm@msg.df7cb.de
2024-10-22 11:26:05 -04:00
Michael Paquier
45e0ba30fc pg_stat_statements: Add tests for nested queries with level tracking
There have never been any regression tests in PGSS for various query
patterns for nested queries combined with level tracking, like:
- Multi-statements.
- CREATE TABLE AS
- CREATE/REFRESH MATERIALIZED VIEW
- DECLARE CURSOR
- EXPLAIN, with a subset of the above supported.
- COPY.

All the tests added here track historical, sometimes confusing, existing
behaviors.  For example, EXPLAIN stores two PGSS entries with the same
top-level query string but two different query IDs as one is calculated
for the top-level EXPLAIN (this part is right) and a second one for the
inner query in the EXPLAIN (this part is not right).

A couple of patches are under discussion to improve the situation, and
all the tests added here will prove useful to evaluate the changes
discussed.

Author: Anthonin Bonnefoy
Reviewed-by: Michael Paquier, Jian He
Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com
2024-10-22 13:05:51 +09:00
Michael Paquier
cf54a2c002 pg_stat_statements: Add columns to track parallel worker activity
The view pg_stat_statements gains two columns:
- parallel_workers_to_launch, the number of parallel workers planned to
be launched.
- parallel_workers_launched, the number of parallel workers actually
launched.

The ratio of both columns offers hints that parallel workers are lacking
on a per-statement basis, requiring some tuning, in coordination with
"calls", the number of times a query is executed.

As of now, these numbers are tracked within Gather and GatherMerge
nodes.  They could be extended to utilities that make use of parallel
workers (parallel btree and brin, VACUUM).

The module is bumped to 1.12.

Author: Guillaume Lelarge
Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com
2024-10-09 08:30:45 +09:00
Michael Paquier
ba90eac7a9 pg_stat_statements: Expand tests for SET statements
There are many grammar flavors that depend on the parse node
VariableSetStmt.  This closes the gap in pg_stat_statements by providing
test coverage for what should be a large majority of them, improving more
the work begun in de2aca288569.  This will be used to ease the
evaluation of a path towards more normalization of SET queries with
query jumbling.

Note that SET NAMES (grammar from the standard, synonym of SET
client_encoding) is omitted on purpose, this could use UTF8 with a
conditional script where UTF8 is supported, but that does not seem worth
the maintenance cost for the sake of these tests.

The author has submitted most of these in a TAP test (filled in any
holes I could spot), still queries in a SQL file of pg_stat_statements
is able to achieve the same goal while being easier to look at when
testing normalization patterns.

Author: Greg Sabino Mullane, Michael Paquier
Discussion: https://postgr.es/m/CAKAnmmJtJY2jzQN91=2QAD2eAJAA-Per61eyO48-TyxEg-q0Rg@mail.gmail.com
2024-09-25 10:04:44 +09:00
Michael Paquier
933848d16d Add missing query ID reporting in extended query protocol
This commit adds query ID reports for two code paths when processing
extended query protocol messages:
- When receiving a bind message, setting it to the first Query retrieved
from a cached cache.
- When receiving an execute message, setting it to the first PlannedStmt
stored in a portal.

An advantage of this method is that this is able to cover all the types
of portals handled in the extended query protocol, particularly these
two when the report done in ExecutorStart() is not enough (neither is an
addition in ExecutorRun(), actually, for the second point):
- Multiple execute messages, with multiple ExecutorRun().
- Portal with execute/fetch messages, like a query with a RETURNING
clause and a fetch size that stores the tuples in a first execute
message going though ExecutorStart() and ExecuteRun(), followed by one
or more execute messages doing only fetches from the tuplestore created
in the first message.  This corresponds to the case where
execute_is_fetch is set, for example.

Note that the query ID reporting done in ExecutorStart() is still
necessary, as an EXECUTE requires it.  Query ID reporting is optimistic
and more calls to pgstat_report_query_id() don't matter as the first
report takes priority except if the report is forced.  The comment in
ExecutorStart() is adjusted to reflect better the reality with the
extended query protocol.

The test added in pg_stat_statements is a courtesy of Robert Haas.  This
uses psql's \bind metacommand, hence this part is backpatched down to
v16.

Reported-by:  Kaido Vaikla, Erik Wienhold
Author: Sami Imseih
Reviewed-by: Jian He, Andrei Lepikhov, Michael Paquier
Discussion: https://postgr.es/m/CA+427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com
Discussion: https://postgr.es/m/1391613709.939460.1684777418070@office.mailbox.org
Backpatch-through: 14
2024-09-18 09:59:09 +09:00
Michael Paquier
7b1ddbae36 pg_stat_statements: Add tests with extended query protocol
There are currently no tests in the tree checking that queries using the
extended query protocol are able to map with their query ID.

This can be achieved for some paths of the extended query protocol with
the psql meta-commands \bind or \bind_named, so let's add some tests
based on both.

I have found that to be a useful addition while working on a different
issue.

Discussion: https://postgr.es/m/ZuEt6MOEBSlifBfn@paquier.xyz
2024-09-13 09:41:06 +09:00
Fujii Masao
97f2bc5aa5 pg_stat_statements: Add regression test for privilege handling.
This commit adds a regression test to verify that pg_stat_statements
correctly handles privileges, improving its test coverage.

Author: Keisuke Kuroda
Reviewed-by: Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/2224ccf2e12c41ccb81702ef3303d5ac@nttcom.co.jp
2024-07-24 20:54:51 +09:00
Michael Paquier
c145f321b6 Propagate query IDs of utility statements in functions
For utility statements defined within a function, the query tree is
copied to a PlannedStmt as utility commands do not require planning.
However, the query ID was missing from the information passed down.

This leads to plugins relying on the query ID like pg_stat_statements to
not be able to track utility statements within function calls.  Tests
are added to check this behavior, depending on pg_stat_statements.track.

This is an old bug.  Now, query IDs for utilities are compiled using
their parsed trees rather than the query string since v16
(3db72ebcbe20), leading to less bloat with utilities, so backpatch down
only to this version.

Author: Anthonin Bonnefoy
Discussion: https://postgr.es/m/CAO6_XqrGp-uwBqi3vBPLuRULKkddjC7R5QZCgsFren=8E+m2Sg@mail.gmail.com
Backpatch-through: 16
2024-07-19 10:21:01 +09:00
Peter Eisentraut
1141e29b61 Revert "pg_stat_statements: Add coverage for entry_dealloc()"
This reverts commit 742f6b3e6df980d7dafa4a18a165d285483d5f0e.

The new test failed on big-endian platforms.

Discussion: https://www.postgresql.org/message-id/flat/40d1e4f2-835f-448f-a541-8ff5db75bf3d@eisentraut.org
2023-12-31 15:30:57 +01:00
Peter Eisentraut
742f6b3e6d pg_stat_statements: Add coverage for entry_dealloc()
This involves creating more than pg_stat_statements.max entries and
checking that the limit is kept and the least used entries are kicked
out.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/40d1e4f2-835f-448f-a541-8ff5db75bf3d@eisentraut.org
2023-12-30 20:18:23 +01:00
Peter Eisentraut
3e527aeeed pg_stat_statements: Add test coverage for pg_stat_statements_reset_1_7
Run pg_stat_statements_reset() once while the appropriate extension
version is installed.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/40d1e4f2-835f-448f-a541-8ff5db75bf3d@eisentraut.org
2023-12-27 10:48:01 +01:00
Peter Eisentraut
3727b8d0e3 pg_stat_statements: Add test coverage for pg_stat_statements_1_8()
This requires reading pg_stat_statements at least once while the 1.8
version of the extension is installed.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/40d1e4f2-835f-448f-a541-8ff5db75bf3d@eisentraut.org
2023-12-27 10:48:01 +01:00
Alexander Korotkov
dc9f8a7983 Track statement entry timestamp in contrib/pg_stat_statements
This patch adds 'stats_since' and 'minmax_stats_since' columns to the
pg_stat_statements view and pg_stat_statements() function.  The new min/max
reset mode for the pg_stat_stetments_reset() function is controlled by the
parameter minmax_only.

'stat_since' column is populated with the current timestamp when a new
statement is added to the pg_stat_statements hashtable.  It provides clean
information about statistics collection time intervals for each statement.
Besides it can be used by sampling solutions to detect situations when a
statement was evicted and stored again between samples.

Such a sampling solution could derive any pg_stat_statements statistic values
for an interval between two samples with the exception of all min/max
statistics. To address this issue this patch adds the ability to reset
min/max statistics independently of the statement reset using the new
minmax_only parameter of the pg_stat_statements_reset(userid oid, dbid oid,
queryid bigint, minmax_only boolean) function. The timestamp of such reset
is stored in the minmax_stats_since field for each statement.
pg_stat_statements_reset() function now returns the timestamp of a reset as the
result.

Discussion: https://postgr.es/m/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
Author: Andrei Zubkov
Reviewed-by: Julien Rouhaud, Hayato Kuroda, Yuki Seino, Chengxi Sun
Reviewed-by: Anton Melnikov, Darren Rush, Michael Paquier, Sergei Kornilov
Reviewed-by: Alena Rybakina, Andrei Lepikhov
2023-11-27 02:52:17 +02:00
Alexander Korotkov
6ab1dbd26b Add NOT NULL checking of pg_stat_statements_reset() in tests
This is preliminary patch.  It adds NOT NULL checking for the result of
pg_stat_statements_reset() function. It is needed for upcoming patch
"Track statement entry timestamp" that will change the result type of
this function to the timestamp of a reset performed.

Discussion: https://postgr.es/m/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
Author: Andrei Zubkov
Reviewed-by: Julien Rouhaud, Hayato Kuroda, Yuki Seino, Chengxi Sun
Reviewed-by: Anton Melnikov, Darren Rush, Michael Paquier, Sergei Kornilov
Reviewed-by: Alena Rybakina, Andrei Lepikhov
2023-11-27 02:52:17 +02:00
Michael Paquier
108161bcb9 pg_stat_statements: Remove duplicated tests for SET statements
This looks like a copy-paste mistake introduced in de2aca288569, that
has added checks for more patterns of SET statements while ignoring the
original test block that existed.

Backpatch down to where this has been introduced, as this shaves some
cycles.

Author: Sergei Kornilov
Discussion: https://postgr.es/m/5689421699428803@mail-sendbernar-production-main-46.myt.yp-c.yandex.net
Backpatch-through: 16
2023-11-09 10:04:31 +09:00
Tom Lane
76db9cb636 Fix some issues with tracking nesting level in pg_stat_statements.
When we decide that we don't want to track execution time of a
specific planner or ProcessUtility call, we still have to increment
the nesting depth, or we'll make the wrong determination of whether
we are at top level when considering nested statements.  (PREPARE
and EXECUTE are exceptions, for reasons explained in the code.)

Counting planner nesting depth separately from executor nesting depth
was a mistake: it causes us to make the wrong determination of whether
we are at top level when considering nested statements that get
executed during planning (as a result of constant-folding of
functions, for example).  Merge those counters into one.

In passing, get rid of the PGSS_HANDLED_UTILITY macro in favor of
explicitly listing statement types.  It seems somewhat coincidental
that PREPARE and EXECUTE are handled alike in each of the places where
that was used: the reasoning tends to be different for each one.
Thus, the macro seems as likely to encourage future bugs as prevent
them, since it's quite unclear whether any future statement type that
might need special-casing here would also need the same choices at
each spot.

Sergei Kornilov, Julien Rouhaud, and Tom Lane, per bug #17552 from
Maxim Boguk.  This is pretty clearly a bug fix, but it's also a
behavioral change that might surprise somebody, so no back-patch.

Discussion: https://postgr.es/m/17552-213b534c56ab5d02@postgresql.org
2023-11-08 12:01:28 -05:00
Daniel Gustafsson
c5a032e518 Update oldextversions test for pg_stat_statements 1.11
Commit 5a3423ad8e updated pg_stat_statements to 1.11 due to added
columns in the view for jit deform counters. Fixing oldextversion
was however missed in that commit.  This adds a test for the 1.11
version view definition.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ2Y7c2VEg4S1KDXFcaHjyF8=KZa_rMV6WOe+KwE-KE97g@mail.gmail.com
2023-10-13 13:50:18 +02:00
Michael Paquier
11c34b342b Show parameters of CALL as constants in pg_stat_statements
This commit changes the query jumbling of CallStmt so as its IN/OUT
parameters are able to show up as constants with a parameter symbol in
pg_stat_statements, like:
CALL proc1($1, $2);
CALL proc2($1, $2, $3);

The transformed FuncExpr is used in the query ID computation instead of
the FuncCall generated by the parser, so as it is sensitive to the OID
of the procedure and its list of input arguments.  The output arguments
are handled in a separate list in CallStmt, which is also included in
the computation.

Tests are added to pg_stat_statements to show how this affects CALL with
IN/OUT parameters as well as overloaded functions.

Like 638d42a3c520 or 31de7e60da34, this improves the monitoring of
workloads with a lot of CALL statements, preventing unnecessary bloat
when these use different input (or event output) values.

Author: Sami Imseih
Discussion: https://postgr.es/m/B44FA29D-EBD0-4DD9-ABC2-16F1CB087074@amazon.com
2023-09-28 15:17:55 +09:00
Andres Freund
7369798a83 Fix tracking of temp table relation extensions as writes
Karina figured out that I (Andres) confused BufferUsage.temp_blks_written with
BufferUsage.local_blks_written in fcdda1e4b5.

Tests in core PG can't easily test this, as BufferUsage is just used for
EXPLAIN (ANALYZE, BUFFERS) and pg_stat_statements. Thus this commit adds tests
for this to pg_stat_statements.

Reported-by: Karina Litskevich <litskevichkarina@gmail.com>
Author: Karina Litskevich <litskevichkarina@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CACiT8ibxXA6+0amGikbeFhm8B84XdQVo6D0Qfd1pQ1s8zpsnxQ@mail.gmail.com
Backpatch: 16-, where fcdda1e4b5 was merged
2023-09-13 19:14:09 -07:00
Michael Paquier
bb45156f34 Show names of DEALLOCATE as constants in pg_stat_statements
This commit switches query jumbling so as prepared statement names are
treated as constants in DeallocateStmt.  A boolean field is added to
DeallocateStmt to make a distinction between ALL and named prepared
statements, as "name" was used to make this difference before, NULL
meaning DEALLOCATE ALL.

Prior to this commit, DEALLOCATE was not tracked in pg_stat_statements,
for the reason that it was not possible to treat its name parameter as a
constant.  Now that query jumbling applies to all the utility nodes,
this reason does not apply anymore.

Like 638d42a3c520, this can be a huge advantage for monitoring where
prepared statement names are randomly generated, preventing bloat in
pg_stat_statements.  A couple of tests are added to track the new
behavior.

Author: Dagfinn Ilmari Mannsåker, Michael Paquier
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz
2023-08-27 17:27:44 +09:00
Michael Paquier
638d42a3c5 Show GIDs of two-phase commit commands as constants in pg_stat_statements
This relies on the "location" field added to TransactionStmt in 31de7e6,
now applied to the "gid" field used by 2PC commands.  These commands are
now reported like:
COMMIT PREPARED $1
PREPARE TRANSACTION $1
ROLLBACK PREPARED $1

Applying constants for these commands is a huge advantage for workloads
that rely a lot on 2PC commands with different GIDs.  Some tests are
added to track the new behavior.

Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz
2023-08-12 10:44:15 +09:00
Michael Paquier
31de7e60da Show savepoint names as constants in pg_stat_statements
In pg_stat_statements, savepoint names now show up as constants with a
parameter symbol, using as base query string the one added as a new
entry to the PGSS hash table, leading to:
RELEASE $1
ROLLBACK TO $1
SAVEPOINT $1

Applying constants to these query parts is a huge advantage for
workloads that generate randomly savepoint points, like ORMs (Django is
at the origin of this patch).  The ODBC driver is a second layer that
likes a lot savepoints, though it does not use a random naming pattern.

A "location" field is added to TransactionStmt, now set only for
savepoints.  The savepoint name is ignored by the query jumbling.  The
location can be extended to other query patterns, if required, like 2PC
commands.  Some tests are added to pg_stat_statements for all the query
patterns supported by the parser.

ROLLBACK, ROLLBACK TO SAVEPOINT and ROLLBACK TRANSACTION TO SAVEPOINT
have the same Node representation, so all these are equivalents.  The
same happens for RELEASE and RELEASE SAVEPOINT.

Author: Greg Sabino Mullane
Discussion: https://postgr.es/m/CAKAnmm+2s9PA4OaumwMJReWHk8qvJ_-g1WqxDRDAN1BSUfxyTw@mail.gmail.com
2023-07-27 09:42:33 +09:00
Michael Paquier
9a714b9d6e Improve cleanup phases in regression tests of pg_stat_statements
As shaped, two DROP ROLE queries included in "user_activity" were
showing in the reports for "wal".  The intention is to keep each test
isolated and independent, so this is incorrect.  This commit adds some
calls to pg_stat_statements_reset() to clean up the statistics once each
test finishes, so as there are no risks of overlap in the reports for
individial scenarios.

The addition in "user_activity" fixes the output of "wal".  The new
resets done in "level_tracking" and "utility" are added for consistency
with the rest, though they do not affect the stats generated in the
other tests.

Oversight in d0028e3.

Reported-by: Andrei Zubkov
Discussion: https://postgr.es/m/7beb722dd016bf54f1c78bfd6d44a684e28da624.camel@moonset.ru
2023-03-07 08:58:13 +09:00
Michael Paquier
d0028e35a0 Refactor more the regression tests of pg_stat_statements
This commit expands more the refactoring of the regression tests of
pg_stat_statements, with tests moved out of pg_stat_statements.sql into
separate files.  The following file structure is now used:
- select is mostly the former pg_stat_statements.sql, renamed.
- dml for INSERT/UPDATE/DELETE and MERGE
- user_activity, to test role-level checks and stat resets.
- wal, to check the WAL generation after some queries.

Like e8dbdb1, there is no change in terms of code coverage or results,
and this finishes the split I was aiming for in these tests.  Most of
the tests used "test" of "pgss_test" as names for the tables used, these
are renamed to less generic names.

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/Y/7Y9U/y/keAW3qH@paquier.xyz
2023-03-03 08:46:11 +09:00
Michael Paquier
de2aca2885 Expand regression tests of pg_stat_statements for utility queries
This commit adds more coverage for utility statements so as it is
possible to track down all the effects of query normalization done for
all the queries that use either Const or A_Const nodes, which are the
nodes where normalization makes the most sense as they apply to
constants (well, most of the time, really).

This set of queries is extracted from an analysis done while looking at
full dumps of the regression database when applying different levels of
normalization to either Const or A_Const nodes for utilities, as of a
minimal set of these, for:
- All relkinds (CREATE, ALTER, DROP)
- Policies
- Cursors
- Triggers
- Types
- Rules
- Statistics
- CALL
- Transaction statements (isolation level, options)
- EXPLAIN
- COPY

Note that pg_stat_statements is not switched yet to show any
normalization for utilities, still it improves the default coverage of
the query jumbling code (not by as much as enabling query jumbling on
the main regression test suite, though):
- queryjumblefuncs.funcs.c: 36.8% => 48.5%
- queryjumblefuncs.switch.c: 33.2% => 43.1%

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/Y+MRdEq9W9XVa2AB@paquier.xyz
2023-02-20 10:16:51 +09:00
Michael Paquier
e8dbdb15db Refactor tests of pg_stat_statements for planning, utility and level tracking
pg_stat_statements.sql acts as the main file for all the core tests of
the module, but things have become complicated to follow over the years
as some of the sub-scenarios tested in this file rely on assumptions
that come from completely different areas of it, like a GUC setup or a
relation created previously.  For example, row tracking for CTAS/COPY
was looking at the number of plans, which was not necessary, or level
tracking was mixed with checks on planner counts.

This commit refactors the tests of pg_stat_statements, by moving test
cases out of pg_stat_statements.sql into their own file, as of:
- Planning-related tests in planning.sql, for [re]plan counts and
top-level handling.  These depend on pg_stat_statements.track_planning.
- Utilities in utility.sql (pg_stat_statements.track_utility), that
includes now the tests for:
-- Row tracking for CTAS, CREATE MATERIALIZED VIEW, COPY.
-- Basic utility statements.
-- SET statements.
- Tracking level, depending on pg_stat_statements.track.  This part has
been looking at scenarios with DO blocks, PL functions and SQL
functions.

pg_stat_statements.sql (still named the same for now) still includes
some checks for role-level tracking and WAL generation metrics, that
ought to become independent in the long term for clarity.

While on it, this switches the order of the attributes when querying
pg_stat_statements, the query field becoming last.  This makes much
easier the tracking of changes related to normalization, as queries are
the only variable-length attributes queried (unaligned mode would be one
extra choice, but that reduces the checks on the other fields).

Test scenarios and their results match exactly with what was happening
before this commit in terms of calls, number of plans, number of rows,
cached data or level tracking, so this has no effect on the coverage in
terms of what is produced by the reports in the table
pg_stat_statements.  A follow-up patch will extend more the tests of
pg_stat_statements around utilities, so this split creates a foundation
for this purpose, without complicating more pg_stat_statements.sql.

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/Y+MRdEq9W9XVa2AB@paquier.xyz
2023-02-20 09:28:29 +09:00
Michael Paquier
9ba37b2cb6 Include values of A_Const nodes in query jumbling
Like the implementation for node copy, write and read, this node
requires a custom implementation so as the query jumbling is able to
consider the correct value assigned to it, depending on its type (int,
float, bool, string, bitstring).

Based on a dump of pg_stat_statements from the regression database, this
would confuse the query jumbling of the following queries:
- SET.
- COPY TO with SELECT queries.
- START TRANSACTION with different isolation levels.
- ALTER TABLE with default expressions.
- CREATE TABLE with partition bounds.

Note that there may be a long-term argument in tracking the location of
such nodes so as query strings holding such nodes could be normalized,
but this is left as a separate discussion.

Oversight in 3db72eb.

Discussion: https://postgr.es/m/Y9+HuYslMAP6yyPb@paquier.xyz
2023-02-07 09:03:54 +09:00
Michael Paquier
3db72ebcbe Generate code for query jumbling through gen_node_support.pl
This commit changes the query jumbling code in queryjumblefuncs.c to be
generated automatically based on the information of the nodes in the
headers of src/include/nodes/ by using gen_node_support.pl.  This
approach offers many advantages:
- Support for query jumbling for all the utility statements, based on the
state of their parsed Nodes and not only their query string.  This will
greatly ease the switch to normalize the information of some DDLs, like
SET or CALL for example (this is left unchanged and should be part of a
separate discussion).  With this feature, the number of entries stored
for utilities in pg_stat_statements is reduced (for example now
"CHECKPOINT" and "checkpoint" mean the same thing with the same query
ID).
- Documentation of query jumbling directly in the structure definition
of the nodes.  Since this code has been introduced in pg_stat_statements
and then moved to code, the reasons behind the choices of what should be
included in the jumble are rather sparse.  Note that some explanation is
added for the most relevant parts, as a start.
- Overall code reduction and more consistency with the other parts
generating read, write and copy depending on the nodes.

The query jumbling is controlled by a couple of new node attributes,
documented in nodes/nodes.h:
- custom_query_jumble, to mark a Node as having a custom
implementation.
- no_query_jumble, to ignore entirely a Node.
- query_jumble_ignore, to ignore a field in a Node.
- query_jumble_location, to mark a location in a Node, for
normalization.  This can apply only to int fields, with "location" in
their name (only Const as of this commit).

There should be no compatibility impact on pg_stat_statements, as the
new code applies the jumbling to the same fields for each node (its
regression tests have no modification, for one).

Some benchmark of the query jumbling between HEAD and this commit for
SELECT and DMLs has proved that this new code does not cause a
performance regression, with computation times close for both methods.
For utility queries, the new method is slower than the previous method
of calculating a hash of the query string, though we are talking about
extra ns-level changes based on what I measured, which is unnoticeable
even for OLTP workloads as a query ID is calculated once per query
post-parse analysis.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz
2023-01-31 15:24:05 +09:00
Alvaro Herrera
249b0409b1
Fix pg_stat_statements for MERGE
We weren't jumbling the merge action list, so wildly different commands
would be considered to use the same query ID.  Add that, mention it in
the docs, and some test lines.

Backpatch to 15.

Author: Tatsu <bt22nakamorit@oss.nttdata.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/d87e391694db75a038abc3b2597828e8@oss.nttdata.com
2022-09-27 10:44:42 +02:00
Robert Haas
2d7ead8526 pg_stat_statements: Fix test that assumes wal_records = rows.
It's not very robust to assume that each inserted row will produce
exactly one WAL record and that no other WAL records will be generated
in the process, because for example a particular transaction could
always be the one that has to extend clog.

Because these tests are not run by 'make installcheck' but only by
'make check', it may be that in our current testing infrastructure
this can't be hit, but it doesn't seem like a good idea to rely on
that, since unrelated changes to the regression tests or the way
write-ahead logging is done could easily cause it to start happening,
and debugging such failures is a pain.

Adjust the regression test to be less sensitive.

Anton Melnikov, reviewed by Julien Rouhaud

Discussion: http://postgr.es/m/1ccd00d9-1723-6b68-ae56-655aab00d406@inbox.ru
2022-07-06 13:05:51 -04:00
Michael Paquier
76cbf7edb6 pg_stat_statements: Track I/O timing for temporary file blocks
This commit adds two new columns to pg_stat_statements, called
temp_blk_read_time and temp_blk_write_time.  Those columns respectively
show the time spent to read and write temporary file blocks on disk,
whose tracking has been added in efb0ef9.  This information is
available when track_io_timing is enabled, like blk_read_time and
blk_write_time.

pg_stat_statements is updated to version to 1.10 as an effect of the
newly-added columns.  Tests for the upgrade path 1.9->1.10 are added.

PGSS_FILE_HEADER is bumped for the new stats file format.

Author: Masahiko Sawada
Reviewed-by: Georgios Kokolatos, Melanie Plageman, Julien Rouhaud,
Ranier Vilela
Discussion: https://postgr.es/m/CAD21AoAJgotTeP83p6HiAGDhs_9Fw9pZ2J=_tYTsiO5Ob-V5GQ@mail.gmail.com
2022-04-08 13:12:07 +09:00
Michael Paquier
2b0da0365b pg_stat_statements: Add some tests for older versions still usable
When the newest version is loaded, the backend would load objects from
the oldest complete SQL file (here 1.4) and then update to the latest
version with transition scripts (up to 1.9 currently).  This provides
some coverage for upgrades of pg_stat_statements, but there is no test
to show how things have changed across each version.

This adds a couple of tests for the upgrade paths using objects from
each version supported, stressing the objects whose behaviors have
changed across each version supported.

Author: Erica Zhang
Reviewed-by: Julien Rouhaud, Michael Paquier
Discussion: https://postgr.es/m/tencent_BBA974AFF61379F2345E782FD6C55891950A@qq.com
2021-10-02 17:40:13 +09:00
Tom Lane
0806d08d46 Harden pg_stat_statements tests against CLOBBER_CACHE_ALWAYS.
Turns out the buildfarm hasn't been testing this, which will soon change.

Julien Rouhaud, per report from me

Discussion: https://postgr.es/m/42557.1627229005@sss.pgh.pa.us
2021-07-25 23:25:15 -04:00
Bruce Momjian
f7a97b6ec3 Update query_id computation
Properly fix:

- the "ONLY" in FROM [ONLY] isn't hashed
- the agglevelsup field in GROUPING isn't hashed
- WITH TIES not being hashed (new in PG 13)
- "DISTINCT" in "GROUP BY [DISTINCT]" isn't hashed (new in PG 14)

Reported-by: Julien Rouhaud

Discussion: https://postgr.es/m/20210425081119.ulyzxqz23ueh3wuj@nol
2021-05-03 14:59:39 -04:00
Magnus Hagander
6b4d23feef Track identical top vs nested queries independently in pg_stat_statements
Changing pg_stat_statements.track between 'all' and 'top' would control
if pg_stat_statements tracked just top level statements or also
statements inside functions, but when tracking all it would not
differentiate between the two. Being table to differentiate this is
useful both to track where the actual query is coming from, and to see
if there are differences in executions between the two.

To do this, add a boolean to the hash key indicating if the statement
was top level or not.

Experience from the pg_stat_kcache module shows that in at least some
"reasonable worloads" only <5% of the queries show up both top level and
nested. Based on this, admittedly small, dataset, this patch does not
try to de-duplicate those query *texts*, and will just store one copy
for the top level and one for the nested.

Author: Julien Rohaud
Reviewed-By: Magnus Hagander, Masahiro Ikeda
Discussion: https://postgr.es/m/20201202040516.GA43757@nol
2021-04-08 10:30:34 +02:00
Fujii Masao
9fbc3f318d pg_stat_statements: Track number of times pgss entries were deallocated.
If more distinct statements than pg_stat_statements.max are observed,
pg_stat_statements entries about the least-executed statements are
deallocated. This commit enables us to track the total number of times
those entries were deallocated. That number can be viewed in the
pg_stat_statements_info view that this commit adds. It's useful when
tuning pg_stat_statements.max parameter. If it's high, i.e., the entries
are deallocated very frequently, which might cause the performance
regression and we can increase pg_stat_statements.max to avoid those
frequent deallocations.

The pg_stat_statements_info view is intended to display the statistics
of pg_stat_statements module itself. Currently it has only one column
"dealloc" indicating the number of times entries were deallocated.
But an upcoming patch will add other columns (for example, the time
at which pg_stat_statements statistics were last reset) into the view.

Author: Katsuragi Yuta, Yuki Seino
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/0d9f1107772cf5c3f954e985464c7298@oss.nttdata.com
2020-11-26 21:18:05 +09:00
Fujii Masao
b62e6056a0 pg_stat_statements: track number of rows processed by REFRESH MATERIALIZED VIEW.
Commit 6023b7ea71 allowed pg_stat_statements to track the number
of rows retrieved or affected by some utility commands including
CREATE MATERIALIZED VIEW. However it did not track the rowcount
of REFRESH MATERIALIZED VIEW. This commit allows pg_stat_statements
to track that.

To track that, this commit changes the query completion for
REFRESH MATERIALIZED VIEW so that it saves the rowcount. But note that
the rowcount is still not displayed in the command completion tag output.
That is, the display_rowcount flag of CMDTAG_REFRESH_MATERIALIZED_VIEW
command tag is left false in cmdtaglist.h. Otherwise, the change of
completion tag output might break applications using it.

Author: Katsuragi Yuta, Seino Yuki
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/71f6bc72f8bbaa06e701f8bd2562c347@oss.nttdata.com
Discussion: https://postgr.es/m/aadbfba9-e4bb-9531-6b3a-d13c31c8f4fe@oss.nttdata.com
2020-11-12 11:26:55 +09:00
Fujii Masao
6023b7ea71 pg_stat_statements: track number of rows processed by some utility commands.
This commit makes pg_stat_statements track the total number
of rows retrieved or affected by CREATE TABLE AS, SELECT INTO,
CREATE MATERIALIZED VIEW and FETCH commands.

Suggested-by: Pascal Legrand
Author: Fujii Masao
Reviewed-by: Asif Rehman
Discussion: https://postgr.es/m/1584293755198-0.post@n3.nabble.com
2020-07-29 23:21:55 +09:00
Fujii Masao
d1763ea8c9 Change default of pg_stat_statements.track_planning to off.
Since v13 pg_stat_statements is allowed to track the planning time of
statements when track_planning option is enabled. Its default was on.

But this feature could cause more terrible spinlock contentions in
pg_stat_statements. As a result of this, Robins Tharakan reported that
v13 beta1 showed ~45% performance drop at high DB connection counts
(when compared with v12.3) during fully-cached SELECT-only test using
pgbench.

To avoid this performance regression by the default setting,
this commit changes default of pg_stat_statements.track_planning to off.

Back-patch to v13 where pg_stat_statements.track_planning was introduced.

Reported-by: Robins Tharakan
Author: Fujii Masao
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/2895b53b033c47ccb22972b589050dd9@EX13D05UWC001.ant.amazon.com
2020-07-03 11:35:22 +09:00
Amit Kapila
6b466bf5f2 Allow pg_stat_statements to track WAL usage statistics.
This commit adds three new columns in pg_stat_statements output to
display WAL usage statistics added by commit df3b181499.

This commit doesn't bump the version of pg_stat_statements as the
same is done for this release in commit 17e0328224.

Author: Kirill Bychik and Julien Rouhaud
Reviewed-by: Julien Rouhaud, Fujii Masao, Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-05 07:34:04 +05:30
Fujii Masao
17e0328224 Allow pg_stat_statements to track planning statistics.
This commit makes pg_stat_statements support new GUC
pg_stat_statements.track_planning. If this option is enabled,
pg_stat_statements tracks the planning statistics of the statements,
e.g., the number of times the statement was planned, the total time
spent planning the statement, etc. This feature is useful to check
the statements that it takes a long time to plan. Previously since
pg_stat_statements tracked only the execution statistics, we could
not use that for the purpose.

The planning and execution statistics are stored at the end of
each phase separately. So there are not always one-to-one relationship
between them. For example, if the statement is successfully planned
but fails in the execution phase, only its planning statistics are stored.
This may cause the users to be able to see different pg_stat_statements
results from the previous version. To avoid this,
pg_stat_statements.track_planning needs to be disabled.

This commit bumps the version of pg_stat_statements to 1.8
since it changes the definition of pg_stat_statements function.

Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao
Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane
Discussion: https://postgr.es/m/CAHGQGwFx_=DO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw@mail.gmail.com
Discussion: https://postgr.es/m/CAEepm=0e59Y_6Q_YXYCTHZkqOc6H2pJ54C_Xe=VFu50Aqqp_sA@mail.gmail.com
Discussion: https://postgr.es/m/DB6PR0301MB21352F6210E3B11934B0DCC790B00@DB6PR0301MB2135.eurprd03.prod.outlook.com
2020-04-02 11:20:19 +09:00
Andrew Gierth
6e74c64bcf Teach pg_stat_statements not to ignore FOR UPDATE clauses
Performance of a SELECT FOR UPDATE may be quite distinct from the
non-UPDATE version of the query, so treat all of the FOR UPDATE clause
as being significant for distinguishing queries.

Andrew Gierth and Vik Fearing, reviewed by Sergei Kornilov, Thomas
Munro, Tom Lane

Discussion: https://postgr.es/m/87h8e4hfwv.fsf@news-spur.riddles.org.uk
2019-07-14 12:07:40 +01:00
Amit Kapila
43cbedab8f Extend pg_stat_statements_reset to reset statistics specific to a
particular user/db/query.

The function pg_stat_statements_reset() is extended to accept userid, dbid,
and queryid as input parameters.  Now, it can discard the statistics
gathered so far by pg_stat_statements corresponding to the specified
userid, dbid, and queryid.  If no parameter is specified or all the
specified parameters have default value aka 0, it will discard all
statistics as per the old behavior.

The new behavior is useful to get the fresh statistics for a specific
user/database/query without resetting all the existing statistics.

Author: Haribabu Kommi, with few additional changes by me
Reviewed-by: Michael Paquier, Amit Kapila and Fujii Masao
Discussion: https://postgr.es/m/CAJrrPGcyh-gkFswyc6C661K6cknL0XkNqVT0sQt2mFNMR4HRKA@mail.gmail.com
2019-01-11 08:50:09 +05:30
Tom Lane
a6f22e8356 Show ignored constants as "$N" rather than "?" in pg_stat_statements.
The trouble with the original choice here is that "?" is a valid (and
indeed used) operator name, so that you could end up with ambiguous
statement texts like "SELECT ? ? ?".  With this patch, you instead
see "SELECT $1 ? $2", which seems significantly more readable.
The numbers used for this purpose begin after the last actual $N parameter
in the particular query.  The conflict with external parameters has its own
potential for confusion of course, but it was agreed to be an improvement
over the previous behavior.

Lukas Fittl

Discussion: https://postgr.es/m/CAP53PkxeaCuwYmF-A4J5z2-qk5fYFo5_NH3gpXGJJBxv1DMwEw@mail.gmail.com
2017-03-27 20:14:36 -04:00