XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed
with a list link field labeled as "blkno", which was confusing, especially
when the link was empty (InvalidBlockNumber). Print the metapage block
number instead, since that's what's actually being updated. We could
include the link values too as a separate field, but not clear it's worth
the trouble.
Back-patch to 8.4 where the dubious code was added.
Commit 337b6f5ecf contained the entirely
fanciful assumption that it had made comparetup_datum unreachable.
Reported and patched by Takashi Yamamoto.
Fix up some not terribly accurate/useful comments from that commit, too.
If we make the initially-called function return the table physical-size
estimate, acquire_inherited_sample_rows will be able to use that to
allocate numbers of samples among child tables, when the day comes that
we want to support foreign tables in inheritance trees.
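To illustrate what allocating samples by physical size could mean, here is a minimal C sketch. It splits a total sample target across child tables in proportion to each child's estimated page count. The function name allocate_child_samples and the round-to-nearest rule are assumptions for illustration, not the actual acquire_inherited_sample_rows code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: split total_target sample rows across child tables
 * in proportion to each child's estimated size in pages. Each share is
 * rounded to the nearest integer independently, so the shares may not sum
 * exactly to total_target.
 */
void
allocate_child_samples(const double *child_pages, size_t nchildren,
                       int total_target, int *child_target)
{
    double total_pages = 0.0;

    for (size_t i = 0; i < nchildren; i++)
        total_pages += child_pages[i];

    for (size_t i = 0; i < nchildren; i++)
    {
        if (total_pages <= 0.0)
            child_target[i] = 0;    /* no size estimate: nothing to sample */
        else
            child_target[i] = (int) (total_target * child_pages[i] / total_pages + 0.5);
    }
}
```

With child estimates of 100 and 300 pages and a target of 400 rows, the children would be asked for 100 and 300 rows respectively.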
ANALYZE now accepts foreign tables and allows the table's FDW to control
how the sample rows are collected. (But only manual ANALYZEs will touch
foreign tables, for the moment, since among other things it's not very
clear how to handle remote permissions checks in an auto-analyze.)
contrib/file_fdw is extended to support this.
Etsuro Fujita, reviewed by Shigeru Hanada, some further tweaking by me.
Somebody didn't bother to fix this comment while adding foreign table
support to the code below it.
In passing, remove the explicit calling-out of relkind letters, which adds
complexity to the comment but doesn't help in understanding the code.
Ants Aasma's original patch to add timing information for buffer I/O
requests exposed this data at the relation level, which was judged too
costly. I've exposed it at the database level instead.
The parser got confused if a cursor parameter had the same name as
a plpgsql variable. Reported and diagnosed by Yeb Havinga, though
this isn't exactly his proposed fix.
Also, some mostly-but-not-entirely-cosmetic adjustments to the original
named-cursor-parameter patch, for code readability and better error
diagnostics.
This patch provides a test case for libpq's row processor API.
contrib/dblink can deal with very large result sets by dumping them into
a tuplestore (which can spill to disk) --- but until now, the intermediate
storage of the query result in a PGresult meant memory bloat for any large
result. Now we use a row processor to convert the data to tuple form and
dump it directly into the tuplestore.
A limitation is that this only works for plain dblink() queries, not
dblink_send_query() followed by dblink_get_result(). In the latter
case we don't know the desired tuple rowtype soon enough. While hack
solutions to that are possible, a different user-level API would
probably be a better answer.
Kyotaro Horiguchi, reviewed by Marko Kreen and Tom Lane
Traditionally libpq has collected an entire query result before passing
it back to the application. That provides a simple and transactional API,
but it's pretty inefficient for large result sets. This patch allows the
application to process each row on-the-fly instead of accumulating the
rows into the PGresult. Error recovery becomes a bit more complex, but
often that tradeoff is well worth making.
Kyotaro Horiguchi, reviewed by Marko Kreen and Tom Lane
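The difference between accumulating a whole result and processing rows on the fly can be sketched with a per-row callback. Everything below (Row, RowCallback, stream_rows, Tally, tally_row) is hypothetical and is not libpq's actual row processor API. It only shows the control flow, including the early-stop path that makes error recovery more involved.

```c
#include <stddef.h>

/* Hypothetical row type and callback signature (not libpq's API). */
typedef struct
{
    int          ncols;
    const char **values;
} Row;

/* Return 0 to continue, nonzero to stop early (e.g. on an error). */
typedef int (*RowCallback) (const Row *row, void *arg);

/*
 * Hand each row to the callback as it "arrives" instead of first storing
 * all rows in a result object. Returns the number of rows delivered
 * before the callback asked to stop.
 */
size_t
stream_rows(const Row *incoming, size_t nrows, RowCallback cb, void *arg)
{
    size_t delivered = 0;

    for (size_t i = 0; i < nrows; i++)
    {
        delivered++;
        if (cb(&incoming[i], arg) != 0)
            break;      /* caller must then deal with the unread remainder */
    }
    return delivered;
}

/* Example callback: count rows and columns, stopping after limit rows. */
typedef struct
{
    size_t rows;
    size_t cols;
    size_t limit;
} Tally;

int
tally_row(const Row *row, void *arg)
{
    Tally *t = arg;

    t->rows++;
    t->cols += (size_t) row->ncols;
    return t->rows >= t->limit;
}
```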
There is no existing or foreseeable case in which psql should see a
PGRES_COPY_BOTH PQresultStatus; and if such a case ever emerges, it's a
pretty good bet that these code fragments wouldn't do the right thing
anyway. Remove them, and let the existing default cases do the appropriate
thing, namely emit an "unexpected PQresultStatus" bleat.
Noted while working on libpq row processor patch, for which I was
considering adding a PGRES_SUSPENDED status code --- the same default-case
treatment would be appropriate for that.
The original coding of the syslogger had an arbitrary limit of 20 large
messages concurrently in progress, after which it would just punt and dump
message fragments to the output file separately. Our ambitions are a bit
higher than that now, so allow the data structure to expand as necessary.
Reported and patched by Andrew Dunstan; some editing by Tom
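One way to remove a fixed limit like the 20-slot table is a dynamically growing array of in-progress messages keyed by sender PID. This is a sketch under that assumption. The PendingMessage, PendingSet, and pending_get names are hypothetical, and the actual syslogger change may organize its data differently.

```c
#include <stdlib.h>

/* Hypothetical record of a partially received message from one sender. */
typedef struct
{
    int   pid;      /* sender process ID */
    char *data;     /* accumulated chunks (accumulation not shown) */
} PendingMessage;

typedef struct
{
    PendingMessage *items;
    size_t          count;
    size_t          capacity;
} PendingSet;

/*
 * Return the slot for pid, creating it if needed and doubling the array
 * when it is full. Returns NULL only if allocation fails. Because realloc()
 * may move the array, a pointer from an earlier call must not be used
 * after a later call.
 */
PendingMessage *
pending_get(PendingSet *set, int pid)
{
    for (size_t i = 0; i < set->count; i++)
        if (set->items[i].pid == pid)
            return &set->items[i];

    if (set->count == set->capacity)
    {
        size_t          newcap = set->capacity ? set->capacity * 2 : 16;
        PendingMessage *grown = realloc(set->items, newcap * sizeof(PendingMessage));

        if (grown == NULL)
            return NULL;
        set->items = grown;
        set->capacity = newcap;
    }

    set->items[set->count].pid = pid;
    set->items[set->count].data = NULL;
    return &set->items[set->count++];
}
```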
dblink_exec leaked temporary database connections if any error occurred
after connection setup, for example
SELECT dblink_exec('...connect string...', 'select 1/0');
Add a PG_TRY block to ensure PQfinish gets done when it is needed.
(dblink_record_internal is on the hairy edge of needing similar treatment,
but seems not to be actively broken at the moment.)
Also, in 9.0 and up, only one of the three functions using tuplestore
return mode was properly checking that the query context would allow
a tuplestore result.
Noted while reviewing dblink patch. Back-patch to all supported branches.
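PostgreSQL's PG_TRY/PG_CATCH macros are built on sigsetjmp/siglongjmp. The following plain-C analogue uses setjmp/longjmp with a one-level handler in place of PostgreSQL's error handling machinery; all names are hypothetical. It shows the shape of the fix: close the temporary connection in the error path, then re-throw to the outer handler.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: the current error handler and a count of open
 * connections that must always be closed again. */
static jmp_buf *error_handler = NULL;
static int      open_connections = 0;

/* Jump to the current handler; an outer handler must be installed. */
static void raise_error(void)  { longjmp(*error_handler, 1); }
static void conn_open(void)    { open_connections++; }
static void conn_close(void)   { open_connections--; }

/* Mimics PG_TRY / PG_CATCH / PG_RE_THROW around a temporary connection. */
static void
exec_with_temp_conn(bool fail)
{
    jmp_buf  local;
    jmp_buf *saved = error_handler;     /* not modified after setjmp */

    conn_open();
    if (setjmp(local) == 0)
    {
        error_handler = &local;
        if (fail)
            raise_error();              /* e.g. the remote 'select 1/0' */
        error_handler = saved;
    }
    else
    {
        error_handler = saved;
        conn_close();                   /* the cleanup the fix adds */
        raise_error();                  /* re-throw to the caller's handler */
    }
    conn_close();
}
```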
Combining the loop workspace with the record of already-processed objects
might have been a cute trick, but it behaves horridly if there are many
dependency loops to repair: the time spent in the first step of findLoop()
grows as O(N^2). Instead use a separate flag array indexed by dump ID,
which we can check in constant time. The length of the workspace array
is now never more than the actual length of a dependency chain, which
should be reasonably short in all cases of practical interest. The code
is noticeably easier to understand this way, too.
Per gripe from Mike Roest. Since this is a longstanding performance bug,
backpatch to all supported versions.
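A simplified C sketch of the search described above: the current chain lives in a workspace array no longer than the chain itself, while a separate processed[] flag array indexed by dump ID answers "already handled?" in constant time. The DumpObj type and the find_loop signature are hypothetical simplifications, not pg_dump's actual declarations.

```c
#include <stdbool.h>

#define MAX_DEPS 4

/* Hypothetical, simplified dump object; objects are indexed by dump ID. */
typedef struct
{
    int ndeps;
    int deps[MAX_DEPS];     /* IDs of objects this one depends on */
} DumpObj;

/*
 * Look for a dependency chain from obj_id back to start_id. The chain so
 * far is in workspace[0..depth-1]. Returns the loop length, or 0 if none.
 */
int
find_loop(const DumpObj *objs, int obj_id, int start_id,
          const bool *processed, int *workspace, int depth)
{
    const DumpObj *obj = &objs[obj_id];

    /* O(1) check: skip objects belonging to loops already handled. */
    if (processed[obj_id])
        return 0;

    /* Don't revisit objects already on the current chain. */
    for (int i = 0; i < depth; i++)
        if (workspace[i] == obj_id)
            return 0;
    workspace[depth++] = obj_id;

    for (int i = 0; i < obj->ndeps; i++)
        if (obj->deps[i] == start_id)
            return depth;

    for (int i = 0; i < obj->ndeps; i++)
    {
        int found = find_loop(objs, obj->deps[i], start_id, processed,
                              workspace, depth);

        if (found > 0)
            return found;
    }
    return 0;
}
```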
The loop that matched owned sequences to their owning tables required time
proportional to the number of owned sequences times the number of tables,
although this work was only expended in selective-dump situations, which is
probably why the issue wasn't recognized long ago. Refactor slightly so that
we can perform this work after the index array for findTableByOid has been
set up, reducing the time to O(M log N) for M owned sequences and N tables.
Per gripe from Mike Roest. Since this is a longstanding performance bug,
backpatch to all supported versions.
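The O(M log N) approach amounts to sorting the tables by OID once and then doing one binary search per owned sequence. A minimal sketch using qsort() and bsearch(); the struct layouts and function names are hypothetical stand-ins for pg_dump's own.

```c
#include <stdlib.h>

typedef unsigned int Oid;

/* Hypothetical, simplified catalog entries. */
typedef struct { Oid oid; int dump; } TableInfo;
typedef struct { Oid owning_tab; int dump; } SeqInfo;

static int
cmp_table_oid(const void *a, const void *b)
{
    Oid x = ((const TableInfo *) a)->oid;
    Oid y = ((const TableInfo *) b)->oid;

    return (x > y) - (x < y);
}

/* Sort once (O(N log N)); afterwards each lookup is O(log N). */
void
sort_tables_by_oid(TableInfo *tables, size_t ntables)
{
    qsort(tables, ntables, sizeof(TableInfo), cmp_table_oid);
}

TableInfo *
find_table_by_oid(TableInfo *tables, size_t ntables, Oid oid)
{
    TableInfo key = {oid, 0};

    return bsearch(&key, tables, ntables, sizeof(TableInfo), cmp_table_oid);
}

/* Dump each owned sequence whose owning table is dumped: O(M log N). */
void
mark_owned_sequences(SeqInfo *seqs, size_t nseqs,
                     TableInfo *sorted_tables, size_t ntables)
{
    for (size_t i = 0; i < nseqs; i++)
    {
        TableInfo *owner;

        if (seqs[i].owning_tab == 0)
            continue;           /* not owned by any table */
        owner = find_table_by_oid(sorted_tables, ntables, seqs[i].owning_tab);
        if (owner != NULL && owner->dump)
            seqs[i].dump = 1;
    }
}
```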
ecpg and pg_dump each contain keyword arrays with structure similar
to the backend's keyword array. Up to now, we actually named those
arrays the same as the backend's and relied on parser/keywords.h
to declare them. This seems a tad too cute, though, and it breaks
now that we need to PGDLLIMPORT-decorate the backend symbols.
Rename to avoid the problem. Per buildfarm.
(It strikes me that maybe we should get rid of the separate keywords.c
files altogether, and just define these arrays in the modules that use
them, but that's a rather more invasive change.)
Over-optimization (by me, looks like :-() broke the case of recognizing
a word boundary just before a quoted identifier. Reported and diagnosed
by Dean Rasheed.
Some of these are newly added, some are older and were forgotten, some
don't contain any translatable strings right now but look like they
could in the future.
The DBLINK_GET_CONN and DBLINK_GET_NAMED_CONN macros did not set the
surrounding function's conname variable, causing errors to be incorrectly
reported as having occurred on the "unnamed" connection in some cases.
This bug was actually visible in two cases in the regression tests,
but apparently whoever added those cases wasn't paying attention.
Noted by Kyotaro Horiguchi, though this is different from his proposed
patch.
Back-patch to 8.4; 8.3 does not have the same type of error reporting
so the patch is not relevant.
It's actually more useful for the module to ignore these. Ignoring
EXECUTE (and not incrementing the nesting level) allows the executor
hooks to charge the time to the underlying prepared query, which
shows up as a stats entry with the original PREPARE as query string
(possibly modified by suppression of constants, which might not be
terribly useful here but it's not worth avoiding). This is much more
useful than cluttering the stats table with a distinct entry for each
textually distinct EXECUTE.
Experimentation with this idea shows that it's also preferable to ignore
PREPARE. If we don't, we get two stats table entries, one with the query
string hash and one with the jumble-derived hash, but with the same visible
query string (modulo those constants). This is confusing and not very
helpful, since the first entry will only receive costs associated with
initial planning of the query, which is not something counted at all
normally by pg_stat_statements. (And if we do start tracking planning
costs, we'd want them blamed on the other hash table entry anyway.)
When tracking nested statements, contrib/pg_stat_statements formerly
double-counted the execution costs of utility statements that directly
contain an executable statement, such as EXPLAIN and DECLARE CURSOR.
This was not obvious since the ProcessUtility and Executor hooks
would each add their measured costs to the same stats table entry.
However, with the new implementation that hashes utility and plannable
statements differently, this showed up as seemingly-duplicate stats
entries. Fix that by disabling the Executor hooks when the query has a
queryId of zero, which was the case already for such statements but is now
more clearly specified in the code. (The zero queryId was causing problems
anyway because all such statements would add to a single bogus entry.)
The PREPARE/EXECUTE case still results in counting the same execution
in two different stats table entries, but it should be much less surprising
to users that there are two entries in such cases.
In passing, include a CommonTableExpr's ctename in the query hash.
I had left it out originally on the grounds that we wanted to omit all
inessential aliases, but since RTE_CTE RTEs are hashing their referenced
names, we'd better hash the CTE names too to make sure we don't hash
semantically different queries the same.
Some Windows-only messages had apparently been forgotten so far.
Also make the wording of the messages more consistent with similar
messages in other parts, such as pg_ctl and pg_regress.
Initialise ckptXidEpoch from starting checkpoint and maintain the correct
value as we roll forwards. This allows GetNextXidAndEpoch() to return the
correct epoch when executed during recovery. Backpatch to 9.0, where the
problem first became observable by users.
Bug report from Daniel Farina
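For reference, the epoch for the next XID can be derived by comparing it with the XID recorded at the last checkpoint: if the 32-bit counter is now numerically smaller, it has wrapped and the epoch is one higher. This is a minimal sketch assuming at most one wraparound since the checkpoint; the function names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Epoch of next_xid, given the epoch and XID recorded at the last
 * checkpoint. Assumes the counter wrapped at most once since then.
 */
uint32_t
next_xid_epoch(uint32_t ckpt_epoch, TransactionId ckpt_xid,
               TransactionId next_xid)
{
    uint32_t epoch = ckpt_epoch;

    if (next_xid < ckpt_xid)
        epoch++;                /* 32-bit counter wrapped since checkpoint */
    return epoch;
}

/* Combine epoch and XID into a 64-bit value that does not wrap. */
uint64_t
full_xid(uint32_t epoch, TransactionId xid)
{
    return ((uint64_t) epoch << 32) | xid;
}
```

If ckptXidEpoch is not maintained during recovery, the first input is stale, which is the kind of wrong answer the fix addresses.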
Postmaster sets max_safe_fds by testing how many file descriptors it
can open, and that value is normally inherited by all child processes at fork().
Not so on EXEC_BACKEND, i.e. Windows, however. Because of that, we
effectively ignored max_files_per_process on Windows, and always assumed
a conservative default of 32 simultaneous open files. That could have an
impact on performance, if you need to access a lot of different files
in a query. After this patch, the value is passed to child processes by
save/restore_backend_variables() among many other global variables.
It has been like this forever, but given the lack of complaints about it,
I'm not backpatching this.
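The "testing" step can be as simple as repeatedly calling dup() until it fails or a cap is reached, then closing everything again. A minimal POSIX sketch; probe_usable_fds and the use of a /dev/null handle are assumptions for illustration. It does not reproduce PostgreSQL's own probe or the save/restore_backend_variables() plumbing that passes the result to children.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical probe: count how many extra file descriptors this process
 * can open, up to max_to_probe, by dup()ing one /dev/null handle. All
 * descriptors are closed before returning. Returns -1 on setup failure.
 */
int
probe_usable_fds(int max_to_probe)
{
    int *fds;
    int  base;
    int  used = 0;

    if (max_to_probe <= 0)
        return 0;
    fds = malloc(sizeof(int) * (size_t) max_to_probe);
    if (fds == NULL)
        return -1;
    base = open("/dev/null", O_RDONLY);
    if (base < 0)
    {
        free(fds);
        return -1;
    }
    while (used < max_to_probe)
    {
        int fd = dup(base);

        if (fd < 0)
            break;              /* hit the per-process limit */
        fds[used++] = fd;
    }
    for (int i = 0; i < used; i++)
        close(fds[i]);
    close(base);
    free(fds);
    return used;
}
```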
pg_stat_statements now hashes selected fields of the analyzed parse tree
to assign a "fingerprint" to each query, and groups all queries with the
same fingerprint into a single entry in the pg_stat_statements view.
In practice it is expected that queries with the same fingerprint will be
equivalent except for values of literal constants. To make the display
more useful, such constants are replaced by "?" in the displayed query
strings.
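A toy version of the fingerprinting idea: hash each node's type, plus the names of non-constant nodes, so literal values never reach the hash. The flattened Node array and the FNV-1a hash are assumptions made to keep the sketch short; the real code walks the analyzed Query tree and uses PostgreSQL's own hashing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, flattened "parse tree": a sequence of typed nodes. */
typedef enum { NODE_SELECT, NODE_COLUMN, NODE_TABLE, NODE_OPERATOR, NODE_CONST } NodeTag;

typedef struct
{
    NodeTag     tag;
    const char *name;   /* identifier, or literal text for NODE_CONST */
} Node;

/* 64-bit FNV-1a, fed incrementally. */
static uint64_t
fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/*
 * Hash every node's tag, and the name (including its NUL terminator) of
 * every non-constant node. Constants contribute only their tag, so queries
 * differing only in literal values get the same fingerprint.
 */
uint64_t
query_fingerprint(const Node *nodes, size_t n)
{
    uint64_t h = 14695981039346656037ULL;

    for (size_t i = 0; i < n; i++)
    {
        int tag = (int) nodes[i].tag;

        h = fnv1a(h, &tag, sizeof tag);
        if (nodes[i].tag != NODE_CONST)
            h = fnv1a(h, nodes[i].name, strlen(nodes[i].name) + 1);
    }
    return h;
}
```

Two queries that differ only in a constant, such as WHERE b = 1 and WHERE b = 42, produce the same fingerprint; changing a column name changes it.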
This mechanism currently supports only optimizable queries (SELECT,
INSERT, UPDATE, DELETE). Utility commands are still matched on the
basis of their literal query strings.
There remain some open questions about how to deal with utility statements
that contain optimizable queries (such as EXPLAIN and SELECT INTO) and how
to deal with expiring speculative hashtable entries that are made to save
the normalized form of a query string. However, fixing these issues should
require only localized changes, and since there are other open patches
involving contrib/pg_stat_statements, it seems best to go ahead and commit
what we've got.
Peter Geoghegan, reviewed by Daniel Farina