Commit 62c7bd31c8 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees. More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally. This fix therefore
removes that flag altogether. We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned. Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.
PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object. Perhaps it will be
worth improving that someday.
Assorted other minor cleanup and documentation editing, as well.
Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
The default for the choice attribute of the <arg> element is "opt",
which would normally put the argument inside brackets. But the DSSSL
stylesheets contain a hack that treats <arg> directly inside <group>
specially, so that <group><arg>-x</arg><arg>-y</arg></group> comes out
as [ -x | -y ] rather than [ [-x] | [-y] ], which it would technically
be. But when building man pages, this doesn't work, and so the
command synopses on the man pages contain lots of extra brackets.
By putting choice="opt" or choice="plain" explicitly on every <arg>
and <group> element, we avoid any toolchain dependencies like that,
and it also makes it clearer in the source code what is meant.
In passing, make some small corrections in the documentation about
which arguments are really optional or not.
Add test cases for inline handler of plython2u (when using that
language name), and for result object element assignment. There is
now at least one test case for every top-level functionality, except
plpy.Fatal (annoying to use in regression tests) and result object
slice retrieval and slice assignment (which are somewhat broken).
Allocate PLyResultObject.tupdesc in TopMemoryContext, because its
lifetime is the lifetime of the Python object and it shouldn't be
freed by some other memory context, such as one controlled by SPI. We
trust that the Python object will clean up its own memory.
Before, this would crash the included regression test case by trying
to use memory that was already freed.
reported by Asif Naeem, analysis by Tom Lane
At the time we check whether the tuple is dead to all running
transactions, we've already verified that it isn't visible to our
scan, setting hint bits if appropriate. So there's no need to
recheck CLOG for the all-dead test we do just a moment later.
So, add HeapTupleIsSurelyDead() to test the appropriate condition
under the assumption that all relevant hit bits are already set.
Review by Tom Lane.
Remove the following ports:
- dgux
- nextstep
- sunos4
- svr4
- ultrix4
- univel
These are obsolete and not worth rescuing. In most cases, there is
circumstantial evidence that they wouldn't work anymore anyway.
Add more markup in particular so that the command options appear
consistently in monospace in the HTML output.
On the vacuumdb reference page, remove listing all the possible
options in the synopsis. They have become too many now; we have the
detailed options list for that.
We had changed this from the default bold to monospace for all output
formats, but for man pages, this creates visual inconsistencies, so
revert to the default for man pages.
This patch adjusts the core statistics views to match the decision already
taken for pg_stat_statements, that values representing elapsed time should
be represented as float8 and measured in milliseconds. By using float8,
we are no longer tied to a specific maximum precision of timing data.
(Internally, it's still microseconds, but we could now change that without
needing changes at the SQL level.)
The columns affected are
pg_stat_bgwriter.checkpoint_write_time
pg_stat_bgwriter.checkpoint_sync_time
pg_stat_database.blk_read_time
pg_stat_database.blk_write_time
pg_stat_user_functions.total_time
pg_stat_user_functions.self_time
pg_stat_xact_user_functions.total_time
pg_stat_xact_user_functions.self_time
The first four of these are new in 9.2, so there is no compatibility issue
from changing them. The others require a release note comment that they
are now double precision (and can show a fractional part) rather than
bigint as before; also their underlying statistics functions now match
the column definitions, instead of returning bigint microseconds.
Get rid of the per-column documentation of underlying functions, which did
far more to clutter the view descriptions than it did to be helpful, and
was rather incomplete and typo-ridden anyway. Instead suggest that people
consult the definitions of the standard views to see the underlying
functions.
The older functions for obtaining individual facts about backends are now
somewhat obsoleted by pg_stat_get_activity, which means that they are not
documented by any standard view. So I put that information into a separate
table. (Maybe we should just deprecate them instead?)
In passing, fix a couple more documentation errors.
In ancient times, it was thought that this wouldn't work because of
TrapMacro/AssertMacro, but changing those to use a comma operator
appears to work without compiler warnings.
Display total time and I/O timings in milliseconds, for consistency with
the units used for timings in the core statistics views. The columns
remain of float8 type, so that sub-msec precision is available. (At some
point we will probably want to convert the core views to use float8 type
for the same reason, but this patch does not touch that issue.)
This is a release-note-requiring change in the meaning of the total_time
column. The I/O timing columns are new as of 9.2, so there is no
compatibility impact from redefining them.
Do some minor copy-editing in the documentation, too.
Normally whole-row Vars are printed as "tabname.*". However, that does not
work at top level of a targetlist, because per SQL standard the parser will
think that the "*" should result in column-by-column expansion; which is
not at all what a whole-row Var implies. We used to just print the table
name in such cases, which works most of the time; but it fails if the table
name matches a column name available anywhere in the FROM clause. This
could lead for instance to a view being interpreted differently after dump
and reload. Adding parentheses doesn't fix it, but there is a reasonably
simple kluge we can use instead: attach a no-op cast, so that the "*" isn't
syntactically at top level anymore. This makes the printing of such
whole-row Vars a lot more consistent with other Vars, and may indeed fix
more cases than just the reported one; I'm suspicious that cases involving
schema qualification probably didn't work properly before, either.
Per bug report and fix proposal from Abbas Butt, though this patch is quite
different in detail from his.
Back-patch to all supported versions.
If it fails to open a new log file, the syslogger assumes there's something
wrong with its parameters (such as log_directory), and stops attempting
automatic time-based or size-based log file rotations. Sending it SIGHUP
is supposed to start that up again. However, the original coding for that
was really bogus, involving clobbering a couple of GUC variables and hoping
that SIGHUP processing would restore them. Get rid of that technique in
favor of maintaining a separate flag showing we've turned rotation off.
Per report from Mark Kirkwood.
Also, the syslogger will automatically attempt to create the log_directory
directory if it doesn't exist, but that was only happening at startup.
For consistency and ease of use, it should do the same whenever the value
of log_directory is changed by SIGHUP.
Back-patch to all supported branches.
The alternative of disallowing index-only scans in HS operation was
discussed, but the consensus was that it was better to treat marking
a page all-visible as a recovery conflict for snapshots that could still
fail to see XIDs on that page. We may in the future try to soften this,
so that we simply force index scans to do heap fetches in cases where
this may be an issue, rather than throwing a hard conflict.
Get rid of section 8.5.6 (Date/Time Internals), which appears to confuse
people more than it helps, and anyway discussion of Postgres' internal
datetime calculation methods seems pretty out of place here. Instead,
make datatype.sgml just say that we follow the Gregorian calendar (a bit
of specification not previously present anywhere in that chapter :-()
and link to the History of Units appendix for more info. Do some mild
editorialization on that appendix, too, to make it clearer that we are
following proleptic Gregorian calendar rules rather than anything more
historically accurate.
Per a question from Florence Cousin and subsequent discussion in
pgsql-docs.
bitmap_scan_cost_est() has to be able to cope with a BitmapOrPath, but
I'd taken a shortcut that didn't work for that case. Noted by Heikki.
Add some regression tests since this area is evidently under-covered.
Before 9.1, PL/Python functions returning composite types could return
a string and it would be parsed using record_in. The 9.1 changes made
PL/Python only expect dictionaries, tuples, or objects supporting
getattr as output of composite functions, resulting in a regression
and a confusing error message, as the strings were interpreted as
sequences and the code for transforming lists to database tuples was
used. Fix this by treating strings separately as before, before
checking for the other types.
The reason why it's important to support string to database tuple
conversion is that trigger functions on tables with composite columns
get the composite row passed in as a string (from record_out).
Without supporting converting this back using record_in, this makes it
impossible to implement pass-through behavior for these columns, as
PL/Python no longer accepts strings for composite values.
A better solution would be to fix the code that transforms composite
inputs into Python objects to produce dictionaries that would then be
correctly interpreted by the Python->PostgreSQL counterpart code. But
that would be too invasive to backpatch to 9.1, and it is too late in
the 9.2 cycle to attempt it. It should be revisited in the future,
though.
Reported as bug #6559 by Kirill Simonov.
Jan Urbański
We have been seeing intermittent buildfarm failures due to a query
sometimes not using an index-only scan plan, because a background
auto-ANALYZE prevented the table's all-visible bits from being set
immediately, thereby causing the estimated cost of an index-only scan
to go up considerably. Adjust the test case so that a bitmap index scan is
preferred instead, which serves equally well for the purpose the test case
is actually meant for. (Of course, it would be better to eliminate the
interference from auto-ANALYZE, but I see no low-risk way to do that,
so any such fix will have to be left for 9.3 or later.)
setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists,
which meant they were left with the wrong varnos when the RETURNING list
was in a subquery. That was never possible before writable CTEs, of
course, but now it's broken. The executor fails to notice any problem
because ExecEvalVar just references the ecxt_scantuple for any normal
varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a
recent complaint from Bartosz Dmytrak.
Since the eventual rtoffset of the subquery is not known at the time
we are preparing its plan node, the previous scheme of executing
set_returning_clause_references() at that time cannot handle this
adjustment. Fortunately, it turns out that we don't really need to do it
that way, because all the needed information is available during normal
setrefs.c execution; we just have to dig it out of the ModifyTable node.
So, do that, and get rid of the kluge of early setrefs processing of
RETURNING lists. (This is a little bit of a cheat in the case of inherited
UPDATE/DELETE, because we are not passing a "root" struct that corresponds
exactly to what the subplan was built with. But that doesn't matter, and
anyway this is less ugly than early setrefs processing was.)
Back-patch to 9.1, where the problem became possible to hit.
Due to rather sloppy thinking (on my part, I'm afraid) about the
appropriate behavior for boundary conditions, pg_next_dst_boundary() gave
undefined, platform-dependent results when the input time is exactly the
last recorded DST transition time for the specified time zone, as a result
of fetching values one past the end of its data arrays.
Change its specification to be that it always finds the next DST boundary
*after* the input time, and adjust code to match that. The sole existing
caller, DetermineTimeZoneOffset, doesn't actually care about this
distinction, since it always uses a probe time earlier than the instant
that it does care about. So it seemed best to me to change the API to make
the result=1 and result=0 cases more consistent, specifically to ensure
that the "before" outputs always describe the state at the given time,
rather than hacking the code to obey the previous API comment exactly.
Per bug #6605 from Sergey Burladyan. Back-patch to all supported versions.
Prohibiting this outright would break dumps taken from older versions
that contain such casts, which would create far more pain than is
justified here.
Per report by Jaime Casanova and subsequent discussion.