When heap_lock_tuple decides to follow the update chain, it would also
try to lock any version of the tuple that was created by an update that
was subsequently rolled back. This is pointless, since for all intents
and purposes that tuple exists no more; and moreover it causes
misbehavior, as reported independently by Marko Tiikkaja and Marti
Raudsepp: some SELECT FOR UPDATE/SHARE queries may fail to return
the tuples, and assertion-enabled builds crash.
Fix by having heap_lock_updated_tuple test the xmin and return success
immediately if the tuple was created by an aborted transaction.
Tuples become invisible when heap_lock_updated_tuple follows an updated
tuple chain and reports the problem as HeapTupleSelfUpdated to its
caller heap_lock_tuple, which in turn propagates that code outwards,
possibly leading the calling code (ExecLockRows) to believe that the
tuple no longer exists.
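In rough outline, the fix is a single early exit in the chain walker (a
sketch, not the exact patch; "mytup" follows the surrounding code):

    /* If the creating transaction aborted, this tuple version is dead
     * for all purposes; report success without locking anything. */
    if (TransactionIdDidAbort(HeapTupleHeaderGetXmin(mytup.t_data)))
        return HeapTupleMayBeUpdated;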
Backpatch to 9.3. Only on 9.5 and newer does this lead to a visible
failure, because of commit 27846f02c176; before that, heap_lock_tuple
skips the whole dance when the tuple is already locked by the same
transaction, because of the ancient HeapTupleSatisfiesUpdate behavior.
Still, the buggy condition may also exist in more convoluted scenarios
involving concurrent transactions, so it seems safer to fix the bug in
the old branches too.
Discussion:
https://www.postgresql.org/message-id/CABRT9RC81YUf1=jsmWopcKJEro=VoeG2ou6sPwyOUTx_qteRsg@mail.gmail.com
https://www.postgresql.org/message-id/48d3eade-98d3-8b9a-477e-1a8dc32a724d@joh.to
PageAddItem stores the item length as-is. It MAXALIGN's the amount of
space actually allocated for each tuple, but not the stored length.
PageRepairFragmentation, PageIndexMultiDelete, and PageIndexDeleteNoCompact
are all on board with this and MAXALIGN item lengths after fetching them.
But PageIndexTupleDelete expects the stored length to be a MAXALIGN
multiple already. This accidentally works for existing index AMs because
they all maxalign their tuple sizes internally; but we don't do that for
heap tuples, and it shouldn't be a requirement for index tuples either.
So, sync PageIndexTupleDelete with the rest of bufpage.c by having it
maxalign the item size after fetching.
Also add a check that pd_special is maxaligned, to ensure that the test
"(offset + size) > phdr->pd_special" is still doing the right thing.
(If offset and pd_special are aligned, it doesn't matter whether size is.)
Again, this is in sync with the rest of the routines here, except for
PageAddItem which doesn't test because it doesn't actually do anything
for which pd_special alignment matters.
This shouldn't have any immediate functional impact; it just adds the
flexibility to use PageIndexTupleDelete on index tuples with non-aligned
lengths.
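Concretely, the change is of this shape (a sketch, not the exact diff):

    /* maxalign the stored length after fetching it, like the rest of
     * bufpage.c does */
    size = MAXALIGN(ItemIdGetLength(tup));

    /* and insist pd_special is maxaligned, so the overrun test
     * "(offset + size) > phdr->pd_special" keeps doing the right thing */
    if (phdr->pd_special != MAXALIGN(phdr->pd_special))
        ereport(PANIC,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("corrupted page pointers")));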
Discussion: <3814.1473366762@sss.pgh.pa.us>
plpgsql.h defines a number of enums, but most of the code passes them
around as ints. Update structs and function prototypes to take enum
types instead. This clarifies the struct definitions in plpgsql.h in
particular.
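The kind of change applied throughout, in sketch form:

    typedef struct
    {                               /* generic datum array item */
        PLpgSQL_datum_type dtype;   /* was: int dtype; */
        int                dno;
    } PLpgSQL_datum;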
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
We have a not-terribly-thoroughly-enforced-yet project policy that internal
errors with SQLSTATE XX000 (ie, plain elog) should not be triggerable from
SQL. record_in, domain_in, and PL validator functions all failed to meet
this standard, because they threw plain elog("cache lookup failed for XXX")
errors on bad OIDs, and those are all invokable from SQL.
For record_in, the best fix is to upgrade typcache.c (lookup_type_cache)
to throw a user-facing error for this case. That seems consistent because
it was more than halfway there already, having user-facing errors for shell
types and non-composite types. Having done that, tweak domain_in to rely
on the typcache to throw an appropriate error. (This costs little because
InitDomainConstraintRef would fetch the typcache entry anyway.)
For the PL validator functions, we already have a single choke point at
CheckFunctionValidatorAccess, so just fix its error to be user-facing.
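The typcache change essentially replaces the internal error with a
user-facing one; roughly (a sketch):

    tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type_id));
    if (!HeapTupleIsValid(tp))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("type with OID %u does not exist", type_id)));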
Dilip Kumar, reviewed by Haribabu Kommi
Discussion: <87wpxfygg9.fsf@credativ.de>
Reading 2PC state files during recovery was broken, causing corruption
during recovery. The effect is limited to servers using 2PC,
subtransactions and recovery/replication.
Stas Kelvich, reviewed by Michael Paquier and Pavan Deolasee
So far md.c used a linked list of segments. That proved to be a problem
when processing large relations, because every smgr.c/md.c level access
to a page incurred walking through a linked list of all preceding
segments, making page access O(#segments).
Replace the linked list of segments hanging off SMgrRelationData with an
array of opened segments. That allows O(1) access to individual
segments, if they've previously been opened.
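The data structure change looks roughly like this (a sketch; field
names per the patch):

    typedef struct SMgrRelationData
    {
        ...
        /* was: struct _MdfdVec *md_fd[MAX_FORKNUM + 1]; (list head) */
        int              md_num_open_segs[MAX_FORKNUM + 1];
        struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];  /* open segments */
    } SMgrRelationData;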
Discussion: <20140331101001.GE13135@alap3.anarazel.de>
Reviewed-By: Peter Geoghegan, Tom Lane (in an older version)
That's primarily useful for testing very large relations using sparse
files.
Discussion: <20140331101001.GE13135@alap3.anarazel.de>
Reviewed-By: Peter Geoghegan
mdtruncate() forgot to FileClose() a segment's mdfd_vfd when deleting
it. That led to a fd.c handle to a truncated file being kept open until
backend exit.
The issue appears to have been introduced way back in 1a5c450f30,
before that the handle was closed inside FileUnlink().
The impact of this bug is limited: only VACUUM and ON COMMIT TRUNCATE
for temporary tables truncate files in place (i.e. TRUNCATE itself is
not affected), and the relation has to be bigger than 1GB. The
consequences of a leaked fd.c handle aren't severe either.
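The fix is essentially one call in mdtruncate()'s segment-dropping path
(a sketch):

    /* When a segment is dropped entirely, close its vfd too;
     * previously the handle was simply leaked until backend exit. */
    FileTruncate(v->mdfd_vfd, 0);
    FileClose(v->mdfd_vfd);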
Discussion: <20160908220748.oqh37ukwqqncbl3n@alap3.anarazel.de>
Backpatch: all supported releases
commit_ts and test_pg_dump were declaring targets before including the
PGXS stanza, which meant that the "all" target customarily defined as
the first (and therefore default) target was not the default anymore.
Fix that by moving those target definitions to after PGXS.
commit_ts was initially good, but I broke it in commit 9def031bd2;
test_pg_dump was born broken, probably copying from commit_ts' mistake.
In passing, fix a comment mistake in test_pg_dump/Makefile.
Backpatch to 9.6.
Noted by Tom Lane.
Previously, if a schema was created by an extension, a normal pg_dump run
(not --binary-upgrade) would summarily skip every object in that schema.
In a case where an extension creates a schema and then users create other
objects within that schema, this does the wrong thing: we want pg_dump
to skip the schema but still create the non-extension-owned objects.
There's no easy way to fix this pre-9.6, because in earlier versions the
"dump" status for a schema is just a bool and there's no way to distinguish
"dump me" from "dump my members". However, as of 9.6 we do have enough
state to represent that, so this is a simple correction of the logic in
selectDumpableNamespace.
In passing, make some cosmetic fixes in nearby code.
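With that, the fix in selectDumpableNamespace is roughly of this shape
(a sketch, not the exact patch; component names per the 9.6 code):

    /* Schema was created by an extension: don't dump the schema object
     * itself, but do consider the objects contained in it. */
    nsinfo->dobj.dump = DUMP_COMPONENT_NONE;
    nsinfo->dobj.dump_contains = DUMP_COMPONENT_ALL;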
Martín Marqués, reviewed by Michael Paquier
Discussion: <99581032-71de-6466-c325-069861f1947d@2ndquadrant.com>
If the database has a non-default tablespace, we emitted a TABLESPACE
clause in the CREATE DATABASE command emitted by -C, even if
--no-tablespaces was also specified. This seems wrong, and it's
inconsistent with what pg_dumpall does, so change it. Per bug #14315
from Danylo Hlynskyi.
Back-patch to 9.5. The bug is much older, but it'd be a more invasive
change before 9.5 because dumpDatabase() hasn't got an easy way to get
to the outputNoTablespaces flag. Doesn't seem worth the work given
the lack of previous complaints.
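The fix amounts to testing the flag where -C emits the clause (a
sketch; variable names per pg_dump's dumpDatabase):

    if (strlen(tablespace) > 0 &&
        strcmp(tablespace, "pg_default") != 0 &&
        !dopt->outputNoTablespaces)
        appendPQExpBuffer(creaQry, " TABLESPACE = %s",
                          fmtId(tablespace));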
Report: <20160908081953.1402.75347@wrigleys.postgresql.org>
StandbyRecoverPreparedTransactions() leaked the buffer used for the
two-phase state file, once at startup and again at every shutdown
checkpoint seen.
Backpatch to 9.6.
Stas Kelvich
Until now, gendef.pl placed its temporary file in the current working
directory; place it in the target directory instead. This makes it safe
for simultaneous invocations of gendef.pl, with different target
directories, to run from a single current working directory, such as
$(top_srcdir). The MSVC build system will soon rely on this.
Christian Ullrich, reviewed by Michael Paquier.
Not much to be said about this patch: it does what it says on the tin.
In passing, rename AlterEnumStmt.skipIfExists to skipIfNewValExists
to clarify what it actually does. In the discussion of this patch
we considered supporting other similar options, such as IF EXISTS
on the type as a whole or IF NOT EXISTS on the target name. This
patch doesn't actually add any such feature, but it might happen later.
Dagfinn Ilmari Mannsåker, reviewed by Emre Hasegeli
Discussion: <CAO=2mx6uvgPaPDf-rHqG8=1MZnGyVDMQeh8zS4euRyyg4D35OQ@mail.gmail.com>
Document the formerly-undocumented behavior that schema and comment
control-file entries for an extension are honored only during initial
installation, whereas other properties are also honored during updates.
While at it, do some copy-editing on the recently-added docs for CREATE
EXTENSION ... CASCADE, use links for some formerly vague cross references,
and make a couple other minor improvements.
Back-patch to 9.6 where CASCADE was added. The other parts of this
could go further back, but they're probably not important enough to
bother.
Users often get confused between COPY and \copy and try to use client-side
paths with COPY. The server then cannot find the file (if remote), or sees
a permissions problem (if local), or some variant of that. Emit a hint
about this in the most common cases.
In future we might want to expand the set of errnos for which the hint
gets printed, but be conservative for now.
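Roughly, the hint is attached where COPY fails to open the file (a
sketch):

    if (cstate->copy_file == NULL)
    {
        /* save errno: ereport subroutines might clobber it */
        int save_errno = errno;

        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\" for reading: %m",
                        cstate->filename),
                 (save_errno == ENOENT || save_errno == EACCES) ?
                 errhint("COPY FROM instructs the PostgreSQL server "
                         "process to read a file. You may want a "
                         "client-side facility such as psql's \\copy.")
                 : 0));
    }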
Craig Ringer, reviewed by Christoph Berg and Tom Lane
Discussion: <CAMsr+YEqtD97qPEzQDqrCt5QiqPbWP_X4hmvy2pQzWC0GWiyPA@mail.gmail.com>
Add a location field to the DefElem struct, used to parse many utility
commands. Update various error messages to supply error position
information.
To propagate the error position information in a more systematic way,
create a ParseState in standard_ProcessUtility() and pass that to
interested functions implementing the utility commands. This seems
better than passing the query string and then reassembling a parse state
ad hoc, which violates the encapsulation of the ParseState type.
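The struct gains a parse location like other nodes:

    typedef struct DefElem
    {
        NodeTag       type;
        char         *defnamespace;  /* NULL if unqualified name */
        char         *defname;
        Node         *arg;           /* a (Value *) or a (TypeName *) */
        DefElemAction defaction;     /* unspecified action, or SET/ADD/DROP */
        int           location;      /* token location, or -1 if unknown */
    } DefElem;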
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Mostly, explain how row xmins used to be replaced by FrozenTransactionId
and no longer are. Do a little copy-editing on the side.
Per discussion with Egor Rogov. Back-patch to 9.4 where the behavioral
change occurred.
Discussion: <575D7955.6060209@postgrespro.ru>
Negative availMemLessRefund would be problematic. It's not entirely
clear whether the case can be hit in the code as it stands, but this
seems like good future-proofing in any case. While we're at it,
insist that the value be not merely positive but not tiny, so as to
avoid doing a lot of repalloc work for little gain.
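The guard is of this shape (a sketch; the threshold shown is
illustrative, not the committed value):

    #define MIN_WORTHWHILE_REFUND  (32 * 1024)   /* hypothetical floor */

    /* skip the refund unless it's positive and worth the repalloc
     * traffic */
    if (availMemLessRefund <= (int64) MIN_WORTHWHILE_REFUND)
        return;    /* not merely positive, but not tiny either */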
Peter Geoghegan
Discussion: <CAM3SWZRVkuUB68DbAkgw=532gW0f+fofKueAMsY7hVYi68MuYQ@mail.gmail.com>
Brain fade in commit a00c58314: I was thinking that a string starting with
"-" could be taken as a switch depending on command line syntax. That's
true, but having appendShellString() quote it will not help, so we may as
well not do so. Per complaint from Peter Eisentraut.
What used to be four spaces somehow turned into a tab and a couple of
spaces in commit a00c58314, no doubt from overhelpful emacs autoindent.
Noted by Peter Eisentraut.
lazy_truncate_heap() was waiting for VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL,
but in microseconds, not milliseconds as originally intended.
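The fix is one line (a sketch; pg_usleep() takes microseconds):

    /* VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL is defined in milliseconds */
    pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);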
Found by code inspection.
Simon Riggs
Some of the comments added by the CREATE EXTENSION CASCADE patch were
a bit sloppy, and I didn't care for redeclaring the same local variable
inside a nested block either. No functional changes.
Previously, we failed to recognize Unicode characters above U+7FF as
being members of locale-dependent character classes such as [[:alpha:]].
(Actually, the same problem occurs for large pg_wchar values in any
multibyte encoding, but UTF8 is the only case people have actually
complained about.) It's impractical to get Spencer's original code to
handle character classes or ranges containing many thousands of characters,
because it insists on considering each member character individually at
regex compile time, whether or not the character will ever be of interest
at run time. To fix, choose a cutoff point MAX_SIMPLE_CHR below which
we process characters individually as before, and deal with entire ranges
or classes as single entities above that. We can actually make things
cheaper than before for chars below the cutoff, because the color map can
now be a simple linear array for those chars, rather than the multilevel
tree structure Spencer designed. It's more expensive than before for
chars above the cutoff, because we must do a binary search in a list of
high chars and char ranges used in the regex pattern, plus call iswalpha()
and friends for each locale-dependent character class used in the pattern.
However, multibyte encodings are normally designed to give smaller codes
to popular characters, so that we can expect that the slow path will be
taken relatively infrequently. In any case, the speed penalty appears
minor except when we have to apply iswalpha() etc. to high character codes
at runtime --- and the previous coding gave wrong answers for those cases,
so whether it was faster is moot.
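In sketch form, the color lookup becomes two-tier (names assumed; the
real code works through the colormap structures):

    if (chr <= MAX_SIMPLE_CHR)
        color = cm->simplemap[chr];        /* flat array, O(1) */
    else
        color = find_high_color(cm, chr);  /* binary search over the
                                            * ranges/classes used in
                                            * the pattern, plus
                                            * iswalpha() and friends
                                            * as needed */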
Tom Lane, reviewed by Heikki Linnakangas
Discussion: <15563.1471913698@sss.pgh.pa.us>
To prevent possibly breaking indexes on enum columns, we must keep
uncommitted enum values from getting stored in tables, unless we
can be sure that any such column is new in the current transaction.
Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE
from being executed at all in a transaction block, unless the target
enum type had been created in the current transaction. This patch
removes that restriction, and instead insists that an uncommitted enum
value can't be referenced unless it belongs to an enum type created
in the same transaction as the value. Per discussion, this should be
a bit less onerous. It does require each function that could possibly
return a new enum value to SQL operations to check this restriction,
but there aren't so many of those that this seems unmaintainable.
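Each such function now applies a check of roughly this shape (a sketch;
the helper and error code are illustrative, details elided):

    Form_pg_enum  en = (Form_pg_enum) GETSTRUCT(enumval_tup);
    TransactionId xmin = HeapTupleHeaderGetXmin(enumval_tup->t_data);

    if (!TransactionIdDidCommit(xmin) &&
        !EnumTypeCreatedInCurrentXact(en->enumtypid))  /* hypothetical */
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("unsafe use of new value \"%s\" of enum type",
                        NameStr(en->enumlabel))));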
Andrew Dunstan and Tom Lane
Discussion: <4075.1459088427@sss.pgh.pa.us>
When pg_logical_slot_get_changes(...) sets confirmed_flush_lsn to the point at
which replay stopped, it doesn't dirty the replication slot. So if the replay
didn't cause restart_lsn or catalog_xmin to change as well, this change will
not get written out to disk, even on a clean shutdown.
If Pg crashes or restarts, a subsequent pg_logical_slot_get_changes(...) call
will see the same changes already replayed since it uses the slot's
confirmed_flush_lsn as the start point for fetching changes. The caller can't
specify a start LSN when using the SQL interface.
Mark the slot as dirty after reading changes using the SQL interface so that
users won't see repeated changes after a clean shutdown. Repeated changes still
occur when using the walsender interface or after an unclean shutdown.
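The fix is essentially a one-liner at the end of the SQL-interface read
(a sketch):

    /* flag the slot so the advanced confirmed_flush_lsn is written out
     * at the next checkpoint, surviving a clean shutdown */
    ReplicationSlotMarkDirty();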
Craig Ringer
Andres is apparently the only hacker who thinks this code is better as-is.
I (tgl) follow some of his logic, but the fact that it's setting off
warnings from static code analyzers seems like a sufficient reason to
put the complexity into a comment rather than the code.
Aleksander Alekseev
Discussion: <20160404190345.54d84ee8@fujitsu>
After further reflection about the mess cleaned up in commit 39b691f25,
I decided the main bit of test coverage that was still missing was to
check that the non-default abbreviation-set files we supply are usable.
Add that.
Back-patch to supported branches, just because it seems like a good
idea to keep this all in sync.
Commit b2cbced9e instituted a policy of referring to the timezone database
as the "IANA timezone database" in our user-facing documentation.
Propagate that wording into a couple of places that were still using "zic"
to refer to the database, which is definitely not right (zic is the
compilation tool, not the data).
Back-patch, not because this is very important in itself, but because
we routinely cherry-pick updates to the tznames files and I don't want
to risk future merge failures.
split_to_stringlist() doesn't modify its first argument nor expect it
to remain valid after exit, so there's no need to duplicate the optarg
string at the call sites. Per Coverity. (This has been wrong all along,
but commit 052cc223d changed the useless calls from "strdup" to
"pg_strdup", which apparently made Coverity think it's a new bug.
It's not, but it's also not worth back-patching.)
Coverity complained about the for(;;) loop, because it never actually
iterated; it was used just to be able to use "break" to exit it early.
I agree with Coverity that that's a bit confusing, so refactor the code
to use if-else instead.
While we're at it, use a local variable to hold the "current" node. That's
shorter and clearer than referring to "iter->last_visited" all the time.
In addition to the existing decimal-milliseconds output value,
display the same value in mm:ss.fff format if it exceeds one second.
Tack on hours and even days fields if the interval is large enough.
This avoids needing mental arithmetic to convert the values into
customary time units.
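For illustration, the conversion could look like this (a sketch, not
psql's exact code):

    /* format elapsed milliseconds, adding mm:ss.fff beyond one second */
    static void
    print_elapsed(double elapsed_msec)
    {
        if (elapsed_msec < 1000.0)
            printf("Time: %.3f ms\n", elapsed_msec);
        else
        {
            double  total_sec = elapsed_msec / 1000.0;
            int     min = (int) (total_sec / 60.0);
            double  sec = total_sec - min * 60.0;

            printf("Time: %.3f ms (%02d:%06.3f)\n",
                   elapsed_msec, min, sec);
        }
    }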
Corey Huinker, reviewed by Gerdan Santos; bikeshedding by many
Discussion: <CADkLM=dbC4R8sbbuFXQVBFWoJGQkTEW8RWnC0PbW9nZsovZpJQ@mail.gmail.com>
These were evidently introduced by yesterday's commit 9cca11c91,
which perhaps needs more review than it got.
Per report from Andreas Seltenreich and additional examination
of nearby code.
Report: <87oa45qfwq.fsf@credativ.de>
computeLeafRecompressWALData() tried to produce a uint16 WAL log field by
memcpy'ing the first two bytes of an int-sized variable. That accidentally
works on little-endian hardware, but not at all on big-endian. Replay then
thinks it's looking at an ADDITEMS action with zero entries, and reads the
first two bytes of the first TID therein as the next segno/action,
typically leading to "unexpected GIN leaf action" errors during replay.
Even if replay failed to crash, the resulting GIN index page would surely
be incorrect. To fix, just declare the variable as uint16 instead.
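The bug and fix in miniature (a sketch; variable names hypothetical):

    int    n = nmodified;
    memcpy(ptr, &n, sizeof(uint16));    /* before: happens to copy the
                                         * low-order bytes only on
                                         * little-endian; garbage on
                                         * big-endian */

    uint16 n16 = nmodified;
    memcpy(ptr, &n16, sizeof(uint16));  /* after: the two bytes copied
                                         * are the value regardless of
                                         * endianness */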
Per bug #14295 from Spencer Thomason (much thanks to Spencer for turning
his problem into a self-contained test case). This likely also explains
a previous report of the same symptom from Bernd Helmle.
Back-patch to 9.4 where the problem was introduced (by commit 14d02f0bb).
Discussion: <20160826072658.15676.7628@wrigleys.postgresql.org>
Possible-Report: <2DA7350F7296B2A142272901@eje.land.credativ.lan>
Previously, we threw an error if a dynamic timezone abbreviation did not
match any abbreviation recorded in the referenced IANA time zone entry.
That seemed like a good consistency check at the time, but it turns out
that a number of the abbreviations in the IANA database are things that
Olson and crew made up out of whole cloth. Their current policy is to
remove such names in favor of using simple numeric offsets. Perhaps
unsurprisingly, a lot of these made-up abbreviations have varied in meaning
over time, which meant that our commit b2cbced9e and later changes made
them into dynamic abbreviations. So with newer IANA database versions
that don't mention these abbreviations at all, we fail, as reported in bug
#14307 from Neil Anderson. It's worse than just a few unused-in-the-wild
abbreviations not working, because the pg_timezone_abbrevs view stops
working altogether (since its underlying function tries to compute the
whole view result in one call).
We considered deleting these abbreviations from our abbreviations list, but
the problem with that is that we can't stay ahead of possible future IANA
changes. Instead, let's leave the abbreviations list alone, and treat any
"orphaned" dynamic abbreviation as just meaning the referenced time zone.
It will behave a bit differently than it used to, in that you can't any
longer override the zone's standard vs. daylight rule by using the "wrong"
abbreviation of a pair, but that's better than failing entirely. (Also,
this solution can be interpreted as adding a small new feature, which is
that any abbreviation a user wants can be defined as referencing a time
zone name.)
Back-patch to all supported branches, since this problem affects all
of them when using tzdata 2016f or newer.
Report: <20160902031551.15674.67337@wrigleys.postgresql.org>
Discussion: <6189.1472820913@sss.pgh.pa.us>
When building libpq, ip.c and md5.c were symlinked or copied from
src/backend/libpq into src/interfaces/libpq, but now that we have a
directory specifically for routines that are shared between the server and
client binaries, src/common/, move them there.
Some routines in ip.c were only used in the backend. Keep those in
src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file
that's now in common.
Fix the comment in src/common/Makefile to reflect how libpq actually links
those files.
There are two more files that libpq symlinks directly from src/backend:
encnames.c and wchar.c. I don't feel compelled to move those right now,
though.
Patch by Michael Paquier, with some changes by me.
Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>
This introduces a numeric sum accumulator, which performs better than
repeatedly calling add_var(). The performance comes from using wider digits
and delaying carry propagation, tallying positive and negative values
separately, and avoiding a round of palloc/pfree on every value. This
speeds up SUM(), as well as other standard aggregates like AVG() and
STDDEV() that also calculate a sum internally.
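The accumulator's shape, roughly (field names per the patch, functions
elided):

    typedef struct NumericSumAccum
    {
        int     ndigits;
        int     weight;
        int     dscale;
        int     num_uncarried;  /* values added since last carry pass */
        bool    have_carry_space;
        int32  *pos_digits;     /* digit sums of positive inputs */
        int32  *neg_digits;     /* digit sums of negative inputs */
    } NumericSumAccum;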
Reviewed-by: Andrey Borodin
Discussion: <c0545351-a467-5b76-6d46-4840d1ea8aa4@iki.fi>
While we don't need multiple iterators at the moment, the interface is
nicer and less dangerous this way.
Aleksander Alekseev, with some changes by me.