To that end, support tags rather than lengths for external datums.
As an example of how this can be used, add support or "indirect"
tuples which point to some externally allocated memory containing
a toast tuple. Similar infrastructure could be used for other
purposes, including, perhaps, support for alternative compression
algorithms.
Andres Freund, reviewed by Hitoshi Harada and myself
Make it easier for readers of the FP docs to find out about possibly
truncated values.
Per complaint from Tom Duffey in message
F0E0F874-C86F-48D1-AA2A-0C5365BF5118@trillitech.com
Author: Albe Laurenz
Reviewed by: Abhijit Menon-Sen
With -Wtype-limits, gcc correctly points out that size_t can never be < 0.
Backpatch to 9.3 and 9.2. It's been like this forever, but in <= 9.1 you got
a lot other warnings with -Wtype-limits anyway (at least with my version of
gcc).
Andres Freund
On the command line, GUC option strings are handled by the guc parser,
not by the shell parser, so '' is the proper way to represent a
zero-length string. This reverts commit
3132a9b7ab.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
The dependencies on the spi and dummy_seclabel contrib modules were
incomplete, because they did not pick up automatically generated
dependencies on header files. This will manifest itself especially when
switching major versions, where the contrib modules would not be
recompiled to contain the new version number, leading to regression test
failures.
To fix this, use the submake approach already in use elsewhere, so that
the contrib modules are built using their full rules.
On Unix, you can embed double-quotes in single-quotes, and via versa.
However, on Windows, you can only escape double-quotes in double-quotes,
so use that in the pg_dump -t/table example.
Backpatch to 9.3.
Report from Mike Toews
Add ability for to_char() to output the timezone's UTC offset (OF). We
already have the ability to return the timezone abbeviation (TZ/tz).
Per request from Andrew Dunstan
A VPATH build will be performed when the module's make file path is not
the current directory or when USE_VPATH is set.
This will assist packagers and others who prefer to build without
polluting the source directories.
There is still a bit of work to do here, notably documentation, but it's
probably a good idea to commit what we have so far and let people test
it out on their modules.
Cédric Villemain, with an addition from me.
The pglz compressor has a significant startup cost, because it has to
initialize to zeros the history-tracking hash table. On a 64-bit system, the
hash table was 64kB in size. While clearing memory is pretty fast, for very
short inputs the relative cost of that was quite large.
This patch alleviates that in two ways. First, instead of storing pointers
in the hash table, store 16-bit indexes into the hist_entries array. That
slashes the size of the hash table to 1/2 or 1/4 of the original, depending
on the pointer width. Secondly, adjust the size of the hash table based on
input size. For very small inputs, you don't need a large hash table to
avoid collisions.
Review by Amit Kapila.
We don't normally bother retrying when the number of bytes written by
write() is short of what was requested. It is generally assumed that a
write() to disk doesn't return short, unless you run out of disk space.
While writing the WAL, however, it seems prudent to try a bit harder,
because a failure leads to PANIC. The write() is also much larger than most
write()s in the backend (up to wal_buffers), so there's more room for
surprises.
Also retry on EINTR. All signals used in the backend are flagged SA_RESTART
nowadays, so it shouldn't happen, but better to be defensive.
Previous code had old/new prefixes on option values, e.g.
--old-datadir=OLDDATADIR. Remove them, for simplicity; now:
--old-datadir=DATADIR. Also update docs to do the same.
On immediate shutdown, or during a restart-after-crash sequence,
postmaster used to send SIGQUIT (and then abandon ship if shutdown); but
this is not a good strategy if backends don't die because of that
signal. (This might happen, for example, if a backend gets tangled
trying to malloc() due to gettext(), as in an example illustrated by
MauMau.) This causes problems when later trying to restart the server,
because some processes are still attached to the shared memory segment.
Instead of just abandoning such backends to their fates, we now have
postmaster hang around for a little while longer, send a SIGKILL after
some reasonable waiting period, and then exit. This makes immediate
shutdown more reliable.
There is disagreement on whether it's best for postmaster to exit after
sending SIGKILL, or to stick around until all children have reported
death. If this controversy is resolved differently than what this patch
implements, it's an easy change to make.
Bug reported by MauMau in message 20DAEA8949EC4E2289C6E8E58560DEC0@maumau
MauMau and Álvaro Herrera
Change -u (user) option to -U, for consistency with other tools like
pg_dump and psql. Also expand --user to --username, again for
consistency.
BACKWARD INCOMPATIBILITY
This results in a slightly less specific error message when OVER
is used in a context where we don't accept window functions, but
per discussion, it's worth it to get the benefit of not needing
to reserve this keyword any more. This same refactoring will
also let us avoid reserving some other keywords that we expect
to add in upcoming patches (specifically, IGNORE, RESPECT, and
FILTER).
Troels Nielsen, with minor changes by me
In some cases, the use of these macros may be preferable to Assert()
or AssertMacro(), since this way the caller can set the trap message.
Andres Freund and Robert Haas
On many platforms the OS will round the sleep time to millisecond
resolution, but there is no reason for us to pre-emptively round the
argument to pg_usleep.
When the delay was measured in milliseconds and started from 1 ms, it
sometimes took many attempts until the logic that increases the delay by
multiplying with a random value between 1 and 2 actually managed to bump it
from 1 ms to 2 ms. That lead to a sequence of 1 ms waits until the delay
started to increase. This wasn't really a problem but it looked odd if you
observed the waits. There is no measurable difference in performance, but
it's more readable this way.
Jeff Janes
The MaxAllocSize guard is convenient for most callers, because it
reduces the need for careful attention to overflow, data type selection,
and the SET_VARSIZE() limit. A handful of callers are happy to navigate
those hazards in exchange for the ability to allocate a larger chunk.
Introduce MemoryContextAllocHuge() and repalloc_huge(). Use this in
tuplesort.c and tuplestore.c, enabling internal sorts of up to INT_MAX
tuples, a factor-of-48 increase. In particular, B-tree index builds can
now benefit from much-larger maintenance_work_mem settings.
Reviewed by Stephen Frost, Simon Riggs and Jeff Janes.
When there's a comment on an index that was created with UNIQUE or PRIMARY
KEY constraint syntax, we need to label the comment as depending on the
constraint not the index, since only the constraint object actually appears
in the dump. This incorrect dependency can lead to parallel pg_restore
trying to restore the comment before the index has been created, per bug
#8257 from Lloyd Albin.
This patch fixes pg_dump to produce the right dependency in dumps made
in the future. Usually we also try to hack pg_restore to work around
bogus dependencies, so that existing (wrong) dumps can still be restored in
parallel mode; but that doesn't seem practical here since there's no easy
way to relate the constraint dump entry to the comment after the fact.
Andres Freund
On Unix-ish platforms, EWOULDBLOCK may be the same as EAGAIN, which is
*not* a success return, at least not on Linux. We need to treat it as a
failure to avoid giving a misleading error message. Per the Single Unix
Spec, only EINPROGRESS and EINTR returns indicate that the connection
attempt is in progress.
On Windows, on the other hand, EWOULDBLOCK (WSAEWOULDBLOCK) is the expected
case. We must accept EINPROGRESS as well because Cygwin will return that,
and it doesn't seem worth distinguishing Cygwin from native Windows here.
It's not very clear whether EINTR can occur on Windows, but let's leave
that part of the logic alone in the absence of concrete trouble reports.
Also, remove the test for errno == 0, effectively reverting commit
da9501bddb, which AFAICS was just a thinko;
or at best it might have been a workaround for a platform-specific bug,
which we can hope is gone now thirteen years later. In any case, since
libpq makes no effort to reset errno to zero before calling connect(),
it seems unlikely that that test has ever reliably done anything useful.
Andres Freund and Tom Lane
Adjust the wording in the first para of "Sequence Manipulation Functions"
so that neither of the link phrases in it break across line boundaries,
in either A4- or US-page-size PDF output. This fixes a reported build
failure for the 9.3beta2 A4 PDF docs, and future-proofs this particular
para against causing similar problems in future. (Perhaps somebody will
fix this issue in the SGML/TeX documentation tool chain someday, but I'm
not holding my breath.)
Back-patch to all supported branches, since the same problem could rise up
to bite us in future updates if anyone changes anything earlier than this
in func.sgml.
Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its
Memcheck tool about the PostgreSQL allocator. This makes Valgrind
roughly as sensitive to memory errors involving palloc chunks as it is
to memory errors involving malloc chunks. Further client requests in
PageAddItem() and printtup() verify that all bits being added to a
buffer page or furnished to an output function are predictably-defined.
Those tests catch failures of C-language functions to fully initialize
the bits of a Datum, which in turn stymie optimizations that rely on
_equalConst(). Define the USE_VALGRIND symbol in pg_config_manual.h to
enable these additions. An included "suppression file" silences nominal
errors we don't plan to fix.
Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.
Move some repeated debugging code into functions and store intermediates
in variables where not presently necessary. No code-generation changes
in a production build, and no functional changes. This simplifies and
focuses the main patch.
Every other core buffer page consumer initializes the bytes it furnishes
to PageAddItem(). For consistency, do the same here. No back-patch;
regardless, we couldn't count on the fix so long as binary upgrade can
carry forward affected index builds.
GNU gettext selects a default encoding for the messages it emits in a
platform-specific manner; it uses the Windows ANSI code page on Windows
and follows LC_CTYPE on other platforms. This is inconvenient for
PostgreSQL server processes, so realize consistent cross-platform
behavior by calling bind_textdomain_codeset() on Windows each time we
permanently change LC_CTYPE. This primarily affects SQL_ASCII databases
and processes like the postmaster that do not attach to a database,
making their behavior consistent with PostgreSQL on non-Windows
platforms. Messages from SQL_ASCII databases use the encoding implied
by the database LC_CTYPE, and messages from non-database processes use
LC_CTYPE from the postmaster system environment. PlatformEncoding
becomes unused, so remove it.
Make write_console() prefer WriteConsoleW() to write() regardless of the
encodings in use. In this situation, write() will invariably mishandle
non-ASCII characters.
elog.c has assumed that messages conform to the database encoding.
While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL.
Introduce MessageEncoding to track the actual encoding of message text.
The present consumers are Windows-specific code for converting messages
to UTF16 for use in system interfaces. This fixes the appearance in
Windows event logs and consoles of translated messages from SQL_ASCII
processes like the postmaster. Note that SQL_ASCII inherently disclaims
a strong notion of encoding, so non-ASCII byte sequences interpolated
into messages by %s may yet yield a nonsensical message. MULE_INTERNAL
has similar problems at present, albeit for a different reason: its lack
of libiconv support or a conversion to UTF8.
Consequently, one need no longer restart Windows with a different
Windows ANSI code page to broadly test backend logging under a given
language. Changing the user's locale ("Format") is enough. Several
accounts can simultaneously run postmasters under different locales, all
correctly logging localized messages to Windows event logs and consoles.
Alexander Law and Noah Misch
Clang 3.3 correctly complains that a variable of type enum
MultiXactStatus cannot hold a value of -1, which makes sense. Change
the declared type of the variable to int instead, and apply casting as
necessary to avoid the warning.
Per notice from Andres Freund
In binary upgrade mode, we need to recreate and then drop dropped
columns so that all the columns get the right attribute number. This is
true for foreign tables as well as for native tables. For foreign
tables we have been getting the first part right but not the second,
leading to bogus columns in the upgraded database. Fix this all the way
back to 9.1, where foreign tables were introduced.