path_encode's "closed" argument used to take three values: TRUE, FALSE,
or -1, while being of type bool. Replace that with a three-valued enum
for more clarity.
Was broken by my xloginsert scaling patch. XLogCtl global variable needs
to be initialized in each process, as it's not inherited by fork() on
Windows.
This patch replaces WALInsertLock with a number of WAL insertion slots,
allowing multiple backends to insert WAL records to the WAL buffers
concurrently. This is particularly useful for parallel loading large amounts
of data on a system with many CPUs.
This has one user-visible change: switching to a new WAL segment with
pg_switch_xlog() now fills the remaining unused portion of the segment with
zeros. This potentially adds some overhead, but it has been a very common
practice by DBA's to clear the "tail" of the segment with an external
pg_clearxlogtail utility anyway, to make the WAL files compress better.
With this patch, it's no longer necessary to do that.
This patch adds a new GUC, xloginsert_slots, to tune the number of WAL
insertion slots. Performance testing suggests that the default, 8, works
pretty well for all kinds of worklods, but I left the GUC in place to allow
others with different hardware to test that easily. We might want to remove
that before release.
Reviewed by Andres Freund.
The code in set_append_rel_pathlist() for building parameterized paths
for append relations (inheritance and UNION ALL combinations) supposed
that the cheapest regular path for a child relation would still be cheapest
when reparameterized. Which might not be the case, particularly if the
added join conditions are expensive to compute, as in a recent example from
Jeff Janes. Fix it to compare child path costs *after* reparameterizing.
We can short-circuit that if the cheapest pre-existing path is already
parameterized correctly, which seems likely to be true often enough to be
worth checking for.
Back-patch to 9.2 where parameterized paths were introduced.
On some platforms, posix_fallocate() is available but may still return
EINVAL if the underlying filesystem does not support it. So, in case
of an error, fall through to the alternate implementation that just
writes zeros.
Per buildfarm failure and analysis by Tom Lane.
Commit 31a891857a added some tests in
plpgsql.sql that used a function rather unthinkingly named "foo()".
However, rangefuncs.sql has some much older tests that create a function
of that name, and since these test scripts run in parallel, there is a
chance of failures if the timing is just right. Use another name to
avoid that. Per buildfarm (failure seen today on "hamerkop", but
probably it's happened before and not been noticed).
The old implementation converted PostgreSQL numeric to Python float,
which was always considered a shortcoming. Now numeric is converted to
the Python Decimal object. Either the external cdecimal module or the
standard library decimal module are supported.
From: Szymon Guz <mabewlun@gmail.com>
From: Ronan Dunklau <rdunklau@gmail.com>
Reviewed-by: Steve Singer <steve@ssinger.info>
This function is more efficient than actually writing out zeroes to
the new file, per microbenchmarks by Jon Nelson. Also, it may reduce
the likelihood of WAL file fragmentation.
Jon Nelson, with review by Andres Freund, Greg Smith and me.
This value, now pg_stat_all_tables.n_mod_since_analyze, was already
tracked and used by autovacuum, but not exposed to the user.
Mark Kirkwood, review by Laurenz Albe
Commit 263865a489 switched tuplesort.c and
tuplestore.c variables representing memory usage from type "long" to
type "Size". This was unnecessary; I thought doing so avoided overflow
scenarios on 64-bit Windows, but guc.c already limited work_mem so as to
prevent the overflow. It was also incomplete, not touching the logic
that assumed a signed data type. Change the affected variables to
"int64". This is perfect for 64-bit platforms, and it reduces the need
to contemplate platform-specific overflow scenarios. It also puts us
close to being able to support work_mem over 2 GiB on 64-bit Windows.
Per report from Andres Freund.
In 9.3, there's no particular limit on the number of bgworkers;
instead, we just count up the number that are actually registered,
and use that to set MaxBackends. However, that approach causes
problems for Hot Standby, which needs both MaxBackends and the
size of the lock table to be the same on the standby as on the
master, yet it may not be desirable to run the same bgworkers in
both places. 9.3 handles that by failing to notice the problem,
which will probably work fine in nearly all cases anyway, but is
not theoretically sound.
A further problem with simply counting the number of registered
workers is that new workers can't be registered without a
postmaster restart. This is inconvenient for administrators,
since bouncing the postmaster causes an interruption of service.
Moreover, there are a number of applications for background
processes where, by necessity, the background process must be
started on the fly (e.g. parallel query). While this patch
doesn't actually make it possible to register new background
workers after startup time, it's a necessary prerequisite.
Patch by me. Review by Michael Paquier.
Treat TOAST index just the same as normal one and get the OID
of TOAST index from pg_index but not pg_class.reltoastidxid.
This change allows us to handle multiple TOAST indexes, and
which is required infrastructure for upcoming
REINDEX CONCURRENTLY feature.
Patch by Michael Paquier, reviewed by Andres Freund and me.
The collate.linux.utf8 test covers some of the same territory, but
isn't portable and so probably does not get run often, or on
non-Linux platforms. If this approach turns out to be sufficiently
portable, we may want to look at trimming the redundant tests out
of that file to avoid duplication.
Robins Tharakan, reviewed by Michael Paquier and Fabien Coelho,
with further changes and cleanup by me.
An INSERT into such a view should work just like an INSERT into its base
table, ie the insertion should go directly into that table ... not be
duplicated into each child table, as was happening before, per bug #8275
from Rushabh Lathia. On the other hand, the current behavior for
UPDATE/DELETE seems reasonable: the update/delete traverses the child
tables, or not, depending on whether the view specifies ONLY or not.
Add some regression tests covering this area.
Dean Rasheed
In patch 82233ce7ea, AbortStartTime wasn't being reset appropriately
after the restart sequence, causing subsequent iterations through
ServerLoop to malfunction.
Specifically, permit attaching them to the error in RAISE and retrieving
them from a caught error in GET STACKED DIAGNOSTICS. RAISE enforces
nothing about the content of the fields; for its purposes, they are just
additional string fields. Consequently, clarify in the protocol and
libpq documentation that the usual relationships between error fields,
like a schema name appearing wherever a table name appears, are not
universal. This freedom has other applications; consider a FDW
propagating an error from an RDBMS having no schema support.
Back-patch to 9.3, where core support for the error fields was
introduced. This prevents the confusion of having a release where libpq
exposes the fields and PL/pgSQL does not.
Pavel Stehule, lexical revisions by Noah Misch.
To that end, support tags rather than lengths for external datums.
As an example of how this can be used, add support or "indirect"
tuples which point to some externally allocated memory containing
a toast tuple. Similar infrastructure could be used for other
purposes, including, perhaps, support for alternative compression
algorithms.
Andres Freund, reviewed by Hitoshi Harada and myself
Make it easier for readers of the FP docs to find out about possibly
truncated values.
Per complaint from Tom Duffey in message
F0E0F874-C86F-48D1-AA2A-0C5365BF5118@trillitech.com
Author: Albe Laurenz
Reviewed by: Abhijit Menon-Sen
With -Wtype-limits, gcc correctly points out that size_t can never be < 0.
Backpatch to 9.3 and 9.2. It's been like this forever, but in <= 9.1 you got
a lot other warnings with -Wtype-limits anyway (at least with my version of
gcc).
Andres Freund
On the command line, GUC option strings are handled by the guc parser,
not by the shell parser, so '' is the proper way to represent a
zero-length string. This reverts commit
3132a9b7ab.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.