It's against project policy to use elog() for user-facing errors, or to
omit an errcode() selection for errors that aren't supposed to be "can't
happen" cases. Fix all the violations of this policy that result in
ERRCODE_INTERNAL_ERROR log entries during the standard regression tests,
as errors that can reliably be triggered from SQL surely should be
considered user-facing.
I also looked through all the files touched by this commit and fixed
other nearby problems of the same ilk. I do not claim to have fixed
all violations of the policy, just the ones in these files.
In a few places I also changed existing ERRCODE choices that didn't
seem particularly appropriate; mainly replacing ERRCODE_SYNTAX_ERROR
by something more specific.
Back-patch to 9.5, but no further; changing ERRCODE assignments in
stable branches doesn't seem like a good idea.
An EAN beginning with 979 (but not 9790 - those are ISMN's) are accepted
as ISBN numbers, but they cannot be represented in the old, 10-digit ISBN
format. They must be output in the new 13-digit ISBN-13 format. We printed
out an incorrect value for those.
Also add a regression test, to test this and some other basic functionality
of the module.
Patch by Fabien Coelho. This fixes bug #13442, reported by B.Z. Backpatch
to 9.1, where we started to recognize ISBN-13 numbers.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
The result closely resembles linking of these modules for the "win32"
port. Augment the $(exports_file) header so the file is also usable as
an import file. Unfortunately, relocating an AIX installation will now
require adding $(pkglibdir) to LD_LIBRARY_PATH. Back-patch to 9.5,
where the modules were introduced.
The AIX 7.1 libm is static, and AIX postgres executables do not export
symbols acquired from libraries. Back-patch to 9.5, where commit
cfe12763c3 added a sqrt() call.
This allows convenient checking for existence of a GUC from SQL, which is
particularly useful when dealing with custom variables.
David Christensen, reviewed by Jeevan Chalke
Patch by David Rowley. Backpatch to 9.5, as some of the calls were new in
9.5, and keeping the code in sync with master makes future backpatching
easier.
Commit 179cdd09 added macros to check if a filename is a WAL segment
or other such file. However there were still some instances of the
strlen + strspn combination to check for that in WAL-related utilities
like pg_archivecleanup. Those checks can be replaced with the macros.
This patch makes use of the macros in those utilities and
which would make the code a bit easier to read.
Back-patch to 9.5.
Michael Paquier
test_decoding used fastgetattr() to extract column values. That's wrong
when decoding updates and deletes if a table's replica identity is set
to FULL and new columns have been added since the old version of the
tuple was created. Due to the lack of a crosscheck with the datum's
natts values an invalid value will be output, leading to errors or
worse.
Bug: #13470
Reported-By: Krzysztof Kotlarski
Discussion: 20150626100333.3874.90852@wrigleys.postgresql.org
Backpatch to 9.4, where the feature, including the bug, was added.
Newer Python versions randomize the hash seed for dictionaries,
resulting in a random output order, which messes up the regression test
diffs.
Instead, use Python assert to compare the dictionaries with their
expected value.
Fix some places where pgindent did silly stuff, often because project
style wasn't followed to begin with. (I've not touched the atomics
headers, though.)
Remove a bunch of "extern Datum foo(PG_FUNCTION_ARGS);" declarations that
are no longer needed now that PG_FUNCTION_INFO_V1(foo) provides that.
Some of these were evidently missed in commit e7128e8dbb, but others
were cargo-culted in in code added since then. Possibly that can be blamed
in part on the fact that we'd not fixed relevant documentation examples,
which I've now done.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.
Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.
Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
Defer lookup of opfamily and input type of a of a user specified opclass
until the optimizer selects among available unique indexes; and store
the opclass in the parse analyzed tree instead. The primary reason for
doing this is that for rule deparsing it's easier to use the opclass
than the previous representation.
While at it also rename a variable in the inference code to better fit
it's purpose.
This is separate from the actual fixes for deparsing to make review
easier.
The plain C string language name needs to be wrapped in makeString() so
that the parse tree is copyable. This is detectable by
-DCOPY_PARSE_PLAN_TREES. Add a test case for the COMMENT case.
Also make the quoting in the error messages more consistent.
discovered by Tom Lane
This has been the predominant outcome. When the output of decrypting
with a wrong key coincidentally resembled an OpenPGP packet header,
pgcrypto could instead report "Corrupt data", "Not text data" or
"Unsupported compression algorithm". The distinct "Corrupt data"
message added no value. The latter two error messages misled when the
decrypted payload also exhibited fundamental integrity problems. Worse,
error message variance in other systems has enabled cryptologic attacks;
see RFC 4880 section "14. Security Considerations". Whether these
pgcrypto behaviors are likewise exploitable is unknown.
In passing, document that pgcrypto does not resist side-channel attacks.
Back-patch to 9.0 (all supported versions).
Security: CVE-2015-3167
The previous coding in hstore_plpython and ltree_plpython wiped out any
values set by the base makefiles. This at least had the effect of running
the tests in "regression" not "contrib_regression" as expected. These
being pretty new modules, there might be other bad effects we'd not
noticed yet.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
For upcoming BRIN opclasses, it's convenient to have strategy numbers
defined in a single place. Since there's nothing appropriate, create
it. The StrategyNumber typedef now lives there, as well as existing
strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
gist.h). skey.h is forced to include stratnum.h because of the
StrategyNumber typedef, but gist.h is not; extensions that currently
rely on gist.h for rtree strategy numbers might need to add a new
A few .c files can stop including skey.h and/or gist.h, which is a nice
side benefit.
Per discussion:
https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org
Authored by Emre Hasegeli and Álvaro.
(It's not clear to me why bootscanner.l has any #include lines at all.)
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.
Petr Jelinek
Reviewed by Michael Paquier and Simon Riggs
No need to have pg_audit.conf any longer since the regression tests are
just loading the module at the start of each session (to simulate being
in shared_preload_libraries, which isn't something we can actually make
happen on the buildfarm itself, it seems).
Pointed out by Tom
In pg_audit, set client_min_messages up to warning, then reset the role
attributes, to completely reset the session while not making the
regression tests depend on being run by any particular user.
Instead of creating a new superuser role, extract out what the current
user is and use that user instead. Further, clean up and drop all
objects created by the regression test.
Pointed out by Tom.
Also, use a function to load the extension ahead of all other calls,
simulating load from shared_libraries_preload, to make sure the
hooks are in place before logging start.
Remove the check that pg_audit be installed by
shared_preload_libraries as that's not going to work when running the
regressions tests in the buildfarm. That check was primairly a nice to
have and isn't required anyway.
This extension provides detailed logging classes, ability to control
logging at a per-object level, and includes fully-qualified object
names for logged statements (DML and DDL) in independent fields of the
log output.
Authors: Ian Barwick, Abhijit Menon-Sen, David Steele
Reviews by: Robert Haas, Tatsuo Ishii, Sawada Masahiko, Fujii Masao,
Simon Riggs
Discussion with: Josh Berkus, Jaime Casanova, Peter Eisentraut,
David Fetter, Yeb Havinga, Alvaro Herrera, Petr Jelinek, Tom Lane,
MauMau, Bruce Momjian, Jim Nasby, Michael Paquier,
Fabrízio de Royes Mello, Neil Tiffin
If a postgres_fdw foreign table is a non-locked source relation in an
UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, and the query selects its
ctid column, the wrong value would be returned if an EvalPlanQual
recheck occurred. This happened because the foreign table's result row
was copied via the ROW_MARK_COPY code path, and EvalPlanQualFetchRowMarks
just unconditionally set the reconstructed tuple's t_self to "invalid".
To fix that, we can have EvalPlanQualFetchRowMarks copy the composite
datum's t_ctid field, and be sure to initialize that along with t_self
when postgres_fdw constructs a tuple to return.
If we just did that much then EvalPlanQualFetchRowMarks would start
returning "(0,0)" as ctid for all other ROW_MARK_COPY cases, which perhaps
does not matter much, but then again maybe it might. The cause of that is
that heap_form_tuple, which is the ultimate source of all composite datums,
simply leaves t_ctid as zeroes in newly constructed tuples. That seems
like a bad idea on general principles: a field that's really not been
initialized shouldn't appear to have a valid value. So let's eat the
trivial additional overhead of doing "ItemPointerSetInvalid(&(td->t_ctid))"
in heap_form_tuple.
This closes out our handling of Etsuro Fujita's report that tableoid and
ctid weren't correctly set in postgres_fdw EvalPlanQual cases. Along the
way we did a great deal of work to improve FDWs' ability to control row
locking behavior; which was not wasted effort by any means, but it didn't
end up being a fix for this problem because that feature would be too
expensive for postgres_fdw to use all the time.
Although the fix for the tableoid misbehavior was back-patched, I'm
hesitant to do so here; it seems far less likely that people would care
about remote ctid than tableoid, and even such a minor behavioral change
as this in heap_form_tuple is perhaps best not back-patched. So commit
to HEAD only, at least for the moment.
Etsuro Fujita, with some adjustments by me
The new function allows to estimate bloat and other table level statics
in a faster, but approximate, way. It does so by using information from
the free space map for pages marked as all visible in the visibility
map. The rest of the table is actually read and free space/bloat is
measured accurately. In many cases that allows to get bloat information
much quicker, causing less IO.
Author: Abhijit Menon-Sen
Reviewed-By: Andres Freund, Amit Kapila and Tomas Vondra
Discussion: 20140402214144.GA28681@kea.toroid.org
Commit e7cb7ee145 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments. Clean up
as follows:
* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function. In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs. Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.
* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.
* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that. Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.
* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries. The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks. It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway. I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.
* Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.
* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.
* Lots of cleanup of documentation and missed comments. Re-order some
code additions into more logical places.