Commit Graph

29057 Commits

Author SHA1 Message Date
Simon Riggs
efc16ea520 Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 01:32:45 +00:00
Bruce Momjian
78a09145e0 binary migration: pg_migrator
Add comments about places where system oids have to be preserved for
binary migration.
2009-12-19 00:47:57 +00:00
Robert Haas
2e9468f2c8 Fix a few typos in the latest 8.5alpha3 release notes. 2009-12-19 00:05:27 +00:00
Peter Eisentraut
2692274dd9 8.5alpha3 release notes up to Fri Dec 18 21:37:38 2009 +0000 2009-12-18 22:11:09 +00:00
Tom Lane
b35b16e696 Fix link that doesn't work in standalone INSTALL document. 2009-12-18 21:37:38 +00:00
Bruce Momjian
96c102fe27 Install server-side language PL/pgSQL by default. 2009-12-18 21:28:42 +00:00
Tom Lane
be3a24de19 Force the TZ environment variable to be set during initdb. This is to
short-circuit the rather expensive identify_system_timezone() procedure,
which we have no real need for during initdb since nothing done here depends
on the timezone setting.  Since we launch quite a few standalone backends
during the initdb sequence, this adds up to a significant savings, and seems
worth doing to save developer time even though it will hardly matter to end
users.  Per my report today on pgsql-hackers.
2009-12-18 18:45:50 +00:00
Robert Haas
f5fd651e1b Improve documentation for pg_largeobject changes.
Rewrite the documentation in more idiomatic English, and in the process make
it somewhat more succinct.  Move the discussion of specific large object
privileges out of the "server-side functions" section, where it certainly
doesn't belong, and into "implementation features".  That might not be
exactly right either, but it doesn't seem worth creating a new section for
this amount of information. Fix a few spelling and layout problems, too.
2009-12-17 14:36:16 +00:00
Michael Meskes
36d192ad7d Reverting accidently commited changes. 2009-12-17 07:28:58 +00:00
Peter Eisentraut
d6de43099a Don't unblock SIGQUIT in the SIGQUIT handler
This was possibly linked to a deadlock-like situation in glibc syslog code
invoked by the ereport call in quickdie().  In any case, a signal handler
should not unblock its own signal unless there is a specific reason to.
2009-12-16 23:05:00 +00:00
Peter Eisentraut
b63b967a7e If there is no sigdelset(), define it as a macro.
This removes some duplicate code that recreated the identical workaround
when the newer signal API is missing.
2009-12-16 22:55:34 +00:00
Tom Lane
52fc0075ab Avoid a premature coercion failure in transformSetOperationTree() when
presented with an UNKNOWN-type Var, which can happen in cases where an
unknown literal appeared in a subquery.  While many such cases will fail
later on anyway in the planner, there are some cases where the planner is
able to flatten the query and replace the Var by the constant before it has
to coerce the union column to the final type.  I had added this check in 8.4
to provide earlier/better error detection, but it causes a regression for
some cases that worked OK before.  Fix by not making the check if the input
node is UNKNOWN type and not a Const or Param.  If it isn't going to work,
it will fail anyway at plan time, with the only real loss being inability to
provide an error cursor.  Per gripe from Britt Piehler.

In passing, rename a couple of variables to remove confusion from an
inner scope masking the same variable names in an outer scope.
2009-12-16 22:24:13 +00:00
Robert Haas
ff499613d2 Several fixes for EXPLAIN (FORMAT YAML), plus one for EXPLAIN (FORMAT JSON).
ExplainSeparatePlans() was busted for both JSON and YAML output - the present
code is a holdover from the original version of my machine-readable explain
patch, which didn't have the grouping_stack machinery.  Also, fix an odd
distribution of labor between ExplainBeginGroup() and ExplainYAMLLineStarting()
when marking lists with "- ", with each providing one character.  This broke
the output format for multi-query statements.  Also, fix ExplainDummyGroup()
for the YAML output format.

Along the way, make the YAML format use escape_yaml() in situations where the
JSON format uses escape_json().  Right now, it doesn't matter because all the
values are known not to need escaping, but it seems safer this way.  Finally,
I added some comments to better explain what the YAML output format is doing.

Greg Sabino Mullane reported the issues with multi-query statements.
Analysis and remaining cleanups by me.
2009-12-16 22:16:16 +00:00
Magnus Hagander
3dfe7e8e0f Remove spurious '22' that clearly shouldn't be there.
David E. Wheeler
2009-12-16 19:38:54 +00:00
Michael Meskes
d19669e5f9 Fixed auto-prepare to not try preparing statements that are not preparable. Bug
found and solved by Boszormenyi Zoltan <zb@cybertec.at>, some small adjustments
by me.
2009-12-16 10:15:07 +00:00
Peter Eisentraut
dd4cd55c15 Python 3 support in PL/Python
Behaves more or less unchanged compared to Python 2, but the new language
variant is called plpython3u.  Documentation describing the naming scheme
is included.
2009-12-15 22:59:55 +00:00
Tom Lane
21d11e7ee2 Avoid unnecessary copying of source string when generating a cloned TParser.
For long source strings the copying results in O(N^2) behavior, and the
multiplier can be significant if wide-char conversion is involved.

Andres Freund, reviewed by Kevin Grittner.
2009-12-15 20:37:17 +00:00
Tom Lane
a5495cd841 Add a hook to let loadable modules get control at ProcessUtility execution,
and use it to extend contrib/pg_stat_statements to track utility commands.

Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
2009-12-15 20:04:49 +00:00
Tom Lane
34d26872ed Support ORDER BY within aggregate function calls, at long last providing a
non-kluge method for controlling the order in which values are fed to an
aggregate function.  At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.

Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally.  Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.

Andrew Gierth, reviewed by Hitoshi Harada
2009-12-15 17:57:48 +00:00
Tom Lane
6a6efb9640 Fix broken markup. 2009-12-15 15:59:57 +00:00
Itagaki Takahiro
7d67e06297 Add \shell and \setshell meta commands to pgbench.
\shell command runs an external shell command.
\setshell also does the same and sets the result to a variable.

original patch by Michael Paquier with some editorialization by Itagaki,
and reviewed by Greg Smith.
2009-12-15 07:17:57 +00:00
Robert Haas
cddca5ec13 Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics.
This patch also removes buffer-usage statistics from the track_counts
output, since this (or the global server statistics) is deemed to be a better
interface to this information.

Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
2009-12-15 04:57:48 +00:00
Itagaki Takahiro
6f1bf75d50 Fix pg_ctl initdb without options.
Passing NULL string to snprintf is avoided.
2009-12-15 00:17:50 +00:00
Tom Lane
a620d5005d Fix a bug introduced when set-returning SQL functions were made inline-able:
we have to cope with the possibility that the declared result rowtype contains
dropped columns.  This fails in 8.4, as per bug #5240.

While at it, be more paranoid about inserting binary coercions when inlining.
The pre-8.4 code did not really need to worry about that because it could not
inline at all in any case where an added coercion could change the behavior
of the function's statement.  However, when inlining a SRF we allow sorting,
grouping, and set-ops such as UNION.  In these cases, modifying one of the
targetlist entries that the sort/group/setop depends on could conceivably
change the behavior of the function's statement --- so don't inline when
such a case applies.
2009-12-14 02:15:54 +00:00
Itagaki Takahiro
84f910a707 Additional fixes for large object access control.
Use pg_largeobject_metadata.oid instead of pg_largeobject.loid
to enumerate existing large objects in pg_dump, pg_restore, and
contrib modules.
2009-12-14 00:39:11 +00:00
Magnus Hagander
0182d6f646 Allow LDAP authentication to operate in search+bind mode, meaning it
does a search for the user in the directory first, and then binds with
the DN found for this user.

This allows for LDAP logins in scenarios where the DN of the user cannot
be determined simply by prefix and suffix, such as the case where different
users are located in different containers.

The old way of authentication can be significantly faster, so it's kept
as an option.

Robert Fleming and Magnus Hagander
2009-12-12 21:35:21 +00:00
Tom Lane
a4e035b2f1 Fix integer-to-bit-string conversions to handle the first fractional byte
correctly when the output bit width is wider than the given integer by
something other than a multiple of 8 bits.

This has been wrong since I first wrote that code for 8.0 :-(.  Kudos to
Roman Kononov for being the first to notice, though I didn't use his
patch.  Per bug #5237.
2009-12-12 19:24:35 +00:00
Robert Haas
02490d4692 Export ExplainBeginOutput() and ExplainEndOutput() for auto_explain.
Without these functions, anyone outside of explain.c can't actually use
ExplainPrintPlan, because the ExplainState won't be initialized properly.
The user-visible result of this was a crash when using auto_explain with
the JSON output format.

Report by Euler Taveira de Oliveira.  Analysis by Tom Lane.  Patch by me.
2009-12-12 00:35:34 +00:00
Tom Lane
6b45e3b7aa Arrange to generate different random sequences in the different child
processes of a pgbench run, when we are using -j > 1 and are emulating
threads via fork().  Otherwise the children all inherit the same random
sequence state and produce the same random-number sequence.

In the threaded case the different threads will share one RNG state, so
they will produce different subsets of one sequence, which is maybe more
correlated than a purist would like but will not be "the same".  So we
leave that case alone.

First noticed by Takahiro Itagaki, and is also part of the explanation
for the pgbench misbehavior recently reported by Jaime Casanova.
2009-12-11 21:50:06 +00:00
Tom Lane
d8e511fabb Ensure that the result tuple of an EvalPlanQual cycle gets materialized
before we zap the input tuple.  Otherwise, pass-by-reference columns of
the result slot are likely to contain just references to the input
tuple, leading to big trouble if the pfree'd space is reused.  Per
trouble report from Jaime Casanova.  This is a new bug in the recent
rewrite of EvalPlanQual, so nothing to back-patch.
2009-12-11 18:14:43 +00:00
Itagaki Takahiro
f1325ce213 Add large object access control.
A new system catalog pg_largeobject_metadata manages
ownership and access privileges of large objects.

KaiGai Kohei, reviewed by Jaime Casanova.
2009-12-11 03:34:57 +00:00
Bruce Momjian
64579962bb Properly define ENABLE_THREAD_SAFETY in conflgure, per suggestion from Peter. 2009-12-11 02:21:21 +00:00
Andrew Dunstan
324385d67f Add YAML to list of EXPLAIN formats. Greg Sabino Mullane, reviewed by Takahiro Itagaki. 2009-12-11 01:33:35 +00:00
Peter Eisentraut
db7386187f PL/Python array support
Support arrays as parameters and return values of PL/Python functions.
2009-12-10 20:43:40 +00:00
Peter Eisentraut
a37b001b80 Add init[db] option to pg_ctl
pg_ctl gets a new mode that runs initdb.  Adjust the documentation a bit to
not assume that initdb is the only way to run database cluster initialization.
But don't replace initdb as the canonical way.

Author: Zdenek Kotala <Zdenek.Kotala@Sun.COM>
2009-12-10 06:32:28 +00:00
Robert Haas
da07641481 Fix levenshtein with costs. The previous code multiplied by the cost in only
3 of the 7 relevant locations.

Marcin Mank, slightly adjusted by me.
2009-12-10 01:54:17 +00:00
Tom Lane
03d7b0647f Update release notes for releases 8.4.2, 8.3.9, 8.2.15, 8.1.19, 8.0.23,
7.4.27.
2009-12-10 00:31:14 +00:00
Tom Lane
62aba76568 Prevent indirect security attacks via changing session-local state within
an allegedly immutable index function.  It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user.  However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.

The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue.  GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation.  Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement.  (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)

Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.

Security: CVE-2009-4136
2009-12-09 21:57:51 +00:00
Magnus Hagander
7aeaa97de2 Add notes about updating disk and shared memory size information in the
documentation when doing new major release.
2009-12-09 17:03:30 +00:00
Magnus Hagander
2367d6893f Update size references in installation instructions to be a bit
more up-to-date with current versions.
2009-12-09 16:16:34 +00:00
Magnus Hagander
abf23ee86d Reject certificates with embedded NULLs in the commonName field. This stops
attacks where an attacker would put <attack>\0<propername> in the field and
trick the validation code that the certificate was for <attack>.

This is a very low risk attack since it reuqires the attacker to trick the
CA into issuing a certificate with an incorrect field, and the common
PostgreSQL deployments are with private CAs, and not external ones. Also,
default mode in 8.4 does not do any name validation, and is thus also not
vulnerable - but the higher security modes are.

Backpatch all the way. Even though versions 8.3.x and before didn't have
certificate name validation support, they still exposed this field for
the user to perform the validation in the application code, and there
is no way to detect this problem through that API.

Security: CVE-2009-4034
2009-12-09 06:37:06 +00:00
Tom Lane
65ed203910 Update time zone data files to tzdata release 2009s: DST law changes in
Antarctica, Argentina, Bangladesh, Fiji, Novokuznetsk, Pakistan, Palestine,
Samoa, Syria.  Also historical corrections for Hong Kong.
2009-12-09 00:35:32 +00:00
Magnus Hagander
28519de02d Fix a couple of broken links to third-party sites. 2009-12-08 20:08:30 +00:00
Magnus Hagander
ac1b614cc0 Replace broken link to custom local gettext package with one to the main
GNU site for gettext.
2009-12-08 19:22:43 +00:00
Magnus Hagander
8df283d25a Update CVS documentation to be more current and add documentation about
git mirror.

Remove information about cvsup and documentation that's more about cvs
than our use of cvs.

Backpatch to 8.4 so we get the git information up on the website as
soon as possible.
2009-12-07 19:19:56 +00:00
Tom Lane
0cb65564e5 Add exclusion constraints, which generalize the concept of uniqueness to
support any indexable commutative operator, not just equality.  Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.

Jeff Davis, reviewed by Robert Haas
2009-12-07 05:22:23 +00:00
Tom Lane
8de7472b45 Don't use a duplicate OID for aclexplode(). 2009-12-06 02:55:54 +00:00
Peter Eisentraut
36f887c41c Speed up information schema privilege views
Instead of expensive cross joins to resolve the ACL, add table-returning
function aclexplode() that expands the ACL into a useful form, and join
against that.

Also, implement the role_*_grants views as a thin layer over the respective
*_privileges views instead of essentially repeating the same code twice.

fixes bug #4596

by Joachim Wieland, with cleanup by me
2009-12-05 21:43:36 +00:00
Peter Eisentraut
636bac6e46 Information schema documentation
Add a sentence of documentation about the differences between the
*_privileges and the role_*_grants views.
2009-12-05 21:31:05 +00:00
Heikki Linnakangas
ab3148b712 Fix bug in temporary file management with subtransactions. A cursor opened
in a subtransaction stays open even if the subtransaction is aborted, so
any temporary files related to it must stay alive as well. With the patch,
we use ResourceOwners to track open temporary files and don't automatically
close them at subtransaction end (though in the normal case temporary files
are registered with the subtransaction resource owner and will therefore be
closed).

At end of top transaction, we still check that there's no temporary files
marked as close-at-end-of-transaction open, but that's now just a debugging
cross-check as the resource owner cleanup should've closed them already.
2009-12-03 11:03:29 +00:00