An elog that reports the value of a transaction ID stored on a deleted
nbtree page was added by commit e5d8a999, which taught page deletion to
store full 64-bit XIDs. It seems very chatty on further reflection, so
lower its elevel from NOTICE to DEBUG2.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch: 14-, just like the nbtree XID enhancement.
ld_upper, ld_lower, pd_special and the page size have been using
smallint as return type, which could cause those fields to return
negative values in certain cases for builds configures with a page size
of 32kB.
Bump pageinspect to 1.10. page_header() is able to handle the correct
return type of those fields at runtime when using an older version of
the extension, with some tests are added to cover that.
Author: Quan Zongliang
Reviewed-by: Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/8b8ec36e-61fe-14f9-005d-07bc85aa4eed@yeah.net
Most error messages about a relkind that was not supported or
appropriate for the command was of the pattern
"relation \"%s\" is not a table, foreign table, or materialized view"
This style can become verbose and tedious to maintain. Moreover, it's
not very helpful: If I'm trying to create a comment on a TOAST table,
which is not supported, then the information that I could have created
a comment on a materialized view is pointless.
Instead, write the primary error message shorter and saying more
directly that what was attempted is not possible. Then, in the detail
message, explain that the operation is not supported for the relkind
the object was. To simplify that, add a new function
errdetail_relkind_not_supported() that does this.
In passing, make use of RELKIND_HAS_STORAGE() where appropriate,
instead of listing out the relkinds individually.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/dc35a398-37d0-75ce-07ea-1dd71d98f8ec@2ndquadrant.com
Otherwise we risk "leaking" deleted pages by making them non-recyclable
indefinitely. Commit 6655a729 did the same thing for deleted pages in
GiST indexes. That work was used as a starting point here.
Stop storing an XID indicating the oldest bpto.xact across all deleted
though unrecycled pages in nbtree metapages. There is no longer any
reason to care about that condition/the oldest XID. It only ever made
sense when wraparound was something _bt_vacuum_needs_cleanup() had to
consider.
The btm_oldest_btpo_xact metapage field has been repurposed and renamed.
It is now btm_last_cleanup_num_delpages, which is used to remember how
many non-recycled deleted pages remain from the last VACUUM (in practice
its value is usually the precise number of pages that were _newly
deleted_ during the specific VACUUM operation that last set the field).
The general idea behind storing btm_last_cleanup_num_delpages is to use
it to give _some_ consideration to non-recycled deleted pages inside
_bt_vacuum_needs_cleanup() -- though never too much. We only really
need to avoid leaving a truly excessive number of deleted pages in an
unrecycled state forever. We only do this to cover certain narrow cases
where no other factor makes VACUUM do a full scan, and yet the index
continues to grow (and so actually misses out on recycling existing
deleted pages).
These metapage changes result in a clear user-visible benefit: We no
longer trigger full index scans during VACUUM operations solely due to
the presence of only 1 or 2 known deleted (though unrecycled) blocks
from a very large index. All that matters now is keeping the costs and
benefits in balance over time.
Fix an issue that has been around since commit 857f9c36, which added the
"skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup()
logic). The accuracy of btm_last_cleanup_num_heap_tuples accidentally
hinged upon _when_ the source value gets stored. We now always store
btm_last_cleanup_num_heap_tuples in btvacuumcleanup(). This fixes the
issue because IndexVacuumInfo.num_heap_tuples (the source field) is
expected to accurately indicate the state of the table _after_ the
VACUUM completes inside btvacuumcleanup().
A backpatchable fix cannot easily be extracted from this commit. A
targeted fix for the issue will follow in a later commit, though that
won't happen today.
I (pgeoghegan) have chosen to remove any mention of deleted pages in the
documentation of the vacuum_cleanup_index_scale_factor GUC/param, since
the presence of deleted (though unrecycled) pages is no longer of much
concern to users. The vacuum_cleanup_index_scale_factor description in
the docs now seems rather unclear in any case, and it should probably be
rewritten in the near future. Perhaps some passing mention of page
deletion will be added back at the same time.
Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
GistPageSetDeleted() sets pd_lower when deleting a page, and sets the
page contents to a GISTDeletedPageContents. Avoid treating deleted GiST
pages as regular slotted pages within pageinspect.
Oversight in commit 756ab291.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
By default VACUUM will skip pages that it can't immediately get
exclusive access to, which means that even activities as harmless
and unpredictable as checkpoint buffer writes might prevent a page
from being processed. Ordinarily this is no big deal, but we have
a small number of test cases that examine the results of VACUUM's
processing and therefore will fail if the page of interest is skipped.
This seems to be the explanation for some rare buildfarm failures.
To fix, add the DISABLE_PAGE_SKIPPING option to the VACUUM commands
in tests where this could be an issue.
In passing, remove a duplicated query in pageinspect/sql/page.sql.
Back-patch as necessary (some of these cases are as old as v10).
Discussion: https://postgr.es/m/413923.1611006484@sss.pgh.pa.us
Block numbers are 32-bit unsigned integers. Therefore, the smallest
SQL integer type that they can fit in is bigint. However, in the
pageinspect module, most input and output parameters dealing with
block numbers were declared as int. The behavior with block numbers
larger than a signed 32-bit integer was therefore dubious. Change
these arguments to type bigint and add some more explicit error
checking on the block range.
(Other contrib modules appear to do this correctly already.)
Since we are changing argument types of existing functions, in order
to not misbehave if the binary is updated before the extension is
updated, we need to create new C symbols for the entry points, similar
to how it's done in other extensions as well.
Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/d8f6bdd536df403b9b33816e9f7e0b9d@G08CNEXMBPEKD05.g08.fujitsu.local
Per Coverity. BuildIndexValueDescription() cannot actually return NULL in
this instance, because it only returns NULL if the user doesn't have the
required privileges, and this function can only be used by superuser. But
better safe than sorry.
The gist_page_items() function opened the index relation on first call and
closed it on the last call. But there's no guarantee that the function is
run to completion, leading to a relcache leak and warning at the end of
the transaction. To fix, refactor the function to return all the rows in
one call, as a tuplestore.
Reported-by: Tom Lane
Discussion: https://www.postgresql.org/message-id/234863.1610916631%40sss.pgh.pa.us
The newly-added gist pageinspect test prints the LSNs of GiST pages,
expecting them all to be 1 (GistBuildLSN). But with wal_level=minimal,
they got updated by the whole-relation WAL-logging at commit. Fix by
wrapping the problematic tests in the same transaction with the CREATE
INDEX.
Per buildfarm failure on thorntail.
Discussion: https://www.postgresql.org/message-id/3B4F97E5-40FB-4142-8CAA-B301CDFBF982%40iki.fi
1. The raw bytea representation of the point-type keys used in the test
depends on endianess. Remove the raw key_data column from the test.
2. The items stored on non-leftmost gist page depends on how many items
git on the other pages. This showed up as a failure on 32-bit i386
systems. To fix, only test the gist_page_items() function on the
leftmost leaf page.
Per Andrey Borodin and the buildfarm.
Discussion: https://www.postgresql.org/message-id/9FCEC1DC-86FB-4A57-88EF-DD13663B36AF%40yandex-team.ru
Since %c only passes a C "char" to printf, it's incapable of dealing
with multibyte characters. Passing just the first byte of such a
character leads to an output string that is visibly not correctly
encoded, resulting in undesirable behavior such as encoding conversion
failures while sending error messages to clients.
We've lived with this issue for a long time because it was inconvenient
to avoid in a portable fashion. However, now that we always use our own
snprintf code, it's reasonable to use the %.*s format to print just one
possibly-multibyte character in a string. (We previously avoided that
obvious-looking answer in order to work around glibc's bug #6530, cf
commits 54cd4f045 and ed437e2b2.)
Hence, run around and fix a bunch of places that used %c to report
a character found in a user-supplied string. For simplicity, I did
not touch places that were emitting non-user-facing debug messages,
or reporting catalog data that should always be ASCII. (It's also
unclear how useful this approach could be in frontend code, where
it's less certain that we know what encoding we're dealing with.)
In passing, improve a couple of poorly-written error messages in
pageinspect/heapfuncs.c.
This is a longstanding issue, but I'm hesitant to back-patch because
of the impact on translatable message strings. In any case this fix
would not work reliably before v12.
Tom Lane and Quan Zongliang
Discussion: https://postgr.es/m/a120087c-4c88-d9d4-1ec5-808d7a7f133d@gmail.com
We don't need to manually clean up allocations in a SRF's
multi_call_memory_ctx, because the SRF_RETURN_DONE infrastructure
takes care of that (and also ensures that it will happen even if the
function never gets a final call, which simple manual cleanup cannot
do).
Hence, the code removed by this patch is a waste of code and cycles.
Worse, it gives the impression that cleaning up manually is a thing,
which can lead to more serious errors such as those fixed in
commits 085b6b667 and b4570d33a. So we should get rid of it.
These are not quite actual bugs though, so I couldn't muster the
enthusiasm to back-patch. Fix in HEAD only.
Justin Pryzby
Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
The data types that contrib/pageinspect's bt_metap() function were
declared to return as OUT arguments were wrong in some cases. For
example, the oldest_xact column (a TransactionId/xid field) was declared
integer/int4 within the pageinspect extension's sql file. This led to
errors when an oldest_xact value that exceeded 2^31-1 was encountered.
Some of the other columns were defined incorrectly ever since
pageinspect was first introduced, though they were far less likely to
produce problems in practice.
Fix these issues by changing the declaration of bt_metap() to
consistently use data types that can reliably represent all possible
values. This fixes things on HEAD only. No backpatch, since it doesn't
seem like there is a safe way to fix the issue without including a new
version of the pageinspect extension (HEAD/Postgres 13 already
introduced a new version of the extension). Besides, the oldest_xact
issue has been around since the release of Postgres 11, and we haven't
heard any complaints about it before now.
Also, throw an error when we detect a bt_metap() declaration that must
be from an old version of the pageinspect extension by examining the
number of attributes from the tuple descriptor for the return tuples.
It seems better to throw an error in a reliable and obvious way
following a Postgres upgrade, rather than letting bt_metap() fail
unpredictably. The problem is fundamentally with the CREATE FUNCTION
declared data types themselves, so I see no sensible alternative.
Reported-By: Victor Yegorov
Bug: #16285
Discussion: https://postgr.es/m/16285-df8fc1000ab3d5fc@postgresql.org
Our usual practice for "poor man's enum" catalog columns is to define
macros for the possible values and use those, not literal constants,
in C code. But for some reason lost in the mists of time, this was
never done for typalign/attalign or typstorage/attstorage. It's never
too late to make it better though, so let's do that.
The reason I got interested in this right now is the need to duplicate
some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch.
But in general, this sort of change aids greppability and readability,
so it's a good idea even without any specific motivation.
I may have missed a few places that could be converted, and it's even
more likely that pending patches will re-introduce some hard-coded
references. But that's not fatal --- there's no expectation that
we'd actually change any of these values. We can clean up stragglers
over time.
Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
Add a new bt_metap() column to display the metapage's allequalimage
field. Also add three new columns to contrib/pageinspect's
bt_page_items() function:
* Add a boolean column ("dead") that displays the LP_DEAD bit value for
each non-pivot tuple.
* Add a TID column ("htid") that displays a single heap TID value for
each tuple. This is the TID that is returned by BTreeTupleGetHeapTID(),
so comparable values are shown for pivot tuples, plain non-pivot tuples,
and posting list tuples.
* Add a TID array column ("tids") that displays TIDs from each tuple's
posting list, if any. This works just like the "tids" column from
pageinspect's gin_leafpage_items() function.
No version bump for the pageinspect extension, since there hasn't been a
stable Postgres release since the last version bump (the last bump was
part of commit 58b4cb30).
Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzmSMmU2eNvY9+a4MNP+z02h6sa-uxZvN3un6jY02ZVBSw@mail.gmail.com
Andres Freund pointed out that allowing non-superusers to run
"CREATE EXTENSION ... FROM unpackaged" has security risks, since
the unpackaged-to-1.0 scripts don't try to verify that the existing
objects they're modifying are what they expect. Just attaching such
objects to an extension doesn't seem too dangerous, but some of them
do more than that.
We could have resolved this, perhaps, by still requiring superuser
privilege to use the FROM option. However, it's fair to ask just what
we're accomplishing by continuing to lug the unpackaged-to-1.0 scripts
forward. None of them have received any real testing since 9.1 days,
so they may not even work anymore (even assuming that one could still
load the previous "loose" object definitions into a v13 database).
And an installation that's trying to go from pre-9.1 to v13 or later
in one jump is going to have worse compatibility problems than whether
there's a trivial way to convert their contrib modules into extension
style.
Hence, let's just drop both those scripts and the core-code support
for "CREATE EXTENSION ... FROM".
Discussion: https://postgr.es/m/20200213233015.r6rnubcvl4egdh5r@alap3.anarazel.de
We used to strategically place newlines after some function call left
parentheses to make pgindent move the argument list a few chars to the
left, so that the whole line would fit under 80 chars. However,
pgindent no longer does that, so the newlines just made the code
vertically longer for no reason. Remove those newlines, and reflow some
of those lines for some extra naturality.
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.
By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.
Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
The basic rule we follow here is to always first include 'postgres.h' or
'postgres_fe.h' whichever is applicable, then system header includes and
then Postgres header includes. In this, we also follow that all the
Postgres header includes are in order based on their ASCII value. We
generally follow these rules, but the code has deviated in many places.
This commit makes it consistent just for contrib modules. The later
commits will enforce similar rules in other parts of code.
Author: Vignesh C
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
After more discussion, the new function added by ddbd5d8 could have been
designed in a better way. Based on an idea from Álvaro, instead of
returning one column which includes both the raw and combined flags, use
two columns, with one for the raw flags and one for the combined flags.
This also takes care of some issues with HEAP_LOCKED_UPGRADED and
HEAP_XMAX_IS_LOCKED_ONLY which are not really combined flags as they
depend on conditions defined by other raw bits, as mentioned by Amit.
While on it, fix an extra issue with combined flags. A combined flag
was returned if at least one of its bits was set, but all its bits need
to be set to include it in the result.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Amit Kapila
Discussion: https://postgr.es/m/20190913114950.GA3824@alvherre.pgsql
Flags of t_infomask and t_infomask2 for each tuple are already included
in the information returned by heap_page_items as integers, and we
lacked a way to make that information human-readable.
Per discussion, the function includes an option which controls if
combined flags should be decomposed or not. The default is false, to
not decompose combined flags.
The module is bumped to version 1.8.
Author: Craig Ringer, Sawada Masahiko
Reviewed-by: Peter Geoghegan, Robert Haas, Álvaro Herrera, Moon Insung,
Amit Kapila, Michael Paquier, Tomas Vondra
Discussion: https://postgr.es/m/CAMsr+YEY7jeaXOb+oX+RhDyOFuTMdmHjGsBxL=igCm03J0go9Q@mail.gmail.com
For most uses of acl.h the details of how "Acl" internally looks like
are irrelevant. It might make sense to move a lot of the
implementation details into a separate header at a later point.
The main motivation of this change is to avoid including fmgr.h (via
array.h, which needs it for exposed structs) in a lot of files that
otherwise don't need it. A subsequent commit will remove the fmgr.h
include from a lot of files.
Directly include utils/array.h and utils/expandeddatum.h from the
files that need them, but previously included them indirectly, via
acl.h.
Author: Andres Freund
Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
This feature was using a process local map to track the first few blocks
in the relation. The map was reset each time we get the block with enough
freespace. It was discussed that it would be better to track this map on
a per-relation basis in relcache and then invalidate the same whenever
vacuum frees up some space in the page or when FSM is created. The new
design would be better both in terms of API design and performance.
List of commits reverted, in reverse chronological order:
06c8a5090e Improve code comments in b0eaa4c51b.
13e8643bfc During pg_upgrade, conditionally skip transfer of FSMs.
6f918159a9 Add more tests for FSM.
9c32e4c350 Clear the local map when not used.
29d108cdec Update the documentation for FSM behavior..
08ecdfe7e5 Make FSM test portable.
b0eaa4c51b Avoid creation of the free space map for small heap relations.
Discussion: https://postgr.es/m/20190416180452.3pm6uegx54iitbt5@alap3.anarazel.de
Contrib modules pgrowlocks, pgstattuple and some functionality in
pageinspect currently only supports the heap table AM. As they are all
concerned with low-level details that aren't reasonably exposed via
tableam, error out if invoked on a non heap relation.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
Make nbtree treat all index tuples as having a heap TID attribute.
Index searches can distinguish duplicates by heap TID, since heap TID is
always guaranteed to be unique. This general approach has numerous
benefits for performance, and is prerequisite to teaching VACUUM to
perform "retail index tuple deletion".
Naively adding a new attribute to every pivot tuple has unacceptable
overhead (it bloats internal pages), so suffix truncation of pivot
tuples is added. This will usually truncate away the "extra" heap TID
attribute from pivot tuples during a leaf page split, and may also
truncate away additional user attributes. This can increase fan-out,
especially in a multi-column index. Truncation can only occur at the
attribute granularity, which isn't particularly effective, but works
well enough for now. A future patch may add support for truncating
"within" text attributes by generating truncated key values using new
opclass infrastructure.
Only new indexes (BTREE_VERSION 4 indexes) will have insertions that
treat heap TID as a tiebreaker attribute, or will have pivot tuples
undergo suffix truncation during a leaf page split (on-disk
compatibility with versions 2 and 3 is preserved). Upgrades to version
4 cannot be performed on-the-fly, unlike upgrades from version 2 to
version 3. contrib/amcheck continues to work with version 2 and 3
indexes, while also enforcing stricter invariants when verifying version
4 indexes. These stricter invariants are the same invariants described
by "3.1.12 Sequencing" from the Lehman and Yao paper.
A later patch will enhance the logic used by nbtree to pick a split
point. This patch is likely to negatively impact performance without
smarter choices around the precise point to split leaf pages at. Making
these two mostly-distinct sets of enhancements into distinct commits
seems like it might clarify their design, even though neither commit is
particularly useful on its own.
The maximum allowed size of new tuples is reduced by an amount equal to
the space required to store an extra MAXALIGN()'d TID in a new high key
during leaf page splits. The user-facing definition of the "1/3 of a
page" restriction is already imprecise, and so does not need to be
revised. However, there should be a compatibility note in the v12
release notes.
Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas, Alexander Korotkov
Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
In b0eaa4c51b, we allow FSM to be created only after 4 pages. One of the
tests check the FSM contents and to do that it populates many tuples in
the relation. The FSM contents depend on the availability of freespace in
the page and it could vary because of the alignment of tuples.
This commit removes the dependency on FSM contents.
Author: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KADF6K1bagr0--mGv3dMcZ%3DH_Z-Qtvdfbp5PjaC6PJJA%40mail.gmail.com
Previously, all heaps had FSMs. For very small tables, this means that the
FSM took up more space than the heap did. This is wasteful, so now we
refrain from creating the FSM for heaps with 4 pages or fewer. If the last
known target block has insufficient space, we still try to insert into some
other page before giving up and extending the relation, since doing
otherwise leads to table bloat. Testing showed that trying every page
penalized performance slightly, so we compromise and try every other page.
This way, we visit at most two pages. Any pages with wasted free space
become visible at next relation extension, so we still control table bloat.
As a bonus, directly attempting one or two pages can even be faster than
consulting the FSM would have been.
Once the FSM is created for a heap we don't remove it even if somebody
deletes all the rows from the corresponding relation. We don't think it is
a useful optimization as it is quite likely that relation will again grow
to the same size.
Author: John Naylor, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Mithun C Y
Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
Previously, all heaps had FSMs. For very small tables, this means that the
FSM took up more space than the heap did. This is wasteful, so now we
refrain from creating the FSM for heaps with 4 pages or fewer. If the last
known target block has insufficient space, we still try to insert into some
other page before giving up and extending the relation, since doing
otherwise leads to table bloat. Testing showed that trying every page
penalized performance slightly, so we compromise and try every other page.
This way, we visit at most two pages. Any pages with wasted free space
become visible at next relation extension, so we still control table bloat.
As a bonus, directly attempting one or two pages can even be faster than
consulting the FSM would have been.
Once the FSM is created for a heap we don't remove it even if somebody
deletes all the rows from the corresponding relation. We don't think it is
a useful optimization as it is quite likely that relation will again grow
to the same size.
Author: John Naylor with design inputs and some code contribution by Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Mithun C Y
Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
heapam.h previously was included in a number of widely used
headers (e.g. execnodes.h, indirectly in executor.h, ...). That's
problematic on its own, as heapam.h contains a lot of low-level
details that don't need to be exposed that widely, but becomes more
problematic with the upcoming introduction of pluggable table storage
- it seems inappropriate for heapam.h to be included that widely
afterwards.
heapam.h was largely only included in other headers to get the
HeapScanDesc typedef (which was defined in heapam.h, even though
HeapScanDescData is defined in relscan.h). The better solution here
seems to be to just use the underlying struct (forward declared where
necessary). Similar for BulkInsertState.
Another problem was that LockTupleMode was used in executor.h - parts
of the file tried to cope without heapam.h, but due to the fact that
it indirectly included it, several subsequent violations of that goal
were not not noticed. We could just reuse the approach of declaring
parameters as int, but it seems nicer to move LockTupleMode to
lockoptions.h - that's not a perfect location, but also doesn't seem
bad.
As a number of files relied on implicitly included heapam.h, a
significant number of files grew an explicit include. It's quite
probably that a few external projects will need to do the same.
Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
contrib/pageinspect's tuple_data_split() function thought it could get
away with opening the referenced relation with NoLock. In practice
there's no guarantee that the current session holds any lock on that
rel (even if we just read a page from it), so that this is unsafe.
Switch to using AccessShareLock. Also, postpone closing the relation,
so that we needn't copy its tupdesc. Also, fix unsafe use of
att_isnull() for attributes past the end of the tuple.
Per testing with a patch that complains if we open a relation without
holding any lock on it. I don't plan to back-patch that patch, but we
should close the holes it identifies in all supported branches.
Discussion: https://postgr.es/m/2038.1538335244@sss.pgh.pa.us
The "l" (ell) width spec means something in the corresponding scanf usage,
but not here. While modern POSIX says that applying "l" to "f" and other
floating format specs is a no-op, SUSv2 says it's undefined. Buildfarm
experience says that some old compilers emit warnings about it, and at
least one old stdio implementation (mingw's "ANSI" option) actually
produces wrong answers and/or crashes.
Discussion: https://postgr.es/m/21670.1526769114@sss.pgh.pa.us
Discussion: https://postgr.es/m/c085e1da-0d64-1c15-242d-c921f32e0d5c@dunslane.net
Commit 8b08f7d482 failed to update these modules to at least give
non-broken error messages for partitioned indexes. Add appropriate
error support to them.
Peter G. was complaining about a problem of unfriendly error messages;
while we haven't fixed that yet, subsequent discussion let to discovery
of these unhandled cases.
Author: Michaël Paquier
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzkOKptQiE51Bh4_xeEHhaBwHkZkGtKizrFMgEkfUuRRQg@mail.gmail.com
Recent gcc can warn about switch-case fall throughs that are not
explicitly labeled as intentional. This seems like a good thing,
so clean up the warnings exposed thereby by labeling all such
cases with comments that gcc will recognize.
In files that already had one or more suitable comments, I generally
matched the existing style of those. Otherwise I went with
/* FALLTHROUGH */, which is one of the spellings approved at the
more-restrictive-than-default level -Wimplicit-fallthrough=4.
(At the default level you can also spell it /* FALL ?THRU */,
and it's not picky about case. What you can't do is include
additional text in the same comment, so some existing comments
containing versions of this aren't good enough.)
Testing with gcc 8.0.1 (Fedora 28's current version), I found that
I also had to put explicit "break"s after elog(ERROR) or ereport(ERROR);
apparently, for this purpose gcc doesn't recognize that those don't
return. That seems like possibly a gcc bug, but it's fine because
in most places we did that anyway; so this amounts to a visit from the
style police.
Discussion: https://postgr.es/m/15083.1525207729@sss.pgh.pa.us