Update assorted TOAST-related documentation.

While working on documentation for expanded arrays, I noticed a number of details in the TOAST-related documentation that were already inaccurate or obsolete. This should be fixed independently of whether expanded arrays get in or not. One issue is that the already existing indirect-pointer facility was not documented at all. Also, the documentation says that you only need to use VARSIZE/SET_VARSIZE if you've made your variable-length type TOAST-aware, but actually we've forced that business on all varlena types even if they've opted out of TOAST by setting storage = plain. Wordsmith a few other things too, like an amusingly archaic claim that there are few 64-bit machines. I thought about back-patching this, but since all this doco is oriented to hackers and C-coded extension authors, fixing it in HEAD is probably good enough.
2024-12-15 08:20:16 +08:00 · 2015-02-18 22:33:39 -05:00 · 2015-02-18 22:33:39 -05:00 · 9bb955c828
commit 9bb955c828
parent 56a79a869b
3 changed files with 158 additions and 64 deletions
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -329,15 +329,17 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
   to <literal>VARIABLE</literal>.  (Internally, this is represented
   by setting <literal>typlen</> to -1.)  The internal representation of all
   variable-length types must start with a 4-byte integer giving the total
-   length of this value of the type.
+   length of this value of the type.  (Note that the length field is often
+   encoded, as described in <xref linkend="storage-toast">; it's unwise
+   to access it directly.)
  </para>

  <para>
   The optional flag <literal>PASSEDBYVALUE</literal> indicates that
   values of this data type are passed by value, rather than by
-   reference.  You cannot pass by value types whose internal
-   representation is larger than the size of the <type>Datum</> type
-   (4 bytes on most machines, 8 bytes on a few).
+   reference.  Types passed by value must be fixed-length, and their internal
+   representation cannot be larger than the size of the <type>Datum</> type
+   (4 bytes on some machines, 8 bytes on others).
  </para>

  <para>
@@ -367,6 +369,17 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
   <literal>external</literal> items.)
  </para>

+  <para>
+   All <replaceable class="parameter">storage</replaceable> values other
+   than <literal>plain</literal> imply that the functions of the data type
+   can handle values that have been <firstterm>toasted</>, as described
+   in <xref linkend="storage-toast"> and <xref linkend="xtypes-toast">.
+   The specific other value given merely determines the default TOAST
+   storage strategy for columns of a toastable data type; users can pick
+   other strategies for individual columns using <literal>ALTER TABLE
+   SET STORAGE</>.
+  </para>
+
  <para>
   The <replaceable class="parameter">like_type</replaceable> parameter
   provides an alternative method for specifying the basic representation
@@ -465,8 +478,8 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
    identical things, and you want to allow these things to be accessed
    directly by subscripting, in addition to whatever operations you plan
    to provide for the type as a whole.  For example, type <type>point</>
-    is represented as just two floating-point numbers, each can be accessed using
-    <literal>point[0]</> and <literal>point[1]</>.
+    is represented as just two floating-point numbers, which can be accessed
+    using <literal>point[0]</> and <literal>point[1]</>.
    Note that
    this facility only works for fixed-length types whose internal form
    is exactly a sequence of identical fixed-length fields.  A subscriptable
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -309,19 +309,27 @@ this limitation, large  field values are compressed and/or broken up into
 multiple physical rows.  This happens transparently to the user, with only
 small impact on most of the backend code.  The technique is affectionately
 known as <acronym>TOAST</> (or <quote>the best thing since sliced bread</>).
+The <acronym>TOAST</> infrastructure is also used to improve handling of
+large data values in memory.
 </para>

 <para>
 Only certain data types support <acronym>TOAST</> &mdash; there is no need to
 impose the overhead on data types that cannot produce large field values.
 To support <acronym>TOAST</>, a data type must have a variable-length
-(<firstterm>varlena</>) representation, in which the first 32-bit word of any
-stored value contains the total length of the value in bytes (including
-itself).  <acronym>TOAST</> does not constrain the rest of the representation.
-All the C-level functions supporting a <acronym>TOAST</>-able data type must
-be careful to handle <acronym>TOAST</>ed input values.  (This is normally done
-by invoking <function>PG_DETOAST_DATUM</> before doing anything with an input
-value, but in some cases more efficient approaches are possible.)
+(<firstterm>varlena</>) representation, in which, ordinarily, the first
+four-byte word of any stored value contains the total length of the value in
+bytes (including itself).  <acronym>TOAST</> does not constrain the rest
+of the data type's representation.  The special representations collectively
+called <firstterm><acronym>TOAST</>ed values</firstterm> work by modifying or
+reinterpreting this initial length word.  Therefore, the C-level functions
+supporting a <acronym>TOAST</>-able data type must be careful about how they
+handle potentially <acronym>TOAST</>ed input values: an input might not
+actually consist of a four-byte length word and contents until after it's
+been <firstterm>detoasted</>.  (This is normally done by invoking
+<function>PG_DETOAST_DATUM</> before doing anything with an input value,
+but in some cases more efficient approaches are possible.
+See <xref linkend="xtypes-toast"> for more detail.)
 </para>

 <para>
@@ -333,58 +341,84 @@ the value is an ordinary un-<acronym>TOAST</>ed value of the data type, and
 the remaining bits of the length word give the total datum size (including
 length word) in bytes.  When the highest-order or lowest-order bit is set,
 the value has only a single-byte header instead of the normal four-byte
-header, and the remaining bits give the total datum size (including length
-byte) in bytes.  As a special case, if the remaining bits are all zero
-(which would be impossible for a self-inclusive length), the value is a
-pointer to out-of-line data stored in a separate TOAST table.  (The size of
-a TOAST pointer is given in the second byte of the datum.)
-Values with single-byte headers aren't aligned on any particular
-boundary, either.  Lastly, when the highest-order or lowest-order bit is
-clear but the adjacent bit is set, the content of the datum has been
-compressed and must be decompressed before use.  In this case the remaining
-bits of the length word give the total size of the compressed datum, not the
+header, and the remaining bits of that byte give the total datum size
+(including length byte) in bytes.  This alternative supports space-efficient
+storage of values shorter than 127 bytes, while still allowing the data type
+to grow to 1 GB at need.  Values with single-byte headers aren't aligned on
+any particular boundary, whereas values with four-byte headers are aligned on
+at least a four-byte boundary; this omission of alignment padding provides
+additional space savings that are significant compared to short values.
+As a special case, if the remaining bits of a single-byte header are all
+zero (which would be impossible for a self-inclusive length), the value is
+a pointer to out-of-line data, with several possible alternatives as
+described below.  The type and size of such a <firstterm>TOAST pointer</>
+are determined by a code stored in the second byte of the datum.
+Lastly, when the highest-order or lowest-order bit is clear but the adjacent
+bit is set, the content of the datum has been compressed and must be
+decompressed before use.  In this case the remaining bits of the four-byte
+length word give the total size of the compressed datum, not the
 original data.  Note that compression is also possible for out-of-line data
 but the varlena header does not tell whether it has occurred &mdash;
-the content of the TOAST pointer tells that, instead.
+the content of the <acronym>TOAST</> pointer tells that, instead.
 </para>
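To make the bit layout above concrete, here is a minimal standalone C sketch that classifies a varlena header. It assumes the little-endian layout (flag bits in the lowest-order bits of the first byte); on big-endian machines the flags sit in the highest-order bits instead. The enum and function names are hypothetical; real extension code should rely on the macros in PostgreSQL's postgres.h, such as VARSIZE_ANY, rather than decoding headers by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header classification (names are illustrative only). */
typedef enum
{
    HDR_4B_PLAIN,       /* four-byte header, uncompressed data */
    HDR_4B_COMPRESSED,  /* four-byte header, compressed in-line data */
    HDR_1B_SHORT,       /* single-byte header, short uncompressed value */
    HDR_TOAST_POINTER   /* single-byte header with all-zero length bits */
} HeaderKind;

/* Store a 32-bit word in little-endian byte order. */
static void write_le32(uint8_t *p, uint32_t word)
{
    p[0] = (uint8_t) word;
    p[1] = (uint8_t) (word >> 8);
    p[2] = (uint8_t) (word >> 16);
    p[3] = (uint8_t) (word >> 24);
}

/* Read a 32-bit word stored in little-endian byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Build a single-byte header for a value of total_len bytes (header
 * byte included): length in the upper 7 bits, lowest bit set. */
static uint8_t make_short_header(uint32_t total_len)
{
    assert(total_len >= 1 && total_len <= 127);
    return (uint8_t) ((total_len << 1) | 0x01);
}

/* Classify a datum. *size receives the total size (header included)
 * when the header encodes one, else 0; for a TOAST pointer, *tag
 * receives the type code stored in the second byte. */
static HeaderKind classify(const uint8_t *datum, uint32_t *size, uint8_t *tag)
{
    uint8_t first = datum[0];

    *size = 0;
    *tag = 0;
    if (first & 0x01)                 /* lowest bit set: one-byte header */
    {
        uint32_t len = first >> 1;    /* remaining 7 bits of that byte */

        if (len == 0)                 /* impossible self-inclusive length */
        {
            *tag = datum[1];
            return HDR_TOAST_POINTER;
        }
        *size = len;
        return HDR_1B_SHORT;
    }
    *size = read_le32(datum) >> 2;    /* remaining 30 bits of the word */
    return (first & 0x02) ? HDR_4B_COMPRESSED : HDR_4B_PLAIN;
}
```

For example, a first byte of 0x09 (binary 00001001) has its lowest bit set, so it denotes a short value whose total size, header byte included, is 4 bytes.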

+<para>
+As mentioned, there are multiple types of <acronym>TOAST</> pointer datums.
+The oldest and most common type is a pointer to out-of-line data stored in
+a <firstterm><acronym>TOAST</> table</firstterm> that is separate from, but
+associated with, the table containing the <acronym>TOAST</> pointer datum
+itself.  These <firstterm>on-disk</> pointer datums are created by the
+<acronym>TOAST</> management code (in <filename>access/heap/tuptoaster.c</>)
+when a tuple to be stored on disk is too large to be stored as-is.
+Further details appear in <xref linkend="storage-toast-ondisk">.
+Alternatively, a <acronym>TOAST</> pointer datum can contain a pointer to
+out-of-line data that appears elsewhere in memory.  Such datums are
+necessarily short-lived, and will never appear on disk, but they are very
+useful for avoiding copying and redundant processing of large data values.
+Further details appear in <xref linkend="storage-toast-inmemory">.
+</para>
+
+<para>
+The compression technique used for either in-line or out-of-line compressed
+data is a fairly simple and very fast member
+of the LZ family of compression techniques.  See
+<filename>src/common/pg_lzcompress.c</> for the details.
+</para>
+
+<sect2 id="storage-toast-ondisk">
+ <title>Out-of-line, on-disk TOAST storage</title>
+
 <para>
 If any of the columns of a table are <acronym>TOAST</>-able, the table will
 have an associated <acronym>TOAST</> table, whose OID is stored in the table's
-<structname>pg_class</>.<structfield>reltoastrelid</> entry.  Out-of-line
+<structname>pg_class</>.<structfield>reltoastrelid</> entry.  On-disk
 <acronym>TOAST</>ed values are kept in the <acronym>TOAST</> table, as
 described in more detail below.
 </para>

-<para>
-The compression technique used is a fairly simple and very fast member
-of the LZ family of compression techniques.  See
-<filename>src/common/pg_lzcompress.c</> for the details.
-</para>
-
 <para>
 Out-of-line values are divided (after compression if used) into chunks of at
 most <symbol>TOAST_MAX_CHUNK_SIZE</> bytes (by default this value is chosen
 so that four chunk rows will fit on a page, making it about 2000 bytes).
-Each chunk is stored
-as a separate row in the <acronym>TOAST</> table for the owning table.  Every
+Each chunk is stored as a separate row in the <acronym>TOAST</> table
+belonging to the owning table.  Every
 <acronym>TOAST</> table has the columns <structfield>chunk_id</> (an OID
 identifying the particular <acronym>TOAST</>ed value),
 <structfield>chunk_seq</> (a sequence number for the chunk within its value),
 and <structfield>chunk_data</> (the actual data of the chunk).  A unique index
 on <structfield>chunk_id</> and <structfield>chunk_seq</> provides fast
-retrieval of the values.  A pointer datum representing an out-of-line
+retrieval of the values.  A pointer datum representing an out-of-line on-disk
 <acronym>TOAST</>ed value therefore needs to store the OID of the
 <acronym>TOAST</> table in which to look and the OID of the specific value
 (its <structfield>chunk_id</>).  For convenience, pointer datums also store the
-logical datum size (original uncompressed data length) and actual stored size
+logical datum size (original uncompressed data length) and physical stored size
 (different if compression was applied).  Allowing for the varlena header bytes,
-the total size of a <acronym>TOAST</> pointer datum is therefore 18 bytes
-regardless of the actual size of the represented value.
+the total size of an on-disk <acronym>TOAST</> pointer datum is therefore 18
+bytes regardless of the actual size of the represented value.
 </para>
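The 18-byte figure follows from simple arithmetic: two header bytes (the zero-length single-byte header and the type code) plus four 4-byte fields. The sketch below, assuming 4-byte OIDs and no struct padding, spells that out and also counts the chunk rows a value needs. The struct and function names are illustrative rather than PostgreSQL's own, and the chunk size is passed in as a parameter because the real TOAST_MAX_CHUNK_SIZE is derived from the build's page size.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;   /* OIDs are unsigned 32-bit integers */

/* Illustrative payload of an on-disk TOAST pointer, holding the four
 * values the text lists. */
typedef struct
{
    int32_t rawsize;     /* logical (uncompressed) size */
    int32_t extsize;     /* physical stored size, smaller if compressed */
    Oid     valueid;     /* chunk_id of the value in the TOAST table */
    Oid     toastrelid;  /* OID of the TOAST table to look in */
} OnDiskPointerPayload;

/* Single header byte (all-zero length bits) plus the type-code byte. */
enum { POINTER_HEADER_BYTES = 2 };

/* Total size of the pointer datum; independent of the value's size. */
static uint32_t pointer_datum_size(void)
{
    return POINTER_HEADER_BYTES + (uint32_t) sizeof(OnDiskPointerPayload);
}

/* Chunk rows needed for extsize stored bytes when each chunk holds at
 * most max_chunk bytes (ceiling division). */
static int32_t chunk_count(int32_t extsize, int32_t max_chunk)
{
    assert(extsize >= 0 && max_chunk > 0);
    return (extsize + max_chunk - 1) / max_chunk;
}
```

With a chunk size of 2000 bytes, a value whose stored size is 5000 bytes needs three chunk rows, the last one only partly filled, while its pointer datum stays at 18 bytes.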

 <para>
-The <acronym>TOAST</> code is triggered only
+The <acronym>TOAST</> management code is triggered only
 when a row value to be stored in a table is wider than
 <symbol>TOAST_TUPLE_THRESHOLD</> bytes (normally 2 kB).
 The <acronym>TOAST</> code will compress and/or move
@@ -397,8 +431,8 @@ none of the out-of-line values change.
 </para>

 <para>
-The <acronym>TOAST</> code recognizes four different strategies for storing
-<acronym>TOAST</>-able columns:
+The <acronym>TOAST</> management code recognizes four different strategies
+for storing <acronym>TOAST</>-able columns on disk:

   <itemizedlist>
    <listitem>
@@ -460,6 +494,41 @@ pages). There was no run time difference compared to an un-<acronym>TOAST</>ed
 comparison table, in which all the HTML pages were cut down to 7 kB to fit.
 </para>

+</sect2>
+
+<sect2 id="storage-toast-inmemory">
+ <title>Out-of-line, in-memory TOAST storage</title>
+
+<para>
+<acronym>TOAST</> pointers can point to data that is not on disk, but is
+elsewhere in the memory of the current server process.  Such pointers
+obviously cannot be long-lived, but they are nonetheless useful.  There
+is currently just one sub-case:
+pointers to <firstterm>indirect</> data.
+</para>
+
+<para>
+Indirect <acronym>TOAST</> pointers simply point at a non-indirect varlena
+value stored somewhere in memory.  This case was originally created merely
+as a proof of concept, but it is currently used during logical decoding to
+avoid possibly having to create physical tuples exceeding 1 GB (as pulling
+all out-of-line field values into the tuple might do).  The case is of
+limited use, since the creator of the pointer datum is entirely responsible
+for ensuring that the referenced data survives for as long as the pointer
+could exist, and there is no infrastructure to help with this.
+</para>
+
+<para>
+For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</>
+management code ensures that no such pointer datum can accidentally get
+stored on disk.  In-memory <acronym>TOAST</> pointers are automatically
+expanded to normal in-line varlena values before storage &mdash; and then
+possibly converted to on-disk <acronym>TOAST</> pointers, if the containing
+tuple would otherwise be too big.
+</para>
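The following sketch illustrates why in-memory pointers must be flattened before storage: the datum carries a raw memory address, which is meaningless once written to disk. It assumes the little-endian single-byte header layout and a target value with a plain four-byte header; the type code HYPO_TAG_INDIRECT and all function names are hypothetical and do not reproduce PostgreSQL's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type code for an indirect pointer; not from PostgreSQL. */
#define HYPO_TAG_INDIRECT 1

/* Total size of a plain four-byte-header varlena, assuming the
 * little-endian layout (length in the upper 30 bits, flag bits 00). */
static uint32_t plain_size(const uint8_t *v)
{
    uint32_t word = (uint32_t) v[0] | ((uint32_t) v[1] << 8) |
                    ((uint32_t) v[2] << 16) | ((uint32_t) v[3] << 24);

    assert((word & 0x03) == 0);   /* must be an uncompressed 4-byte header */
    return word >> 2;
}

/* Build an indirect pointer datum: a header byte with all-zero length
 * bits, a type-code byte, then the address of the target value.
 * The caller alone must keep 'target' alive while the pointer exists. */
static uint8_t *make_indirect(const uint8_t *target)
{
    uint8_t *d = malloc(2 + sizeof target);

    if (d == NULL)
        return NULL;
    d[0] = 0x01;
    d[1] = HYPO_TAG_INDIRECT;
    memcpy(d + 2, &target, sizeof target);
    return d;
}

/* Prepare a datum for storage: an indirect pointer is replaced by a
 * private copy of the value it references (returned via *copy, which
 * the caller frees), so no memory address can reach disk. Any other
 * datum is returned unchanged and *copy is set to NULL. */
static const uint8_t *flatten_for_storage(const uint8_t *datum, uint8_t **copy)
{
    *copy = NULL;
    if (datum[0] == 0x01 && datum[1] == HYPO_TAG_INDIRECT)
    {
        const uint8_t *target;
        uint32_t       size;

        memcpy(&target, datum + 2, sizeof target);
        size = plain_size(target);
        *copy = malloc(size);
        if (*copy == NULL)
            return NULL;
        memcpy(*copy, target, size);
        return *copy;
    }
    return datum;
}
```

Note that make_indirect does nothing to keep the target alive; as the text says, that responsibility rests entirely with whoever creates the pointer.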
+
+</sect2>
+
 </sect1>

 <sect1 id="storage-fsm">
--- a/doc/src/sgml/xtypes.sgml
+++ b/doc/src/sgml/xtypes.sgml
@@ -234,35 +234,49 @@ CREATE TYPE complex (
 </para>

 <para>
+  If the internal representation of the data type is variable-length, the
+  internal representation must follow the standard layout for variable-length
+  data: the first four bytes must be a <type>char[4]</type> field which is
+  never accessed directly (customarily named <structfield>vl_len_</>). You
+  must use the <function>SET_VARSIZE()</function> macro to store the total
+  size of the datum (including the length field itself) in this field
+  and <function>VARSIZE()</function> to retrieve it.  (These macros exist
+  because the length field may be encoded depending on the platform.)
+ </para>
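A minimal sketch of a type following this layout is below. The helpers my_set_varsize and my_varsize are hypothetical stand-ins that mimic the four-byte-header encoding on a little-endian host (total size shifted left two bits); they exist only to show why the field must not be read or written as a plain integer. Real extension code must use SET_VARSIZE() and VARSIZE(), and would allocate with palloc0() rather than calloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length type using the standard layout. */
typedef struct
{
    char    vl_len_[4];   /* length word: never accessed directly */
    int32_t count;        /* number of elements that follow */
    double  values[];     /* flexible array member (C99) */
} MyVector;

/* Stand-in for SET_VARSIZE(), assuming a little-endian host and the
 * four-byte header encoding: total length shifted left two bits. */
static void my_set_varsize(void *ptr, uint32_t len)
{
    uint32_t word;

    assert(len < (UINT32_C(1) << 30));    /* 30 bits available for length */
    word = len << 2;
    memcpy(ptr, &word, sizeof word);
}

/* Stand-in for VARSIZE(): undo the shift to recover the total length. */
static uint32_t my_varsize(const void *ptr)
{
    uint32_t word;

    memcpy(&word, ptr, sizeof word);
    return word >> 2;
}

/* Allocate a MyVector with n elements; the recorded size covers the
 * whole datum, including the length field itself. */
static MyVector *make_vector(int32_t n)
{
    size_t    total = offsetof(MyVector, values) + (size_t) n * sizeof(double);
    MyVector *v = calloc(1, total);

    if (v == NULL)
        return NULL;
    my_set_varsize(v, (uint32_t) total);
    v->count = n;
    return v;
}
```

Reading vl_len_ directly as an int32 would return four times the real size under this encoding, which is exactly the mistake the macros are there to prevent.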
+
+ <para>
+  For further details see the description of the
+  <xref linkend="sql-createtype"> command.
+ </para>
+
+ <sect2 id="xtypes-toast">
+  <title>TOAST Considerations</title>
   <indexterm>
    <primary>TOAST</primary>
    <secondary>and user-defined types</secondary>
   </indexterm>
-  If the values of your data type vary in size (in internal form), you should
-  make the data type <acronym>TOAST</>-able (see <xref
-  linkend="storage-toast">). You should do this even if the data are always
+
+ <para>
+  If the values of your data type vary in size (in internal form), it's
+  usually desirable to make the data type <acronym>TOAST</>-able (see <xref
+  linkend="storage-toast">). You should do this even if the values are always
  too small to be compressed or stored externally, because
  <acronym>TOAST</> can save space on small data too, by reducing header
  overhead.
 </para>

 <para>
-  To do this, the internal representation must follow the standard layout for
-  variable-length data: the first four bytes must be a <type>char[4]</type>
-  field which is never accessed directly (customarily named
-  <structfield>vl_len_</>). You
-  must use <function>SET_VARSIZE()</function> to store the size of the datum
-  in this field and <function>VARSIZE()</function> to retrieve it. The C
-  functions operating on the data type must always be careful to unpack any
-  toasted values they are handed, by using <function>PG_DETOAST_DATUM</>.
-  (This detail is customarily hidden by defining type-specific
-  <function>GETARG_DATATYPE_P</function> macros.) Then, when running the
-  <command>CREATE TYPE</command> command, specify the internal length as
-  <literal>variable</> and select the appropriate storage option.
+  To support <acronym>TOAST</> storage, the C functions operating on the data
+  type must always be careful to unpack any toasted values they are handed
+  by using <function>PG_DETOAST_DATUM</>.  (This detail is customarily hidden
+  by defining type-specific <function>GETARG_DATATYPE_P</function> macros.)
+  Then, when running the <command>CREATE TYPE</command> command, specify the
+  internal length as <literal>variable</> and select some appropriate storage
+  option other than <literal>plain</>.
 </para>

 <para>
-  If the alignment is unimportant (either just for a specific function or
+  If data alignment is unimportant (either just for a specific function or
  because the data type specifies byte alignment anyway) then it's possible
  to avoid some of the overhead of <function>PG_DETOAST_DATUM</>. You can use
  <function>PG_DETOAST_DATUM_PACKED</> instead (customarily hidden by
@@ -286,8 +300,6 @@ CREATE TYPE complex (
  </para>
 </note>

- <para>
-  For further details see the description of the
-  <xref linkend="sql-createtype"> command.
- </para>
+ </sect2>
+
 </sect1>