mirror of
https://git.postgresql.org/git/postgresql.git
synced 2024-12-09 08:10:09 +08:00
Peter Eisentraut wrote:
> So I would base this discussion on the premise "bytea stores binary data"
> (insert examples).
>
> Some stylistic issues:
>
> bytea => <type>bytea</type>
>
> NULLs => zero bytes/bytes of value zero ("NULL" is too overloaded)
>
> 'non-printable' => <quote>nonprintable</quote>
>
> MUST => <emphasis>must</emphasis>
>

Here's a patch against *CVS tip* to address Peter's comments. Please let me
know what you think!

Joe Conway
parent
76c879cd9a
commit
8e6467fff3
@ -1,5 +1,5 @@
|
|||||||
<!--
|
<!--
|
||||||
$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.72 2001/11/20 15:42:44 momjian Exp $
|
$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.73 2001/11/21 03:17:22 momjian Exp $
|
||||||
-->
|
-->
|
||||||
|
|
||||||
<chapter id="datatype">
|
<chapter id="datatype">
|
||||||
@ -984,7 +984,7 @@ SELECT b, char_length(b) FROM test2;
|
|||||||
<tbody>
|
<tbody>
|
||||||
<row>
|
<row>
|
||||||
<entry>bytea</entry>
|
<entry>bytea</entry>
|
||||||
<entry>4 bytes plus the actual string</entry>
|
<entry>4 bytes plus the actual binary string</entry>
|
||||||
<entry>Variable (not specifically limited)
|
<entry>Variable (not specifically limited)
|
||||||
length binary string</entry>
|
length binary string</entry>
|
||||||
</row>
|
</row>
|
||||||
@ -994,29 +994,28 @@ SELECT b, char_length(b) FROM test2;
|
|||||||
|
|
||||||
<para>
|
<para>
|
||||||
A binary string is a sequence of octets that does not have either a
|
A binary string is a sequence of octets that does not have either a
|
||||||
character set or collation associated with it. Bytea specifically
|
character set or collation associated with it. <type>Bytea</type>
|
||||||
allows storage of NULLs and other 'non-printable' <acronym>ASCII
|
specifically allows storing octets of zero value and other
|
||||||
</acronym> characters.
|
<quote>non-printable</quote> octets.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Certain <acronym>ASCII</acronym> characters MUST be escaped (but all
|
Octets of certain values <emphasis>must</emphasis> be escaped (but all
|
||||||
characters MAY be escaped) when used as part of a string literal in an
|
octet values <emphasis>may</emphasis> be escaped) when used as part of
|
||||||
<acronym>SQL</acronym> statement. In general, to escape a character, it
|
a string literal in an <acronym>SQL</acronym> statement. In general,
|
||||||
is converted into the three digit octal number equal to the decimal
|
to escape an octet, it is converted into the three digit octal number
|
||||||
<acronym>ASCII</acronym> value, and preceeded by two backslashes. The
|
equivalent of its decimal octet value, and preceeded by two
|
||||||
single quote (') and backslash (\) characters have special alternate
|
backslashes. Octets with the decimal values 39 (single quote), and 92
|
||||||
escape sequences. Details are in
|
(backslash), have special alternate escape sequences. Details are in
|
||||||
<xref linkend="datatype-binary-sqlesc">.
|
<xref linkend="datatype-binary-sqlesc">.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<table id="datatype-binary-sqlesc">
|
<table id="datatype-binary-sqlesc">
|
||||||
<title><acronym>SQL</acronym> Literal Escaped <acronym>ASCII</acronym>
|
<title><acronym>SQL</acronym> Literal Escaped Octets</title>
|
||||||
Characters</title>
|
|
||||||
<tgroup cols="5">
|
<tgroup cols="5">
|
||||||
<thead>
|
<thead>
|
||||||
<row>
|
<row>
|
||||||
<entry>Decimal <acronym>ASCII</acronym> Value</entry>
|
<entry>Decimal Octet Value</entry>
|
||||||
<entry>Description</entry>
|
<entry>Description</entry>
|
||||||
<entry>Input Escaped Representation</entry>
|
<entry>Input Escaped Representation</entry>
|
||||||
<entry>Example</entry>
|
<entry>Example</entry>
|
||||||
@ -1027,7 +1026,7 @@ SELECT b, char_length(b) FROM test2;
|
|||||||
<tbody>
|
<tbody>
|
||||||
<row>
|
<row>
|
||||||
<entry> <literal> 0 </literal> </entry>
|
<entry> <literal> 0 </literal> </entry>
|
||||||
<entry> null byte </entry>
|
<entry> zero octet </entry>
|
||||||
<entry> <literal> '\\000' </literal> </entry>
|
<entry> <literal> '\\000' </literal> </entry>
|
||||||
<entry> <literal> select '\\000'::bytea; </literal> </entry>
|
<entry> <literal> select '\\000'::bytea; </literal> </entry>
|
||||||
<entry> <literal> \000 </literal></entry>
|
<entry> <literal> \000 </literal></entry>
|
||||||
@ -1055,24 +1054,23 @@ SELECT b, char_length(b) FROM test2;
|
|||||||
|
|
||||||
<para>
|
<para>
|
||||||
Note that the result in each of the examples above was exactly one
|
Note that the result in each of the examples above was exactly one
|
||||||
byte in length, even though the output representation of the null byte
|
octet in length, even though the output representation of the zero
|
||||||
and backslash are more than one character. Bytea output characters
|
octet and backslash are more than one character. <type>Bytea</type>
|
||||||
are also escaped. In general, each "non-printable" character is
|
output octets are also escaped. In general, each
|
||||||
converted into the three digit octal number equal to its decimal
|
<quote>non-printable</quote> octet decimal value is converted into
|
||||||
<acronym>ASCII</acronym> value, and preceeded by one backslash. Most
|
its equivalent three digit octal value, and preceeded by one backslash.
|
||||||
"printable" characters are represented by their standard
|
Most <quote>printable</quote> octets are represented by their standard
|
||||||
<acronym>ASCII</acronym> representation. The backslash (\) character
|
representation in the client character set. The octet with decimal
|
||||||
has a special alternate output representation. Details are in
|
value 92 (backslash) has a special alternate output representation.
|
||||||
<xref linkend="datatype-binary-resesc">.
|
Details are in <xref linkend="datatype-binary-resesc">.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<table id="datatype-binary-resesc">
|
<table id="datatype-binary-resesc">
|
||||||
<title><acronym>SQL</acronym> Output Escaped <acronym>ASCII</acronym>
|
<title><acronym>SQL</acronym> Output Escaped Octets</title>
|
||||||
Characters</title>
|
|
||||||
<tgroup cols="5">
|
<tgroup cols="5">
|
||||||
<thead>
|
<thead>
|
||||||
<row>
|
<row>
|
||||||
<entry>Decimal <acronym>ASCII</acronym> Value</entry>
|
<entry>Decimal Octet Value</entry>
|
||||||
<entry>Description</entry>
|
<entry>Description</entry>
|
||||||
<entry>Output Escaped Representation</entry>
|
<entry>Output Escaped Representation</entry>
|
||||||
<entry>Example</entry>
|
<entry>Example</entry>
|
||||||
@ -1100,7 +1098,7 @@ SELECT b, char_length(b) FROM test2;
|
|||||||
|
|
||||||
<row>
|
<row>
|
||||||
<entry> <literal> 0 to 31 and 127 to 255 </literal> </entry>
|
<entry> <literal> 0 to 31 and 127 to 255 </literal> </entry>
|
||||||
<entry> non-printable characters </entry>
|
<entry> <quote>non-printable</quote> octets </entry>
|
||||||
<entry> <literal> \### (octal value) </literal> </entry>
|
<entry> <literal> \### (octal value) </literal> </entry>
|
||||||
<entry> <literal> select '\\001'::bytea; </literal> </entry>
|
<entry> <literal> select '\\001'::bytea; </literal> </entry>
|
||||||
<entry> <literal> \001 </literal></entry>
|
<entry> <literal> \001 </literal></entry>
|
||||||
@ -1108,8 +1106,8 @@ SELECT b, char_length(b) FROM test2;
|
|||||||
|
|
||||||
<row>
|
<row>
|
||||||
<entry> <literal> 32 to 126 </literal> </entry>
|
<entry> <literal> 32 to 126 </literal> </entry>
|
||||||
<entry> printable characters </entry>
|
<entry> <quote>printable</quote> octets </entry>
|
||||||
<entry> ASCII representation </entry>
|
<entry> client character set representation </entry>
|
||||||
<entry> <literal> select '\\176'::bytea; </literal> </entry>
|
<entry> <literal> select '\\176'::bytea; </literal> </entry>
|
||||||
<entry> <literal> ~ </literal></entry>
|
<entry> <literal> ~ </literal></entry>
|
||||||
</row>
|
</row>
|
||||||
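The output rules touched by the hunk above can be sketched in a few lines of Python. This is only an illustration of the escaped output format as the patched text describes it, not the backend's implementation; the function name `bytea_escape_output` is hypothetical, and the doubled-backslash form for octet 92 is assumed from the "special alternate output representation" the paragraph mentions.

```python
def bytea_escape_output(data: bytes) -> str:
    """Hypothetical helper: render octets in the escaped output form.

    Assumptions (from the documentation text, not from backend source):
      - octet 92 (backslash) is shown as two backslashes
      - octets 32 to 126 are shown as the matching printable character
      - every other octet is shown as a backslash plus three octal digits
    """
    parts = []
    for octet in data:
        if octet == 92:
            parts.append("\\\\")            # backslash is doubled
        elif 32 <= octet <= 126:
            parts.append(chr(octet))        # "printable" octet, shown as-is
        else:
            parts.append("\\%03o" % octet)  # "non-printable" octet, octal escape
    return "".join(parts)


if __name__ == "__main__":
    for sample in (b"\x00", b"\x01", b"~", b"\\"):
        print(len(sample), bytea_escape_output(sample))
```

Each sample is one octet long even where its escaped form takes several characters, which is the point the patched paragraph makes about the zero octet and the backslash.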
@ -1123,76 +1121,81 @@ SELECT b, char_length(b) FROM test2;
|
|||||||
preceeded with two backslashes due to the fact that they must pass
|
preceeded with two backslashes due to the fact that they must pass
|
||||||
through two parsers in the PostgreSQL backend. The first backslash
|
through two parsers in the PostgreSQL backend. The first backslash
|
||||||
is interpreted as an escape character by the string literal parser,
|
is interpreted as an escape character by the string literal parser,
|
||||||
and therefore is consumed, leaving the characters that follow it.
|
and therefore is consumed, leaving the octets that follow.
|
||||||
The second backslash is recognized by <type>bytea</> input function
|
The second backslash is recognized by <type>bytea</type> input function
|
||||||
as the prefix of a three digit octal value. For example, a string
|
as the prefix of a three digit octal value. For example, a string
|
||||||
literal passed to the backend as <literal>'\\001'</literal> becomes
|
literal passed to the backend as <literal>'\\001'</literal> becomes
|
||||||
<literal>'\001'</literal> after passing through the string literal
|
<literal>'\001'</literal> after passing through the string literal
|
||||||
parser. The <literal>'\001'</literal> is then sent to the bytea
|
parser. The <literal>'\001'</literal> is then sent to the
|
||||||
input function, where it is converted to a single byte with a decimal
|
<type>bytea</type> input function, where it is converted to a single
|
||||||
<acronym>ASCII</acronym> value of 1.
|
octet with a decimal value of 1.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
For a similar reason, a backslash must be input as
|
For a similar reason, a backslash must be input as
|
||||||
<literal>'\\\\'</literal> (or <literal>'\\134'</literal>). The first
|
<literal>'\\\\'</literal> (or <literal>'\\134'</literal>). The first
|
||||||
and third backslashes are interpreted as escape characters by the
|
and third backslashes are interpreted as escape octets by the
|
||||||
string literal parser, and therefore are consumed, leaving the
|
string literal parser, and therefore are consumed, leaving the
|
||||||
second and forth backslashes untouched. The second and forth
|
second and forth backslashes untouched. The second and forth
|
||||||
backslashes are recognized by <type>bytea</> input function as a single
|
backslashes are recognized by the <type>bytea</type> input function
|
||||||
backslash. For example, a string literal passed to the backend as
|
as a single backslash. For example, a string literal passed to the
|
||||||
<literal>'\\\\'</literal> becomes <literal>'\\'</literal> after passing
|
backend as <literal>'\\\\'</literal> becomes <literal>'\\'</literal>
|
||||||
through the string literal parser. The <literal>'\\'</literal> is then
|
after passing through the string literal parser. The
|
||||||
sent to the bytea input function, where it is converted to a single
|
<literal>'\\'</literal> is then sent to the <type>bytea</type> input
|
||||||
byte with a decimal <acronym>ASCII</acronym> value of 92.
|
function, where it is converted to a single octet with a decimal
|
||||||
|
value of 92.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
A single quote is a bit different in that it must be input as
|
A single quote is a bit different in that it must be input as
|
||||||
<literal>'\''</literal> (or <literal>'\\134'</literal>), NOT as
|
<literal>'\''</literal> (or <literal>'\\134'</literal>),
|
||||||
<literal>'\\''</literal>. This is because, while the literal parser
|
<emphasis>not</emphasis> as <literal>'\\''</literal>. This is because,
|
||||||
interprets the single quote as a special character, and will consume
|
while the literal parser interprets the single quote as a special
|
||||||
the single backslash, the bytea input function does NOT recognize
|
character, and will consume the single backslash, the
|
||||||
a single quote as a special character. Therefore a string
|
<type>bytea</type> input function does <emphasis>not</emphasis>
|
||||||
|
recognize a single quote as a special octet. Therefore a string
|
||||||
literal passed to the backend as <literal>'\''</literal> becomes
|
literal passed to the backend as <literal>'\''</literal> becomes
|
||||||
<literal>'''</literal> after passing through the string literal
|
<literal>'''</literal> after passing through the string literal
|
||||||
parser. The <literal>'''</literal> is then sent to the bytea
|
parser. The <literal>'''</literal> is then sent to the
|
||||||
input function, where it is retains its single byte decimal
|
<type>bytea</type> input function, where it is retains its single
|
||||||
<acronym>ASCII</acronym> value of 39.
|
octet decimal value of 39.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Depending on the front end to PostgreSQL you use, you may have
|
Depending on the front end to PostgreSQL you use, you may have
|
||||||
additional work to do in terms of escaping and unescaping bytea
|
additional work to do in terms of escaping and unescaping
|
||||||
strings. For example, you may also have to escape line feeds and
|
<type>bytea</type> strings. For example, you may also have to escape
|
||||||
carriage return if your interface automatically translates these.
|
line feeds and carriage returns if your interface automatically
|
||||||
Or you may have to double up on backslashes if the parser for your
|
translates these. Or you may have to double up on backslashes if
|
||||||
language or choice also treats them as an escape character.
|
the parser for your language or choice also treats them as an
|
||||||
|
escape octet.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<sect2 id="datatype-binary-compat">
|
<sect2 id="datatype-binary-compat">
|
||||||
<title>Compatibility</title>
|
<title>Compatibility</title>
|
||||||
<para>
|
<para>
|
||||||
Bytea provides most of the functionality of the SQL99 binary string
|
<type>Bytea</type> provides most of the functionality of the binary
|
||||||
type per SQL99 section 4.3. A comparison of PostgreSQL bytea and SQL99
|
string type per SQL99 section 4.3. A comparison of SQL99 Binary
|
||||||
Binary Strings is presented in
|
Strings and PostgreSQL <type>bytea</type> is presented in
|
||||||
<xref linkend="datatype-binary-compat-comp">.
|
<xref linkend="datatype-binary-compat-comp">.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<table id="datatype-binary-compat-comp">
|
<table id="datatype-binary-compat-comp">
|
||||||
<title>Comparison of SQL99 Binary String and BYTEA types</title>
|
<title>Comparison of SQL99 Binary String and PostgreSQL
|
||||||
|
<type>BYTEA</type> types</title>
|
||||||
<tgroup cols="2">
|
<tgroup cols="2">
|
||||||
<thead>
|
<thead>
|
||||||
<row>
|
<row>
|
||||||
<entry>SQL99</entry>
|
<entry>SQL99</entry>
|
||||||
<entry>BYTEA</entry>
|
<entry><type>BYTEA</type></entry>
|
||||||
</row>
|
</row>
|
||||||
</thead>
|
</thead>
|
||||||
|
|
||||||
<tbody>
|
<tbody>
|
||||||
<row>
|
<row>
|
||||||
<entry> Name of data type BINARY LARGE OBJECT or BLOB </entry>
|
<entry> Name of data type <type>BINARY LARGE OBJECT</type>
|
||||||
<entry> Name of data type BYTEA </entry>
|
or <type>BLOB</type> </entry>
|
||||||
|
<entry> Name of data type <type>BYTEA</type> </entry>
|
||||||
</row>
|
</row>
|
||||||
|
|
||||||
<row>
|
<row>
|
||||||
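The two-parser behaviour described in the hunk above can be modelled as two small passes. The sketch below is a deliberately simplified illustration: the names `sql_literal_unescape` and `bytea_input` are hypothetical, the first pass treats a backslash as "take the next character literally" and ignores any other escapes the real string literal parser may handle, and the second pass assumes the remaining text is plain ASCII.

```python
def sql_literal_unescape(body: str) -> str:
    """First pass (simplified model of the string literal parser).

    `body` is the text between the outer single quotes. A backslash is
    consumed and the character after it is kept unchanged.
    """
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def bytea_input(text: str) -> bytes:
    """Second pass (simplified model of the bytea input function).

    Recognizes a doubled backslash as octet 92 and a backslash followed by
    three octal digits as the octet with that value. Everything else is
    copied through, assuming ASCII input.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ord(ch))
            i += 1
        elif text[i + 1:i + 2] == "\\":
            out.append(92)
            i += 2
        else:
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or any(d not in "01234567" for d in digits):
                raise ValueError("invalid escape sequence in bytea input")
            value = int(digits, 8)
            if value > 255:
                raise ValueError("octal escape out of range for one octet")
            out.append(value)
            i += 4
    return bytes(out)


if __name__ == "__main__":
    # Raw strings hold the literal bodies exactly as they would be typed in SQL.
    for body in (r"\\001", r"\\\\", r"\\134", r"\'", r"\\047"):
        print(body, "->", list(bytea_input(sql_literal_unescape(body))))
```

In this model `\\001` yields the single octet 1, `\\\\` and `\\134` both yield octet 92, and `\'` and `\\047` both yield octet 39, since octal 134 is decimal 92 and octal 047 is decimal 39.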
@ -1242,9 +1245,9 @@ SELECT b, char_length(b) FROM test2;
|
|||||||
|
|
||||||
<row>
|
<row>
|
||||||
<entry> A binary string literal is comprised of an even number of
|
<entry> A binary string literal is comprised of an even number of
|
||||||
hexidecimal digits, in single quotes, preceeded by "X",
|
hexidecimal digits, in single quotes, preceeded by <quote>X</quote>,
|
||||||
e.g. X'1a43fe'</entry>
|
e.g. <literal>X'1a43fe'</literal></entry>
|
||||||
<entry> A binary string literal is comprised of ASCII characters
|
<entry> A binary string literal is comprised of octets
|
||||||
escaped according to the rules shown in
|
escaped according to the rules shown in
|
||||||
<xref linkend="datatype-binary-sqlesc"> </entry>
|
<xref linkend="datatype-binary-sqlesc"> </entry>
|
||||||
</row>
|
</row>
|
||||||
|