Backpatch array I/O code and documentation fixes, also array slice lower subscript bounds change.
Tom Lane 2002-03-17 20:05:59 +00:00
parent efec53adb3
commit bbc1fb07c6
2 changed files with 167 additions and 101 deletions

doc/src/sgml/array.sgml

@@ -1,4 +1,4 @@
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/array.sgml,v 1.19 2002/01/20 22:19:55 petere Exp $ -->
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/array.sgml,v 1.19.2.1 2002/03/17 20:05:58 tgl Exp $ -->
<chapter id="arrays">
<title>Arrays</title>
@@ -22,8 +22,9 @@ CREATE TABLE sal_emp (
As shown, an array data type is named by appending square brackets
(<literal>[]</>) to the data type name of the array elements.
The above query will create a table named
<structname>sal_emp</structname> with a <type>text</type> string
(<structfield>name</structfield>), a one-dimensional array of type
<structname>sal_emp</structname> with columns including
a <type>text</type> string (<structfield>name</structfield>),
a one-dimensional array of type
<type>integer</type> (<structfield>pay_by_quarter</structfield>),
which represents the employee's salary by quarter, and a
two-dimensional array of <type>text</type>
@@ -35,7 +36,7 @@ CREATE TABLE sal_emp (
Now we do some <command>INSERT</command>s. Observe that to write an array
value, we enclose the element values within curly braces and separate them
by commas. If you know C, this is not unlike the syntax for
initializing structures.
initializing structures. (More details appear below.)
<programlisting>
INSERT INTO sal_emp
@@ -66,7 +67,7 @@ SELECT name FROM sal_emp WHERE pay_by_quarter[1] &lt;&gt; pay_by_quarter[2];
</programlisting>
The array subscript numbers are written within square brackets.
<productname>PostgreSQL</productname> uses the
By default <productname>PostgreSQL</productname> uses the
<quote>one-based</quote> numbering convention for arrays, that is,
an array of <replaceable>n</> elements starts with <literal>array[1]</literal> and
ends with <literal>array[<replaceable>n</>]</literal>.
@@ -99,7 +100,7 @@ SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
schedule
--------------------
{{"meeting"},{""}}
{{meeting},{""}}
(1 row)
</programlisting>
@@ -144,11 +145,17 @@ UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
those already present, or by assigning to a slice that is adjacent
to or overlaps the data already present. For example, if an array
value currently has 4 elements, it will have five elements after an
update that assigns to array[5]. Currently, enlargement in this
fashion is only allowed for one-dimensional arrays, not
update that assigns to <literal>array[5]</>. Currently, enlargement in
this fashion is only allowed for one-dimensional arrays, not
multidimensional arrays.
</para>
<para>
Array slice assignment allows creation of arrays that do not use one-based
subscripts. For example one might assign to <literal>array[-2:7]</> to
create an array with subscript values running from -2 to 7.
</para>
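The subscript arithmetic behind this paragraph, together with the `array_get_slice` change later in this commit, can be sketched in C. This is only an illustration under stated assumptions: `SketchShape`, `SKETCH_MAXDIM`, `shape_after_slice_assignment`, and `shape_of_fetched_slice` are hypothetical names, not PostgreSQL APIs, and the assignment case assumes the array takes exactly the bounds that were assigned.

```c
#include <assert.h>

#define SKETCH_MAXDIM 6         /* hypothetical limit, chosen for this sketch */

/* Hypothetical stand-in for the dimension info kept in an array header. */
typedef struct
{
    int         ndim;
    int         dims[SKETCH_MAXDIM];    /* number of elements per dimension */
    int         lbound[SKETCH_MAXDIM];  /* lowest valid subscript per dimension */
} SketchShape;

/*
 * Shape of an array created by assigning to array[lower:upper]:
 * the assigned lower bounds are kept, so subscripts need not start at 1.
 */
static SketchShape
shape_after_slice_assignment(int ndim, const int *lower, const int *upper)
{
    SketchShape s = {0};
    int         i;

    assert(ndim > 0 && ndim <= SKETCH_MAXDIM);
    s.ndim = ndim;
    for (i = 0; i < ndim; i++)
    {
        assert(upper[i] >= lower[i]);
        s.dims[i] = upper[i] - lower[i] + 1;
        s.lbound[i] = lower[i];
    }
    return s;
}

/*
 * Shape of a slice fetched with array[lower:upper]: same extents, but
 * the lower bounds of the result are reset to 1, as this commit does.
 */
static SketchShape
shape_of_fetched_slice(int ndim, const int *lower, const int *upper)
{
    SketchShape s = shape_after_slice_assignment(ndim, lower, upper);
    int         i;

    for (i = 0; i < ndim; i++)
        s.lbound[i] = 1;
    return s;
}
```

For the `array[-2:7]` example, both helpers report 10 elements; only the assignment case keeps -2 as the lower bound.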
<para>
The syntax for <command>CREATE TABLE</command> allows fixed-length
arrays to be defined:
@@ -168,7 +175,9 @@ CREATE TABLE tictactoe (
Actually, the current implementation does not enforce the declared
number of dimensions either. Arrays of a particular element type are
all considered to be of the same type, regardless of size or number
of dimensions.
of dimensions. So, declaring number of dimensions or sizes in
<command>CREATE TABLE</command> is simply documentation, it does not
affect runtime behavior.
</para>
<para>
@@ -248,19 +257,55 @@ SELECT * FROM sal_emp WHERE pay_by_quarter **= 10000;
</para>
</note>
<formalpara>
<title>Array input and output syntax.</title>
<para>
The external representation of an array value consists of items that
are interpreted according to the I/O conversion rules for the array's
element type, plus decoration that indicates the array structure.
The decoration consists of curly braces (<literal>{</> and <literal>}</>)
around the array value plus delimiter characters between adjacent items.
The delimiter character is usually a comma (<literal>,</>) but can be
something else: it is determined by the <literal>typdelim</> setting
for the array's element type. (Among the standard datatypes provided
in the <productname>PostgreSQL</productname> distribution, type
<literal>box</> uses a semicolon (<literal>;</>) but all the others
use comma.) In a multidimensional array, each dimension (row, plane,
cube, etc.) gets its own level of curly braces, and delimiters
must be written between adjacent curly-braced entities of the same level.
You may write whitespace before a left brace, after a right
brace, or before any individual item string. Whitespace after an item
is not ignored, however: after skipping leading whitespace, everything
up to the next right brace or delimiter is taken as the item value.
</para>
</formalpara>
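The whitespace rule above (leading whitespace is skipped, trailing whitespace is kept as part of the item) can be sketched for the simplest case. This is a simplified illustration, not the backend's `ReadArrayStr`: `split_simple_array` is a hypothetical helper that handles only one-dimensional literals and does no quote or backslash processing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: split a one-dimensional array literal in place.
 * Pointers to the item strings are stored in items[]; the return value
 * is the item count, or -1 if the literal is malformed or has more than
 * maxitems items.  The input buffer is modified.
 */
static int
split_simple_array(char *str, char typdelim, char **items, int maxitems)
{
    char       *ptr = str;
    int         n = 0;

    assert(str != NULL && items != NULL);

    /* whitespace is allowed before the left brace */
    while (isspace((unsigned char) *ptr))
        ptr++;
    if (*ptr != '{')
        return -1;
    ptr++;
    if (*ptr == '}')
        return 0;               /* "{}" is the empty array */

    for (;;)
    {
        char       *itemstart;
        char        endch;

        /* skip leading whitespace before the item */
        while (isspace((unsigned char) *ptr))
            ptr++;
        itemstart = ptr;

        /* everything up to the next delimiter or right brace is the item */
        while (*ptr != '\0' && *ptr != typdelim && *ptr != '}')
            ptr++;
        if (*ptr == '\0')
            return -1;          /* premature end of string */
        endch = *ptr;
        *ptr++ = '\0';          /* terminate the item; trailing spaces stay */

        if (n >= maxitems)
            return -1;
        items[n++] = itemstart;
        if (endch == '}')
            break;
    }

    /* whitespace is allowed after the right brace */
    while (isspace((unsigned char) *ptr))
        ptr++;
    return (*ptr == '\0') ? n : -1;
}
```

Given `" { 1, 2 ,3} "` with a comma delimiter, the second item comes back as `"2 "` with its trailing space intact, while the first and third lose only their leading whitespace.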
<formalpara>
<title>Quoting array elements.</title>
<para>
As shown above, when writing an array literal value you may write double
As shown above, when writing an array value you may write double
quotes around any individual array
element. You <emphasis>must</> do so if the element value would otherwise
confuse the array-value parser. For example, elements containing curly
braces, commas, double quotes, backslashes, or white space must be
double-quoted. To put a double quote or backslash in an array element
value, precede it with a backslash.
braces, commas (or whatever the delimiter character is), double quotes,
backslashes, or leading white space must be double-quoted. To put a double
quote or backslash in an array element value, precede it with a backslash.
Alternatively, you can use backslash-escaping to protect all data characters
that would otherwise be taken as array syntax or ignorable white space.
</para>
</formalpara>
<para>
The array output routine will put double quotes around element values
if they are empty strings or contain curly braces, delimiter characters,
double quotes, backslashes, or white space. Double quotes and backslashes
embedded in element values will be backslash-escaped. For numeric
datatypes it is safe to assume that double quotes will never appear, but
for textual datatypes one should be prepared to cope with either presence
or absence of quotes. (This is a change in behavior from pre-7.2
<productname>PostgreSQL</productname> releases.)
</para>
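The output quoting rule in this paragraph can be sketched as a small C routine. This is an illustration of the rule as documented here, not the backend's `array_out`: `element_needs_quotes` and `quote_element` are hypothetical helpers introduced for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: must this element be double-quoted on output? */
static bool
element_needs_quotes(const char *elem, char typdelim)
{
    const char *p;

    if (*elem == '\0')
        return true;            /* empty strings are always quoted */
    for (p = elem; *p != '\0'; p++)
    {
        if (*p == '{' || *p == '}' || *p == typdelim ||
            *p == '"' || *p == '\\' ||
            isspace((unsigned char) *p))
            return true;
    }
    return false;
}

/*
 * Hypothetical helper: write the external form of one element into out.
 * Returns the number of bytes written, not counting the terminating NUL,
 * or 0 if out is too small for the worst case.
 */
static size_t
quote_element(const char *elem, char typdelim, char *out, size_t outsize)
{
    bool        quote;
    size_t      n = 0;
    const char *p;

    assert(elem != NULL && out != NULL);

    /* Worst case: every character escaped, plus two quotes and a NUL. */
    if (outsize < 2 * strlen(elem) + 3)
        return 0;

    quote = element_needs_quotes(elem, typdelim);
    if (quote)
        out[n++] = '"';
    for (p = elem; *p != '\0'; p++)
    {
        /* embedded double quotes and backslashes are backslash-escaped */
        if (*p == '"' || *p == '\\')
            out[n++] = '\\';
        out[n++] = *p;
    }
    if (quote)
        out[n++] = '"';
    out[n] = '\0';
    return n;
}
```

Under this sketch `meeting` is written without quotes and the empty string is written as `""`, which matches the `{{meeting},{""}}` output shown earlier in this file.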
<tip>
<para>
Remember that what you write in an SQL query will first be interpreted

src/backend/utils/adt/arrayfuncs.c

@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/utils/adt/arrayfuncs.c,v 1.72 2001/11/29 21:02:41 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/utils/adt/arrayfuncs.c,v 1.72.2.1 2002/03/17 20:05:59 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -46,19 +46,12 @@
* Local definitions
* ----------
*/
#ifndef MIN
#define MIN(a,b) (((a)<(b)) ? (a) : (b))
#endif
#ifndef MAX
#define MAX(a,b) (((a)>(b)) ? (a) : (b))
#endif
#define ASSGN "="
#define RETURN_NULL(type) do { *isNull = true; return (type) 0; } while (0)
static int ArrayCount(char *str, int *dim, int typdelim);
static int ArrayCount(char *str, int *dim, char typdelim);
static Datum *ReadArrayStr(char *arrayStr, int nitems, int ndim, int *dim,
FmgrInfo *inputproc, Oid typelem, int32 typmod,
char typdelim, int typlen, bool typbyval,
@@ -252,7 +245,7 @@ array_in(PG_FUNCTION_ARGS)
*-----------------------------------------------------------------------------
*/
static int
ArrayCount(char *str, int *dim, int typdelim)
ArrayCount(char *str, int *dim, char typdelim)
{
int nest_level = 0,
i;
@@ -260,7 +253,7 @@ ArrayCount(char *str, int *dim, int typdelim)
temp[MAXDIM];
bool scanning_string = false;
bool eoArray = false;
char *q;
char *ptr;
for (i = 0; i < MAXDIM; ++i)
temp[i] = dim[i] = 0;
@@ -268,65 +261,68 @@ ArrayCount(char *str, int *dim, int typdelim)
if (strncmp(str, "{}", 2) == 0)
return 0;
q = str;
while (eoArray != true)
ptr = str;
while (!eoArray)
{
bool done = false;
bool itemdone = false;
while (!done)
while (!itemdone)
{
switch (*q)
switch (*ptr)
{
case '\\':
/* skip escaped characters (\ and ") inside strings */
if (scanning_string && *(q + 1))
q++;
break;
case '\0':
/*
* Signal a premature end of the string. DZ -
* 2-9-1996
*/
/* Signal a premature end of the string */
elog(ERROR, "malformed array constant: %s", str);
break;
case '\\':
/* skip the escaped character */
if (*(ptr + 1))
ptr++;
else
elog(ERROR, "malformed array constant: %s", str);
break;
case '\"':
scanning_string = !scanning_string;
break;
case '{':
if (!scanning_string)
{
if (nest_level >= MAXDIM)
elog(ERROR, "array_in: illformed array constant");
temp[nest_level] = 0;
nest_level++;
if (ndim < nest_level)
ndim = nest_level;
}
break;
case '}':
if (!scanning_string)
{
if (!ndim)
ndim = nest_level;
nest_level--;
if (nest_level)
temp[nest_level - 1]++;
if (nest_level == 0)
eoArray = done = true;
elog(ERROR, "array_in: illformed array constant");
nest_level--;
if (nest_level == 0)
eoArray = itemdone = true;
else
{
/*
* We don't set itemdone here; see comments in
* ReadArrayStr
*/
temp[nest_level - 1]++;
}
}
break;
default:
if (!ndim)
ndim = nest_level;
if (*q == typdelim && !scanning_string)
done = true;
if (*ptr == typdelim && !scanning_string)
itemdone = true;
break;
}
if (!done)
q++;
if (!itemdone)
ptr++;
}
temp[ndim - 1]++;
q++;
if (!eoArray)
while (isspace((unsigned char) *q))
q++;
ptr++;
}
for (i = 0; i < ndim; ++i)
dim[i] = temp[i];
@@ -366,103 +362,119 @@ ReadArrayStr(char *arrayStr,
int i,
nest_level = 0;
Datum *values;
char *p,
*q,
*r;
char *ptr;
bool scanning_string = false;
bool eoArray = false;
int indx[MAXDIM],
prod[MAXDIM];
bool eoArray = false;
mda_get_prod(ndim, dim, prod);
values = (Datum *) palloc(nitems * sizeof(Datum));
MemSet(values, 0, nitems * sizeof(Datum));
MemSet(indx, 0, sizeof(indx));
q = p = arrayStr;
/* read array enclosed within {} */
ptr = arrayStr;
while (!eoArray)
{
bool done = false;
bool itemdone = false;
int i = -1;
char *itemstart;
while (!done)
/* skip leading whitespace */
while (isspace((unsigned char) *ptr))
ptr++;
itemstart = ptr;
while (!itemdone)
{
switch (*q)
switch (*ptr)
{
case '\0':
/* Signal a premature end of the string */
elog(ERROR, "malformed array constant: %s", arrayStr);
break;
case '\\':
{
char *cptr;
/* Crunch the string on top of the backslash. */
for (r = q; *r != '\0'; r++)
*r = *(r + 1);
for (cptr = ptr; *cptr != '\0'; cptr++)
*cptr = *(cptr + 1);
if (*ptr == '\0')
elog(ERROR, "malformed array constant: %s", arrayStr);
break;
}
case '\"':
if (!scanning_string)
{
while (p != q)
p++;
p++; /* get p past first doublequote */
}
else
*q = '\0';
{
char *cptr;
scanning_string = !scanning_string;
/* Crunch the string on top of the quote. */
for (cptr = ptr; *cptr != '\0'; cptr++)
*cptr = *(cptr + 1);
/* Back up to not miss following character. */
ptr--;
break;
}
case '{':
if (!scanning_string)
{
p++;
nest_level++;
if (nest_level > ndim)
if (nest_level >= ndim)
elog(ERROR, "array_in: illformed array constant");
nest_level++;
indx[nest_level - 1] = 0;
indx[ndim - 1] = 0;
/* skip leading whitespace */
while (isspace((unsigned char) *(ptr+1)))
ptr++;
itemstart = ptr+1;
}
break;
case '}':
if (!scanning_string)
{
if (nest_level == 0)
elog(ERROR, "array_in: illformed array constant");
if (i == -1)
i = ArrayGetOffset0(ndim, indx, prod);
indx[nest_level - 1] = 0;
nest_level--;
if (nest_level == 0)
eoArray = done = true;
eoArray = itemdone = true;
else
{
*q = '\0';
/*
* tricky coding: terminate item value string at
* first '}', but don't process it till we see
* a typdelim char or end of array. This handles
* case where several '}'s appear successively
* in a multidimensional array.
*/
*ptr = '\0';
indx[nest_level - 1]++;
}
}
break;
default:
if (*q == typdelim && !scanning_string)
if (*ptr == typdelim && !scanning_string)
{
if (i == -1)
i = ArrayGetOffset0(ndim, indx, prod);
done = true;
itemdone = true;
indx[ndim - 1]++;
}
break;
}
if (!done)
q++;
if (!itemdone)
ptr++;
}
*q = '\0';
if (i >= nitems)
*ptr++ = '\0';
if (i < 0 || i >= nitems)
elog(ERROR, "array_in: illformed array constant");
values[i] = FunctionCall3(inputproc,
CStringGetDatum(p),
CStringGetDatum(itemstart),
ObjectIdGetDatum(typelem),
Int32GetDatum(typmod));
p = ++q;
/*
* if not at the end of the array skip white space
*/
if (!eoArray)
while (isspace((unsigned char) *q))
{
p++;
q++;
}
}
/*
@@ -819,6 +831,7 @@ array_ref(ArrayType *array,
retptr = array_seek(arraydataptr, elmlen, offset);
*isNull = false;
return ArrayCast(retptr, elmbyval, elmlen);
}
@@ -845,7 +858,8 @@ array_get_slice(ArrayType *array,
int i,
ndim,
*dim,
*lb;
*lb,
*newlb;
int fixedDim[1],
fixedLb[1];
char *arraydataptr;
@@ -917,7 +931,14 @@ array_get_slice(ArrayType *array,
newarray->ndim = ndim;
newarray->flags = 0;
memcpy(ARR_DIMS(newarray), span, ndim * sizeof(int));
memcpy(ARR_LBOUND(newarray), lowerIndx, ndim * sizeof(int));
/*
* Lower bounds of the new array are set to 1. Formerly (before 7.3)
* we copied the given lowerIndx values ... but that seems confusing.
*/
newlb = ARR_LBOUND(newarray);
for (i = 0; i < ndim; i++)
newlb[i] = 1;
array_extract_slice(ndim, dim, lb, arraydataptr, elmlen,
lowerIndx, upperIndx, ARR_DATA_PTR(newarray));
@@ -1222,8 +1243,8 @@ array_set_slice(ArrayType *array,
*/
int oldlb = ARR_LBOUND(array)[0];
int oldub = oldlb + ARR_DIMS(array)[0] - 1;
int slicelb = MAX(oldlb, lowerIndx[0]);
int sliceub = MIN(oldub, upperIndx[0]);
int slicelb = Max(oldlb, lowerIndx[0]);
int sliceub = Min(oldub, upperIndx[0]);
char *oldarraydata = ARR_DATA_PTR(array);
lenbefore = array_nelems_size(oldarraydata,