Commit Graph

15 Commits

Author SHA1 Message Date
Bruce Momjian
54f7338fa1 This patch implements holdable cursors, following the proposal
(materialization into a tuple store) discussed on pgsql-hackers earlier.
I've updated the documentation and the regression tests.

Notes on the implementation:

- I needed to change the tuple store API slightly -- it assumes that it
won't be used to hold data across transaction boundaries, so the temp
files that it uses for on-disk storage are automatically reclaimed at
end-of-transaction. I added a flag to tuplestore_begin_heap() to control
this behavior. Is changing the tuple store API in this fashion OK?

- in order to store executor results in a tuple store, I added a new
CommandDest. This works well for the most part, with one exception: the
current DestFunction API doesn't provide enough information to allow the
Executor to store results into an arbitrary tuple store (where the
particular tuple store to use is chosen by the call site of
ExecutorRun). To workaround this, I've temporarily hacked up a solution
that works, but is not ideal: since the receiveTuple DestFunction is
passed the portal name, we can use that to lookup the Portal data
structure for the cursor and then use that to get at the tuple store the
Portal is using. This unnecessarily ties the Portal code with the
tupleReceiver code, but it works...

The proper fix for this is probably to change the DestFunction API --
Tom suggested passing the full QueryDesc to the receiveTuple function.
In that case, callers of ExecutorRun could "subclass" QueryDesc to add
any additional fields that their particular CommandDest needed to get
access to. This approach would work, but I'd like to think about it for
a little bit longer before deciding which route to go. In the mean time,
the code works fine, so I don't think a fix is urgent.

- (semi-related) I added a NO SCROLL keyword to DECLARE CURSOR, and
adjusted the behavior of SCROLL in accordance with the discussion on
-hackers.

- (unrelated) Cleaned up some SGML markup in sql.sgml, copy.sgml

Neil Conway
2003-03-27 16:51:29 +00:00
Bruce Momjian
64d0b8b05f Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:

Data:
-------------------------------------------------------------------
select * from cth;
  id | rowid |        rowdt        |   attribute    |      val
----+-------+---------------------+----------------+---------------
   1 | test1 | 2003-03-01 00:00:00 | temperature    | 42
   2 | test1 | 2003-03-01 00:00:00 | test_result    | PASS
   3 | test1 | 2003-03-01 00:00:00 | volts          | 2.6987
   4 | test2 | 2003-03-02 00:00:00 | temperature    | 53
   5 | test2 | 2003-03-02 00:00:00 | test_result    | FAIL
   6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
   7 | test2 | 2003-03-02 00:00:00 | volts          | 3.1234
(7 rows)

Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
   'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
  rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
  test1 | 42          | PASS        | 2.6987         |
  test2 | 53          | FAIL        | 01 March 2003  | 3.1234
(2 rows)

Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
   'SELECT rowid, attribute, val FROM cth ORDER BY 1',
   'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
  rowid | temperature | test_result |   test_startdate    | volts
-------+-------------+-------------+---------------------+--------
  test1 |          42 | PASS        |                     | 2.6987
  test2 |          53 | FAIL        | 2003-03-01 00:00:00 | 3.1234
(2 rows)

Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.

The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).

In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).

I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.

README and regression test adjustments included. If there are no
objections, please apply.

Joe Conway
2003-03-20 06:46:30 +00:00
Tom Lane
e4704001ea This patch fixes a bunch of spelling mistakes in comments throughout the
PostgreSQL source code.

Neil Conway
2003-03-10 22:28:22 +00:00
Tom Lane
aa60eecc37 Revise tuplestore and nodeMaterial so that we don't have to read the
entire contents of the subplan into the tuplestore before we can return
any tuples.  Instead, the tuplestore holds what we've already read, and
we fetch additional rows from the subplan as needed.  Random access to
the previously-read rows works with the tuplestore, and doesn't affect
the state of the partially-read subplan.  This is a step towards fixing
the problems with cursors over complex queries --- we don't want to
stick in Materialize nodes if they'll prevent quick startup for a cursor.
2003-03-09 02:19:13 +00:00
Tom Lane
1f1c332381 Remove inappropriate double-quoting in connectby() code; adjust
regression test to avoid using VALUE as a name.  From Joe Conway.
2002-11-23 01:54:09 +00:00
Bruce Momjian
620dddf88a > The previous patch fixed an infinite recursion bug in
> contrib/tablefunc/tablefunc.c:connectby. But, other unmanageable error
> seems to occur even if a table has commonplace tree data(see below).
>
> I would think the patch, ancestor check, should be
>
>   if (strstr(branch_delim || branchstr->data || branch_delim,
>                        branch_delim || current_key || branch_delim))
>
> This is my image, not a real code. However, if branchstr->data includes
> branch_delim, my image will not be perfect.

Good point. Thank you Masaru for the suggested fix.

Attached is a patch to fix the bug found by Masaru. His example now
produces:

regression=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '11', 0, '-') AS t(keyid int, parent_keyid int, level
int,
branch text);
  keyid | parent_keyid | level |  branch

-------+--------------+-------+----------
     11 |              |     0 | 11
     10 |           11 |     1 | 11-10
    111 |           11 |     1 | 11-111
      1 |          111 |     2 | 11-111-1
(4 rows)

While making the patch I also realized that the "no show branch" form of
the  function was not going to work very well for recursion detection.
Therefore  there is now a default branch delimiter ('~') that is used
internally, for  that case, to enable recursion detection to work. If
you need a different  delimiter for your specific data, you will have to
use the "show branch" form  of the function.

Joe Conway
2002-10-03 17:11:12 +00:00
Tom Lane
d3ebc1ae4a Fix portability bug in get_normal_pair (RAND_MAX != MAX_RANDOM_VALUE).
Also try to improve readability and performance.
2002-09-14 19:32:54 +00:00
Bruce Momjian
f490dbe594 > Now I'm testing connectby() in the /contrib/tablefunc in 7.3b1, which would
> be a useful function for many users.   However, I found the fact that
> if connectby_tree has the following data, connectby() tries to search the end
> of roots without knowing that the relations are infinite(-5-9-10-11-9-10-11-)
.
> I hope connectby() supports a check routine to find infinite relations.
>
>
> CREATE TABLE connectby_tree(keyid int, parent_keyid int);
> INSERT INTO connectby_tree VALUES(1,NULL);
> INSERT INTO connectby_tree VALUES(2,1);
> INSERT INTO connectby_tree VALUES(3,1);
> INSERT INTO connectby_tree VALUES(4,2);
> INSERT INTO connectby_tree VALUES(5,2);
> INSERT INTO connectby_tree VALUES(6,4);
> INSERT INTO connectby_tree VALUES(7,3);
> INSERT INTO connectby_tree VALUES(8,6);
> INSERT INTO connectby_tree VALUES(9,5);
>
> INSERT INTO connectby_tree VALUES(10,9);
> INSERT INTO connectby_tree VALUES(11,10);
> INSERT INTO connectby_tree VALUES(9,11);    <-- infinite
>

The attached patch fixes the infinite recursion bug in
contrib/tablefunc/tablefunc.c:connectby found by Masaru Sugawara.

test=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '2', 4, '~') AS t(keyid int, parent_keyid int, level
int, branch text);
  keyid | parent_keyid | level |   branch
-------+--------------+-------+-------------
      2 |              |     0 | 2
      4 |            2 |     1 | 2~4
      6 |            4 |     2 | 2~4~6
      8 |            6 |     3 | 2~4~6~8
      5 |            2 |     1 | 2~5
      9 |            5 |     2 | 2~5~9
     10 |            9 |     3 | 2~5~9~10
     11 |           10 |     4 | 2~5~9~10~11
(8 rows)

test=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '2', 5, '~') AS t(keyid int, parent_keyid int, level
int, branch text);
ERROR:  infinite recursion detected

I implemented it by checking the branch string for repeated keys
(whether or not the branch is returned). The performance hit was pretty
minimal -- about 1% for a moderately complex test case (220000 record
table, 9 level tree with 3800 members).

Joe Conway
2002-09-12 00:19:44 +00:00
Tom Lane
52c9d25933 Be careful to include postgres.h *before* any system headers, to ensure
that the right flavors of largefile-related definitions are seen.
Most of these changes are probably unnecessary, but better safe than
sorry.
2002-09-05 00:43:07 +00:00
Bruce Momjian
e50f52a074 pgindent run. 2002-09-04 20:31:48 +00:00
Bruce Momjian
6aa4482f2f Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for

the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.

Description:

   connectby(text relname, text keyid_fld, text parent_keyid_fld,
     text start_with, int max_depth [, text branch_delim])
   - returns keyid, parent_keyid, level, and an optional branch string
   - requires anonymous composite type syntax in the FROM clause. See
     the instructions in the documentation below.

Joe Conway
2002-09-02 05:44:05 +00:00
Tom Lane
c7a165adc6 Code review for HeapTupleHeader changes. Add version number to page headers
(overlaying low byte of page size) and add HEAP_HASOID bit to t_infomask,
per earlier discussion.  Simplify scheme for overlaying fields in tuple
header (no need for cmax to live in more than one place).  Don't try to
clear infomask status bits in tqual.c --- not safe to do it there.  Don't
try to force output table of a SELECT INTO to have OIDs, either.  Get rid
of unnecessarily complex three-state scheme for TupleDesc.tdhasoids, which
has already caused one recent failure.  Improve documentation.
2002-09-02 01:05:06 +00:00
Tom Lane
e4186762ff Adjust nodeFunctionscan.c to reset transient memory context between calls
to the table function, thus preventing memory leakage accumulation across
calls.  This means that SRFs need to be careful to distinguish permanent
and local storage; adjust code and documentation accordingly.  Patch by
Joe Conway, very minor tweaks by Tom Lane.
2002-08-29 17:14:33 +00:00
Bruce Momjian
45e2544584 As discussed on several occasions previously, the new anonymous
composite type capability makes it possible to create a system view
based on a table function in a way that is hopefully palatable to
everyone. The attached patch takes advantage of this, moving
show_all_settings() from contrib/tablefunc into the backend (renamed
all_settings(). It is defined as a builtin returning type RECORD. During
initdb a system view is created to expose the same information presently
available through SHOW ALL. For example:

test=# select * from pg_settings where name like '%debug%';
          name          | setting
-----------------------+---------
  debug_assertions      | on
  debug_pretty_print    | off
  debug_print_parse     | off
  debug_print_plan      | off
  debug_print_query     | off
  debug_print_rewritten | off
  wal_debug             | 0
(7 rows)


Additionally during initdb two rules are created which make it possible
to change settings by updating the system view -- a "virtual table" as
Tom put it. Here's an example:

Joe Conway
2002-08-15 02:51:27 +00:00
Bruce Momjian
41f862ba87 As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.

show_all_settings()
   - returns the same information as SHOW ALL, but as a query result

normal_rand(int numvals, float8 mean, float8 stddev, int seed)
   - returns a set of normally distributed float8 values
   - This routine implements Algorithm P (Polar method for normal
     deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
     3rd ed., pages 122-126. Knuth cites his source as "The polar
     method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
     _Annals_Math,_Stat._ 29 (1958), 610-611.

crosstabN(text sql)
   - returns a set of row_name plus N category value columns
   - crosstab2(), crosstab3(), and crosstab4() are defined for you,
     but you can create additional crosstab functions per directions
     in the README.

Joe Conway
2002-07-30 16:31:11 +00:00