postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-12-27 08:39:28 +08:00

Author	SHA1	Message	Date
Bruce Momjian	64d0b8b05f	Attached is an update to contrib/tablefunc. It implements a new hashed version of crosstab. This fixes a major deficiency in real-world use of the original version. Easiest to understand with an illustration: Data: ------------------------------------------------------------------- select * from cth; id \| rowid \| rowdt \| attribute \| val ----+-------+---------------------+----------------+--------------- 1 \| test1 \| 2003-03-01 00:00:00 \| temperature \| 42 2 \| test1 \| 2003-03-01 00:00:00 \| test_result \| PASS 3 \| test1 \| 2003-03-01 00:00:00 \| volts \| 2.6987 4 \| test2 \| 2003-03-02 00:00:00 \| temperature \| 53 5 \| test2 \| 2003-03-02 00:00:00 \| test_result \| FAIL 6 \| test2 \| 2003-03-02 00:00:00 \| test_startdate \| 01 March 2003 7 \| test2 \| 2003-03-02 00:00:00 \| volts \| 3.1234 (7 rows) Original crosstab: ------------------------------------------------------------------- SELECT * FROM crosstab( 'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4) AS c(rowid text, temperature text, test_result text, test_startdate text, volts text); rowid \| temperature \| test_result \| test_startdate \| volts -------+-------------+-------------+----------------+-------- test1 \| 42 \| PASS \| 2.6987 \| test2 \| 53 \| FAIL \| 01 March 2003 \| 3.1234 (2 rows) Hashed crosstab: ------------------------------------------------------------------- SELECT * FROM crosstab( 'SELECT rowid, attribute, val FROM cth ORDER BY 1', 'SELECT DISTINCT attribute FROM cth ORDER BY 1') AS c(rowid text, temperature int4, test_result text, test_startdate timestamp, volts float8); rowid \| temperature \| test_result \| test_startdate \| volts -------+-------------+-------------+---------------------+-------- test1 \| 42 \| PASS \| \| 2.6987 test2 \| 53 \| FAIL \| 2003-03-01 00:00:00 \| 3.1234 (2 rows) Notice that the original crosstab slides data over to the left in the result tuple when it encounters missing data. 
In order to work around this you have to make your source SQL do all sorts of contortions (cartesian join of distinct rowid with distinct attribute; left join that back to the real source data). The new version avoids this by building a hash table using a second distinct attribute query. The new version also allows for "extra" columns (see the README) and allows the result columns to be coerced into differing datatypes if they are suitable (as shown above). In testing a "real-world" data set (69 distinct rowid's, 27 distinct categories/attributes, multiple missing data points) I saw about a 5-fold improvement in execution time (from about 2200 ms old, to 440 ms new). I left the original version intact because: 1) BC, 2) it is probably slightly faster if you know that you have no missing attributes. README and regression test adjustments included. If there are no objections, please apply. Joe Conway	2003-03-20 06:46:30 +00:00
Tom Lane	e4704001ea	This patch fixes a bunch of spelling mistakes in comments throughout the PostgreSQL source code. Neil Conway	2003-03-10 22:28:22 +00:00
Tom Lane	aa60eecc37	Revise tuplestore and nodeMaterial so that we don't have to read the entire contents of the subplan into the tuplestore before we can return any tuples. Instead, the tuplestore holds what we've already read, and we fetch additional rows from the subplan as needed. Random access to the previously-read rows works with the tuplestore, and doesn't affect the state of the partially-read subplan. This is a step towards fixing the problems with cursors over complex queries --- we don't want to stick in Materialize nodes if they'll prevent quick startup for a cursor.	2003-03-09 02:19:13 +00:00
Tom Lane	1f1c332381	Remove inappropriate double-quoting in connectby() code; adjust regression test to avoid using VALUE as a name. From Joe Conway.	2002-11-23 01:54:09 +00:00
Bruce Momjian	e5cf1a8a26	SET autocommit no longer needed in /contrib because pg_regress.sh does it automatically now on regression session startup.	2002-10-21 01:42:14 +00:00
Bruce Momjian	aa4c702eac	Update /contrib for "autocommit TO 'on'". Create objects in public schema. Make spacing/capitalization consistent. Remove transaction block use for object creation. Remove unneeded function GRANTs.	2002-10-18 18:41:22 +00:00
Bruce Momjian	a62873d279	The attached adds a bit to the contrib/tablefunc regression test for behavior of connectby() in the presence of infinite recursion. Please apply this one in addition to the one sent earlier. Joe Conway	2002-10-03 17:15:36 +00:00
Bruce Momjian	620dddf88a	> The previous patch fixed an infinite recursion bug in > contrib/tablefunc/tablefunc.c:connectby. But, other unmanageable error > seems to occur even if a table has commonplace tree data(see below). > > I would think the patch, ancestor check, should be > > if (strstr(branch_delim \|\| branchstr->data \|\| branch_delim, > branch_delim \|\| current_key \|\| branch_delim)) > > This is my image, not a real code. However, if branchstr->data includes > branch_delim, my image will not be perfect. Good point. Thank you Masaru for the suggested fix. Attached is a patch to fix the bug found by Masaru. His example now produces: regression=# SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', '11', 0, '-') AS t(keyid int, parent_keyid int, level int, branch text); keyid \| parent_keyid \| level \| branch -------+--------------+-------+---------- 11 \| \| 0 \| 11 10 \| 11 \| 1 \| 11-10 111 \| 11 \| 1 \| 11-111 1 \| 111 \| 2 \| 11-111-1 (4 rows) While making the patch I also realized that the "no show branch" form of the function was not going to work very well for recursion detection. Therefore there is now a default branch delimiter ('~') that is used internally, for that case, to enable recursion detection to work. If you need a different delimiter for your specific data, you will have to use the "show branch" form of the function. Joe Conway	2002-10-03 17:11:12 +00:00
Tom Lane	bd04184b11	Attached is a patch to fix some recently raised issues that exist in contrib/tablefunc. Specifically it replaces the use of VIEWs (for needed composite type creation) with use of CREATE TYPE. It also performs GRANT EXECUTE ON FUNCTION foo() TO PUBLIC for all of the created functions. There was also a cosmetic change to two regression files. Joe Conway	2002-09-14 19:53:59 +00:00
Tom Lane	d3ebc1ae4a	Fix portability bug in get_normal_pair (RAND_MAX != MAX_RANDOM_VALUE). Also try to improve readability and performance.	2002-09-14 19:32:54 +00:00
Bruce Momjian	f490dbe594	> Now I'm testing connectby() in the /contrib/tablefunc in 7.3b1, which would > be a useful function for many users. However, I found the fact that > if connectby_tree has the following data, connectby() tries to search the end > of roots without knowing that the relations are infinite(-5-9-10-11-9-10-11-) . > I hope connectby() supports a check routine to find infinite relations. > > > CREATE TABLE connectby_tree(keyid int, parent_keyid int); > INSERT INTO connectby_tree VALUES(1,NULL); > INSERT INTO connectby_tree VALUES(2,1); > INSERT INTO connectby_tree VALUES(3,1); > INSERT INTO connectby_tree VALUES(4,2); > INSERT INTO connectby_tree VALUES(5,2); > INSERT INTO connectby_tree VALUES(6,4); > INSERT INTO connectby_tree VALUES(7,3); > INSERT INTO connectby_tree VALUES(8,6); > INSERT INTO connectby_tree VALUES(9,5); > > INSERT INTO connectby_tree VALUES(10,9); > INSERT INTO connectby_tree VALUES(11,10); > INSERT INTO connectby_tree VALUES(9,11); <-- infinite > The attached patch fixes the infinite recursion bug in contrib/tablefunc/tablefunc.c:connectby found by Masaru Sugawara. test=# SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', '2', 4, '~') AS t(keyid int, parent_keyid int, level int, branch text); keyid \| parent_keyid \| level \| branch -------+--------------+-------+------------- 2 \| \| 0 \| 2 4 \| 2 \| 1 \| 2~4 6 \| 4 \| 2 \| 2~4~6 8 \| 6 \| 3 \| 2~4~6~8 5 \| 2 \| 1 \| 2~5 9 \| 5 \| 2 \| 2~5~9 10 \| 9 \| 3 \| 2~5~9~10 11 \| 10 \| 4 \| 2~5~9~10~11 (8 rows) test=# SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', '2', 5, '~') AS t(keyid int, parent_keyid int, level int, branch text); ERROR: infinite recursion detected I implemented it by checking the branch string for repeated keys (whether or not the branch is returned). The performance hit was pretty minimal -- about 1% for a moderately complex test case (220000 record table, 9 level tree with 3800 members). Joe Conway	2002-09-12 00:19:44 +00:00
Bruce Momjian	6fff9a7475	The attached removes the current non-standard file "contrib/tablefunc/tablefunc-test.sql", and adds a standard regression test suite to contrib/tablefunc. Joe Conway	2002-09-12 00:14:40 +00:00
Tom Lane	52c9d25933	Be careful to include postgres.h before any system headers, to ensure that the right flavors of largefile-related definitions are seen. Most of these changes are probably unnecessary, but better safe than sorry.	2002-09-05 00:43:07 +00:00
Bruce Momjian	e50f52a074	pgindent run.	2002-09-04 20:31:48 +00:00
Bruce Momjian	6aa4482f2f	Attached is an update to contrib/tablefunc. It introduces a new function, connectby(), which can serve as a reference implementation for the changes made in the last few days -- namely the ability of a function to return an entire tuplestore, and the ability of a function to make use of the query provided "expected" tuple description. Description: connectby(text relname, text keyid_fld, text parent_keyid_fld, text start_with, int max_depth [, text branch_delim]) - returns keyid, parent_keyid, level, and an optional branch string - requires anonymous composite type syntax in the FROM clause. See the instructions in the documentation below. Joe Conway	2002-09-02 05:44:05 +00:00
Tom Lane	c7a165adc6	Code review for HeapTupleHeader changes. Add version number to page headers (overlaying low byte of page size) and add HEAP_HASOID bit to t_infomask, per earlier discussion. Simplify scheme for overlaying fields in tuple header (no need for cmax to live in more than one place). Don't try to clear infomask status bits in tqual.c --- not safe to do it there. Don't try to force output table of a SELECT INTO to have OIDs, either. Get rid of unnecessarily complex three-state scheme for TupleDesc.tdhasoids, which has already caused one recent failure. Improve documentation.	2002-09-02 01:05:06 +00:00
Tom Lane	e4186762ff	Adjust nodeFunctionscan.c to reset transient memory context between calls to the table function, thus preventing memory leakage accumulation across calls. This means that SRFs need to be careful to distinguish permanent and local storage; adjust code and documentation accordingly. Patch by Joe Conway, very minor tweaks by Tom Lane.	2002-08-29 17:14:33 +00:00
Bruce Momjian	45e2544584	As discussed on several occasions previously, the new anonymous composite type capability makes it possible to create a system view based on a table function in a way that is hopefully palatable to everyone. The attached patch takes advantage of this, moving show_all_settings() from contrib/tablefunc into the backend (renamed all_settings()). It is defined as a builtin returning type RECORD. During initdb a system view is created to expose the same information presently available through SHOW ALL. For example: test=# select * from pg_settings where name like '%debug%'; name \| setting -----------------------+--------- debug_assertions \| on debug_pretty_print \| off debug_print_parse \| off debug_print_plan \| off debug_print_query \| off debug_print_rewritten \| off wal_debug \| 0 (7 rows) Additionally during initdb two rules are created which make it possible to change settings by updating the system view -- a "virtual table" as Tom put it. Here's an example: Joe Conway	2002-08-15 02:51:27 +00:00
Bruce Momjian	41f862ba87	As mentioned above, here is my contrib/tablefunc patch. It includes three functions which exercise the tablefunc API. show_all_settings() - returns the same information as SHOW ALL, but as a query result normal_rand(int numvals, float8 mean, float8 stddev, int seed) - returns a set of normally distributed float8 values - This routine implements Algorithm P (Polar method for normal deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2, 3rd ed., pages 122-126. Knuth cites his source as "The polar method", G. E. P. Box, M. E. Muller, and G. Marsaglia, _Annals_Math,_Stat._ 29 (1958), 610-611. crosstabN(text sql) - returns a set of row_name plus N category value columns - crosstab2(), crosstab3(), and crosstab4() are defined for you, but you can create additional crosstab functions per directions in the README. Joe Conway	2002-07-30 16:31:11 +00:00
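
The message of commit 64d0b8b05f says the hashed crosstab avoids sliding values to the left because each value is placed by looking up its category rather than by its arrival order. The C sketch below illustrates only that placement idea and is not the tablefunc.c code. The names `category_index` and `place_by_category` are hypothetical, a linear scan stands in for the hash table, and skipping values whose category is unknown is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the column index of `attr` within the
 * ordered category list `cats`, or -1 if it is not listed.
 * A linear scan is used here in place of a real hash table lookup.
 */
static int
category_index(const char *const *cats, int ncats, const char *attr)
{
    assert(cats != NULL && attr != NULL);

    for (int i = 0; i < ncats; i++)
    {
        if (strcmp(cats[i], attr) == 0)
            return i;
    }
    return -1;
}

/*
 * Fill out[0..ncats-1] for a single rowid. Each value is stored in the
 * column that belongs to its category; categories with no value stay NULL
 * instead of letting later values slide left.
 */
static void
place_by_category(const char *const *cats, int ncats,
                  const char *const *attrs, const char *const *vals,
                  int nvals, const char **out)
{
    for (int i = 0; i < ncats; i++)
        out[i] = NULL;

    for (int i = 0; i < nvals; i++)
    {
        int col = category_index(cats, ncats, attrs[i]);

        /* Assumption: values with an unlisted category are skipped. */
        if (col >= 0)
            out[col] = vals[i];
    }
}
```

With the `test1` row from the illustration (temperature, test_result, volts), the `test_startdate` slot stays NULL and `volts` still lands in the fourth column.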

19 Commits
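
The ancestor check suggested in the message of commit 620dddf88a wraps both the branch string and the current key in the branch delimiter before searching, so that key 1 is not mistaken for part of 11 or 111. A minimal C sketch of that idea follows. The function name `key_in_branch` is hypothetical, this is not the tablefunc.c code, and, as the message itself notes, the check is unreliable if a key contains the delimiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return true when `key` already appears as a
 * complete element of `branch`, whose elements are separated by `delim`.
 * Both strings are wrapped in the delimiter before the substring search.
 */
static bool
key_in_branch(const char *branch, const char *key, const char *delim)
{
    assert(branch != NULL && key != NULL && delim != NULL);

    size_t dlen = strlen(delim);
    size_t hay_len = dlen + strlen(branch) + dlen + 1;
    size_t needle_len = dlen + strlen(key) + dlen + 1;
    char *hay = malloc(hay_len);
    char *needle = malloc(needle_len);
    bool found = false;

    /* For simplicity, an allocation failure is reported as "not found". */
    if (hay != NULL && needle != NULL)
    {
        snprintf(hay, hay_len, "%s%s%s", delim, branch, delim);
        snprintf(needle, needle_len, "%s%s%s", delim, key, delim);
        found = strstr(hay, needle) != NULL;
    }

    free(hay);
    free(needle);
    return found;
}
```

For the branch `11-111` with delimiter `-`, a plain `strstr` on the unwrapped strings would match key `1`, while the wrapped search for `-1-` inside `-11-111-` does not.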