diff --git a/doc/TODO b/doc/TODO index d8bd5566f4..91609b717b 100644 --- a/doc/TODO +++ b/doc/TODO @@ -1,6 +1,6 @@ TODO list for PostgreSQL ======================== -Last updated: Sun Oct 10 16:40:40 EDT 1999 +Last updated: Sun Oct 17 21:06:50 EDT 1999 Current maintainer: Bruce Momjian (maillist@candle.pha.pa.us) @@ -229,8 +229,8 @@ INDEXES a matching index [limit] * Improve LIMIT processing by using index to limit rows processed [limit] * Have optimizer take LIMIT into account when considering index scans [limit] -* Make index creation use psort code, because it is now faster(Vadim) -* Allow creation of sort temp tables > 1 Gig +* -Make index creation use psort code, because it is now faster(Vadim) +* -Allow creation of sort temp tables > 1 Gig * Create more system table indexes for faster cache lookups * fix indexscan() so it does leak memory by not requiring caller to free * Improve _bt_binsrch() to handle equal keys better, remove _bt_firsteq()(Tom) diff --git a/doc/TODO.detail/pglog b/doc/TODO.detail/pglog new file mode 100644 index 0000000000..1810a8911f --- /dev/null +++ b/doc/TODO.detail/pglog @@ -0,0 +1,2900 @@ +From aoki@postgres.Berkeley.EDU Sun Jun 22 19:31:06 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA19488 + for ; Sun, 22 Jun 1997 19:31:03 -0400 (EDT) +Received: from faerie.CS.Berkeley.EDU (faerie.CS.Berkeley.EDU [128.32.37.53]) by renoir.op.net ($ Revision: 1.12 $) with SMTP id TAA18795 for ; Sun, 22 Jun 1997 19:18:06 -0400 (EDT) +Received: from localhost.Berkeley.EDU (localhost.Berkeley.EDU [127.0.0.1]) by faerie.CS.Berkeley.EDU (8.6.10/8.6.3) with SMTP id QAA07816 for maillist@candle.pha.pa.us; Sun, 22 Jun 1997 16:16:44 -0700 +Message-Id: <199706222316.QAA07816@faerie.CS.Berkeley.EDU> +X-Authentication-Warning: faerie.CS.Berkeley.EDU: Host localhost.Berkeley.EDU didn't use HELO protocol +From: aoki@CS.Berkeley.EDU (Paul M. 
Aoki) +To: Bruce Momjian +Subject: Re: PostgreSQL psort() function performance +Reply-To: aoki@CS.Berkeley.EDU (Paul M. Aoki) +In-reply-to: Your message of Sun, 22 Jun 1997 09:45:31 -0400 (EDT) + <199706221345.JAA11476@candle.pha.pa.us> +Date: Sun, 22 Jun 97 16:16:43 -0700 +Sender: aoki@postgres.Berkeley.EDU +X-Mts: smtp +Status: OR + +the mariposa distribution (http://mariposa.cs.berkeley.edu/) contains +some hacks to nodeSort.c and psort.c that + - make psort read directly from the executor node below it + (instead of an input relation) + - makes the Sort node read directly from the last set of psort runs + (instead of an output relation) +speeds things up quite a bit. kind of ruins psort for other purposes, +though (which is why nbtsort.c exists). + +i'd merge these in first and see how far that gets you. +-- + Paul M. Aoki | University of California at Berkeley + aoki@CS.Berkeley.EDU | Dept. of EECS, Computer Science Division #1776 + | Berkeley, CA 94720-1776 + +From owner-pgsql-hackers@hub.org Mon Nov 3 09:31:04 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA01676 + for ; Mon, 3 Nov 1997 09:31:02 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA07345 for ; Mon, 3 Nov 1997 09:13:20 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id IAA13315; Mon, 3 Nov 1997 08:50:26 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 08:48:07 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id IAA11722 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 08:48:02 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id IAA11539 for ; Mon, 3 Nov 1997 08:47:34 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with 
SMTP id UAA19066; Mon, 3 Nov 1997 20:48:04 +0700 (KRS) +Message-ID: <345DD614.345BF651@sable.krasnoyarsk.su> +Date: Mon, 03 Nov 1997 20:48:04 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Marc Howard Zuckman +CC: Bruce Momjian , hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Marc Howard Zuckman wrote: +> +> On Mon, 3 Nov 1997, Bruce Momjian wrote: +> +> > With fsync off, I just did an insert of 1000 integers into a table +> > containing a single int4 column and no indexes, and it completed in 2.3 +> > seconds. This is on the new source tree.. That is 434 inserts/second. +> > Pretty major performance, or 2.3 ms/insert. This is on a idle PP200 +> > with UltraSCSI drives. +> > +> > With fsync on, the time goes to 51 seconds. Wow, big difference. +> +> If better alternative error recovery methods were available, perhaps +> a facility to replay an interval transactions log from a prior dump, +> it would be reasonable to run the backend without fsync and +> take advantage of the performance gains. + +??? + +> +> I don't know the answer, but I suspect that the commercial databases +> don't "fsync" the way pgsql does. + +Could someone try 1000 int4 inserts using postgres and +some commercial database (on the same machine) ? 
+ +Vadim + + +From owner-pgsql-hackers@hub.org Mon Nov 3 09:01:02 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA01183 + for ; Mon, 3 Nov 1997 09:01:00 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id IAA06632 for ; Mon, 3 Nov 1997 08:51:58 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id IAA05964; Mon, 3 Nov 1997 08:39:39 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 08:37:32 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id IAA04729 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 08:37:26 -0500 (EST) +Received: from fallon.classyad.com (root@classyad.com [152.160.43.1]) by hub.org (8.8.5/8.7.5) with ESMTP id IAA04614 for ; Mon, 3 Nov 1997 08:37:16 -0500 (EST) +Received: from fallon.classyad.com (marc@fallon.classyad.com [152.160.43.1]) by fallon.classyad.com (8.8.5/8.7.3) with SMTP id JAA22108; Mon, 3 Nov 1997 09:11:09 -0500 +Date: Mon, 3 Nov 1997 09:11:09 -0500 (EST) +From: Marc Howard Zuckman +To: Bruce Momjian +cc: "Vadim B. Mikheev" , hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +In-Reply-To: <199711030513.AAA23474@candle.pha.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +On Mon, 3 Nov 1997, Bruce Momjian wrote: + +> > +> > Removed... +> > +> > Also, ItemPointerData t_chain (6 bytes) removed from HeapTupleHeader. +> > CommandId is uint32 now (up to the 2^32 - 1 commands per transaction). +> > DOUBLEALIGN(Sizeof(HeapTupleHeader)) is 40 bytes now. +> > +> > 1000 inserts (into table with single int4 column, 1 insert per transaction) +> > takes 70 - 80 sec now (12.5 - 14 transactions/sec). 
+> > This is hardware/OS limitation: +> > +> > fd = open ("t", O_RDWR); +> > for (i = 1; i <= 1000; i++) +> > { +> > lseek(fd, 0, SEEK_END); +> > write(fd, buf, 56); +> > fsync(fd); +> > } +> > close (fd); +> > +> > takes 33 - 39 sec and so it's not possible to be faster +> > having 2 fsync-s per transaction. +> > +> > The same test on 6.2.1: 92 - 107 sec +> +> With fsync off, I just did an insert of 1000 integers into a table +> containing a single int4 column and no indexes, and it completed in 2.3 +> seconds. This is on the new source tree.. That is 434 inserts/second. +> Pretty major performance, or 2.3 ms/insert. This is on a idle PP200 +> with UltraSCSI drives. +> +> With fsync on, the time goes to 51 seconds. Wow, big difference. + +If better alternative error recovery methods were available, perhaps +a facility to replay an interval transactions log from a prior dump, +it would be reasonable to run the backend without fsync and +take advantage of the performance gains. + +I don't know the answer, but I suspect that the commercial databases +don't "fsync" the way pgsql does. + +Marc Zuckman +marc@fallon.classyad.com + +_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ +_ Visit The Home and Condo MarketPlace _ +_ http://www.ClassyAd.com _ +_ _ +_ FREE basic property listings/advertisements and searches. _ +_ _ +_ Try our premium, yet inexpensive services for a real _ +_ selling or buying edge! 
_ +_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ + + + +From owner-pgsql-hackers@hub.org Mon Nov 3 11:31:03 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA04080 + for ; Mon, 3 Nov 1997 11:31:00 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id LAA13680 for ; Mon, 3 Nov 1997 11:21:30 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id LAA07566; Mon, 3 Nov 1997 11:04:52 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 11:02:59 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id LAA07372 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 11:02:52 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id LAA07196 for ; Mon, 3 Nov 1997 11:02:22 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id KAA02525; + Mon, 3 Nov 1997 10:42:03 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711031542.KAA02525@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 3 Nov 1997 10:42:03 -0500 (EST) +Cc: marc@fallon.classyad.com, hackers@postgreSQL.org +In-Reply-To: <345DD614.345BF651@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 08:48:04 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> > I don't know the answer, but I suspect that the commercial databases +> > don't "fsync" the way pgsql does. +> +> Could someone try 1000 int4 inserts using postgres and +> some commercial database (on the same machine) ? 
+ +I have been thinking about this since seeing the performance change +with/without fsync. + +Commercial databases usually do a log write every 5 or 15 minutes, and +guarantee the logs will contain everything up to this time interval. + +Couldn't we have some such mechanism? Usually they have raw space, so +they can control when the data is hitting the disk. Using a file +system, some of it may be getting to the disk without our knowing it. + +What exactly is a scenario where lack of doing explicit fsync's will +cause data corruption, rather than just lost data from the past few +minutes? + +I think Vadim has gotten fsync's down to fsync'ing the modified data +page, and pg_log. + +Let's suppose we did not fsync. There could be cases where pg_log was +fsync'ed by the OS, and some of the modified data pages were fsync'ed by +the OS, but not others. This would leave us with a partial transaction. + +However, let's suppose we prevent pg_log from being fsync'ed somehow. +Then, because we have a no-overwrite database, we could keep control of +this, and a write of some data pages, but not others, would not cause us +problems because the pg_log would show all such transactions, which had +not had all their modified data pages fsync'ed, as non-committed. + +Perhaps we can even set a flag in pg_log every five minutes to indicate +whether all buffers for the page have been flushed? That way we would +not have to worry about preventing flushing of pg_log. + +Comments? 
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Mon Nov 3 12:00:42 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA04456 + for ; Mon, 3 Nov 1997 12:00:40 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id LAA26054; Mon, 3 Nov 1997 11:46:49 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 11:46:33 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id LAA25932 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 11:46:30 -0500 (EST) +Received: from orion.SAPserv.Hamburg.dsh.de (polaris.sapserv.debis.de [53.2.131.8]) by hub.org (8.8.5/8.7.5) with SMTP id LAA25750 for ; Mon, 3 Nov 1997 11:45:53 -0500 (EST) +Received: by orion.SAPserv.Hamburg.dsh.de + (Linux Smail3.1.29.1 #1)} + id m0xSPfE-000BGZC; Mon, 3 Nov 97 17:47 MET +Message-Id: +From: wieck@sapserv.debis.de +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: maillist@candle.pha.pa.us (Bruce Momjian) +Date: Mon, 3 Nov 1997 17:47:43 +0100 (MET) +Cc: vadim@sable.krasnoyarsk.su, marc@fallon.classyad.com, + hackers@postgreSQL.org +Reply-To: wieck@sapserv.debis.de (Jan Wieck) +In-Reply-To: <199711031542.KAA02525@candle.pha.pa.us> from "Bruce Momjian" at Nov 3, 97 10:42:03 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=iso-8859-1 +Content-Transfer-Encoding: 8bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> > > I don't know the answer, but I suspect that the commercial databases +> > > don't "fsync" the way pgsql does. +> > +> > Could someone try 1000 int4 inserts using postgres and +> > some commercial database (on the same machine) ? +> +> I have been thinking about this since seeing the performance change +> with/without fsync. 
+> +> Commercial databases usually do a log write every 5 or 15 minutes, and +> guarantee the logs will contain everything up to this time interval. +> + + Without fsync PostgreSQL would only lose data if the OS + crashes between the last write operation of a backend and the + next regular update sync. This is rare, but if it happens it + really hurts. + + A database can omit fsync on data files (e.g. tablespaces) if + it writes a redo log. With that redo log, a backup can be + restored and then all transactions since the backup redone. + + PostgreSQL doesn't write such a redo log. So an OS crash + after the fsync of pg_log could corrupt the database without + a chance to recover. + + Isn't it time to get an (optional) redo log? I don't exactly + know all the places where our datafiles can get modified, but + I hope this is only done in the heap access methods and + vacuum. So these are the places where the redo log data + comes from (plus transaction commit/rollback). + + +Until later, Jan + +-- +#define OPINIONS "they are all mine - not those of debis or daimler-benz" + +#======================================================================# +# It's easier to get forgiveness for being wrong than for being right. # +# Let's break this rule - forgive me. 
# +#================================== wieck@sapserv.debis.de (Jan Wieck) # + + + + +From owner-pgsql-hackers@hub.org Mon Nov 3 14:01:06 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA06775 + for ; Mon, 3 Nov 1997 14:01:04 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA22235 for ; Mon, 3 Nov 1997 13:43:15 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id NAA11482; Mon, 3 Nov 1997 13:32:40 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 13:32:02 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id NAA11204 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 13:31:58 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id NAA11119 for ; Mon, 3 Nov 1997 13:31:44 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id MAA05464; + Mon, 3 Nov 1997 12:59:01 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711031759.MAA05464@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: wieck@sapserv.debis.de +Date: Mon, 3 Nov 1997 12:59:01 -0500 (EST) +Cc: vadim@sable.krasnoyarsk.su, marc@fallon.classyad.com, + hackers@postgreSQL.org +In-Reply-To: from "wieck@sapserv.debis.de" at Nov 3, 97 05:47:43 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> > +> > > > I don't know the answer, but I suspect that the commercial databases +> > > > don't "fsync" the way pgsql does. +> > > +> > > Could someone try 1000 int4 inserts using postgres and +> > > some commercial database (on the same machine) ? 
+> > +> > I have been thinking about this since seeing the performance change +> > with/without fsync. +> > +> > Commercial databases usually do a log write every 5 or 15 minutes, and +> > guarantee the logs will contain everything up to this time interval. +> > +> +> Without fsync PostgreSQL would only lose data if the OS +> crashes between the last write operation of a backend and the +> next regular update sync. This is rare, but if it happens it +> really hurts. +> +> A database can omit fsync on data files (e.g. tablespaces) if +> it writes a redo log. With that redo log, a backup can be +> restored and then all transactions since the backup redone. +> +> PostgreSQL doesn't write such a redo log. So an OS crash +> after the fsync of pg_log could corrupt the database without +> a chance to recover. +> +> Isn't it time to get an (optional) redo log? I don't exactly +> know all the places where our datafiles can get modified, but +> I hope this is only done in the heap access methods and +> vacuum. So these are the places where the redo log data +> comes from (plus transaction commit/rollback). +> + +Yes, but because we are a no-overwrite database, I don't see why we +can't just do this without a redo log. + +Every five minutes, we fsync() all dirty pages, mark all completed +transactions as fsync'ed in pg_log, and fsync() pg_log. + +On postmaster startup, any transaction marked as completed but not +marked as fsync'ed gets marked as aborted. + +Of course, all vacuum operations would have to be fsync'ed. + +Comments? 
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Mon Nov 3 16:46:01 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA10292 + for ; Mon, 3 Nov 1997 16:45:59 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id QAA02040 for ; Mon, 3 Nov 1997 16:42:40 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA17422; Mon, 3 Nov 1997 16:34:28 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 16:34:10 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA17210 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 16:34:06 -0500 (EST) +Received: from fallon.classyad.com (root@classyad.com [152.160.43.1]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA16690 for ; Mon, 3 Nov 1997 16:33:27 -0500 (EST) +Received: from fallon.classyad.com (marc@fallon.classyad.com [152.160.43.1]) by fallon.classyad.com (8.8.5/8.7.3) with SMTP id RAA32498; Mon, 3 Nov 1997 17:33:42 -0500 +Date: Mon, 3 Nov 1997 17:33:42 -0500 (EST) +From: Marc Howard Zuckman +To: Bruce Momjian +cc: wieck@sapserv.debis.de, vadim@sable.krasnoyarsk.su, hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +In-Reply-To: <199711031759.MAA05464@candle.pha.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +On Mon, 3 Nov 1997, Bruce Momjian wrote: + +> > +> > > +> > > > > I don't know the answer, but I suspect that the commercial databases +> > > > > don't "fsync" the way pgsql does. +> > > > +> > > > Could someone try 1000 int4 inserts using postgres and +> > > > some commercial database (on the same machine) ? +> > > +> > > I have been thinking about this since seeing the performance change +> > > with/without fsync. 
+> > > +> > > Commercial databases usually do a log write every 5 or 15 minutes, and +> > > guarantee the logs will contain everything up to this time interval. +> > > +> > +> > Without fsync PostgreSQL would only lose data if the OS +> > crashes between the last write operation of a backend and the +> > next regular update sync. This is rare, but if it happens it +> > really hurts. +> > +> > A database can omit fsync on data files (e.g. tablespaces) if +> > it writes a redo log. With that redo log, a backup can be +> > restored and then all transactions since the backup redone. +> > +> > PostgreSQL doesn't write such a redo log. So an OS crash +> > after the fsync of pg_log could corrupt the database without +> > a chance to recover. +> > +> > Isn't it time to get an (optional) redo log? I don't exactly +> > know all the places where our datafiles can get modified, but +> > I hope this is only done in the heap access methods and +> > vacuum. So these are the places where the redo log data +> > comes from (plus transaction commit/rollback). +> > +> +> Yes, but because we are a no-overwrite database, I don't see why we +> can't just do this without a redo log. + +Because if the hard drive is the reason for the failure (instead of +power out, OS bites dust, etc), the database won't be of much help. + +The redo log should be on a different device than the database. + +Marc Zuckman +marc@fallon.classyad.com + +_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ +_ Visit The Home and Condo MarketPlace _ +_ http://www.ClassyAd.com _ +_ _ +_ FREE basic property listings/advertisements and searches. _ +_ _ +_ Try our premium, yet inexpensive services for a real _ +_ selling or buying edge! 
_ +_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ + + + +From maillist Mon Nov 3 22:59:31 1997 +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id WAA16264; + Mon, 3 Nov 1997 22:59:31 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711040359.WAA16264@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: maillist@candle.pha.pa.us (Bruce Momjian) +Date: Mon, 3 Nov 1997 22:59:30 -0500 (EST) +Cc: vadim@sable.krasnoyarsk.su, marc@fallon.classyad.com, + hackers@postgreSQL.org +In-Reply-To: <199711031542.KAA02525@candle.pha.pa.us> from "Bruce Momjian" at Nov 3, 97 10:42:03 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Status: OR + +> +> > > I don't know the answer, but I suspect that the commercial databases +> > > don't "fsync" the way pgsql does. +> > +> > Could someone try 1000 int4 inserts using postgres and +> > some commercial database (on the same machine) ? +> +> I have been thinking about this since seeing the performance change +> with/without fsync. +> +> Commercial databases usually do a log write every 5 or 15 minutes, and +> guarantee the logs will contain everything up to this time interval. +> +> Couldn't we have some such mechanism? Usually they have raw space, so +> they can control when the data is hitting the disk. Using a file +> system, some of it may be getting to the disk without our knowing it. +> +> What exactly is a scenario where lack of doing explicit fsync's will +> cause data corruption, rather than just lost data from the past few +> minutes? +> +> I think Vadim has gotten fsync's down to fsync'ing the modified data +> page, and pg_log. +> +> Let's suppose we did not fsync. There could be cases where pg_log was +> fsync'ed by the OS, and some of the modified data pages are fyncs'ed by +> the OS, but not others. This would leave us with a partial transaction. 
+> +> However, let's suppose we prevent pg_log from being fsync'ed somehow. +> Then, because we have a no-overwrite database, we could keep control of +> this, and a write of some data pages, but not others, would not cause us +> problems because the pg_log would show all such transactions, which had +> not had all their modified data pages fsync'ed, as non-committed. +> +> Perhaps we can even set a flag in pg_log every five minutes to indicate +> whether all buffers for the page have been flushed? That way we would +> not have to worry about preventing flushing of pg_log. +> +> Comments? + +OK, here is a more formal description of what I am suggesting. It will +give us commercial dbms reliability with no-fsync performance. +Commercial dbms's usually only give restore up to 5 minutes before the +crash, and this is what I am suggesting. If we can do this, we can +remove the no-fsync option. + +First, let's suppose there exists a shared queue that is visible to all +backends and the postmaster that allows transaction id's to be added to +the queue. We also add a bit to the pg_log record called 'been_synced' +that is initially false. + +OK, once a backend starts a transaction, it puts a transaction id in +pg_log. Once the transaction is finished, it is marked as committed. +At the same time, we now put the transaction id on the shared queue. + +Every five minutes, or as defined by the administrator, the postmaster +does a sync() call. On my OS, any user can call sync, and I think +this is typical. update/pagecleaner does this every 30 seconds anyway, +so it is no big deal for the postmaster to call it every 5 minutes. The +nice thing about this is that the OS does the syncing of all the dirty +pages for us. (An alarm() call can set up this 5 minute timing.) + +The postmaster then locks the shared transaction id queue, makes a copy +of the entries in the queue, clears the queue, and unlocks the queue. 
+It does this so no one else modifies the queue while it is being +cleared. + +The postmaster then goes through pg_log, and marks each transaction as +'been_synced'. + +The postmaster also performs this on shutdown. + +On postmaster startup, all transactions are checked and any transaction +that is marked as committed but not 'been_synced' is marked as not +committed. In this way, we prevent non-synced or partially synced +transactions from being used. + +Of course, vacuum would have to do normal fsyncs because it is removing +the transaction log. + +We need the shared transaction id queue because there is no way to find +the newly committed transactions since the last sync. A transaction +can last for hours. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + +From owner-pgsql-hackers@hub.org Tue Nov 4 02:13:08 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA17544 + for ; Tue, 4 Nov 1997 02:13:06 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id CAA14126; Tue, 4 Nov 1997 02:07:55 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 02:04:59 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id CAA12859 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 02:04:51 -0500 (EST) +Received: from orion.SAPserv.Hamburg.dsh.de (polaris.sapserv.debis.de [53.2.131.8]) by hub.org (8.8.5/8.7.5) with SMTP id CAA12625 for ; Tue, 4 Nov 1997 02:04:12 -0500 (EST) +Received: by orion.SAPserv.Hamburg.dsh.de + (Linux Smail3.1.29.1 #1)} + id m0xSd44-000BFQC; Tue, 4 Nov 97 08:06 MET +Message-Id: +From: wieck@sapserv.debis.de +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! 
+To: maillist@candle.pha.pa.us (Bruce Momjian) +Date: Tue, 4 Nov 1997 08:06:16 +0100 (MET) +Cc: maillist@candle.pha.pa.us, vadim@sable.krasnoyarsk.su, + marc@fallon.classyad.com, hackers@postgreSQL.org +Reply-To: wieck@sapserv.debis.de (Jan Wieck) +In-Reply-To: <199711040359.WAA16264@candle.pha.pa.us> from "Bruce Momjian" at Nov 3, 97 10:59:30 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=iso-8859-1 +Content-Transfer-Encoding: 8bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> OK, here is a more formal description of what I am suggesting. It will +> give us commercial dbms reliability with no-fsync performance. +> Commercial dbms's usually only give restore up to 5 minutes before the +> crash, and this is what I am suggesting. If we can do this, we can +> remove the no-fsync option. + + I'm not 100% sure but as far as I know Oracle, it can recover + up to the last committed transaction using the online redo + logs. And even if commercial dbms's aren't able to do that, + it should be our target. + +> [description about transaction queue] + + This all depends on the fact that PostgreSQL is a + no-overwrite dbms. Otherwise the space of deleted tuples might + get overwritten by later transactions and the information is + finally lost. + + Another issue: All we have thought of up to now are crashes where + the database files are still usable after restart. But take + the simple case of a write error. A new bad block or track + will get remapped (in some way) but the data in it is lost. + So we end up with one or more totally corrupted database + files. And I don't trust mirrored disks farther than I can + throw them. A bug in the OS or a memory failure (many new + PeeCee boards don't support parity and even with parity a two + bit failure is still the wrong data but with a valid parity + bit) can also corrupt the data. + + I still prefer redo logs. 
They should reside on a different + disk and the possibility of losing the database files along + with the redo log is very small. + + +Until later, Jan + +-- +#define OPINIONS "they are all mine - not those of debis or daimler-benz" + +#======================================================================# +# It's easier to get forgiveness for being wrong than for being right. # +# Let's break this rule - forgive me. # +#================================== wieck@sapserv.debis.de (Jan Wieck) # + + + + +From vadim@sable.krasnoyarsk.su Tue Nov 4 04:12:50 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA18487 + for ; Tue, 4 Nov 1997 04:12:48 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA03152 for ; Tue, 4 Nov 1997 04:12:06 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id QAA20591; Tue, 4 Nov 1997 16:14:06 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <345EE75D.398A68D@sable.krasnoyarsk.su> +Date: Tue, 04 Nov 1997 16:14:05 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: marc@fallon.classyad.com, hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: <199711040359.WAA16264@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> OK, here is a more formal description of what I am suggesting. It will +> give us commercial dbms reliability with no-fsync performance. +> Commercial dbms's usually only give restore up to 5 minutes before the + ^^^^^^^^^^^^^^^^^^^^^^^ +I'm sure that this is not true! +If on-line redo_file is damaged then you have +single ability: restore your last backup. 
+In all other cases database will be recovered up to the last +committed transaction automatically! + +DBMS-s using WAL have to fsync only redo file on commit +(and they do it!), non-overwriting systems have to +fsync data files and transaction log. + +We could optimize fsync-s for multi-user environment: do not +fsync when we're ensured that our changes flushed to disk by +another backend. + +> crash, and this is what I am suggesting. If we can do this, we can +> remove the no-fsync option. +> +... +> +> On postmaster startup, all transactions are checked and any transaction +> that is marked as committed but not 'been_synced' is marked as not +> committed. In this way, we prevent non-synced or partially synced +> transactions from being used. + +And what should users (ensured that their transaction are +committed) do in this case ? + +Vadim + + +From owner-pgsql-hackers@hub.org Tue Nov 4 06:43:00 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA19743 + for ; Tue, 4 Nov 1997 06:42:57 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id GAA10352; Tue, 4 Nov 1997 06:36:08 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 06:35:42 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id GAA10158 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 06:35:37 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id GAA10096 for ; Tue, 4 Nov 1997 06:35:27 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id GAA19665; + Tue, 4 Nov 1997 06:35:10 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711041135.GAA19665@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: wieck@sapserv.debis.de +Date: Tue, 4 Nov 1997 06:35:10 -0500 (EST) +Cc: hackers@postgreSQL.org (PostgreSQL-development) +In-Reply-To: from "wieck@sapserv.debis.de" at Nov 4, 97 08:06:16 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> > OK, here is a more formal description of what I am suggesting. It will +> > give us commercial dbms reliability with no-fsync performance. +> > Commercial dbms's usually only give restore up to 5 minutes before the +> > crash, and this is what I am suggesting. If we can do this, we can +> > remove the no-fsync option. +> +> I'm not 100% sure but as far as I know Oracle, it can recover +> up to the last committed transaction using the online redo +> logs. And even if commercial dbms's aren't able to do that, +> it should be our target.
+> +> > [description about transaction queue] +> +> This all depends on the fact that PostgreSQL is a no +> overwrite dbms. Otherwise the space of deleted tuples might +> get overwritten by later transactions and the information is +> finally lost. +> +> Another issue: All we up to now though of are crashes where +> the database files are still usable after restart. But take +> the simple case of a write error. A new bad block or track +> will get remapped (in some way) but the data in it is lost. +> So we end up with one or more totally corrupted database +> files. And I don't trust mirrored disks farer than I can +> throw them. A bug in the OS or a memory failure (many new +> PeeCee boards don't support parity and even with parity a two +> bit failure is still the wrong data but with a valid parity +> bit) can also currupt the data. +> +> I still prefer redo logs. They should reside on a different +> disk and the possibility of loosing the database files along +> with the redo log is very small. + +I have been thinking about re-do logs, and I think it is a good idea. +It would not be hard to have the queries spit out to a separate file +configurable by the user. 
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Tue Nov 4 07:31:01 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA22051 + for ; Tue, 4 Nov 1997 07:30:59 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA07444 for ; Tue, 4 Nov 1997 07:25:14 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id HAA08818; Tue, 4 Nov 1997 07:03:30 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 07:02:44 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id HAA08418 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 07:02:29 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA08331 for ; Tue, 4 Nov 1997 07:02:07 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id GAA21484; + Tue, 4 Nov 1997 06:50:24 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711041150.GAA21484@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Tue, 4 Nov 1997 06:50:24 -0500 (EST) +Cc: marc@fallon.classyad.com, hackers@postgreSQL.org +In-Reply-To: <345EE75D.398A68D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 4, 97 04:14:05 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Bruce Momjian wrote: +> > +> > OK, here is a more formal description of what I am suggesting. It will +> > give us commercial dbms reliability with no-fsync performance. 
+> > Commercial dbms's usually only give restore up to 5 minutes before the +> ^^^^^^^^^^^^^^^^^^^^^^^ +> I'm sure that this is not true! + +You may be right. This five minute figure is when you restore from your +previous backup, then restore from the log file. + +Can't we do something like sync every 5 seconds, rather than after every +transaction? It just seems like such overkill. + +Actually, I found a problem with my description. Because pg_log is not +fsync'ed, after a crash, pages with new transactions could have been +flushed to disk, but not the pg_log table that contains the transaction +ids. The problem is that the new backend could assign a transaction id +that is already in use. + +We could set a flag upon successful shutdown, and if it is not set on +reboot, either do a vacuum to find the max transaction id, and +invalidate all them not in pg_log as synced, or increase the next +transaction id to some huge number and invalidate all them in between. + + +> If on-line redo_file is damaged then you have +> single ability: restore your last backup. +> In all other cases database will be recovered up to the last +> committed transaction automatically! +> +> DBMS-s using WAL have to fsync only redo file on commit +> (and they do it!), non-overwriting systems have to +> fsync data files and transaction log. +> +> We could optimize fsync-s for multi-user environment: do not +> fsync when we're ensured that our changes flushed to disk by +> another backend. +> +> > crash, and this is what I am suggesting. If we can do this, we can +> > remove the no-fsync option. +> > +> ... +> > +> > On postmaster startup, all transactions are checked and any transaction +> > that is marked as committed but not 'been_synced' is marked as not +> > committed. In this way, we prevent non-synced or partially synced +> > transactions from being used. +> +> And what should users (ensured that their transaction are +> committed) do in this case ? 
+> +> Vadim +> +> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From wieck@sapserv.debis.de Tue Nov 4 07:01:00 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA21697 + for ; Tue, 4 Nov 1997 07:00:58 -0500 (EST) +From: wieck@sapserv.debis.de +Received: from orion.SAPserv.Hamburg.dsh.de (polaris.sapserv.debis.de [53.2.131.8]) by renoir.op.net (o1/$ Revision: 1.14 $) with SMTP id GAA06401 for ; Tue, 4 Nov 1997 06:48:25 -0500 (EST) +Received: by orion.SAPserv.Hamburg.dsh.de + (Linux Smail3.1.29.1 #1)} + id m0xShVQ-000BGZC; Tue, 4 Nov 97 12:50 MET +Message-Id: +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: maillist@candle.pha.pa.us (Bruce Momjian) +Date: Tue, 4 Nov 1997 12:50:45 +0100 (MET) +Cc: wieck@sapserv.debis.de, hackers@postgreSQL.org +Reply-To: wieck@sapserv.debis.de (Jan Wieck) +In-Reply-To: <199711041135.GAA19665@candle.pha.pa.us> from "Bruce Momjian" at Nov 4, 97 06:35:10 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=iso-8859-1 +Content-Transfer-Encoding: 8bit +Status: OR + + +Bruce Momjian wrote: +> I have been thinking about re-do logs, and I think it is a good idea. +> It would not be hard to have the queries spit out to a separate file +> configurable by the user. + + This way the recovery process will be very complicated. When + multiple backends run concurrently, there are multiple + transactions active at the same time. And what tuples are + affected by an update e.g. depends much on the timing. + + I had something different in mind. The redo log contains the + information from the executor (e.g. the transactionId, the + tupleId and the new tuple values when calling ExecReplace()) + and the information which transactions commit and which not. 
+ When recovering, those operations where the transactions + committed are again passed to the executors functions that do + the real updates with the values from the logfile. + + +Until later, Jan + +-- +#define OPINIONS "they are all mine - not those of debis or daimler-benz" + +#======================================================================# +# It's easier to get forgiveness for being wrong than for being right. # +# Let's break this rule - forgive me. # +#================================== wieck@sapserv.debis.de (Jan Wieck) # + + + +From owner-pgsql-hackers@hub.org Tue Nov 4 07:30:59 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA22048 + for ; Tue, 4 Nov 1997 07:30:57 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA07189 for ; Tue, 4 Nov 1997 07:18:02 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id HAA08856; Tue, 4 Nov 1997 07:03:37 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 07:03:03 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id HAA08487 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 07:02:46 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA08192 for ; Tue, 4 Nov 1997 07:02:02 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id HAA21653; + Tue, 4 Nov 1997 07:00:20 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711041200.HAA21653@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!u +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Tue, 4 Nov 1997 07:00:19 -0500 (EST) +Cc: marc@fallon.classyad.com, hackers@postgreSQL.org +In-Reply-To: <345EE75D.398A68D@sable.krasnoyarsk.su> from "Vadim B. 
Mikheev" at Nov 4, 97 04:14:05 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Bruce Momjian wrote: +> > +> > OK, here is a more formal description of what I am suggesting. It will +> > give us commercial dbms reliability with no-fsync performance. +> > Commercial dbms's usually only give restore up to 5 minutes before the +> ^^^^^^^^^^^^^^^^^^^^^^^ +> I'm sure that this is not true! +> If on-line redo_file is damaged then you have +> single ability: restore your last backup. +> In all other cases database will be recovered up to the last +> committed transaction automatically! + +I doubt commercial dbms's sync to disk after every transaction. They +pick a time, maybe five seconds, and see all dirty pages get flushed by +then. + +What they do do is to make certain that you are restored to a consistent +state, perhaps 15 seconds ago. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Tue Nov 4 07:32:45 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA22066 + for ; Tue, 4 Nov 1997 07:32:35 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id TAA20889; Tue, 4 Nov 1997 19:35:12 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <345F1680.60E33853@sable.krasnoyarsk.su> +Date: Tue, 04 Nov 1997 19:35:12 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Jan Wieck +CC: Bruce Momjian , marc@fallon.classyad.com, + hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! 
+References: +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +wieck@sapserv.debis.de wrote: +> +> I still prefer redo logs. They should reside on a different +> disk and the possibility of loosing the database files along +> with the redo log is very small. + +Agreed. This way we could skip the fsync of the data files and +fsync only the redo log and pg_log. This is much faster. + +Vadim + +From vadim@sable.krasnoyarsk.su Tue Nov 4 08:00:58 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA22371 + for ; Tue, 4 Nov 1997 08:00:56 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA08540 for ; Tue, 4 Nov 1997 07:57:25 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id TAA20935; Tue, 4 Nov 1997 19:59:46 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <345F1C42.1F1A7590@sable.krasnoyarsk.su> +Date: Tue, 04 Nov 1997 19:59:46 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Jan Wieck +CC: Bruce Momjian , hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +wieck@sapserv.debis.de wrote: +> +> Bruce Momjian wrote: +> > I have been thinking about re-do logs, and I think it is a good idea. +> > It would not be hard to have the queries spit out to a separate file +> > configurable by the user. +> +> This way the recovery process will be very complicated. When +> multiple backends run concurrently, there are multiple +> transactions active at the same time. And what tuples are +> affected by an update e.g. depends much on the timing. +> +> I had something different in mind.
The redo log contains the +> information from the executor (e.g. the transactionId, the +> tupleId and the new tuple values when calling ExecReplace()) +> and the information which transactions commit and which not. +> When recovering, those operations where the transactions +> committed are again passed to the executors functions that do +> the real updates with the values from the logfile. + +It seems that this is what Oracle does, but Sybase writes queries +(with transaction ids, of 'course, and before execution) and +begin, commit/abort events <-- this is better for non-overwriting +system (shorter redo file), but, agreed, recovering is more complicated. + +Vadim + +From owner-pgsql-hackers@hub.org Tue Nov 4 22:35:45 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA05060 + for ; Tue, 4 Nov 1997 22:35:43 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA26725 for ; Tue, 4 Nov 1997 22:35:10 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA27875; Tue, 4 Nov 1997 22:23:14 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 22:20:55 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA24162 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 22:20:50 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA22727 for ; Tue, 4 Nov 1997 22:20:18 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id WAA04674; + Tue, 4 Nov 1997 22:17:52 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711050317.WAA04674@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: vadim@sable.krasnoyarsk.su (Vadim B. 
Mikheev) +Date: Tue, 4 Nov 1997 22:17:52 -0500 (EST) +Cc: marc@fallon.classyad.com, hackers@postgreSQL.org +In-Reply-To: <345F14E7.28CC1042@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 4, 97 07:28:23 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Bruce Momjian wrote: +> > +> > > +> > > Bruce Momjian wrote: +> > > > +> > > > OK, here is a more formal description of what I am suggesting. It will +> > > > give us commercial dbms reliability with no-fsync performance. +> > > > Commercial dbms's usually only give restore up to 5 minutes before the +> > > ^^^^^^^^^^^^^^^^^^^^^^^ +> > > I'm sure that this is not true! +> > +> > You may be right. This five minute figure is when you restore from your +> > previous backup, then restore from the log file. +> > +> > Can't we do something like sync every 5 seconds, rather than after every +> > transaction? It just seems like such overkill. +> +> Isn't -F and sync in crontab the same ? + +OK, let me again try to marshal some (any?) support for my suggestion. + +Informix version 5/7 has three levels of logging: unbuffered +logging (our normal fsync mode), buffered logging, and no logging (our no +fsync mode). + +We don't have buffered logging. Buffered logging guarantees you get put +back to a consistent state after an os/server crash, usually to within +30/90 seconds. You do not have any partial transactions lying around, +but you do have some transactions that you thought were done, but are +not. + +This is faster than non-buffered logging, but not as fast as no logging. +Guess what mode everyone uses? The one we don't have, buffered logging! + +Unbuffered logging performance is terrible. Non-buffered logging is +used to load huge chunks of data during off-hours.
+ +The problem we have is that we fsync every transaction, which causes a +9-times slowdown in performance on single-integer inserts. + +That is a pretty heavy cost. But the alternative we give people is +no-fsync mode, where we don't sync anything, and in a crash, you could +come back with partially committed data in your database, if pg_log was +sync'ed by the database, and only some of the data pages were sync'ed, +so if any data was changing within 30 seconds of the crash, you have to +restore your previous backup. + +We really need a middle solution, that gives better data integrity, for +a smaller price. + +> +> > +> > Actually, I found a problem with my description. Because pg_log is not +> > fsync'ed, after a crash, pages with new transactions could have been +> > flushed to disk, but not the pg_log table that contains the transaction +> > ids. The problem is that the new backend could assign a transaction id +> > that is already in use. +> +> Impossible. Backend flushes pg_variable after fetching nex 32 xids. + +My suggestion is that we don't need to flush pg_variable or pg_log that +much. My suggestion would speed up the test you do with 100 inserts +inside a single transaction vs. 100 separate inserts. + +> > +> > We could set a flag upon successful shutdown, and if it is not set on +> > reboot, either do a vacuum to find the max transaction id, and +> > invalidate all them not in pg_log as synced, or increase the next +> > transaction id to some huge number and invalidate all them in between. +> > + +I have a fix for the problem stated above, and it doesn't require a +vacuum. + +We decide to fsync pg_variable and pg_log every 10,000 transactions or +oids. Then if the database is brought up, and it was not brought down +cleanly, you increment oid and transaction_id by 10,000, because you +know you couldn't have gotten more than that. All intermediate +transactions that are not marked committed/synced are marked aborted. 
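The recovery rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `SYNC_INTERVAL`, `recover_next_xid` and `abort_unsynced` are hypothetical and do not exist in PostgreSQL, and the transaction status table is modelled as a plain dictionary rather than pg_log.

```python
# Hypothetical sketch of the "fsync the counters every N ids" recovery rule.
# All names here are illustrative; none of them are PostgreSQL functions.

SYNC_INTERVAL = 10000  # ids handed out between fsyncs of pg_variable/pg_log


def recover_next_xid(last_synced_xid, clean_shutdown, interval=SYNC_INTERVAL):
    """Return the first transaction id that is safe to hand out at startup."""
    if clean_shutdown:
        # The counter on disk is current, so continue from it.
        return last_synced_xid
    # After a crash at most `interval` ids were used since the last fsync,
    # so skipping ahead by `interval` cannot collide with a used id.
    return last_synced_xid + interval


def abort_unsynced(status, last_synced_xid, next_xid):
    """Mark every id in [last_synced_xid, next_xid) as aborted unless it is
    recorded as both committed and synced."""
    recovered = dict(status)
    for xid in range(last_synced_xid, next_xid):
        if recovered.get(xid) != "committed_synced":
            recovered[xid] = "aborted"
    return recovered
```

The cost of the scheme is that up to `SYNC_INTERVAL` ids are thrown away after every unclean shutdown, which is the trade the proposal accepts in exchange for fewer fsyncs.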
+ +--------------------------------------------------------------------------- + +The problem we have with the current system is that we sync by action, +not by time interval. If you are doing tons of inserts or updates, it +is syncing after every one. What people really want is something that +will sync not after every action, but after every minute or five +minutes, so when the system is busy, syncing every minute costs only a +small amount, and when the system is idle, no one cares if it syncs, and +no one has to wait for the sync to complete. + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From matti@algonet.se Wed Nov 5 11:02:33 1997 +Received: from smtp.algonet.se (tomei.algonet.se [194.213.74.114]) + by candle.pha.pa.us (8.8.5/8.8.5) with SMTP id LAA02099 + for ; Wed, 5 Nov 1997 11:02:28 -0500 (EST) +Received: (qmail 6685 invoked from network); 5 Nov 1997 17:01:06 +0100 +Received: from du228-6.ppp.algonet.se (HELO gamma) (root@195.100.6.228) + by tomei.algonet.se with SMTP; 5 Nov 1997 17:01:06 +0100 +Sender: root +Message-ID: <34609871.27EED9D@algonet.se> +Date: Wed, 05 Nov 1997 17:02:16 +0100 +From: Mattias Kregert +Organization: Algonet ISP +X-Mailer: Mozilla 3.0Gold (X11; I; Linux 2.0.29 i586) +MIME-Version: 1.0 +To: Bruce Momjian +CC: pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: <199711050317.WAA04674@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> We don't have buffered logging. Buffered logging guarantees you get put +> back to a consistent state after an os/server crash, usually to within +> 30/90 seconds. You do not have any partial transactions lying around, +> but you do have some transactions that you thought were done, but are + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> not. + ^^^^ +> +> This is faster then non-buffered logging, but not as fast as no logging.
+> Guess what mode everyone uses? The one we don't have, buffered logging! + +Ouch! I would *not* like to use "buffered logging". +What's the point in having the wrong data in the database and not +knowing what updates, inserts or deletes to do to get the correct data? + +That's irrecoverable loss of data. Not what *I* want. Do *you* want it? + + +> We really need a middle solution, that gives better data integrity, for +> a smaller price. + +What I would like to have is this: + +If a backend tells the frontend that a transaction has completed, +then that transaction should absolutely not get lost in case of a crash. + +What is needed is a log of changes since the last backup. This +log would preferrably reside on a remote machine or at least +another disk. Then, if the power goes in the middle of a disk write, +the disk explodes and the computer goes up in flames, you can +install Postgresql on a new machine, restore the last backup and +re-run the change log. + + +> The problem we have with the current system is that we sync by action, +> not by time interval. If you are doing tons of inserts or updates, it +> is syncing after every one. What people really want is something that +> will sync not after every action, but after every minute or five +> minutes, so when the system is busy, the syncing every minutes is just a +> small amount, and when the system is idle, no one cares if is syncs, and +> no one has to wait for the sync to complete. + +Yes, but this would only be the first step on the way to better +crash-recovery. 
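The difference between syncing by action and syncing by time interval, as quoted above, can be made concrete with a small count. This is a sketch under assumptions: the function names and the timestamps are made up for illustration, and a "sync" is simply counted, not performed.

```python
# Hypothetical comparison of "sync per action" and "sync per interval".
# The function names and the sample timestamps are illustrative only.

def syncs_per_action(commit_times):
    """One sync for every committed action."""
    return len(commit_times)


def syncs_per_interval(commit_times, interval):
    """At most one sync per interval, and only if something changed in it."""
    syncs = 0
    next_sync = None  # time at which the pending changes get flushed
    for t in sorted(commit_times):
        if next_sync is None or t >= next_sync:
            # First change after the previous flush: schedule a new flush.
            syncs += 1
            next_sync = t + interval
    return syncs
```

With commits at seconds 0, 1, 2, 3, 59, 200 and 201 and a 60-second interval, the per-action policy syncs seven times and the interval policy twice. The price, as the reply above points out, is that a commit acknowledged inside an interval can still be lost in a crash.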
+ +/* m */ + +From vadim@sable.krasnoyarsk.su Wed Nov 5 12:20:23 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA05156 + for ; Wed, 5 Nov 1997 12:20:13 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id LAA24123 for ; Wed, 5 Nov 1997 11:44:49 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id XAA23062; Wed, 5 Nov 1997 23:48:52 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <3460A374.41C67EA6@sable.krasnoyarsk.su> +Date: Wed, 05 Nov 1997 23:48:52 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: marc@fallon.classyad.com, hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: <199711050317.WAA04674@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> OK, let me again try to marshall some (any?) support for my suggestion. +> +> Informix version 5/7 has three levels of logging: unbuffered +> logging(our normal fsync mode), buffered logging, and no logging(our no +> fsync mode). +> +> We don't have buffered logging. Buffered logging guarantees you get put +> back to a consistent state after an os/server crash, usually to within +> 30/90 seconds. You do not have any partial transactions lying around, +> but you do have some transactions that you thought were done, but are +> not. +> +> This is faster then non-buffered logging, but not as fast as no logging. +> Guess what mode everyone uses? The one we don't have, buffered logging! +> +> Unbuffered logging performance is terrible. Non-buffered logging is +> used to load huge chunks of data during off-hours. 
+> +> The problem we have is that we fsync every transaction, which causes a +> 9-times slowdown in performance on single-integer inserts. +> +> That is a pretty heavy cost. But the alternative we give people is +> no-fsync mode, where we don't sync anything, and in a crash, you could +> come back with partially committed data in your database, if pg_log was +> sync'ed by the database, and only some of the data pages were sync'ed, +> so if any data was changing within 30 seconds of the crash, you have to +> restore your previous backup. +> +> We really need a middle solution, that gives better data integrity, for +> a smaller price. + +There is no fsync synchronization currently. +How could we be sure that all modified data pages are flushed +when we decide to flush pg_log? +If a backend doesn't fsync data pages & pg_log at commit time, +then when must it flush them (data first)? + +This is what Oracle does: + +it uses a dedicated DBWR process for writing/flushing modified +data pages and an LGWR process for writing/flushing the redo log +(the redo log is also the transaction log). LGWR always flushes log pages +when committing, but dirty data pages can be flushed _after_ transaction +commit, when DBWR decides that it's time to do it (a la checkpoint interval). + +Using a redo log we could implement buffered logging quite easily. +We could even do without dedicated processes (but flush redo before pg_log), +though having LGWR could simplify things. + +Without a redo log or some fsync synchronization we can't implement +buffered logging. BTW, a shared system cache could help with +fsync synchronization, but, imho, redo is better (and faster for +unbuffered logging too). + +> > > Actually, I found a problem with my description. Because pg_log is not +> > > fsync'ed, after a crash, pages with new transactions could have been +> > > flushed to disk, but not the pg_log table that contains the transaction +> > > ids.
The problem is that the new backend could assign a transaction id +> > > that is already in use. +> > +> > Impossible. Backend flushes pg_variable after fetching nex 32 xids. +> +> My suggestion is that we don't need to flush pg_variable or pg_log that +> much. My suggestion would speed up the test you do with 100 inserts +> inside a single transaction vs. 100 separate inserts. +> +> > > +> > > We could set a flag upon successful shutdown, and if it is not set on +> > > reboot, either do a vacuum to find the max transaction id, and +> > > invalidate all them not in pg_log as synced, or increase the next +> > > transaction id to some huge number and invalidate all them in between. +> > > +> +> I have a fix for the problem stated above, and it doesn't require a +> vacuum. +> +> We decide to fsync pg_variable and pg_log every 10,000 transactions or +> oids. Then if the database is brought up, and it was not brought down +> cleanly, you increment oid and transaction_id by 10,000, because you +> know you couldn't have gotten more than that. All intermediate +> transactions that are not marked committed/synced are marked aborted. + +This is what I propose to do by placing the next available oid/xid +in shmem: this allows pre-fetching many more than 32 ids at once +without losing them when a session is closed. + +> The problem we have with the current system is that we sync by action, +> not by time interval. If you are doing tons of inserts or updates, it +> is syncing after every one. What people really want is something that +> will sync not after every action, but after every minute or five +> minutes, so when the system is busy, the syncing every minutes is just a +> small amount, and when the system is idle, no one cares if is syncs, and +> no one has to wait for the sync to complete. + +When I'm really doing tons of inserts/updates/deletes I use +BEGIN/END. But it doesn't work for a multi-user environment, of course.
+As for about what people really want, I remember that recently someone +said in user list that if one want to have 10-20 inserts/sec then he +should use mysql, but I got 25 inserts/sec on AIC-7880 & WD Enterprise +when using one session, 32 inserts/sec with two sessions inserting +in two different tables and only 20 inserts/sec with two sessions +inserting in the same table. Imho, this difference between 20 and 32 +is more important thing to fix, and these results are not so bad +in comparison with others. + +(BTW, we shouldn't forget about using raw devices to speed up things). + +Vadim + +From vadim@sable.krasnoyarsk.su Wed Nov 5 12:20:08 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA05150 + for ; Wed, 5 Nov 1997 12:20:07 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id LAA24889 for ; Wed, 5 Nov 1997 11:59:27 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id AAA23096; Thu, 6 Nov 1997 00:03:19 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <3460A6D7.167EB0E7@sable.krasnoyarsk.su> +Date: Thu, 06 Nov 1997 00:03:19 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Mattias Kregert +CC: Bruce Momjian , pgsql-hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: <199711050317.WAA04674@candle.pha.pa.us> <34609871.27EED9D@algonet.se> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Mattias Kregert wrote: +> +> Bruce Momjian wrote: +> > +> > We don't have buffered logging. Buffered logging guarantees you get put +> > back to a consistent state after an os/server crash, usually to within +> > 30/90 seconds. 
You do not have any partial transactions lying around, +> > but you do have some transactions that you thought were done, but are +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> > not. +> ^^^^ +> > +> > This is faster than non-buffered logging, but not as fast as no logging. +> > Guess what mode everyone uses? The one we don't have, buffered logging! +> +> Ouch! I would *not* like to use "buffered logging". + +Neither would I. + +> What's the point in having the wrong data in the database and not +> knowing what updates, inserts or deletes to do to get the correct data? +> +> That's irrecoverable loss of data. Not what *I* want. Do *you* want it? +> +> > We really need a middle solution, that gives better data integrity, for +> > a smaller price. +> +> What I would like to have is this: +> +> If a backend tells the frontend that a transaction has completed, +> then that transaction should absolutely not get lost in case of a crash. + +Agreed. + +> +> What is needed is a log of changes since the last backup. This +> log would preferably reside on a remote machine or at least +> another disk. Then, if the power goes in the middle of a disk write, +> the disk explodes and the computer goes up in flames, you can +> install PostgreSQL on a new machine, restore the last backup and +> re-run the change log. + +Yes. And as I already said - this will speed things up, because +flushing the redo log is faster than flushing NNN tables, and the tables +can then stay unflushed for some interval. 
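A minimal sketch of the "restore the last backup and re-run the change log" idea discussed above, written in Python for brevity. The record layout and the name `replay` are assumptions made for illustration; nothing here is PostgreSQL code.

```python
def replay(backup, change_log):
    """Rebuild table state from a backup plus an append-only change log.

    change_log is a list of 4-tuples (hypothetical layout):
      ("set", xid, key, value), ("delete", xid, key, None),
      ("commit", xid, None, None)
    """
    # Only transactions whose commit record reached the log are replayed;
    # this is what lets data pages be flushed lazily after the commit.
    committed = {rec[1] for rec in change_log if rec[0] == "commit"}
    state = dict(backup)  # never modify the backup itself
    for op, xid, key, value in change_log:
        if xid not in committed:
            continue  # the frontend was never told this one completed
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


backup = {"a": 1}
log = [
    ("set", 10, "b", 2),
    ("delete", 10, "a", None),
    ("commit", 10, None, None),
    ("set", 11, "c", 3),  # xid 11 never committed before the crash
]
print(replay(backup, log))
```

The point of the sketch is the ordering guarantee: as long as the commit record is on disk before the frontend is told the transaction completed, the table files themselves can lag behind.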
+ +Vadim + + +From owner-pgsql-hackers@hub.org Wed Nov 5 14:01:02 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA07017 + for ; Wed, 5 Nov 1997 14:00:59 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA01759 for ; Wed, 5 Nov 1997 13:52:36 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id NAA03611; Wed, 5 Nov 1997 13:29:43 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 13:27:48 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id NAA03291 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 13:27:41 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id NAA02823 for ; Wed, 5 Nov 1997 13:26:20 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id NAA05863; + Wed, 5 Nov 1997 13:16:09 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711051816.NAA05863@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Wed, 5 Nov 1997 13:16:09 -0500 (EST) +Cc: marc@fallon.classyad.com, hackers@postgreSQL.org +In-Reply-To: <3460A374.41C67EA6@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 5, 97 11:48:52 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> There is no fsync synchronization currently. +> How could we be ensured that all modified data pages are flushed +> when we decided to flush pg_log ? +> If backend doesn't fsync data pages & pg_log at the commit time +> then when he must flush them (data first) ? 
+ +My idea was to have the backend do a 'sync' that causes the OS to sync +all dirty pages, then mark all committed transactions on pg_log as +'synced'. Then sync pg_log. That way, there is a clear system where we +know everything is flushed to disk, and we mark the transactions as +synced. + +The only time that synced flag is used, is when the database starts up, +and it sees that the previous shutdown was not clean. + +What am I missing here? + +> +> This is what Oracle does: +> +> it uses dedicated DBWR process for writing/flushing modified +> data pages and LGWR process for writing/flushing redo log +> (redo log is transaction log also). LGWR always flushes log pages +> when committing, but durty data pages can be flushed _after_ transaction +> commit when DBWR decides that it's time to do it (ala checkpoints interval). +> +> Using redo log we could implement buffered logging quite easy. +> We can even don't use dedicated processes (but flush redo before pg_log), +> though having LGWR could simplify things. +> +> Without redo log or without some fsync synchronization we can't implement +> buffered logging. BTW, shared system cache could help with +> fsync synchonization, but, imho, redo is better (and faster for +> un-buffered logging too). +> + +I suggested my solution because it is clean, does flushing in one +central location(postmaster), and does quick restores. + +> > > > Actually, I found a problem with my description. Because pg_log is not +> > > > fsync'ed, after a crash, pages with new transactions could have been +> > > > flushed to disk, but not the pg_log table that contains the transaction +> > > > ids. The problem is that the new backend could assign a transaction id +> > > > that is already in use. +> > > +> > > Impossible. Backend flushes pg_variable after fetching nex 32 xids. +> > +> > My suggestion is that we don't need to flush pg_variable or pg_log that +> > much. 
My suggestion would speed up the test you do with 100 inserts +> > inside a single transaction vs. 100 separate inserts. +> > +> > > > +> > > > We could set a flag upon successful shutdown, and if it is not set on +> > > > reboot, either do a vacuum to find the max transaction id, and +> > > > invalidate all them not in pg_log as synced, or increase the next +> > > > transaction id to some huge number and invalidate all them in between. +> > > > +> > +> > I have a fix for the problem stated above, and it doesn't require a +> > vacuum. +> > +> > We decide to fsync pg_variable and pg_log every 10,000 transactions or +> > oids. Then if the database is brought up, and it was not brought down +> > cleanly, you increment oid and transaction_id by 10,000, because you +> > know you couldn't have gotten more than that. All intermediate +> > transactions that are not marked committed/synced are marked aborted. +> +> This is what I suppose to do by placing next available oid/xid +> in shmem: this allows pre-fetch much more than 32 ids at once +> without losing them when session closed. +> +> > The problem we have with the current system is that we sync by action, +> > not by time interval. If you are doing tons of inserts or updates, it +> > is syncing after every one. What people really want is something that +> > will sync not after every action, but after every minute or five +> > minutes, so when the system is busy, the syncing every minutes is just a +> > small amount, and when the system is idle, no one cares if is syncs, and +> > no one has to wait for the sync to complete. +> +> When I'm really doing tons of inserts/updates/deletes I use +> BEGIN/END. But it doesn't work for multi-user environment, of 'course. 
+> As for about what people really want, I remember that recently someone +> said in user list that if one want to have 10-20 inserts/sec then he +> should use mysql, but I got 25 inserts/sec on AIC-7880 & WD Enterprise +> when using one session, 32 inserts/sec with two sessions inserting +> in two different tables and only 20 inserts/sec with two sessions +> inserting in the same table. Imho, this difference between 20 and 32 +> is more important thing to fix, and these results are not so bad +> in comparison with others. +> +> (BTW, we shouldn't forget about using raw devices to speed up things). +> +> Vadim +> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From james@blarg.net Wed Nov 5 13:26:46 1997 +Received: from animal.blarg.net (mail@animal.blarg.net [206.114.144.1]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA06130 + for ; Wed, 5 Nov 1997 13:26:26 -0500 (EST) +Received: from animal.blarg.net (james@animal.blarg.net [206.114.144.1]) + by animal.blarg.net (8.8.5/8.8.4) with SMTP + id KAA09775; Wed, 5 Nov 1997 10:26:10 -0800 +Date: Wed, 5 Nov 1997 10:26:10 -0800 (PST) +From: "James A. Hillyerd" +To: Bruce Momjian +cc: Mattias Kregert , pgsql-hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +In-Reply-To: <199711051615.LAA02260@candle.pha.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Status: OR + +On Wed, 5 Nov 1997, Bruce Momjian wrote: +> +> The strange thing I am hearing is that the people who use PostgreSQL are +> more worried about data recovery from a crash than million-dollar +> companies that use commercial databases. +> + +If I may throw in my 2 cents, I'd prefer to see that database in a +consistent state, with the data being up to date as of 1 minute or +less before the crash. I'd rather have higher performance than up to the +second data. + +-james + +[ James A. 
Hillyerd (JH2162) - james@blarg.net - Web Developer ] +[ http://www.blarg.net/~james/ http://www.hyperglyphics.com/ ] +[ 1024/B11C3751 CA 1C B3 A9 07 2F 57 C9 91 F4 73 F2 19 A4 C5 88 ] + + +From vadim@sable.krasnoyarsk.su Wed Nov 5 14:24:03 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA07830 + for ; Wed, 5 Nov 1997 14:24:02 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA02778 for ; Wed, 5 Nov 1997 14:13:45 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id CAA23376; Thu, 6 Nov 1997 02:17:51 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <3460C65E.446B9B3D@sable.krasnoyarsk.su> +Date: Thu, 06 Nov 1997 02:17:50 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: marc@fallon.classyad.com, hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: <199711051816.NAA05863@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > There is no fsync synchronization currently. +> > How could we be ensured that all modified data pages are flushed +> > when we decided to flush pg_log ? +> > If backend doesn't fsync data pages & pg_log at the commit time +> > then when he must flush them (data first) ? +> +> My idea was to have the backend do a 'sync' that causes the OS to sync +> all dirty pages, then mark all committed transactions on pg_log as +> 'synced'. Then sync pg_log. That way, there is a clear system where we +> know everything is flushed to disk, and we mark the transactions as +> synced. 
+> +> The only time that synced flag is used, is when the database starts up, +> and it sees that the previous shutdown was not clean. +> +> What am I missing here? + +Ok, I see. But we can avoid 'synced' flag: we can make (just before +sync-ing data pages) in-memory copies of "on-line" durty pg_log pages +to being written/fsynced and perform write/fsync from these copies +without stopping new commits in "on-line" page(s) (nothing must go +to disk from "on-line" log pages). + +Vadim + +From owner-pgsql-hackers@hub.org Wed Nov 5 14:32:25 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA08101 + for ; Wed, 5 Nov 1997 14:32:21 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id OAA22970; Wed, 5 Nov 1997 14:26:47 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 14:24:59 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id OAA22344 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 14:24:56 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id OAA22319 for ; Wed, 5 Nov 1997 14:24:38 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id OAA07661; + Wed, 5 Nov 1997 14:22:46 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711051922.OAA07661@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Wed, 5 Nov 1997 14:22:45 -0500 (EST) +Cc: marc@fallon.classyad.com, hackers@postgreSQL.org +In-Reply-To: <3460A374.41C67EA6@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 5, 97 11:48:52 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Just a clarification. 
When I say the postmaster issues a sync, I mean +sync(2), not fsync(2). + +The sync flushes all dirty pages on all file systems. Ordinary users +can issue this, and update usually does this every 30 seconds anyway. + +By using this, we let the kernel figure out which buffers are dirty. We +don't have to figure this out in the postmaster. + +Then we update the pg_log table to mark those transactions as synced. +On recovery from a crash, we mark the committed transactions as +uncommitted if they do not have the synced flag. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Wed Nov 5 15:11:07 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA08751 + for ; Wed, 5 Nov 1997 15:10:59 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id PAA01986; Wed, 5 Nov 1997 15:01:24 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 14:59:32 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id OAA01414 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 14:59:28 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id OAA01403 for ; Wed, 5 Nov 1997 14:59:14 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id OAA08283; + Wed, 5 Nov 1997 14:53:55 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711051953.OAA08283@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Wed, 5 Nov 1997 14:53:54 -0500 (EST) +Cc: marc@fallon.classyad.com, hackers@postgreSQL.org +In-Reply-To: <3460C65E.446B9B3D@sable.krasnoyarsk.su> from "Vadim B. 
Mikheev" at Nov 6, 97 02:17:50 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> > The only time that synced flag is used, is when the database starts up, +> > and it sees that the previous shutdown was not clean. +> > +> > What am I missing here? +> +> Ok, I see. But we can avoid 'synced' flag: we can make (just before +> sync-ing data pages) in-memory copies of "on-line" dirty pg_log pages +> to be written/fsynced and perform write/fsync from these copies +> without stopping new commits in "on-line" page(s) (nothing must go +> to disk from "on-line" log pages). + +[Working late tonight?] + +OK, now I am lost. We need the sync'ed flag so when we start the +postmaster, and we see the database was not shut down properly, we use +the flag to clear the commit flag from committed transactions that were +not sync'ed by the postmaster. + +In my opinion, we don't need any extra copies of pg_log, we can set +those sync'ed flags while others are making changes, because before we +did our sync, we gathered a list of committed transaction ids from the +shared transaction id queue that I mentioned a while ago. + +We need this queue so we can find the newly-committed transactions that +do not have a sync flag. Another way we could do this would be to scan +pg_log before we sync, getting all the committed transaction ids without +sync flags. No lock is needed on the table. If we miss some new ones, +we will get them next time we scan. The problem I saw is that there is +no way to see when to stop scanning the pg_log table for such +transactions, so I thought each backend would have to put its newly +committed transactions in a separate place. Maybe I am wrong. + +This syncing method just seems so natural since we have pg_log. That is +why I keep bringing it up until people tell me I am stupid. 
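A minimal sketch of the synced-flag scheme described above, in Python rather than backend C. The class `TxLog`, its method names, and the in-memory dictionary standing in for pg_log are all assumptions made for illustration; the `flush` parameter exists only so the sketch can be exercised without calling the real sync(2).

```python
import os

IN_PROGRESS, COMMITTED, ABORTED = "in_progress", "committed", "aborted"


class TxLog:
    """Toy stand-in for pg_log: xid -> [status, synced flag]."""

    def __init__(self):
        self.entries = {}
        self.clean_shutdown = False

    def commit(self, xid):
        # Visible to running backends at once, but not yet known to be on disk.
        self.entries[xid] = [COMMITTED, False]

    def periodic_sync(self, flush=None):
        if flush is None:
            flush = getattr(os, "sync", lambda: None)
        # Gather committed-but-unsynced xids *before* flushing, so commits
        # that arrive during the flush simply wait for the next round.
        pending = [xid for xid, (status, synced) in self.entries.items()
                   if status == COMMITTED and not synced]
        flush()  # sync(2): let the kernel work out which buffers are dirty
        for xid in pending:
            self.entries[xid][1] = True
        return pending

    def recover(self):
        # The synced flag is consulted only after an unclean shutdown.
        if self.clean_shutdown:
            return []
        undone = []
        for xid, entry in self.entries.items():
            if entry[0] == COMMITTED and not entry[1]:
                entry[0] = ABORTED  # data pages may never have hit the disk
                undone.append(xid)
        return undone


log = TxLog()
log.commit(1)
log.commit(2)
log.periodic_sync(flush=lambda: None)
log.commit(3)          # committed after the last sync
print(log.recover())   # xids that recovery has to undo
```

Scanning every entry in `periodic_sync` mirrors the "scan pg_log before we sync" variant; the shared queue of newly committed xids would replace that scan with a list handed over by the backends.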
+ +This transaction commit/sync stuff is complicated, and takes a while to +hash out in a group. + +--------------------------------------------------------------------------- + +I just re-read your description, and I see what you are saying. My idea +has pg_log commit flag be real commit flags while the system is running, +but on reboot after failure, we remove the commit flags on non-synced +stuff before we start up. + +Your idea is to make pg_log commit flags only appear in in-memory copies +of pg_log, and write the commit flags to disk only after the sync is +done. + +Either way will work. The question is, "Which is easier?" The OS is +going to sync pg_log on its own. We would almost need a second copy of +pg_log, one copy to be used on postmaster startup, and a second to be +used by running backends, and the postmaster would make a copy of the +running backend pg_log, sync the disks, and copy it to the boot copy. + +I don't see how the backend is going to figure out which pg_log pages +were modified and need to be sent to the boot copy of pg_log. + +Now that I am thinking, here is a good idea. Instead of a fancy +transaction queue, what if we just have the backend record the lowest +numbered transaction they commit in a shared memory area. If the +current transaction id they commit is greater than the minimum, then +change nothing. That way, the backend could copy all pg_log pages +containing that minimum pg_log transaction id up to the most recent +pg_log page, do the sync, and copy just those to the boot copy of +pg_log. + +This eliminates the transaction id queue. + +The nice thing about the sync-flag in pg_log is that there is no copying +by the backend. But we would have to spin through the file to set those +sync bits. Your method just copies whole pages to the boot copy. + +--------------------------------------------------------------------------- + +I don't want to force this idea on anyone, or annoy anyone. I just +think it needs to be considered. 
The concepts are unusual, so once +people get the full idea, if they don't like it, we can trash it. I +still think it holds promise. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From hotz@jpl.nasa.gov Wed Nov 5 15:30:18 1997 +Received: from hotzsun.jpl.nasa.gov (hotzsun.jpl.nasa.gov [137.79.51.138]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA09500 + for ; Wed, 5 Nov 1997 15:30:16 -0500 (EST) +Received: from [137.79.51.141] (hotzmac [137.79.51.141]) by hotzsun.jpl.nasa.gov (8.7.6/8.7.3) with SMTP id MAA10100; Wed, 5 Nov 1997 12:29:58 -0800 (PST) +X-Sender: hotzmail@hotzsun.jpl.nasa.gov +Message-Id: +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Date: Wed, 5 Nov 1997 12:29:58 -0800 +To: Bruce Momjian , + matti@algonet.se (Mattias Kregert) +From: hotz@jpl.nasa.gov (Henry B. Hotz) +Subject: Re: [HACKERS] My $.02, was: PERFORMANCE and Good Bye, Time Travel! +Cc: pgsql-hackers@postgreSQL.org +Status: OR + +At 11:15 AM 11/5/97, Bruce Momjian wrote: +>The strange thing I am hearing is that the people who use PostgreSQL are +>more worried about data recovery from a crash than million-dollar +>companies that use commercial databases. +> +>I don't get it. + +I would run PG to make sure that committed transactions were really written +to disk because that seems "correct" and I don't have the kind of +performance requirements that would push me to do otherwise. + +That said, I can see a need for varying performance/crash-immunity +tradeoffs, and at least *one* option in between "correct" and "unprotected" +operation would seem desirable. + +Signature failed Preliminary Design Review. +Feasibility of a new signature is currently being evaluated. 
+h.b.hotz@jpl.nasa.gov, or hbhotz@oxy.edu + + + +From owner-pgsql-hackers@hub.org Thu Nov 6 15:51:23 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA04634 + for ; Thu, 6 Nov 1997 15:51:08 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id PAA24783; Thu, 6 Nov 1997 15:36:47 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 06 Nov 1997 15:36:07 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id PAA24514 for pgsql-hackers-outgoing; Thu, 6 Nov 1997 15:36:02 -0500 (EST) +Received: from guevara.bildbasen.kiruna.se (guevara.bildbasen.kiruna.se [193.45.225.110]) by hub.org (8.8.5/8.7.5) with SMTP id PAA24319 for ; Thu, 6 Nov 1997 15:35:32 -0500 (EST) +Received: (qmail 9764 invoked by uid 129); 6 Nov 1997 20:34:35 -0000 +Date: 6 Nov 1997 20:34:35 -0000 +Message-ID: <19971106203435.9763.qmail@guevara.bildbasen.kiruna.se> +From: Goran Thyni +To: pgsql-hackers@postgreSQL.org +In-reply-to: <34619E9E.622F563@algonet.se> (message from Mattias Kregert on + Thu, 06 Nov 1997 11:40:30 +0100) +Subject: [HACKERS] Re: Performance vs. Crash Recovery +Mime-Version: 1.0 +Content-Type: text/plain; charset=ISO-8859-1 +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + + +I am getting quite bored by this discussion, +if someone has a strong opinion about how this +should be done go ahead and make a test implementation +then we have something to discuss. + +In the mean time, if you want the best possible data protection +mount your database disk sync'ed. This is safer than any scheme +we could come up with. +D*mned slow too, so everybody should be happy. :-) + +And I see no point in implementing a periodic sync in postmaster. +All Unices have cron, why not just use that? 
+Or even a stupid 1-liner (ba)sh-script like: + +while true; do sleep 20; sync; done + + best regards, +-- +--------------------------------------------- +Göran Thyni, sysadm, JMS Bildbasen, Kiruna + + + +From vadim@sable.krasnoyarsk.su Thu Nov 6 23:31:41 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04723 + for ; Thu, 6 Nov 1997 23:31:21 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id LAA25438; Fri, 7 Nov 1997 11:36:25 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <34629AC9.15FB7483@sable.krasnoyarsk.su> +Date: Fri, 07 Nov 1997 11:36:25 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: marc@fallon.classyad.com, hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: <199711051953.OAA08283@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > > The only time that synced flag is used, is when the database starts up, +> > > and it sees that the previous shutdown was not clean. +> > > +> > > What am I missing here? +> > +> > Ok, I see. But we can avoid 'synced' flag: we can make (just before +> > sync-ing data pages) in-memory copies of "on-line" durty pg_log pages +> > to being written/fsynced and perform write/fsync from these copies +> > without stopping new commits in "on-line" page(s) (nothing must go +> > to disk from "on-line" log pages). +> +> [Working late tonight?] + +[Yes] + +> I just re-read your description, and I see what you are saying. My idea +> has pg_log commit flag be real commit flags while the system is running, +> but on reboot after failure, we remove the commit flags on non-synced +> stuff before we start up. 
+> +> Your idea is to make pg_log commit flags only appear in in-memory copies +> of pg_log, and write the commit flags to disk only after the sync is +> done. +> +> Either way will work. The question is, "Which is easier?" The OS is +> going to sync pg_log on its own. We would almost need a second copy of +> pg_log, one copy to be used on postmaster startup, and a second to be +> used by running backends, and the postmaster would make a copy of the +> running backend pg_log, sync the disks, and copy it to the boot copy. +> +> I don't see how the backend is going to figure out which pg_log pages +> were modified and need to be sent to the boot copy of pg_log. +> +> Now that I am thinking, here is a good idea. Instead of a fancy +> transaction queue, what if we just have the backend record the lowest +> numbered transaction they commit in a shared memory area. If the +> current transaction id they commit is greater than the minimum, then +> change nothing. That way, the backend could copy all pg_log pages +> containing that minimum pg_log transaction id up to the most recent +> pg_log page, do the sync, and copy just those to the boot copy of +> pg_log. +> +> This eliminates the transaction id queue. +> +> The nice thing about the sync-flag in pg_log is that there is no copying +> by the backend. But we would have to spin through the file to set those +> sync bits. Your method just copies whole pages to the boot copy. + + In my plans to re-design transaction system I supposed to keep in shmem +two last pg_log pages. They are most often used and using ReadBuffer/WriteBuffer +to access them is not good idea. Also, we could use spinlock instead of +lock manager to synchronize access to these pages (as I see in spin.c +spinlock-s could be shared, but only exclusive ones are used) - spinlocks +are faster. + These two last pg_log pages are "online" ones. Race condition: when one or +both of online pages becomes non-online ones, i.e. 
pg_log has to be expanded +when writing commit/abort of "big" xid. This is how we could handle this +in "buffered" logging (delayed fsync) mode: + + When a backend wants to write commit/abort status it acquires the exclusive +OnLineLogLock. If the xid belongs to the online pages then the backend writes the status +and releases the spinlock. If the xid is less than the least xid on the 1st online page then +the backend releases the spinlock and does exactly what it does in normal mode: +flush (write and fsync) all dirty data files, lock pg_log for write, ReadBuffer, +update xid status, WriteBuffer, release write lock, flush pg_log. +If the xid is greater than the max xid on the 2nd online page then the simplest way is +to just do sync(); sync() (two times), flush the 1st or both online pages, +read the new page(s) into the online pages space, update the xid status, +and release the OnLineLogLock spinlock. We could try other ways but expanding pg_log +is a rare case (32K xids in one pg_log page)... + All the postmaster will have to do is: +1. Get shared OnLineLogLock. +2. Copy 2 x 8K data to private place. +3. Release spinlock. +4. sync(); sync(); (two times!) +5. Flush online pages. + +We could use -F DELAY_TIME to turn fsync delayed mode ON. + +And, btw, having two bits for xact status we have only one unused +status value (binary 11) currently - I would like to use this for +nested xactions and savepoints... + +> I don't want to force this idea on anyone, or annoy anyone. I just +> think it needs to be considered. The concepts are unusual, so once +> people get the full idea, if they don't like it, we can trash it. I +> still think it holds promise. + +Agreed. 
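The five postmaster steps above, and the "32K xids in one pg_log page" figure, can be sketched roughly as follows. This is an illustration in Python, not backend code: `threading.Lock` stands in for the OnLineLogLock spinlock, the function names and the bit layout within a byte are assumptions, and the file is rewritten whole, whereas a real pg_log would be updated at page offsets.

```python
import os
import threading

PAGE_SIZE = 8192                               # one of the "2 x 8K" online pages
BITS_PER_XID = 2                               # two status bits per transaction
XIDS_PER_PAGE = PAGE_SIZE * 8 // BITS_PER_XID  # 8192 * 8 / 2 = 32768 ("32K")

online_lock = threading.Lock()                 # stand-in for OnLineLogLock
online_pages = [bytearray(PAGE_SIZE), bytearray(PAGE_SIZE)]


def set_status(xid_offset, status):
    """Backend side: store a 2-bit status for an xid held in the online pages."""
    page, slot = divmod(xid_offset, XIDS_PER_PAGE)
    byte, pos = divmod(slot, 4)                # four xids share one byte
    shift = pos * 2
    with online_lock:
        buf = online_pages[page]
        buf[byte] = (buf[byte] & ~(0b11 << shift) & 0xFF) | (status << shift)


def get_status(pages, xid_offset):
    page, slot = divmod(xid_offset, XIDS_PER_PAGE)
    byte, pos = divmod(slot, 4)
    return (pages[page][byte] >> (pos * 2)) & 0b11


def flush_online_pages(log_path, sync=None):
    """Postmaster side: steps 1-5 of the delayed-fsync cycle."""
    if sync is None:
        sync = getattr(os, "sync", lambda: None)
    with online_lock:                          # 1. get the lock
        copies = [bytes(p) for p in online_pages]  # 2. copy 2 x 8K privately
    # 3. lock released: new commits continue in the online pages
    sync()                                     # 4. sync(); sync(); (two times)
    sync()
    with open(log_path, "wb") as f:            # 5. flush from the copies only
        for page in copies:
            f.write(page)
        f.flush()
        os.fsync(f.fileno())
    return copies
```

Because step 5 writes the private copies and not the live pages, a commit recorded after step 2 cannot reach the on-disk log before the data pages it depends on have gone through a sync.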
+ +Vadim + +From owner-pgsql-hackers@hub.org Fri Nov 7 01:32:49 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07651 + for ; Fri, 7 Nov 1997 01:32:47 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA23328 for ; Thu, 6 Nov 1997 23:46:08 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA19565; Thu, 6 Nov 1997 23:38:55 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 06 Nov 1997 23:36:53 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA18911 for pgsql-hackers-outgoing; Thu, 6 Nov 1997 23:36:44 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA18779 for ; Thu, 6 Nov 1997 23:36:02 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id LAA25448; Fri, 7 Nov 1997 11:40:29 +0700 (KRS) +Message-ID: <34629BBD.59E2B600@sable.krasnoyarsk.su> +Date: Fri, 07 Nov 1997 11:40:29 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: Mattias Kregert , pgsql-hackers@postgreSQL.org +Subject: Re: Sync:ing data and log (Was: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!) +References: <199711061810.NAA02118@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce Momjian wrote: +> +> > +> > Never use sync(). Use fsync(). Other processes should take care of their +> > own syncing. If you use sync(), and you have a lot of disks, the sync +> > can +> > take half a minute if you are unlucky. 
+> +> We could use fsync() but then the postmaster has to know what tables +> have dirty buffers, and I don't think there is an easy way to do this. + +There is one way - shared system cache... + +Vadim + + +From vadim@sable.krasnoyarsk.su Fri Nov 7 01:31:24 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07639 + for ; Fri, 7 Nov 1997 01:31:22 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA23094 for ; Thu, 6 Nov 1997 23:39:00 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id LAA25457; Fri, 7 Nov 1997 11:43:52 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <34629C87.3F54BC7E@sable.krasnoyarsk.su> +Date: Fri, 07 Nov 1997 11:43:51 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Mattias Kregert +CC: Bruce Momjian , pgsql-hackers@postgreSQL.org +Subject: Re: Performance vs. Crash Recovery (Was: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!) +References: <199711051615.LAA02260@candle.pha.pa.us> <34619E9E.622F563@algonet.se> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Mattias Kregert wrote: +> +> > The strange thing I am hearing is that the people who use PostgreSQL are +> > more worried about data recovery from a crash than million-dollar +> > companies that use commercial databases. +> > +> > I don't get it. +> +> Perhaps the million-dollar companies have more sophisticated hardware, +> like big expensive disk arrays, big UPS:es and parallell backup +> servers? +> If so, the risk of harware failure is much smaller for them. 
+
+Moreover, Informix is more stable than postgres: elog(FATAL)
+occurs sometimes, and in fsync delayed mode this will cause
+loss of xactions too, not only hardware/OS failure.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Fri Nov 7 01:31:26 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07642
+ for ; Fri, 7 Nov 1997 01:31:24 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id AAA24358 for ; Fri, 7 Nov 1997 00:09:47 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA00167; Fri, 7 Nov 1997 00:03:17 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 07 Nov 1997 00:01:26 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA29427 for pgsql-hackers-outgoing; Fri, 7 Nov 1997 00:01:19 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA29364 for ; Fri, 7 Nov 1997 00:01:02 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id XAA05565;
+ Thu, 6 Nov 1997 23:54:33 -0500 (EST)
+From: Bruce Momjian
+Message-Id: <199711070454.XAA05565@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Thu, 6 Nov 1997 23:54:33 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <34629AC9.15FB7483@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 7, 97 11:36:25 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+I was worried when you didn't respond to my last list of ideas. I
+thought perhaps the idea was getting on your nerves.
+
+I haven't dropped the idea because:
+
+    1) it offers 2-9 times speedup in database modifications
+    2) this is how the big commercial systems handle it, and I think
+       we need to give users this option.
+    3) in the way I had it designed, it wouldn't take much work to
+       do it.
+
+Anything that promises that much speedup, if it can be done easily, I say
+let's consider it, even if you lose 60 seconds of changes.
+
+
+> In my plans to re-design transaction system I supposed to keep in shmem
+> two last pg_log pages. They are most often used and using ReadBuffer/WriteBuffer
+> to access them is not good idea. Also, we could use spinlock instead of
+> lock manager to synchronize access to these pages (as I see in spin.c
+> spinlock-s could be shared, but only exclusive ones are used) - spinlocks
+> are faster.
+
+Ah, so you already had the idea of having on-line pages in shared memory
+as part of a transaction system overhaul? Right now, does each backend
+lock/read/write/unlock to get at pg_log? Wow, that is bad.
+
+Perhaps mmap() would be a good idea. My system has msync() to flush
+mmap()'ed pages to the underlying file. You would still run fsync()
+after that. This may give us the best of both worlds: a shared-memory
+area of variable size, and control of when it gets flushed to disk. Do
+other OS's have this? I have a feeling OS's with unified buffer caches
+don't have this ability to determine when the mmap'ed pages
+get sent to the underlying file and disk.
+
+
+> These two last pg_log pages are "online" ones. Race condition: when one or
+> both of online pages becomes non-online ones, i.e. pg_log has to be expanded
+> when writing commit/abort of "big" xid. This is how we could handle this
+> in "buffered" logging (delayed fsync) mode:
+>
+> When backend want to write commit/abort status he acquires exclusive
+> OnLineLogLock. If xid belongs to online pages then backend writes status
+> and releases spin.
If xid is less than least xid on 1st online page then +> backend releases spin and does exactly the same what he does in normal mode: +> flush (write and fsync) all durty data files, lock pg_log for write, ReadBuffer, +> update xid status, WriteBuffer, release write lock, flush pg_log. +> If xid is greater than max xid on 2nd online page then the simplest way is +> just do sync(); sync() (two times), flush 1st or both online pages, +> read new page(s) into online pages space, update xid status, +> release OnLineLogLock spin. We could try other ways but pg_log expanding +> is rare case (32K xids in one pg_log page)... +> All what postmaster will have to do is: +> 1. Get shared OnLineLogLock. +> 2. Copy 2 x 8K data to private place. +> 3. Release spinlock. +> 4. sync(); sync(); (two times!) +> 5. Flush online pages. +> +> We could use -F DELAY_TIME to turn fsync delayed mode ON. +> +> And, btw, having two bits for xact status we have only one unused +> status value (0x11) currently - I would like to use this for +> nested xactions and savepoints... + +I saw that. By keeping two copies of pg_log, one in memory to be used +by all backend, and another that hits the disk, it certainly will work. + +> +> > I don't want to force this idea on anyone, or annoy anyone. I just +> > think it needs to be considered. The concepts are unusual, so once +> > people get the full idea, if they don't like it, we can trash it. I +> > still think it holds promise. +> +> Agreed. 
+> +> Vadim +> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Fri Nov 7 01:03:09 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07314 + for ; Fri, 7 Nov 1997 01:03:05 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA07879; Fri, 7 Nov 1997 00:57:42 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 07 Nov 1997 00:55:52 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA03918 for pgsql-hackers-outgoing; Fri, 7 Nov 1997 00:55:46 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA02961 for ; Fri, 7 Nov 1997 00:55:18 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id MAA25567; Fri, 7 Nov 1997 12:59:29 +0700 (KRS) +Message-ID: <3462AE40.FF6D5DF@sable.krasnoyarsk.su> +Date: Fri, 07 Nov 1997 12:59:28 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: marc@fallon.classyad.com, hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: <199711070454.XAA05565@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce Momjian wrote: +> +> I was worried when you didn't respond to my last list of ideas. I +> thought perhaps the idea was getting on your nerves. + +No, I was (and, unfortunately, I still) busy... + +> +> I haven't dropped the idea because: +> +> 1) it offers 2-9 times speedup in database modifications +> 2) this is how the big commercial system handle it, and I think +> we need to give users this option. 
+> 3) in the way I had it designed, it wouldn't take much work to +> do it. +> +> Anything that promises that much speedup, if it can be done easy, I say +> lets consider it, even if you loose 60 seconds of changes. + +I agreed with your un-buffered logging idea. This would be excellent +feature for un-critical dbase usings (WWW, etc). + +> +> > In my plans to re-design transaction system I supposed to keep in shmem +> > two last pg_log pages. They are most often used and using ReadBuffer/WriteBuffer +> > to access them is not good idea. Also, we could use spinlock instead of +> > lock manager to synchronize access to these pages (as I see in spin.c +> > spinlock-s could be shared, but only exclusive ones are used) - spinlocks +> > are faster. +> +> Ah, so you already had the idea of having on-line pages in shared memory +> as part of a transaction system overhaul? Right now, does each backend + +Yes. I hope to implement this in the next 1-2 weeks. + +> lock/read/write/unlock to get at pg_log? Wow, that is bad. + +Yes, he does. + +> +> Perhaps mmap() would be a good idea. My system has msync() to flush +> mmap()'ed pages to the underlying file. You would still run fsync() +> after that. This may give us the best of both worlds: a shared-memory + ^^^^^^^^^^^^^ +> area of variable size, and control of when it get flushed to disk. Do + ^^^^^^^^^^^^^^^^^^^^^ +I like it. FreeBSD supports + +MAP_ANON Map anonymous memory not associated with any specific file. + +It would be nice to use mmap to get more "shared" memory, but I don't see +reasons to mmap any particular file to memory. Having two last pg_log pages +in memory + xact commit/abort writeback optimization (updation of commit/abort +xmin/xmax status in tuples by any scan - we already have this) reduce access +to "old" pg_log pages to zero. + +> other OS's have this? 
I have a feeling OS's with unified buffer caches +> don't have this ability to determine when the underlying mmap'ed file +> gets sent to the underlying file and disk. +> +> > These two last pg_log pages are "online" ones. Race condition: when one or +> > both of online pages becomes non-online ones, i.e. pg_log has to be expanded +> > when writing commit/abort of "big" xid. This is how we could handle this +> > in "buffered" logging (delayed fsync) mode: +> > +> > When backend want to write commit/abort status he acquires exclusive +> > OnLineLogLock. If xid belongs to online pages then backend writes status +> > and releases spin. If xid is less than least xid on 1st online page then +> > backend releases spin and does exactly the same what he does in normal mode: +> > flush (write and fsync) all durty data files, lock pg_log for write, ReadBuffer, +> > update xid status, WriteBuffer, release write lock, flush pg_log. +> > If xid is greater than max xid on 2nd online page then the simplest way is +> > just do sync(); sync() (two times), flush 1st or both online pages, +> > read new page(s) into online pages space, update xid status, +> > release OnLineLogLock spin. We could try other ways but pg_log expanding +> > is rare case (32K xids in one pg_log page)... +> > All what postmaster will have to do is: +> > 1. Get shared OnLineLogLock. +> > 2. Copy 2 x 8K data to private place. +> > 3. Release spinlock. +> > 4. sync(); sync(); (two times!) +> > 5. Flush online pages. +> > +> > We could use -F DELAY_TIME to turn fsync delayed mode ON. +> > +> > And, btw, having two bits for xact status we have only one unused +> > status value (0x11) currently - I would like to use this for +> > nested xactions and savepoints... + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +More about this: 0x11 could mean "this _child_ transaction is committed - +you have to lookup in pg_xact_child to get parent xid and use pg_log again +to get parent xact status". 
If parent committed then child xact status +will be changed to 0x10 (committed) else - to 0x01 (aborted). Using this +we could get xact nesting and savepoints by starting new child xaction +inside running one... + +> +> I saw that. By keeping two copies of pg_log, one in memory to be used + ^^^^^^ + Just two pg_log pages... + +> by all backend, and another that hits the disk, it certainly will work. + +Vadim + + +From vadim@sable.krasnoyarsk.su Fri Nov 7 01:30:59 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07599 + for ; Fri, 7 Nov 1997 01:30:58 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26793 for ; Fri, 7 Nov 1997 01:12:33 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA25592; Fri, 7 Nov 1997 13:16:39 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <3462B247.ABD322C@sable.krasnoyarsk.su> +Date: Fri, 07 Nov 1997 13:16:39 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Jan Wieck +CC: Bruce Momjian , hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +wieck@sapserv.debis.de wrote: +> +> Bruce wrote: +> > +> > > > It seems that this is what Oracle does, but Sybase writes queries +> > > > (with transaction ids, of 'course, and before execution) and +> > > > begin, commit/abort events <-- this is better for non-overwriting +> > > > system (shorter redo file), but, agreed, recovering is more complicated. 
+> > > >
+> > > > Vadim
+> > > >
+> > >
+> > > Writing only the queries (and only those that really modify
+> > > data - no selects) would be much smarter and the redo files
+> > > will be shorter. But it wouldn't fit for PostgreSQL as long
+> > > as someone can submit a query like
+> > >
+> > > DELETE FROM xxx WHERE oid = 59337;
+> >
+> > Interesting point. Currently, an insert shows the OID as output in
+> > psql. Perhaps we could do a little oid-manipulating to set the oid of
+> > the insert.
+>
+> Only for simple inserts, not on
+>
+> INSERT INTO xxx SELECT any_type_of_merge_join;
+
+I don't know how, but Sybase handles this and IDENTITY (case of OIDs) too.
+But I'm not objecting, Jan; it's just that I haven't time to do a
+"log queries" redo implementation and so I would like to have "log changes"
+redo at least. (Actually, "log changes" is good for my production dbase
+with 1 - 2 thousand updates per day).
+(BTW, "incremental" backup could be implemented without redo - I have
+some thoughts about this, - but having additional recovery is good
+in any case).
+ +Vadim + +From owner-pgsql-hackers@hub.org Fri Nov 7 15:42:58 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA22341 + for ; Fri, 7 Nov 1997 15:42:55 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id PAA02769; Fri, 7 Nov 1997 15:28:54 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 07 Nov 1997 15:24:00 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id PAA01318 for pgsql-hackers-outgoing; Fri, 7 Nov 1997 15:23:52 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id PAA00705 for ; Fri, 7 Nov 1997 15:21:56 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id PAA20010; + Fri, 7 Nov 1997 15:20:10 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711072020.PAA20010@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Fri, 7 Nov 1997 15:20:10 -0500 (EST) +Cc: marc@fallon.classyad.com, hackers@postgreSQL.org +In-Reply-To: <3462AE40.FF6D5DF@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 7, 97 12:59:28 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> > Anything that promises that much speedup, if it can be done easy, I say +> > lets consider it, even if you loose 60 seconds of changes. +> +> I agreed with your un-buffered logging idea. This would be excellent +> feature for un-critical dbase usings (WWW, etc). + +Actually, it is buffered logging. We currently have unbuffered logging, +I think. + +> > > In my plans to re-design transaction system I supposed to keep in shmem +> > > two last pg_log pages. 
They are most often used and using ReadBuffer/WriteBuffer +> > > to access them is not good idea. Also, we could use spinlock instead of +> > > lock manager to synchronize access to these pages (as I see in spin.c +> > > spinlock-s could be shared, but only exclusive ones are used) - spinlocks +> > > are faster. +> > +> > Ah, so you already had the idea of having on-line pages in shared memory +> > as part of a transaction system overhaul? Right now, does each backend +> +> Yes. I hope to implement this in the next 1-2 weeks. +> +> > lock/read/write/unlock to get at pg_log? Wow, that is bad. +> +> Yes, he does. +> +> > +> > Perhaps mmap() would be a good idea. My system has msync() to flush +> > mmap()'ed pages to the underlying file. You would still run fsync() +> > after that. This may give us the best of both worlds: a shared-memory +> ^^^^^^^^^^^^^ +> > area of variable size, and control of when it get flushed to disk. Do +> ^^^^^^^^^^^^^^^^^^^^^ +> I like it. FreeBSD supports +> +> MAP_ANON Map anonymous memory not associated with any specific file. +> +> It would be nice to use mmap to get more "shared" memory, but I don't see +> reasons to mmap any particular file to memory. Having two last pg_log pages +> in memory + xact commit/abort writeback optimization (updation of commit/abort +> xmin/xmax status in tuples by any scan - we already have this) reduce access +> to "old" pg_log pages to zero. + +I totally agree. There is no advantage to mmap() vs. shared memory for +us. I thought if we could control when the mmap() gets flushed to disk, +we could let the OS handle the syncing, but I doubt this is going to be +portable. + +Though, we could mmap() pg_log, and that way backends would not have to +read/write the blocks, and they could all see the same data. But with +the new scheme, they have most transaction ids in shared memory. + +Interesting you mention the scan updating the transaction status. We +would have a problem here. 
It is possible a backend will update the
+commit status of a data page, and that data page will make it to disk,
+but if there is a crash before the updated pg_log gets sync'ed, there
+would be a partial transaction in the system.
+
+I don't know any way that a backend would know the transaction has hit
+disk, so that the data commit flag could be set. You don't want to update
+the commit flag of the data page until the entire transaction has been
+sync'ed. The only way to do that would be to have a 'commit and synced'
+flag, but you want to save that for nested transactions.
+
+Another case where this could come in handy is to allow reuse of superseded
+data rows. If the transaction is committed and synced, the row space
+could be reused by another transaction.
+
+> > other OS's have this? I have a feeling OS's with unified buffer caches
+> > don't have this ability to determine when the underlying mmap'ed file
+> > gets sent to the underlying file and disk.
+> >
+> > > These two last pg_log pages are "online" ones. Race condition: when one or
+> > > both of online pages becomes non-online ones, i.e. pg_log has to be expanded
+> > > when writing commit/abort of "big" xid. This is how we could handle this
+> > > in "buffered" logging (delayed fsync) mode:
+> > >
+> > > When backend want to write commit/abort status he acquires exclusive
+> > > OnLineLogLock. If xid belongs to online pages then backend writes status
+
+This confuses me. Why does a backend need to lock pg_log to update a
+transaction status?
+
+> > > and releases spin. If xid is less than least xid on 1st online page then
+> > > backend releases spin and does exactly the same what he does in normal mode:
+> > > flush (write and fsync) all durty data files, lock pg_log for write, ReadBuffer,
+> > > update xid status, WriteBuffer, release write lock, flush pg_log.
+> > > If xid is greater than max xid on 2nd online page then the simplest way is +> > > just do sync(); sync() (two times), flush 1st or both online pages, +> > > read new page(s) into online pages space, update xid status, +> > > release OnLineLogLock spin. We could try other ways but pg_log expanding +> > > is rare case (32K xids in one pg_log page)... +> > > All what postmaster will have to do is: +> > > 1. Get shared OnLineLogLock. +> > > 2. Copy 2 x 8K data to private place. +> > > 3. Release spinlock. +> > > 4. sync(); sync(); (two times!) +> > > 5. Flush online pages. + +Great. + +> > > +> > > We could use -F DELAY_TIME to turn fsync delayed mode ON. +> > > +> > > And, btw, having two bits for xact status we have only one unused +> > > status value (0x11) currently - I would like to use this for +> > > nested xactions and savepoints... +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> More about this: 0x11 could mean "this _child_ transaction is committed - +> you have to lookup in pg_xact_child to get parent xid and use pg_log again +> to get parent xact status". If parent committed then child xact status +> will be changed to 0x10 (committed) else - to 0x01 (aborted). Using this +> we could get xact nesting and savepoints by starting new child xaction +> inside running one... + +OK. + +> +> > +> > I saw that. By keeping two copies of pg_log, one in memory to be used +> ^^^^^^ +> Just two pg_log pages... + +Got it. 
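The child-transaction lookup quoted above (the status written there as
0x11, i.e. the two bits 11) might be sketched like this. It is a
simplification under stated assumptions: statuses are held one per array
slot rather than packed two bits each, the parent array stands in for the
proposed pg_xact_child relation, the value 0 for "in progress" is assumed,
and resolve_xact_status is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Two-bit status values; 1 (01) and 2 (10) follow the thread, 0 is assumed. */
enum {
    XACT_IN_PROGRESS     = 0,
    XACT_ABORTED         = 1,   /* written 0x01 in the thread */
    XACT_COMMITTED       = 2,   /* written 0x10 in the thread */
    XACT_CHILD_COMMITTED = 3    /* written 0x11: ask the parent */
};

/*
 * Follow parent links while the status says "child committed", then
 * return the status of the first ancestor that has a definite state.
 * status[] and parent[] are indexed by xid (stand-ins for pg_log and
 * pg_xact_child).
 */
static unsigned
resolve_xact_status(const uint8_t *status, const uint32_t *parent,
                    uint32_t xid)
{
    while (status[xid] == XACT_CHILD_COMMITTED)
    {
        /* a parent always starts before its child, so xids decrease */
        assert(parent[xid] < xid);
        xid = parent[xid];
    }
    return status[xid];
}
```

Once the parent's outcome is known, the child's own entry could be
rewritten to committed or aborted, as the quoted text suggests, so later
lookups need not walk the chain again.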
+ + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Sun Nov 9 22:07:36 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA04655 + for ; Sun, 9 Nov 1997 22:07:30 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id VAA07023; Sun, 9 Nov 1997 21:55:54 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 09 Nov 1997 21:52:20 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id VAA06174 for pgsql-hackers-outgoing; Sun, 9 Nov 1997 21:52:13 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id VAA06092 for ; Sun, 9 Nov 1997 21:51:58 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id VAA04150; + Sun, 9 Nov 1997 21:50:29 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711100250.VAA04150@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! (fwd) +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Sun, 9 Nov 1997 21:50:29 -0500 (EST) +Cc: hackers@postgreSQL.org (PostgreSQL-development) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Forwarded message: +> > > Perhaps mmap() would be a good idea. My system has msync() to flush +> > > mmap()'ed pages to the underlying file. You would still run fsync() +> > > after that. This may give us the best of both worlds: a shared-memory +> > ^^^^^^^^^^^^^ +> > > area of variable size, and control of when it get flushed to disk. Do +> > ^^^^^^^^^^^^^^^^^^^^^ +> > I like it. FreeBSD supports +> > +> > MAP_ANON Map anonymous memory not associated with any specific file. 
+> > +> > It would be nice to use mmap to get more "shared" memory, but I don't see +> > reasons to mmap any particular file to memory. Having two last pg_log pages +> > in memory + xact commit/abort writeback optimization (updation of commit/abort +> > xmin/xmax status in tuples by any scan - we already have this) reduce access +> > to "old" pg_log pages to zero. +> +> I totally agree. There is no advantage to mmap() vs. shared memory for +> us. I thought if we could control when the mmap() gets flushed to disk, +> we could let the OS handle the syncing, but I doubt this is going to be +> portable. +> +> Though, we could mmap() pg_log, and that way backends would not have to +> read/write the blocks, and they could all see the same data. But with +> the new scheme, they have most transaction ids in shared memory. +> +> Interesting you mention the scan updating the transaction status. We +> would have a problem here. It is possible a backend will update the +> commit status of a data page, and that data page will make it to disk, +> but if there is a crash before the update pg_log gets sync'ed, there +> would be a partial transaction in the system. +> +> I don't know any way that a backend would know the transaction has hit +> disk, and the data commit flag could be set. You don't want to update +> the commit flag of the data page until entire transaction has been +> sync'ed. The only way to do that would be to have a 'commit and synced' +> flag, but you want to save that for nested transactions. +> +> Another case this could come in handy is to allow reuse of superceeded +> data rows. If the transaction is committed and synced, the row space +> could be reused by another transaction. +> + +I have been thinking about the mmap() issue, and it seems a natural for +pg_log. You can have every backend mmap() pg_log. It becomes a dynamic +shared memory area that is auto-initialized to the contents of pg_log, +and all changes can be made by all backends. 
No locking needed. We can +also flush the changes to the underlying file. Under bsdi, you can also +have the mmap area follow you across exec() calls, so each backend +doesn't have to do anything. I want to replace exec with fork also, so +the stuff would be auto-loaded in the address space of each backend. + +This way, you don't have to have two on-line pages and move them around +as pg_log grows. + +The only problem remains how to mark certain transactions as synced or +force only synced transactions to hit the pg_log file itself, and data +row commit status only should be updated for synced transactions. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Sun Nov 9 23:00:58 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA05394 + for ; Sun, 9 Nov 1997 23:00:55 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA25139 for ; Sun, 9 Nov 1997 22:42:33 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id KAA01845; Mon, 10 Nov 1997 10:49:25 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <34668444.237C228A@sable.krasnoyarsk.su> +Date: Mon, 10 Nov 1997 10:49:24 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: marc@fallon.classyad.com, hackers@postgreSQL.org +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +References: <199711072020.PAA20010@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > > Anything that promises that much speedup, if it can be done easy, I say +> > > lets consider it, even if you loose 60 seconds of changes. +> > +> > I agreed with your un-buffered logging idea. 
This would be excellent +> > feature for un-critical dbase usings (WWW, etc). +> +> Actually, it is buffered logging. We currently have unbuffered logging, +> I think. + +Sorry - mistyping. + +> +> Interesting you mention the scan updating the transaction status. We +> would have a problem here. It is possible a backend will update the +> commit status of a data page, and that data page will make it to disk, +> but if there is a crash before the update pg_log gets sync'ed, there +> would be a partial transaction in the system. + +You're right! Currently, only system relations can be affected by this: +backend releases locks on user tables after syncing data and pg_log. +I'll keep this in mind... + +> > > > These two last pg_log pages are "online" ones. Race condition: when one or +> > > > both of online pages becomes non-online ones, i.e. pg_log has to be expanded +> > > > when writing commit/abort of "big" xid. This is how we could handle this +> > > > in "buffered" logging (delayed fsync) mode: +> > > > +> > > > When backend want to write commit/abort status he acquires exclusive +> > > > OnLineLogLock. If xid belongs to online pages then backend writes status +> +> This confuses me. Why does a backend need to lock pg_log to update a +> transaction status? + +What if two backends try to change xact statuses in the same byte ? 
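To see why two statuses can land in the same byte: with two bits per
xact, four neighbouring xids share one byte of a pg_log page, so setting
one status is a read-modify-write of a byte that other backends may be
updating too. The sketch below assumes a particular bit order within the
byte and uses hypothetical helper names; it is not the actual pg_log code.

```c
#include <assert.h>
#include <stdint.h>

#define XACT_BITS       2u
#define XACTS_PER_BYTE  4u   /* 8 bits / 2 bits per xact */

/* Read the two status bits for the xact in the given slot of a page. */
static unsigned
get_xact_status(const uint8_t *page, uint32_t slot)
{
    unsigned shift = (slot % XACTS_PER_BYTE) * XACT_BITS;

    return (page[slot / XACTS_PER_BYTE] >> shift) & 3u;
}

/*
 * Store the two status bits for one slot. The three steps below are not
 * atomic: without a lock, a concurrent writer to a neighbouring slot in
 * the same byte could have its update overwritten.
 */
static void
set_xact_status(uint8_t *page, uint32_t slot, unsigned status)
{
    unsigned shift = (slot % XACTS_PER_BYTE) * XACT_BITS;
    uint8_t *byte = &page[slot / XACTS_PER_BYTE];
    uint8_t  value;

    assert(status <= 3u);

    value = *byte;                                       /* read   */
    value = (uint8_t) ((value & ~(3u << shift))          /* modify */
                       | (status << shift));
    *byte = value;                                       /* write  */
}
```

Run from a single process this behaves as expected; the hazard only
appears when two backends interleave the read and write steps on the
same byte.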
+ +Vadim + +From owner-pgsql-hackers@hub.org Sun Nov 9 23:59:50 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA06523 + for ; Sun, 9 Nov 1997 23:59:48 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA27105 for ; Sun, 9 Nov 1997 23:41:39 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA08860; Sun, 9 Nov 1997 23:35:42 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 09 Nov 1997 23:31:50 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA07962 for pgsql-hackers-outgoing; Sun, 9 Nov 1997 23:31:43 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA07875 for ; Sun, 9 Nov 1997 23:31:28 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id XAA05566; + Sun, 9 Nov 1997 23:17:41 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711100417.XAA05566@candle.pha.pa.us> +Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Sun, 9 Nov 1997 23:17:41 -0500 (EST) +Cc: marc@fallon.classyad.com, hackers@postgreSQL.org +In-Reply-To: <34668444.237C228A@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 10, 97 10:49:24 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> > > > > These two last pg_log pages are "online" ones. Race condition: when one or +> > > > > both of online pages becomes non-online ones, i.e. pg_log has to be expanded +> > > > > when writing commit/abort of "big" xid. 
This is how we could handle this +> > > > > in "buffered" logging (delayed fsync) mode: +> > > > > +> > > > > When backend want to write commit/abort status he acquires exclusive +> > > > > OnLineLogLock. If xid belongs to online pages then backend writes status +> > +> > This confuses me. Why does a backend need to lock pg_log to update a +> > transaction status? +> +> What if two backends try to change xact statuses in the same byte ? + +Ooo, you got me. I so hoped to prevent locking. It would be nice if: + + *x |= 3; + +would be atomic, but I don't think it is. Most RISC machines don't even +have an OR against a memory address, I think. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + diff --git a/doc/TODO.detail/subquery b/doc/TODO.detail/subquery index cbf3a9b88c..cdc55c8580 100644 --- a/doc/TODO.detail/subquery +++ b/doc/TODO.detail/subquery @@ -74,3 +74,9633 @@ form, if we can handle that more efficiently? regards, tom lane +From aixssd!darrenk@abs.net Thu Dec 5 10:30:53 1996 +Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for ; Thu, 5 Dec 1996 10:30:43 -0500 (EST) +Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST) +Received: by aixssd (AIX 3.2/UCB 5.64/4.03) + id AA36963; Thu, 5 Dec 1996 10:10:24 -0500 +Received: by ceodev (AIX 4.1/UCB 5.64/4.03) + id AA34942; Thu, 5 Dec 1996 10:07:56 -0500 +Date: Thu, 5 Dec 1996 10:07:56 -0500 +From: aixssd!darrenk@abs.net (Darren King) +Message-Id: <9612051507.AA34942@ceodev> +To: maillist@candle.pha.pa.us +Subject: Subselect info. +Mime-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Content-Md5: jaWdPH2KYtdr7ESzqcOp5g== +Status: OR + +> Any of them deal with implementing subselects? + +There's a white paper at the www.sybase.com that might +help a little. 
It's just a copy of a presentation +given by the optimizer guru there. Nothing code-wise, +but he gives a few ways of flattening them with temp +tables, etc... + +Darren + +From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109 + for ; Thu, 21 Aug 1997 23:42:43 -0400 (EDT) +Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD) +Sender: root@www.krasnet.ru +Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su> +Date: Fri, 22 Aug 1997 12:04:31 +0800 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +Subject: Re: subselects +References: <199708220219.WAA23745@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> Considering the complexity of the primary/secondary changes you are +> making, I believe subselects will be easier than that. + +I don't do changes for P/F keys - just thinking... +Yes, I think that impl of referential integrity is +more complex work. + +As for subselects: + +in plannodes.h + +typedef struct Plan { +... + struct Plan *lefttree; + struct Plan *righttree; +} Plan; + +/* ---------------- + * these are are defined to avoid confusion problems with "left" + ^^^^^^^^^^^^^^^^^^ + * and "right" and "inner" and "outer". The convention is that + * the "left" plan is the "outer" plan and the "right" plan is + * the inner plan, but these make the code more readable. 
+ * ----------------
+ */
+#define innerPlan(node)		(((Plan *)(node))->righttree)
+#define outerPlan(node)		(((Plan *)(node))->lefttree)
+
+First thought is to avoid any confusion by re-defining
+
+#define rightPlan(node)		(((Plan *)(node))->righttree)
+#define leftPlan(node)		(((Plan *)(node))->lefttree)
+
+and change all occurrences of 'outer' & 'inner' in code
+to 'left' & 'right' ones:
+
+this will allow to use 'outer' & 'inner' things for subselects
+later, without confusion. My hope is that we may change Executor
+very easily by adding outer/inner plans/TupleSlots to
+EState, CommonState, JoinState, etc and by doing node
+processing in right order.
+
+Subselects are mostly a Planner problem.
+
+Unfortunately, I haven't time at the moment: CHECK/DEFAULT...
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354
+	for ; Fri, 22 Aug 1997 00:00:51 -0400 (EDT)
+Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD)
+Sender: root@www.krasnet.ru
+Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su>
+Date: Fri, 22 Aug 1997 12:22:37 +0800
+From: "Vadim B. Mikheev"
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian
+Subject: Re: subselects
+References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Vadim B. Mikheev wrote:
+>
+> this will allow to use 'outer' & 'inner' things for subselects
+> later, without confusion. My hope is that we may change Executor
+
+Or maybe use 'high' & 'low' for subselects (to avoid confusion
+with outer joins).
+ +> very easy by adding outer/inner plans/TupleSlots to +> EState, CommonState, JoinState, etc and by doing node +> processing in right order. + ^^^^^^^^^^^^^^ +Rule is easy: +1. Uncorrelated subselect - do 'low' plan node first +2. Correlated - do left/right first + +- just some flag in structures. + +Vadim + +From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682 + for ; Thu, 30 Oct 1997 17:02:28 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for ; Thu, 30 Oct 1997 16:57:54 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726 + for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST) +From: Bruce Momjian +Message-Id: <199710302150.QAA07726@candle.pha.pa.us> +Subject: [HACKERS] subselects +To: hackers@postgreSQL.org (PostgreSQL-development) +Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +The only thing I have to add to what I had written earlier is that I +think it is best to have these subqueries executed as early in query +execution as possible. + +Every piece of the backend: parser, optimizer, executor, is designed to +work on a single query. The earlier we can split up the queries, the +better those pieces will work at doing their job. 
You want to be able
+to use the parser and optimizer on each part of the query separately, if
+you can.
+
+
+Forwarded message:
+> I have done some thinking about subselects.  There are basically two
+> issues:
+>
+>     Does the query return one row or several rows?  This can be
+>     determined by seeing if the user uses equals or 'IN' to join the
+>     subquery.
+>
+>     Is the query correlated, meaning "Does the subquery reference
+>     values from the outer query?"
+>
+> (We already have the third type of subquery, the INSERT...SELECT query.)
+>
+> So we have these four combinations:
+>
+>     1) one row, no correlation
+>     2) multiple rows, no correlation
+>     3) one row, correlated
+>     4) multiple rows, correlated
+>
+>
+> With #1, we can execute the subquery, get the value, replace the
+> subquery with the constant returned from the subquery, and execute the
+> outer query.
+>
+> With #2, we can execute the subquery and put the result into a temporary
+> table.  We then rewrite the outer query to access the temporary table
+> and replace the subquery with the column name from the temporary table.
+> We probably put an index on the temp. table, which has only one
+> column, because a subquery can only return one column.  We remove the
+> temp. table after query execution.
+>
+> With #3 and #4, we potentially need to execute the subquery for every
+> row returned by the outer query.  Performance would be horrible for
+> anything but the smallest query.  Another way to handle this is to
+> execute the subquery WITHOUT using any of the outer-query columns to
+> restrict the WHERE clause, and add those columns used to join the outer
+> variables into the target list of the subquery.
So for query:
+>
+>     select t1.name
+>     from tab t1
+>     where t1.age = (select max(t2.age)
+>                     from tab2 t2
+>                     where t2.name = t1.name)
+>
+> Execute the subquery and put it in a temporary table:
+>
+>     select t2.name, max(t2.age) as age
+>     into table temp999
+>     from tab2 t2
+>     group by t2.name
+>
+>     create index i_temp999 on temp999 (name)
+>
+> Then re-write the outer query:
+>
+>     select t1.name
+>     from tab t1, temp999
+>     where t1.age = temp999.age and
+>           t1.name = temp999.name
+>
+> The only problem here is that the subselect is running for all entries
+> in tab2, even if the outer query is only going to need a few rows.
+> Deciding whether to execute the subquery each time or to create a temp.
+> table is often difficult.  Even some non-correlated
+> subqueries are better to execute for each row rather than pre-execute the
+> entire subquery, especially if the outer query returns few rows.
+>
+> One requirement to handle these issues is better column statistics,
+> which I am working on.
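The temp-table rewrite described in the forwarded message can be tried outside the backend. The following is only a sketch under stated assumptions: it uses Python's sqlite3 module rather than PostgreSQL, it spells "select ... into table" as "create temp table ... as", it drops the correlating WHERE clause and groups by name (as the prose above describes), and the function name and any sample rows are hypothetical. The table and column names (tab, tab2, temp999, i_temp999) come from the message.

```python
import sqlite3


def correlated_vs_temp_table(rows_tab, rows_tab2):
    """Run the example query both ways and return (correlated, rewritten)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tab (name text, age integer)")
    conn.execute("create table tab2 (name text, age integer)")
    conn.executemany("insert into tab values (?, ?)", rows_tab)
    conn.executemany("insert into tab2 values (?, ?)", rows_tab2)

    # Form 1: correlated subquery, conceptually evaluated once per outer row.
    correlated = conn.execute(
        """
        select t1.name
        from tab t1
        where t1.age = (select max(t2.age)
                        from tab2 t2
                        where t2.name = t1.name)
        order by t1.name
        """
    ).fetchall()

    # Form 2: run the subquery once for every name, without the outer-query
    # restriction, and keep the join column (name) in the target list.
    conn.execute(
        """
        create temp table temp999 as
        select t2.name as name, max(t2.age) as age
        from tab2 t2
        group by t2.name
        """
    )
    conn.execute("create index i_temp999 on temp999 (name)")

    # The outer query becomes a plain join against the temporary table.
    rewritten = conn.execute(
        """
        select t1.name
        from tab t1, temp999
        where t1.age = temp999.age and
              t1.name = temp999.name
        order by t1.name
        """
    ).fetchall()

    # The message removes the temp table after query execution.
    conn.execute("drop table temp999")
    conn.close()
    return correlated, rewritten
```

Both forms should return the same rows for any input. The trade-off raised in the message still applies: the second form computes max(age) for every name in tab2 even when the outer query needs only a few of them.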
+> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643 + for ; Fri, 31 Oct 1997 22:30:56 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for ; Fri, 31 Oct 1997 22:06:08 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for ; Fri, 31 Oct 1997 22:00:53 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566; + Fri, 31 Oct 1997 21:37:06 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711010237.VAA14566@candle.pha.pa.us> +Subject: Re: [HACKERS] subselects +To: maillist@candle.pha.pa.us (Bruce Momjian) +Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +One more issue I thought of. You can have multiple subselects in a +single query, and subselects can have their own subselects. + +This makes it particularly important that we define a system that always +is able to process the subselect BEFORE the upper select. This will +allow use to handle all these cases without limitations. 
+ +> +> The only thing I have to add to what I had written earlier is that I +> think it is best to have these subqueries executed as early in query +> execution as possible. +> +> Every piece of the backend: parser, optimizer, executor, is designed to +> work on a single query. The earlier we can split up the queries, the +> better those pieces will work at doing their job. You want to be able +> to use the parser and optimizer on each part of the query separately, if +> you can. +> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From hannu@trust.ee Sun Nov 2 10:33:33 1997 +Received: from sid.trust.ee (sid.trust.ee [194.204.23.180]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619 + for ; Sun, 2 Nov 1997 10:32:04 -0500 (EST) +Received: from sid.trust.ee (wink.trust.ee [194.204.23.184]) + by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233; + Sun, 2 Nov 1997 17:30:11 +0200 +Message-ID: <345C9BFD.986C68AA@sid.trust.ee> +Date: Sun, 02 Nov 1997 17:27:57 +0200 +From: Hannu Krosing +X-Mailer: Mozilla 4.02 [en] (Win95; I) +MIME-Version: 1.0 +To: hackers-digest@postgresql.org +CC: maillist@candle.pha.pa.us +Subject: Re: [HACKERS] subselects +References: <199711010401.XAA09216@hub.org> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST) +> From: Bruce Momjian +> Subject: Re: [HACKERS] subselects +> +> One more issue I thought of. You can have multiple subselects in a +> single query, and subselects can have their own subselects. +> +> This makes it particularly important that we define a system that always +> is able to process the subselect BEFORE the upper select. This will +> allow use to handle all these cases without limitations. 
+ +This would severely limit what subselects can be used for as you can't useany of the fields in the upper select in a +search criteria for the subselect, +for example you can't do + +update parts p1 +set parts.current_id = ( + select new_id + from parts p2 + where p1.old_id = p2.new_id);or + +select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice +from parts p1; + +there may be of course ways to rewrite these queries (which the optimiser should do +if it can) but IMHO, these kinds of subselects should still be allowed + +> > The only thing I have to add to what I had written earlier is that I +> > think it is best to have these subqueries executed as early in query +> > execution as possible. +> > +> > Every piece of the backend: parser, optimizer, executor, is designed to +> > work on a single query. The earlier we can split up the queries, the +> > better those pieces will work at doing their job. You want to be able +> > to use the parser and optimizer on each part of the query separately, if +> > you can. +> > +> + +Hannu + + +From vadim@sable.krasnoyarsk.su Sun Nov 2 21:30:59 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831 + for ; Sun, 2 Nov 1997 21:30:57 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for ; Sun, 2 Nov 1997 21:20:13 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su> +Date: Mon, 03 Nov 1997 09:22:38 +0700 +From: "Vadim B. 
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselects +References: <199711021848.NAA08319@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > > One more issue I thought of. You can have multiple subselects in a +> > > single query, and subselects can have their own subselects. +> > > +> > > This makes it particularly important that we define a system that always +> > > is able to process the subselect BEFORE the upper select. This will +> > > allow use to handle all these cases without limitations. +> > +> > This would severely limit what subselects can be used for as you can't useany of the fields in the upper select in a +> > search criteria for the subselect, +> > for example you can't do +> > +> > update parts p1 +> > set parts.current_id = ( +> > select new_id +> > from parts p2 +> > where p1.old_id = p2.new_id);or +> > +> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice +> > from parts p1; +> > +> > there may be of course ways to rewrite these queries (which the optimiser should do +> > if it can) but IMHO, these kinds of subselects should still be allowed +> +> I hadn't even gotten to this point yet, but it is a good thing to keep +> in mind. +> +> In these cases, as in correlated subqueries in the where clause, we will +> create a temporary table, and add the proper join fields and tables to +> the clauses. Our version of UPDATE accepts a FROM section, and we will +> certainly use this for this purpose. + +We can't replace subselect with join if there is aggregate +in subselect. 
+ +Actually, I don't see any problems if we going to process subselect +like sql-funcs: non-correlated subselects can be emulated by +funcs without args, for correlated subselects parser (analyze.c) +has to change all upper query references to $1, $2,... + +Vadim + +From vadim@sable.krasnoyarsk.su Mon Nov 3 06:07:12 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433 + for ; Mon, 3 Nov 1997 06:07:03 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su> +Date: Mon, 03 Nov 1997 18:09:43 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] subselects +References: <199711030316.WAA15401@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > +> > > In these cases, as in correlated subqueries in the where clause, we will +> > > create a temporary table, and add the proper join fields and tables to +> > > the clauses. Our version of UPDATE accepts a FROM section, and we will +> > > certainly use this for this purpose. +> > +> > We can't replace subselect with join if there is aggregate +> > in subselect. +> +> I got lost here. Why can't we handle aggregates? + +Sorry, I missed using of temp tables. Sybase uses joins (without +temp tables) for non-correlated subqueries: + + A noncorrelated subquery can be evaluated as if it were an independent query. + Conceptually, the results of the subquery are substituted in the main statement, or + outer query. This is not how SQL Server actually processes statements with + subqueries. 
Noncorrelated subqueries can be alternatively stated as joins and + are processed as joins by SQL Server. + +but this is not possible if there are aggregates in subquery. + +> +> My idea was this. This is a non-correlated subquery. +... +No problems with it... + +> +> Here is a correlated example: +> +> select * +> from table_a +> where table_a.col_a in (select table_b.col_b +> from table_b +> where table_b.col_b = table_a.col_c) +> +> rewrite as: +> +> select distinct table_b.col_b, table_a.col_c -- the distinct is needed +> into table_sub +> from table_a, table_b + +First, could we add 'where table_b.col_b = table_a.col_c' here ? +Just to avoid Cartesian results ? I hope we can. + +Note that for query + + select * + from table_a + where table_a.col_a in (select table_b.col_b * table_a.col_c + from table_b) + +it's better to do + + select distinct table_a.col_a + into table table_sub + from table_b, table_a + where table_a.col_a = table_b.col_b * table_a.col_c + +once again - to avoid Cartesians. + +But what could we do for + + select * + from table_a + where table_a.col_a = (select max(table_b.col_b * table_a.col_c) + from table_b) +??? + select max(table_b.col_b * table_a.col_c), table_a.col_a + into table table_sub + from table_b, table_a + group by table_a.col_a + +first tries to sort sizeof(table_a) * sizeof(table_b) tuples... +For tables big and small with 100 000 and 1000 tuples + +select max(x*y), x from big, small group by x + +"ate" all free 140M in my file system after 20 minutes (just for +sorting - nothing more) and was killed... + +select x from big where x = cor(x); +(cor(int4) is 'select max($1*y) from small') takes 20 minutes - +this is bad too. + +> > +> > Actually, I don't see any problems if we going to process subselect +> > like sql-funcs: non-correlated subselects can be emulated by +> > funcs without args, for correlated subselects parser (analyze.c) +> > has to change all upper query references to $1, $2,... 
+> +> Yes, logically, they are SQL functions, but aren't we going to see +> terrible performance in such circumstances. My experience is that when + ^^^^^^^^^^^^^^^^^^^^ +You're right. + +> people are given subselects, they start to do huge jobs with them. +> +> In fact, the final solution may be to have both methods available, and +> switch between them depending on the size of the query sets. Each +> method has its advantages. The function example lets the outside query +> be executed, and only calls the subquery when needed. +> +> For large tables where the subselect is small and is the entire WHERE +> restriction, the SQL function gets call much too often. A simple join +> of the subquery result and the large table would be much better. This +> method also allows for sort/merge join of the subquery results, and +> index use. + +...keep thinking... + +Vadim + +From owner-pgsql-hackers@hub.org Mon Nov 3 11:01:01 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633 + for ; Mon, 3 Nov 1997 11:00:59 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for ; Mon, 3 Nov 1997 10:49:42 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for ; Mon, 3 Nov 1997 10:31:23 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262; + Mon, 3 Nov 1997 10:25:34 -0500 (EST) +From: Bruce Momjian +Message-Id: 
<199711031525.KAA02262@candle.pha.pa.us> +Subject: Re: [HACKERS] subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> Sorry, I missed using of temp tables. Sybase uses joins (without +> temp tables) for non-correlated subqueries: +> +> A noncorrelated subquery can be evaluated as if it were an independent query. +> Conceptually, the results of the subquery are substituted in the main statement, or +> outer query. This is not how SQL Server actually processes statements with +> subqueries. Noncorrelated subqueries can be alternatively stated as joins and +> are processed as joins by SQL Server. +> +> but this is not possible if there are aggregates in subquery. +> +> > +> > My idea was this. This is a non-correlated subquery. +> ... +> No problems with it... +> +> > +> > Here is a correlated example: +> > +> > select * +> > from table_a +> > where table_a.col_a in (select table_b.col_b +> > from table_b +> > where table_b.col_b = table_a.col_c) +> > +> > rewrite as: +> > +> > select distinct table_b.col_b, table_a.col_c -- the distinct is needed +> > into table_sub +> > from table_a, table_b +> +> First, could we add 'where table_b.col_b = table_a.col_c' here ? +> Just to avoid Cartesian results ? I hope we can. + +Yes, of course. I forgot that line here. We can also be fancy and move +some of the outer where restrictions on table_a into the subquery. 
+
+I think the classic subquery for this would be if someone wanted all
+customer names that had invoices in the past month:
+
+select custname
+from customer
+where custid in (select order.custid
+                 from order
+                 where order.date >= "09/01/97" and
+                       order.date <= "09/30/97")
+
+In this case, the subquery can use an index on 'date' to quickly
+evaluate the query, and the resulting temp table can quickly be joined
+to the customer table.  If we used SQL functions, every customer would
+have an order query evaluated for it, and there may be no multi-column
+index on customer and date, or even if there is, this could be many
+query executions.
+
+
+>
+> Note that for query
+>
+>     select *
+>     from table_a
+>     where table_a.col_a in (select table_b.col_b * table_a.col_c
+>                             from table_b)
+>
+> it's better to do
+>
+>     select distinct table_a.col_a
+>     into table table_sub
+>     from table_b, table_a
+>     where table_a.col_a = table_b.col_b * table_a.col_c
+
+Yes, I had not thought of cases where they are doing correlated column
+arithmetic, but it looks like this would work.
+
+>
+> once again - to avoid Cartesians.
+>
+> But what could we do for
+>
+>     select *
+>     from table_a
+>     where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
+>                            from table_b)
+
+OK, who wrote this horrible query?  :-)
+
+Without a join of table_b and table_a, even an SQL function would die on
+this.  You have to take the current value table_a.col_c, and multiply by
+every value of table_b.col_b to get the maximum.
+
+Trying to do a temp table on this is certainly going to be a cartesian
+product, but using an SQL function is also going to be a cartesian
+product, except that the product is generated in small pieces instead of
+in one big query.  The SQL function example may eventually complete, but
+it will take forever to do so in cases where the temp table would bomb.
+
+I can recommend some SQL books for anyone who sends in a bug report on
+this query.  :-)
+
+
+
+> ???
+> select max(table_b.col_b * table_a.col_c), table_a.col_a +> into table table_sub +> from table_b, table_a +> group by table_a.col_a +> +> first tries to sort sizeof(table_a) * sizeof(table_b) tuples... +> For tables big and small with 100 000 and 1000 tuples +> +> select max(x*y), x from big, small group by x +> +> "ate" all free 140M in my file system after 20 minutes (just for +> sorting - nothing more) and was killed... +> +> select x from big where x = cor(x); +> (cor(int4) is 'select max($1*y) from small') takes 20 minutes - +> this is bad too. + +Again, my feeling is that in cases where the temp table would bomb, the +SQL function will be so slow that neither will be acceptable. + +> +> > > +> > > Actually, I don't see any problems if we going to process subselect +> > > like sql-funcs: non-correlated subselects can be emulated by +> > > funcs without args, for correlated subselects parser (analyze.c) +> > > has to change all upper query references to $1, $2,... +> > +> > Yes, logically, they are SQL functions, but aren't we going to see +> > terrible performance in such circumstances. My experience is that when +> ^^^^^^^^^^^^^^^^^^^^ +> You're right. +> +> > people are given subselects, they start to do huge jobs with them. +> > +> > In fact, the final solution may be to have both methods available, and +> > switch between them depending on the size of the query sets. Each +> > method has its advantages. The function example lets the outside query +> > be executed, and only calls the subquery when needed. +> > +> > For large tables where the subselect is small and is the entire WHERE +> > restriction, the SQL function gets call much too often. A simple join +> > of the subquery result and the large table would be much better. This +> > method also allows for sort/merge join of the subquery results, and +> > index use. +> +> ...keep thinking... 
+> +> Vadim +> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239 + for ; Thu, 20 Nov 1997 00:09:11 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for ; Wed, 19 Nov 1997 23:58:16 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103 + for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711200457.XAA03103@candle.pha.pa.us> +Subject: [HACKERS] subselect +To: hackers@postgreSQL.org (PostgreSQL-development) +Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +I am going to overhaul all the /parser files, and I may give subselects +a try while I am in there. This is where it going to have to be done. + +Two things I think I need are: + + temp tables that go away at the end of a statement, so if the +query elog's out, the temp file gets destroyed + + how do I implement "not in": + + select * from a where x not in (select y from b) + +Using <> is not going to work because that returns multiple copies of a, +one for every one that doesn't equal. It is like we need not equals, +but don't return multiple rows. + +Any ideas? 
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019 + for ; Thu, 20 Nov 1997 10:00:56 -0500 (EST) +Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for ; Thu, 20 Nov 1997 09:52:55 -0500 (EST) +Received: from alumni.caltech.edu (localhost [127.0.0.1]) + by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754; + Thu, 20 Nov 1997 06:27:21 GMT +Sender: tgl@gnet04.jpl.nasa.gov +Message-ID: <3473D849.16F67A2A@alumni.caltech.edu> +Date: Thu, 20 Nov 1997 06:27:21 +0000 +From: "Thomas G. Lockhart" +Organization: Caltech/JPL +X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselect +References: <199711200457.XAA03103@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> I am going to overhaul all the /parser files + +?? + +> , and I may give subselects +> a try while I am in there. This is where it going to have to be done. + +A first cut at the subselect syntax is already in gram.y. I'm sure that the +e-mail you had sent which collected several items regarding subselects +covers some of this topic. I've been thinking about subselects also, and +had thought that there must be some existing mechanisms in the backend +which can be used to help implement subselects. It seems to me that UNION +might be a good thing to implement first, because it has a fairly +well-defined set of behaviors: + + select a union select b; + +chooses elements from a and from b and then sorts/uniques the result. + + select a union all select b; + +chooses elements from a, sorts/uniques, and then adds all elements from b. 
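A minimal sketch of the two forms, run against SQLite through Python's sqlite3 module as a stand-in (an assumption made only to get something runnable). Tables a and b and the column f are made up. As standard SQL defines them, UNION removes duplicates from the combined result, while UNION ALL keeps every row from both inputs, including duplicates that were already present in the first one.

```python
import sqlite3

# Hypothetical data: a.f holds 1, 1, 2 and b.f holds 2, 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (f INTEGER)")
conn.execute("CREATE TABLE b (f INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (1,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# UNION: combine both inputs, then remove duplicates.
union_rows = [r[0] for r in conn.execute(
    "SELECT f FROM a UNION SELECT f FROM b ORDER BY 1")]

# UNION ALL: plain concatenation, no duplicate removal on either side.
union_all_rows = [r[0] for r in conn.execute(
    "SELECT f FROM a UNION ALL SELECT f FROM b ORDER BY 1")]

print(union_rows)      # [1, 2, 3]
print(union_all_rows)  # [1, 1, 2, 2, 3]
```

The duplicate 1 from a survives UNION ALL, so an implementation that sorts/uniques the first input before appending the second would return fewer rows than the standard form.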
+ + select a union select b union all select c; + +evaluates left to right, and first evaluates a union b, sorts/uniques, and +then evaluates + + (result) union all select c; + +There are several types of subselects. Examples of some are: + +1) select a.f from a union select b.f from b order by 1; +Needs temporary table(s), optional sort/unique, final order by. + +2) select a.f from a where a.f in (select b.f from b); +Needs temporary table(s). "in" can be first implemented by count(*) > 0 but +would be better performance to have the backend return after the first +match. + +3) select a.f from a where exists (select b.f from b where b.f = a); +Need to do the select and do a subselect on _each_ of the returned values? +Again could use count(*) to help implement. + +This brings up the point that perhaps the backend needs a row-counting +atomic operation and count(*) could be re-implemented using that. At the +moment count(*) is transformed to a select of OID columns and does not +quite work on table joins. + +I would think that outer joins could use some of these support routines +also. + + - Tom + +> Two things I think I need are: +> +> temp tables that go away at the end of a statement, so if the +> query elog's out, the temp file gets destroyed +> +> how do I implement "not in": +> +> select * from a where x not in (select y from b) +> +> Using <> is not going to work because that returns multiple copies of a, +> one for every one that doesn't equal. It is like we need not equals, +> but don't return multiple rows. +> +> Any ideas? 
+> +> -- +> Bruce Momjian +> maillist@candle.pha.pa.us + + + + +From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311 + for ; Mon, 22 Dec 1997 00:49:01 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for ; Mon, 22 Dec 1997 00:44:57 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605 + for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712220545.AAA11605@candle.pha.pa.us> +Subject: [HACKERS] subselects +To: hackers@postgreSQL.org (PostgreSQL-development) +Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +OK, a few questions: + + Should we use sortmerge, so we can use our psort as temp tables, +or do we use hashunique? + + How do we pass the query to the optimizer? How do we represent +the range table for each, and the links between them in correlated +subqueries? + +I have to think about this. Comments are welcome. 
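The first question contrasts two ways of making a subselect result unique. A rough sketch of both, outside the backend: sort_unique and hash_unique are hypothetical names, plain Python lists stand in for tuples, and neither function corresponds to actual psort code.

```python
from typing import Iterable, List


def sort_unique(values: Iterable[int]) -> List[int]:
    """Sort first, then drop adjacent duplicates while unloading."""
    result: List[int] = []
    for v in sorted(values):
        if not result or result[-1] != v:
            result.append(v)
    return result


def hash_unique(values: Iterable[int]) -> List[int]:
    """Keep the first occurrence of each value, tracked in a hash set."""
    seen = set()
    result: List[int] = []
    for v in values:
        if v not in seen:
            seen.add(v)
            result.append(v)
    return result


rows = [3, 1, 3, 2, 1]
print(sort_unique(rows))  # [1, 2, 3]
print(hash_unique(rows))  # [3, 1, 2]
```

The sorted variant leaves its output ordered, which is what a merge join against the outer query would want; the hashed variant skips the sort but returns rows in arrival order.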
+-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608 + for ; Mon, 22 Dec 1997 02:01:25 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for ; Mon, 22 Dec 1997 01:37:29 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for ; Mon, 22 Dec 1997 01:30:15 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354 + for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712220605.BAA17354@candle.pha.pa.us> +Subject: [HACKERS] subselects (fwd) +To: hackers@postgreSQL.org (PostgreSQL-development) +Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Forwarded message: +> OK, a few questions: +> +> Should we use sortmerge, so we can use our psort as temp tables, +> or do we use hashunique? +> +> How do we pass the query to the optimizer? How do we represent +> the range table for each, and the links between them in correlated +> subqueries? +> +> I have to think about this. Comments are welcome. + +One more thing. I guess I am seeing subselects as a different thing +that temp tables. 
I can see people wanting to put indexes on their temp +tables, so I think they will need more system catalog support. For +subselects, I think we can just stuff them into psort, perhaps, and do +the unique as we unload them. + +Seems like a natural to me. + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876 + for ; Tue, 23 Dec 1997 04:00:57 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042; + Tue, 23 Dec 1997 16:08:56 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su> +Date: Tue, 23 Dec 1997 16:08:56 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselects (fwd) +References: <199712220605.BAA17354@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> Forwarded message: +> > OK, a few questions: +> > +> > Should we use sortmerge, so we can use our psort as temp tables, +> > or do we use hashunique? +> > +> > How do we pass the query to the optimizer? How do we represent +> > the range table for each, and the links between them in correlated +> > subqueries? +> > +> > I have to think about this. Comments are welcome. +> +> One more thing. I guess I am seeing subselects as a different thing +> that temp tables. I can see people wanting to put indexes on their temp +> tables, so I think they will need more system catalog support. For + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +What's the difference between temp tables and temp indices ? 
+Both of them are handled via catalog cache... + +Vadim + +From vadim@sable.krasnoyarsk.su Sat Jan 3 04:01:00 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565 + for ; Sat, 3 Jan 1998 04:00:58 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for ; Sat, 3 Jan 1998 03:47:07 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017; + Sat, 3 Jan 1998 16:08:55 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su> +Date: Sat, 03 Jan 1998 16:08:51 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian , + "Thomas G. Lockhart" +Subject: Re: subselects +References: <199712290516.AAA12579@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> With UNIONs done, how are things going with you on subselects? UNIONs +> are much easier that subselects. +> +> I am stumped on how to record the subselect query information in the +> parser and stuff. + + And I'm too. We definitely need in EXISTS node and may be in IN one. +Also, we have to support ANY and ALL modifiers of comparison operators +(it would be nice to support ANY and ALL for all operators returning +bool: >, =, ..., like, ~ and so on). 
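The idea of attaching ANY and ALL to any operator that returns bool can be sketched as a single helper that takes the operator as a parameter. This is only an illustration: quantified and like_prefix are hypothetical names, and NULL handling is deliberately left out here.

```python
import operator
from typing import Any, Callable, Iterable


def quantified(op: Callable[[Any, Any], bool], left: Any,
               values: Iterable[Any], mode: str) -> bool:
    """Apply a boolean operator under an ANY or ALL modifier (no NULLs)."""
    results = (op(left, v) for v in values)
    if mode == "ANY":
        return any(results)
    if mode == "ALL":
        return all(results)
    raise ValueError(f"unknown modifier: {mode}")


def like_prefix(text: str, prefix: str) -> bool:
    """Hypothetical user-defined operator: does text start with prefix?"""
    return text.startswith(prefix)


print(quantified(operator.gt, 5, [1, 2, 3], "ALL"))           # True
print(quantified(operator.eq, 2, [1, 2, 3], "ANY"))           # True
print(quantified(operator.ne, 2, [1, 2, 3], "ALL"))           # False
print(quantified(like_prefix, "psort", ["nb", "ps"], "ANY"))  # True
```

Because the operator is just an argument, built-in comparisons and a user-defined one go through the same code path, which is the property asked for above.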
Note, that IN is the same as += ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types, +and so, we could avoid IN node, but I'm not sure that I like such +assumption: postgres is OO-like system allowing operators to be overriden +and so, '=' can, in theory, mean not EQUAL but something else (someday +we could allow to specify "meaning" of operator in CREATE OPERATOR) - +in short, I would like IN node. + Also, I would suggest nodes for ANY and ALL. + (I need in few days to think more about recording of this stuff...) + +> +> Please let me know what I can do to help, if anything. + +Thanks. As I remember, Tom also wished to work here. Tom ? + +Bye, + Vadim + +P.S. I'll be "on-line" Jan 5. + +From owner-pgsql-hackers@hub.org Mon Jan 5 07:30:51 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466 + for ; Mon, 5 Jan 1998 07:30:49 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for ; Mon, 5 Jan 1998 07:20:57 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278; + Mon, 5 Jan 1998 19:36:06 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> +Date: Mon, 05 Jan 1998 19:35:59 +0700 +From: "Vadim B. 
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselect +References: <199801050516.AAA28005@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce Momjian wrote: +> +> I was thinking about subselects, and how to attach the two queries. +> +> What if the subquery makes a range table entry in the outer query, and +> the query is set up like the UNION queries where we put the scans in a +> row, but in the case we put them over/under each other. +> +> And we push a temp table into the catalog cache that represents the +> result of the subquery, then we could join to it in the outer query as +> though it was a real table. +> +> Also, can't we do the correlated subqueries by adding the proper +> target/output columns to the subquery, and have the outer query +> reference those columns in the subquery range table entry. + +Yes, this is a way to handle subqueries by joining to temp table. +After getting plan we could change temp table access path to +node material. On the other hand, it could be useful to let optimizer +know about cost of temp table creation (have to think more about it)... +Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN +is one example of this - joining by <> will give us invalid results. +Setting special NOT EQUAL flag is not enough: subquery plan must be +always inner one in this case. The same for handling ALL modifier. +Note, that we generaly can't use aggregates here: we can't add MAX to +subquery in the case of > ALL (subquery), because of > ALL should return FALSE +if subquery returns NULL(s) but aggregates don't take NULLs into account. + +> +> Maybe I can write up a sample of this? Vadim, would this help? Is this +> the point we are stuck at? 
+ +Personally, I was stuck by holydays -:) +Now I can spend ~ 8 hours ~ each day for development... + +Vadim + + +From owner-pgsql-hackers@hub.org Mon Jan 5 10:45:30 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769 + for ; Mon, 5 Jan 1998 10:45:28 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for ; Mon, 5 Jan 1998 10:31:06 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375; + Mon, 5 Jan 1998 10:28:48 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801051528.KAA10375@candle.pha.pa.us> +Subject: Re: [HACKERS] subselect +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> Yes, this is a way to handle subqueries by joining to temp table. +> After getting plan we could change temp table access path to +> node material. On the other hand, it could be useful to let optimizer +> know about cost of temp table creation (have to think more about it)... +> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN +> is one example of this - joining by <> will give us invalid results. 
+> Setting special NOT EQUAL flag is not enough: subquery plan must be +> always inner one in this case. The same for handling ALL modifier. +> Note, that we generaly can't use aggregates here: we can't add MAX to +> subquery in the case of > ALL (subquery), because of > ALL should return FALSE +> if subquery returns NULL(s) but aggregates don't take NULLs into account. + +OK, here are my ideas. First, I think you have to handle subselects in +the outer node because a subquery could have its own subquery. Also, we +now have a field in Aggreg to all us to 'usenulls'. + +OK, here it is. I recommend we pass the outer and subquery through +the parser and optimizer separately. + +We parse the subquery first. If the subquery is not correlated, it +should parse fine. If it is correlated, any columns we find in the +subquery that are not already in the FROM list, we add the table to the +subquery FROM list, and add the referenced column to the target list of +the subquery. + +When we are finished parsing the subquery, we create a catalog cache +entry for it called 'sub1' and make its fields match the target +list of the subquery. + +In the outer query, we add 'sub1' to its target list, and change +the subquery reference to point to the new range table. We also add +WHERE clauses to do any correlated joins. + +Here is a simple example: + + select * + from taba + where col1 = (select col2 + from tabb) + +This is not correlated, and the subquery parser easily. We create a +'sub1' catalog cache entry, and add 'sub1' to the outer query FROM +clause. We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'. + +Here is a more complex correlated subquery: + + select * + from taba + where col1 = (select col2 + from tabb + where taba.col3 = tabb.col4) + +Here we must add 'taba' to the subquery's FROM list, and add col3 to the +target list of the subquery. 
After we parse the subquery, add 'sub1' to +the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 = +sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'. +THe optimizer will do the correlation for us. + +In the optimizer, we can parse the subquery first, then the outer query, +and then replace all 'sub1' references in the outer query to use the +subquery plan. + +I realize making merging the two plans and doing IN and NOT IN is the +real challenge, but I hoped this would give us a start. + +What do you think? + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Mon Jan 5 15:02:46 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690 + for ; Mon, 5 Jan 1998 15:02:44 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for ; Mon, 5 Jan 1998 14:28:43 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904; + Tue, 6 Jan 1998 02:56:00 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 02:55:57 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801051528.KAA10375@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > always inner one in this case. The same for handling ALL modifier. 
+> > Note, that we generaly can't use aggregates here: we can't add MAX to +> > subquery in the case of > ALL (subquery), because of > ALL should return FALSE +> > if subquery returns NULL(s) but aggregates don't take NULLs into account. +> +> OK, here are my ideas. First, I think you have to handle subselects in +> the outer node because a subquery could have its own subquery. Also, we + +I hope that this is no matter: if results of subquery (with/without sub-subqueries) +will go into temp table then this table will be re-scanned for each outer tuple. + +> now have a field in Aggreg to all us to 'usenulls'. + ^^^^^^^^ + This can't help: + +vac=> select * from x; +y +- +1 +2 +3 + <<< this is NULL +(4 rows) + +vac=> select max(y) from x; +max +--- + 3 + +==> we can't replace + +select * from A where A.a > ALL (select y from x); + ^^^^^^^^^^^^^^^ + (NULL will be returned and so A.a > ALL is FALSE - this is what + Sybase does, is it right ?) +with + +select * from A where A.a > (select max(y) from x); + ^^^^^^^^^^^^^^^^^^^^ +just because of we lose knowledge about NULLs here. + +Also, I would like to handle ANY and ALL modifiers for all bool +operators, either built-in or user-defined, for all data types - +isn't PostgreSQL OO-like RDBMS -:) + +> OK, here it is. I recommend we pass the outer and subquery through +> the parser and optimizer separately. + +I don't like this. I would like to get parse-tree from parser for +entire query and let optimizer (on upper level) decide how to rewrite +parse-tree and what plans to produce and how these plans should be +merged. Note, that I don't object your methods below, but only where +to place handling of this. I don't understand why should we add +new part to the system which will do optimizer' work (parse-tree --> +execution plan) and deal with optimizer nodes. Imho, upper optimizer +level is nice place to do this. + +> +> We parse the subquery first. If the subquery is not correlated, it +> should parse fine. 
If it is correlated, any columns we find in the +> subquery that are not already in the FROM list, we add the table to the +> subquery FROM list, and add the referenced column to the target list of +> the subquery. +> +> When we are finished parsing the subquery, we create a catalog cache +> entry for it called 'sub1' and make its fields match the target +> list of the subquery. +> +> In the outer query, we add 'sub1' to its target list, and change +> the subquery reference to point to the new range table. We also add +> WHERE clauses to do any correlated joins. +... +> Here is a more complex correlated subquery: +> +> select * +> from taba +> where col1 = (select col2 +> from tabb +> where taba.col3 = tabb.col4) +> +> Here we must add 'taba' to the subquery's FROM list, and add col3 to the +> target list of the subquery. After we parse the subquery, add 'sub1' to +> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 = +> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'. +> THe optimizer will do the correlation for us. +> +> In the optimizer, we can parse the subquery first, then the outer query, +> and then replace all 'sub1' references in the outer query to use the +> subquery plan. +> +> I realize making merging the two plans and doing IN and NOT IN is the + ^^^^^^^^^^^^^^^^^^^^^ +This is very easy to do! As I already said we have just change sub1 +access path (SeqScan of sub1) with SeqScan of Material node with +subquery plan. + +> real challenge, but I hoped this would give us a start. + +Decision about how to record subquery stuff in to parse-tree +would be very good start -:) + +BTW, note that for _expression_ subqueries (which are introduced without +IN, EXISTS, ALL, ANY - this follows Sybase' naming) - as in your examples - +we have to check that subquery returns single tuple... 
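The objection to replacing > ALL (subquery) with > (select max(...)) can be checked with a small model of SQL's three-valued logic. This is a sketch under stated assumptions: None stands for NULL (and for an unknown comparison result), gt, gt_all and sql_max are hypothetical helpers, and the value 5 for A.a is made up. Under standard three-valued logic the > ALL result is unknown rather than strictly false, but a WHERE clause rejects the row in both cases.

```python
from typing import Iterable, List, Optional


def gt(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL-style '>': the result is unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a > b


def gt_all(a: Optional[int], values: Iterable[Optional[int]]) -> Optional[bool]:
    """a > ALL (values): false beats unknown, unknown beats true."""
    unknown = False
    for v in values:
        r = gt(a, v)
        if r is False:
            return False
        if r is None:
            unknown = True
    return None if unknown else True


def sql_max(values: List[Optional[int]]) -> Optional[int]:
    """max() as an aggregate: NULL inputs are ignored."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


x = [1, 2, 3, None]  # column y of table x from the psql session above
a = 5                # hypothetical value of A.a

print(gt_all(a, x))       # None
print(gt(a, sql_max(x)))  # True
```

The row with A.a = 5 is rejected by the > ALL form and accepted by the max() form, so the two queries are not interchangeable once the subquery can return NULLs.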
+ +Vadim + +From owner-pgsql-hackers@hub.org Mon Jan 5 20:31:03 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836 + for ; Mon, 5 Jan 1998 20:31:01 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for ; Mon, 5 Jan 1998 19:56:05 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST) +Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for ; Mon, 5 Jan 1998 19:02:30 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) + by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337 + for ; Mon, 5 Jan 1998 17:31:04 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675; + Mon, 5 Jan 1998 17:16:40 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801052216.RAA02675@candle.pha.pa.us> +Subject: Re: [HACKERS] subselect +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> > I am confused. Do you want one flat query and want to pass the whole +> > thing into the optimizer? That brings up some questions: +> +> No. 
I just want to follow Tom's way: I would like to see new +> SubSelect node as shortened version of struct Query (or use +> Query structure for each subquery - no matter for me), some +> subquery-related stuff added to Query (and SubSelect) to help +> optimizer to start, and see + +OK, so you want the subquery to actually be INSIDE the outer query +expression. Do they share a common range table? If they don't, we +could very easily just fly through when processing the WHERE clause, and +start a new query using a new query structure for the subquery. Believe +me, you don't want a separate SubQuery-type, just re-use Query for it. +It allows you to call all the normal query stuff with a consistent +structure. + +The parser will need to know it is in a subquery, so it can add the +proper target columns to the subquery, or are you going to do that in +the optimizer. You can do it in the optimizer, and join the range table +references there too. + +> +> typedef struct A_Expr +> { +> NodeTag type; +> int oper; /* type of operation +> * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> IN, NOT IN, ANY, ALL, EXISTS here, +> +> char *opname; /* name of operator/function */ +> Node *lexpr; /* left argument */ +> Node *rexpr; /* right argument */ +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> and SubSelect (Query) here (as possible case). +> +> One thought to follow this way: RULEs (and so - VIEWs) are handled by using +> Query - how else can we implement VIEWs on selects with subqueries ? + +Views are stored as nodeout structures, and are merged into the query's +from list, target list, and where clause. I am working out +readfunc,outfunc now to make sure they are up-to-date with all the +current fields. + +> +> BTW, is +> +> select * from A where (select TRUE from B); +> +> valid syntax ? + +I don't think so. 
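The parse-tree shape under discussion, an A_Expr whose right argument may itself be a Query, can be sketched as follows. It is modeled in Python rather than C only to keep it short and runnable. The AExpr field names mirror the quoted A_Expr; the Query class, its two fields, and subquery_depth are hypothetical and far simpler than the real structures.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Query:
    """Stand-in for struct Query: just a range table and a WHERE tree."""
    range_table: List[str]
    where: Optional["AExpr"] = None


@dataclass
class AExpr:
    """Stand-in for A_Expr; oper could also be IN, NOT IN, ANY, ALL, EXISTS."""
    oper: str
    opname: Optional[str] = None
    lexpr: Union["AExpr", str, None] = None
    rexpr: Union["AExpr", Query, str, None] = None


def subquery_depth(query: Query) -> int:
    """Count how deeply Query nodes nest below this query's WHERE clause."""
    def depth(node) -> int:
        if isinstance(node, Query):
            return 1 + subquery_depth(node)
        if isinstance(node, AExpr):
            return max(depth(node.lexpr), depth(node.rexpr))
        return 0
    return depth(query.where)


# select * from taba
#  where col1 = (select col2 from tabb where taba.col3 = tabb.col4)
inner = Query(["tabb"], AExpr("OP", "=", "taba.col3", "tabb.col4"))
outer = Query(["taba"], AExpr("OP", "=", "col1", inner))

print(subquery_depth(outer))  # 1
```

Reusing one Query type for both levels means the same walker handles a subquery that has its own subquery, with each level keeping its own range table.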
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Mon Jan 5 17:01:54 1998 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066 + for ; Mon, 5 Jan 1998 17:01:47 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063; + Tue, 6 Jan 1998 05:18:13 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 05:18:11 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801052051.PAA29341@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > > OK, here it is. I recommend we pass the outer and subquery through +> > > the parser and optimizer separately. +> > +> > I don't like this. I would like to get parse-tree from parser for +> > entire query and let optimizer (on upper level) decide how to rewrite +> > parse-tree and what plans to produce and how these plans should be +> > merged. Note, that I don't object your methods below, but only where +> > to place handling of this. I don't understand why should we add +> > new part to the system which will do optimizer' work (parse-tree --> +> > execution plan) and deal with optimizer nodes. Imho, upper optimizer +> > level is nice place to do this. +> +> I am confused. Do you want one flat query and want to pass the whole +> thing into the optimizer? That brings up some questions: + +No. 
I just want to follow Tom's way: I would like to see new +SubSelect node as shortened version of struct Query (or use +Query structure for each subquery - no matter for me), some +subquery-related stuff added to Query (and SubSelect) to help +optimizer to start, and see + +typedef struct A_Expr +{ + NodeTag type; + int oper; /* type of operation + * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + IN, NOT IN, ANY, ALL, EXISTS here, + + char *opname; /* name of operator/function */ + Node *lexpr; /* left argument */ + Node *rexpr; /* right argument */ + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + and SubSelect (Query) here (as possible case). + +One thought to follow this way: RULEs (and so - VIEWs) are handled by using +Query - how else can we implement VIEWs on selects with subqueries ? + +BTW, is + +select * from A where (select TRUE from B); + +valid syntax ? + +Vadim + +From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:57 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296 + for ; Mon, 5 Jan 1998 18:00:55 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for ; Mon, 5 Jan 1998 17:22:21 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094; + Tue, 6 Jan 1998 05:49:02 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 05:48:58 +0700 +From: "Vadim B. 
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Goran Thyni +CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Goran Thyni wrote: +> +> Vadim, +> +> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN +> is one example of this - joining by <> will give us invalid results. +> +> What is you approach towards this problem? + +Actually, this is problem of ALL modifier (NOT IN is _not_equal_ ALL) +and so, we have to have not just NOT EQUAL flag but some ALL node +with modified operator. + +After that, one way is put subquery into inner plan of an join node +to be sure that for an outer tuple all corresponding subquery tuples +will be tested with modified operator (this will require either +changing code of all join nodes or addition of new plan type - we'll see) +and another way is ... suggested by you: + +> I got an idea that one could reverse the order, +> that is execute the outer first into a temptable +> and delete from that according to the result of the +> subquery and then return it. +> Probably this is too raw and slow. ;-) + +This will be faster in some cases (when subquery returns many results +and there are "not so many" results from outer query) - thanks for idea! + +> +> Personally, I was stuck by holydays -:) +> Now I can spend ~ 8 hours ~ each day for development... +> +> Oh, isn't it christmas eve right now in Russia? 
+ +Due to historic reasons New Year is mu-u-u-uch popular +holiday in Russia -:) + +Vadim + +From owner-pgsql-hackers@hub.org Mon Jan 5 19:32:59 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA05070 + for ; Mon, 5 Jan 1998 19:32:57 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA26847 for ; Mon, 5 Jan 1998 18:59:43 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28045; Mon, 5 Jan 1998 19:06:11 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:40 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27280 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:25 -0500 (EST) +Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27030 for ; Mon, 5 Jan 1998 19:02:25 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09438 + for ; Mon, 5 Jan 1998 17:35:43 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094; + Tue, 6 Jan 1998 05:49:02 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 05:48:58 +0700 +From: "Vadim B. 
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Goran Thyni +CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Goran Thyni wrote: +> +> Vadim, +> +> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN +> is one example of this - joining by <> will give us invalid results. +> +> What is you approach towards this problem? + +Actually, this is problem of ALL modifier (NOT IN is _not_equal_ ALL) +and so, we have to have not just NOT EQUAL flag but some ALL node +with modified operator. + +After that, one way is put subquery into inner plan of an join node +to be sure that for an outer tuple all corresponding subquery tuples +will be tested with modified operator (this will require either +changing code of all join nodes or addition of new plan type - we'll see) +and another way is ... suggested by you: + +> I got an idea that one could reverse the order, +> that is execute the outer first into a temptable +> and delete from that according to the result of the +> subquery and then return it. +> Probably this is too raw and slow. ;-) + +This will be faster in some cases (when subquery returns many results +and there are "not so many" results from outer query) - thanks for idea! + +> +> Personally, I was stuck by holydays -:) +> Now I can spend ~ 8 hours ~ each day for development... +> +> Oh, isn't it christmas eve right now in Russia? 
+ +Due to historic reasons New Year is mu-u-u-uch popular +holiday in Russia -:) + +Vadim + + +From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:59 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300 + for ; Mon, 5 Jan 1998 18:00:57 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for ; Mon, 5 Jan 1998 17:42:15 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129; + Tue, 6 Jan 1998 06:10:05 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 06:09:56 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801052216.RAA02675@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > > I am confused. Do you want one flat query and want to pass the whole +> > > thing into the optimizer? That brings up some questions: +> > +> > No. I just want to follow Tom's way: I would like to see new +> > SubSelect node as shortened version of struct Query (or use +> > Query structure for each subquery - no matter for me), some +> > subquery-related stuff added to Query (and SubSelect) to help +> > optimizer to start, and see +> +> OK, so you want the subquery to actually be INSIDE the outer query +> expression. Do they share a common range table? If they don't, we + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +No. + +> could very easily just fly through when processing the WHERE clause, and +> start a new query using a new query structure for the subquery. 
Believe + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +... and filling some subquery-related stuff in upper query structure - +still don't know what exactly this could be -:) + +> me, you don't want a separate SubQuery-type, just re-use Query for it. +> It allows you to call all the normal query stuff with a consistent +> structure. + +No objections. + +> +> The parser will need to know it is in a subquery, so it can add the +> proper target columns to the subquery, or are you going to do that in + +I don't think that we need in it, but list of correlation clauses +could be good thing - all in all parser has to check all column +references... + +> the optimizer. You can do it in the optimizer, and join the range table +> references there too. + +Yes. + +> > typedef struct A_Expr +> > { +> > NodeTag type; +> > int oper; /* type of operation +> > * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ +> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> > IN, NOT IN, ANY, ALL, EXISTS here, +> > +> > char *opname; /* name of operator/function */ +> > Node *lexpr; /* left argument */ +> > Node *rexpr; /* right argument */ +> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> > and SubSelect (Query) here (as possible case). +> > +> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using +> > Query - how else can we implement VIEWs on selects with subqueries ? +> +> Views are stored as nodeout structures, and are merged into the query's +> from list, target list, and where clause. I am working out +> readfunc,outfunc now to make sure they are up-to-date with all the +> current fields. + +Nice! This stuff was out-of-date for too long time. + +> > BTW, is +> > +> > select * from A where (select TRUE from B); +> > +> > valid syntax ? +> +> I don't think so. + +And so, *rexpr can be of Query type only for oper "in" OP, IN, NOT IN, +ANY, ALL, EXISTS - well. 
+ +(Time to sleep -:) + +Vadim + +From owner-pgsql-hackers@hub.org Mon Jan 5 20:31:08 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06842 + for ; Mon, 5 Jan 1998 20:31:06 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id UAA00621 for ; Mon, 5 Jan 1998 20:03:49 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28043; Mon, 5 Jan 1998 19:06:11 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:38 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27270 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:22 -0500 (EST) +Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27141 for ; Mon, 5 Jan 1998 19:02:50 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09919 + for ; Mon, 5 Jan 1998 17:54:47 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129; + Tue, 6 Jan 1998 06:10:05 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 06:09:56 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801052216.RAA02675@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce Momjian wrote: +> +> > > I am confused. Do you want one flat query and want to pass the whole +> > > thing into the optimizer? 
That brings up some questions: +> > +> > No. I just want to follow Tom's way: I would like to see new +> > SubSelect node as shortened version of struct Query (or use +> > Query structure for each subquery - no matter for me), some +> > subquery-related stuff added to Query (and SubSelect) to help +> > optimizer to start, and see +> +> OK, so you want the subquery to actually be INSIDE the outer query +> expression. Do they share a common range table? If they don't, we + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +No. + +> could very easily just fly through when processing the WHERE clause, and +> start a new query using a new query structure for the subquery. Believe + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +... and filling some subquery-related stuff in upper query structure - +still don't know what exactly this could be -:) + +> me, you don't want a separate SubQuery-type, just re-use Query for it. +> It allows you to call all the normal query stuff with a consistent +> structure. + +No objections. + +> +> The parser will need to know it is in a subquery, so it can add the +> proper target columns to the subquery, or are you going to do that in + +I don't think that we need in it, but list of correlation clauses +could be good thing - all in all parser has to check all column +references... + +> the optimizer. You can do it in the optimizer, and join the range table +> references there too. + +Yes. + +> > typedef struct A_Expr +> > { +> > NodeTag type; +> > int oper; /* type of operation +> > * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ +> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> > IN, NOT IN, ANY, ALL, EXISTS here, +> > +> > char *opname; /* name of operator/function */ +> > Node *lexpr; /* left argument */ +> > Node *rexpr; /* right argument */ +> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> > and SubSelect (Query) here (as possible case). 
+> > +> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using +> > Query - how else can we implement VIEWs on selects with subqueries ? +> +> Views are stored as nodeout structures, and are merged into the query's +> from list, target list, and where clause. I am working out +> readfunc,outfunc now to make sure they are up-to-date with all the +> current fields. + +Nice! This stuff was out-of-date for too long time. + +> > BTW, is +> > +> > select * from A where (select TRUE from B); +> > +> > valid syntax ? +> +> I don't think so. + +And so, *rexpr can be of Query type only for oper "in" OP, IN, NOT IN, +ANY, ALL, EXISTS - well. + +(Time to sleep -:) + +Vadim + + +From owner-pgsql-hackers@hub.org Thu Jan 8 23:10:50 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707 + for ; Thu, 8 Jan 1998 23:10:48 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for ; Thu, 8 Jan 1998 23:08:49 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for ; Thu, 8 Jan 1998 23:00:50 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243; + Thu, 8 Jan 1998 22:55:03 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801090355.WAA09243@candle.pha.pa.us> +Subject: [HACKERS] subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. 
Mikheev) +Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST) +Cc: hackers@postgreSQL.org (PostgreSQL-development) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Vadim, I know you are still thinking about subselects, but I have some +more clarification that may help. + +We have to add phantom range table entries to correlated subselects so +they will pass the parser. We might as well add those fields to the +target list of the subquery at the same time: + + select * + from taba + where col1 = (select col2 + from tabb + where taba.col3 = tabb.col4) + +becomes: + + select * + from taba + where col1 = (select col2, tabb.col4 <--- + from tabb, taba <--- + where taba.col3 = tabb.col4) + +We add a field to TargetEntry and RangeTblEntry to mark the fact that it +was entered as a correlation entry: + + bool isCorrelated; + +Second, we need to hook the subselect to the main query. I recommend we +add two fields to Query for this: + + Query *parentQuery; + List *subqueries; + +The parentQuery pointer is used to resolve field names in the correlated +subquery. + + select * + from taba + where col1 = (select col2, tabb.col4 <--- + from tabb, taba <--- + where taba.col3 = tabb.col4) + +In the query above, the subquery can be easily parsed, and we add the +subquery to the parent's subqueries list. + +In the parent query, to parse the WHERE clause, we create a new operator +type, called IN or NOT_IN, or ALL, where the left side is a Var, and the +right side is an index to a slot in the subqueries List. + +We can then do the rest in the upper optimizer.
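A minimal C sketch of how a parentQuery pointer could be used to resolve a name that the subquery itself does not know. Everything here is an assumption made for illustration: the Query struct is a hypothetical, cut-down stand-in (the real node has many more fields), the range table is reduced to an array of relation names, and find_relation_owner is an invented helper, not a function from the PostgreSQL source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the real Query node. */
typedef struct Query
{
    struct Query *parentQuery;   /* enclosing query, NULL at the top level */
    const char  **rangeTable;    /* relation names in this query's FROM list */
    int           numRelations;  /* number of entries in rangeTable */
} Query;

/*
 * Return the query whose range table contains relname, searching the
 * given (sub)query first and then each enclosing query in turn.
 * Returns NULL if no level knows the relation.
 */
static const Query *
find_relation_owner(const Query *query, const char *relname)
{
    assert(relname != NULL);

    for (; query != NULL; query = query->parentQuery)
    {
        for (int i = 0; i < query->numRelations; i++)
        {
            if (strcmp(query->rangeTable[i], relname) == 0)
                return query;
        }
    }
    return NULL;
}
```

With an outer query over taba and a subquery over tabb, looking up "taba" from the subquery returns the outer query. That is the case in which the message above would add a phantom range table entry marked isCorrelated.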
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Fri Jan 9 10:01:01 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305 + for ; Fri, 9 Jan 1998 10:00:59 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for ; Fri, 9 Jan 1998 09:52:17 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623; + Fri, 9 Jan 1998 22:10:25 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> +Date: Fri, 09 Jan 1998 22:10:06 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: subselects +References: <199801090355.WAA09243@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> Vadim, I know you are still thinking about subselects, but I have some +> more clarification that may help. +> +> We have to add phantom range table entries to correlated subselects so +> they will pass the parser. We might as well add those fields to the +> target list of the subquery at the same time: +> +> select * +> from taba +> where col1 = (select col2 +> from tabb +> where taba.col3 = tabb.col4) +> +> becomes: +> +> select * +> from taba +> where col1 = (select col2, tabb.col4 <--- +> from tabb, taba <--- +> where taba.col3 = tabb.col4) +> +> We add a field to TargetEntry and RangeTblEntry to mark the fact that it +> was entered as a correlation entry: +> +> bool isCorrelated; + +No, I don't like to add anything in parser. 
Example: + + select * + from tabA + where col1 = (select col2 + from tabB + where tabA.col3 = tabB.col4 + and exists (select * + from tabC + where tabB.colX = tabC.colX and + tabC.colY = tabA.col2) + ) + +: a column of tabA is referenced in sub-subselect +(is it allowable by standards ?) - in this case it's better +to don't add tabA to 1st subselect but add tabA to second one +and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - +this gives us 2-tables join in 1st subquery instead of 3-tables join. +(And I'm still not sure that using temp tables is best of what can be +done in all cases...) + +Instead of using isCorrelated in TE & RTE we can add + +Index varlevel; + +to Var node to reflect (sub)query from where this Var is come +(where is range table to find var's relation using varno). Upmost query +will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on. + ^^^ ^^^^^^^^^^^^ +(I don't see problems with distinguishing Vars of different children +on the same level...) + +> +> Second, we need to hook the subselect to the main query. I recommend we +> add two fields to Query for this: +> +> Query *parentQuery; +> List *subqueries; + +Agreed. And maybe Index queryLevel. + +> In the parent query, to parse the WHERE clause, we create a new operator +> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the + ^^^^^^^^^^^^^^^^^^ +No. We have to handle (a,b,c) OP (select x, y, z ...) and +'_a_constant_' OP (select ...) - I don't know is last in standards, +Sybase has this. 
+ +Well, + +typedef enum OpType +{ + OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR + ++ OP_EXISTS, OP_ALL, OP_ANY + +} OpType; + +typedef struct Expr +{ + NodeTag type; + Oid typeOid; /* oid of the type of this expr */ + OpType opType; /* type of the op */ + Node *oper; /* could be Oper or Func */ + List *args; /* list of argument nodes */ +} Expr; + +OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries + List, following your suggestion) + +OP_ALL, OP_ANY: + +oper is List of Oper nodes. We need in list because of data types of +a, b, c (above) can be different and so Oper nodes will be different too. + +lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) - +left side of subquery' operator. +lsecond(args) is SubSelect. + +Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in +IN, NOTIN in A_Expr (parser node), but both of them have to be transferred +by parser into corresponding ANY and ALL. At the moment we can do: + +IN --> = ANY, NOT IN --> <> ALL + +but this will be "known bug": this breaks OO-nature of Postgres, because of +operators can be overrided and '=' can mean s o m e t h i n g (not equality). +Example: box data type. For boxes, = means equality of _areas_ and =~ +means that boxes are the same ==> =~ ANY should be used for IN. + +> right side is an index to a slot in the subqueries List. 
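The IN --> = ANY and NOT IN --> <> ALL rewrite above can be sketched in C with the comparison operator passed in as a parameter, which is the point of keeping OP_ANY and OP_ALL generic. This is only an illustration under stated assumptions: values are plain ints, the subquery result is an in-memory array, SQL NULL handling (three-valued logic) is ignored, and BinaryOp, eval_any, eval_all, eval_in and eval_not_in are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boolean operator over a simplified int-only datum. */
typedef bool (*BinaryOp)(int left, int right);

static bool op_eq(int left, int right) { return left == right; }
static bool op_ne(int left, int right) { return left != right; }

/* left OP ANY (rows): true if the operator holds for at least one row. */
static bool
eval_any(BinaryOp op, int left, const int *rows, size_t nrows)
{
    assert(op != NULL);

    for (size_t i = 0; i < nrows; i++)
    {
        if (op(left, rows[i]))
            return true;
    }
    return false;               /* empty subquery result: ANY is false */
}

/* left OP ALL (rows): true if the operator holds for every row. */
static bool
eval_all(BinaryOp op, int left, const int *rows, size_t nrows)
{
    assert(op != NULL);

    for (size_t i = 0; i < nrows; i++)
    {
        if (!op(left, rows[i]))
            return false;
    }
    return true;                /* empty subquery result: ALL is true */
}

/* IN is "= ANY"; NOT IN is "<> ALL". */
static bool
eval_in(int left, const int *rows, size_t nrows)
{
    return eval_any(op_eq, left, rows, nrows);
}

static bool
eval_not_in(int left, const int *rows, size_t nrows)
{
    return eval_all(op_ne, left, rows, nrows);
}
```

Because the operator is an argument, a type whose "same value" test is not = (the box example, with =~) would only need a different function passed to eval_any; the loop itself does not change.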
+ +Vadim + +From owner-pgsql-hackers@hub.org Fri Jan 9 17:44:04 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779 + for ; Fri, 9 Jan 1998 17:44:01 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for ; Fri, 9 Jan 1998 17:31:24 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282; + Fri, 9 Jan 1998 17:31:41 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801092231.RAA24282@candle.pha.pa.us> +Subject: [HACKERS] Re: subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Bruce Momjian wrote: +> > +> > Vadim, I know you are still thinking about subselects, but I have some +> > more clarification that may help. +> > +> > We have to add phantom range table entries to correlated subselects so +> > they will pass the parser. 
We might as well add those fields to the +> > target list of the subquery at the same time: +> > +> > select * +> > from taba +> > where col1 = (select col2 +> > from tabb +> > where taba.col3 = tabb.col4) +> > +> > becomes: +> > +> > select * +> > from taba +> > where col1 = (select col2, tabb.col4 <--- +> > from tabb, taba <--- +> > where taba.col3 = tabb.col4) +> > +> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it +> > was entered as a correlation entry: +> > +> > bool isCorrelated; +> +> No, I don't like to add anything in parser. Example: +> +> select * +> from tabA +> where col1 = (select col2 +> from tabB +> where tabA.col3 = tabB.col4 +> and exists (select * +> from tabC +> where tabB.colX = tabC.colX and +> tabC.colY = tabA.col2) +> ) +> +> : a column of tabA is referenced in sub-subselect + +This is a strange case that I don't think we need to handle in our first +implementation. + +> (is it allowable by standards ?) - in this case it's better +> to don't add tabA to 1st subselect but add tabA to second one +> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - +> this gives us 2-tables join in 1st subquery instead of 3-tables join. +> (And I'm still not sure that using temp tables is best of what can be +> done in all cases...) + +I don't see any use for temp tables in subselects anymore. After having +implemented UNIONS, I now see how much can be done in the upper +optimizer. I see you just putting the subquery PLAN into the proper +place in the plan tree, with some proper JOIN nodes for IN, NOT IN. + +> +> Instead of using isCorrelated in TE & RTE we can add +> +> Index varlevel; + +OK. Sounds good. + +> +> to Var node to reflect (sub)query from where this Var is come +> (where is range table to find var's relation using varno). Upmost query +> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on. 
+> ^^^ ^^^^^^^^^^^^ +> (I don't see problems with distinguishing Vars of different children +> on the same level...) +> +> > +> > Second, we need to hook the subselect to the main query. I recommend we +> > add two fields to Query for this: +> > +> > Query *parentQuery; +> > List *subqueries; +> +> Agreed. And maybe Index queryLevel. + +Sure. If it helps. + +> +> > In the parent query, to parse the WHERE clause, we create a new operator +> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the +> ^^^^^^^^^^^^^^^^^^ +> No. We have to handle (a,b,c) OP (select x, y, z ...) and +> '_a_constant_' OP (select ...) - I don't know is last in standards, +> Sybase has this. + +I have never seen this in my eight years of SQL. Perhaps we can leave +this for later, maybe much later. + +> +> Well, +> +> typedef enum OpType +> { +> OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR +> +> + OP_EXISTS, OP_ALL, OP_ANY +> +> } OpType; +> +> typedef struct Expr +> { +> NodeTag type; +> Oid typeOid; /* oid of the type of this expr */ +> OpType opType; /* type of the op */ +> Node *oper; /* could be Oper or Func */ +> List *args; /* list of argument nodes */ +> } Expr; +> +> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries +> List, following your suggestion) +> +> OP_ALL, OP_ANY: +> +> oper is List of Oper nodes. We need in list because of data types of +> a, b, c (above) can be different and so Oper nodes will be different too. +> +> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) - +> left side of subquery' operator. +> lsecond(args) is SubSelect. +> +> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in +> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred +> by parser into corresponding ANY and ALL. 
At the moment we can do: +> +> IN --> = ANY, NOT IN --> <> ALL +> +> but this will be "known bug": this breaks OO-nature of Postgres, because of +> operators can be overrided and '=' can mean s o m e t h i n g (not equality). +> Example: box data type. For boxes, = means equality of _areas_ and =~ +> means that boxes are the same ==> =~ ANY should be used for IN. + +That is interesting, to use =~ for ANY. + +Yes, but how many operators take a SUBQUERY as an operand. This is a +special case to me. + +I think I see where you are trying to go. You want subselects to behave +like any other operator, with a subselect type, and you do all the +subselect handling in the optimizer, with special Nodes and actions. + +I think this may be just too much of a leap. We have such clean query +logic for single queries, I can't imagine having an operator that has a +Query operand, and trying to get everything to properly handle it. +UNIONS were very easy to implement as a List off of Query, with some +foreach()'s in rewrite and the high optimizer. + +Subselects are SQL standard, and are never going to be over-ridden by a +user. Same with UNION. They want UNION, they get UNION. They want +Subselect, we are going to spin through the Query structure and give +them what they want. + +The complexities of subselects and correlated queries and range tables +and stuff is so bizarre that trying to get it to work inside the type +system could be a huge project. + +> +> > right side is an index to a slot in the subqueries List. + +I guess the question is what can we have by February 1? + +I have been reading some postings, and it seems to me that subselects +are the litmus test for many evaluators when deciding if a database +engine is full-featured. + +Sorry to be so straightforward, but I want to keep hashing this around +until we get a conclusion, so coding can start. 
+ +My suggestions have been, I believe, trying to get subselects working +with the fullest functionality by adding the least amount of code, and +keeping the logic clean. + +Have you checked out the UNION code? It is very small, but it works. I +think it could make a good sample for subselects. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742 + for ; Sat, 10 Jan 1998 12:00:43 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684; + Sun, 11 Jan 1998 00:19:10 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> +Date: Sun, 11 Jan 1998 00:19:08 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgresql.org, "Thomas G. Lockhart" +Subject: Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > No, I don't like to add anything in parser. Example: +> > +> > select * +> > from tabA +> > where col1 = (select col2 +> > from tabB +> > where tabA.col3 = tabB.col4 +> > and exists (select * +> > from tabC +> > where tabB.colX = tabC.colX and +> > tabC.colY = tabA.col2) +> > ) +> > +> > : a column of tabA is referenced in sub-subselect +> +> This is a strange case that I don't think we need to handle in our first +> implementation. + +I don't know is this strange case or not :) +But I would like to know is this allowed by standards - can someone +comment on this ? +And I don't see problems with handling this... + +> +> > (is it allowable by standards ?) 
- in this case it's better +> > to don't add tabA to 1st subselect but add tabA to second one +> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - +> > this gives us 2-tables join in 1st subquery instead of 3-tables join. +> > (And I'm still not sure that using temp tables is best of what can be +> > done in all cases...) +> +> I don't see any use for temp tables in subselects anymore. After having +> implemented UNIONS, I now see how much can be done in the upper +> optimizer. I see you just putting the subquery PLAN into the proper +> place in the plan tree, with some proper JOIN nodes for IN, NOT IN. + +When saying about temp tables, I meant tables created by node Material +for subquery plan. This is one of two ways - run subquery once for all +possible upper plan tuples and then just join result table with upper +query. Another way is re-run subquery for each upper query tuple, +without temp table but may be with caching results by some ways. +Actually, there is special case - when subquery can be alternatively +formulated as joins, - but this is just special case. + +> > > In the parent query, to parse the WHERE clause, we create a new operator +> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the +> > ^^^^^^^^^^^^^^^^^^ +> > No. We have to handle (a,b,c) OP (select x, y, z ...) and +> > '_a_constant_' OP (select ...) - I don't know is last in standards, +> > Sybase has this. +> +> I have never seen this in my eight years of SQL. Perhaps we can leave +> this for later, maybe much later. + +Are you saying about (a, b, c) or about 'a_constant' ? +Again, can someone comment on are they in standards or not ? +Tom ? +If yes then please add parser' support for them now... + +> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in +> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred +> > by parser into corresponding ANY and ALL. 
At the moment we can do: +> > +> > IN --> = ANY, NOT IN --> <> ALL +> > +> > but this will be "known bug": this breaks OO-nature of Postgres, because of +> > operators can be overrided and '=' can mean s o m e t h i n g (not equality). +> > Example: box data type. For boxes, = means equality of _areas_ and =~ +> > means that boxes are the same ==> =~ ANY should be used for IN. +> +> That is interesting, to use =~ for ANY. +> +> Yes, but how many operators take a SUBQUERY as an operand. This is a +> special case to me. +> +> I think I see where you are trying to go. You want subselects to behave +> like any other operator, with a subselect type, and you do all the +> subselect handling in the optimizer, with special Nodes and actions. +> +> I think this may be just too much of a leap. We have such clean query +> logic for single queries, I can't imagine having an operator that has a +> Query operand, and trying to get everything to properly handle it. +> UNIONS were very easy to implement as a List off of Query, with some +> foreach()'s in rewrite and the high optimizer. +> +> Subselects are SQL standard, and are never going to be over-ridden by a +> user. Same with UNION. They want UNION, they get UNION. They want +> Subselect, we are going to spin through the Query structure and give +> them what they want. +> +> The complexities of subselects and correlated queries and range tables +> and stuff is so bizarre that trying to get it to work inside the type +> system could be a huge project. + +PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS), +derived from the Berkeley Postgres database management system. While +PostgreSQL retains the powerful object-relational data model, rich data types and + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +easy extensibility of Postgres, it replaces the PostQuel query language with an +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +extended subset of SQL. 
+^^^^^^^^^^^^^^^^^^^^^^ + +Should we say users that subselect will work for standard data types only ? +I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? +Is there difference between handling = ANY and ~ ANY ? I don't see any. +Currently we can't get IN working properly for boxes (and may be for others too) +and I don't like to try to resolve these problems now, but hope that someday +we'll be able to do this. At the moment - just convert IN into = ANY and +NOT IN into <> ALL in parser. + +(BTW, do you know how DISTINCT is implemented ? It doesn't use = but +use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) + +> > +> > > right side is an index to a slot in the subqueries List. +> +> I guess the question is what can we have by February 1? +> +> I have been reading some postings, and it seems to me that subselects +> are the litmus test for many evaluators when deciding if a database +> engine is full-featured. +> +> Sorry to be so straightforward, but I want to keep hashing this around +> until we get a conclusion, so coding can start. +> +> My suggestions have been, I believe, trying to get subselects working +> with the fullest functionality by adding the least amount of code, and +> keeping the logic clean. +> +> Have you checked out the UNION code? It is very small, but it works. I +> think it could make a good sample for subselects. + +There is big difference between subqueries and queries in UNION - +there are not dependences between UNION queries. + +Ok, opened issues: + +1. Is using upper query' vars in all subquery levels in standard ? +2. Is (a, b, c) OP (subselect) in standard ? +3. What types of expressions (Var, Const, ...) are allowed on the left + side of operator with subquery on the right ? +4. What types of operators should we support (=, >, ..., like, ~, ...) ? + (My vote for all boolean operators). + +And - did we get consensus on presentation subqueries stuff in Query, +Expr and Var ? 
+I would like to have something done in parser near Jan 17 to get +subqueries working by Feb 1. I vote for support of all standard +things (1. - 3.) in parser right now - if there will be no time +to implement something like (a, b, c) then optimizer will call +elog(WARN) (oh, sorry, - elog(ERROR)). + +Vadim + +From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045 + for ; Sat, 10 Jan 1998 12:31:01 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for ; Sat, 10 Jan 1998 12:22:30 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725; + Sun, 11 Jan 1998 00:41:22 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su> +Date: Sun, 11 Jan 1998 00:41:19 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselects +References: <199712220545.AAA11605@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> OK, a few questions: +> +> Should we use sortmerge, so we can use our psort as temp tables, +> or do we use hashunique? +> +> How do we pass the query to the optimizer? How do we represent +> the range table for each, and the links between them in correlated +> subqueries? + +My suggestion is just use varlevel in Var and don't put upper query' +relations into subquery range table. 
+ +Vadim + +From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357 + for ; Sat, 10 Jan 1998 13:00:58 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for ; Sat, 10 Jan 1998 12:40:02 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741; + Sun, 11 Jan 1998 00:58:56 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su> +Date: Sun, 11 Jan 1998 00:58:52 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] subselects +References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Vadim B. Mikheev wrote: +> +> Bruce Momjian wrote: +> > +> > OK, a few questions: +> > +> > Should we use sortmerge, so we can use our psort as temp tables, +> > or do we use hashunique? +> > +> > How do we pass the query to the optimizer? How do we represent +> > the range table for each, and the links between them in correlated +> > subqueries? +> +> My suggestion is just use varlevel in Var and don't put upper query' +> relations into subquery range table. + +Hmm... Sorry, it seems that I did reply to very old message - forget it. 
+ +Vadim + +From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664 + for ; Sat, 10 Jan 1998 13:30:56 -0500 (EST) +Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for ; Sat, 10 Jan 1998 13:05:09 -0500 (EST) +Received: from alumni.caltech.edu (localhost [127.0.0.1]) + by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623; + Sat, 10 Jan 1998 18:01:03 GMT +Sender: tgl@gnet04.jpl.nasa.gov +Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu> +Date: Sat, 10 Jan 1998 18:01:03 +0000 +From: "Thomas G. Lockhart" +Organization: Caltech/JPL +X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) +MIME-Version: 1.0 +To: "Vadim B. Mikheev" +CC: Bruce Momjian , hackers@postgresql.org +Subject: Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in +> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred +> > > by parser into corresponding ANY and ALL. At the moment we can do: +> > > +> > > IN --> = ANY, NOT IN --> <> ALL +> > > +> > > but this will be "known bug": this breaks OO-nature of Postgres, because of +> > > operators can be overrided and '=' can mean s o m e t h i n g (not equality). +> > > Example: box data type. For boxes, = means equality of _areas_ and =~ +> > > means that boxes are the same ==> =~ ANY should be used for IN. +> > +> > That is interesting, to use =~ for ANY. + +If I understand the discussion, I would think is is fine to make an assumption about +which operator is used to implement a subselect expression. 
If someone remaps an +operator to mean something different, then they will get a different result (or a +nonsensical one) from a subselect. + +I'd be happy to remap existing operators to fit into a convention which would work +with subselects (especially if I got to help choose :). + +> > Subselects are SQL standard, and are never going to be over-ridden by a +> > user. Same with UNION. They want UNION, they get UNION. They want +> > Subselect, we are going to spin through the Query structure and give +> > them what they want. +> +> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS), +> derived from the Berkeley Postgres database management system. While +> PostgreSQL retains the powerful object-relational data model, rich data types and +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> easy extensibility of Postgres, it replaces the PostQuel query language with an +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> extended subset of SQL. +> ^^^^^^^^^^^^^^^^^^^^^^ +> +> Should we say users that subselect will work for standard data types only ? +> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? +> Is there difference between handling = ANY and ~ ANY ? I don't see any. +> Currently we can't get IN working properly for boxes (and may be for others too) +> and I don't like to try to resolve these problems now, but hope that someday +> we'll be able to do this. At the moment - just convert IN into = ANY and +> NOT IN into <> ALL in parser. +> +> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but +> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) + +?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted +list? That would give more consistant behavior... + +> > I have been reading some postings, and it seems to me that subselects +> > are the litmus test for many evaluators when deciding if a database +> > engine is full-featured. 
+> > +> > Sorry to be so straightforward, but I want to keep hashing this around +> > until we get a conclusion, so coding can start. +> > +> > My suggestions have been, I believe, trying to get subselects working +> > with the fullest functionality by adding the least amount of code, and +> > keeping the logic clean. +> > +> > Have you checked out the UNION code? It is very small, but it works. I +> > think it could make a good sample for subselects. +> +> There is big difference between subqueries and queries in UNION - +> there are not dependences between UNION queries. +> +> Ok, opened issues: +> +> 1. Is using upper query' vars in all subquery levels in standard ? + +I'm not certain. Let me know if you do not get an answer from someone else and I will +research it. + +> 2. Is (a, b, c) OP (subselect) in standard ? + +Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where +the parens are allowed to be omitted from a one element list. + +> 3. What types of expressions (Var, Const, ...) are allowed on the left +> side of operator with subquery on the right ? + +I think most expressions are allowed. The "constant OP (subselect)" case you were +asking about is just a simplified case since "(a, b, constant) OP (subselect)" where +a and b are column references should be allowed. Of course, our optimizer could +perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first +example "EXISTS (subselect where x = constant)". + +> 4. What types of operators should we support (=, >, ..., like, ~, ...) ? +> (My vote for all boolean operators). + +Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is +important to get an initial implementation for v6.3 which covers a little, some, or +all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then +we will have the benefit of feedback from others in practical applications which +always uncovers new things to consider. 
+ +> And - did we get consensus on presentation subqueries stuff in Query, +> Expr and Var ? +> I would like to have something done in parser near Jan 17 to get +> subqueries working by Feb 1. I vote for support of all standard +> things (1. - 3.) in parser right now - if there will be no time +> to implement something like (a, b, c) then optimizer will call elog(WARN) (oh, +> sorry, - elog(ERROR)). + +Great. I'd like to help with the remaining parser issues; at the moment "row_expr" +does the right thing with expression comparisons but just parses then ignores +subselect expressions. Let me know what structures you want passed back and I'll put +them in, or if you prefer put in the first one and I'll go through and clean up and +add the rest. + + - Tom + + +From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728 + for ; Sat, 10 Jan 1998 15:00:56 -0500 (EST) +Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for ; Sat, 10 Jan 1998 14:35:19 -0500 (EST) +Received: from alumni.caltech.edu (localhost [127.0.0.1]) + by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002; + Sat, 10 Jan 1998 19:31:30 GMT +Sender: tgl@gnet04.jpl.nasa.gov +Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu> +Date: Sat, 10 Jan 1998 19:31:29 +0000 +From: "Thomas G. Lockhart" +Organization: Caltech/JPL +X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) +MIME-Version: 1.0 +To: "Vadim B. Mikheev" +CC: Bruce Momjian , hackers@postgresql.org +Subject: Re: [HACKERS] Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> Are you saying about (a, b, c) or about 'a_constant' ?
+> Again, can someone comment on are they in standards or not ? +> Tom ? +> If yes then please add parser' support for them now... + +As I mentioned a few minutes ago in my last message, I parse the row descriptors and +the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)") I currently +ignore the result. I didn't want to pass things back as lists until something in the +backend was ready to receive them. + +If it is OK, I'll go ahead and start passing back a list of expressions when a row +descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node +being a list rather than an atomic node. + +Also, I can start passing back the subselect expression as the rexpr; right now the +parser calls elog() and quits. + +btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called +makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions. +If lists are handled farther back, this routine should move to there also and the +parser will just pass the lists. Note that some assumptions have to be made about the +meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of +"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK +to disallow those cases or to look for specific appearance of the operator to guess +the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if +it has "<>" or "!" then build as "or"s). + +Let me know what you want...
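
The operator-guessing heuristic described above for "(a,b) OP (c,d)" can be sketched in C roughly as follows. This is only an illustration under stated assumptions: row_connective() and expand_row_expr() are hypothetical names, the expansion is done on strings rather than parse nodes, and none of it is taken from makeRowExpr() itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose the connective for a row comparison using
 * the heuristic from the message above ("<>" or "!" means OR, else AND). */
static const char *
row_connective(const char *op)
{
    /* Check the "not equal" spellings first, because "<>" also contains "<". */
    if (strstr(op, "<>") != NULL || strchr(op, '!') != NULL)
        return "OR";
    return "AND";
}

/* Hypothetical helper: expand "(l1,..,ln) OP (r1,..,rn)" into the text
 * "l1 OP r1 AND|OR l2 OP r2 ...".
 * Returns 0 on success, -1 if the output buffer is too small. */
static int
expand_row_expr(const char *op,
                const char *const *lhs, const char *const *rhs, int n,
                char *out, size_t outsz)
{
    const char *conn = row_connective(op);
    size_t      used = 0;
    int         i;

    assert(n > 0 && outsz > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++)
    {
        /* Every pair after the first is preceded by " AND " or " OR ". */
        int w = snprintf(out + used, outsz - used, "%s%s%s%s %s %s",
                         i ? " " : "", i ? conn : "", i ? " " : "",
                         lhs[i], op, rhs[i]);

        if (w < 0 || (size_t) w >= outsz - used)
            return -1;
        used += (size_t) w;
    }
    return 0;
}
```

Under these assumptions, "(a,b) = (c,d)" expands to "a = c AND b = d", and "(a,b) <> (c,d)" expands to "a <> c OR b <> d". Operators that match neither pattern fall through to AND here, which is exactly the kind of guess the message warns about.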
+ + - Tom + + +From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998 +Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953 + for ; Sun, 11 Jan 1998 01:01:51 -0500 (EST) +Received: from alumni.caltech.edu (localhost [127.0.0.1]) + by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797; + Sun, 11 Jan 1998 05:58:01 GMT +Sender: tgl@gnet04.jpl.nasa.gov +Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu> +Date: Sun, 11 Jan 1998 05:58:01 +0000 +From: "Thomas G. Lockhart" +Organization: Caltech/JPL +X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) +MIME-Version: 1.0 +To: "Vadim B. Mikheev" +CC: Bruce Momjian , hackers@postgresql.org +Subject: Re: [HACKERS] Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> +Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702" +Status: OR + +This is a multi-part message in MIME format. +--------------D8B38A0D1F78A10C0023F702 +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit + +Here are context diffs of gram.y and keywords.c; sorry about sending the full files. +These start sending lists of arguments toward the backend from the parser to +implement row descriptors and subselects. + +They should apply OK even over Bruce's recent changes... 
+ + - Tom + +--------------D8B38A0D1F78A10C0023F702 +Content-Type: text/plain; charset=us-ascii; name="gram.y.patch" +Content-Transfer-Encoding: 7bit +Content-Disposition: inline; filename="gram.y.patch" + +*** ../src/backend/parser/gram.y.orig Sat Jan 10 05:44:36 1998 +--- ../src/backend/parser/gram.y Sat Jan 10 19:29:37 1998 +*************** +*** 195,200 **** +--- 195,201 ---- + having_clause + %type row_descriptor, row_list + %type row_expr ++ %type RowOp, row_opt + %type OptCreateAs, CreateAsList + %type CreateAsElement + %type NumConst +*************** +*** 242,248 **** + */ + + /* Keywords (in SQL92 reserved words) */ +! %token ACTION, ADD, ALL, ALTER, AND, AS, ASC, + BEGIN_TRANS, BETWEEN, BOTH, BY, + CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, + CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, +--- 243,249 ---- + */ + + /* Keywords (in SQL92 reserved words) */ +! %token ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC, + BEGIN_TRANS, BETWEEN, BOTH, BY, + CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, + CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, +*************** +*** 258,264 **** + ON, OPTION, OR, ORDER, OUTER_P, + PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC, + REFERENCES, REVOKE, RIGHT, ROLLBACK, +! SECOND_P, SELECT, SET, SUBSTRING, + TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM, + UNION, UNIQUE, UPDATE, USING, + VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW, +--- 259,265 ---- + ON, OPTION, OR, ORDER, OUTER_P, + PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC, + REFERENCES, REVOKE, RIGHT, ROLLBACK, +! 
SECOND_P, SELECT, SET, SOME, SUBSTRING, + TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM, + UNION, UNIQUE, UPDATE, USING, + VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW, +*************** +*** 2853,2866 **** + /* Expressions using row descriptors + * Define row_descriptor to allow yacc to break the reduce/reduce conflict + * with singleton expressions. + */ + row_expr: '(' row_descriptor ')' IN '(' SubSelect ')' + { +! $$ = NULL; + } + | '(' row_descriptor ')' NOT IN '(' SubSelect ')' + { +! $$ = NULL; + } + | '(' row_descriptor ')' '=' '(' row_descriptor ')' + { +--- 2854,2878 ---- + /* Expressions using row descriptors + * Define row_descriptor to allow yacc to break the reduce/reduce conflict + * with singleton expressions. ++ * ++ * Note that "SOME" is the same as "ANY" in syntax. ++ * - thomas 1998-01-10 + */ + row_expr: '(' row_descriptor ')' IN '(' SubSelect ')' + { +! $$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6); + } + | '(' row_descriptor ')' NOT IN '(' SubSelect ')' + { +! $$ = makeA_Expr(OP, "<>any", (Node *)$2, (Node *)$7); +! } +! | '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')' +! { +! char *opr; +! opr = palloc(strlen($4)+strlen($5)+1); +! strcpy(opr, $4); +! strcat(opr, $5); +! $$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7); + } + | '(' row_descriptor ')' '=' '(' row_descriptor ')' + { +*************** +*** 2880,2885 **** +--- 2892,2907 ---- + } + ; + ++ RowOp: '=' { $$ = "="; } ++ | '<' { $$ = "<"; } ++ | '>' { $$ = ">"; } ++ ; ++ ++ row_opt: ALL { $$ = "all"; } ++ | ANY { $$ = "any"; } ++ | SOME { $$ = "any"; } ++ ; ++ + row_descriptor: row_list ',' a_expr + { + $$ = lappend($1, $3); +*************** +*** 3432,3441 **** + ; + + in_expr: SubSelect +! { +! elog(ERROR,"IN (SUBSELECT) not yet implemented"); +! $$ = $1; +! } + | in_expr_nodes + { $$ = $1; } + ; +--- 3454,3460 ---- + ; + + in_expr: SubSelect +! 
{ $$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); } + | in_expr_nodes + { $$ = $1; } + ; +*************** +*** 3449,3458 **** + ; + + not_in_expr: SubSelect +! { +! elog(ERROR,"NOT IN (SUBSELECT) not yet implemented"); +! $$ = $1; +! } + | not_in_expr_nodes + { $$ = $1; } + ; +--- 3468,3474 ---- + ; + + not_in_expr: SubSelect +! { $$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); } + | not_in_expr_nodes + { $$ = $1; } + ; + +--------------D8B38A0D1F78A10C0023F702 +Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch" +Content-Transfer-Encoding: 7bit +Content-Disposition: inline; filename="keywords.c.patch" + +*** ../src/backend/parser/keywords.c.orig Mon Jan 5 07:51:33 1998 +--- ../src/backend/parser/keywords.c Sat Jan 10 19:22:07 1998 +*************** +*** 39,44 **** +--- 39,45 ---- + {"alter", ALTER}, + {"analyze", ANALYZE}, + {"and", AND}, ++ {"any", ANY}, + {"append", APPEND}, + {"archive", ARCHIVE}, + {"as", AS}, +*************** +*** 178,183 **** +--- 179,185 ---- + {"set", SET}, + {"setof", SETOF}, + {"show", SHOW}, ++ {"some", SOME}, + {"stdin", STDIN}, + {"stdout", STDOUT}, + {"substring", SUBSTRING}, + +--------------D8B38A0D1F78A10C0023F702-- + + +From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255 + for ; Sun, 11 Jan 1998 01:31:10 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for ; Sun, 11 Jan 1998 01:10:48 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST) +Received: from 
candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for ; Sun, 11 Jan 1998 01:01:05 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801; + Sun, 11 Jan 1998 00:59:23 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801110559.AAA11801@candle.pha.pa.us> +Subject: [HACKERS] Re: subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST) +Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu +In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> I would like to have something done in parser near Jan 17 to get +> subqueries working by Feb 1. I vote for support of all standard +> things (1. - 3.) in parser right now - if there will be no time +> to implement something like (a, b, c) then optimizer will call +> elog(WARN) (oh, sorry, - elog(ERROR)). + +First, let me say I am glad we are still on schedule for Feb 1. I was +panicking because I thought we wouldn't make it in time. + + +> > > (is it allowable by standards ?) - in this case it's better +> > > to don't add tabA to 1st subselect but add tabA to second one +> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - +> > > this gives us 2-tables join in 1st subquery instead of 3-tables join. +> > > (And I'm still not sure that using temp tables is best of what can be +> > > done in all cases...) +> > +> > I don't see any use for temp tables in subselects anymore. After having +> > implemented UNIONS, I now see how much can be done in the upper +> > optimizer. I see you just putting the subquery PLAN into the proper +> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN. 
+> +> When saying about temp tables, I meant tables created by node Material +> for subquery plan. This is one of two ways - run subquery once for all +> possible upper plan tuples and then just join result table with upper +> query. Another way is re-run subquery for each upper query tuple, +> without temp table but may be with caching results by some ways. +> Actually, there is special case - when subquery can be alternatively +> formulated as joins, - but this is just special case. + +This is interesting. It really only applies for correlated subqueries, +and certainly it may help sometimes to just evaluate the subquery for +valid values that are going to come from the upper query than for all +possible values. Perhaps we can use the 'cost' value of each query to +decide how to handle this. + +> +> > > > In the parent query, to parse the WHERE clause, we create a new operator +> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the +> > > ^^^^^^^^^^^^^^^^^^ +> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and +> > > '_a_constant_' OP (select ...) - I don't know is last in standards, +> > > Sybase has this. +> > +> > I have never seen this in my eight years of SQL. Perhaps we can leave +> > this for later, maybe much later. +> +> Are you saying about (a, b, c) or about 'a_constant' ? +> Again, can someone comment on are they in standards or not ? +> Tom ? +> If yes then please add parser' support for them now... + +OK, Thomas says it is, so we will put in as much code as we can to handle +it. + +> Should we say users that subselect will work for standard data types only ? +> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? +> Is there difference between handling = ANY and ~ ANY ? I don't see any. +> Currently we can't get IN working properly for boxes (and may be for others too) +> and I don't like to try to resolve these problems now, but hope that someday +> we'll be able to do this. 
At the moment - just convert IN into = ANY and +> NOT IN into <> ALL in parser. + +OK. + +> +> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but +> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) + +I did not know that either. + +> There is big difference between subqueries and queries in UNION - +> there are not dependences between UNION queries. + +Yes, I know UNIONS are trivial compared to subselects. + +> +> Ok, opened issues: +> +> 1. Is using upper query' vars in all subquery levels in standard ? +> 2. Is (a, b, c) OP (subselect) in standard ? +> 3. What types of expressions (Var, Const, ...) are allowed on the left +> side of operator with subquery on the right ? +> 4. What types of operators should we support (=, >, ..., like, ~, ...) ? +> (My vote for all boolean operators). +> +> And - did we get consensus on presentation subqueries stuff in Query, +> Expr and Var ? + +OK, here are my concrete ideas on changes and structures. + +I think we all agreed that Query needs new fields: + + Query *parentQuery; + List *subqueries; + +Maybe query level too, but I don't think so (see later ideas on Var). + +We need a new Node structure, call it Sublink: + + int linkType (IN, NOTIN, ANY, EXISTS, OPERATOR...) + Oid operator /* subquery must return single row */ + List *lefthand; /* parent stuff */ + Node *subquery; /* represents nodes from parser */ + Index Subindex; /* filled in to index Query->subqueries */ + +Of course, the names are just suggestions. Every time we run through +the parsenodes of a query to create a Query* structure, when we do the +WHERE clause, if we come upon one of these Sublink nodes (created in the +parser), we move the supplied Query* in Sublink->subquery to a local +List variable, and we set Subquery->subindex to equal the index of the +new query, i.e. is it the first subquery we found, 1, or the second, 2, +etc. 
+ +After we have created the parent Query structure, we run through our +local List variable of subquery parsenodes we created above, and add +Query* entries to Query->subqueries. In each subquery Query*, we set +the parentQuery pointer. + +Also, when parsing the subqueries, we need to keep track of correlated +references. I recommend we add a field to the Var structure: + + Index sublevel; /* range table reference: + = 0 current level of query + < 0 parent above this many levels + > 0 index into subquery list + */ + +This way, a Var node with sublevel 0 is the current level, and is true +in most cases. This helps us not have to change much code. sublevel = +-1 means it references the range table in the parent query. sublevel = +-2 means the parent's parent. sublevel = 2 means it references the range +table of the second entry in Query->subqueries. Varno and varattno are +still meaningful. Of course, we can't reference variables in the +subqueries from the parent in the parser code, but Vadim may want to. + +When doing a Var lookup in the parser, we look in the current level +first, but if not found, if it is a subquery, we can look at the parent +and parent's parent to set the sublevel, varno, and varattno properly. + +We create no phantom range table entries in the subquery, and no phantom +target list entries. We can leave that all for the upper optimizer.
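
The Var lookup described above (current level first, then parent, then parent's parent) might look something like the following C sketch. The Query and Var structures here are simplified stand-ins that carry only the fields named in the proposal, lookup_var() is a hypothetical name, and the range table is reduced to an array of relation names, so this is an illustration of the idea rather than backend code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the proposed structures (assumptions, not the
 * real PostgreSQL definitions). */
typedef struct Query
{
    struct Query *parentQuery;  /* NULL for the top-level query */
    const char  **rtable;       /* range table, reduced to relation names */
    int           rtable_len;
} Query;

typedef struct Var
{
    int sublevel;   /* 0 = current level, -1 = parent, -2 = parent's parent */
    int varno;      /* 1-based index into that level's range table */
} Var;

/* Hypothetical lookup: resolve relname starting at q and walking up through
 * parentQuery.  Returns 1 and fills *var on success, 0 if no enclosing
 * level has the relation in its range table. */
static int
lookup_var(const Query *q, const char *relname, Var *var)
{
    int level = 0;

    assert(q != NULL && relname != NULL && var != NULL);
    for (; q != NULL; q = q->parentQuery, level--)
    {
        int i;

        for (i = 0; i < q->rtable_len; i++)
        {
            if (strcmp(q->rtable[i], relname) == 0)
            {
                var->sublevel = level;  /* 0, -1, -2, ... */
                var->varno = i + 1;
                return 1;
            }
        }
    }
    return 0;
}
```

With a subquery over tabB and tabC nested in a parent query over tabA, a reference to tabC resolves to sublevel 0 and a reference to tabA resolves to sublevel -1, without adding any phantom range table entry to the subquery.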
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Tue Dec 9 12:14:09 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA16186 + for ; Tue, 9 Dec 1997 12:14:05 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id MAA17524; Tue, 9 Dec 1997 12:05:31 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 09 Dec 1997 12:05:01 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id MAA17316 for pgsql-hackers-outgoing; Tue, 9 Dec 1997 12:04:55 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id MAA17304 for ; Tue, 9 Dec 1997 12:04:40 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id MAA15973; + Tue, 9 Dec 1997 12:05:03 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712091705.MAA15973@candle.pha.pa.us> +Subject: Re: [HACKERS] Items for 6.3 +To: lockhart@alumni.caltech.edu (Thomas G. Lockhart) +Date: Tue, 9 Dec 1997 12:05:03 -0500 (EST) +Cc: hackers@postgreSQL.org, vadim@sable.krasnoyarsk.su +In-Reply-To: <348CE8BE.FE0F8AA1@alumni.caltech.edu> from "Thomas G. Lockhart" at Dec 9, 97 06:44:14 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Bruce Momjian wrote: +> +> > Here are the items I think would make 6.3 a truly great release: +> > +> > subselects +> > outer joins +> +> These two would be sufficient (along with the changes already in the +> tree) to address the most visible deficiencies in SQL functionality. +> +> > temp tables +> > fix "Reliability" items attached to specific queries +> +> Sure, why not? + +We will need temp tables for subselects anyway. 
+ +I could implement them, but again we come up against the problem of +storing these plans and executing them later. We need to do some of the +temp table stuff in the optimizer because the plan could be passed with +a temp table, and we can't bind the temp name to a real name in the +parser, especially if we save those plans in system tables that other +backends can execute. Multiple backends would be using the same temp +name. + +At the same time, we need some temp stuff in the parser so the parser +can recognize the temp table and its fields when it sees it. + +The hardest part is: + +select * into tmp mytmp from z where x=y; +select * from mytmp; + +If they are passed together, and we have to plan them both, before +either is executed, you have to make the parser aware of the fields in +mytmp, even though you have not executed the select yet, you are just +storing the plan. + +This was Vadim's point about not doing subselects in the parser. + +> +> > postmaster sync's pglog, giving almost fsync reliability with +> > no-fsync performance +> +> OK to save for v6.4. +> +> Could we try to do the subselect/join/union features for 6.3? I know you +> have been looking at it, and found the deepest parts of the backend to +> be a bit murky. I'm not familiar with that area at all, but perhaps we +> could divert Vadim for a week or two or three when he has some time. +> Especially if we trade him for help on his favorite topics for v6.4?? +> + +Sure. I may be able to do some of the pglog change myself, though Vadim +has some definite ideas on this. + +As for Vadim, trading help is a good idea, but what trade can we make? +He can do most of these tough things without us, and in 1/4 the time. +We can't even see where to start them. + +Basically, without Vadim, this project would have really major problems. + +He certainly likes working on PostgreSQL, so he must be busy with other +things. + +It is not fair to keep counting on Vadim to do all these tough jobs. 
We +really need to get other people up to Vadim's level of ability. +Unfortunately, the odds of this happening are very slim. + +This leaves me scratching my head. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Fri Dec 19 00:08:21 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25029 + for ; Fri, 19 Dec 1997 00:08:13 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA11825; + Fri, 19 Dec 1997 12:13:15 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <349A0265.7329D4EE@sable.krasnoyarsk.su> +Date: Fri, 19 Dec 1997 12:13:09 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: "Thomas G. Lockhart" +CC: Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] Items for 6.3 +References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Thomas G. Lockhart wrote: +> +> Could we try to do the subselect/join/union features for 6.3? I know you +> have been looking at it, and found the deepest parts of the backend to +> be a bit murky. I'm not familiar with that area at all, but perhaps we +> could divert Vadim for a week or two or three when he has some time. + ^^^^^ +More realistic... And this is for initial release only: tuning performance +of subselects is very hard, long work. + +Ok - I'm ready to do subselects for 6.3 but this means that foreign keys +may appear in 6.4 only. And I'll need in help: could someone add support +for them in parser ? Not handling - but parsing and common checking. 
+Also, it would be nice to have a better temp table implementation +(without affecting pg_class etc) - the Material node needs query-level +temp tables anyway. I'd really like to see temp table files created +only when the data must go to disk because the local buffer pool is full +and can no longer keep the table data in memory. Also, the local buffer manager +should be re-written to use a hash table (like the shared bufmgr) for buffer search, +not a sequential scan as now (this is an item for TODO) - this will speed +things up and allow use of more than 64 local buffers. + +I'm still sure that handling subselects in the parser is not the right way. +And the main problem is not in execution plans (we could use tricks +to resolve this) but in performance. Example: + +select b from big where b in (select s from small); + +If there are no duplicates in small then this is the same as + +select b from big, small where b = s; + +Without an index on big, postgres does a seq scan of big and uses a hashjoin with +the hash on small. Using a temp table makes the query only 20% slower (in my test). +But with an index on big, postgres uses a nestloop with a seq scan of small and +an index scan of big => the select runs faster and the temp table stuff makes the query +2.5 times slower! In the case of duplicates in small, handling in the parser +will use distinct (and so - sorting). But with a hashjoin plan, distinct +may be avoided! Who can analyze this? Only the optimizer. It can be smart enough +to check whether there is a unique index on small or not. If not - which is +cheaper: nestloop with sorting or slower hashjoin without sorting? +Only the optimizer can find the best way to execute a query; the parser can't. + +> Especially if we trade him for help on his favorite topics for v6.4?? + +Ok, I'd like to see shared catalog cache implemented in 6.4... 
-:) + +Vadim + +From owner-pgsql-hackers@hub.org Fri Dec 19 00:58:54 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25460 + for ; Fri, 19 Dec 1997 00:58:52 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA27667; Fri, 19 Dec 1997 00:54:39 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:54:09 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA27633 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:54:04 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA27623 for ; Fri, 19 Dec 1997 00:53:53 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id AAA25415; + Fri, 19 Dec 1997 00:53:15 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712190553.AAA25415@candle.pha.pa.us> +Subject: Re: [HACKERS] Items for 6.3 +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Fri, 19 Dec 1997 00:53:15 -0500 (EST) +Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org +In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Thomas G. Lockhart wrote: +> > +> > Could we try to do the subselect/join/union features for 6.3? I know you +> > have been looking at it, and found the deepest parts of the backend to +> > be a bit murky. I'm not familiar with that area at all, but perhaps we +> > could divert Vadim for a week or two or three when he has some time. +> ^^^^^ +> More realistic... And this is for initial release only: tuning performance +> of subselects is very hard, long work. 
+> +> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys + +Great. + +> may appear in 6.4 only. And I'll need in help: could someone add support +> for them in parser ? Not handling - but parsing and common checking. +> Also, it would be nice to have better temp tables implementation +> (without affecting pg_class etc) - node material need in query-level +> temp tables anyway. I'd really like to see temp table files created +> only when its data must go to disk due to local buffer pool is full +> and can't more keep table data in memory. Also, local buffer manager +> should be re-written to use hash table (like shared bufmgr) for buffer search, +> not sequential scan as now (this is item for TODO) - this will speed up +> things and allow to use more than 64 local buffers. +> +> I'm still sure that handling subselects in parser is not right way. +> And the main problem is not in execution plans (we could use tricks +> to resolve this) but in performance. Example: +> +> select b from big where b in (select s from small); +> +> If there is no duplicates in small then this is the same as +> +> select b from big, small where b = s; +> +> Without index on big postgres does seq scan of big and uses hashjoin with +> hash on small. Using temp table makes query only 20% slower (in my test). +> But with index on big postgres uses nestloop with seq scan of small and +> index scan of big => select run faster and temp table stuff makes query +> 2.5 times slower! In the case of duplicates in small, handling in parser +> will use distinct (and so - sorting). But using hashjoin plan distinct +> may be avoided! Who can analize this ? Optimizer only. He can be smart +> to check is there unique index on small or not. If not - what is more +> costless: nestloop with sorting or slower hashjoin without sorting. +> Only optimizer can find best way to execute query, parser can't. +> + +OK, let me comment on this. 
Let's take your example: + +> select b from big where b in (select s from small); +> +> If there is no duplicates in small then this is the same as +> +> select b from big, small where b = s; + +My idea was to do this: + + select distinct s into temp table small2 from small; + select b from big,small2 where b = s; + +And let the optimizer decide how to do the join. Is this what you are +saying? + +The problem I see is that the temp table is already distinct, and was +sorted to do that, but you can't pass that information into the +optimizer. Is that the problem with using the parser? + +But you want the temp table never to hit disk unless it has to, and that +will not work unless we do a really good job with temp tables. + +Also NOT IN will need some type of non-join operator, perhaps a flag in +the Plan to say "look for a match, but only output if you find it." How +do we do that? + +We definitely need temp tables, and I think we can stuff them into the +cache as LOCAL, which will make them usable without adding to pg_class. + +Perhaps we could create a special Plan in the optimizer called IN, +have the outer and inner queries as plans, and work that plan into the +executor. + +The problem with that is we need to specify a way to join the two plans, +and the same logic that determines what type of join to do can do this too. +Maybe that's why you wanted stuff done in the optimizer and not the +parser. + +At least now, I understand enough to come up with ideas, and can +understand what you are saying. + +> > Especially if we trade him for help on his favorite topics for v6.4?? +> +> Ok, I'd like to see shared catalog cache implemented in 6.4... 
-:) +> +> Vadim +> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Fri Dec 19 01:00:58 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25512 + for ; Fri, 19 Dec 1997 01:00:56 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA28102; Fri, 19 Dec 1997 00:56:52 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:56:40 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA28077 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:56:36 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA28065 for ; Fri, 19 Dec 1997 00:56:19 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id AAA25436; + Fri, 19 Dec 1997 00:55:56 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712190555.AAA25436@candle.pha.pa.us> +Subject: Re: [HACKERS] Items for 6.3 +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Fri, 19 Dec 1997 00:55:56 -0500 (EST) +Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org +In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> select b from big where b in (select s from small); +> +> If there is no duplicates in small then this is the same as +> +> select b from big, small where b = s; + +I think I see the problem you are describing now. If we put the +subselect into a temp table, we can't use the existing index on small.s, +even if there is one, or if sorting was involved in creating the temp +table. 
+ + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From lockhart@alumni.caltech.edu Fri Dec 19 01:34:26 1997 +Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25750 + for ; Fri, 19 Dec 1997 01:34:23 -0500 (EST) +Received: from alumni.caltech.edu (localhost [127.0.0.1]) + by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA15234; + Fri, 19 Dec 1997 06:29:45 GMT +Sender: tgl@gnet04.jpl.nasa.gov +Message-ID: <349A1459.EBFE2C84@alumni.caltech.edu> +Date: Fri, 19 Dec 1997 06:29:45 +0000 +From: "Thomas G. Lockhart" +Organization: Caltech/JPL +X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) +MIME-Version: 1.0 +To: "Vadim B. Mikheev" +CC: Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] Items for 6.3 +References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> > Could we try to do the subselect/join/union features for 6.3? I know you +> > have been looking at it, and found the deepest parts of the backend to +> > be a bit murky. I'm not familiar with that area at all, but perhaps we +> > could divert Vadim for a week or two or three when he has some time. +> ^^^^^ +> More realistic... And this is for initial release only: tuning performance +> of subselects is very hard, long work. +> +> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys +> may appear in 6.4 only. And I'll need in help: could someone add support +> for them in parser ? Not handling - but parsing and common checking. + +Yes, I've already added subselect syntax in the parser, but we will need to +modify or add to the parse tree nodes to push that past the parser into the +backend. I'm happy to focus on that, since I understand those pieces pretty well. 
+There are several places where "subselect syntax" is used: subselects and unions +come to mind right away. If you have an opinion on how the parse nodes should be +structured I can start with that, or I can just put something in and then modify +it as you need later. Do you see unions as being similar to subselects, or are +they a separate problem? To me, they seem like a simpler case since (perhaps) not +as much optimization and internal reorganizing needs to happen. + +> Also, it would be nice to have better temp tables implementation +> (without affecting pg_class etc) - node material need in query-level +> temp tables anyway. I'd really like to see temp table files created +> only when its data must go to disk due to local buffer pool is full +> and can't more keep table data in memory. + +This sounds very desirable. I noticed that there are, or used to be, multiple +storage managers. Could a manager for temporary storage be written which stores +things in memory until it gets too big and then go to disk? Could that manager +use the mm and md managers internally? Or is all of that at too low a level to be +helpful for this problem? + +SQL92 has the concept of transaction-only and session-only tables and variables. +Could an implementation of "temporary tables" be used to implement this feature +at the same time (or form the basis for it later)? It seems like none of these +non-permanent tables need to go to any of the pg_ tables, since other backends do +not need to see them and they are allowed to disappear at the end of the session +(or at a crash). We would just need the "table manager" to cache information on +temporary stuff before looking at the permanent tables (??). + +> Also, local buffer manager +> should be re-written to use hash table (like shared bufmgr) for buffer search, +> not sequential scan as now (this is item for TODO) - this will speed up +> things and allow to use more than 64 local buffers. 
+> +> I'm still sure that handling subselects in parser is not right way. +> And the main problem is not in execution plans (we could use tricks +> to resolve this) but in performance. + +Seems to me that the subselect needs to stay untransformed (i.e. executable but +non-optimized) so that an optimizer can independently decide how to transform for +faster execution. That way, in the first implementation we have reliable but +stupid execution, but then can add a subselect optimizer which looks for cases +which can be transformed to run faster. + +> > Especially if we trade him for help on his favorite topics for v6.4?? +> +> Ok, I'd like to see shared catalog cache implemeted in 6.4... -:) + +Sure. (Tell me what it is later :) + + - Tom + + + +From vadim@sable.krasnoyarsk.su Fri Dec 19 06:23:14 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27849 + for ; Fri, 19 Dec 1997 06:22:46 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id SAA12239; + Fri, 19 Dec 1997 18:28:13 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <349A5A4C.DA366B47@sable.krasnoyarsk.su> +Date: Fri, 19 Dec 1997 18:28:12 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: lockhart@alumni.caltech.edu, hackers@postgresql.org +Subject: Re: [HACKERS] Items for 6.3 +References: <199712190553.AAA25415@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> OK, let me comment on this. 
Let's take your example: +> +> > select b from big where b in (select s from small); +> > +> > If there is no duplicates in small then this is the same as +> > +> > select b from big, small where b = s; +> +> My idea was to do this: +> +> select distinct s into temp table small2 from small; +> select b from big,small2 where b = s; +> +> And let the optimizer decide how to do the join. Is this what you are +> saying? +> +> The problem I see is that the temp table is already distinct, and was +> sorted to do that, but you can't pass that information into the +> optimizer. Is that the problem with using the parser? + +No. I said that in some cases we can avoid distinct altogether: either if a +unique index on small exists or by using hashjoin plans with a !new! +HashUnique node (there was a mistake in my prev description - not Hash, +but HashUnique on small should be used - HashUnique is a hash table +without duplicates, just another way to implement distinct, without +sorting). This new node can be useful for "normal" queries +(without subselects) too. + +My example is very simple. I just want to say that by handling subqueries +in the optimizer we will have more chances to do better optimization. Maybe not +now, but later. I'm sure that subqueries require some specific optimization +and this is not a task for the parser. + +> +> But you want the temp table never to hit disk unless it has to, but that +> will not work unless we do a really good job with temp tables. + +Of course. + +> +> Also NOT IN will need some type of non-join operator, perhaps a flag in +> the Plan to say "look for a match, but only output if you find it." How + ^^ + don't ? +> do we do that? + +Just as you said - by using some flag. + +> +> We definitely need temp tables, and I think we can stuff it into the +> cache as LOCAL, which will make it usable without adding to pg_class. + +We have the Relation->rd_istemp flag... Just change it from bool to int: +0 -> is not temp, 1 -> session level temp table, etc... 
+ +Vadim + +From vadim@sable.krasnoyarsk.su Fri Dec 19 08:09:11 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00349 + for ; Fri, 19 Dec 1997 08:09:05 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id UAA12377; + Fri, 19 Dec 1997 20:14:25 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <349A7327.9A484B74@sable.krasnoyarsk.su> +Date: Fri, 19 Dec 1997 20:14:15 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: "Thomas G. Lockhart" +CC: Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] Items for 6.3 +References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su> <349A1459.EBFE2C84@alumni.caltech.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Thomas G. Lockhart wrote: +> +> > Ok - I'm ready to do subselects for 6.3 but this means that foreign keys +> > may appear in 6.4 only. And I'll need in help: could someone add support +> > for them in parser ? Not handling - but parsing and common checking. +> +> Yes, I've already added subselect syntax in the parser, but we will need to +> modify or add to the parse tree nodes to push that past the parser into the +> backend. I'm happy to focus on that, since I understand those pieces pretty well. + +Nice! + +> There are several places where "subselect syntax" is used: subselects and unions +> come to mind right away. If you have an opinion on how the parse nodes should be +> structured I can start with that, or I can just put something in and then modify + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +It's ok for me. + +> it as you need later. 
Do you see unions as being similar to subselects, or are +> they a separate problem? To me, they seem like a simpler case since (perhaps) not +> as much optimization and internal reorganizing needs to happen. + +I didn't think about unions at all... Yes, it's simpler to implement. +BTW, I recall Bruce mentioned that unions are used for selects from +superclass and all descendant classes (select ... from table* ) - maybe +something is already implemented ? Bruce ? + +> +> > Also, it would be nice to have better temp tables implementation +> > (without affecting pg_class etc) - node material need in query-level +> > temp tables anyway. I'd really like to see temp table files created +> > only when its data must go to disk due to local buffer pool is full +> > and can't more keep table data in memory. +> +> This sounds very desirable. I noticed that there are, or used to be, multiple +> storage managers. Could a manager for temporary storage be written which stores +> things in memory until it gets too big and then go to disk? Could that manager +> use the mm and md managers internally? Or is all of that at too low a level to be +> helpful for this problem? + +mm uses shmem... This feature could be implemented in local bufmgr +directly: when requested buffer is not found in pool and there is no free, +!dirty buffer then try to find some dirty buffer of created relation, flush +it to disk and use (exception below); if no such buffer -> create some relation +(and flush 1st block); exception: also create some relation if # of buffers +occupied by already created relations is too small (just to do not break +buffering of created relations). +(Note, that using some additional in-memory storage manager will cause +keeping some buffers in-memory twice - in local pool and in manager. +The way above is using local bufmgr as storage manager). + +> > +> > I'm still sure that handling subselects in parser is not right way. 
+> > And the main problem is not in execution plans (we could use tricks +> > to resolve this) but in performance. +> +> Seems to me that the subselect needs to stay untransformed (i.e. executable but +> non-optimized) so that an optimizer can independently decide how to transform for +> faster execution. That way, in the first implementation we have reliable but +> stupid execution, but then can add a subselect optimizer which looks for cases +> which can be transformed to run faster. + +Yes, I believe that this is right way. + +> +> > > Especially if we trade him for help on his favorite topics for v6.4?? +> > +> > Ok, I'd like to see shared catalog cache implemeted in 6.4... -:) +> +> Sure. (Tell me what it is later :) + +Ok -:) + +Vadim + +From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:21 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08884 + for ; Tue, 23 Dec 1997 04:01:18 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA24250 for ; Tue, 23 Dec 1997 03:57:12 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23028; + Tue, 23 Dec 1997 16:04:25 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <349F7E97.48C63F17@sable.krasnoyarsk.su> +Date: Tue, 23 Dec 1997 16:04:23 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: lockhart@alumni.caltech.edu, hackers@postgresql.org +Subject: Re: [HACKERS] Items for 6.3 +References: <199712191607.LAA02362@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> > +> > I didn't think about unions at all... Yes, it's simpler to implement. 
+> > BTW, I recall Bruce mentioned that unions are used for selects from +> > superclass and all descendant classes (select ... from table* ) - maybe +> > something is already implemented ? Bruce ? +> +> Yes, it is already there. See optimizer/prep/prepunion.c, and see the +> call to it from optimizer/plan/planner.c. The current source tree has a +> cleaned up version that will be easier to understand. Basically, if +> there are any inherited tables, it calls prepunion, and cycles +> through each inherited table, copying the Query plan, and calling the +> planner() for each one, then it returns to the planner() to do sorting +> and uniqueness. I am working on fixing aggregates. + +Could you try with unions ? +I would like to concentrate on a single thing - subqueries. + +> +> > mm uses shmem... This feature could be implemented in local bufmgr +> > directly: when requested buffer is not found in pool and there is no free, +> > !dirty buffer then try to find some dirty buffer of created relation, flush +> > it to disk and use (exception below); if no such buffer -> create some relation +> > (and flush 1st block); exception: also create some relation if # of buffers +> > occupied by already created relations is too small (just to do not break +> > buffering of created relations). +> > (Note, that using some additional in-memory storage manager will cause +> > keeping some buffers in-memory twice - in local pool and in manager. +> > The way above is using local bufmgr as storage manager). +> +> In the psort code, we do a nice job of keeping the stuff in files or +> memory. Seems to work well. Can we use that somehow? Perhaps make it +> a separate module, or just force a psort rather than a hash! + +I would like not to be restricted to psort only, but to use what is better +in each case. I can even foresee using indices on temp tables: we could +put data in the index without putting data in the table itself! +In any case, we can leave in-memory tables for the future. 
+ +Vadim + + +From aixssd!darrenk@abs.net Thu Dec 5 10:30:53 1996 +Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for ; Thu, 5 Dec 1996 10:30:43 -0500 (EST) +Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST) +Received: by aixssd (AIX 3.2/UCB 5.64/4.03) + id AA36963; Thu, 5 Dec 1996 10:10:24 -0500 +Received: by ceodev (AIX 4.1/UCB 5.64/4.03) + id AA34942; Thu, 5 Dec 1996 10:07:56 -0500 +Date: Thu, 5 Dec 1996 10:07:56 -0500 +From: aixssd!darrenk@abs.net (Darren King) +Message-Id: <9612051507.AA34942@ceodev> +To: maillist@candle.pha.pa.us +Subject: Subselect info. +Mime-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Content-Md5: jaWdPH2KYtdr7ESzqcOp5g== +Status: OR + +> Any of them deal with implementing subselects? + +There's a white paper at the www.sybase.com that might +help a little. It's just a copy of a presentation +given by the optimizer guru there. Nothing code-wise, +but he gives a few ways of flattening them with temp +tables, etc... + +Darren + +From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109 + for ; Thu, 21 Aug 1997 23:42:43 -0400 (EDT) +Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD) +Sender: root@www.krasnet.ru +Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su> +Date: Fri, 22 Aug 1997 12:04:31 +0800 +From: "Vadim B. 
Mikheev"
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian
+Subject: Re: subselects
+References: <199708220219.WAA23745@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> Considering the complexity of the primary/secondary changes you are
+> making, I believe subselects will be easier than that.
+
+I don't do changes for P/F keys - just thinking...
+Yes, I think that impl of referential integrity is
+more complex work.
+
+As for subselects:
+
+in plannodes.h
+
+typedef struct Plan {
+...
+    struct Plan *lefttree;
+    struct Plan *righttree;
+} Plan;
+
+/* ----------------
+ * these are are defined to avoid confusion problems with "left"
+                         ^^^^^^^^^^^^^^^^^^
+ * and "right" and "inner" and "outer". The convention is that
+ * the "left" plan is the "outer" plan and the "right" plan is
+ * the inner plan, but these make the code more readable.
+ * ----------------
+ */
+#define innerPlan(node) (((Plan *)(node))->righttree)
+#define outerPlan(node) (((Plan *)(node))->lefttree)
+
+First thought is to avoid any confusions by re-defining
+
+#define rightPlan(node) (((Plan *)(node))->righttree)
+#define leftPlan(node) (((Plan *)(node))->lefttree)
+
+and change all occurrences of 'outer' & 'inner' in code
+to 'left' & 'right' ones:
+
+this will allow to use 'outer' & 'inner' things for subselects
+later, without confusion. My hope is that we may change Executor
+very easily by adding outer/inner plans/TupleSlots to
+EState, CommonState, JoinState, etc and by doing node
+processing in right order.
+
+Subselects are mostly Planner problem.
+
+Unfortunately, I haven't time at the moment: CHECK/DEFAULT...
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+        by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354
+        for ; Fri, 22 Aug 1997 00:00:51 -0400 (EDT)
+Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD)
+Sender: root@www.krasnet.ru
+Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su>
+Date: Fri, 22 Aug 1997 12:22:37 +0800
+From: "Vadim B. Mikheev"
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian
+Subject: Re: subselects
+References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Vadim B. Mikheev wrote:
+>
+> this will allow to use 'outer' & 'inner' things for subselects
+> later, without confusion. My hope is that we may change Executor
+
+Or maybe use 'high' & 'low' for subselects (to avoid confusion
+with outer joins).
+
+> very easy by adding outer/inner plans/TupleSlots to
+> EState, CommonState, JoinState, etc and by doing node
+> processing in right order.
+             ^^^^^^^^^^^^^^
+Rule is easy:
+1. Uncorrelated subselect - do 'low' plan node first
+2. Correlated - do left/right first
+
+- just some flag in structures.
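A minimal C sketch of the ordering rule above. The Plan struct is cut down to its child links; the lowtree field, the correlated flag, the lowPlan macro and first_to_run are hypothetical names used only for illustration and are not part of plannodes.h.

```c
#include <stddef.h>

/* Cut-down stand-in for the Plan node; only lefttree and righttree
 * come from the plannodes.h excerpt quoted in this thread. */
typedef struct Plan
{
    struct Plan *lefttree;
    struct Plan *righttree;
    struct Plan *lowtree;     /* hypothetical: plan of the subselect, if any */
    int          correlated;  /* hypothetical flag: subselect uses upper-query values */
} Plan;

#define leftPlan(node)  (((Plan *)(node))->lefttree)
#define rightPlan(node) (((Plan *)(node))->righttree)
#define lowPlan(node)   (((Plan *)(node))->lowtree)   /* hypothetical */

typedef enum { RUN_LEFT_RIGHT_FIRST, RUN_LOW_FIRST } RunOrder;

/* Rule 1: an uncorrelated subselect can be run before anything else.
 * Rule 2: a correlated one needs tuples from left/right first. */
static RunOrder
first_to_run(Plan *node)
{
    if (lowPlan(node) != NULL && !node->correlated)
        return RUN_LOW_FIRST;
    return RUN_LEFT_RIGHT_FIRST;
}
```

A node without a subselect falls through to the ordinary left/right order, so the flag only matters where a 'low' plan is attached.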
+ +Vadim + +From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682 + for ; Thu, 30 Oct 1997 17:02:28 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for ; Thu, 30 Oct 1997 16:57:54 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726 + for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST) +From: Bruce Momjian +Message-Id: <199710302150.QAA07726@candle.pha.pa.us> +Subject: [HACKERS] subselects +To: hackers@postgreSQL.org (PostgreSQL-development) +Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +The only thing I have to add to what I had written earlier is that I +think it is best to have these subqueries executed as early in query +execution as possible. + +Every piece of the backend: parser, optimizer, executor, is designed to +work on a single query. The earlier we can split up the queries, the +better those pieces will work at doing their job. You want to be able +to use the parser and optimizer on each part of the query separately, if +you can. + + +Forwarded message: +> I have done some thinking about subselects. There are basically two +> issues: + > +> Does the query return one row or several rows? 
This can be
+> determined by seeing if the user uses equals or 'IN' to join the
+> subquery.
+>
+> Is the query correlated, meaning "Does the subquery reference
+> values from the outer query?"
+>
+> (We already have the third type of subquery, the INSERT...SELECT query.)
+>
+> So we have these four combinations:
+>
+> 1) one row, no correlation
+> 2) multiple rows, no correlation
+> 3) one row, correlated
+> 4) multiple rows, correlated
+>
+>
+> With #1, we can execute the subquery, get the value, replace the
+> subquery with the constant returned from the subquery, and execute the
+> outer query.
+>
+> With #2, we can execute the subquery and put the result into a temporary
+> table. We then rewrite the outer query to access the temporary table
+> and replace the subquery with the column name from the temporary table.
+> We probably put an index on the temp. table, which has only one
+> column, because a subquery can only return one column. We remove the
+> temp. table after query execution.
+>
+> With #3 and #4, we potentially need to execute the subquery for every
+> row returned by the outer query. Performance would be horrible for
+> anything but the smallest query. Another way to handle this is to
+> execute the subquery WITHOUT using any of the outer-query columns to
+> restrict the WHERE clause, and add those columns used to join the outer
+> variables into the target list of the subquery.
So for query:
+>
+>     select t1.name
+>     from tab t1
+>     where t1.age = (select max(t2.age)
+>                     from tab2 t2
+>                     where t2.name = t1.name)
+>
+> Execute the subquery and put it in a temporary table:
+>
+>     select t2.name, max(t2.age) as age
+>     into table temp999
+>     from tab2 t2
+>     group by t2.name
+>
+>     create index i_temp999 on temp999 (name)
+>
+> Then re-write the outer query:
+>
+>     select t1.name
+>     from tab t1, temp999
+>     where t1.age = temp999.age and
+>           t1.name = temp999.name
+>
+> The only problem here is that the subselect is running for all entries
+> in tab2, even if the outer query is only going to need a few rows.
+> Deciding whether to execute the subquery each time, or create a temp.
+> table, is often difficult. Even some non-correlated
+> subqueries are better to execute for each row rather than pre-execute the
+> entire subquery, especially if the outer query returns few rows.
+>
+> One requirement to handle these issues is better column statistics,
+> which I am working on.
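The two ways of running the correlated max(age) example can be sketched in C over in-memory arrays. This is only an illustration under stated assumptions: Row, per_row and via_temp are hypothetical names, "name" is an int key to keep the code short, NULLs are ignored, and the grouping uses a linear search where a real implementation would sort or use an index.

```c
#include <stddef.h>

typedef struct { int name; int age; } Row;   /* hypothetical row layout */

/* Per-row evaluation: for every row of tab, rescan tab2 to compute
 * max(age) for the same name. Cost grows with n1 * n2. */
static size_t
per_row(const Row *tab, size_t n1, const Row *tab2, size_t n2, int *out)
{
    size_t found = 0;
    for (size_t i = 0; i < n1; i++) {
        int have = 0, max = 0;
        for (size_t j = 0; j < n2; j++)
            if (tab2[j].name == tab[i].name && (!have || tab2[j].age > max)) {
                max = tab2[j].age;
                have = 1;
            }
        if (have && tab[i].age == max)
            out[found++] = tab[i].name;
    }
    return found;
}

/* Temp-table evaluation: one pass over tab2 builds (name, max age)
 * groups in temp[] (capacity n2), then tab is joined against temp. */
static size_t
via_temp(const Row *tab, size_t n1, const Row *tab2, size_t n2,
         Row *temp, int *out)
{
    size_t ntemp = 0, found = 0;
    for (size_t j = 0; j < n2; j++) {
        size_t k = 0;
        while (k < ntemp && temp[k].name != tab2[j].name)
            k++;
        if (k == ntemp)
            temp[ntemp++] = tab2[j];          /* first row of a new group */
        else if (tab2[j].age > temp[k].age)
            temp[k].age = tab2[j].age;        /* raise the group maximum */
    }
    for (size_t i = 0; i < n1; i++)
        for (size_t k = 0; k < ntemp; k++)
            if (tab[i].name == temp[k].name && tab[i].age == temp[k].age)
                out[found++] = tab[i].name;
    return found;
}
```

Both functions return the same names; the difference is that via_temp always aggregates all of tab2, which is the cost the message above points out when the outer query needs only a few rows.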
+> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643 + for ; Fri, 31 Oct 1997 22:30:56 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for ; Fri, 31 Oct 1997 22:06:08 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for ; Fri, 31 Oct 1997 22:00:53 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566; + Fri, 31 Oct 1997 21:37:06 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711010237.VAA14566@candle.pha.pa.us> +Subject: Re: [HACKERS] subselects +To: maillist@candle.pha.pa.us (Bruce Momjian) +Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +One more issue I thought of. You can have multiple subselects in a +single query, and subselects can have their own subselects. + +This makes it particularly important that we define a system that always +is able to process the subselect BEFORE the upper select. This will +allow use to handle all these cases without limitations. 
+ +> +> The only thing I have to add to what I had written earlier is that I +> think it is best to have these subqueries executed as early in query +> execution as possible. +> +> Every piece of the backend: parser, optimizer, executor, is designed to +> work on a single query. The earlier we can split up the queries, the +> better those pieces will work at doing their job. You want to be able +> to use the parser and optimizer on each part of the query separately, if +> you can. +> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From hannu@trust.ee Sun Nov 2 10:33:33 1997 +Received: from sid.trust.ee (sid.trust.ee [194.204.23.180]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619 + for ; Sun, 2 Nov 1997 10:32:04 -0500 (EST) +Received: from sid.trust.ee (wink.trust.ee [194.204.23.184]) + by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233; + Sun, 2 Nov 1997 17:30:11 +0200 +Message-ID: <345C9BFD.986C68AA@sid.trust.ee> +Date: Sun, 02 Nov 1997 17:27:57 +0200 +From: Hannu Krosing +X-Mailer: Mozilla 4.02 [en] (Win95; I) +MIME-Version: 1.0 +To: hackers-digest@postgresql.org +CC: maillist@candle.pha.pa.us +Subject: Re: [HACKERS] subselects +References: <199711010401.XAA09216@hub.org> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST) +> From: Bruce Momjian +> Subject: Re: [HACKERS] subselects +> +> One more issue I thought of. You can have multiple subselects in a +> single query, and subselects can have their own subselects. +> +> This makes it particularly important that we define a system that always +> is able to process the subselect BEFORE the upper select. This will +> allow use to handle all these cases without limitations. 
+ +This would severely limit what subselects can be used for as you can't useany of the fields in the upper select in a +search criteria for the subselect, +for example you can't do + +update parts p1 +set parts.current_id = ( + select new_id + from parts p2 + where p1.old_id = p2.new_id);or + +select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice +from parts p1; + +there may be of course ways to rewrite these queries (which the optimiser should do +if it can) but IMHO, these kinds of subselects should still be allowed + +> > The only thing I have to add to what I had written earlier is that I +> > think it is best to have these subqueries executed as early in query +> > execution as possible. +> > +> > Every piece of the backend: parser, optimizer, executor, is designed to +> > work on a single query. The earlier we can split up the queries, the +> > better those pieces will work at doing their job. You want to be able +> > to use the parser and optimizer on each part of the query separately, if +> > you can. +> > +> + +Hannu + + +From vadim@sable.krasnoyarsk.su Sun Nov 2 21:30:59 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831 + for ; Sun, 2 Nov 1997 21:30:57 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for ; Sun, 2 Nov 1997 21:20:13 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su> +Date: Mon, 03 Nov 1997 09:22:38 +0700 +From: "Vadim B. 
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselects +References: <199711021848.NAA08319@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > > One more issue I thought of. You can have multiple subselects in a +> > > single query, and subselects can have their own subselects. +> > > +> > > This makes it particularly important that we define a system that always +> > > is able to process the subselect BEFORE the upper select. This will +> > > allow use to handle all these cases without limitations. +> > +> > This would severely limit what subselects can be used for as you can't useany of the fields in the upper select in a +> > search criteria for the subselect, +> > for example you can't do +> > +> > update parts p1 +> > set parts.current_id = ( +> > select new_id +> > from parts p2 +> > where p1.old_id = p2.new_id);or +> > +> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice +> > from parts p1; +> > +> > there may be of course ways to rewrite these queries (which the optimiser should do +> > if it can) but IMHO, these kinds of subselects should still be allowed +> +> I hadn't even gotten to this point yet, but it is a good thing to keep +> in mind. +> +> In these cases, as in correlated subqueries in the where clause, we will +> create a temporary table, and add the proper join fields and tables to +> the clauses. Our version of UPDATE accepts a FROM section, and we will +> certainly use this for this purpose. + +We can't replace subselect with join if there is aggregate +in subselect. 
+ +Actually, I don't see any problems if we going to process subselect +like sql-funcs: non-correlated subselects can be emulated by +funcs without args, for correlated subselects parser (analyze.c) +has to change all upper query references to $1, $2,... + +Vadim + +From vadim@sable.krasnoyarsk.su Mon Nov 3 06:07:12 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433 + for ; Mon, 3 Nov 1997 06:07:03 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su> +Date: Mon, 03 Nov 1997 18:09:43 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] subselects +References: <199711030316.WAA15401@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > +> > > In these cases, as in correlated subqueries in the where clause, we will +> > > create a temporary table, and add the proper join fields and tables to +> > > the clauses. Our version of UPDATE accepts a FROM section, and we will +> > > certainly use this for this purpose. +> > +> > We can't replace subselect with join if there is aggregate +> > in subselect. +> +> I got lost here. Why can't we handle aggregates? + +Sorry, I missed using of temp tables. Sybase uses joins (without +temp tables) for non-correlated subqueries: + + A noncorrelated subquery can be evaluated as if it were an independent query. + Conceptually, the results of the subquery are substituted in the main statement, or + outer query. This is not how SQL Server actually processes statements with + subqueries. 
Noncorrelated subqueries can be alternatively stated as joins and + are processed as joins by SQL Server. + +but this is not possible if there are aggregates in subquery. + +> +> My idea was this. This is a non-correlated subquery. +... +No problems with it... + +> +> Here is a correlated example: +> +> select * +> from table_a +> where table_a.col_a in (select table_b.col_b +> from table_b +> where table_b.col_b = table_a.col_c) +> +> rewrite as: +> +> select distinct table_b.col_b, table_a.col_c -- the distinct is needed +> into table_sub +> from table_a, table_b + +First, could we add 'where table_b.col_b = table_a.col_c' here ? +Just to avoid Cartesian results ? I hope we can. + +Note that for query + + select * + from table_a + where table_a.col_a in (select table_b.col_b * table_a.col_c + from table_b) + +it's better to do + + select distinct table_a.col_a + into table table_sub + from table_b, table_a + where table_a.col_a = table_b.col_b * table_a.col_c + +once again - to avoid Cartesians. + +But what could we do for + + select * + from table_a + where table_a.col_a = (select max(table_b.col_b * table_a.col_c) + from table_b) +??? + select max(table_b.col_b * table_a.col_c), table_a.col_a + into table table_sub + from table_b, table_a + group by table_a.col_a + +first tries to sort sizeof(table_a) * sizeof(table_b) tuples... +For tables big and small with 100 000 and 1000 tuples + +select max(x*y), x from big, small group by x + +"ate" all free 140M in my file system after 20 minutes (just for +sorting - nothing more) and was killed... + +select x from big where x = cor(x); +(cor(int4) is 'select max($1*y) from small') takes 20 minutes - +this is bad too. + +> > +> > Actually, I don't see any problems if we going to process subselect +> > like sql-funcs: non-correlated subselects can be emulated by +> > funcs without args, for correlated subselects parser (analyze.c) +> > has to change all upper query references to $1, $2,... 
+> +> Yes, logically, they are SQL functions, but aren't we going to see +> terrible performance in such circumstances. My experience is that when + ^^^^^^^^^^^^^^^^^^^^ +You're right. + +> people are given subselects, they start to do huge jobs with them. +> +> In fact, the final solution may be to have both methods available, and +> switch between them depending on the size of the query sets. Each +> method has its advantages. The function example lets the outside query +> be executed, and only calls the subquery when needed. +> +> For large tables where the subselect is small and is the entire WHERE +> restriction, the SQL function gets call much too often. A simple join +> of the subquery result and the large table would be much better. This +> method also allows for sort/merge join of the subquery results, and +> index use. + +...keep thinking... + +Vadim + +From owner-pgsql-hackers@hub.org Mon Nov 3 11:01:01 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633 + for ; Mon, 3 Nov 1997 11:00:59 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for ; Mon, 3 Nov 1997 10:49:42 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for ; Mon, 3 Nov 1997 10:31:23 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262; + Mon, 3 Nov 1997 10:25:34 -0500 (EST) +From: Bruce Momjian +Message-Id: 
<199711031525.KAA02262@candle.pha.pa.us> +Subject: Re: [HACKERS] subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> Sorry, I missed using of temp tables. Sybase uses joins (without +> temp tables) for non-correlated subqueries: +> +> A noncorrelated subquery can be evaluated as if it were an independent query. +> Conceptually, the results of the subquery are substituted in the main statement, or +> outer query. This is not how SQL Server actually processes statements with +> subqueries. Noncorrelated subqueries can be alternatively stated as joins and +> are processed as joins by SQL Server. +> +> but this is not possible if there are aggregates in subquery. +> +> > +> > My idea was this. This is a non-correlated subquery. +> ... +> No problems with it... +> +> > +> > Here is a correlated example: +> > +> > select * +> > from table_a +> > where table_a.col_a in (select table_b.col_b +> > from table_b +> > where table_b.col_b = table_a.col_c) +> > +> > rewrite as: +> > +> > select distinct table_b.col_b, table_a.col_c -- the distinct is needed +> > into table_sub +> > from table_a, table_b +> +> First, could we add 'where table_b.col_b = table_a.col_c' here ? +> Just to avoid Cartesian results ? I hope we can. + +Yes, of course. I forgot that line here. We can also be fancy and move +some of the outer where restrictions on table_a into the subquery. 
+
+I think the classic subquery for this would be if someone wanted all
+customer names that had invoices in the past month:
+
+select custname
+from customer
+where custid in (select order.custid
+                 from order
+                 where order.date >= "09/01/97" and
+                       order.date <= "09/30/97")
+
+In this case, the subquery can use an index on 'date' to quickly
+evaluate the query, and the resulting temp table can quickly be joined
+to the customer table. If we used SQL functions, every customer would
+have an order query evaluated for it, and there may be no multi-column
+index on customer and date, or even if there is, this could be many
+query executions.
+
+
+>
+> Note that for query
+>
+>     select *
+>     from table_a
+>     where table_a.col_a in (select table_b.col_b * table_a.col_c
+>                             from table_b)
+>
+> it's better to do
+>
+>     select distinct table_a.col_a
+>     into table table_sub
+>     from table_b, table_a
+>     where table_a.col_a = table_b.col_b * table_a.col_c
+
+Yes, I had not thought of cases where they are doing correlated column
+arithmetic, but it looks like this would work.
+
+>
+> once again - to avoid Cartesians.
+>
+> But what could we do for
+>
+>     select *
+>     from table_a
+>     where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
+>                            from table_b)
+
+OK, who wrote this horrible query. :-)
+
+Without a join of table_b and table_a, even an SQL function would die on
+this. You have to take the current value table_a.col_c, and multiply by
+every value of table_b.col_b to get the maximum.
+
+Trying to do a temp table on this is certainly going to be a cartesian
+product, but using an SQL function is also going to be a cartesian
+product, except that the product is generated in small pieces instead of
+in one big query. The SQL function example may eventually complete, but
+it will take forever to do so in cases where the temp table would bomb.
+
+I can recommend some SQL books for anyone who sends in a bug report on
+this query. :-)
+
+
+
+> ???
+> select max(table_b.col_b * table_a.col_c), table_a.col_a +> into table table_sub +> from table_b, table_a +> group by table_a.col_a +> +> first tries to sort sizeof(table_a) * sizeof(table_b) tuples... +> For tables big and small with 100 000 and 1000 tuples +> +> select max(x*y), x from big, small group by x +> +> "ate" all free 140M in my file system after 20 minutes (just for +> sorting - nothing more) and was killed... +> +> select x from big where x = cor(x); +> (cor(int4) is 'select max($1*y) from small') takes 20 minutes - +> this is bad too. + +Again, my feeling is that in cases where the temp table would bomb, the +SQL function will be so slow that neither will be acceptable. + +> +> > > +> > > Actually, I don't see any problems if we going to process subselect +> > > like sql-funcs: non-correlated subselects can be emulated by +> > > funcs without args, for correlated subselects parser (analyze.c) +> > > has to change all upper query references to $1, $2,... +> > +> > Yes, logically, they are SQL functions, but aren't we going to see +> > terrible performance in such circumstances. My experience is that when +> ^^^^^^^^^^^^^^^^^^^^ +> You're right. +> +> > people are given subselects, they start to do huge jobs with them. +> > +> > In fact, the final solution may be to have both methods available, and +> > switch between them depending on the size of the query sets. Each +> > method has its advantages. The function example lets the outside query +> > be executed, and only calls the subquery when needed. +> > +> > For large tables where the subselect is small and is the entire WHERE +> > restriction, the SQL function gets call much too often. A simple join +> > of the subquery result and the large table would be much better. This +> > method also allows for sort/merge join of the subquery results, and +> > index use. +> +> ...keep thinking... 
+> +> Vadim +> + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239 + for ; Thu, 20 Nov 1997 00:09:11 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for ; Wed, 19 Nov 1997 23:58:16 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103 + for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711200457.XAA03103@candle.pha.pa.us> +Subject: [HACKERS] subselect +To: hackers@postgreSQL.org (PostgreSQL-development) +Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +I am going to overhaul all the /parser files, and I may give subselects +a try while I am in there. This is where it going to have to be done. + +Two things I think I need are: + + temp tables that go away at the end of a statement, so if the +query elog's out, the temp file gets destroyed + + how do I implement "not in": + + select * from a where x not in (select y from b) + +Using <> is not going to work because that returns multiple copies of a, +one for every one that doesn't equal. It is like we need not equals, +but don't return multiple rows. + +Any ideas? 
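The behaviour asked for here, "not equals, but don't return multiple rows", is what is usually called an anti-join: each outer row is emitted once, and only if no inner row matches. A minimal C sketch over int arrays follows; not_in and join_not_equal are hypothetical names, and SQL's NULL handling for NOT IN is left out.

```c
#include <stddef.h>

/* Anti-join: copy a[i] to out exactly once when no b[j] equals it. */
static size_t
not_in(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t n = 0;
    for (size_t i = 0; i < na; i++) {
        int matched = 0;
        for (size_t j = 0; j < nb && !matched; j++)
            if (a[i] == b[j])
                matched = 1;          /* stop scanning at the first match */
        if (!matched)
            out[n++] = a[i];
    }
    return n;
}

/* For contrast, the plain join on "a.x <> b.y": one output row per
 * non-equal pair, so out needs room for na * nb values. */
static size_t
join_not_equal(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t n = 0;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i] != b[j])
                out[n++] = a[i];
    return n;
}
```

With a = {1, 2, 3} and b = {2, 4}, not_in yields 1 and 3, while join_not_equal yields five rows and still includes 2, because 2 differs from 4.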
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019 + for ; Thu, 20 Nov 1997 10:00:56 -0500 (EST) +Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for ; Thu, 20 Nov 1997 09:52:55 -0500 (EST) +Received: from alumni.caltech.edu (localhost [127.0.0.1]) + by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754; + Thu, 20 Nov 1997 06:27:21 GMT +Sender: tgl@gnet04.jpl.nasa.gov +Message-ID: <3473D849.16F67A2A@alumni.caltech.edu> +Date: Thu, 20 Nov 1997 06:27:21 +0000 +From: "Thomas G. Lockhart" +Organization: Caltech/JPL +X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselect +References: <199711200457.XAA03103@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> I am going to overhaul all the /parser files + +?? + +> , and I may give subselects +> a try while I am in there. This is where it going to have to be done. + +A first cut at the subselect syntax is already in gram.y. I'm sure that the +e-mail you had sent which collected several items regarding subselects +covers some of this topic. I've been thinking about subselects also, and +had thought that there must be some existing mechanisms in the backend +which can be used to help implement subselects. It seems to me that UNION +might be a good thing to implement first, because it has a fairly +well-defined set of behaviors: + + select a union select b; + +chooses elements from a and from b and then sorts/uniques the result. + + select a union all select b; + +chooses elements from a, sorts/uniques, and then adds all elements from b. 
+ + select a union select b union all select c; + +evaluates left to right, and first evaluates a union b, sorts/uniques, and +then evaluates + + (result) union all select c; + +There are several types of subselects. Examples of some are: + +1) select a.f from a union select b.f from b order by 1; +Needs temporary table(s), optional sort/unique, final order by. + +2) select a.f from a where a.f in (select b.f from b); +Needs temporary table(s). "in" can be first implemented by count(*) > 0 but +would be better performance to have the backend return after the first +match. + +3) select a.f from a where exists (select b.f from b where b.f = a); +Need to do the select and do a subselect on _each_ of the returned values? +Again could use count(*) to help implement. + +This brings up the point that perhaps the backend needs a row-counting +atomic operation and count(*) could be re-implemented using that. At the +moment count(*) is transformed to a select of OID columns and does not +quite work on table joins. + +I would think that outer joins could use some of these support routines +also. + + - Tom + +> Two things I think I need are: +> +> temp tables that go away at the end of a statement, so if the +> query elog's out, the temp file gets destroyed +> +> how do I implement "not in": +> +> select * from a where x not in (select y from b) +> +> Using <> is not going to work because that returns multiple copies of a, +> one for every one that doesn't equal. It is like we need not equals, +> but don't return multiple rows. +> +> Any ideas? 
+> +> -- +> Bruce Momjian +> maillist@candle.pha.pa.us + + + + +From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311 + for ; Mon, 22 Dec 1997 00:49:01 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for ; Mon, 22 Dec 1997 00:44:57 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605 + for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712220545.AAA11605@candle.pha.pa.us> +Subject: [HACKERS] subselects +To: hackers@postgreSQL.org (PostgreSQL-development) +Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +OK, a few questions: + + Should we use sortmerge, so we can use our psort as temp tables, +or do we use hashunique? + + How do we pass the query to the optimizer? How do we represent +the range table for each, and the links between them in correlated +subqueries? + +I have to think about this. Comments are welcome. 
+-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608 + for ; Mon, 22 Dec 1997 02:01:25 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for ; Mon, 22 Dec 1997 01:37:29 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for ; Mon, 22 Dec 1997 01:30:15 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354 + for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712220605.BAA17354@candle.pha.pa.us> +Subject: [HACKERS] subselects (fwd) +To: hackers@postgreSQL.org (PostgreSQL-development) +Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Forwarded message: +> OK, a few questions: +> +> Should we use sortmerge, so we can use our psort as temp tables, +> or do we use hashunique? +> +> How do we pass the query to the optimizer? How do we represent +> the range table for each, and the links between them in correlated +> subqueries? +> +> I have to think about this. Comments are welcome. + +One more thing. I guess I am seeing subselects as a different thing +that temp tables. 
I can see people wanting to put indexes on their temp +tables, so I think they will need more system catalog support. For +subselects, I think we can just stuff them into psort, perhaps, and do +the unique as we unload them. + +Seems like a natural to me. + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876 + for ; Tue, 23 Dec 1997 04:00:57 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042; + Tue, 23 Dec 1997 16:08:56 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su> +Date: Tue, 23 Dec 1997 16:08:56 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselects (fwd) +References: <199712220605.BAA17354@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> Forwarded message: +> > OK, a few questions: +> > +> > Should we use sortmerge, so we can use our psort as temp tables, +> > or do we use hashunique? +> > +> > How do we pass the query to the optimizer? How do we represent +> > the range table for each, and the links between them in correlated +> > subqueries? +> > +> > I have to think about this. Comments are welcome. +> +> One more thing. I guess I am seeing subselects as a different thing +> that temp tables. I can see people wanting to put indexes on their temp +> tables, so I think they will need more system catalog support. For + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +What's the difference between temp tables and temp indices ? 
+Both of them are handled via catalog cache... + +Vadim + +From vadim@sable.krasnoyarsk.su Sat Jan 3 04:01:00 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565 + for ; Sat, 3 Jan 1998 04:00:58 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for ; Sat, 3 Jan 1998 03:47:07 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017; + Sat, 3 Jan 1998 16:08:55 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su> +Date: Sat, 03 Jan 1998 16:08:51 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian , + "Thomas G. Lockhart" +Subject: Re: subselects +References: <199712290516.AAA12579@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> With UNIONs done, how are things going with you on subselects? UNIONs +> are much easier that subselects. +> +> I am stumped on how to record the subselect query information in the +> parser and stuff. + + And I'm too. We definitely need in EXISTS node and may be in IN one. +Also, we have to support ANY and ALL modifiers of comparison operators +(it would be nice to support ANY and ALL for all operators returning +bool: >, =, ..., like, ~ and so on). 
Note, that IN is the same as += ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types, +and so, we could avoid IN node, but I'm not sure that I like such +assumption: postgres is OO-like system allowing operators to be overriden +and so, '=' can, in theory, mean not EQUAL but something else (someday +we could allow to specify "meaning" of operator in CREATE OPERATOR) - +in short, I would like IN node. + Also, I would suggest nodes for ANY and ALL. + (I need in few days to think more about recording of this stuff...) + +> +> Please let me know what I can do to help, if anything. + +Thanks. As I remember, Tom also wished to work here. Tom ? + +Bye, + Vadim + +P.S. I'll be "on-line" Jan 5. + +From owner-pgsql-hackers@hub.org Mon Jan 5 07:30:51 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466 + for ; Mon, 5 Jan 1998 07:30:49 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for ; Mon, 5 Jan 1998 07:20:57 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278; + Mon, 5 Jan 1998 19:36:06 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> +Date: Mon, 05 Jan 1998 19:35:59 +0700 +From: "Vadim B. 
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselect +References: <199801050516.AAA28005@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce Momjian wrote: +> +> I was thinking about subselects, and how to attach the two queries. +> +> What if the subquery makes a range table entry in the outer query, and +> the query is set up like the UNION queries where we put the scans in a +> row, but in the case we put them over/under each other. +> +> And we push a temp table into the catalog cache that represents the +> result of the subquery, then we could join to it in the outer query as +> though it was a real table. +> +> Also, can't we do the correlated subqueries by adding the proper +> target/output columns to the subquery, and have the outer query +> reference those columns in the subquery range table entry. + +Yes, this is a way to handle subqueries by joining to temp table. +After getting plan we could change temp table access path to +node material. On the other hand, it could be useful to let optimizer +know about cost of temp table creation (have to think more about it)... +Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN +is one example of this - joining by <> will give us invalid results. +Setting special NOT EQUAL flag is not enough: subquery plan must be +always inner one in this case. The same for handling ALL modifier. +Note, that we generaly can't use aggregates here: we can't add MAX to +subquery in the case of > ALL (subquery), because of > ALL should return FALSE +if subquery returns NULL(s) but aggregates don't take NULLs into account. + +> +> Maybe I can write up a sample of this? Vadim, would this help? Is this +> the point we are stuck at? 
+ +Personally, I was stuck by holydays -:) +Now I can spend ~ 8 hours ~ each day for development... + +Vadim + + +From owner-pgsql-hackers@hub.org Mon Jan 5 10:45:30 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769 + for ; Mon, 5 Jan 1998 10:45:28 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for ; Mon, 5 Jan 1998 10:31:06 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375; + Mon, 5 Jan 1998 10:28:48 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801051528.KAA10375@candle.pha.pa.us> +Subject: Re: [HACKERS] subselect +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> Yes, this is a way to handle subqueries by joining to temp table. +> After getting plan we could change temp table access path to +> node material. On the other hand, it could be useful to let optimizer +> know about cost of temp table creation (have to think more about it)... +> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN +> is one example of this - joining by <> will give us invalid results. 
+> Setting special NOT EQUAL flag is not enough: subquery plan must be +> always inner one in this case. The same for handling ALL modifier. +> Note, that we generaly can't use aggregates here: we can't add MAX to +> subquery in the case of > ALL (subquery), because of > ALL should return FALSE +> if subquery returns NULL(s) but aggregates don't take NULLs into account. + +OK, here are my ideas. First, I think you have to handle subselects in +the outer node because a subquery could have its own subquery. Also, we +now have a field in Aggreg to allow us to 'usenulls'. + +OK, here it is. I recommend we pass the outer and subquery through +the parser and optimizer separately. + +We parse the subquery first. If the subquery is not correlated, it +should parse fine. If it is correlated, any columns we find in the +subquery that are not already in the FROM list, we add the table to the +subquery FROM list, and add the referenced column to the target list of +the subquery. + +When we are finished parsing the subquery, we create a catalog cache +entry for it called 'sub1' and make its fields match the target +list of the subquery. + +In the outer query, we add 'sub1' to its FROM list, and change +the subquery reference to point to the new range table entry. We also add +WHERE clauses to do any correlated joins. + +Here is a simple example: + + select * + from taba + where col1 = (select col2 + from tabb) + +This is not correlated, and the subquery parses easily. We create a +'sub1' catalog cache entry, and add 'sub1' to the outer query FROM +clause. We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'. + +Here is a more complex correlated subquery: + + select * + from taba + where col1 = (select col2 + from tabb + where taba.col3 = tabb.col4) + +Here we must add 'taba' to the subquery's FROM list, and add col3 to the +target list of the subquery.
After we parse the subquery, add 'sub1' to +the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 = +sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'. +The optimizer will do the correlation for us. + +In the optimizer, we can parse the subquery first, then the outer query, +and then replace all 'sub1' references in the outer query to use the +subquery plan. + +I realize merging the two plans and doing IN and NOT IN is the +real challenge, but I hoped this would give us a start. + +What do you think? + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Mon Jan 5 15:02:46 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690 + for ; Mon, 5 Jan 1998 15:02:44 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for ; Mon, 5 Jan 1998 14:28:43 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904; + Tue, 6 Jan 1998 02:56:00 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 02:55:57 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801051528.KAA10375@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > always inner one in this case. The same for handling ALL modifier.
+> > Note, that we generaly can't use aggregates here: we can't add MAX to +> > subquery in the case of > ALL (subquery), because of > ALL should return FALSE +> > if subquery returns NULL(s) but aggregates don't take NULLs into account. +> +> OK, here are my ideas. First, I think you have to handle subselects in +> the outer node because a subquery could have its own subquery. Also, we + +I hope that this is no matter: if results of subquery (with/without sub-subqueries) +will go into temp table then this table will be re-scanned for each outer tuple. + +> now have a field in Aggreg to all us to 'usenulls'. + ^^^^^^^^ + This can't help: + +vac=> select * from x; +y +- +1 +2 +3 + <<< this is NULL +(4 rows) + +vac=> select max(y) from x; +max +--- + 3 + +==> we can't replace + +select * from A where A.a > ALL (select y from x); + ^^^^^^^^^^^^^^^ + (NULL will be returned and so A.a > ALL is FALSE - this is what + Sybase does, is it right ?) +with + +select * from A where A.a > (select max(y) from x); + ^^^^^^^^^^^^^^^^^^^^ +just because of we lose knowledge about NULLs here. + +Also, I would like to handle ANY and ALL modifiers for all bool +operators, either built-in or user-defined, for all data types - +isn't PostgreSQL OO-like RDBMS -:) + +> OK, here it is. I recommend we pass the outer and subquery through +> the parser and optimizer separately. + +I don't like this. I would like to get parse-tree from parser for +entire query and let optimizer (on upper level) decide how to rewrite +parse-tree and what plans to produce and how these plans should be +merged. Note, that I don't object your methods below, but only where +to place handling of this. I don't understand why should we add +new part to the system which will do optimizer' work (parse-tree --> +execution plan) and deal with optimizer nodes. Imho, upper optimizer +level is nice place to do this. + +> +> We parse the subquery first. If the subquery is not correlated, it +> should parse fine. 
If it is correlated, any columns we find in the +> subquery that are not already in the FROM list, we add the table to the +> subquery FROM list, and add the referenced column to the target list of +> the subquery. +> +> When we are finished parsing the subquery, we create a catalog cache +> entry for it called 'sub1' and make its fields match the target +> list of the subquery. +> +> In the outer query, we add 'sub1' to its target list, and change +> the subquery reference to point to the new range table. We also add +> WHERE clauses to do any correlated joins. +... +> Here is a more complex correlated subquery: +> +> select * +> from taba +> where col1 = (select col2 +> from tabb +> where taba.col3 = tabb.col4) +> +> Here we must add 'taba' to the subquery's FROM list, and add col3 to the +> target list of the subquery. After we parse the subquery, add 'sub1' to +> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 = +> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'. +> THe optimizer will do the correlation for us. +> +> In the optimizer, we can parse the subquery first, then the outer query, +> and then replace all 'sub1' references in the outer query to use the +> subquery plan. +> +> I realize making merging the two plans and doing IN and NOT IN is the + ^^^^^^^^^^^^^^^^^^^^^ +This is very easy to do! As I already said we have just change sub1 +access path (SeqScan of sub1) with SeqScan of Material node with +subquery plan. + +> real challenge, but I hoped this would give us a start. + +Decision about how to record subquery stuff in to parse-tree +would be very good start -:) + +BTW, note that for _expression_ subqueries (which are introduced without +IN, EXISTS, ALL, ANY - this follows Sybase' naming) - as in your examples - +we have to check that subquery returns single tuple... 
+ +Vadim + +From owner-pgsql-hackers@hub.org Mon Jan 5 20:31:03 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836 + for ; Mon, 5 Jan 1998 20:31:01 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for ; Mon, 5 Jan 1998 19:56:05 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST) +Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for ; Mon, 5 Jan 1998 19:02:30 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) + by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337 + for ; Mon, 5 Jan 1998 17:31:04 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675; + Mon, 5 Jan 1998 17:16:40 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801052216.RAA02675@candle.pha.pa.us> +Subject: Re: [HACKERS] subselect +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> > I am confused. Do you want one flat query and want to pass the whole +> > thing into the optimizer? That brings up some questions: +> +> No. 
I just want to follow Tom's way: I would like to see new +> SubSelect node as shortened version of struct Query (or use +> Query structure for each subquery - no matter for me), some +> subquery-related stuff added to Query (and SubSelect) to help +> optimizer to start, and see + +OK, so you want the subquery to actually be INSIDE the outer query +expression. Do they share a common range table? If they don't, we +could very easily just fly through when processing the WHERE clause, and +start a new query using a new query structure for the subquery. Believe +me, you don't want a separate SubQuery-type, just re-use Query for it. +It allows you to call all the normal query stuff with a consistent +structure. + +The parser will need to know it is in a subquery, so it can add the +proper target columns to the subquery, or are you going to do that in +the optimizer. You can do it in the optimizer, and join the range table +references there too. + +> +> typedef struct A_Expr +> { +> NodeTag type; +> int oper; /* type of operation +> * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> IN, NOT IN, ANY, ALL, EXISTS here, +> +> char *opname; /* name of operator/function */ +> Node *lexpr; /* left argument */ +> Node *rexpr; /* right argument */ +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> and SubSelect (Query) here (as possible case). +> +> One thought to follow this way: RULEs (and so - VIEWs) are handled by using +> Query - how else can we implement VIEWs on selects with subqueries ? + +Views are stored as nodeout structures, and are merged into the query's +from list, target list, and where clause. I am working out +readfunc,outfunc now to make sure they are up-to-date with all the +current fields. + +> +> BTW, is +> +> select * from A where (select TRUE from B); +> +> valid syntax ? + +I don't think so. 
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Mon Jan 5 17:01:54 1998 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066 + for ; Mon, 5 Jan 1998 17:01:47 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063; + Tue, 6 Jan 1998 05:18:13 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 05:18:11 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801052051.PAA29341@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > > OK, here it is. I recommend we pass the outer and subquery through +> > > the parser and optimizer separately. +> > +> > I don't like this. I would like to get parse-tree from parser for +> > entire query and let optimizer (on upper level) decide how to rewrite +> > parse-tree and what plans to produce and how these plans should be +> > merged. Note, that I don't object your methods below, but only where +> > to place handling of this. I don't understand why should we add +> > new part to the system which will do optimizer' work (parse-tree --> +> > execution plan) and deal with optimizer nodes. Imho, upper optimizer +> > level is nice place to do this. +> +> I am confused. Do you want one flat query and want to pass the whole +> thing into the optimizer? That brings up some questions: + +No. 
I just want to follow Tom's way: I would like to see new +SubSelect node as shortened version of struct Query (or use +Query structure for each subquery - no matter for me), some +subquery-related stuff added to Query (and SubSelect) to help +optimizer to start, and see + +typedef struct A_Expr +{ + NodeTag type; + int oper; /* type of operation + * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + IN, NOT IN, ANY, ALL, EXISTS here, + + char *opname; /* name of operator/function */ + Node *lexpr; /* left argument */ + Node *rexpr; /* right argument */ + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + and SubSelect (Query) here (as possible case). + +One thought to follow this way: RULEs (and so - VIEWs) are handled by using +Query - how else can we implement VIEWs on selects with subqueries ? + +BTW, is + +select * from A where (select TRUE from B); + +valid syntax ? + +Vadim + +From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:57 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296 + for ; Mon, 5 Jan 1998 18:00:55 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for ; Mon, 5 Jan 1998 17:22:21 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094; + Tue, 6 Jan 1998 05:49:02 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 05:48:58 +0700 +From: "Vadim B. 
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Goran Thyni +CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Goran Thyni wrote: +> +> Vadim, +> +> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN +> is one example of this - joining by <> will give us invalid results. +> +> What is you approach towards this problem? + +Actually, this is problem of ALL modifier (NOT IN is _not_equal_ ALL) +and so, we have to have not just NOT EQUAL flag but some ALL node +with modified operator. + +After that, one way is put subquery into inner plan of an join node +to be sure that for an outer tuple all corresponding subquery tuples +will be tested with modified operator (this will require either +changing code of all join nodes or addition of new plan type - we'll see) +and another way is ... suggested by you: + +> I got an idea that one could reverse the order, +> that is execute the outer first into a temptable +> and delete from that according to the result of the +> subquery and then return it. +> Probably this is too raw and slow. ;-) + +This will be faster in some cases (when subquery returns many results +and there are "not so many" results from outer query) - thanks for idea! + +> +> Personally, I was stuck by holydays -:) +> Now I can spend ~ 8 hours ~ each day for development... +> +> Oh, isn't it christmas eve right now in Russia? 
+ +Due to historic reasons New Year is mu-u-u-uch popular +holiday in Russia -:) + +Vadim + + +From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:59 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300 + for ; Mon, 5 Jan 1998 18:00:57 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for ; Mon, 5 Jan 1998 17:42:15 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129; + Tue, 6 Jan 1998 06:10:05 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su> +Date: Tue, 06 Jan 1998 06:09:56 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] subselect +References: <199801052216.RAA02675@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > > I am confused. Do you want one flat query and want to pass the whole +> > > thing into the optimizer? That brings up some questions: +> > +> > No. I just want to follow Tom's way: I would like to see new +> > SubSelect node as shortened version of struct Query (or use +> > Query structure for each subquery - no matter for me), some +> > subquery-related stuff added to Query (and SubSelect) to help +> > optimizer to start, and see +> +> OK, so you want the subquery to actually be INSIDE the outer query +> expression. Do they share a common range table? If they don't, we + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +No. + +> could very easily just fly through when processing the WHERE clause, and +> start a new query using a new query structure for the subquery.
Believe + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +... and filling some subquery-related stuff in upper query structure - +still don't know what exactly this could be -:) + +> me, you don't want a separate SubQuery-type, just re-use Query for it. +> It allows you to call all the normal query stuff with a consistent +> structure. + +No objections. + +> +> The parser will need to know it is in a subquery, so it can add the +> proper target columns to the subquery, or are you going to do that in + +I don't think that we need in it, but list of correlation clauses +could be good thing - all in all parser has to check all column +references... + +> the optimizer. You can do it in the optimizer, and join the range table +> references there too. + +Yes. + +> > typedef struct A_Expr +> > { +> > NodeTag type; +> > int oper; /* type of operation +> > * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ +> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> > IN, NOT IN, ANY, ALL, EXISTS here, +> > +> > char *opname; /* name of operator/function */ +> > Node *lexpr; /* left argument */ +> > Node *rexpr; /* right argument */ +> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> > and SubSelect (Query) here (as possible case). +> > +> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using +> > Query - how else can we implement VIEWs on selects with subqueries ? +> +> Views are stored as nodeout structures, and are merged into the query's +> from list, target list, and where clause. I am working out +> readfunc,outfunc now to make sure they are up-to-date with all the +> current fields. + +Nice! This stuff was out-of-date for too long time. + +> > BTW, is +> > +> > select * from A where (select TRUE from B); +> > +> > valid syntax ? +> +> I don't think so. + +And so, *rexpr can be of Query type only for oper "in" OP, IN, NOT IN, +ANY, ALL, EXISTS - well. 
+ +(Time to sleep -:) + +Vadim + + +From owner-pgsql-hackers@hub.org Thu Jan 8 23:10:50 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707 + for ; Thu, 8 Jan 1998 23:10:48 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for ; Thu, 8 Jan 1998 23:08:49 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for ; Thu, 8 Jan 1998 23:00:50 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243; + Thu, 8 Jan 1998 22:55:03 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801090355.WAA09243@candle.pha.pa.us> +Subject: [HACKERS] subselects +To: vadim@sable.krasnoyarsk.su (Vadim B.
Mikheev) +Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST) +Cc: hackers@postgreSQL.org (PostgreSQL-development) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Vadim, I know you are still thinking about subselects, but I have some +more clarification that may help. + +We have to add phantom range table entries to correlated subselects so +they will pass the parser. We might as well add those fields to the +target list of the subquery at the same time: + + select * + from taba + where col1 = (select col2 + from tabb + where taba.col3 = tabb.col4) + +becomes: + + select * + from taba + where col1 = (select col2, tabb.col4 <--- + from tabb, taba <--- + where taba.col3 = tabb.col4) + +We add a field to TargetEntry and RangeTblEntry to mark the fact that it +was entered as a correlation entry: + + bool isCorrelated; + +Second, we need to hook the subselect to the main query. I recommend we +add two fields to Query for this: + + Query *parentQuery; + List *subqueries; + +The parentQuery pointer is used to resolve field names in the correlated +subquery. + + select * + from taba + where col1 = (select col2, tabb.col4 <--- + from tabb, taba <--- + where taba.col3 = tabb.col4) + +In the query above, the subquery can be easily parsed, and we add the +subquery to the parent's subqueries list. + +In the parent query, to parse the WHERE clause, we create a new operator +type, called IN or NOT_IN, or ALL, where the left side is a Var, and the +right side is an index to a slot in the subqueries List. + +We can then do the rest in the upper optimizer.
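A minimal sketch of the parent/subquery hookup described above, for
illustration only. The cut-down Query struct, the fixed-size array standing
in for List *subqueries, and the helpers attach_subquery() and query_depth()
are all assumptions made for this example; none of them is backend code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend Query node: only the two proposed
 * fields are modelled, and a fixed array replaces List *subqueries. */
#define MAX_SUBQUERIES 8

typedef struct Query
{
    const char   *label;                      /* illustration only */
    struct Query *parentQuery;                /* NULL for the top-level query */
    struct Query *subqueries[MAX_SUBQUERIES]; /* stands in for List *subqueries */
    int           nsubqueries;
} Query;

/* Hook a parsed subquery to its parent and return its slot index; that
 * index is what the right side of the new operator node would carry. */
static int
attach_subquery(Query *parent, Query *sub)
{
    assert(parent != NULL && sub != NULL);
    assert(parent->nsubqueries < MAX_SUBQUERIES);

    sub->parentQuery = parent;
    parent->subqueries[parent->nsubqueries] = sub;
    return parent->nsubqueries++;
}

/* Count how many levels a query sits below the top-level query by
 * following parentQuery pointers. */
static int
query_depth(const Query *q)
{
    int depth = 0;

    while (q->parentQuery != NULL)
    {
        q = q->parentQuery;
        depth++;
    }
    return depth;
}
```

With this shape, resolving a correlated column name amounts to following
parentQuery upward, and the WHERE-clause operator only has to remember the
integer slot returned by attach_subquery().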
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Fri Jan 9 10:01:01 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305 + for ; Fri, 9 Jan 1998 10:00:59 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for ; Fri, 9 Jan 1998 09:52:17 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623; + Fri, 9 Jan 1998 22:10:25 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> +Date: Fri, 09 Jan 1998 22:10:06 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: subselects +References: <199801090355.WAA09243@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> Vadim, I know you are still thinking about subselects, but I have some +> more clarification that may help. +> +> We have to add phantom range table entries to correlated subselects so +> they will pass the parser. We might as well add those fields to the +> target list of the subquery at the same time: +> +> select * +> from taba +> where col1 = (select col2 +> from tabb +> where taba.col3 = tabb.col4) +> +> becomes: +> +> select * +> from taba +> where col1 = (select col2, tabb.col4 <--- +> from tabb, taba <--- +> where taba.col3 = tabb.col4) +> +> We add a field to TargetEntry and RangeTblEntry to mark the fact that it +> was entered as a correlation entry: +> +> bool isCorrelated; + +No, I don't like to add anything in parser. 
Example: + + select * + from tabA + where col1 = (select col2 + from tabB + where tabA.col3 = tabB.col4 + and exists (select * + from tabC + where tabB.colX = tabC.colX and + tabC.colY = tabA.col2) + ) + +: a column of tabA is referenced in sub-subselect +(is it allowable by standards ?) - in this case it's better +to don't add tabA to 1st subselect but add tabA to second one +and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - +this gives us 2-tables join in 1st subquery instead of 3-tables join. +(And I'm still not sure that using temp tables is best of what can be +done in all cases...) + +Instead of using isCorrelated in TE & RTE we can add + +Index varlevel; + +to Var node to reflect (sub)query from where this Var is come +(where is range table to find var's relation using varno). Upmost query +will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on. + ^^^ ^^^^^^^^^^^^ +(I don't see problems with distinguishing Vars of different children +on the same level...) + +> +> Second, we need to hook the subselect to the main query. I recommend we +> add two fields to Query for this: +> +> Query *parentQuery; +> List *subqueries; + +Agreed. And maybe Index queryLevel. + +> In the parent query, to parse the WHERE clause, we create a new operator +> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the + ^^^^^^^^^^^^^^^^^^ +No. We have to handle (a,b,c) OP (select x, y, z ...) and +'_a_constant_' OP (select ...) - I don't know is last in standards, +Sybase has this. 
+ +Well, + +typedef enum OpType +{ + OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR + ++ OP_EXISTS, OP_ALL, OP_ANY + +} OpType; + +typedef struct Expr +{ + NodeTag type; + Oid typeOid; /* oid of the type of this expr */ + OpType opType; /* type of the op */ + Node *oper; /* could be Oper or Func */ + List *args; /* list of argument nodes */ +} Expr; + +OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries + List, following your suggestion) + +OP_ALL, OP_ANY: + +oper is List of Oper nodes. We need in list because of data types of +a, b, c (above) can be different and so Oper nodes will be different too. + +lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) - +left side of subquery' operator. +lsecond(args) is SubSelect. + +Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in +IN, NOTIN in A_Expr (parser node), but both of them have to be transferred +by parser into corresponding ANY and ALL. At the moment we can do: + +IN --> = ANY, NOT IN --> <> ALL + +but this will be "known bug": this breaks OO-nature of Postgres, because of +operators can be overrided and '=' can mean s o m e t h i n g (not equality). +Example: box data type. For boxes, = means equality of _areas_ and =~ +means that boxes are the same ==> =~ ANY should be used for IN. + +> right side is an index to a slot in the subqueries List. 
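A rough sketch of the parser-side mapping described above (IN --> = ANY,
NOT IN --> <> ALL), for illustration only. The enum ParseSubOp, the struct
SubOpMapping and the function map_subselect_op() are hypothetical names
invented for this example; only OP_EXISTS, OP_ALL and OP_ANY come from the
proposal, and the operator is carried as a plain string rather than as a
list of Oper nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser-level kinds, i.e. what A_Expr would distinguish */
typedef enum ParseSubOp
{
    PARSE_IN, PARSE_NOT_IN, PARSE_ANY, PARSE_ALL, PARSE_EXISTS
} ParseSubOp;

/* The three OpType additions proposed above; note there is no IN or NOT IN */
typedef enum SubOpType
{
    OP_EXISTS, OP_ALL, OP_ANY
} SubOpType;

typedef struct SubOpMapping
{
    SubOpType   opType;
    const char *opname;     /* NULL for EXISTS, which has no operator */
} SubOpMapping;

/* Map a parser-level subselect construct onto EXISTS/ANY/ALL.
 * opname is only consulted for explicit ANY/ALL; IN and NOT IN are
 * hard-wired to "=" and "<>", which is the "known bug" for types such
 * as box where "=" does not mean sameness. */
static SubOpMapping
map_subselect_op(ParseSubOp kind, const char *opname)
{
    SubOpMapping m = { OP_EXISTS, NULL };

    switch (kind)
    {
        case PARSE_IN:
            m.opType = OP_ANY;
            m.opname = "=";
            break;
        case PARSE_NOT_IN:
            m.opType = OP_ALL;
            m.opname = "<>";
            break;
        case PARSE_ANY:
            assert(opname != NULL);
            m.opType = OP_ANY;
            m.opname = opname;
            break;
        case PARSE_ALL:
            assert(opname != NULL);
            m.opType = OP_ALL;
            m.opname = opname;
            break;
        case PARSE_EXISTS:
            break;
    }
    return m;
}
```

Keeping the operator name as data rather than baking it into the node type
is what would later allow something like =~ ANY to be chosen for boxes
without adding new node types.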
+ +Vadim + +From owner-pgsql-hackers@hub.org Fri Jan 9 17:44:04 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779 + for ; Fri, 9 Jan 1998 17:44:01 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for ; Fri, 9 Jan 1998 17:31:24 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282; + Fri, 9 Jan 1998 17:31:41 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801092231.RAA24282@candle.pha.pa.us> +Subject: [HACKERS] Re: subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Bruce Momjian wrote: +> > +> > Vadim, I know you are still thinking about subselects, but I have some +> > more clarification that may help. +> > +> > We have to add phantom range table entries to correlated subselects so +> > they will pass the parser. 
We might as well add those fields to the +> > target list of the subquery at the same time: +> > +> > select * +> > from taba +> > where col1 = (select col2 +> > from tabb +> > where taba.col3 = tabb.col4) +> > +> > becomes: +> > +> > select * +> > from taba +> > where col1 = (select col2, tabb.col4 <--- +> > from tabb, taba <--- +> > where taba.col3 = tabb.col4) +> > +> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it +> > was entered as a correlation entry: +> > +> > bool isCorrelated; +> +> No, I don't like to add anything in parser. Example: +> +> select * +> from tabA +> where col1 = (select col2 +> from tabB +> where tabA.col3 = tabB.col4 +> and exists (select * +> from tabC +> where tabB.colX = tabC.colX and +> tabC.colY = tabA.col2) +> ) +> +> : a column of tabA is referenced in sub-subselect + +This is a strange case that I don't think we need to handle in our first +implementation. + +> (is it allowable by standards ?) - in this case it's better +> to don't add tabA to 1st subselect but add tabA to second one +> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - +> this gives us 2-tables join in 1st subquery instead of 3-tables join. +> (And I'm still not sure that using temp tables is best of what can be +> done in all cases...) + +I don't see any use for temp tables in subselects anymore. After having +implemented UNIONS, I now see how much can be done in the upper +optimizer. I see you just putting the subquery PLAN into the proper +place in the plan tree, with some proper JOIN nodes for IN, NOT IN. + +> +> Instead of using isCorrelated in TE & RTE we can add +> +> Index varlevel; + +OK. Sounds good. + +> +> to Var node to reflect (sub)query from where this Var is come +> (where is range table to find var's relation using varno). Upmost query +> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on. 
+> ^^^ ^^^^^^^^^^^^ +> (I don't see problems with distinguishing Vars of different children +> on the same level...) +> +> > +> > Second, we need to hook the subselect to the main query. I recommend we +> > add two fields to Query for this: +> > +> > Query *parentQuery; +> > List *subqueries; +> +> Agreed. And maybe Index queryLevel. + +Sure. If it helps. + +> +> > In the parent query, to parse the WHERE clause, we create a new operator +> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the +> ^^^^^^^^^^^^^^^^^^ +> No. We have to handle (a,b,c) OP (select x, y, z ...) and +> '_a_constant_' OP (select ...) - I don't know is last in standards, +> Sybase has this. + +I have never seen this in my eight years of SQL. Perhaps we can leave +this for later, maybe much later. + +> +> Well, +> +> typedef enum OpType +> { +> OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR +> +> + OP_EXISTS, OP_ALL, OP_ANY +> +> } OpType; +> +> typedef struct Expr +> { +> NodeTag type; +> Oid typeOid; /* oid of the type of this expr */ +> OpType opType; /* type of the op */ +> Node *oper; /* could be Oper or Func */ +> List *args; /* list of argument nodes */ +> } Expr; +> +> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries +> List, following your suggestion) +> +> OP_ALL, OP_ANY: +> +> oper is List of Oper nodes. We need in list because of data types of +> a, b, c (above) can be different and so Oper nodes will be different too. +> +> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) - +> left side of subquery' operator. +> lsecond(args) is SubSelect. +> +> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in +> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred +> by parser into corresponding ANY and ALL. 
At the moment we can do: +> +> IN --> = ANY, NOT IN --> <> ALL +> +> but this will be "known bug": this breaks OO-nature of Postgres, because of +> operators can be overrided and '=' can mean s o m e t h i n g (not equality). +> Example: box data type. For boxes, = means equality of _areas_ and =~ +> means that boxes are the same ==> =~ ANY should be used for IN. + +That is interesting, to use =~ for ANY. + +Yes, but how many operators take a SUBQUERY as an operand. This is a +special case to me. + +I think I see where you are trying to go. You want subselects to behave +like any other operator, with a subselect type, and you do all the +subselect handling in the optimizer, with special Nodes and actions. + +I think this may be just too much of a leap. We have such clean query +logic for single queries, I can't imagine having an operator that has a +Query operand, and trying to get everything to properly handle it. +UNIONS were very easy to implement as a List off of Query, with some +foreach()'s in rewrite and the high optimizer. + +Subselects are SQL standard, and are never going to be over-ridden by a +user. Same with UNION. They want UNION, they get UNION. They want +Subselect, we are going to spin through the Query structure and give +them what they want. + +The complexities of subselects and correlated queries and range tables +and stuff is so bizarre that trying to get it to work inside the type +system could be a huge project. + +> +> > right side is an index to a slot in the subqueries List. + +I guess the question is what can we have by February 1? + +I have been reading some postings, and it seems to me that subselects +are the litmus test for many evaluators when deciding if a database +engine is full-featured. + +Sorry to be so straightforward, but I want to keep hashing this around +until we get a conclusion, so coding can start. 
+ +My suggestions have been, I believe, trying to get subselects working +with the fullest functionality by adding the least amount of code, and +keeping the logic clean. + +Have you checked out the UNION code? It is very small, but it works. I +think it could make a good sample for subselects. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742 + for ; Sat, 10 Jan 1998 12:00:43 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684; + Sun, 11 Jan 1998 00:19:10 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> +Date: Sun, 11 Jan 1998 00:19:08 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgresql.org, "Thomas G. Lockhart" +Subject: Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > No, I don't like to add anything in parser. Example: +> > +> > select * +> > from tabA +> > where col1 = (select col2 +> > from tabB +> > where tabA.col3 = tabB.col4 +> > and exists (select * +> > from tabC +> > where tabB.colX = tabC.colX and +> > tabC.colY = tabA.col2) +> > ) +> > +> > : a column of tabA is referenced in sub-subselect +> +> This is a strange case that I don't think we need to handle in our first +> implementation. + +I don't know is this strange case or not :) +But I would like to know is this allowed by standards - can someone +comment on this ? +And I don't see problems with handling this... + +> +> > (is it allowable by standards ?) 
- in this case it's better +> > to don't add tabA to 1st subselect but add tabA to second one +> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - +> > this gives us 2-tables join in 1st subquery instead of 3-tables join. +> > (And I'm still not sure that using temp tables is best of what can be +> > done in all cases...) +> +> I don't see any use for temp tables in subselects anymore. After having +> implemented UNIONS, I now see how much can be done in the upper +> optimizer. I see you just putting the subquery PLAN into the proper +> place in the plan tree, with some proper JOIN nodes for IN, NOT IN. + +When saying about temp tables, I meant tables created by node Material +for subquery plan. This is one of two ways - run subquery once for all +possible upper plan tuples and then just join result table with upper +query. Another way is re-run subquery for each upper query tuple, +without temp table but may be with caching results by some ways. +Actually, there is special case - when subquery can be alternatively +formulated as joins, - but this is just special case. + +> > > In the parent query, to parse the WHERE clause, we create a new operator +> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the +> > ^^^^^^^^^^^^^^^^^^ +> > No. We have to handle (a,b,c) OP (select x, y, z ...) and +> > '_a_constant_' OP (select ...) - I don't know is last in standards, +> > Sybase has this. +> +> I have never seen this in my eight years of SQL. Perhaps we can leave +> this for later, maybe much later. + +Are you saying about (a, b, c) or about 'a_constant' ? +Again, can someone comment on are they in standards or not ? +Tom ? +If yes then please add parser' support for them now... + +> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in +> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred +> > by parser into corresponding ANY and ALL. 
At the moment we can do: +> > +> > IN --> = ANY, NOT IN --> <> ALL +> > +> > but this will be "known bug": this breaks OO-nature of Postgres, because of +> > operators can be overrided and '=' can mean s o m e t h i n g (not equality). +> > Example: box data type. For boxes, = means equality of _areas_ and =~ +> > means that boxes are the same ==> =~ ANY should be used for IN. +> +> That is interesting, to use =~ for ANY. +> +> Yes, but how many operators take a SUBQUERY as an operand. This is a +> special case to me. +> +> I think I see where you are trying to go. You want subselects to behave +> like any other operator, with a subselect type, and you do all the +> subselect handling in the optimizer, with special Nodes and actions. +> +> I think this may be just too much of a leap. We have such clean query +> logic for single queries, I can't imagine having an operator that has a +> Query operand, and trying to get everything to properly handle it. +> UNIONS were very easy to implement as a List off of Query, with some +> foreach()'s in rewrite and the high optimizer. +> +> Subselects are SQL standard, and are never going to be over-ridden by a +> user. Same with UNION. They want UNION, they get UNION. They want +> Subselect, we are going to spin through the Query structure and give +> them what they want. +> +> The complexities of subselects and correlated queries and range tables +> and stuff is so bizarre that trying to get it to work inside the type +> system could be a huge project. + +PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS), +derived from the Berkeley Postgres database management system. While +PostgreSQL retains the powerful object-relational data model, rich data types and + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +easy extensibility of Postgres, it replaces the PostQuel query language with an +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +extended subset of SQL. 
+^^^^^^^^^^^^^^^^^^^^^^ + +Should we say users that subselect will work for standard data types only ? +I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? +Is there difference between handling = ANY and ~ ANY ? I don't see any. +Currently we can't get IN working properly for boxes (and may be for others too) +and I don't like to try to resolve these problems now, but hope that someday +we'll be able to do this. At the moment - just convert IN into = ANY and +NOT IN into <> ALL in parser. + +(BTW, do you know how DISTINCT is implemented ? It doesn't use = but +use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) + +> > +> > > right side is an index to a slot in the subqueries List. +> +> I guess the question is what can we have by February 1? +> +> I have been reading some postings, and it seems to me that subselects +> are the litmus test for many evaluators when deciding if a database +> engine is full-featured. +> +> Sorry to be so straightforward, but I want to keep hashing this around +> until we get a conclusion, so coding can start. +> +> My suggestions have been, I believe, trying to get subselects working +> with the fullest functionality by adding the least amount of code, and +> keeping the logic clean. +> +> Have you checked out the UNION code? It is very small, but it works. I +> think it could make a good sample for subselects. + +There is big difference between subqueries and queries in UNION - +there are not dependences between UNION queries. + +Ok, opened issues: + +1. Is using upper query' vars in all subquery levels in standard ? +2. Is (a, b, c) OP (subselect) in standard ? +3. What types of expressions (Var, Const, ...) are allowed on the left + side of operator with subquery on the right ? +4. What types of operators should we support (=, >, ..., like, ~, ...) ? + (My vote for all boolean operators). + +And - did we get consensus on presentation subqueries stuff in Query, +Expr and Var ? 
+I would like to have something done in parser near Jan 17 to get +subqueries working by Feb 1. I vote for support of all standard +things (1. - 3.) in parser right now - if there will be no time +to implement something like (a, b, c) then optimizer will call +elog(WARN) (oh, sorry, - elog(ERROR)). + +Vadim + +From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045 + for ; Sat, 10 Jan 1998 12:31:01 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for ; Sat, 10 Jan 1998 12:22:30 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725; + Sun, 11 Jan 1998 00:41:22 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su> +Date: Sun, 11 Jan 1998 00:41:19 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] subselects +References: <199712220545.AAA11605@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> OK, a few questions: +> +> Should we use sortmerge, so we can use our psort as temp tables, +> or do we use hashunique? +> +> How do we pass the query to the optimizer? How do we represent +> the range table for each, and the links between them in correlated +> subqueries? + +My suggestion is just use varlevel in Var and don't put upper query' +relations into subquery range table. 
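A small sketch of how varlevel could pick the query whose range table a Var
belongs to, for illustration only. It assumes the numbering proposed in this
thread (top-level query is level 0, its direct children level 1, and so on)
together with the parentQuery and queryLevel fields discussed for Query. The
reduced structs and the function var_home_query() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Index;

/* Hypothetical reduced nodes: only the fields needed for the lookup */
typedef struct Query
{
    Index         queryLevel;   /* 0 for the top-level query */
    struct Query *parentQuery;  /* NULL for the top-level query */
} Query;

typedef struct Var
{
    Index varno;      /* index into the owning query's range table */
    Index varlevel;   /* level of the query that owns that range table */
} Var;

/* Walk up from the query in which the Var appears until the query at
 * var->varlevel is reached. Returns NULL if the Var names a level that
 * is not an ancestor of (or equal to) the current query. */
static const Query *
var_home_query(const Query *current, const Var *var)
{
    const Query *q = current;

    assert(var != NULL);
    while (q != NULL && q->queryLevel > var->varlevel)
        q = q->parentQuery;

    return (q != NULL && q->queryLevel == var->varlevel) ? q : NULL;
}
```

Under this scheme the subquery's own range table never needs an entry for an
upper-query relation; varno is interpreted against the range table of
whichever query var_home_query() returns.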
+ +Vadim + +From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357 + for ; Sat, 10 Jan 1998 13:00:58 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for ; Sat, 10 Jan 1998 12:40:02 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741; + Sun, 11 Jan 1998 00:58:56 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su> +Date: Sun, 11 Jan 1998 00:58:52 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] subselects +References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Vadim B. Mikheev wrote: +> +> Bruce Momjian wrote: +> > +> > OK, a few questions: +> > +> > Should we use sortmerge, so we can use our psort as temp tables, +> > or do we use hashunique? +> > +> > How do we pass the query to the optimizer? How do we represent +> > the range table for each, and the links between them in correlated +> > subqueries? +> +> My suggestion is just use varlevel in Var and don't put upper query' +> relations into subquery range table. + +Hmm... Sorry, it seems that I did reply to very old message - forget it. 
+ +Vadim + +From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664 + for ; Sat, 10 Jan 1998 13:30:56 -0500 (EST) +Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for ; Sat, 10 Jan 1998 13:05:09 -0500 (EST) +Received: from alumni.caltech.edu (localhost [127.0.0.1]) + by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623; + Sat, 10 Jan 1998 18:01:03 GMT +Sender: tgl@gnet04.jpl.nasa.gov +Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu> +Date: Sat, 10 Jan 1998 18:01:03 +0000 +From: "Thomas G. Lockhart" +Organization: Caltech/JPL +X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) +MIME-Version: 1.0 +To: "Vadim B. Mikheev" +CC: Bruce Momjian , hackers@postgresql.org +Subject: Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in +> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred +> > > by parser into corresponding ANY and ALL. At the moment we can do: +> > > +> > > IN --> = ANY, NOT IN --> <> ALL +> > > +> > > but this will be "known bug": this breaks OO-nature of Postgres, because of +> > > operators can be overrided and '=' can mean s o m e t h i n g (not equality). +> > > Example: box data type. For boxes, = means equality of _areas_ and =~ +> > > means that boxes are the same ==> =~ ANY should be used for IN. +> > +> > That is interesting, to use =~ for ANY. + +If I understand the discussion, I would think it is fine to make an assumption about +which operator is used to implement a subselect expression.
If someone remaps an +operator to mean something different, then they will get a different result (or a +nonsensical one) from a subselect. + +I'd be happy to remap existing operators to fit into a convention which would work +with subselects (especially if I got to help choose :). + +> > Subselects are SQL standard, and are never going to be over-ridden by a +> > user. Same with UNION. They want UNION, they get UNION. They want +> > Subselect, we are going to spin through the Query structure and give +> > them what they want. +> +> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS), +> derived from the Berkeley Postgres database management system. While +> PostgreSQL retains the powerful object-relational data model, rich data types and +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> easy extensibility of Postgres, it replaces the PostQuel query language with an +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> extended subset of SQL. +> ^^^^^^^^^^^^^^^^^^^^^^ +> +> Should we say users that subselect will work for standard data types only ? +> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? +> Is there difference between handling = ANY and ~ ANY ? I don't see any. +> Currently we can't get IN working properly for boxes (and may be for others too) +> and I don't like to try to resolve these problems now, but hope that someday +> we'll be able to do this. At the moment - just convert IN into = ANY and +> NOT IN into <> ALL in parser. +> +> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but +> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) + +?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted +list? That would give more consistent behavior... + +> > I have been reading some postings, and it seems to me that subselects +> > are the litmus test for many evaluators when deciding if a database +> > engine is full-featured.
+> > +> > Sorry to be so straightforward, but I want to keep hashing this around +> > until we get a conclusion, so coding can start. +> > +> > My suggestions have been, I believe, trying to get subselects working +> > with the fullest functionality by adding the least amount of code, and +> > keeping the logic clean. +> > +> > Have you checked out the UNION code? It is very small, but it works. I +> > think it could make a good sample for subselects. +> +> There is big difference between subqueries and queries in UNION - +> there are not dependences between UNION queries. +> +> Ok, opened issues: +> +> 1. Is using upper query' vars in all subquery levels in standard ? + +I'm not certain. Let me know if you do not get an answer from someone else and I will +research it. + +> 2. Is (a, b, c) OP (subselect) in standard ? + +Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where +the parens are allowed to be omitted from a one element list. + +> 3. What types of expressions (Var, Const, ...) are allowed on the left +> side of operator with subquery on the right ? + +I think most expressions are allowed. The "constant OP (subselect)" case you were +asking about is just a simplified case since "(a, b, constant) OP (subselect)" where +a and b are column references should be allowed. Of course, our optimizer could +perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first +example "EXISTS (subselect where x = constant)". + +> 4. What types of operators should we support (=, >, ..., like, ~, ...) ? +> (My vote for all boolean operators). + +Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is +important to get an initial implementation for v6.3 which covers a little, some, or +all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then +we will have the benefit of feedback from others in practical applications which +always uncovers new things to consider. 
+ +> And - did we get consensus on presentation subqueries stuff in Query, +> Expr and Var ? +> I would like to have something done in parser near Jan 17 to get +> subqueries working by Feb 1. I vote for support of all standard +> things (1. - 3.) in parser right now - if there will be no time +> to implement something like (a, b, c) then optimizer will call elog(WARN) (oh, +> sorry, - elog(ERROR)). + +Great. I'd like to help with the remaining parser issues; at the moment "row_expr" +does the right thing with expression comparisons but just parses and then ignores +subselect expressions. Let me know what structures you want passed back and I'll put +them in, or if you prefer put in the first one and I'll go through and clean up and +add the rest. + + - Tom + + +From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728 + for ; Sat, 10 Jan 1998 15:00:56 -0500 (EST) +Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for ; Sat, 10 Jan 1998 14:35:19 -0500 (EST) +Received: from alumni.caltech.edu (localhost [127.0.0.1]) + by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002; + Sat, 10 Jan 1998 19:31:30 GMT +Sender: tgl@gnet04.jpl.nasa.gov +Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu> +Date: Sat, 10 Jan 1998 19:31:29 +0000 +From: "Thomas G. Lockhart" +Organization: Caltech/JPL +X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) +MIME-Version: 1.0 +To: "Vadim B. Mikheev" +CC: Bruce Momjian , hackers@postgresql.org +Subject: Re: [HACKERS] Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> Are you saying about (a, b, c) or about 'a_constant' ?
+> Again, can someone comment on are they in standards or not ? +> Tom ? +> If yes then please add parser' support for them now... + +As I mentioned a few minutes ago in my last message, I parse the row descriptors and +the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)") I currently +ignore the result. I didn't want to pass things back as lists until something in the +backend was ready to receive them. + +If it is OK, I'll go ahead and start passing back a list of expressions when a row +descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node +being a list rather than an atomic node. + +Also, I can start passing back the subselect expression as the rexpr; right now the +parser calls elog() and quits. + +btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called +makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions. +If lists are handled farther back, this routine should move to there also and the +parser will just pass the lists. Note that some assumptions have to be made about the +meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of +"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK +to disallow those cases or to look for specific appearance of the operator to guess +the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if +it has "<>" or "!" then build as "or"s). + +Let me know what you want...
+ + - Tom + + +From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998 +Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953 + for ; Sun, 11 Jan 1998 01:01:51 -0500 (EST) +Received: from alumni.caltech.edu (localhost [127.0.0.1]) + by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797; + Sun, 11 Jan 1998 05:58:01 GMT +Sender: tgl@gnet04.jpl.nasa.gov +Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu> +Date: Sun, 11 Jan 1998 05:58:01 +0000 +From: "Thomas G. Lockhart" +Organization: Caltech/JPL +X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) +MIME-Version: 1.0 +To: "Vadim B. Mikheev" +CC: Bruce Momjian , hackers@postgresql.org +Subject: Re: [HACKERS] Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> +Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702" +Status: OR + +This is a multi-part message in MIME format. +--------------D8B38A0D1F78A10C0023F702 +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit + +Here are context diffs of gram.y and keywords.c; sorry about sending the full files. +These start sending lists of arguments toward the backend from the parser to +implement row descriptors and subselects. + +They should apply OK even over Bruce's recent changes... 
+ + - Tom + +--------------D8B38A0D1F78A10C0023F702 +Content-Type: text/plain; charset=us-ascii; name="gram.y.patch" +Content-Transfer-Encoding: 7bit +Content-Disposition: inline; filename="gram.y.patch" + +*** ../src/backend/parser/gram.y.orig Sat Jan 10 05:44:36 1998 +--- ../src/backend/parser/gram.y Sat Jan 10 19:29:37 1998 +*************** +*** 195,200 **** +--- 195,201 ---- + having_clause + %type row_descriptor, row_list + %type row_expr ++ %type RowOp, row_opt + %type OptCreateAs, CreateAsList + %type CreateAsElement + %type NumConst +*************** +*** 242,248 **** + */ + + /* Keywords (in SQL92 reserved words) */ +! %token ACTION, ADD, ALL, ALTER, AND, AS, ASC, + BEGIN_TRANS, BETWEEN, BOTH, BY, + CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, + CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, +--- 243,249 ---- + */ + + /* Keywords (in SQL92 reserved words) */ +! %token ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC, + BEGIN_TRANS, BETWEEN, BOTH, BY, + CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, + CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, +*************** +*** 258,264 **** + ON, OPTION, OR, ORDER, OUTER_P, + PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC, + REFERENCES, REVOKE, RIGHT, ROLLBACK, +! SECOND_P, SELECT, SET, SUBSTRING, + TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM, + UNION, UNIQUE, UPDATE, USING, + VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW, +--- 259,265 ---- + ON, OPTION, OR, ORDER, OUTER_P, + PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC, + REFERENCES, REVOKE, RIGHT, ROLLBACK, +! 
SECOND_P, SELECT, SET, SOME, SUBSTRING, + TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM, + UNION, UNIQUE, UPDATE, USING, + VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW, +*************** +*** 2853,2866 **** + /* Expressions using row descriptors + * Define row_descriptor to allow yacc to break the reduce/reduce conflict + * with singleton expressions. + */ + row_expr: '(' row_descriptor ')' IN '(' SubSelect ')' + { +! $$ = NULL; + } + | '(' row_descriptor ')' NOT IN '(' SubSelect ')' + { +! $$ = NULL; + } + | '(' row_descriptor ')' '=' '(' row_descriptor ')' + { +--- 2854,2878 ---- + /* Expressions using row descriptors + * Define row_descriptor to allow yacc to break the reduce/reduce conflict + * with singleton expressions. ++ * ++ * Note that "SOME" is the same as "ANY" in syntax. ++ * - thomas 1998-01-10 + */ + row_expr: '(' row_descriptor ')' IN '(' SubSelect ')' + { +! $$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6); + } + | '(' row_descriptor ')' NOT IN '(' SubSelect ')' + { +! $$ = makeA_Expr(OP, "<>any", (Node *)$2, (Node *)$7); +! } +! | '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')' +! { +! char *opr; +! opr = palloc(strlen($4)+strlen($5)+1); +! strcpy(opr, $4); +! strcat(opr, $5); +! $$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7); + } + | '(' row_descriptor ')' '=' '(' row_descriptor ')' + { +*************** +*** 2880,2885 **** +--- 2892,2907 ---- + } + ; + ++ RowOp: '=' { $$ = "="; } ++ | '<' { $$ = "<"; } ++ | '>' { $$ = ">"; } ++ ; ++ ++ row_opt: ALL { $$ = "all"; } ++ | ANY { $$ = "any"; } ++ | SOME { $$ = "any"; } ++ ; ++ + row_descriptor: row_list ',' a_expr + { + $$ = lappend($1, $3); +*************** +*** 3432,3441 **** + ; + + in_expr: SubSelect +! { +! elog(ERROR,"IN (SUBSELECT) not yet implemented"); +! $$ = $1; +! } + | in_expr_nodes + { $$ = $1; } + ; +--- 3454,3460 ---- + ; + + in_expr: SubSelect +! 
{ $$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); } + | in_expr_nodes + { $$ = $1; } + ; +*************** +*** 3449,3458 **** + ; + + not_in_expr: SubSelect +! { +! elog(ERROR,"NOT IN (SUBSELECT) not yet implemented"); +! $$ = $1; +! } + | not_in_expr_nodes + { $$ = $1; } + ; +--- 3468,3474 ---- + ; + + not_in_expr: SubSelect +! { $$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); } + | not_in_expr_nodes + { $$ = $1; } + ; + +--------------D8B38A0D1F78A10C0023F702 +Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch" +Content-Transfer-Encoding: 7bit +Content-Disposition: inline; filename="keywords.c.patch" + +*** ../src/backend/parser/keywords.c.orig Mon Jan 5 07:51:33 1998 +--- ../src/backend/parser/keywords.c Sat Jan 10 19:22:07 1998 +*************** +*** 39,44 **** +--- 39,45 ---- + {"alter", ALTER}, + {"analyze", ANALYZE}, + {"and", AND}, ++ {"any", ANY}, + {"append", APPEND}, + {"archive", ARCHIVE}, + {"as", AS}, +*************** +*** 178,183 **** +--- 179,185 ---- + {"set", SET}, + {"setof", SETOF}, + {"show", SHOW}, ++ {"some", SOME}, + {"stdin", STDIN}, + {"stdout", STDOUT}, + {"substring", SUBSTRING}, + +--------------D8B38A0D1F78A10C0023F702-- + + +From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255 + for ; Sun, 11 Jan 1998 01:31:10 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for ; Sun, 11 Jan 1998 01:10:48 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST) +Received: from 
candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for ; Sun, 11 Jan 1998 01:01:05 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801; + Sun, 11 Jan 1998 00:59:23 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801110559.AAA11801@candle.pha.pa.us> +Subject: [HACKERS] Re: subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST) +Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu +In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> I would like to have something done in parser near Jan 17 to get +> subqueries working by Feb 1. I vote for support of all standard +> things (1. - 3.) in parser right now - if there will be no time +> to implement something like (a, b, c) then optimizer will call +> elog(WARN) (oh, sorry, - elog(ERROR)). + +First, let me say I am glad we are still on schedule for Feb 1. I was +panicking because I thought we wouldn't make it in time. + + +> > > (is it allowable by standards ?) - in this case it's better +> > > to don't add tabA to 1st subselect but add tabA to second one +> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - +> > > this gives us 2-tables join in 1st subquery instead of 3-tables join. +> > > (And I'm still not sure that using temp tables is best of what can be +> > > done in all cases...) +> > +> > I don't see any use for temp tables in subselects anymore. After having +> > implemented UNIONS, I now see how much can be done in the upper +> > optimizer. I see you just putting the subquery PLAN into the proper +> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN. 
+> +> When saying about temp tables, I meant tables created by node Material +> for subquery plan. This is one of two ways - run subquery once for all +> possible upper plan tuples and then just join result table with upper +> query. Another way is re-run subquery for each upper query tuple, +> without temp table but may be with caching results by some ways. +> Actually, there is special case - when subquery can be alternatively +> formulated as joins, - but this is just special case. + +This is interesting. It really only applies for correlated subqueries, +and certainly it may help sometimes to just evaluate the subquery for +valid values that are going to come from the upper query than for all +possible values. Perhaps we can use the 'cost' value of each query to +decide how to handle this. + +> +> > > > In the parent query, to parse the WHERE clause, we create a new operator +> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the +> > > ^^^^^^^^^^^^^^^^^^ +> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and +> > > '_a_constant_' OP (select ...) - I don't know is last in standards, +> > > Sybase has this. +> > +> > I have never seen this in my eight years of SQL. Perhaps we can leave +> > this for later, maybe much later. +> +> Are you saying about (a, b, c) or about 'a_constant' ? +> Again, can someone comment on are they in standards or not ? +> Tom ? +> If yes then please add parser' support for them now... + +OK, Thomas says it is, so we will put in as much code as we can to handle +it. + +> Should we say users that subselect will work for standard data types only ? +> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? +> Is there difference between handling = ANY and ~ ANY ? I don't see any. +> Currently we can't get IN working properly for boxes (and may be for others too) +> and I don't like to try to resolve these problems now, but hope that someday +> we'll be able to do this. 
At the moment - just convert IN into = ANY and +> NOT IN into <> ALL in parser. + +OK. + +> +> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but +> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) + +I did not know that either. + +> There is big difference between subqueries and queries in UNION - +> there are not dependences between UNION queries. + +Yes, I know UNIONS are trivial compared to subselects. + +> +> Ok, opened issues: +> +> 1. Is using upper query' vars in all subquery levels in standard ? +> 2. Is (a, b, c) OP (subselect) in standard ? +> 3. What types of expressions (Var, Const, ...) are allowed on the left +> side of operator with subquery on the right ? +> 4. What types of operators should we support (=, >, ..., like, ~, ...) ? +> (My vote for all boolean operators). +> +> And - did we get consensus on presentation subqueries stuff in Query, +> Expr and Var ? + +OK, here are my concrete ideas on changes and structures. + +I think we all agreed that Query needs new fields: + + Query *parentQuery; + List *subqueries; + +Maybe query level too, but I don't think so (see later ideas on Var). + +We need a new Node structure, call it Sublink: + + int linkType; /* IN, NOTIN, ANY, EXISTS, OPERATOR... */ + Oid operator; /* subquery must return single row */ + List *lefthand; /* parent stuff */ + Node *subquery; /* represents nodes from parser */ + Index Subindex; /* filled in to index Query->subqueries */ + +Of course, the names are just suggestions. Every time we run through +the parsenodes of a query to create a Query* structure, when we do the +WHERE clause, if we come upon one of these Sublink nodes (created in the +parser), we move the supplied Query* in Sublink->subquery to a local +List variable, and we set Sublink->Subindex to equal the index of the +new query, i.e. whether it is the first subquery we found, 1, or the second, 2, +etc.
+ +After we have created the parent Query structure, we run through our +local List variable of subquery parsenodes we created above, and add +Query* entries to Query->subqueries. In each subquery Query*, we set +the parentQuery pointer. + +Also, when parsing the subqueries, we need to keep track of correlated +references. I recommend we add a field to the Var structure: + + Index sublevel; /* range table reference: + = 0 current level of query + < 0 parent above this many levels + > 0 index into subquery list + */ + +This way, a Var node with sublevel 0 refers to the current level, which is true +in most cases. This helps us not have to change much code. sublevel = +-1 means it references the range table in the parent query. sublevel = +-2 means the parent's parent. sublevel = 2 means it references the range +table of the second entry in Query->subqueries. Varno and varattno are +still meaningful. Of course, we can't reference variables in the +subqueries from the parent in the parser code, but Vadim may want to. + +When doing a Var lookup in the parser, we look in the current level +first, but if not found, if it is a subquery, we can look at the parent +and parent's parent to set the sublevel, varno, and varattno properly. + +We create no phantom range table entries in the subquery, and no phantom +target list entries. We can leave that all for the upper optimizer.
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Fri Nov 28 16:34:03 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA17454 + for ; Fri, 28 Nov 1997 16:33:59 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA10553; Fri, 28 Nov 1997 16:20:03 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 28 Nov 1997 16:17:50 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA10116 for pgsql-hackers-outgoing; Fri, 28 Nov 1997 16:17:45 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA09997 for ; Fri, 28 Nov 1997 16:17:26 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id QAA17309 + for hackers@postgreSQL.org; Fri, 28 Nov 1997 16:18:08 -0500 (EST) +From: Bruce Momjian +Message-Id: <199711282118.QAA17309@candle.pha.pa.us> +Subject: [HACKERS] querytrees and multiple statements +To: hackers@postgreSQL.org (PostgreSQL-development) +Date: Fri, 28 Nov 1997 16:18:08 -0500 (EST) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Currently, if a query string arrives that has multiple sql statements in +it, the parser breaks it down into separate queries, analyzes each one, +then executes them in order. (psql automatically breaks things down +into separate queries, so this will not work there.) The problem is +that if the first query creates a table, and the second query goes to +access it, the parser analysis fails because the table is not yet +created. See the attached pginterface source for an example.
The real +problem is that all the queries in the string are analyzed first, then +executed, rather than having one analyzed and then executed, then the next. + +I am going to have trouble with subselects and temp tables. I want to +pull out the subselect, change it into a SELECT ... INTO TEMP, and add it to +the QueryTree before the outer select. But when the outer select is analyzed +by the parser, the temp table doesn't exist yet, and that will cause an +error. + +Currently postgres.c does each step on all queries before moving to the +next step. Does anyone know what the ramifications would be if I +changed this to do the full set of operations on each statement first +before moving to the next? + +--------------------------------------------------------------------------- + + +/* + * pgnulltest.c + * +*/ + +#include +#include +#include +#include +#include +#include +#include + +int main(int argc, char **argv) +{ + char query[4000]; + int i; + + if (argc != 2) + halt("Usage: %s database\n",argv[0]); + + connectdb(argv[1],NULL,NULL,NULL,NULL); + + sprintf(query,"create table test(x int); select x from test;"); + doquery(query); + + disconnectdb(); + return 0; +} + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Sat Nov 29 05:01:01 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA27942 + for ; Sat, 29 Nov 1997 05:00:58 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA13666 for ; Sat, 29 Nov 1997 04:35:08 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id QAA17107; Sat, 29 Nov 1997 16:38:58 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <347FE2B1.167EB0E7@sable.krasnoyarsk.su> +Date: Sat, 29 Nov 1997 16:38:57 +0700 +From: "Vadim B.
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: [HACKERS] querytrees and multiple statements +References: <199711282118.QAA17309@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> Currently, if a query string arrives that has multiple sql statements in +> it, the parser breaks it down into separate queries, analyzes each one, +> then executes them in order. (psql automatically breaks things down +> into separate queries, do this will not work there.) The problem is +> that if the first query creates a table, and the second query goes to +> access it, the parser analysis fails because the table is not yet +> created. See the attached pginterface source for an example. The real +> problem is that all the queries in the string are analyzed first, then +> executed, rather than having one analyzed then execute, then the next. +> +> I am going to have touble with subselects and temp tables. I want to +> pull out the subselect, change it into a SELECT ... INTO TEMP, add it to +> the QueryTree before the outer select, then the outer select is analyzed +> by the parser, the temp table doesn't exist yet, and will cause an +> error. +> +> Currently postgres.c does each step on all queries before moving to the +> next step. Does anyone know what the ramifications would be if I +> changed this to do to the full set of operations on each statement first +> before moving to the next? + +This will break ability to prepare plan (parser + optimizer) for latter +execution. This ability is used by RULEs (and so - by VIEWs) and will be +used by PL(s)... + +Please, take a look at nodeMaterial.c: + +/*------------------------------------------------------------------------- + * + * nodeMaterial.c-- + * Routines to handle materialization nodes. +... 
+/* + * INTERFACE ROUTINES + * ExecMaterial - generate a temporary relation + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +(I'm still very busy. Hope to return soon.) + +Vadim + +From vadim@sable.krasnoyarsk.su Sun Nov 30 02:30:56 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA15439 + for ; Sun, 30 Nov 1997 02:30:55 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id CAA17743 for ; Sun, 30 Nov 1997 02:27:40 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id OAA18937; Sun, 30 Nov 1997 14:32:14 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <3481167E.2781E494@sable.krasnoyarsk.su> +Date: Sun, 30 Nov 1997 14:32:14 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] querytrees and multiple statements +References: <199711291854.NAA05185@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > This will break ability to prepare plan (parser + optimizer) for latter +> > execution. This ability is used by RULEs (and so - by VIEWs) and will be +> > used by PL(s)... +> > +> > Please, take a look at nodeMaterial.c: +> > +> > /*------------------------------------------------------------------------- +> > * +> > * nodeMaterial.c-- +> > * Routines to handle materialization nodes. +> > ... +> > /* +> > * INTERFACE ROUTINES +> > * ExecMaterial - generate a temporary relation +> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> +> I understand what you are saying here. The temp table has transaction +> scope, and breaking each query into multiple commands, each with its own +> transaction scope will cause the temp table to go away. 
+ +No. I just said that there will be no ability to prepare queries with +subselects for latter execution: will be no ability to get execution plan which +could be passed to executor to get results without additional parser/planner +invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp() +(==> PLs). RULEs don't use execution plan, but use parsed query tree (stored +in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects. + +Ability to have execution plans seems important to me. Other DBMS-es use +this for stored procedures and views. + +Vadim + +From owner-pgsql-hackers@hub.org Mon Dec 1 01:30:57 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA10903 + for ; Mon, 1 Dec 1997 01:30:55 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26262 for ; Mon, 1 Dec 1997 01:21:28 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA05263; Mon, 1 Dec 1997 01:02:12 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:00:12 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA03357 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:00:07 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA03290 for ; Mon, 1 Dec 1997 00:59:45 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id AAA10395; + Mon, 1 Dec 1997 00:57:07 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712010557.AAA10395@candle.pha.pa.us> +Subject: Re: [HACKERS] querytrees and multiple statements +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 1 Dec 1997 00:57:07 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <3481167E.2781E494@sable.krasnoyarsk.su> from "Vadim B. 
Mikheev" at Nov 30, 97 02:32:14 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> No. I just said that there will be no ability to prepare queries with +> subselects for latter execution: will be no ability to get execution plan which +> could be passed to executor to get results without additional parser/planner +> invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp() +> (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored +> in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects. +> +> Ability to have execution plans seems important to me. Other DBMS-es use +> this for stored procedures and views. +> +> Vadim +> + +I see what you are saying about other people calling pg_plan(). pg_plan +returns the query rewritten, and a plan, and some areas use that. I +will have to make sure I honor that functionality in any changes I make +to it. I will think more about this. I may have to add an 'execute me' +flag to it. However, I am unsure how I am going to generate 'just a +plan or rewritten query structure' without actually running the query +and having the temp table created so the rest can be parsed. 
+ + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Mon Dec 1 02:00:58 1997 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11221 + for ; Mon, 1 Dec 1997 02:00:57 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26994 for ; Mon, 1 Dec 1997 01:55:19 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA23269; Mon, 1 Dec 1997 01:47:13 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:45:31 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA22653 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:45:25 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22590 for ; Mon, 1 Dec 1997 01:45:13 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA21318; Mon, 1 Dec 1997 13:49:58 +0700 (KRS) +Message-ID: <34825E16.446B9B3D@sable.krasnoyarsk.su> +Date: Mon, 01 Dec 1997 13:49:58 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org +Subject: Re: [HACKERS] querytrees and multiple statements +References: <199712010557.AAA10395@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce Momjian wrote: +> +> > +> > No. I just said that there will be no ability to prepare queries with +> > subselects for latter execution: will be no ability to get execution plan which +> > could be passed to executor to get results without additional parser/planner +> > invocations. 
This ability is used by SQL-functions and SPI_prepare()/SPI_execp() +> > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored +> > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects. +> > +> > Ability to have execution plans seems important to me. Other DBMS-es use +> > this for stored procedures and views. +> > +> > Vadim +> > +> +> I see what you are saying about other people calling pg_plan(). pg_plan +> returns the query rewritten, and a plan, and some areas use that. I +> will have to make sure I honor that functionality in any changes I make +> to it. I will think more about this. I may have to add an 'execute me' +> flag to it. However, I am unsure how I am going to generate 'just a +> plan or rewritten query structure' without actually running the query +> and having the temp table created so the rest can be parsed. + +That's why I suggest to try with nodeMaterial(): this could allow to handle +subqueries on optimizer level and got single execution plan for +single user query. 
+ +Vadim + + +From owner-pgsql-hackers@hub.org Mon Dec 1 02:46:23 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11762 + for ; Mon, 1 Dec 1997 02:46:21 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id CAA11681; Mon, 1 Dec 1997 02:35:00 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 02:33:17 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id CAA11451 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 02:33:09 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id CAA11110 for ; Mon, 1 Dec 1997 02:32:10 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id CAA11574; + Mon, 1 Dec 1997 02:32:45 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712010732.CAA11574@candle.pha.pa.us> +Subject: Re: [HACKERS] querytrees and multiple statements +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 1 Dec 1997 02:32:45 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <34825E16.446B9B3D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 1, 97 01:49:58 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Bruce Momjian wrote: +> > +> > > +> > > No. I just said that there will be no ability to prepare queries with +> > > subselects for latter execution: will be no ability to get execution plan which +> > > could be passed to executor to get results without additional parser/planner +> > > invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp() +> > > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored +> > > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects. 
+> > > +> > > Ability to have execution plans seems important to me. Other DBMS-es use +> > > this for stored procedures and views. +> > > +> > > Vadim +> > > +> > +> > I see what you are saying about other people calling pg_plan(). pg_plan +> > returns the query rewritten, and a plan, and some areas use that. I +> > will have to make sure I honor that functionality in any changes I make +> > to it. I will think more about this. I may have to add an 'execute me' +> > flag to it. However, I am unsure how I am going to generate 'just a +> > plan or rewritten query structure' without actually running the query +> > and having the temp table created so the rest can be parsed. +> +> That's why I suggest to try with nodeMaterial(): this could allow to handle +> subqueries on optimizer level and got single execution plan for +> single user query. + +Can you give me more details on this? I realize I can create an empty +tmp table to get through the parser analysis stuff, but how do I do +something in nodeMaterial? + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Tue Dec 2 00:04:05 1997 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA00350 + for ; Tue, 2 Dec 1997 00:03:58 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id MAA22889; Tue, 2 Dec 1997 12:09:57 +0700 (KRS) +Sender: root@www.krasnet.ru +Message-ID: <34839824.3F54BC7E@sable.krasnoyarsk.su> +Date: Tue, 02 Dec 1997 12:09:56 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: "Vadim B. 
Mikheev" , hackers@postgreSQL.org +Subject: Re: [HACKERS] querytrees and multiple statements +References: <199712010732.CAA11574@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > +> > That's why I suggest to try with nodeMaterial(): this could allow to handle +> > subqueries on optimizer level and got single execution plan for +> > single user query. +> +> Can you give me more details on this? I realize I can create an empty +> tmp table to get through the parser analysis stuff, but how do I do +> something in nodeMaterial? + + * ExecMaterial + * + * The first time this is called, ExecMaterial retrieves tuples + * this node's outer subplan and inserts them into a temporary + ^^^^^^^ + + * relation. After this is done, a flag is set indicating that + * the subplan has been materialized. Once the relation is + * materialized, the first tuple is then returned. Successive + * calls to ExecMaterial return successive tuples from the temp + * relation. + +As you see, this node materializes some plan results into temp relation: +instead of doing SELECT ... INTO temp FROM ... WHERE ... you could +create Material node using plan for 'SELECT ... FROM ... WHERE ...' as +its subplan. SeqScan of this materialized relation can be used in any +join plans just like scan od normal relation, e.g. - NESTLOOP plan: + + NESTLOOP + SeqScan A + SeqScan B + +becomes + + NESTLOOP + SeqScan + Material + ...subplan here... + SeqScan B (or other Material) + +and so on... 
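
The materialization behaviour described in the ExecMaterial comment above can be sketched in a few lines. This is only an illustration in Python, not PostgreSQL code: `MaterialNode`, `next_tuple`, `rescan` and `nestloop` are hypothetical names, an in-memory list stands in for the temp relation, and the Material node is placed on the inner side of the join (the "or other Material" case) so that the rescan is visible.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple


class MaterialNode:
    """Hypothetical stand-in for a Material plan node."""

    def __init__(self, subplan: Iterable[Row]) -> None:
        self._subplan = subplan
        self._rows: List[Row] = []   # plays the role of the temp relation
        self._materialized = False   # the "subplan has been materialized" flag
        self._pos = 0

    def next_tuple(self) -> Optional[Row]:
        # First call: drain the subplan into the buffer, exactly once.
        if not self._materialized:
            self._rows = list(self._subplan)
            self._materialized = True
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def rescan(self) -> None:
        # Restart from the buffered rows without re-running the subplan.
        self._pos = 0


def nestloop(outer_rows: Iterable[Row], inner: MaterialNode) -> Iterator[Row]:
    """Nested-loop join with no join condition, for illustration only."""
    for outer_row in outer_rows:
        inner.rescan()
        while (inner_row := inner.next_tuple()) is not None:
            yield outer_row + inner_row
```

The point of the sketch is that the subplan runs once no matter how many times the join rescans the node, which is what lets a subselect's plan sit inside a single execution plan.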
+ +Vadim + +From owner-pgsql-hackers@hub.org Tue Dec 2 01:28:02 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA02313 + for ; Tue, 2 Dec 1997 01:28:00 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA00346; Tue, 2 Dec 1997 01:03:55 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 02 Dec 1997 01:03:04 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28750 for pgsql-hackers-outgoing; Tue, 2 Dec 1997 01:02:57 -0500 (EST) +Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA28254 for ; Tue, 2 Dec 1997 01:02:38 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id BAA01042; + Tue, 2 Dec 1997 01:02:15 -0500 (EST) +From: Bruce Momjian +Message-Id: <199712020602.BAA01042@candle.pha.pa.us> +Subject: Re: [HACKERS] querytrees and multiple statements +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Tue, 2 Dec 1997 01:02:15 -0500 (EST) +Cc: vadim@post.krasnet.ru, hackers@postgreSQL.org +In-Reply-To: <34839824.3F54BC7E@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 2, 97 12:09:56 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Bruce Momjian wrote: +> > +> > > +> > > That's why I suggest to try with nodeMaterial(): this could allow to handle +> > > subqueries on optimizer level and got single execution plan for +> > > single user query. +> > +> > Can you give me more details on this? I realize I can create an empty +> > tmp table to get through the parser analysis stuff, but how do I do +> > something in nodeMaterial? 
+> +> * ExecMaterial +> * +> * The first time this is called, ExecMaterial retrieves tuples +> * this node's outer subplan and inserts them into a temporary +> ^^^^^^^ +> +> * relation. After this is done, a flag is set indicating that +> * the subplan has been materialized. Once the relation is +> * materialized, the first tuple is then returned. Successive +> * calls to ExecMaterial return successive tuples from the temp +> * relation. +> +> As you see, this node materializes some plan results into temp relation: +> instead of doing SELECT ... INTO temp FROM ... WHERE ... you could +> create Material node using plan for 'SELECT ... FROM ... WHERE ...' as +> its subplan. SeqScan of this materialized relation can be used in any +> join plans just like scan od normal relation, e.g. - NESTLOOP plan: +> +> NESTLOOP +> SeqScan A +> SeqScan B +> +> becomes +> +> NESTLOOP +> SeqScan +> Material +> ...subplan here... +> SeqScan B (or other Material) +> +> and so on... + +The problem now is that I don't understand much about what happens +inside the optimizer or executor. I am sure you are correct that we can +have the subselect as a subnode, and if you think that is best, then it +is. + +This pretty much stops me in developing subselects. I have the concepts +down of what has to happen, but I can not implement it. It will take me +several months to learn how the optimizer and executor work in enough +detail to implement this. + +I usually alot 2-3 days a month for PostgreSQL development. 
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Thu Oct 30 01:30:59 1997 +Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA17986 + for ; Thu, 30 Oct 1997 01:30:58 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA27090 for ; Thu, 30 Oct 1997 01:19:49 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA28901; Thu, 30 Oct 1997 01:16:38 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 01:16:17 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28673 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 01:16:10 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA27557 for ; Thu, 30 Oct 1997 01:15:27 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA20275; Thu, 30 Oct 1997 13:16:10 +0700 (KRS) +Message-ID: <34582629.33590565@sable.krasnoyarsk.su> +Date: Thu, 30 Oct 1997 13:16:09 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: PostgreSQL Developers List +Subject: [HACKERS] Subqueries? +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +Hi! + +Bruce, did you begin with them ? +I agreed that subqueries should be implemented like SQL-funcs, but +I would suggest to don't CREATE FUNCTION - this is quite bad for +performance, but use some new node (VirtualFunc or SubQuery or) and +handle such nodes like sql-funcs are handled in function.c +(but without parser/planner invocation on each call - should be +fixed!). 
Also, not corelated subqueries returning single result +can't be replaced in parser/planner by constant node: rules (and so - +views), spi and PL use _prepared_ plans... +It seems that this is not hard work... + +Vadim + + +From owner-pgsql-hackers@hub.org Thu Oct 30 16:31:59 1997 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA07360 + for ; Thu, 30 Oct 1997 16:31:49 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA11483; Thu, 30 Oct 1997 16:27:11 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:26:14 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA11163 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:26:07 -0500 (EST) +Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA10874 for ; Thu, 30 Oct 1997 16:25:12 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id QAA06370; + Thu, 30 Oct 1997 16:07:52 -0500 (EST) +From: Bruce Momjian +Message-Id: <199710302107.QAA06370@candle.pha.pa.us> +Subject: Re: [HACKERS] Subqueries? +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Thu, 30 Oct 1997 16:07:51 -0500 (EST) +Cc: hackers@postgreSQL.org +In-Reply-To: <34582629.33590565@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Oct 30, 97 01:16:09 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Hi! +> +> Bruce, did you begin with them ? 
+> I agreed that subqueries should be implemented like SQL-funcs, but +> I would suggest to don't CREATE FUNCTION - this is quite bad for +> performance, but use some new node (VirtualFunc or SubQuery or) and +> handle such nodes like sql-funcs are handled in function.c +> (but without parser/planner invocation on each call - should be +> fixed!). Also, not corelated subqueries returning single result +> can't be replaced in parser/planner by constant node: rules (and so - +> views), spi and PL use _prepared_ plans... +> It seems that this is not hard work... +> +> Vadim +> +> + +OK, here is what I have collected over the months about subqueries. +The Sybase whitepaper is also attached. + +This should get us thinking about how to implement each subquery type, +what operations need to be performed, and in what order. + +--------------------------------------------------------------------------- + +From: Bruce Momjian +Subject: Re: [PG95-DEV] Need info on other databases. +To: pg95-dev@ki.net +Date: Fri, 22 Nov 1996 12:49:24 -0500 (EST) + +> +> +> What I'm specifically interested in is the SQL-92 spec +> for the ANSI things that postgres95 is missing and the +> syntax/limitations on systems like Informix, Sybase, +> Microsoft, et.al... +> +> Any technical info such as performance hits, disabling +> the use of indices, stuff like that would be _greatly_ +> appreciated. I have a decent understanding of this for +> Oracle, but not for any other systems. I want to get +> an idea of the work load of adding the IN, BETWEEN/AND +> and HAVING clauses. + +I have done some thinking about subselects. There are basically two +issues: + + Does the query return one row or several rows? This can be + determined by seeing if the user uses equals on 'IN' to join the + subquery. + + Is the query correlated, meaning "Does the subquery reference + values from the outer query?" + +(We already have the third type of subquery, the INSERT...SELECT query.) 
+
+So we have these four combinations:
+
+	1) one row, no correlation
+	2) multiple rows, no correlation
+	3) one row, correlated
+	4) multiple rows, correlated
+
+
+With #1, we can execute the subquery, get the value, replace the
+subquery with the constant returned from the subquery, and execute the
+outer query.
+
+With #2, we can execute the subquery and put the result into a temporary
+table. We then rewrite the outer query to access the temporary table
+and replace the subquery with the column name from the temporary table.
+We probably put an index on the temp. table, which has only one
+column, because a subquery can only return one column. We remove the
+temp. table after query execution.
+
+With #3 and #4, we potentially need to execute the subquery for every
+row returned by the outer query. Performance would be horrible for
+anything but the smallest query. Another way to handle this is to
+execute the subquery WITHOUT using any of the outer-query columns to
+restrict the WHERE clause, and add those columns used to join the outer
+variables into the target list of the subquery. So for query:
+
+	select t1.name
+	from tab t1
+	where t1.age = (select max(t2.age)
+			from tab2 t2
+			where t2.name = t1.name)
+
+Execute the subquery and put it in a temporary table:
+
+	select t2.name, max(t2.age) as age
+	into table temp999
+	from tab2 t2
+	group by t2.name
+
+	create index i_temp999 on temp999 (name)
+
+Then re-write the outer query:
+
+	select t1.name
+	from tab t1, temp999
+	where t1.age = temp999.age and
+	      t1.name = temp999.name
+
+The only problem here is that the subselect is running for all entries
+in tab2, even if the outer query is only going to need a few rows.
+Deciding whether to execute the subquery each time or create a temp.
+table is often difficult. Even some non-correlated
+subqueries are better executed for each row rather than pre-executing the
+entire subquery, especially if the outer query returns few rows.
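
A quick way to check the temp-table rewrite described above is to run both forms against a small data set. The sketch below uses Python's sqlite3 module rather than PostgreSQL, the sample rows are made up, and it assumes the temp-table step groups by name and carries no reference to t1 (the grouped result is aliased as `age` so the outer query can join on it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tab  (name text, age integer);
    create table tab2 (name text, age integer);
    insert into tab  values ('ann', 30), ('bob', 25), ('cy', 40);
    insert into tab2 values ('ann', 30), ('ann', 22), ('bob', 31), ('cy', 40);
""")

# Correlated form: the subquery is logically evaluated once per outer row.
correlated = conn.execute("""
    select t1.name
    from tab t1
    where t1.age = (select max(t2.age) from tab2 t2 where t2.name = t1.name)
    order by t1.name
""").fetchall()

# Temp-table form: run the subquery once for every name, index the result,
# then join the outer table against it.
conn.executescript("""
    create temp table temp999 as
        select t2.name as name, max(t2.age) as age
        from tab2 t2
        group by t2.name;
    create index i_temp999 on temp999 (name);
""")
rewritten = conn.execute("""
    select t1.name
    from tab t1, temp999
    where t1.age = temp999.age and t1.name = temp999.name
    order by t1.name
""").fetchall()

print(correlated, rewritten)
```

Both queries return the same names here, but the temp-table form computes a maximum for every name in tab2 whether or not the outer query needs it, which is the cost trade-off raised above.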
+ +One requirement to handle these issues is better column statistics, +which I am working on. + +------------------------------------------------------------------------------ + +Date: Thu, 5 Dec 1996 10:07:56 -0500 +From: aixssd!darrenk@abs.net (Darren King) +To: maillist@candle.pha.pa.us +Subject: Subselect info. + +> Any of them deal with implementing subselects? + +There's a white paper at the www.sybase.com that might +help a little. It's just a copy of a presentation +given by the optimizer guru there. Nothing code-wise, +but he gives a few ways of flattening them with temp +tables, etc... + +Darren + +------------------------------------------------------------------------------ + +Date: Fri, 22 Aug 1997 12:04:31 +0800 +From: "Vadim B. Mikheev" +To: Bruce Momjian +Subject: Re: subselects + +Bruce Momjian wrote: +> +> Considering the complexity of the primary/secondary changes you are +> making, I believe subselects will be easier than that. + +I don't do changes for P/F keys - just thinking... +Yes, I think that impl of referential integrity is +more complex work. + +As for subselects: + +in plannodes.h + +typedef struct Plan { +... + struct Plan *lefttree; + struct Plan *righttree; +} Plan; + +/* ---------------- + * these are are defined to avoid confusion problems with "left" + ^^^^^^^^^^^^^^^^^^ + * and "right" and "inner" and "outer". The convention is that + * the "left" plan is the "outer" plan and the "right" plan is + * the inner plan, but these make the code more readable. + * ---------------- + */ +#define innerPlan(node) (((Plan *)(node))->righttree) +#define outerPlan(node) (((Plan *)(node))->lefttree) + +First thought is avoid any confusions by re-defining + +#define rightPlan(node) (((Plan *)(node))->righttree) +#define leftPlan(node) (((Plan *)(node))->lefttree) + +and change all occurrences of 'outer' & 'inner' in code +to 'left' & 'inner' ones: + +this will allow to use 'outer' & 'inner' things for subselects +latter, without confusion. 
My hope is that we may change Executor +very easy by adding outer/inner plans/TupleSlots to +EState, CommonState, JoinState, etc and by doing node +processing in right order. + +Subselects are mostly Planner problem. + +Unfortunately, I havn't time at the moment: CHECK/DEFAULT... + +Vadim + +------------------------------------------------------------------------------ + +Date: Fri, 22 Aug 1997 12:22:37 +0800 +From: "Vadim B. Mikheev" +To: Bruce Momjian +Subject: Re: subselects + +Vadim B. Mikheev wrote: +> +> this will allow to use 'outer' & 'inner' things for subselects +> latter, without confusion. My hope is that we may change Executor + +Or may be use 'high' & 'low' for subselecs (to avoid confusion +with outter hoins). + +> very easy by adding outer/inner plans/TupleSlots to +> EState, CommonState, JoinState, etc and by doing node +> processing in right order. + ^^^^^^^^^^^^^^ +Rule is easy: +1. Uncorrelated subselect - do 'low' plan node first +2. Correlated - do left/right first + +- just some flag in structures. 
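
One reading of the ordering rule above, sketched in Python under stated assumptions: `filter_by_subselect` and its arguments are hypothetical names, rows are plain dicts, and the `correlated` argument plays the part of the flag in the plan structures.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]


def filter_by_subselect(outer_rows: List[Row],
                        column: str,
                        subselect: Callable[[Optional[Row]], int],
                        correlated: bool) -> List[Row]:
    """Keep outer rows whose `column` equals the subselect's result."""
    if not correlated:
        # Rule 1: run the 'low' plan first, once, and reuse its value.
        value = subselect(None)
        return [row for row in outer_rows if row[column] == value]
    # Rule 2: drive the left/right plan and run the 'low' plan per outer row.
    return [row for row in outer_rows if row[column] == subselect(row)]
```

With this shape the uncorrelated case costs one subselect execution in total, while the correlated case costs one per outer row.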
+
+Vadim
+
+
+---------------------------------------------------------------------------
+
+Performance Tips for Transact-SQL
+
+Slides from a presentation by Jeff Lichtman
+
+----------------------------------------------------------------------------
+
+Table of Contents
+
+Overview
+> versus >=
+Exists Versus Not Exists
+Exists Versus Not Exists II
+Correlated Subqueries with Restrictive Outer Joins
+Correlated Subqueries with Restrictive Outer Joins Example
+Correlated Subqueries with Restrictive Outer Joins III
+Correlated Subqueries with Restrictive Outer Joins IV
+Correlated Subqueries with Restrictive Outer Joins V
+Correlated Subqueries with Restrictive Outer Joins Example
+Creating Tables in Stored Procedures
+Creating Tables in Stored Procedures Example
+Variables versus Parameters in Where Clause
+Variables versus Parameters in Where Clause Example
+Count versus Exists
+Count versus Exists II
+Or versus Union
+Or versus Union Example
+MAX and MIN Aggregates
+MAX and MIN Aggregates II
+MAX and MIN Aggregates Example
+MAX and MIN Aggregates III
+Joins and Datatypes
+Joins and Datatypes Example
+Joins and Datatypes II
+Joins and Datatypes III
+Parameters and Datatypes
+Parameters and Datatypes Example
+Summary
+----------------------------------------------------------------------------
+
+Overview
+
+  * Goal Is to Learn Some Tips to Help You Improve the Performance of Your
+    Queries.
+  * Emphasis Is on Queries, Not on Schema.
+  * Many Tips Are Not Related to Query Optimizer.
+  * Tips Are Based on Actual Customer Cases Seen by SQL Server Development
+    Engineer.
+  * These Tips Are Intended As Suggestions and Guidelines, Not Absolute
+    Rules.
+  * Some of These Tips Could Become Obsolete As Sybase Improves the SQL
+    Server.
+
+----------------------------------------------------------------------------
+
+> versus >=
+
+Given the query:
+
+select * from tab where x > 3
+
+with an index on x.
This query works by using the index to find the first +value where x = 3, and scanning forward. + +Suppose there are many rows in tab where x = 3. + +In this case, the server has to scan many pages before finding the first row +where x > 3. + +It is more efficient to write the query like this: + +select * from tab where x >= 4 + +---------------------------------------------------------------------------- + +Exists Versus Not Exists + +In subqueries and IF statements, EXISTS and IN are faster than NOT EXISTS +and NOT IN. + +With IF statements, one can easily avoid NOT EXISTS: + +if not exists (select * from ...) +begin /* Statement group 1 */ +... +end else begin /* Statement group 2 */ +... +end + +can be re-written as: + +if exists (select * from ...) +begin /* Statement group 2 */ +... +end else begin /* Statement group 1 */ +... +end + +---------------------------------------------------------------------------- + +Exists versus Not Exists (cont.) + +Even without an ELSE clause, it is possible to avoid + +NOT EXISTS in IF statements : + +if not exists (select * from ...) +begin + /* Statement group */ + ... +end +... + +can be re-written as: + +if exists (select * from ...) +begin + goto exists_label +end +/* Statement group */ +... +exists_label: +... 
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins
+
+  * SQL Server Processes Subqueries "Inside-Out"
+  * For Correlated Subqueries, It Creates a Worktable Containing Subquery
+    Results
+  * The Worktable Is Grouped on the Correlation Columns
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins
+
+For example:
+
+select w from outer where x =
+	(select sum(a) from inner
+	where inner.b = outer.z)
+
+becomes:
+
+select outer.z, summ = sum(inner.a)
+into #work
+from outer, inner
+where inner.b = outer.z
+group by outer.z
+select outer.w
+from outer, #work
+where outer.z = #work.z
+and outer.x = #work.summ
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins (cont.)
+
+The SQL Server copies search clauses from the outer query to the subquery to
+improve performance:
+
+select w from outer
+where y = 1
+and x = (select sum(a)
+	from inner
+	where inner.b = outer.z)
+
+becomes:
+
+select outer.z, summ = sum(inner.a)
+into #work
+from outer, inner
+where inner.b = outer.z and outer.y = 1
+group by outer.z
+select outer.w
+from outer, #work
+where outer.z = #work.z and outer.y = 1 and outer.x = #work.summ
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins (cont.)
+
+  * The SQL Server Does Not Copy Join Clauses Into Correlated Subqueries As
+    It Does With Search Clauses.
+  * Copying Search Clauses Will Always Make the Query Run Faster, but
+    Copying a Join Clause Might Make It Run Slower.
+  * Copying the Join Clause Is Beneficial Only If the Join Clause Is Very
+    Restrictive.
+  * Only the Query Optimizer Knows Whether a Join Clause Is Restrictive,
+    but the SQL Server Breaks the Query Into Steps Before Optimization.
+  * Since You Know Your Data, You Can Copy Join Clauses Into Subqueries
+    When You Know It Will Help.
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins (cont.)
+
+An example of when to copy join clause:
+
+select *
+from huge_tab, single_row_tab
+where huge_tab.unique_column = single_row_tab.a
+and huge_tab.b = (select sum(c)
+	from inner
+	where huge_tab.d = inner.e)
+
+should be re-written as:
+
+select *
+from huge_tab, single_row_tab
+where huge_tab.unique_column = single_row_tab.a
+and huge_tab.b = (select sum(c)
+	from inner
+	where huge_tab.d = inner.e
+	and huge_tab.unique_column = single_row_tab.a)
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins (cont.)
+
+An example of when not to copy join clause:
+
+select *
+from huge_tab, single_row_tab
+where huge_tab.many_duplicates_in_column = single_row_tab.a and
+single_row_tab.b = (select sum(c)
+	from inner
+	where single_row_tab.d = inner.e)
+
+Should not be re-written as:
+
+select *
+from huge_tab, single_row_tab
+where huge_tab.many_duplicates_in_column = single_row_tab.a and
+single_row_tab.b = (select sum(c)
+	from inner
+	where single_row_tab.d = inner.e
+	and huge_tab.many_duplicates_in_column = single_row_tab.a)
+
+----------------------------------------------------------------------------
+
+Creating Tables in Stored Procedures
+
+  * When You Create a Table in the Same Stored Procedure Where It Is Used,
+    the Query Optimizer Cannot Know How Big the Table Is.
+  * The Optimizer Assumes That Any Such Table Has 10 Data Pages and 100
+    Rows.
+  * If the Table Is Really Big, This Assumption Can Lead the Optimizer to
+    Choose a Sub-Optimal Query Plan.
+  * In Cases Like This, It Is Better to Create the Table Outside the
+    Procedure, Which Allows the Optimizer to See How Large the Table Is.
+ +---------------------------------------------------------------------------- + +Creating Tables in Stored Procedures (cont) + +For example: + +create proc p as + select * into #huge_result from ... + select * from tab, #huge_result where + ... + +can be re-written as: + +create proc p as + select * into #huge_result from ... + exec s +create proc s as + select * from tab, #huge_result where + ... + +---------------------------------------------------------------------------- + +Variables versus Parameters in Where Clause + + * The Query Optimizer Cannot Predict the Value of a Declared Variable. + * The Query Does Know the Value of a Parameter to a Stored Procedure at + Compile Time. + * Knowing the Values in the WHERE Clause of a Query Can Help the + Optimizer Make Better Choices. + * To Avoid Putting Variables Into WHERE Clauses, One Can Split up Stored + Procedures. + +---------------------------------------------------------------------------- + +Variables versus Parameters in Where Clause (cont) + +For example: + +create procedure p as + declare @x int + select @x = col from tab where ... + select * from tab2 where col2 = @x + +can be re-written as: + +create procedure p as + declare @x int + select @x = col from tab where ... + exec s @x +create procedure s @x int as + select * from tab2 where col2 = @x + +---------------------------------------------------------------------------- + +Count versus Exists + +It is possible to use the COUNT aggregate in a subquery to do an existence +check: + +select * from tab where 0 < + (select count(*) from tab2 where ...) + +It is possible to write this same query using EXISTS (or IN): + +select * from tab where exists + (select * from tab2 where ...) + +---------------------------------------------------------------------------- + +Count versus Exists (cont) + + * Using COUNT to Do an Existence Check Is Slower Than Using EXISTS. 
+ * When You Use COUNT, the SQL Server Does Not Know That You Are Doing an + Existence Check. It Counts All of the Matching Values. + * When You Use EXISTS, the SQL Server Knows You Are Doing an Existence + Check, So It Stops Looking When It Finds the First Matching Value. + * The Same Applies to Using COUNT Instead of IN or ANY. + +---------------------------------------------------------------------------- + +Or versus Union + + * The SQL Server Cannot Optimize Join Clauses That Are Linked With OR. + * The SQL Server Can Optimize Selects That Are Linked With UNION. + * The Result of OR Is Somewhat Like the Result of UNION, Except For the + Treatment of Duplicate Rows and Empty Tables. + +---------------------------------------------------------------------------- + +Or versus Union (cont) + +For example: + +select * from tab1, tab2 +where tab1.a = tab2.b +or tab1.x = tab2.y + +can be re-written as: + +select * from tab1, tab2 +where tab1.a = tab2.b +union all +select * from tab1, tab2 +where tab1.x = tab2.y + +You can use UNION instead of UNION ALL if you want to eliminate duplicates, +but this will eliminate all duplicates. It may not be possible to get +exactly the same set of duplicates from the re-written query. +---------------------------------------------------------------------------- + +MAX and MIN Aggregates + + * The SQL Server Uses Special Optimizations for the MAX and MIN + Aggregates When There Is an Index on the Aggregated Column. + * For MIN, It Stops the Scan on the First Qualifying Row. + * For MAX, It Goes Directly to the End of the Index to Find the Last Row. + * The Optimization Is Not Applied If: + o The Expression Inside the MAX or MIN Is Anything but a Column + o The Column Inside the MAX or MIN Is Not the First Column of an + Index + o There Is Another Aggregate in the Query + o There Is a GROUP BY Clause + * In Addition, the MAX Optimization Is Not Applied If There Is a WHERE + Clause. 
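The MIN and MAX shortcuts listed above can be pictured with a small C sketch. It is only an illustration under stated assumptions: a sorted array stands in for the index on the aggregated column, and the names index_min, index_max, and scan_min_max are hypothetical, not anything taken from SQL Server. Each function also reports how many entries it touched, which is the cost the tips are concerned with.

```c
#include <stddef.h>

/* A sorted array stands in for an index on the aggregated column.
 * This is an assumption made for illustration; n must be at least 1. */

/* MIN: the first index entry is the answer, so one entry is read. */
int index_min(const int *index, size_t n, size_t *visited)
{
    (void)n;                /* unused: the first entry is enough */
    *visited = 1;
    return index[0];
}

/* MAX: go directly to the last index entry. */
int index_max(const int *index, size_t n, size_t *visited)
{
    *visited = 1;
    return index[n - 1];
}

/* Both aggregates in one query, modeled here as a scan of every row. */
void scan_min_max(const int *rows, size_t n,
                  int *min, int *max, size_t *visited)
{
    size_t i;

    *min = rows[0];
    *max = rows[0];
    for (i = 1; i < n; i++) {
        if (rows[i] < *min)
            *min = rows[i];
        if (rows[i] > *max)
            *max = rows[i];
    }
    *visited = n;
}
```

Calling index_min and index_max separately touches two entries in total, while scan_min_max touches all n. That is the trade-off behind splitting the two aggregates into separate queries.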
+ +---------------------------------------------------------------------------- + +MAX and MIN Aggregates (cont) + +If you have an optimizable MAX or MIN aggregate, it can pay to put it in a +query separate from other aggregates. For example: + +select max(x), min(x) from tab + +will result in a full scan of tab, even if there is an index on x. The query +can be re-written as: + +select max(x) from tab +select min(x) from tab + +This can result in using the index twice, rather than scanning the entire +table once. +---------------------------------------------------------------------------- + +MAX and MIN Aggregates (cont) + +The MIN optimization can backfire if the where clause is highly selective. +For example: + +select min(index_col) +from tab +where + col_in_other_index = "value only at end of first index" + +The MIN optimization will result in a nearly complete scan of the entire +index. + +This is counter-intuitive. The more selective the WHERE clause, the slower +the query. +---------------------------------------------------------------------------- + +MAX and MIN Aggregates (cont) + +In cases like this, it can pay to disable the MIN optimization by combining +it with another aggregate: + +select min(index_col), max(index_col) +from tab +where +col_in_other_index = "value only at end of first index" + +This convinces the optimizer not to use the MIN optimization, so it chooses +the next best plan, which might be the other index. +---------------------------------------------------------------------------- + +Joins and Datatypes + + * When Joining Between Two Columns of Different Datatypes, One of the + Columns Must Be Converted to the Type of the Other. + * The Commands Reference Manual Shows the Hierarchy of Types. + * The Column Whose Type Is Lower in the Hierarchy Is the One That Is + Converted. + * The Query Optimizer Cannot Choose an Index on the Column That Is + Converted.
+ +---------------------------------------------------------------------------- + +Joins and Datatypes (cont) + +For example: + +select * +from tab1, tab2 +where tab1.float_column = tab2.int_column + +In this case, no index on tab2.int_column can be used, because int is lower +in the hierarchy than float. + +Note that CHAR NULL is really VARCHAR, and BINARY NULL is really VARBINARY. + +Joining CHAR NOT NULL with CHAR NULL involves a conversion (BINARY too). +---------------------------------------------------------------------------- + +Joins and Datatypes (cont) + +It's best to avoid datatype problems in joins by designing the schema +accordingly. + +If a join between different datatypes is unavoidable, and it hurts +performance, you can force the conversion to be on the other side of the +join. + +For example: + +select * +from tab1, tab2 +where tab1.char_column = convert(char(75),tab2.varchar_column) + +---------------------------------------------------------------------------- + +Joins and Datatypes (cont) + +Be careful! This tactic can change the meaning of the query. + +For example: + +select * +from tab1, tab2 +where tab1.int_column = convert(int, tab2.float_column) + +This will not return the same results as the join without the convert. It +can be salvaged by adding: + +and tab2.float_column = convert(int, tab2.float_column) + +This assumes that all values in tab2.float_column can be converted to int. +---------------------------------------------------------------------------- + +Parameters and Datatypes + + * The Query Optimizer Can Use the Values of Parameters to Stored + Procedures to Help Determine Costs. + * If a Parameter Is Not of the Same Type As the Column in The WHERE + Clause That It Is Being Compared to, the Server Has to Convert the + Parameter. + * The Optimizer Cannot Use the Value of a Converted Parameter. + * It Pays to Make Sure That Parameters Have the Same Type As the Columns + They Are Compared To. 
+ +---------------------------------------------------------------------------- + +Parameters and Datatypes (cont) + +For example: + +create proc p @x varchar(30) as +select * from tab where char_column = @x + +may get a poorer query plan than: + +create proc p @x char(30) as +select * from tab where char_column = @x + +Remember that CHAR NULL is really VARCHAR, and BINARY NULL is really +VARBINARY. +---------------------------------------------------------------------------- + +Summary + + * How you write your queries can make a big difference in performance. + * Two different queries that do the same thing may perform differently. + * There are few absolutes to improving performance, but the tips given + here can help. + * These tips are not all there is to know about performance. + +About the Author + +Jeff Lichtman has worked at Sybase since 1987. In 1994, he was given the new +position of architect of query processing for SQL Server. He is informally +known as Sybase's optimizer guru. + +For more info send email to webmaster@sybase.com + +Copyright 1995 © Sybase, Inc. All Rights Reserved. + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Sun Jan 11 23:49:44 1998 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA19252 + for ; Sun, 11 Jan 1998 23:49:02 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA08095; + Mon, 12 Jan 1998 12:09:24 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B9A580.55DD4645@sable.krasnoyarsk.su> +Date: Mon, 12 Jan 1998 12:09:20 +0700 +From: "Vadim B. 
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu +Subject: Re: [HACKERS] Re: subselects +References: <199801110559.AAA11801@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> We need a new Node structure, call it Sublink: +> +> int linkType (IN, NOTIN, ANY, EXISTS, OPERATOR...) +> Oid operator /* subquery must return single row */ +> List *lefthand; /* parent stuff */ +> Node *subquery; /* represents nodes from parser */ +> Index Subindex; /* filled in to index Query->subqueries */ + +Ok, I agreed that it's better to have new node and don't put subquery stuff +into Expr node. + +int linkType + is one of EXISTS, ANY, ALL, EXPR. EXPR is for the case of expression + subqueries (following Sybase naming) which must return single row - + (a, b, c) = (subquery). + Note again, that there are no linkType for IN and NOTIN here. + User' IN and NOT IN must be converted to = ANY and <> ALL by parser. + +We need not in Oid operator! In all cases we need in + +List *oper + list of Oper nodes for each of a, b, c, ... and operator (=, ...) + corresponding to data type of a, b, c, ... + +List *lefthand + is list of Var/Const nodes - representation of (a, b, c, ...) + +What is Node *subquery ? +In optimizer we need either in Subindex (to get subquery from Query->subqueries +when beeing in Sublink) or in Node *subquery inside Sublink itself. +BTW, after some thought I don't see how Query->subqueries will be usefull. +So, may be just add bool hassubqueries to Query (and Query *parentQuery) +and use Query *subquery in Sublink, but not subindex ? + +> +> Also, when parsing the subqueries, we need to keep track of correlated +> references. 
I recommend we add a field to the Var structure: +> +> Index sublevel; /* range table reference: +> = 0 current level of query +> < 0 parent above this many levels +> > 0 index into subquery list +> */ +> +> This way, a Var node with sublevel 0 is the current level, and is true +> in most cases. This helps us not have to change much code. sublevel = +> -1 means it references the range table in the parent query. sublevel = +> -2 means the parent's parent. sublevel = 2 means it references the range +> table of the second entry in Query->subqueries. Varno and varattno are +> still meaningful. Of course, we can't reference variables in the +> subqueries from the parent in the parser code, but Vadim may want to. + ^^^^^^^^^^^^^^^^^ +No. So, just use sublevel >= 0: 0 - current level, 1 - one level up, ... +sublevel is for optimizer only - executor will not use it. + +> +> When doing a Var lookup in the parser, we look in the current level +> first, but if not found, if it is a subquery, we can look at the parent +> and parent's parent to set the sublevel, varno, and varatno properly. +> +> We create no phantom range table entries in the subquery, and no phantom +> target list entries. We can leave that all for the upper optimizer. + +Ok. 
+ +Vadim + +From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:41 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00786 + for ; Mon, 12 Jan 1998 08:06:39 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12270 for ; Mon, 12 Jan 1998 04:16:10 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08460; + Mon, 12 Jan 1998 16:34:54 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su> +Date: Mon, 12 Jan 1998 16:34:45 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: "Thomas G. Lockhart" +CC: Bruce Momjian , hackers@postgreSQL.org +Subject: Re: [HACKERS] Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Thomas G. Lockhart wrote: +> +> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called +> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions. +> If lists are handled farther back, this routine should move to there also and the +> parser will just pass the lists. Note that some assumptions have to be made about the +> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of +> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK +> to disallow those cases or to look for specific appearance of the operator to guess +> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if +> it has "<>" or "!" then build as "or"s. 
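The AND/OR rule described in the quoted paragraph above can be sketched in a few lines of C. This is a hedged illustration, not the makeRowExpr() routine itself: the function names row_equal and row_not_equal are hypothetical, the columns are plain ints, and NULL handling is ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* (a1, a2, ...) = (b1, b2, ...): every pair must match,
 * so the per-column comparisons are joined with AND. */
bool row_equal(const int *lhs, const int *rhs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (lhs[i] != rhs[i])
            return false;
    return true;
}

/* (a1, a2, ...) <> (b1, b2, ...): one differing pair is enough,
 * so the per-column comparisons are joined with OR. */
bool row_not_equal(const int *lhs, const int *rhs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (lhs[i] != rhs[i])
            return true;
    return false;
}
```

Under these assumptions row_not_equal is the exact negation of row_equal, which is why "<>" has to be built from ORs while "=" is built from ANDs.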
+ +Oh, god! I never thought about this! +Ok, I have to agree: + +1. Only <, <=, =, >, >=, <> is allowed with subselects +2. Use OR's for <>, and so - we need in bool useor in SubLink + for <>, <> ANY and <> ALL: + +typedef struct SubLink { + NodeTag type; + int linkType; /* EXISTS, ALL, ANY, EXPR */ + bool useor; /* TRUE for <> */ + List *lefthand; /* List of Var/Const nodes on the left */ + List *oper; /* List of Oper nodes */ + Query *subquery; /* */ +} SubLink; + +Vadim + +From owner-pgsql-hackers@hub.org Mon Jan 12 08:06:53 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00814 + for ; Mon, 12 Jan 1998 08:06:51 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12449 for ; Mon, 12 Jan 1998 04:26:03 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id EAA01671; Mon, 12 Jan 1998 04:17:59 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 12 Jan 1998 04:17:29 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id EAA01651 for pgsql-hackers-outgoing; Mon, 12 Jan 1998 04:17:23 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id EAA01633 for ; Mon, 12 Jan 1998 04:16:44 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08460; + Mon, 12 Jan 1998 16:34:54 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Message-ID: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su> +Date: Mon, 12 Jan 1998 16:34:45 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: "Thomas G. 
Lockhart" +CC: Bruce Momjian , hackers@postgreSQL.org +Subject: Re: [HACKERS] Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Thomas G. Lockhart wrote: +> +> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called +> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions. +> If lists are handled farther back, this routine should move to there also and the +> parser will just pass the lists. Note that some assumptions have to be made about the +> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of +> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK +> to disallow those cases or to look for specific appearance of the operator to guess +> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if +> it has "<>" or "!" then build as "or"s. + +Oh, god! I never thought about this! +Ok, I have to agree: + +1. Only <, <=, =, >, >=, <> is allowed with subselects +2. 
Use OR's for <>, and so - we need in bool useor in SubLink + for <>, <> ANY and <> ALL: + +typedef struct SubLink { + NodeTag type; + int linkType; /* EXISTS, ALL, ANY, EXPR */ + bool useor; /* TRUE for <> */ + List *lefthand; /* List of Var/Const nodes on the left */ + List *oper; /* List of Oper nodes */ + Query *subquery; /* */ +} SubLink; + +Vadim + + +From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:38 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00783 + for ; Mon, 12 Jan 1998 08:06:36 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12377 for ; Mon, 12 Jan 1998 04:21:55 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08470; + Mon, 12 Jan 1998 16:40:49 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34B9E520.4C0EA6BC@sable.krasnoyarsk.su> +Date: Mon, 12 Jan 1998 16:40:48 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: "Thomas G. Lockhart" +CC: Bruce Momjian , hackers@postgreSQL.org +Subject: Re: [HACKERS] Re: subselects +References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Thomas G. Lockhart wrote: +> +> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called +> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions. +> If lists are handled farther back, this routine should move to there also and the +> parser will just pass the lists. 
Note that some assumptions have to be made about the +> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of +> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK +> to disallow those cases or to look for specific appearance of the operator to guess +> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if +> it has "<>" or "!" then build as "or"s. + +Sorry, I forgot something: is (a, b) OP (x, y) in standard ? +If not then I suggest to don't implement it at all and allow +(a, b) OP [ANY|ALL] (subselect) only. + +Vadim + +From vadim@sable.krasnoyarsk.su Tue Jan 13 09:30:58 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA28551 + for ; Tue, 13 Jan 1998 09:30:56 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA26483 for ; Tue, 13 Jan 1998 09:21:36 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id VAA04356; + Tue, 13 Jan 1998 21:20:31 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34BB7829.2B18D4B5@sable.krasnoyarsk.su> +Date: Tue, 13 Jan 1998 21:20:25 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu +Subject: Re: [HACKERS] Re: subselects +References: <199801121424.JAA02440@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Ok. I don't see how Query->subqueries could me help, but I foresee +that Query->sublinks can do it. Could you add this ? + +Bruce Momjian wrote: +> +> > +> > What is Node *subquery ? 
+> > In optimizer we need either in Subindex (to get subquery from Query->subqueries +> > when beeing in Sublink) or in Node *subquery inside Sublink itself. +> > BTW, after some thought I don't see how Query->subqueries will be usefull. +> > So, may be just add bool hassubqueries to Query (and Query *parentQuery) +> > and use Query *subquery in Sublink, but not subindex ? +> +> OK, I originally created it because the parser would have trouble +> filling in a List* field in SelectStmt while it was parsing a WHERE +> clause. I decided to just stick the SelectStmt* into Sublink->subquery. +> +> While we are going through the parse output to fill in the Query*, I +> thought we should move the actual subquery parse output to a separate +> place, and once the Query* was completed, spin through the saved +> subquery parse list and stuff Query->subqueries with a list of Query* +> for the subqueries. I thought this would be easier, because we would +> then have all the subqueries in a nice list that we can manage easier. +> +> In fact, we can fill Query->subqueries with SelectStmt* as we process +> the WHERE clause, then convert them to Query* at the end. +> +> If you would rather keep the subquery Query* entries in the Sublink +> structure, we can do that. The only issue I see is that when you want +> to get to them, you have to wade through the WHERE clause to find them. +> For example, we will have to run the subquery Query* through the rewrite +> system. Right now, for UNION, I have a nice union List* in Query, and I +> just spin through it in postgres.c for each Union query. If we keep the +> subquery Query* inside Sublink, we have to have some logic to go through +> and find them. +> +> If we just have an Index in Sublink to the Query->subqueries, we can use +> the nth() macro to find them quite easily. +> +> But it is up to you. 
I really don't know how you are going to handle +> things like: +> +> select * +> from taba +> where x = 3 and y = 5 and (z=6 or q in (select g from tabb )) + +No problems. + +> +> My logic was to break the problem down to single queries as much as +> possible, so we would be breaking the problem up into pieces. Whatever +> is easier for you. + +Vadim + +From owner-pgsql-hackers@hub.org Tue Jan 13 10:32:35 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA29523 + for ; Tue, 13 Jan 1998 10:32:33 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA03743; Tue, 13 Jan 1998 10:32:13 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 13 Jan 1998 10:31:57 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA03708 for pgsql-hackers-outgoing; Tue, 13 Jan 1998 10:31:51 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA03628 for ; Tue, 13 Jan 1998 10:31:20 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id JAA28747; + Tue, 13 Jan 1998 09:48:00 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801131448.JAA28747@candle.pha.pa.us> +Subject: Re: [HACKERS] Re: subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Tue, 13 Jan 1998 09:48:00 -0500 (EST) +Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu +In-Reply-To: <34BB7829.2B18D4B5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 13, 98 09:20:25 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Ok. I don't see how Query->subqueries could me help, but I foresee +> that Query->sublinks can do it. Could you add this ? 
+ +OK, so instead of moving the query out of the SubLink structure, you +want the Query* in the Sublink structure, and a List* of SubLink +pointers in the query structure? + + Query + { + ... + List *sublink; /* list of pointers to Sublinks + ... + } + +I can do that. Let me know. +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Tue Jan 13 22:23:46 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA08806 + for ; Tue, 13 Jan 1998 22:23:45 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA11486 for ; Tue, 13 Jan 1998 22:09:55 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id KAA05660; + Wed, 14 Jan 1998 10:09:07 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34BC2C4E.83E92D82@sable.krasnoyarsk.su> +Date: Wed, 14 Jan 1998 10:09:02 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu +Subject: Re: [HACKERS] Re: subselects +References: <199801131448.JAA28747@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > +> > Ok. I don't see how Query->subqueries could me help, but I foresee +> > that Query->sublinks can do it. Could you add this ? +> +> OK, so instead of moving the query out of the SubLink structure, you +> want the Query* in the Sublink structure, and a List* of SubLink +> pointers in the query structure? + +Yes. + +> +> Query +> { +> ... +> List *sublink; /* list of pointers to Sublinks +> ... +> } +> +> I can do that. Let me know. + +Thanks! + +Are there any opened issues ? 
+ +Vadim + +From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:40 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21676 + for ; Thu, 15 Jan 1998 19:00:39 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23948 for ; Thu, 15 Jan 1998 18:35:59 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27814; Thu, 15 Jan 1998 18:32:40 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:32:20 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27668 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:32:08 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27425 for ; Thu, 15 Jan 1998 18:31:32 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id SAA12920; + Thu, 15 Jan 1998 18:18:32 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801152318.SAA12920@candle.pha.pa.us> +Subject: Re: [HACKERS] Re: subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Thu, 15 Jan 1998 18:18:31 -0500 (EST) +Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu +In-Reply-To: <34BC2C4E.83E92D82@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 14, 98 10:09:02 am +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> +> Bruce Momjian wrote: +> > +> > > +> > > Ok. I don't see how Query->subqueries could me help, but I foresee +> > > that Query->sublinks can do it. Could you add this ? 
+> > +> > OK, so instead of moving the query out of the SubLink structure, you +> > want the Query* in the Sublink structure, and a List* of SubLink +> > pointers in the query structure? +> +> Yes. +> +> > +> > Query +> > { +> > ... +> > List *sublink; /* list of pointers to Sublinks +> > ... +> > } +> > +> > I can do that. Let me know. +> +> Thanks! +> +> Are there any opened issues ? + +OK, what do you need me to do. Do you want me to create the Sublink +support stuff, fill them in in the parser, and pass them through the +rewrite section and into the optimizer. I will prepare a list of +changes. + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:38 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21663 + for ; Thu, 15 Jan 1998 19:00:36 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23925 for ; Thu, 15 Jan 1998 18:35:42 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27796; Thu, 15 Jan 1998 18:32:37 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:31:52 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27463 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:31:37 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27167 for ; Thu, 15 Jan 1998 18:31:06 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id SAA26747; + Thu, 15 Jan 1998 18:26:42 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801152326.SAA26747@candle.pha.pa.us> +Subject: Re: [HACKERS] Re: subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. 
Mikheev) +Date: Thu, 15 Jan 1998 18:26:41 -0500 (EST) +Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org +In-Reply-To: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 12, 98 04:34:45 pm +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> typedef struct SubLink { +> NodeTag type; +> int linkType; /* EXISTS, ALL, ANY, EXPR */ +> bool useor; /* TRUE for <> */ +> List *lefthand; /* List of Var/Const nodes on the left */ +> List *oper; /* List of Oper nodes */ +> Query *subquery; /* */ +> } SubLink; + +OK, we add this structure above. During parsing, *subquery actually +will hold Node *parsetree, not Query *. + +And add to Query: + + bool hasSubLinks; + +Also need a function to return a List* of SubLink*. I just did a +similar thing with Aggreg*. And Var gets: + + int uplevels; + +Is that it? + + +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From owner-pgsql-hackers@hub.org Fri Jan 16 04:36:05 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09604 + for ; Fri, 16 Jan 1998 04:36:03 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id EAA07040; Fri, 16 Jan 1998 04:35:27 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 16 Jan 1998 04:35:18 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id EAA06936 for pgsql-hackers-outgoing; Fri, 16 Jan 1998 04:35:13 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id EAA06823 for ; Fri, 16 Jan 1998 04:34:22 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10384; + Fri, 16 Jan 1998 16:34:15 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) 
+Message-ID: <34BF2997.97B40172@sable.krasnoyarsk.su> +Date: Fri, 16 Jan 1998 16:34:15 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: lockhart@alumni.caltech.edu, hackers@postgreSQL.org +Subject: Re: [HACKERS] Re: subselects +References: <199801152326.SAA26747@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce Momjian wrote: +> +> > typedef struct SubLink { +> > NodeTag type; +> > int linkType; /* EXISTS, ALL, ANY, EXPR */ +> > bool useor; /* TRUE for <> */ +> > List *lefthand; /* List of Var/Const nodes on the left */ +> > List *oper; /* List of Oper nodes */ +> > Query *subquery; /* */ +> > } SubLink; +> +> OK, we add this structure above. During parsing, *subquery actually +> will hold Node *parsetree, not Query *. + ^^^^^^^^^^^^^^^ +But optimizer will get node Query here, yes ? + +> +> And add to Query: +> +> bool hasSubLinks; +> +> Also need a function to return a List* of SubLink*. I just did a +> similar thing with Aggreg*. And Var gets: +> +> int uplevels; +> +> Is that it? + +Yes. + +Vadim + + +From vadim@sable.krasnoyarsk.su Fri Jan 16 04:36:21 1998 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09607 + for ; Fri, 16 Jan 1998 04:36:06 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10396; + Fri, 16 Jan 1998 16:37:21 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34BF2A50.A357A16D@sable.krasnoyarsk.su> +Date: Fri, 16 Jan 1998 16:37:20 +0700 +From: "Vadim B. 
Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu +Subject: Re: [HACKERS] Re: subselects +References: <199801152318.SAA12920@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > +> > Are there any opened issues ? +> +> OK, what do you need me to do. Do you want me to create the Sublink +> support stuff, fill them in in the parser, and pass them through the +> rewrite section and into the optimizer. I will prepare a list of +> changes. + +Please do this. I'm ready to start coding of things in optimizer. + +Vadim + +From vadim@sable.krasnoyarsk.su Sun Jan 18 07:32:52 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA14786 + for ; Sun, 18 Jan 1998 07:32:51 -0500 (EST) +Received: from www.krasnet.ru ([193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA29385 for ; Sun, 18 Jan 1998 07:25:55 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA15780; + Sun, 18 Jan 1998 19:27:14 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34C1F51D.E9CF0A39@sable.krasnoyarsk.su> +Date: Sun, 18 Jan 1998 19:27:09 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: "Thomas G. Lockhart" +CC: Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] subselects coding started +References: <199801170500.AAA12837@candle.pha.pa.us> <34C044D5.C21FE707@alumni.caltech.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Thomas G. 
Lockhart wrote: +> +> Bruce Momjian wrote: +> +> > OK, I have created the SubLink structure with supporting routines, and +> > have added code to create the SubLink structures in the parser, and have +> > added Query->hasSubLink. +> > +> > I changed gram.y to support: +> > +> > (x,y,z) OP (subselect) +> > +> > where OP is any operator. Is that right, or are we doing only certain +> > ones, and of so, do we limit it in the parser? +> +> Seems like we would want to pass most operators and expressions through +> gram.y, and then call elog() in either the transformation or in the +> optimizer if it is an operator which can't be supported. + +Not in optimizer, in parser, please. +Remember that for <> SubLink->useor must be TRUE and this is parser work +(optimizer don't know about "=", "<>", etc but only about Oper nodes). + +IN ("=" ANY) and NOT IN ("<>" ALL) transformations are also parser work. + +Vadim + +From owner-pgsql-hackers@hub.org Sun Jan 18 21:08:59 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA00825 + for ; Sun, 18 Jan 1998 21:08:57 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA25254 for ; Sun, 18 Jan 1998 19:18:24 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA06912; Sun, 18 Jan 1998 19:17:01 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 18 Jan 1998 19:11:05 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA06322 for pgsql-hackers-outgoing; Sun, 18 Jan 1998 19:11:01 -0500 (EST) +Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA06144 for ; Sun, 18 Jan 1998 19:10:31 -0500 (EST) +Received: from www.krasnet.ru ([193.125.44.86]) + by clio.trends.ca (8.8.8/8.8.8) with ESMTP id HAA12383 + for ; Sun, 18 Jan 1998 07:28:38 -0500 (EST) 
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA15780; + Sun, 18 Jan 1998 19:27:14 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Message-ID: <34C1F51D.E9CF0A39@sable.krasnoyarsk.su> +Date: Sun, 18 Jan 1998 19:27:09 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: "Thomas G. Lockhart" +CC: Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] subselects coding started +References: <199801170500.AAA12837@candle.pha.pa.us> <34C044D5.C21FE707@alumni.caltech.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Thomas G. Lockhart wrote: +> +> Bruce Momjian wrote: +> +> > OK, I have created the SubLink structure with supporting routines, and +> > have added code to create the SubLink structures in the parser, and have +> > added Query->hasSubLink. +> > +> > I changed gram.y to support: +> > +> > (x,y,z) OP (subselect) +> > +> > where OP is any operator. Is that right, or are we doing only certain +> > ones, and of so, do we limit it in the parser? +> +> Seems like we would want to pass most operators and expressions through +> gram.y, and then call elog() in either the transformation or in the +> optimizer if it is an operator which can't be supported. + +Not in optimizer, in parser, please. +Remember that for <> SubLink->useor must be TRUE and this is parser work +(optimizer don't know about "=", "<>", etc but only about Oper nodes). + +IN ("=" ANY) and NOT IN ("<>" ALL) transformations are also parser work. 
+ +Vadim + + +From vadim@sable.krasnoyarsk.su Sun Jan 18 23:59:08 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA10497 + for ; Sun, 18 Jan 1998 23:59:07 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA06941 for ; Sun, 18 Jan 1998 23:44:32 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id LAA16745 + for ; Mon, 19 Jan 1998 11:46:28 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34C2DAA3.78E54042@sable.krasnoyarsk.su> +Date: Mon, 19 Jan 1998 11:46:27 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +Subject: Re: SubLink->oper +References: <199801190419.XAA04367@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> In SubLink->oper, do you want the oid of the pg_operator, or the oid of +> the pg_proc assigned to the operator? +> +> Currently, I am giving you the oid of pg_operator. + +No! I need in Oper nodes here. For "normal" operators parser +returns Expr node with opType = OP_EXPR and corresponding Oper +in Node *oper. Near the same for SubLink: I need in Oper node +for each pair of Var/Const from the left side and target entry from +the subquery. 
+ +Vadim + +From owner-pgsql-hackers@hub.org Mon Jan 19 01:02:23 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24036 + for ; Mon, 19 Jan 1998 01:02:21 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA13913; Mon, 19 Jan 1998 01:02:16 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 19 Jan 1998 01:01:41 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA13824 for pgsql-hackers-outgoing; Mon, 19 Jan 1998 01:01:34 -0500 (EST) +Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA13699 for ; Mon, 19 Jan 1998 01:00:59 -0500 (EST) +Received: (from maillist@localhost) + by candle.pha.pa.us (8.8.5/8.8.5) id AAA23866; + Mon, 19 Jan 1998 00:54:49 -0500 (EST) +From: Bruce Momjian +Message-Id: <199801190554.AAA23866@candle.pha.pa.us> +Subject: [HACKERS] subselects +To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) +Date: Mon, 19 Jan 1998 00:54:49 -0500 (EST) +Cc: hackers@postgreSQL.org (PostgreSQL-development) +X-Mailer: ELM [version 2.4 PL25] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + + +OK, I have added code to allow the SubLinks make it to the optimizer. + +I implemented ParseState->parentParseState, but not parentQuery, because +the parentParseState is much more valuable to me, and Vadim thought it +might be useful, but was not positive. Also, keeping that parentQuery +pointer valid through rewrite may be difficult, so I dropped it. +ParseState is only valid in the parser. + +I have not done: + + correlated subquery column references + added Var->sublevels_up + gotten this to work in the rewrite system + have not added full CopyNode support + +I will address these in the next few days. 
+ +-- +Bruce Momjian +maillist@candle.pha.pa.us + + +From vadim@sable.krasnoyarsk.su Mon Jan 19 01:32:54 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24335 + for ; Mon, 19 Jan 1998 01:32:52 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA10610 for ; Mon, 19 Jan 1998 01:23:02 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16879 + for ; Mon, 19 Jan 1998 13:25:28 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34C2F1D2.9CD191CC@sable.krasnoyarsk.su> +Date: Mon, 19 Jan 1998 13:25:22 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +Subject: Re: SubLink->oper +References: <199801190500.AAA10576@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> > +> > Bruce Momjian wrote: +> > > +> > > In SubLink->oper, do you want the oid of the pg_operator, or the oid of +> > > the pg_proc assigned to the operator? +> > > +> > > Currently, I am giving you the oid of pg_operator. +> > +> > No! I need in Oper nodes here. For "normal" operators parser +> > returns Expr node with opType = OP_EXPR and corresponding Oper +> > in Node *oper. Near the same for SubLink: I need in Oper node +> > for each pair of Var/Const from the left side and target entry from +> > the subquery. +> > +> > Vadim +> > +> +> OK, can I give you an Oper* for each field. + +Nice! 
But what's this: + +typedef struct SubLink +{ +struct Query; +^^^^^^^^^^^^^ + NodeTag type; + +Vadim + +From vadim@sable.krasnoyarsk.su Mon Jan 19 01:34:39 1998 +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24346 + for ; Mon, 19 Jan 1998 01:34:33 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16904; + Mon, 19 Jan 1998 13:37:42 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Sender: root@www.krasnet.ru +Message-ID: <34C2F4B4.7BBA1DB2@sable.krasnoyarsk.su> +Date: Mon, 19 Jan 1998 13:37:41 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: Re: subselects +References: <199801190554.AAA23866@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: +> +> OK, I have added code to allow the SubLinks make it to the optimizer. +> +> I implemented ParseState->parentParseState, but not parentQuery, because +> the parentParseState is much more valuable to me, and Vadim thought it +> might be useful, but was not positive. Also, keeping that parentQuery +> pointer valid through rewrite may be difficult, so I dropped it. +> ParseState is only valid in the parser. +> +> I have not done: +> +> correlated subquery column references +> added Var->sublevels_up +> gotten this to work in the rewrite system +> have not added full CopyNode support +> +> I will address these in the next few days. + +Nice! I'm starting with non-correlated subqueries... 
+ +Vadim + +From owner-pgsql-hackers@hub.org Mon Jan 19 01:35:50 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24362 + for ; Mon, 19 Jan 1998 01:35:48 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA17531; Mon, 19 Jan 1998 01:35:39 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 19 Jan 1998 01:35:33 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA17460 for pgsql-hackers-outgoing; Mon, 19 Jan 1998 01:35:28 -0500 (EST) +Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA17323 for ; Mon, 19 Jan 1998 01:35:03 -0500 (EST) +Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) + by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16904; + Mon, 19 Jan 1998 13:37:42 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Message-ID: <34C2F4B4.7BBA1DB2@sable.krasnoyarsk.su> +Date: Mon, 19 Jan 1998 13:37:41 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: [HACKERS] Re: subselects +References: <199801190554.AAA23866@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce Momjian wrote: +> +> OK, I have added code to allow the SubLinks make it to the optimizer. +> +> I implemented ParseState->parentParseState, but not parentQuery, because +> the parentParseState is much more valuable to me, and Vadim thought it +> might be useful, but was not positive. Also, keeping that parentQuery +> pointer valid through rewrite may be difficult, so I dropped it. +> ParseState is only valid in the parser. 
+> +> I have not done: +> +> correlated subquery column references +> added Var->sublevels_up +> gotten this to work in the rewrite system +> have not added full CopyNode support +> +> I will address these in the next few days. + +Nice! I'm starting with non-correlated subqueries... + +Vadim + + +From owner-pgsql-hackers@hub.org Wed Jan 21 04:00:59 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA14981 + for ; Wed, 21 Jan 1998 04:00:56 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA02432 for ; Wed, 21 Jan 1998 03:46:22 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id DAA12583; Wed, 21 Jan 1998 03:45:43 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 03:44:07 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id DAA12288 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 03:44:02 -0500 (EST) +Received: from gandalf.sd.spardat.at (gandalf.telecom.at [194.118.26.84]) by hub.org (8.8.8/8.7.5) with ESMTP id DAA12263 for ; Wed, 21 Jan 1998 03:43:18 -0500 (EST) +Received: from sdgtw.sd.spardat.at (sdgtw.sd.spardat.at [172.18.99.31]) + by gandalf.sd.spardat.at (8.8.8/8.8.8) with ESMTP id JAA38408 + for ; Wed, 21 Jan 1998 09:42:55 +0100 +Received: by sdgtw.sd.spardat.at with Internet Mail Service (5.0.1458.49) + id ; Wed, 21 Jan 1998 09:42:55 +0100 +Message-ID: <219F68D65015D011A8E000006F8590C6010A51A2@sdexcsrv1.sd.spardat.at> +From: Zeugswetter Andreas DBT +To: "'pgsql-hackers@hub.org'" +Subject: [HACKERS] Re: subselects +Date: Wed, 21 Jan 1998 09:42:52 +0100 +X-Priority: 3 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.0.1458.49) +Content-Type: text/plain +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce wrote: +> I have completed adding Var.varlevelsup, and have added code to 
the +> parser to properly set the field. It will allow correlated references +> in the WHERE clause, but not in the target list. + +select i2.ip1, i1.ip4 from nameip i1 where ip1 = (select ip1 from nameip +i2); + 522: Table (i2) not selected in query. +select i1.ip4 from nameip i1 where ip1 = (select i1.ip1 from nameip i2); + 284: A subquery has returned not exactly one row. +select i1.ip4 from nameip i1 where ip1 = (select i1.ip1 from nameip i2 +where name='zeus'); + 2 row(s) retrieved. + +Informix allows correlated references in the target list. It also allows +subselects in the target list as in: +select i1.ip4, (select i1.ip1 from nameip i2) from nameip i1; + 284: A subquery has returned not exactly one row. +select i1.ip4, (select i1.ip1 from nameip i2 where name='zeus') from +nameip i1; + 2 row(s) retrieved. + +Is this what you were looking for ? + +Andreas + + +From owner-pgsql-hackers@hub.org Wed Jan 21 05:31:02 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA15884 + for ; Wed, 21 Jan 1998 05:31:01 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id FAA04709 for ; Wed, 21 Jan 1998 05:16:16 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id FAA05191; Wed, 21 Jan 1998 05:15:42 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 05:14:02 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id FAA04951 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 05:13:57 -0500 (EST) +Received: from dune.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id FAA04610 for ; Wed, 21 Jan 1998 05:12:18 -0500 (EST) +Received: from sable.krasnoyarsk.su (dune.krasnet.ru [193.125.44.86]) + by dune.krasnet.ru (8.8.7/8.8.7) with ESMTP id RAA01918; + Wed, 21 Jan 1998 17:10:24 +0700 (KRS) + (envelope-from 
vadim@sable.krasnoyarsk.su) +Message-ID: <34C5C98E.3E085F52@sable.krasnoyarsk.su> +Date: Wed, 21 Jan 1998 17:10:22 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: Bruce Momjian +CC: PostgreSQL-development +Subject: [HACKERS] Re: subselects +References: <199801210324.WAA02161@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +Bruce Momjian wrote: +> +> We are only going to have subselects in the WHERE clause, not in the +> target list, right? +> +> The standard says we can have them either place, but I didn't think we +> were implementing the target list subselects. +> +> Is that correct? + +Yes, this is right for 6.3. I hope that we'll support subselects in +target list, FROM, etc in future. + +BTW, I'm going to implement subselect in (let's say) "natural" way - +without substitution of parent query relations into subselect and so on, +but by execution of (correlated) subqueries for each upper query row +(may be with cacheing of results in hash table for better performance). +Sure, this is much more clean way and much more clear how to do this. +This seems like SQL-func way, but funcs start/run/stop Executor each time +when called and this breaks performance. 
+ +Vadim + + +From owner-pgsql-hackers@hub.org Wed Jan 21 10:02:02 1998 +Received: from hub.org (hub.org [209.47.148.200]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA20456 + for ; Wed, 21 Jan 1998 10:02:01 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA06778; Wed, 21 Jan 1998 10:02:13 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 10:00:41 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA06544 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 10:00:37 -0500 (EST) +Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA06326 for ; Wed, 21 Jan 1998 10:00:03 -0500 (EST) +Received: from insightdist.com (nobody@localhost) + by u1.abs.net (8.8.5/8.8.5) with UUCP id JAA08009 + for pgsql-hackers@postgresql.org; Wed, 21 Jan 1998 09:40:29 -0500 (EST) +X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!darrenk using -f +Received: by insightdist.com (AIX 3.2/UCB 5.64/4.03) + id AA33174; Wed, 21 Jan 1998 09:26:09 -0500 +Received: by ceodev (AIX 4.1/UCB 5.64/4.03) + id AA36452; Wed, 21 Jan 1998 09:13:05 -0500 +Date: Wed, 21 Jan 1998 09:13:05 -0500 +From: darrenk@insightdist.com (Darren King) +Message-Id: <9801211413.AA36452@ceodev> +To: pgsql-hackers@postgreSQL.org +Subject: Re: [HACKERS] subselects +Mime-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +Content-Md5: 4wI6dUsUAXei+yg3JycjGw== +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +> We are only going to have subselects in the WHERE clause, not in the +> target list, right? +> +> The standard says we can have them either place, but I didn't think we +> were implementing the target list subselects. +> +> Is that correct? + +What about the HAVING clause? Currently not in, but someone here wants +to take a stab at it. 
+ +Doesn't seem that tough...loops over the tuples returned from the group +by node and checks the expression such as "x > 5" or "x = (subselect)". + +The cost analysis in the optimizer could be tricky come to think of it. +If a subselect has a HAVING, would have to have a formula to determine +the selectiveness. Hmmm... + +darrenk + +