postgresql/doc/TODO.detail/primary
1999-09-23 15:43:40 +00:00

From owner-pgsql-hackers@hub.org Fri Sep 4 00:47:06 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA01047
for <maillist@candle.pha.pa.us>; Fri, 4 Sep 1998 00:47:05 -0400 (EDT)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.8 $) with ESMTP id XAA02044 for <maillist@candle.pha.pa.us>; Thu, 3 Sep 1998 23:11:07 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA27418; Thu, 3 Sep 1998 23:06:16 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 03 Sep 1998 23:04:11 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA27185 for pgsql-hackers-outgoing; Thu, 3 Sep 1998 23:04:09 -0400 (EDT)
Received: from dune.krs.ru (dune.krs.ru [195.161.16.38]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA27169 for <hackers@postgreSQL.org>; Thu, 3 Sep 1998 23:03:59 -0400 (EDT)
Received: from krs.ru (localhost.krs.ru [127.0.0.1])
by dune.krs.ru (8.8.8/8.8.8) with ESMTP id LAA10059;
Fri, 4 Sep 1998 11:03:00 +0800 (KRSS)
(envelope-from vadim@krs.ru)
Message-ID: <35EF5864.E5142D35@krs.ru>
Date: Fri, 04 Sep 1998 11:03:00 +0800
From: Vadim Mikheev <vadim@krs.ru>
Organization: OJSC Rostelecom (Krasnoyarsk)
X-Mailer: Mozilla 4.05 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)
MIME-Version: 1.0
To: "D'Arcy J.M. Cain" <darcy@druid.net>
CC: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>, hackers@postgreSQL.org
Subject: Re: [HACKERS] Adding PRIMARY KEY info
References: <m0zEaoV-00006JC@druid.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: RO
D'Arcy J.M. Cain wrote:
>
> Thus spake Vadim Mikheev
> > Imho, indices should be used/created for FOREIGN keys and so pg_index
> > is good place for both PRIMARY and FOREIGN keys infos.
>
> Are you sure? I don't know about implementing it but it seems more
> like an attribute thing rather than an index thing. Certainly from a
> database design viewpoint you want to refer to the fields, not the
> index on them. If you put it into the index then you have to do
> an extra join to get the information.
>
> Perhaps you have to do the extra join anyway for other purposes so it
> may not matter. All I want is to be able to extract the
> field that the designer specified as the key. As long as I can design
> a select statement that gives me that I don't much care how it is
> implemented. I'll cache the information anyway so it won't have a
> huge impact on my programs.
First, let me note that you would have to add an int28 field to pg_class,
not just an oid field, to know which attributeS are in the primary key
(we support multi-attribute primary keys).
This could be done...
But what about foreign and unique (!) keys?
There may be _many_ foreign/unique keys defined for one table!
And so foreign/unique key info has to be stored somewhere else,
not in pg_class.
pg_index is a good place for all _3_ key types because:
1. an index should be created for each foreign key -
just for performance.
2. pg_index already has int28 field for key attributes.
3. pg_index already has indisunique (note that foreign keys
may reference unique keys, not just primary ones).
- so we just have to add two fields to pg_index:
bool indisprimary;
oid indreferenced;
^^^^^^^^^^^^^^^^^^
this is for foreign keys: oid of the referenced relation's
primary/unique key index.
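A minimal sketch of how one pg_index-style row per key could describe all three key types, written in Python rather than as catalog C. The fields indisprimary and indreferenced are the two proposed above; the class IndexRow, the helper key_kind, and all OIDs are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IndexRow:
    """Hypothetical in-memory stand-in for one pg_index row."""
    indexrelid: int                      # OID of the index itself
    indrelid: int                        # OID of the indexed relation
    indkey: Tuple[int, ...]              # attnums of the key columns
    indisunique: bool = False            # existing field
    indisprimary: bool = False           # proposed new field
    indreferenced: Optional[int] = None  # proposed: OID of referenced PK/unique index


def key_kind(row: IndexRow) -> str:
    """Classify a row using only the existing and proposed fields."""
    if row.indreferenced is not None:
        return "foreign"
    if row.indisprimary:
        return "primary"
    if row.indisunique:
        return "unique"
    return "plain index"


# Made-up OIDs: relation 1000 = customers, relation 2000 = orders.
customers_pkey = IndexRow(3001, 1000, (1,), indisunique=True, indisprimary=True)
orders_number_key = IndexRow(3003, 2000, (3,), indisunique=True)
orders_customer_fk = IndexRow(3002, 2000, (2,), indreferenced=3001)
```

Because a table can own any number of such rows, many foreign and unique keys per table fit without touching pg_class.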
I agree that indices are just an implementation detail...
If you don't like storing key info in pg_index then
a new pg_key relation has to be added...
Comments ?
Vadim
From owner-pgsql-hackers@hub.org Sat Sep 5 02:01:13 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA14437
for <maillist@candle.pha.pa.us>; Sat, 5 Sep 1998 02:01:11 -0400 (EDT)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.8 $) with ESMTP id BAA09928 for <maillist@candle.pha.pa.us>; Sat, 5 Sep 1998 01:48:32 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA18282; Sat, 5 Sep 1998 01:43:16 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sat, 05 Sep 1998 01:41:40 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA18241 for pgsql-hackers-outgoing; Sat, 5 Sep 1998 01:41:38 -0400 (EDT)
Received: from dune.krs.ru (dune.krs.ru [195.161.16.38]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA18211; Sat, 5 Sep 1998 01:41:21 -0400 (EDT)
Received: from krs.ru (localhost.krs.ru [127.0.0.1])
by dune.krs.ru (8.8.8/8.8.8) with ESMTP id NAA20555;
Sat, 5 Sep 1998 13:40:44 +0800 (KRSS)
(envelope-from vadim@krs.ru)
Message-ID: <35F0CEDB.AD721090@krs.ru>
Date: Sat, 05 Sep 1998 13:40:43 +0800
From: Vadim Mikheev <vadim@krs.ru>
Organization: OJSC Rostelecom (Krasnoyarsk)
X-Mailer: Mozilla 4.05 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)
MIME-Version: 1.0
To: "D'Arcy J.M. Cain" <darcy@druid.net>
CC: hackers@postgreSQL.org, pgsql-core@postgreSQL.org
Subject: Re: [HACKERS] Adding PRIMARY KEY info
References: <m0zEvLK-00006FC@druid.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: ROr
D'Arcy J.M. Cain wrote:
>
> >
> > pg_index is a good place for all _3_ key types because:
> >
> > 1. an index should be created for each foreign key -
> > just for performance.
> > 2. pg_index already has int28 field for key attributes.
> > 3. pg_index already has indisunique (note that foreign keys
> > may reference unique keys, not just primary ones).
> >
> > - so we just have to add two fields to pg_index:
> >
> > bool indisprimary;
> > oid indreferenced;
> > ^^^^^^^^^^^^^^^^^^
> > this is for foreign keys: oid of the referenced relation's
> > primary/unique key index.
>
> Sounds fine to me. Any chance of seeing this in 6.4?
I could add this (and FOREIGN key implementation) before
11-13 Sep... But not the ALTER TABLE ADD/DROP CONSTRAINT
stuff (ok for Entry SQL).
But we are in beta...
Comments?
> Nope, pg_index is fine by me. Now, once we have this, how do we find
> the index for a particular attribute? I can't seem to figure out the
> relationship between pg_attribute and pg_index. The chart in the docs
> suggests that indkey is the relation but I can't see any useful info
> there for joining the tables.
pg_index:
indrelid - oid of indexed relation
indkey - up to 8 attnums
pg_attribute:
attrelid - oid of relation
attnum - ...
Without an outer join you have to query pg_attribute for each
valid attnum from pg_index->indkey -:(
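The per-attnum lookup described above can be sketched as follows. This is an illustration in Python, not a catalog query: the dictionaries stand in for pg_index and pg_attribute rows, the function name key_column_names and the sample OIDs are hypothetical, and treating 0 as the marker for an unused indkey slot is an assumption.

```python
def key_column_names(index_row, pg_attribute):
    """Resolve each valid attnum in indkey to a column name, one lookup per attnum.

    index_row    -- dict with 'indrelid' and 'indkey' (fixed list of 8 attnums)
    pg_attribute -- dict mapping (attrelid, attnum) -> attname
    """
    names = []
    for attnum in index_row["indkey"]:
        if attnum == 0:  # assumed marker for an unused slot
            continue
        # This is the per-attnum "query" against pg_attribute:
        # match attrelid = indrelid and attnum = the indkey entry.
        names.append(pg_attribute[(index_row["indrelid"], attnum)])
    return names


# Made-up catalog contents for a table with OID 2000.
pg_attribute = {
    (2000, 1): "order_id",
    (2000, 2): "customer_id",
    (2000, 3): "order_number",
}
two_column_index = {"indrelid": 2000, "indkey": [2, 3, 0, 0, 0, 0, 0, 0]}
```

The join condition is therefore pg_index.indrelid = pg_attribute.attrelid together with attnum matching one of the indkey entries.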
Vadim
From owner-pgsql-hackers@hub.org Tue Sep 21 05:31:11 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id FAA07543
for <maillist@candle.pha.pa.us>; Tue, 21 Sep 1999 05:31:09 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.8 $) with ESMTP id FAA19587 for <maillist@candle.pha.pa.us>; Tue, 21 Sep 1999 05:12:03 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
by hub.org (8.9.3/8.9.3) with ESMTP id EAA55119;
Tue, 21 Sep 1999 04:48:48 -0400 (EDT)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 21 Sep 1999 04:45:33 +0000 (EDT)
Received: (from majordom@localhost)
by hub.org (8.9.3/8.9.3) id EAA54532
for pgsql-hackers-outgoing; Tue, 21 Sep 1999 04:44:35 -0400 (EDT)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
by hub.org (8.9.3/8.9.3) with SMTP id EAA54496
for <pgsql-hackers@postgreSQL.org>; Tue, 21 Sep 1999 04:44:13 -0400 (EDT)
(envelope-from wieck@debis.com)
Received: by orion.SAPserv.Hamburg.dsh.de
for pgsql-hackers@postgreSQL.org
id m11TLQP-0003kLC; Tue, 21 Sep 99 10:37 MET DST
Message-Id: <m11TLQP-0003kLC@orion.SAPserv.Hamburg.dsh.de>
From: wieck@debis.com (Jan Wieck)
Subject: [HACKERS] Re: Referential Integrity In PostgreSQL
To: pgsql-hackers@postgreSQL.org (PostgreSQL HACKERS)
Date: Tue, 21 Sep 1999 10:37:21 +0200 (MET DST)
Reply-To: wieck@debis.com (Jan Wieck)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO
>
> Hi , Jan
>
> my name is Max .
Hi Max,
>
> I have contributed to SPI interface ,
> that with external Trigger try to make
> a referential integrity.
>
> If I can Help , in something ,
> I'm here .
>
You're welcome.
I've CC'd the hackers list because we might get some ideas
from there too (and to surface once in a while - Bruce
already missed me).
Currently I'm very busy with serious work so I don't find
enough spare time to start on such a big change to
PostgreSQL. But I'd like to give you an overview of what I
have in mind so far so you can decide if you're able to help.
Referential integrity (RI) is based on constraints defined in
the schema of a database. There are some different types of
constraints:
1. Uniqueness constraints.
2. Foreign key constraints that ensure that a key value used
in an attribute exists in another relation. One
constraint must ensure you're unable to INSERT/UPDATE to
a value that doesn't exist, another one must prevent
DELETE on a referenced key item or that it is changed
during UPDATE.
3. Cascading deletes that let rows referring to a key follow
on DELETE silently.
Even if not defined in the standard (AFAIK) there could be
others like letting references automatically follow on UPDATE
to a key value.
All constraints can be enabled and/or default to be deferred.
That means that the RI checks aren't performed when they are
triggered. Instead, they're checked at transaction end or if
explicitly invoked by some special statement. This is really
important because someone must be able to set up cyclic RI
checks that could never be satisfied if the checks were
performed immediately. The major problem with this is the
amount of data affected until the checks must be performed.
The number of statements executed, that trigger such deferred
constraints, shouldn't be limited. And one single
INSERT/UPDATE/DELETE could affect thousands of rows.
Due to these problems I thought it might not be such a good
idea to remember CTID's or the like to get back OLD/NEW rows
at the time the constraints are checked. Instead I planned to
misuse the rule system for it. Unfortunately, the rule system
has damned tricky problems of its own when it comes to HAVING,
DISTINCT and other clauses, and especially with aggregates and
subselects. These problems would have to get fixed first. So
it's a solution that cannot be implemented right now.
So, fall back to remembering CTIDs. There are problems there too
:-(. Let's enhance the trigger mechanism with a deferred
feature. First this requires two additional bool attributes
in the pg_trigger relation that tell if this trigger is
deferrable and if it is deferred by default. While at it we
should add another bool that tells if the trigger is enabled
(ALTER TRIGGER {ENABLE|DISABLE} trigger).
Second we need an internal list of triggers, that are
currently DEFINED AS DEFERRED. Either because they default to
it, or the user explicitly asked to defer it.
Third we need an internal list of triggers that must be
invoked later because at the time an event occurred where they
should have been triggered, they appeared in the other list
and their execution is delayed until transaction end or
explicit execution. This list must remember the OID of the
trigger to invoke (to identify the procedure and the
arguments), the relation that caused the trigger and the
CTID's of the OLD and NEW row.
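The two internal lists described above can be sketched roughly like this. It is an in-memory Python illustration only: the class DeferredTriggers, its method names, and the fire callback are hypothetical, and a real implementation would live in the backend and would not hold the event list in memory.

```python
from collections import namedtuple

# One remembered event: which trigger to call, which relation caused it,
# and the CTIDs of the OLD and NEW rows.
DeferredEvent = namedtuple("DeferredEvent", "trigger_oid rel_oid old_ctid new_ctid")


class DeferredTriggers:
    def __init__(self, deferred_by_default, fire):
        self.deferred = set(deferred_by_default)  # list 1: triggers currently deferred
        self.pending = []                         # list 2: events to invoke later
        self.fire = fire                          # callback that runs one trigger

    def set_deferred(self, trigger_oid, deferred):
        """User explicitly asks to defer (or not defer) a trigger."""
        if deferred:
            self.deferred.add(trigger_oid)
        else:
            self.deferred.discard(trigger_oid)

    def on_event(self, trigger_oid, rel_oid, old_ctid, new_ctid):
        event = DeferredEvent(trigger_oid, rel_oid, old_ctid, new_ctid)
        if trigger_oid in self.deferred:
            self.pending.append(event)  # delay until transaction end
        else:
            self.fire(event)            # ordinary immediate trigger

    def end_of_transaction(self):
        # Fire in recording order; triggers fired here may queue more events.
        while self.pending:
            self.fire(self.pending.pop(0))


fired = []
mgr = DeferredTriggers(deferred_by_default={100}, fire=fired.append)
mgr.on_event(100, 5000, None, (0, 1))  # deferred trigger: only recorded
mgr.on_event(200, 5000, None, (0, 2))  # immediate trigger: fired right away
fired_before_commit = [e.trigger_oid for e in fired]
mgr.end_of_transaction()
fired_after_commit = [e.trigger_oid for e in fired]
```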
That last list could grow extremely! Think of a trigger
that's executing commands over SPI which in turn activate
deferred triggers. Since the order of trigger execution is
very important for RI, I can't see any chance to
simplify/condense this information. Thus it is 16 bytes at
least per deferred trigger call (2 OID's plus 2 CTID's). I
think one or more temp files would fit best for this.
A last tricky point is if one of a bunch of deferred triggers
is explicitly called for execution. At this time, the entries
for it in the temp file(s) must get processed and marked
executed (maybe by overwriting the trigger's OID with the
invalid OID) while other trigger events still have to get
recorded.
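A sketch of the temp-file idea, assuming fixed-size records and using Python's struct module. The record layout (two 4-byte OIDs plus two CTIDs stored as a 4-byte block number and a 2-byte offset, 20 bytes in total) and the function names are assumptions made for illustration; the invalid OID is taken to be 0.

```python
import io
import struct

# Assumed layout: trigger OID, relation OID, OLD CTID (block, offset), NEW CTID (block, offset).
RECORD = struct.Struct("<IIIHIH")
INVALID_OID = 0


def append_event(f, trigger_oid, rel_oid, old_ctid, new_ctid):
    """Record one deferred trigger event at the end of the file."""
    f.seek(0, io.SEEK_END)
    f.write(RECORD.pack(trigger_oid, rel_oid, *old_ctid, *new_ctid))


def run_trigger_now(f, trigger_oid):
    """Explicit execution of one trigger: scan every record, collect the
    matching ones and mark them executed by overwriting the trigger OID."""
    executed = []
    f.seek(0, io.SEEK_END)
    end = f.tell()
    pos = 0
    while pos < end:
        f.seek(pos)
        trig, rel, old_blk, old_off, new_blk, new_off = RECORD.unpack(f.read(RECORD.size))
        if trig == trigger_oid:
            executed.append((rel, (old_blk, old_off), (new_blk, new_off)))
            f.seek(pos)
            f.write(struct.pack("<I", INVALID_OID))  # only the OID field is touched
        pos += RECORD.size
    return executed


def pending_trigger_oids(f):
    """Trigger OIDs of the records that still have to run at transaction end."""
    f.seek(0)
    return [rec[0] for rec in RECORD.iter_unpack(f.read()) if rec[0] != INVALID_OID]
```

An io.BytesIO object or a file from tempfile.TemporaryFile() can serve as f. The full scan in run_trigger_now is the slow special case; new events can still be appended afterwards.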
Needless to say, reading thousands of those entries just
to find a few isn't good for performance. But better to have this
special case slow than dealing with hundreds of temp files or
other overhead slowing down the usual case where ALL deferred
triggers get called at transaction end.
Trigger invocation is simple now - fetch the OLD and NEW rows
by CTID and execute the trigger as done by the trigger
manager. Oh - well - vacuum shouldn't touch relations where
deferred triggers are outstanding. Might require some
special lock entry - Vadim?
Did I miss something?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
************
From owner-pgsql-hackers@hub.org Tue Sep 21 08:31:03 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id IAA09071
for <maillist@candle.pha.pa.us>; Tue, 21 Sep 1999 08:31:02 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.8 $) with ESMTP id IAA25991 for <maillist@candle.pha.pa.us>; Tue, 21 Sep 1999 08:04:59 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
by hub.org (8.9.3/8.9.3) with ESMTP id HAA82019;
Tue, 21 Sep 1999 07:48:14 -0400 (EDT)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 21 Sep 1999 07:47:30 +0000 (EDT)
Received: (from majordom@localhost)
by hub.org (8.9.3/8.9.3) id HAA81906
for pgsql-hackers-outgoing; Tue, 21 Sep 1999 07:46:38 -0400 (EDT)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
by hub.org (8.9.3/8.9.3) with SMTP id HAA81888
for <hackers@postgreSQL.org>; Tue, 21 Sep 1999 07:46:26 -0400 (EDT)
(envelope-from wieck@debis.com)
Received: by orion.SAPserv.Hamburg.dsh.de
for hackers@postgreSQL.org
id m11TOGd-0003kwC; Tue, 21 Sep 99 13:39 MET DST
Message-Id: <m11TOGd-0003kwC@orion.SAPserv.Hamburg.dsh.de>
From: wieck@debis.com (Jan Wieck)
Subject: Re: [HACKERS] Re: Referential Integrity In PostgreSQL
To: andreas.zeugswetter@telecom.at (Andreas Zeugswetter)
Date: Tue, 21 Sep 1999 13:39:27 +0200 (MET DST)
Cc: hackers@postgresql.org
Reply-To: wieck@debis.com (Jan Wieck)
In-Reply-To: <37E74EB9.44F9766E@telecom.at> from "Andreas Zeugswetter" at Sep 21, 99 11:24:09 am
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgresql.org
Precedence: bulk
Status: RO
>
> > Oh - well - vacuum shouldn't touch relations where
> > deferred triggers are outstanding. Might require some
> > special lock entry - Vadim?
>
> All modified data will be in this same still open transaction.
> Therefore no relevant data can be removed by vacuum anyway.
I expect this, but I really need to be sure that not even the
location of the tuple in the heap will change. I need to find
the tuples at the time the deferred triggers must be executed
via heap_fetch() by their CTID!
>
> It is my understanding, that the RI check is performed on the newest
> available (committed) data (+ modified data from my own tx).
> E.g. a primary key that has been removed by another transaction after
> my begin work will lead to an RI violation if referenced as foreign key.
Absolutely right. The function that will fire the deferred
triggers must switch to READ COMMITTED isolevel while doing
so.
What I'm not sure about is which snapshot to use to get the
OLD tuples (outdated in this transaction by a previous
command). Vadim?
Jan
************
From owner-pgsql-hackers@hub.org Tue Sep 21 10:45:40 1999
Received: from hub.org (hub.org [216.126.84.1])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA10993
for <maillist@candle.pha.pa.us>; Tue, 21 Sep 1999 10:45:39 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
by hub.org (8.9.3/8.9.3) with ESMTP id KAA22590;
Tue, 21 Sep 1999 10:36:16 -0400 (EDT)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 21 Sep 1999 10:35:37 +0000 (EDT)
Received: (from majordom@localhost)
by hub.org (8.9.3/8.9.3) id KAA22200
for pgsql-hackers-outgoing; Tue, 21 Sep 1999 10:34:47 -0400 (EDT)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from sunpine.krs.ru (SunPine.krs.ru [195.161.16.37])
by hub.org (8.9.3/8.9.3) with ESMTP id KAA22048
for <hackers@postgreSQL.org>; Tue, 21 Sep 1999 10:33:38 -0400 (EDT)
(envelope-from vadim@krs.ru)
Received: from krs.ru (dune.krs.ru [195.161.16.38])
by sunpine.krs.ru (8.8.8/8.8.8) with ESMTP id WAA27122;
Tue, 21 Sep 1999 22:33:22 +0800 (KRSS)
Message-ID: <37E79730.CC415030@krs.ru>
Date: Tue, 21 Sep 1999 22:33:20 +0800
From: Vadim Mikheev <vadim@krs.ru>
Organization: OJSC Rostelecom (Krasnoyarsk)
X-Mailer: Mozilla 4.5 [en] (X11; I; FreeBSD 3.0-RELEASE i386)
X-Accept-Language: ru, en
MIME-Version: 1.0
To: Jan Wieck <wieck@debis.com>
CC: Andreas Zeugswetter <andreas.zeugswetter@telecom.at>,
hackers@postgreSQL.org
Subject: Re: [HACKERS] Re: Referential Integrity In PostgreSQL
References: <m11TOGd-0003kwC@orion.SAPserv.Hamburg.dsh.de>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO
Jan Wieck wrote:
>
> > It is my understanding, that the RI check is performed on the newest
> > available (committed) data (+ modified data from my own tx).
> > E.g. a primary key that has been removed by another transaction after
> > my begin work will lead to an RI violation if referenced as foreign key.
>
> Absolutely right. The function that will fire the deferred
> triggers must switch to READ COMMITTED isolevel while doing
^^^^^^^^^^^^^^
> so.
NO!
What if one transaction deleted PK, another one inserted FK
and now both perform RI check? Both transactions _must_
use DIRTY READs to notice that RI is violated by another
in-progress transaction and wait for the concurrent transaction...
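The race can be made concrete with a toy visibility model. This is a sketch only: the function visible_keys, the transaction names T1/T2, and the data are hypothetical, and a real system would block and wait on the conflicting transaction instead of merely detecting it.

```python
def visible_keys(committed, changes, me, dirty_read):
    """Keys a transaction sees: committed data plus its own changes,
    and with dirty_read=True also other transactions' uncommitted changes."""
    keys = set(committed)
    for txn, op, key in changes:
        if txn == me or dirty_read:
            if op == "insert":
                keys.add(key)
            else:  # "delete"
                keys.discard(key)
    return keys


# Committed state: PK 1 exists, nothing references it yet.
pk_committed, fk_committed = {1}, set()
pk_changes = [("T1", "delete", 1)]  # T1 deletes the PK, not yet committed
fk_changes = [("T2", "insert", 1)]  # T2 inserts an FK pointing at it, not yet committed


def t1_check_passes(dirty_read):
    # T1: "no FK row references the key I deleted"
    return 1 not in visible_keys(fk_committed, fk_changes, "T1", dirty_read)


def t2_check_passes(dirty_read):
    # T2: "the PK I reference exists"
    return 1 in visible_keys(pk_committed, pk_changes, "T2", dirty_read)
```

With dirty_read=False both checks pass, so both transactions could commit and leave a dangling reference. With dirty_read=True each side notices the other's in-progress change.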
BTW, using triggers to check _each_ modified tuple
(i.e. run Executor for each modified tuple) is bad for
performance. We could implement direct support for
standard RI constraints.
Using rules (statement level triggers) for INSERT...SELECT,
UPDATE and DELETE queries would be nice! Actually, RI constraint
checks need only very simple queries (i.e. without distinct etc)
and the only thing we would have to do is
> What I'm not sure about is which snapshot to use to get the
> OLD tuples (outdated in this transaction by a previous
> command). Vadim?
1. Add CommandId to Snapshot.
2. Use Snapshot->CommandId instead of global CurrentScanCommandId.
3. Use Snapshots with different CommandId-s to get OLD/NEW
versions.
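A rough sketch of step 3, restricted to tuple versions created and deleted inside one transaction. The classes TupleVersion and Snapshot, the cmin/cmax field names, and the visibility rule below are simplifying assumptions for illustration and not the backend's actual visibility code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    value: str
    cmin: int                   # command that created this version
    cmax: Optional[int] = None  # command that deleted/replaced it, if any


@dataclass
class Snapshot:
    command_id: int             # the CommandId carried by the snapshot (step 1)


def visible(t: TupleVersion, snap: Snapshot) -> bool:
    """Assumed rule: a version is visible if it was created by an earlier
    command and not deleted by an earlier command."""
    created_before = t.cmin < snap.command_id
    deleted_before = t.cmax is not None and t.cmax < snap.command_id
    return created_before and not deleted_before


# Command 0 inserts "a"; command 1 updates it to "b".
old_version = TupleVersion("a", cmin=0, cmax=1)
new_version = TupleVersion("b", cmin=1)
versions = [old_version, new_version]

seen_as_old = [t.value for t in versions if visible(t, Snapshot(command_id=1))]
seen_as_new = [t.value for t in versions if visible(t, Snapshot(command_id=2))]
```

Two snapshots that differ only in CommandId thus pick out the OLD and NEW versions of the same row.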
But I agree that the size of parsetrees may be big and for
COPY...FROM/INSERTs we should remember IDs of modified
tuples. Well. Please remember that I am implementing WAL right
now, already have 1000 lines of code and hope to run first
tests after writing additional ~200 lines -:)
We could read modified tuple IDs from WAL...
Vadim
************
From owner-pgsql-hackers@hub.org Tue Sep 21 11:18:19 1999
Received: from hub.org (hub.org [216.126.84.1])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA11537
for <maillist@candle.pha.pa.us>; Tue, 21 Sep 1999 11:18:18 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
by hub.org (8.9.3/8.9.3) with ESMTP id LAA27395;
Tue, 21 Sep 1999 11:04:42 -0400 (EDT)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 21 Sep 1999 11:03:56 +0000 (EDT)
Received: (from majordom@localhost)
by hub.org (8.9.3/8.9.3) id LAA27106
for pgsql-hackers-outgoing; Tue, 21 Sep 1999 11:02:50 -0400 (EDT)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
by hub.org (8.9.3/8.9.3) with SMTP id LAA27041
for <hackers@postgreSQL.org>; Tue, 21 Sep 1999 11:02:34 -0400 (EDT)
(envelope-from wieck@debis.com)
Received: by orion.SAPserv.Hamburg.dsh.de
for hackers@postgreSQL.org
id m11TRKP-0003kLC; Tue, 21 Sep 99 16:55 MET DST
Message-Id: <m11TRKP-0003kLC@orion.SAPserv.Hamburg.dsh.de>
From: wieck@debis.com (Jan Wieck)
Subject: Re: [HACKERS] Re: Referential Integrity In PostgreSQL
To: vadim@krs.ru (Vadim Mikheev)
Date: Tue, 21 Sep 1999 16:55:33 +0200 (MET DST)
Cc: wieck@debis.com, andreas.zeugswetter@telecom.at, hackers@postgreSQL.org
Reply-To: wieck@debis.com (Jan Wieck)
In-Reply-To: <37E79730.CC415030@krs.ru> from "Vadim Mikheev" at Sep 21, 99 10:33:20 pm
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO
>
> Jan Wieck wrote:
> >
> > > It is my understanding, that the RI check is performed on the newest
> > > available (committed) data (+ modified data from my own tx).
> > > E.g. a primary key that has been removed by another transaction after
> > > my begin work will lead to an RI violation if referenced as foreign key.
> >
> > Absolutely right. The function that will fire the deferred
> > triggers must switch to READ COMMITTED isolevel while doing
> ^^^^^^^^^^^^^^
> > so.
>
> NO!
> What if one transaction deleted PK, another one inserted FK
> and now both perform RI check? Both transactions _must_
> use DIRTY READs to notice that RI is violated by another
> in-progress transaction and wait for the concurrent transaction...
Oh - I see - yes.
>
> BTW, using triggers to check _each_ modified tuple
> (i.e. run Executor for each modified tuple) is bad for
> performance. We could implement direct support for
> standard RI constraints.
As I want to implement it, there would be not much difference
between a regular trigger invocation and a deferred one. If
that causes a performance problem, I think we should speed up
the trigger call mechanism in general instead of not using
triggers.
>
> Using rules (statement level triggers) for INSERT...SELECT,
> UPDATE and DELETE queries would be nice! Actually, RI constraint
> checks need only very simple queries (i.e. without distinct etc)
> and the only thing we would have to do is
>
> > What I'm not sure about is which snapshot to use to get the
> > OLD tuples (outdated in this transaction by a previous
> > command). Vadim?
>
> 1. Add CommandId to Snapshot.
> 2. Use Snapshot->CommandId instead of global CurrentScanCommandId.
> 3. Use Snapshots with different CommandId-s to get OLD/NEW
> versions.
>
> But I agree that the size of parsetrees may be big and for
> COPY...FROM/INSERTs we should remember IDs of modified
> tuples. Well. Please remember that I am implementing WAL right
> now, already have 1000 lines of code and hope to run first
> tests after writing additional ~200 lines -:)
> We could read modified tuple IDs from WAL...
Not only on COPY. One regular INSERT/UPDATE/DELETE statement
can actually fire thousands of trigger calls right now. These
triggers normally use SPI to execute their own queries. If
such a trigger now uses a query that in turn causes a
deferred constraint, we might have to save thousands of
deferred querytrees - impossible mission.
That's IMHO a clear drawback against using rules for
deferrable RI.
What I'm currently doing is clearly encapsulated in some
functions in commands/trigger.c (except for some additional
attributes in pg_trigger). If it later turns out that we can
combine the information required into WAL, I think we have
time enough to do so and shouldn't really care if v6.6
doesn't have it already combined.
Jan
************
From owner-pgsql-hackers@hub.org Tue Sep 21 15:30:29 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA14590
for <maillist@candle.pha.pa.us>; Tue, 21 Sep 1999 15:30:28 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.8 $) with ESMTP id PAA09192 for <maillist@candle.pha.pa.us>; Tue, 21 Sep 1999 15:06:09 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
by hub.org (8.9.3/8.9.3) with ESMTP id OAA73126;
Tue, 21 Sep 1999 14:56:15 -0400 (EDT)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 21 Sep 1999 14:54:47 +0000 (EDT)
Received: (from majordom@localhost)
by hub.org (8.9.3/8.9.3) id OAA72607
for pgsql-hackers-outgoing; Tue, 21 Sep 1999 14:53:51 -0400 (EDT)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
by hub.org (8.9.3/8.9.3) with SMTP id OAA72516
for <pgsql-hackers@postgreSQL.org>; Tue, 21 Sep 1999 14:52:56 -0400 (EDT)
(envelope-from wieck@debis.com)
Received: by orion.SAPserv.Hamburg.dsh.de
for pgsql-hackers@postgreSQL.org
id m11TUvX-0003kLC; Tue, 21 Sep 99 20:46 MET DST
Message-Id: <m11TUvX-0003kLC@orion.SAPserv.Hamburg.dsh.de>
From: wieck@debis.com (Jan Wieck)
Subject: [HACKERS] RI question
To: pgsql-hackers@postgreSQL.org (PostgreSQL HACKERS)
Date: Tue, 21 Sep 1999 20:46:06 +0200 (MET DST)
Reply-To: wieck@debis.com (Jan Wieck)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO
Uh oh,
I think deferred RI constraints must only fire the actions
that remain after all commands during the entire transaction
are condensed to the total minimum required to get that
state, because deferred RI must only check what VISIBLY
happened during the transaction.
Thinking on the tuple level, a sequence of
INSERT,UPDATE,UPDATE must fire only one INSERT trigger, but
with the values of the last UPDATE. An UPDATE,DELETE sequence
is in fact a DELETE of the original tuple and an
INSERT,UPDATE,DELETE sequence is nothing.
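The condensing rules for a single tuple can be sketched as below. The function condense and the event representation are hypothetical; the sketch covers one tuple's events in order and ignores the delete-then-reinsert case.

```python
def condense(events):
    """Reduce one tuple's events within a transaction to the single deferred
    event that must fire, or None if nothing visibly happened.

    events -- list of (op, row) in execution order, op in
              {"INSERT", "UPDATE", "DELETE"}; row holds the new values
              (None for DELETE).
    """
    if not events:
        return None
    created_here = events[0][0] == "INSERT"  # tuple did not exist before this transaction
    last_op, last_row = events[-1]
    if last_op == "DELETE":
        # Inserted and deleted in the same transaction: nothing to check.
        # Otherwise it is a DELETE of the original tuple.
        return None if created_here else ("DELETE", None)
    # The tuple survives: fire one event carrying the values of the last command.
    return ("INSERT" if created_here else "UPDATE", last_row)


insert_update_update = condense([
    ("INSERT", {"id": 1, "v": "a"}),
    ("UPDATE", {"id": 1, "v": "b"}),
    ("UPDATE", {"id": 1, "v": "c"}),
])
update_delete = condense([("UPDATE", {"id": 2, "v": "x"}), ("DELETE", None)])
insert_update_delete = condense([
    ("INSERT", {"id": 3, "v": "a"}),
    ("UPDATE", {"id": 3, "v": "b"}),
    ("DELETE", None),
])
```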
That means that the recording mechanism of the trigger events
must be very smart on UPDATE and DELETE events, looking at
the x_min of the old tuple to see if it resulted from the current
transaction. If so, follow the events backward, disable
previous ones and change the new event into what it really
has to be.
But some problems remain unsolvable by this:
- PK has an ON DELETE CASCADE for FK
- BEGIN
- DELETE PK
- INSERT same PK
- COMMIT.
This really shouldn't invoke the cascading delete, because at
COMMIT the PK is still there. Same for a constraint that
forbids deletion of a PK while referenced by FK. Therefore
the deferred event recorder must check on INSERT any previous
DELETES for the same relation if the key does match and drop
both deferred triggers if so. Therefore it needs to know
which attributes build the PK of that relation
(<relname>_pkey guaranteed?).
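A sketch of the DELETE-then-INSERT cancellation, assuming the recorder is told which columns form each relation's primary key. The class DeferredRecorder, its methods, and the table name are hypothetical.

```python
class DeferredRecorder:
    def __init__(self, pk_columns):
        self.pk_columns = pk_columns  # relation name -> tuple of PK column names
        self.events = []              # recorded (op, relation, key) entries

    def _key(self, relation, row):
        # Extract the PK values; this is why the PK attributes must be known.
        return tuple(row[col] for col in self.pk_columns[relation])

    def record_delete(self, relation, row):
        self.events.append(("DELETE", relation, self._key(relation, row)))

    def record_insert(self, relation, row):
        key = self._key(relation, row)
        earlier_delete = ("DELETE", relation, key)
        if earlier_delete in self.events:
            # Same key deleted earlier in this transaction: at COMMIT the PK
            # is still there, so drop both deferred events.
            self.events.remove(earlier_delete)
            return
        self.events.append(("INSERT", relation, key))


rec = DeferredRecorder({"customers": ("id",)})
rec.record_delete("customers", {"id": 1, "name": "old"})
rec.record_delete("customers", {"id": 2, "name": "gone"})
rec.record_insert("customers", {"id": 1, "name": "new"})
```

After these three calls only the DELETE of key 2 remains, so the cascading delete would run for that key alone.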
Well, I think that's finally the death of RI over rules. The
code managing those rules during CREATE/ALTER TABLE would
become totally unmaintainable. And (sorry Vadim) it's the
death of SLT for this too because this event tracking must be
done on the tuple level.
It complicates the trigger approach too, but IMHO not too
bad. Anyway, some co-developer(s) doing the parser- and
utility-statement stuff (SET CONSTRAINTS ... etc.) would be
great.
Volunteers?
Jan
************