From 8cb2c013b6e068697e7543349a74d7864c94ee0b Mon Sep 17 00:00:00 2001
From: Bruce Momjian
Date: Thu, 25 Jan 2001 03:36:34 +0000
Subject: [PATCH] Add.

---
 doc/TODO.detail/replication | 191 ++++++++++++++++++++++++++++++++++--
 1 file changed, 184 insertions(+), 7 deletions(-)

diff --git a/doc/TODO.detail/replication b/doc/TODO.detail/replication
index 93a0ea17dd..0c27a4f79d 100644
--- a/doc/TODO.detail/replication
+++ b/doc/TODO.detail/replication
@@ -43,7 +43,7 @@ From owner-pgsql-hackers@hub.org Fri Dec 24 10:01:18 1999
 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
     by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA11295
     for ; Fri, 24 Dec 1999 11:01:17 -0500 (EST)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.5 $) with ESMTP id KAA20310 for ; Fri, 24 Dec 1999 10:39:18 -0500 (EST)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.6 $) with ESMTP id KAA20310 for ; Fri, 24 Dec 1999 10:39:18 -0500 (EST)
 Received: from localhost (majordom@localhost)
     by hub.org (8.9.3/8.9.3) with SMTP id KAA61760;
     Fri, 24 Dec 1999 10:31:13 -0500 (EST)
@@ -129,7 +129,7 @@ From owner-pgsql-hackers@hub.org Fri Dec 24 18:31:03 1999
 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
     by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA26244
     for ; Fri, 24 Dec 1999 19:31:02 -0500 (EST)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.5 $) with ESMTP id TAA12730 for ; Fri, 24 Dec 1999 19:30:05 -0500 (EST)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.6 $) with ESMTP id TAA12730 for ; Fri, 24 Dec 1999 19:30:05 -0500 (EST)
 Received: from localhost (majordom@localhost)
     by hub.org (8.9.3/8.9.3) with SMTP id TAA57851;
     Fri, 24 Dec 1999 19:23:31 -0500 (EST)
@@ -212,7 +212,7 @@ From owner-pgsql-hackers@hub.org Fri Dec 24 21:31:10 1999
 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
     by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA02578
     for ; Fri, 24 Dec 1999 22:31:09 -0500 (EST)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.5 $) with ESMTP id WAA16641 for ; Fri, 24 Dec 1999 22:18:56 -0500 (EST)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.6 $) with ESMTP id WAA16641 for ; Fri, 24 Dec 1999 22:18:56 -0500 (EST)
 Received: from localhost (majordom@localhost)
     by hub.org (8.9.3/8.9.3) with SMTP id WAA89135;
     Fri, 24 Dec 1999 22:11:12 -0500 (EST)
@@ -486,7 +486,7 @@ From owner-pgsql-hackers@hub.org Sun Dec 26 08:31:09 1999
 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
     by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA17976
     for ; Sun, 26 Dec 1999 09:31:07 -0500 (EST)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.5 $) with ESMTP id JAA23337 for ; Sun, 26 Dec 1999 09:28:36 -0500 (EST)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.6 $) with ESMTP id JAA23337 for ; Sun, 26 Dec 1999 09:28:36 -0500 (EST)
 Received: from localhost (majordom@localhost)
     by hub.org (8.9.3/8.9.3) with SMTP id JAA90738;
     Sun, 26 Dec 1999 09:21:58 -0500 (EST)
@@ -909,7 +909,7 @@ From owner-pgsql-hackers@hub.org Thu Dec 30 08:01:09 1999
 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
     by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA10317
     for ; Thu, 30 Dec 1999 09:01:08 -0500 (EST)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.5 $) with ESMTP id IAA02365 for ; Thu, 30 Dec 1999 08:37:10 -0500 (EST)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.6 $) with ESMTP id IAA02365 for ; Thu, 30 Dec 1999 08:37:10 -0500 (EST)
 Received: from localhost (majordom@localhost)
     by hub.org (8.9.3/8.9.3) with SMTP id IAA87902;
     Thu, 30 Dec 1999 08:34:22 -0500 (EST)
@@ -1006,7 +1006,7 @@ From owner-pgsql-patches@hub.org Sun Jan 2 23:01:38 2000
 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
     by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA16274
     for ; Mon, 3 Jan 2000 00:01:28 -0500 (EST)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.5 $) with ESMTP id XAA02655 for ; Sun, 2 Jan 2000 23:45:55 -0500 (EST)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.6 $) with ESMTP id XAA02655 for ; Sun, 2 Jan 2000 23:45:55 -0500 (EST)
 Received: from hub.org (hub.org [216.126.84.1])
     by hub.org (8.9.3/8.9.3) with ESMTP id XAA13828;
     Sun, 2 Jan 2000 23:40:47 -0500 (EST)
@@ -1424,7 +1424,7 @@ From owner-pgsql-hackers@hub.org Tue Jan 4 10:31:01 2000
 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
     by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA17522
     for ; Tue, 4 Jan 2000 11:31:00 -0500 (EST)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.5 $) with ESMTP id LAA01541 for ; Tue, 4 Jan 2000 11:27:30 -0500 (EST)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.6 $) with ESMTP id LAA01541 for ; Tue, 4 Jan 2000 11:27:30 -0500 (EST)
 Received: from localhost (majordom@localhost)
     by hub.org (8.9.3/8.9.3) with SMTP id LAA09992;
     Tue, 4 Jan 2000 11:18:07 -0500 (EST)
@@ -1728,3 +1728,180 @@ Betreff: [HACKERS] failing over with postgresql
 >
+From pgsql-hackers-owner+M3662@postgresql.org Tue Jan 23 16:23:34 2001
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA04456
+    for ; Tue, 23 Jan 2001 16:23:34 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NLKf004705;
+    Tue, 23 Jan 2001 16:20:41 -0500 (EST)
+    (envelope-from pgsql-hackers-owner+M3662@postgresql.org)
+Received: from sectorbase2.sectorbase.com ([208.48.122.131])
+    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NLAe003753
+    for ; Tue, 23 Jan 2001 16:10:40
-0500 (EST)
+    (envelope-from vmikheev@SECTORBASE.COM)
+Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
+    id ; Tue, 23 Jan 2001 12:49:07 -0800
+Message-ID: <8F4C99C66D04D4118F580090272A7A234D32AF@sectorbase1.sectorbase.com>
+From: "Mikheev, Vadim"
+To: "'dom@idealx.com'" , pgsql-hackers@postgresql.org
+Subject: RE: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)
+Date: Tue, 23 Jan 2001 13:10:34 -0800
+MIME-Version: 1.0
+X-Mailer: Internet Mail Service (5.5.2653.19)
+Content-Type: text/plain;
+    charset="iso-8859-1"
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+Status: ORr
+
+> I had thought that the pre-commit information could be stored in an
+> auxiliary table by the middleware program ; we would then have
+> to re-implement some sort of higher-level WAL (I thought of the list
+> of the commands performed in the current transaction, with a sequence
+> number for each of them that would guarantee correct ordering between
+> concurrent transactions in case of a REDO). But I fear I am missing
+
+This wouldn't work for the READ COMMITTED isolation level.
+But why do you want to log commands into WAL, where each modification
+is already logged in, hm, correct order?
+Well, it makes sense if you're looking for async replication, but
+you don't need two-phase commit for that and should be aware of the
+problems with the READ COMMITTED isolation level.
+
+Back to two-phase commit - it's the easiest part of the work required
+for distributed transaction processing.
+Currently we place a single commit record in the log, and the
+transaction is committed when this record (and so all other
+transaction records) is on disk.
+Two-phase commit:
+
+1. For the 1st phase we'll place a "prepared-to-commit" record into the
+   log, and this phase is complete once the record is flushed to disk.
+   At this point the transaction may be committed at any time because
+   all its modifications are logged. But it may still be rolled back
+   if this phase fails on other sites of the distributed system.
+
+2. When all sites are prepared to commit we'll place a "committed"
+   record into the log. No need to flush it, because in the event of
+   a crash the recoverer will have to communicate with the other sites
+   to learn the status of all "prepared" transactions anyway.
+
+That's all! It is really hard to implement distributed lock and
+communication managers, but there is no problem with logging two
+records instead of one. Period.
+
+Vadim
+
+From pgsql-hackers-owner+M3665@postgresql.org Tue Jan 23 17:05:26 2001
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA05972
+    for ; Tue, 23 Jan 2001 17:05:24 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NM31008120;
+    Tue, 23 Jan 2001 17:03:01 -0500 (EST)
+    (envelope-from pgsql-hackers-owner+M3665@postgresql.org)
+Received: from candle.pha.pa.us (candle.navpoint.com [162.33.245.46])
+    by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id f0NLsU007188
+    for ; Tue, 23 Jan 2001 16:54:30 -0500 (EST)
+    (envelope-from pgman@candle.pha.pa.us)
+Received: (from pgman@localhost)
+    by candle.pha.pa.us (8.9.0/8.9.0) id QAA05300;
+    Tue, 23 Jan 2001 16:53:53 -0500 (EST)
+From: Bruce Momjian
+Message-Id: <200101232153.QAA05300@candle.pha.pa.us>
+Subject: Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)
+In-Reply-To: <8F4C99C66D04D4118F580090272A7A234D32AF@sectorbase1.sectorbase.com>
+    "from Mikheev, Vadim at Jan 23, 2001 01:10:34 pm"
+To: "Mikheev, Vadim"
+Date: Tue, 23 Jan 2001 16:53:53 -0500 (EST)
+CC: "'dom@idealx.com'" , pgsql-hackers@postgresql.org
+X-Mailer: ELM [version 2.4ME+ PL77 (25)]
+MIME-Version: 1.0
+Content-Transfer-Encoding: 7bit
+Content-Type: text/plain; charset=US-ASCII
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+Status: OR
+
+[ Charset ISO-8859-1 unsupported, converting... ]
+> > I had thought that the pre-commit information could be stored in an
+> > auxiliary table by the middleware program ; we would then have
+> > to re-implement some sort of higher-level WAL (I thought of the list
+> > of the commands performed in the current transaction, with a sequence
+> > number for each of them that would guarantee correct ordering between
+> > concurrent transactions in case of a REDO). But I fear I am missing
+>
+> This wouldn't work for the READ COMMITTED isolation level.
+> But why do you want to log commands into WAL, where each modification
+> is already logged in, hm, correct order?
+> Well, it makes sense if you're looking for async replication, but
+> you don't need two-phase commit for that and should be aware of the
+> problems with the READ COMMITTED isolation level.
+>
+
+I believe the issue here is that while SERIALIZABLE ISOLATION means all
+queries can be run serially, our default is READ COMMITTED, meaning that
+open transactions see committed transactions, even if the transaction
+committed after our transaction started. (FYI, see my chapter on
+transactions for help, http://www.postgresql.org/docs/awbook.html.)
+
+To do higher-level WAL, you would have to record not only the queries,
+but also the other queries that were committed at the start of each
+command in your transaction.
+
+Ideally, you could number every commit by its XID in your log, and then
+when processing the query, pass the "committed" transaction ids that
+were visible at the time each command began.
+
+In other words, you can replay the queries in transaction commit order,
+except that you have to have some transactions committed at specific
+points while other transactions are open, i.e.:
+
+XID     Open XIDS       Query
+500                     UPDATE t SET col = 3;
+501     500             BEGIN;
+501     500             UPDATE t SET col = 4;
+501                     UPDATE t SET col = 5;
+501                     COMMIT;
+
+This is a silly example, but it shows that 500 must commit after the
+first command in transaction 501, but before the second command in the
+transaction. This is because UPDATE t SET col = 5 actually sees the
+changes made by transaction 500 in READ COMMITTED isolation level.
+
+I am not advocating this. I think WAL is a better choice. I just
+wanted to outline how replaying the queries in commit order is
+insufficient.
+
+> Back to two-phase commit - it's the easiest part of the work required
+> for distributed transaction processing.
+> Currently we place a single commit record in the log, and the
+> transaction is committed when this record (and so all other
+> transaction records) is on disk.
+> Two-phase commit:
+>
+> 1. For the 1st phase we'll place a "prepared-to-commit" record into the
+>    log, and this phase is complete once the record is flushed to disk.
+>    At this point the transaction may be committed at any time because
+>    all its modifications are logged. But it may still be rolled back
+>    if this phase fails on other sites of the distributed system.
+>
+> 2. When all sites are prepared to commit we'll place a "committed"
+>    record into the log. No need to flush it, because in the event of
+>    a crash the recoverer will have to communicate with the other sites
+>    to learn the status of all "prepared" transactions anyway.
+>
+> That's all! It is really hard to implement distributed lock and
+> communication managers, but there is no problem with logging two
+> records instead of one. Period.
+
+Great.
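The two log records in the quoted two-phase commit description can be sketched roughly as follows. This is a toy model only, not PostgreSQL's WAL code: the Log class, the record names "prepared" and "committed", and the ask_other_sites callback are all hypothetical names made up for illustration, and the distributed lock and communication managers (the hard part, per the message) are left out.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Tuple[int, str]  # (transaction id, record kind); both hypothetical


class Log:
    """Toy log: 'buffer' is lost in a crash, 'disk' survives it."""

    def __init__(self) -> None:
        self.buffer: List[Record] = []
        self.disk: List[Record] = []

    def append(self, xid: int, kind: str, flush: bool) -> None:
        self.buffer.append((xid, kind))
        if flush:
            # A flush writes everything buffered so far, in order.
            self.disk.extend(self.buffer)
            self.buffer.clear()

    def crash(self) -> None:
        # Records that were never flushed are gone after a crash.
        self.buffer.clear()


def prepare(log: Log, xid: int) -> None:
    # Phase 1: complete only once the record has been flushed to disk.
    log.append(xid, "prepared", flush=True)


def commit(log: Log, xid: int) -> None:
    # Phase 2: not flushed; recovery re-checks prepared transactions anyway.
    log.append(xid, "committed", flush=False)


def in_doubt(log: Log) -> Set[int]:
    """Transactions with a durable 'prepared' record but no durable 'committed'."""
    prepared = {xid for xid, kind in log.disk if kind == "prepared"}
    committed = {xid for xid, kind in log.disk if kind == "committed"}
    return prepared - committed


def recover(log: Log, ask_other_sites: Callable[[int], str]) -> Dict[int, str]:
    # After a crash, the outcome of each in-doubt transaction comes from the
    # other sites, which is why the "committed" record need not be flushed.
    return {xid: ask_other_sites(xid) for xid in sorted(in_doubt(log))}
```

For example, after prepare(log, 7), commit(log, 7) and log.crash(), the unflushed "committed" record is lost, in_doubt(log) contains 7, and recover() resolves it by asking the other sites.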
+
+
+-- 
+  Bruce Momjian                        |  http://candle.pha.pa.us
+  pgman@candle.pha.pa.us               |  (610) 853-3000
+  +  If your life is a hard drive,     |  830 Blythe Avenue
+  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
+
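Returning to the XID / Open XIDS log in the message above: a minimal sketch of how a replayer might work out where each commit has to fall, assuming a command-level log that records, for every command, which other transactions were still open when it began. LogEntry, replay_schedule and the inference rule are hypothetical names and choices for illustration only; nothing like this exists in PostgreSQL, and the message itself argues that WAL is the better choice.

```python
from typing import FrozenSet, List, NamedTuple, Set


class LogEntry(NamedTuple):
    xid: int                   # transaction issuing the command
    open_xids: FrozenSet[int]  # other transactions still open at command start
    query: str


def replay_schedule(log: List[LogEntry]) -> List[str]:
    """Interleave inferred commits with the logged commands.

    A transaction that has run commands, has not committed yet, and is no
    longer listed as open must commit before the current command replays.
    """
    schedule: List[str] = []
    pending: Set[int] = set()  # transactions replayed so far but not committed
    for entry in log:
        for xid in sorted(pending - entry.open_xids - {entry.xid}):
            schedule.append(f"{xid}: COMMIT;")
            pending.discard(xid)
        schedule.append(f"{entry.xid}: {entry.query}")
        if entry.query.strip().upper().startswith("COMMIT"):
            pending.discard(entry.xid)
        else:
            pending.add(entry.xid)
    return schedule


# The example log from the message, transcribed row by row.
EXAMPLE_LOG = [
    LogEntry(500, frozenset(), "UPDATE t SET col = 3;"),
    LogEntry(501, frozenset({500}), "BEGIN;"),
    LogEntry(501, frozenset({500}), "UPDATE t SET col = 4;"),
    LogEntry(501, frozenset(), "UPDATE t SET col = 5;"),
    LogEntry(501, frozenset(), "COMMIT;"),
]

for line in replay_schedule(EXAMPLE_LOG):
    print(line)
```

On the example log this places the commit of 500 between the two UPDATEs of transaction 501, which is the ordering the message says plain commit-order replay cannot express. The sketch ignores row locking and never commits a transaction that is still pending when the log ends.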