From pgsql-hackers-owner+M174@hub.org Sun Mar 12 22:31:11 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA25886
    for <pgman@candle.pha.pa.us>; Sun, 12 Mar 2000 23:31:10 -0500 (EST)
Received: from news.tht.net (news.hub.org [216.126.91.242]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id XAA04589 for <pgman@candle.pha.pa.us>; Sun, 12 Mar 2000 23:19:33 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1])
    by news.tht.net (8.9.3/8.9.3) with SMTP id XAA42854;
    Sun, 12 Mar 2000 23:05:05 -0500 (EST)
    (envelope-from pgsql-hackers-owner+M174@hub.org)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67])
    by hub.org (8.9.3/8.9.3) with ESMTP id XAA95917
    for <pgsql-hackers@postgreSQL.org>; Sun, 12 Mar 2000 23:00:56 -0500 (EST)
    (envelope-from pgman@candle.pha.pa.us)
Received: (from pgman@localhost)
    by candle.pha.pa.us (8.9.0/8.9.0) id WAA25403
    for pgsql-hackers@postgreSQL.org; Sun, 12 Mar 2000 22:59:56 -0500 (EST)
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-Id: <200003130359.WAA25403@candle.pha.pa.us>
Subject: [HACKERS] Fix for RENAME
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: Sun, 12 Mar 2000 22:59:56 -0500 (EST)
X-Mailer: ELM [version 2.4ME+ PL72 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR

I have thought about the issue with ALTER TABLE RENAME and keeping the
file system in sync with the database.

It seems there are three commands that can cause these to get out of
sync:

    CREATE TABLE/INDEX
    DROP TABLE/INDEX
    ALTER TABLE RENAME

Now, if we had file names based only on the oid, we could eliminate file
renaming for RENAME, but the others are still a problem.

There seem to be three ways to get out of sync:

    ABORT transaction
    backend crash
    OS crash

The last two are the same, except that a backend crash restarts the
postmaster, while after an OS crash the postmaster starts up normally.

Here is my idea.  Create a C List of file names to unlink on transaction
commit or abort.  For CREATE, unlink created files on transaction ABORT.
For DROP, unlink dropped files on COMMIT.  For RENAME, create a hard
link under the new table name that points to the old table's file, and
unlink the old file name on COMMIT or the new file name on ABORT.

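The commit/abort bookkeeping above can be sketched as follows.  This is
an illustration in Python, not backend code (which would be C); the
class PendingFileActions and its method names are hypothetical, and the
sketch assumes a filesystem that supports hard links.

```python
import os


class PendingFileActions:
    """Per-transaction lists of files to unlink at commit or at abort."""

    def __init__(self):
        self.unlink_on_commit = []
        self.unlink_on_abort = []

    def on_create(self, path):
        # CREATE: the new file must vanish if the transaction aborts.
        open(path, "xb").close()
        self.unlink_on_abort.append(path)

    def on_drop(self, path):
        # DROP: keep the file on disk until the transaction commits.
        self.unlink_on_commit.append(path)

    def on_rename(self, old_path, new_path):
        # RENAME: hard-link the new name to the old file, then remove
        # whichever name loses once the outcome is known.
        os.link(old_path, new_path)
        self.unlink_on_commit.append(old_path)
        self.unlink_on_abort.append(new_path)

    def _finish(self, doomed):
        for path in doomed:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass  # already gone, e.g. created and dropped together
        self.unlink_on_commit.clear()
        self.unlink_on_abort.clear()

    def commit(self):
        self._finish(self.unlink_on_commit)

    def abort(self):
        self._finish(self.unlink_on_abort)
```

Both lists live only in memory, so a crash before commit() or abort()
runs can still leave stray files behind; the sketch covers only the
orderly COMMIT and ABORT paths.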
That takes care of COMMIT and ABORT.  For a backend crash or OS crash, add
a postgres command-line flag for recovery.  Have the postmaster, on
startup or shared memory refresh, start up a postgres backend on every
database with the recovery flag set.  Have the postgres backend find all
the oids in the pg_class table, and have it go through every file in the
database directory and remove all files that don't match the oids/names
in pg_class.  Also, remove all old sort, noname, and temp files at the
same time.  It seems we should be doing this anyway.

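The directory sweep described above could look roughly like this.  The
sketch is in Python and the function name sweep_orphans is hypothetical.
It assumes the caller has already collected the expected file names from
pg_class, plus any non-relation files that must survive.  The dry_run
option is not part of the proposal; it is added here as an assumption so
the result can be inspected before anything is deleted.

```python
import os


def sweep_orphans(db_dir, catalog_files, dry_run=False):
    """Remove plain files in db_dir that the catalog does not list.

    Returns the sorted names of the files that were (or, with
    dry_run=True, would be) removed.
    """
    keep = set(catalog_files)
    removed = []
    for name in sorted(os.listdir(db_dir)):
        path = os.path.join(db_dir, name)
        if name in keep or not os.path.isfile(path):
            continue  # known to the catalog, or not a regular file
        if not dry_run:
            os.unlink(path)
        removed.append(name)
    return removed
```

Leftover sort and temp files need no special case in this sketch: they
are never listed in the catalog, so the same pass removes them.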
Care would have to be taken that a corrupted database that caused a
postgres crash on connection would not send the postmaster startup into
an infinite loop.

Comments?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

From reedstrm@wallace.ece.rice.edu Tue Mar 14 12:33:31 2000
Received: from wallace.ece.rice.edu (root@wallace.ece.rice.edu [128.42.12.154])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA23826
    for <pgman@candle.pha.pa.us>; Tue, 14 Mar 2000 13:33:29 -0500 (EST)
Received: by wallace.ece.rice.edu
    via sendmail from stdin
    id <m12Uw8K-000LELC@wallace.ece.rice.edu> (Debian Smail3.2.0.102)
    for pgman@candle.pha.pa.us; Tue, 14 Mar 2000 12:33:32 -0600 (CST)
Date: Tue, 14 Mar 2000 12:33:32 -0600
From: "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>
To: Hiroshi Inoue <Inoue@tpf.co.jp>
Cc: Bruce Momjian <pgman@candle.pha.pa.us>,
    PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fix for RENAME
Message-ID: <20000314123331.A6094@rice.edu>
References: <200003140317.WAA27733@candle.pha.pa.us> <000c01bf8d75$a0016800$2801007e@tpf.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.0i
In-Reply-To: <000c01bf8d75$a0016800$2801007e@tpf.co.jp>; from Inoue@tpf.co.jp on Tue, Mar 14, 2000 at 02:24:52PM +0900
Status: OR

Hiroshi -
I've just about finished working up a patch to store the physical
file name in the pg_class table.  There are only two places that
require a Rule for generating the filename, and one of them is
only used for bootstrapping.  For the initial cut, I used the rule:

    The filename consists of the TABLENAME, an underscore, and the OID.
    If this is longer than NAMEDATALEN, shorten the TABLENAME.

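A sketch of this naming rule in Python.  The helper name
physical_file_name is hypothetical, and NAMEDATALEN = 32 is an assumed
build-time value.  The sketch also assumes that the usable length is
NAMEDATALEN - 1 (leaving room for a terminating NUL) and that names are
single-byte; the patch itself relies on makeObjectName, whose truncation
details may differ.

```python
NAMEDATALEN = 32  # assumed value; usable length is NAMEDATALEN - 1


def physical_file_name(relname, oid, max_len=NAMEDATALEN - 1):
    """Build '<relname>_<oid>', shortening relname so the result fits."""
    suffix = "_%d" % oid
    room = max_len - len(suffix)
    if room < 1:
        raise ValueError("oid suffix leaves no room for the table name")
    # Only the table name is shortened; the oid suffix is kept whole
    # because it is what makes the file name unique.
    return relname[:room] + suffix
```

Keeping the oid intact and trimming only the human-readable part means
two long table names with a common prefix still map to distinct files.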
I implemented this rule by exporting Tom's makeObjectName function
from analyze.c, which is used to make other system-generated names
that are required to be human-readable.  Replacing this
rule with any other in the future would be straightforward, except
for bootstrap.  There are a number of places in bootstrap that need to
know the filename.  I've factored them out into yet another set of
#defines (in catname.h) to make that easier.

I'm working through the regression tests right now: this is a relatively
extensive change, since it modifies the low-level access routines and the
buffer cache (which I indexed on physical filename, rather than relname,
as it is now).  Hopefully, I caught all the places that assume relname ==
filename == unique name within a single database (see, I want schemas...)

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005

On Tue, Mar 14, 2000 at 02:24:52PM +0900, Hiroshi Inoue wrote:
> > -----Original Message-----
> > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> >
> > > > They use the existing table file.  It is only when
> > > > adding/removing/renaming file system files that this
> > > > out-of-sync problem happens.
> > > >
> >
> > Not sure.  I was going to get the CREATE/DROP/RENAME working as it
> > should then as we add more features, we can implement this solution for
> > them too.
> >
>
> Hmm, is a general solution difficult?
> Is a more flexible naming rule bad?
>
> This is the 3rd or 4th time that I have mentioned the following.
>
> PostgreSQL doesn't itself keep the information about where tables are
> allocated. So we need a naming rule to find where existing tables
> are allocated. Don't you wonder about the spec?
>
> Regards.
>
> Hiroshi Inoue
> Inoue@tpf.co.jp
>
>

From mascarm@mascari.com Tue Mar 14 16:34:04 2000
Received: from corvette.mascari.com (dhcp26136016.columbus.rr.com [24.26.136.16])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04395
    for <pgman@candle.pha.pa.us>; Tue, 14 Mar 2000 17:32:14 -0500 (EST)
Received: from mascari.com (ferrari.mascari.com [192.168.2.1])
    by corvette.mascari.com (8.9.3/8.9.3) with ESMTP id RAA09562;
    Tue, 14 Mar 2000 17:27:22 -0500
Message-ID: <38CEBD0A.52ADB37E@mascari.com>
Date: Tue, 14 Mar 2000 17:28:26 -0500
From: Mike Mascari <mascarm@mascari.com>
X-Mailer: Mozilla 4.7 [en] (Win95; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Bruce Momjian <pgman@candle.pha.pa.us>
CC: Hiroshi Inoue <Inoue@tpf.co.jp>,
    PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fix for RENAME
References: <200003141545.KAA17518@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
>
> > Hmm, is a general solution difficult?
> > Is a more flexible naming rule bad?
> >
> > This is the 3rd or 4th time that I have mentioned the following.
>
> That's because I didn't understand.
>
> >
> > PostgreSQL doesn't itself keep the information about where tables are
> > allocated. So we need a naming rule to find where existing tables
> > are allocated. Don't you wonder about the spec?
>
> How does naming the files in the database help our DROP/CREATE problem?
> It would help RENAME a little bit.  Not sure about the others because
> currently they don't have a problem.

I've been thinking about this somewhat, and I think the first
step necessary in correctly supporting ROLLBACK-able DDL
statements in transactions is the change to <relname>_<oid>.
Imagine the scenario:

CREATE TABLE test (key int4);

a) Session #1:

    BEGIN;

b) Session #2:

    BEGIN;
    DROP TABLE test;
    CREATE TABLE test (value varchar(32));

c) Session #1:

    DROP TABLE test;
    COMMIT;

d) Session #2:

    COMMIT;

What's clear to me is that, if DDL statements are to be
ROLLBACK-able, either (1) an AccessExclusive lock is held on the
relation until transaction commit (like Phillip Warner stated was
Dec/Rdb's behavior) or (2) PostgreSQL must be capable of
supporting "multi-versioned schema" as well as tuples. Before
step 'c' is executed, both tables must simultaneously exist in
the database with the same name, which works fine in the catalog
thanks to MVCC, but requires that, on disk, there exist:

test_01231 - Session #1's table, available for ROLLBACK
test_13421 - Session #2's table, available for COMMIT

Now, I believe it was Andreas who suggested that VACUUM be
modified to perform cleanup. I agree with this. VACUUM will need
to check for aborted relation tuples in pg_class and remove the
associated file from the filesystem in the event, for example,
that Session #2 aborted -or- Session #1 aborted leaving the
original pg_class tuple the "active" one and Session #2 attempted
to COMMIT, which violates the UNIQUE constraint on the relname of
pg_class. In addition, for "active" relation entries, VACUUM
should verify the filename is <relname>_<oid> for the given oid.
If it is not, it should rename the file on the filesystem. Again,
this is purely cosmetic, for administrative purposes only, but
it would confine any lack of atomicity to the label of the
relation file, and only until the next VACUUM is run.

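The two VACUUM duties described above (remove files of aborted
relations, relabel files whose name lags behind pg_class) can be
sketched as a planning step.  This is Python for illustration only; the
function name plan_vacuum_file_cleanup and the dictionary standing in
for the committed pg_class tuples are hypothetical.  File names that do
not end in _<digits> are skipped here, which is an assumption of the
sketch rather than part of the proposal.

```python
def plan_vacuum_file_cleanup(live_relations, files_on_disk):
    """Return (to_remove, to_rename) for one database directory.

    live_relations maps oid -> relname for committed pg_class tuples.
    files_on_disk is an iterable of file names of the form
    '<relname>_<oid>'.
    """
    to_remove, to_rename = [], []
    for name in sorted(files_on_disk):
        label, sep, oid_text = name.rpartition("_")
        if not sep or not oid_text.isdigit():
            continue  # not a relation file under this naming rule
        oid = int(oid_text)
        if oid not in live_relations:
            to_remove.append(name)  # no live pg_class tuple owns it
            continue
        wanted = "%s_%s" % (live_relations[oid], oid_text)
        if name != wanted:
            to_rename.append((name, wanted))  # cosmetic relabel only
    return to_remove, to_rename
```

With the example files above, a catalog in which only oid 1231 is live
yields test_13421 as the single file to remove and nothing to rename.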
For the case of ALTER TABLE RENAME, ALTER TABLE DROP COLUMN,
etc., the same functionality would apply. But, as in previous
discussions regarding ALTER TABLE DROP COLUMN, PostgreSQL MUST be
capable of allowing multiple tuples with different attribute
counts and types within the same relation:

CREATE TABLE test (key int4);

a) Session #1:

    BEGIN;

b) Session #2:

    BEGIN;
    ALTER TABLE test ADD COLUMN value int4;
    INSERT INTO test values (1, 1);

c) Session #1:

    INSERT INTO test values (0);
    COMMIT;

d) Session #2:

    COMMIT;

This also means that Hiroshi's plan to suppress the visibility of
attributes for ALTER TABLE DROP COLUMN would be required anyway,
to allow for "multi-versioning" of attributes within a single
tuple (i.e., like multi-versioning of tuples within relations):
an attribute is either visible or not, but the tuple should
always grow, until, of course, the next VACUUM.

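One way to picture "an attribute is either visible or not" is a read
path that projects each stored tuple onto the currently visible
attributes.  The Python sketch below is an illustration under assumed
data structures: the function name visible_row and the (name, dropped)
attribute list are hypothetical and do not reflect the actual tuple
format.

```python
def visible_row(stored_values, attributes):
    """Project a stored tuple onto the currently visible attributes.

    attributes is a list of (name, dropped) pairs in physical order.
    A tuple written before ADD COLUMN is shorter than that list; its
    missing trailing attributes read as None (SQL NULL).
    """
    row = {}
    for position, (name, dropped) in enumerate(attributes):
        if dropped:
            continue  # still stored in old tuples, but never shown
        if position < len(stored_values):
            row[name] = stored_values[position]
        else:
            row[name] = None
    return row
```

In the scenario above, Session #1's one-attribute tuple (0) and Session
#2's two-attribute tuple (1, 1) can then share the relation: the first
reads back with value as NULL.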
So, to support rollback-able DDL statements ("multi-versioning
schema", if you will), PostgreSQL needs:

1) relation file names of the form <relname>_<oid>
2) support for "multi-versioning" of attributes within a single tuple
3) VACUUM modified to:

    A) Remove filesystem files whose pg_class tuples are no longer
       valid
    B) Rename filesystem files to match the relname in pg_class when
       the <relname>_<oid> doesn't match
    C) Reconstruct relations after attributes have been
       added/dropped.

4) All DDL statements should perform their non-create filesystem
functions in the now infamous "post-transaction-commit" trigger.
If the backend should crash between the time the transaction
committed and the rename() or unlink(), no adverse effects would
be encountered with the database WRT data, VACUUM would clean up
the rename() problem, and, worst-case scenario, an old
<relname>_<oid> file would lie around unused. But at least it
would no longer prohibit the creation of a table by the same
name....

Just my humble opinion,

Mike Mascari

From Inoue@tpf.co.jp Tue Mar 14 20:31:35 2000
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA08792
    for <pgman@candle.pha.pa.us>; Tue, 14 Mar 2000 21:30:35 -0500 (EST)
Received: from cadzone ([126.0.1.40] (may be forged))
    by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
    id LAA00515; Wed, 15 Mar 2000 11:29:09 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>,
    "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Subject: RE: [HACKERS] Fix for RENAME
Date: Wed, 15 Mar 2000 11:35:46 +0900
Message-ID: <000c01bf8e27$2b3c3ce0$2801007e@tpf.co.jp>
MIME-Version: 1.0
Content-Type: text/plain;
    charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
In-Reply-To: <20000314123331.A6094@rice.edu>
Importance: Normal
Status: ORr

> -----Original Message-----
> From: Ross J. Reedstrom [mailto:reedstrm@wallace.ece.rice.edu]
>
> Hiroshi -
> I've just about finished working up a patch to store the physical
> file name in the pg_class table.  There are only two places that
> require a Rule for generating the filename, and one of them is
> only used for bootstrapping.

Thanks for your trial.
It's nice that only two places require a naming rule.

I'm not attached to any one naming rule.
The only requirement is uniqueness, and the rule
could be changed according to the situation.
For example, we could change the naming rule according to
the kind of relation, such as system/user relations.

I'm now inclined to introduce a new system relation to store
the physical path name.  It could also hold table(data)space
information in the (near?) future.
It seems better to separate it from pg_class because table(data?)space
may change the concept of table allocation.

Comments?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

From Inoue@tpf.co.jp Wed Mar 15 02:00:58 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA17887
    for <pgman@candle.pha.pa.us>; Wed, 15 Mar 2000 03:00:57 -0500 (EST)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id CAA02974 for <pgman@candle.pha.pa.us>; Wed, 15 Mar 2000 02:54:44 -0500 (EST)
Received: from cadzone ([126.0.1.40] (may be forged))
    by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
    id QAA00734; Wed, 15 Mar 2000 16:53:56 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>,
    "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Subject: RE: [HACKERS] Fix for RENAME
Date: Wed, 15 Mar 2000 17:00:35 +0900
Message-ID: <001101bf8e54$8b941cc0$2801007e@tpf.co.jp>
MIME-Version: 1.0
Content-Type: text/plain;
    charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
In-Reply-To: <200003150433.XAA13256@candle.pha.pa.us>
Importance: Normal
Status: ORr

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
>
> > I'm now inclined to introduce a new system relation to store
> > the physical path name.  It could also have table(data)space
> > information in the (near ?) future.
> > It seems better to separate it from pg_class because table(data?)
> > space may change the concept of table allocation.
>
> Why not just put it in pg_class?
>

Not sure, it's only my feeling.
Comments please, everyone.

In this thread we have taken a practical approach that doesn't break the
file-per-table assumption, and it wouldn't be so difficult to implement.
In fact Ross has already tried it.

However, there was a discussion about data(table)space
months ago, and a new discussion is under way now.
Judging from the previous discussion, I can't expect
it to reach a practical consensus (how many opinions there
were!).  We can take a practical step toward the future by encapsulating
the table allocation information.  Separating table allocation info from
pg_class seems to be one way to do that.
There may be more essential things to encapsulate.

Comments?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

From pgsql-hackers-owner+M196@hub.org Thu Mar 16 03:02:35 2000
Received: from hub.org (hub.org [216.126.84.1])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA05789
    for <pgman@candle.pha.pa.us>; Thu, 16 Mar 2000 04:02:29 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1])
    by hub.org (8.9.3/8.9.3) with SMTP id CAA27302;
    Thu, 16 Mar 2000 02:58:55 -0500 (EST)
    (envelope-from pgsql-hackers-owner+M196@hub.org)
Received: from downtown.oche.de (root@downtown.oche.de [194.94.253.3])
    by hub.org (8.9.3/8.9.3) with ESMTP id CAA23907
    for <pgsql-hackers@postgresql.org>; Thu, 16 Mar 2000 02:37:54 -0500 (EST)
    (envelope-from mne@darwin.oche.de)
Received: from darwin.oche.de (uucp@localhost)
    by downtown.oche.de (8.9.3/8.9.3/Debian/GNU) with SMTP id IAA30654
    for <pgsql-hackers@postgresql.org>; Thu, 16 Mar 2000 08:40:04 +0100
Received: from mne by darwin.oche.de with local (Exim 3.12 #1 (Debian))
    id 12VUhX-0003Vz-00
    for <pgsql-hackers@postgreSQL.org>; Thu, 16 Mar 2000 08:28:11 +0100
Date: Thu, 16 Mar 2000 08:28:11 +0100 (CET)
From: Martin Neumann <mne@mne.de>
Subject: [HACKERS] RfD: Design of tablespaces
To: pgsql-hackers@postgresql.org
MIME-Version: 1.0
Content-Type: TEXT/plain; CHARSET=US-ASCII
Message-Id: <E12VUhX-0003Vz-00@darwin.oche.de>
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR

I have written down some thoughts on the concept of tablespaces.
I would be happy to get some comments on them.

-----------------------------------------------------------------
Implementation of tablespaces within PostgreSQL
- a brainstorming paper designed for general discussion -

by Martin Neumann, 2000/3/15


1. What are tablespaces?
-------------------------

Tablespaces make it possible to distribute storage objects
over multiple points of storage (POS). Therefore one could
say a tablespace can be a POS.

Example:

  tablespace_a -----> /mnt/raid/arena0/
  tablespace_b -----> /mnt/raid/emc0/

Tablespaces can also store their data on other tablespaces:

  tablespace_c -----> tablespace_b

This is quite interesting for administration purposes.

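Finding where a tablespace such as tablespace_c actually stores its
data means following links until a directory is reached.  A minimal
Python sketch, assuming a hypothetical in-memory map from tablespace
name to ("dir", path) or ("link", other_name); the function name
resolve_tablespace and the cycle check are additions of this sketch,
not part of the proposal.

```python
def resolve_tablespace(name, tablespaces):
    """Follow 'link' tablespaces until a directory-backed one is found.

    tablespaces maps a name to ("dir", path) or ("link", other_name).
    """
    seen = []
    while True:
        if name in seen:
            chain = " -> ".join(seen + [name])
            raise ValueError("tablespace link cycle: " + chain)
        seen.append(name)
        if name not in tablespaces:
            raise KeyError(name)
        kind, target = tablespaces[name]
        if kind == "dir":
            return target
        if kind != "link":
            raise ValueError("unknown tablespace kind: %r" % (kind,))
        name = target  # keep following the link
```

Some guard against cycles seems necessary once links are allowed, since
two tablespaces pointing at each other would otherwise never resolve.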
2. What are its advantages?
----------------------------

As you can choose a different tablespace for every storage
object (table, index etc.), it is easy to improve the following
aspects of your system:

- Reliability

  You can put storage objects (mostly tables) you strongly depend
  on onto a more reliable tablespace (mirrored RAID or perhaps
  simply a directory which gets backed up more often than others).

- Speed

  You can put storage objects you rarely need onto a rather slow
  tablespace and keep your fast tablespaces free of them.

  A fast but more expensive RAID stripe set can be used more
  efficiently as it doesn't get filled with data that is not
  performance-sensitive.

  Distributing storage objects with equal speed requirements
  over different tablespaces also makes sense, as you gain
  more speed by spreading data over more than one hard disk
  spindle.

- Manageability

  You can grant and revoke rights on a per-tablespace basis.

  As every storage object belongs to exactly one tablespace,
  you can easily group storage objects using a tablespace.


3. What about disk I/O?
------------------------

Tablespaces tell the storage manager only where to store
the data, not how. This is the reasonable way.

4. Usage
---------

CREATE TABLESPACE tsname TYPE storage_type storage_options

Examples:

  CREATE TABLESPACE tsemc0
    TYPE classic DIRECTORY /mnt/raid/emc0 NOFSYNC

  CREATE TABLESPACE tsarena0 TYPE raw DEVICE /dev/araid/0
    MINSIZE 128 MAXSIZE 4096 GROW 4 32 SHRINK 2 32
    BLOCKSIZE 16384

  CREATE TABLESPACE quick0 TYPE link TABLESPACE tsarena0;

--

CREATE TABLE tbname ( ... ) TABLESPACE tsname;

Examples:

  CREATE TABLE foo (
    id int4 NOT NULL UNIQUE,
    name text NOT NULL
  ) TABLESPACE tsemc0;

  CREATE TABLE bar (
    id int4 NOT NULL UNIQUE,
    name text NOT NULL
  ) TABLESPACE default;

If the tablespace isn't given, the storage object gets created
in the "default" tablespace.

"default" is PostgreSQL's default tablespace and the only one
which has to exist on each system.

--

ALTER TABLESPACE tsname tssettings

Examples:

  ALTER TABLESPACE tsemc0 DIRECTORY /mnt/raid/emc1


NOTE: altering tablespaces without recreating the contained
      storage objects introduces many problems.
      Realisation is difficult and won't be my first goal.

--

DROP TABLESPACE tsname [FORCE]

Examples:

  DROP TABLESPACE tsarena0

  This will immediately remove the tablespace tsarena0
  if it contains no storage objects.

  If it still contains some, the tablespace is marked for
  deletion.

  This means:
    1. you can't create new storage objects in the tablespace
    2. if the last storage object inside gets dropped, the
       tablespace will be removed.


  DROP TABLESPACE tsarena0 FORCE

  This will remove the tablespace including all contained
  storage objects immediately.

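The DROP TABLESPACE behaviour described above amounts to a small state
machine.  The Python sketch below models it in memory only; the class
Tablespace and its attribute names are hypothetical, and no files are
touched.

```python
class Tablespace:
    """In-memory model of the proposed DROP TABLESPACE semantics."""

    def __init__(self, name):
        self.name = name
        self.objects = set()
        self.marked_for_deletion = False
        self.removed = False

    def create_object(self, obj):
        if self.removed or self.marked_for_deletion:
            raise RuntimeError(
                "tablespace %s accepts no new storage objects" % self.name)
        self.objects.add(obj)

    def drop_object(self, obj):
        self.objects.discard(obj)
        if self.marked_for_deletion and not self.objects:
            self.removed = True  # last object gone: finish deferred drop

    def drop(self, force=False):
        if force:
            self.objects.clear()  # FORCE drops the contained objects too
        if self.objects:
            self.marked_for_deletion = True  # defer until it is empty
        else:
            self.removed = True
```

A plain drop of a non-empty tablespace therefore only marks it, and the
removal completes when drop_object() takes out the last storage object.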
--

VACUUM tsname

Example:

  VACUUM tsemc1

  This will vacuum a single tablespace with all contained
  storage objects.
-----------------------------------------------------------------

-- 
Martin Neumann, Welkenrather Str. 118c, 52074 Aachen, Germany
mne@mne.de - http://www.mne.de/mne/ - sms@mne.de [eMail2SMS]
Tel. 0241 / 8876-080 - Mobil: 0173 / 27 69 632
..------.---------------------------------------------------------
|  at  | Inform GmbH - Abteilung Airport Logistics
| work | Pascalstr. 23 - 52076 Aachen - Tel. 02408 / 9456-0
|______| martin.neumann@inform-ac.com - http://www.inform-ac.com
