Add bitmap index mention.

Bruce Momjian 2002-08-13 05:08:35 +00:00
parent b6d8df67d3
commit ccf15026e6


@@ -345,7 +345,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 10:31:10 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087
for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.12 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.13 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
Received: from localhost (majordom@localhost)
by hub.org (8.9.3/8.9.3) with SMTP id KAA30328;
Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
@@ -454,7 +454,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 21:25:30 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130
for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.12 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.13 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
Received: from localhost (majordom@localhost)
by hub.org (8.9.3/8.9.3) with SMTP id VAA50745;
Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
@@ -1006,7 +1006,7 @@ From pgsql-general-owner+M2497@hub.org Fri Jun 16 18:31:03 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04165
for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:31:01 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.12 $) with ESMTP id RAA13110 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.13 $) with ESMTP id RAA13110 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5GLDaM14477;
Fri, 16 Jun 2000 17:13:36 -0400 (EDT)
@@ -1649,3 +1649,593 @@ OEV6eO8MnBSlbJMHiQ08gNE=
--=-Ivchb84S75fOMzJ9DxwK--
From pgsql-hackers-owner+M26157@postgresql.org Tue Aug 6 23:06:34 2002
Date: Wed, 7 Aug 2002 13:07:38 +1000 (EST)
From: Gavin Sherry <swm@linuxworld.com.au>
To: Curt Sampson <cjs@cynic.net>
cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CLUSTER and indisclustered
In-Reply-To: <Pine.NEB.4.44.0208071126590.1214-100000@angelic.cynic.net>
Message-ID: <Pine.LNX.4.21.0208071259210.13438-100000@linuxworld.com.au>
X-Virus-Scanned: by AMaViS new-20020517
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length: 1357
On Wed, 7 Aug 2002, Curt Sampson wrote:
> But after doing some benchmarking of various sorts of random reads
> and writes, it occurred to me that there might be optimizations
> that could help a lot with this sort of thing. What if, when we've
> got an index block with a bunch of entries, instead of doing the
> reads in the order of the entries, we do them in the order of the
> blocks the entries point to? That would introduce a certain amount
> of "sequentialness" to the reads that the OS is not capable of
> introducing (since it can't reschedule the reads you're doing, the
> way it could reschedule, say, random writes).
This sounds more or less like the method employed by Firebird as described
by Ann Douglas to Tom at OSCON (correct me if I get this wrong).
Basically, Firebird populates a bitmap with the entries the scan is interested
in. The bitmap is populated in page order so that all entries on the same
heap page can be fetched at once.
This is totally different to the way postgres does things and would
require significant modification to the index access methods.
Gavin
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
From pgsql-hackers-owner+M26162@postgresql.org Wed Aug 7 00:42:35 2002
To: Curt Sampson <cjs@cynic.net>
cc: mark Kirkwood <markir@slithery.org>, Gavin Sherry <swm@linuxworld.com.au>,
Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CLUSTER and indisclustered
In-Reply-To: <Pine.NEB.4.44.0208071126590.1214-100000@angelic.cynic.net>
References: <Pine.NEB.4.44.0208071126590.1214-100000@angelic.cynic.net>
Comments: In-reply-to Curt Sampson <cjs@cynic.net>
message dated "Wed, 07 Aug 2002 11:31:32 +0900"
Date: Wed, 07 Aug 2002 00:41:47 -0400
Message-ID: <12593.1028695307@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by AMaViS new-20020517
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length: 3063
Curt Sampson <cjs@cynic.net> writes:
> But after doing some benchmarking of various sorts of random reads
> and writes, it occurred to me that there might be optimizations
> that could help a lot with this sort of thing. What if, when we've
> got an index block with a bunch of entries, instead of doing the
> reads in the order of the entries, we do them in the order of the
> blocks the entries point to?
I thought to myself "didn't I just post something about that?"
and then realized it was on a different mailing list. Here ya go
(and no, this is not the first time around on this list either...)
I am currently thinking that bitmap indexes per se are not all that
interesting. What does interest me is bitmapped index lookup, which
came back into mind after hearing Ann Harrison describe how FireBird/
InterBase does it.
The idea is that you don't scan the index and base table concurrently
as we presently do it. Instead, you scan the index and make a list
of the TIDs of the table tuples you need to visit. This list can
be conveniently represented as a sparse bitmap. After you've finished
looking at the index, you visit all the required table tuples *in
physical order* using the bitmap. This eliminates multiple fetches
of the same heap page, and can possibly let you get some win from
sequential access.
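In outline the control flow is just this (a minimal sketch in C; the
Tid type, tid_cmp, and fetch_heap_tuple are illustrative placeholders,
not the real access-method API):

    #include <stdlib.h>

    typedef struct { unsigned block; unsigned offset; } Tid;  /* heap tuple address */

    static int tid_cmp(const void *a, const void *b)
    {
        const Tid *x = a, *y = b;
        if (x->block != y->block)
            return x->block < y->block ? -1 : 1;
        return (x->offset > y->offset) - (x->offset < y->offset);
    }

    /* Phase 1 has already scanned the index and collected matches[].
     * Phase 2: sort by block and visit the heap in physical order, so
     * each heap page is fetched at most once.  A sorted Tid array
     * stands in here for the sparse bitmap described above. */
    void two_phase_fetch(Tid *matches, size_t n)
    {
        qsort(matches, n, sizeof(Tid), tid_cmp);
        for (size_t i = 0; i < n; i++)
            ;  /* fetch_heap_tuple(matches[i]); -- hypothetical */
    }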
Once you have built this mechanism, you can then move on to using
multiple indexes in interesting ways: you can do several indexscans
in one query and then AND or OR their bitmaps before doing the heap
scan. This would allow, for example, "WHERE a = foo and b = bar"
to be handled by ANDing results from separate indexes on the a and b
columns, rather than having to choose only one index to use as we do
now.
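With flat uncompressed bitmaps the combining step itself is trivial
(illustrative sketch only; as discussed below, the real structure would
not be a flat array of words):

    #include <stdint.h>
    #include <stddef.h>

    /* AND two per-tuple bitmaps: a bit survives only if both indexscans
     * set it, as for "WHERE a = foo AND b = bar".  OR is the same loop
     * with '|' for "WHERE a = foo OR b = bar". */
    void bitmap_and(uint64_t *dst, const uint64_t *a, const uint64_t *b,
                    size_t nwords)
    {
        for (size_t i = 0; i < nwords; i++)
            dst[i] = a[i] & b[i];
    }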
Some thoughts about implementation: FireBird's implementation seems
to depend on an assumption about a fixed number of tuple pointers
per page. We don't have that, but we could probably get away with
just allocating BLCKSZ/sizeof(HeapTupleHeaderData) bits per page.
Also, the main downside of this approach is that the bitmap could
get large --- but you could have some logic that causes you to fall
back to plain sequential scan if you get too many index hits. (It's
interesting to think of this as lossy compression of the bitmap...
which leads to the idea of only being fuzzy in limited areas of the
bitmap, rather than losing all the information you have.)
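The bit numbering could be as simple as this (sketch; the constants are
illustrative, not the real header size):

    #define BLCKSZ        8192
    #define TUPHDR_SIZE     32  /* stand-in for sizeof(HeapTupleHeaderData) */
    #define BITS_PER_PAGE (BLCKSZ / TUPHDR_SIZE)  /* 256 positions per page */

    /* Map a tuple's (block, line-pointer offset) to a stable bit
     * position: no fixed tuples-per-page ratio is needed, just a
     * safe upper bound on tuples per page. */
    static unsigned long tid_to_bitpos(unsigned block, unsigned offset)
    {
        return (unsigned long) block * BITS_PER_PAGE + offset;
    }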
A possibly nasty issue is that lazy VACUUM has some assumptions in it
about indexscans holding pins on index pages --- that's what prevents
it from removing heap tuples that a concurrent indexscan is just about
to visit. It might be that there is no problem: even if lazy VACUUM
removes a heap tuple and someone else then installs a new tuple in that
same TID slot, you should be okay because the new tuple is too new to
pass your visibility test. But I'm not convinced this is safe.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
From pgsql-hackers-owner+M26172@postgresql.org Wed Aug 7 02:49:56 2002
X-Authentication-Warning: rh72.home.ee: hannu set sender to hannu@tm.ee using -f
Subject: Re: [HACKERS] CLUSTER and indisclustered
From: Hannu Krosing <hannu@tm.ee>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>,
Gavin Sherry <swm@linuxworld.com.au>,
Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
In-Reply-To: <12776.1028697148@sss.pgh.pa.us>
References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
<12776.1028697148@sss.pgh.pa.us>
X-Mailer: Ximian Evolution 1.0.7
Date: 07 Aug 2002 09:46:29 +0500
Message-ID: <1028695589.2133.11.camel@rh72.home.ee>
X-Virus-Scanned: by AMaViS new-20020517
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length: 1064
On Wed, 2002-08-07 at 10:12, Tom Lane wrote:
> Curt Sampson <cjs@cynic.net> writes:
> > On Wed, 7 Aug 2002, Tom Lane wrote:
> >> Also, the main downside of this approach is that the bitmap could
> >> get large --- but you could have some logic that causes you to fall
> >> back to plain sequential scan if you get too many index hits.
>
> > Well, what I was thinking of, should the list of TIDs to fetch get too
> > long, was just to break it down into chunks.
>
> But then you lose the possibility of combining multiple indexes through
> bitmap AND/OR steps, which seems quite interesting to me. If you've
> visited only a part of each index then you can't apply that concept.
When the tuples are small relative to pagesize, you may get some
"compression" by saving just pages and not the actual tids in the the
bitmap.
-------------
Hannu
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
From pgsql-hackers-owner+M26166@postgresql.org Wed Aug 7 00:55:52 2002
Date: Wed, 7 Aug 2002 13:55:41 +0900 (JST)
From: Curt Sampson <cjs@cynic.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: mark Kirkwood <markir@slithery.org>, Gavin Sherry <swm@linuxworld.com.au>,
Bruce Momjian <pgman@candle.pha.pa.us>, <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] CLUSTER and indisclustered
In-Reply-To: <12593.1028695307@sss.pgh.pa.us>
Message-ID: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
X-Virus-Scanned: by AMaViS new-20020517
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length: 1840
On Wed, 7 Aug 2002, Tom Lane wrote:
> I thought to myself "didn't I just post something about that?"
> and then realized it was on a different mailing list. Here ya go
> (and no, this is not the first time around on this list either...)
Wow. I'm glad to see you looking at this, because this feature would do
*so* much for the performance of some of my queries, and really, really
impress my "billion-row-database" client.
> The idea is that you don't scan the index and base table concurrently
> as we presently do it. Instead, you scan the index and make a list
> of the TIDs of the table tuples you need to visit.
Right.
> Also, the main downside of this approach is that the bitmap could
> get large --- but you could have some logic that causes you to fall
> back to plain sequential scan if you get too many index hits.
Well, what I was thinking of, should the list of TIDs to fetch get too
long, was just to break it down into chunks. If you want to limit to,
say, 1000 TIDs, and your index has 3000, just do the first 1000, then
the next 1000, then the last 1000. This would still result in much less
disk head movement and speed the query immensely.
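Something like this, say (sketch; the Tid type and the 1000-TID budget
are illustrative):

    #include <stdlib.h>

    #define CHUNK 1000  /* illustrative TID budget per pass */

    typedef struct { unsigned block; unsigned offset; } Tid;

    static int tid_cmp(const void *a, const void *b)
    {
        const Tid *x = a, *y = b;
        if (x->block != y->block)
            return x->block < y->block ? -1 : 1;
        return (x->offset > y->offset) - (x->offset < y->offset);
    }

    /* Sort and fetch CHUNK tids at a time: memory stays bounded, but
     * each chunk is still visited in physical order, so most of the
     * head-movement win survives. */
    void chunked_fetch(Tid *tids, size_t n)
    {
        for (size_t start = 0; start < n; start += CHUNK) {
            size_t len = n - start < CHUNK ? n - start : CHUNK;
            qsort(tids + start, len, sizeof(Tid), tid_cmp);
            /* fetch these len tuples in block order, then continue */
        }
    }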
(BTW, I have verified this empirically during testing of random read vs.
random write on a RAID controller. The writes were 5-10 times faster
than the reads because the controller was caching a number of writes and
then doing them in the best possible order, whereas the reads had to be
satisfied in the order they were submitted to the controller.)
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/users-lounge/docs/faq.html
From pgsql-hackers-owner+M26167@postgresql.org Wed Aug 7 01:12:54 2002
To: Curt Sampson <cjs@cynic.net>
cc: mark Kirkwood <markir@slithery.org>, Gavin Sherry <swm@linuxworld.com.au>,
Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CLUSTER and indisclustered
In-Reply-To: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
Comments: In-reply-to Curt Sampson <cjs@cynic.net>
message dated "Wed, 07 Aug 2002 13:55:41 +0900"
Date: Wed, 07 Aug 2002 01:12:28 -0400
Message-ID: <12776.1028697148@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by AMaViS new-20020517
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length: 1428
Curt Sampson <cjs@cynic.net> writes:
> On Wed, 7 Aug 2002, Tom Lane wrote:
>> Also, the main downside of this approach is that the bitmap could
>> get large --- but you could have some logic that causes you to fall
>> back to plain sequential scan if you get too many index hits.
> Well, what I was thinking of, should the list of TIDs to fetch get too
> long, was just to break it down into chunks.
But then you lose the possibility of combining multiple indexes through
bitmap AND/OR steps, which seems quite interesting to me. If you've
visited only a part of each index then you can't apply that concept.
Another point to keep in mind is that the bigger the bitmap gets, the
less useful an indexscan is, by definition --- sooner or later you might
as well fall back to a seqscan. So the idea of lossy compression of a
large bitmap seems really ideal to me. In principle you could seqscan
the parts of the table where matching tuples are thick on the ground,
and indexscan the parts where they ain't. Maybe this seems natural
to me as an old JPEG campaigner, but if you don't see the logic I
recommend thinking about it a little ...
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
From tgl@sss.pgh.pa.us Wed Aug 7 09:27:05 2002
To: Hannu Krosing <hannu@tm.ee>
cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>,
Gavin Sherry <swm@linuxworld.com.au>,
Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CLUSTER and indisclustered
In-Reply-To: <1028726966.13418.12.camel@taru.tm.ee>
References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net> <12776.1028697148@sss.pgh.pa.us> <1028695589.2133.11.camel@rh72.home.ee> <1028726966.13418.12.camel@taru.tm.ee>
Comments: In-reply-to Hannu Krosing <hannu@tm.ee>
message dated "07 Aug 2002 15:29:26 +0200"
Date: Wed, 07 Aug 2002 09:26:42 -0400
Message-ID: <15010.1028726802@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Content-Length: 1120
Hannu Krosing <hannu@tm.ee> writes:
> Now I remembered my original preference for page bitmaps (vs. tuple
> bitmaps): one can't actually make good use of a bitmap of tuples because
> there is no fixed tuples/page ratio and thus no way to quickly go from
> bit position to actual tuple. You mention the same problem but propose a
> different solution.
> > Using a page bitmap, we will at least avoid fetching any unneeded pages -
> essentially we will have a sequential scan over possibly interesting
> pages.
Right. One form of the "lossy compression" idea I suggested is to
switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
too large to work with. Again, one could imagine doing that only in
denser areas of the bitmap.
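The fallback itself is cheap (sketch; a real version would work
word-at-a-time, and perhaps only on the dense regions):

    #include <stdint.h>
    #include <stddef.h>

    #define BITS_PER_PAGE 256  /* illustrative tuple positions per page */

    /* Collapse a per-tuple bitmap into a per-page bitmap.  A set page
     * bit now means "recheck every tuple on this page": lossy, but
     * smaller by a factor of BITS_PER_PAGE. */
    void make_lossy(const uint8_t *tuple_bits, size_t npages,
                    uint8_t *page_bits)
    {
        for (size_t p = 0; p < npages; p++)
            for (size_t t = 0; t < BITS_PER_PAGE; t++) {
                size_t bit = p * BITS_PER_PAGE + t;
                if (tuple_bits[bit / 8] & (1u << (bit % 8))) {
                    page_bits[p / 8] |= 1u << (p % 8);
                    break;
                }
            }
    }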
> But I guess that CLUSTER support for INSERT will not be touched for 7.3
> nor will real bitmap indexes ;)
All of this is far-future work I think. Adding a new scan type to the
executor would probably be pretty localized, but the ramifications in
the planner could be extensive --- especially if you want to do plans
involving ANDed or ORed bitmaps.
regards, tom lane
From pgsql-hackers-owner+M26178@postgresql.org Wed Aug 7 08:28:14 2002
X-Authentication-Warning: taru.tm.ee: hannu set sender to hannu@tm.ee using -f
Subject: Re: [HACKERS] CLUSTER and indisclustered
From: Hannu Krosing <hannu@tm.ee>
To: Hannu Krosing <hannu@tm.ee>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Curt Sampson <cjs@cynic.net>,
mark Kirkwood <markir@slithery.org>, Gavin Sherry <swm@linuxworld.com.au>,
Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
In-Reply-To: <1028695589.2133.11.camel@rh72.home.ee>
References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
<12776.1028697148@sss.pgh.pa.us> <1028695589.2133.11.camel@rh72.home.ee>
X-Mailer: Ximian Evolution 1.0.3.99
Date: 07 Aug 2002 15:29:26 +0200
Message-ID: <1028726966.13418.12.camel@taru.tm.ee>
X-Virus-Scanned: by AMaViS new-20020517
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length: 1837
On Wed, 2002-08-07 at 06:46, Hannu Krosing wrote:
> On Wed, 2002-08-07 at 10:12, Tom Lane wrote:
> > Curt Sampson <cjs@cynic.net> writes:
> > > On Wed, 7 Aug 2002, Tom Lane wrote:
> > >> Also, the main downside of this approach is that the bitmap could
> > >> get large --- but you could have some logic that causes you to fall
> > >> back to plain sequential scan if you get too many index hits.
> >
> > > Well, what I was thinking of, should the list of TIDs to fetch get too
> > > long, was just to break it down into chunks.
> >
> > But then you lose the possibility of combining multiple indexes through
> > bitmap AND/OR steps, which seems quite interesting to me. If you've
> > visited only a part of each index then you can't apply that concept.
>
> When the tuples are small relative to pagesize, you may get some
> "compression" by saving just pages and not the actual tids in the the
> bitmap.
Now I remembered my original preference for page bitmaps (vs. tuple
bitmaps): one can't actually make good use of a bitmap of tuples because
there is no fixed tuples/page ratio and thus no way to quickly go from
bit position to actual tuple. You mention the same problem but propose a
different solution.
Using a page bitmap, we will at least avoid fetching any unneeded pages -
essentially we will have a sequential scan over possibly interesting
pages.
If we were to use a page-bitmap index for something with only a few values,
like booleans, some insert-time local clustering should be useful, so
that TRUEs and FALSEs end up on different pages.
But I guess that CLUSTER support for INSERT will not be touched for 7.3
nor will real bitmap indexes ;)
---------------
Hannu
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
From pgsql-hackers-owner+M26192@postgresql.org Wed Aug 7 10:26:30 2002
To: Hannu Krosing <hannu@tm.ee>
cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>,
Gavin Sherry <swm@linuxworld.com.au>,
Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CLUSTER and indisclustered
In-Reply-To: <1028733234.13418.113.camel@taru.tm.ee>
References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net> <12776.1028697148@sss.pgh.pa.us> <1028695589.2133.11.camel@rh72.home.ee> <1028726966.13418.12.camel@taru.tm.ee> <15010.1028726802@sss.pgh.pa.us> <1028733234.13418.113.camel@taru.tm.ee>
Comments: In-reply-to Hannu Krosing <hannu@tm.ee>
message dated "07 Aug 2002 17:13:54 +0200"
Date: Wed, 07 Aug 2002 10:26:13 -0400
Message-ID: <15622.1028730373@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by AMaViS new-20020517
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length: 1224
Hannu Krosing <hannu@tm.ee> writes:
> On Wed, 2002-08-07 at 15:26, Tom Lane wrote:
>> Right. One form of the "lossy compression" idea I suggested is to
>> switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
>> too large to work with.
> If it is a real bitmap, should it not be easiest to allocate at the
> start?
But it isn't a "real bitmap". That would be a really poor
implementation, both for space and speed --- do you really want to scan
over a couple of megs of zeroes to find the few one-bits you care about,
in the typical case? "Bitmap" is a convenient term because it describes
the abstract behavior we want, but the actual data structure will
probably be nontrivial. If I recall Ann's description correctly,
Firebird's implementation uses run length coding of some kind (anyone
care to dig in their source and get all the details?). If we tried
anything in the way of lossy compression then there'd be even more stuff
lurking under the hood.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
From pgsql-hackers-owner+M26188@postgresql.org Wed Aug 7 10:12:26 2002
X-Authentication-Warning: taru.tm.ee: hannu set sender to hannu@tm.ee using -f
Subject: Re: [HACKERS] CLUSTER and indisclustered
From: Hannu Krosing <hannu@tm.ee>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>,
Gavin Sherry <swm@linuxworld.com.au>,
Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
In-Reply-To: <15010.1028726802@sss.pgh.pa.us>
References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
<12776.1028697148@sss.pgh.pa.us> <1028695589.2133.11.camel@rh72.home.ee>
<1028726966.13418.12.camel@taru.tm.ee> <15010.1028726802@sss.pgh.pa.us>
X-Mailer: Ximian Evolution 1.0.3.99
Date: 07 Aug 2002 17:13:54 +0200
Message-ID: <1028733234.13418.113.camel@taru.tm.ee>
X-Virus-Scanned: by AMaViS new-20020517
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length: 2812
On Wed, 2002-08-07 at 15:26, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > Now I remembered my original preference for page bitmaps (vs. tuple
> > bitmaps): one can't actually make good use of a bitmap of tuples because
> > there is no fixed tuples/page ratio and thus no way to quickly go from
> > bit position to actual tuple. You mention the same problem but propose a
> > different solution.
>
> > Using a page bitmap, we will at least avoid fetching any unneeded pages -
> > essentially we will have a sequential scan over possibly interesting
> > pages.
>
> Right. One form of the "lossy compression" idea I suggested is to
> switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
> too large to work with.
If it is a real bitmap, should it not be easiest to allocate at the
start?
A page bitmap for a 100,000,000 tuple table with 10 tuples/page
(10,000,000 pages) will be sized 10,000,000/8 = 1.25 MB, which does not
look too big to me for that amount of data (the data table itself would
occupy 80 GB).
Even a bitmap of 16 bits/page (with bits 0-14 meaning tuples 0-14 and
bit 15 meaning "seq scan the rest of the page") would consume just 20 MB
of _local_ memory, and would be quite justifiable for a query on a table
that large.
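The arithmetic behind those two figures, as a trivial check:

    #include <stdio.h>

    int main(void)
    {
        long long pages = 100000000LL / 10;                   /* 10,000,000 pages */
        printf("1 bit/page : %lld bytes\n", pages / 8);       /* 1,250,000  ~ 1.25 MB */
        printf("16 bits/pg : %lld bytes\n", pages * 16 / 8);  /* 20,000,000 ~ 20 MB   */
        return 0;
    }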
For a real bitmap index the tuples-per-page should be a user-supplied
tuning parameter.
> Again, one could imagine doing that only in denser areas of the bitmap.
I would hardly call the resulting structure "a bitmap" ;)
And I'm not sure the overhead for a more complex structure would win us
any additional performance for most cases.
> > But I guess that CLUSTER support for INSERT will not be touched for 7.3
> > nor will real bitmap indexes ;)
>
> All of this is far-future work I think.
After we do that we will probably be able to claim support for
"data warehousing" ;)
> Adding a new scan type to the
> executor would probably be pretty localized, but the ramifications in
> the planner could be extensive --- especially if you want to do plans
> involving ANDed or ORed bitmaps.
Also, going to a "smart inserter" which can do local clustering on sets
of real bitmap indexes for INSERTs (and the INSERT side of UPDATE) would
probably be a major change from our current "stupid inserter" ;)
This will not be needed for bitmap resolution higher than 1 bit/page,
but default local clustering on bitmap indexes will probably buy us some
extra performance by avoiding data page fetches when such indexes are
used.
And anyway, the support for INSERT being aware of clustering will
probably come up sometime.
------------
Hannu
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
From hannu@tm.ee Wed Aug 7 11:22:53 2002
X-Authentication-Warning: taru.tm.ee: hannu set sender to hannu@tm.ee using -f
Subject: Re: [HACKERS] CLUSTER and indisclustered
From: Hannu Krosing <hannu@tm.ee>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>,
Gavin Sherry <swm@linuxworld.com.au>,
Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
In-Reply-To: <15622.1028730373@sss.pgh.pa.us>
References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
<12776.1028697148@sss.pgh.pa.us> <1028695589.2133.11.camel@rh72.home.ee>
<1028726966.13418.12.camel@taru.tm.ee> <15010.1028726802@sss.pgh.pa.us>
<1028733234.13418.113.camel@taru.tm.ee> <15622.1028730373@sss.pgh.pa.us>
X-Mailer: Ximian Evolution 1.0.3.99
Date: 07 Aug 2002 18:24:30 +0200
Message-ID: <1028737470.13419.182.camel@taru.tm.ee>
Content-Length: 2382
On Wed, 2002-08-07 at 16:26, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > On Wed, 2002-08-07 at 15:26, Tom Lane wrote:
> >> Right. One form of the "lossy compression" idea I suggested is to
> >> switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
> >> too large to work with.
>
> > If it is a real bitmap, should it not be easiest to allocate at the
> > start?
>
> But it isn't a "real bitmap". That would be a really poor
> implementation, both for space and speed --- do you really want to scan
> over a couple of megs of zeroes to find the few one-bits you care about,
> in the typical case?
I guess that depends on the data. The typical case should be something
the stats process will find out so the optimiser can use it.
The bitmap must be less than 1/48 full (a TID is 48 bits) for an
uncompressed "active-tid-list" to be smaller than a plain bitmap: for
example, with 4,800,000 tuple-bit positions the plain bitmap is 600 kB,
so the TID list wins only below 100,000 matches. If there were some
structure above the list then this ratio would be even higher.
I have had good experience using "compressed delta lists", which scale
well over the whole "fullness" spectrum of the bitmap, but this is for
storage, not for the initial construction of the lists.
> "Bitmap" is a convenient term because it describes
> the abstract behavior we want, but the actual data structure will
> probably be nontrivial. If I recall Ann's description correctly,
> Firebird's implementation uses run length coding of some kind (anyone
> care to dig in their source and get all the details?).
Plain RLE is probably a good way to store it, and for merging two or
more bitmaps, but not as good for constructing it bit by bit. I guess
the most effective structure for updating is often still a plain bitmap
(maybe not if it is very sparse and all of it does not fit in cache),
followed by some kind of balanced tree (maybe an rb-tree).
If the bitmap is relatively full then the plain bitmap is almost always
the most effective to update.
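For the storage/merge side, runs do make AND a plain linear merge of
two sorted run lists (sketch only; I do not know Firebird's actual
encoding):

    #include <stdint.h>
    #include <stddef.h>

    typedef struct { uint32_t start, len; } Run;  /* run of consecutive 1-bits */

    /* AND two run-length bitmaps: walk the two sorted run lists and
     * emit the overlap of each pair of runs.  OR is the same walk
     * emitting the unions instead. */
    size_t runs_and(const Run *a, size_t na, const Run *b, size_t nb, Run *out)
    {
        size_t i = 0, j = 0, n = 0;
        while (i < na && j < nb) {
            uint32_t lo  = a[i].start > b[j].start ? a[i].start : b[j].start;
            uint32_t ahi = a[i].start + a[i].len;
            uint32_t bhi = b[j].start + b[j].len;
            uint32_t hi  = ahi < bhi ? ahi : bhi;
            if (lo < hi)
                out[n++] = (Run){ lo, hi - lo };
            if (ahi < bhi)
                i++;
            else
                j++;
        }
        return n;
    }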
> If we tried anything in the way of lossy compression then there'd
> be even more stuff lurking under the hood.
Having a three-valued (0, 1, maybe) RLE-encoded "tritmap" would be a
good way to represent lossy compression, and it would also be quite
straightforward to merge two of these using AND or OR. It may even be
possible to construct it easily using a fixed-length b-tree, going
from 1 to "maybe" for nodes that get too dense.
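The merge rules for such a tritmap would be simply (a sketch of the
semantics only):

    typedef enum { T_NO = 0, T_YES = 1, T_MAYBE = 2 } Trit;

    /* 0 dominates AND and 1 dominates OR; "maybe" marks a lossy region
     * whose pages must be rechecked tuple-by-tuple at fetch time. */
    static Trit trit_and(Trit a, Trit b)
    {
        if (a == T_NO || b == T_NO)   return T_NO;
        if (a == T_YES && b == T_YES) return T_YES;
        return T_MAYBE;
    }

    static Trit trit_or(Trit a, Trit b)
    {
        if (a == T_YES || b == T_YES) return T_YES;
        if (a == T_NO && b == T_NO)   return T_NO;
        return T_MAYBE;
    }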
---------------
Hannu