From pgsql-performance-owner+M17204@postgresql.org Wed Feb 15 16:28:34 2006
Return-path: <pgsql-performance-owner+M17204@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k1FLSV527014
for <pgman@candle.pha.pa.us>; Wed, 15 Feb 2006 16:28:31 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 168C967B584;
Wed, 15 Feb 2006 17:28:29 -0400 (AST)
X-Original-To: pgsql-performance-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id BB0AB9DCB9E
for <pgsql-performance-postgresql.org@localhost.postgresql.org>; Wed, 15 Feb 2006 17:27:56 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 22055-07
for <pgsql-performance-postgresql.org@localhost.postgresql.org>;
Wed, 15 Feb 2006 17:27:57 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id F385E9DCB98
for <pgsql-performance@postgresql.org>; Wed, 15 Feb 2006 17:27:53 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k1FLRsqd019780;
Wed, 15 Feb 2006 16:27:54 -0500 (EST)
To: Gary Doades <gpd@gpdnet.co.uk>
cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Strange Create Index behaviour
In-Reply-To: <19510.1140036968@sss.pgh.pa.us>
References: <43F38867.6010701@gpdnet.co.uk> <19510.1140036968@sss.pgh.pa.us>
Comments: In-reply-to Tom Lane <tgl@sss.pgh.pa.us>
message dated "Wed, 15 Feb 2006 15:56:08 -0500"
Date: Wed, 15 Feb 2006 16:27:54 -0500
Message-ID: <19779.1140038874@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.11 required=5 tests=[AWL=0.110]
X-Spam-Score: 0.11
X-Mailing-List: pgsql-performance
List-Archive: <http://archives.postgresql.org/pgsql-performance>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-performance.postgresql.org>
List-Owner: <mailto:pgsql-performance-owner@postgresql.org>
List-Post: <mailto:pgsql-performance@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-performance>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-performance>
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org
Status: ORr
I wrote:
> Interesting. I tried your test script and got fairly close times
> for all the cases on two different machines:
> old HPUX machine: shortest 5800 msec, longest 7960 msec
> new Fedora 4 machine: shortest 461 msec, longest 608 msec
> So what this looks like to me is a corner case that FreeBSD's qsort
> fails to handle well.
I tried forcing PG to use src/port/qsort.c on the Fedora machine,
and lo and behold:
new Fedora 4 machine: shortest 434 msec, longest 8530 msec
So it sure looks like this script does expose a problem on BSD-derived
qsorts. Curiously, the case that's much the worst for me is the third
in the script, while the shortest time is the first case, which was slow
for Gary. So I'd venture that the *BSD code has been tweaked somewhere
along the way, in a manner that moves the problem around without really
fixing it. (Anyone want to compare the actual FreeBSD source to what
we have?)
This is pretty relevant stuff, because there was a thread recently
advocating that we stop using the platform qsort on all platforms:
It's really interesting to see a case where port/qsort is radically
worse than other qsorts ... unless we figure that out and fix it,
I think the idea of using port/qsort everywhere has just taken a
major hit.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
From pgsql-performance-owner+M17212@postgresql.org Wed Feb 15 18:29:07 2006
Return-path: <pgsql-performance-owner+M17212@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k1FNT6509074
for <pgman@candle.pha.pa.us>; Wed, 15 Feb 2006 18:29:06 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 2BE6267B58B;
Wed, 15 Feb 2006 19:29:04 -0400 (AST)
X-Original-To: pgsql-performance-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 7C3D49DC803;
Wed, 15 Feb 2006 19:28:30 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 47149-10; Wed, 15 Feb 2006 19:28:32 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id C56AD9DC843;
Wed, 15 Feb 2006 19:28:27 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k1FNSTkm020782;
Wed, 15 Feb 2006 18:28:29 -0500 (EST)
To: Gary Doades <gpd@gpdnet.co.uk>
cc: pgsql-performance@postgresql.org, pgsql-hackers@postgresql.org
Subject: qsort again (was Re: [PERFORM] Strange Create Index behaviour)
In-Reply-To: <43F39E53.1020009@gpdnet.co.uk>
References: <43F38867.6010701@gpdnet.co.uk> <19510.1140036968@sss.pgh.pa.us> <19779.1140038874@sss.pgh.pa.us> <43F39E53.1020009@gpdnet.co.uk>
Comments: In-reply-to Gary Doades <gpd@gpdnet.co.uk>
message dated "Wed, 15 Feb 2006 21:34:11 +0000"
Date: Wed, 15 Feb 2006 18:28:29 -0500
Message-ID: <20781.1140046109@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.11 required=5 tests=[AWL=0.110]
X-Spam-Score: 0.11
X-Mailing-List: pgsql-performance
List-Archive: <http://archives.postgresql.org/pgsql-performance>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-performance.postgresql.org>
List-Owner: <mailto:pgsql-performance-owner@postgresql.org>
List-Post: <mailto:pgsql-performance@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-performance>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-performance>
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org
Status: OR
Gary Doades <gpd@gpdnet.co.uk> writes:
> If I run the script again, it is not always the first case that is slow,
> it varies from run to run, which is why I repeated it quite a few times
> for the test.
For some reason I hadn't immediately twigged to the fact that your test
script is just N repetitions of the exact same structure with random data.
So it's not so surprising that you get random variations in behavior
with different test data sets.
I did some experimentation comparing the qsort from Fedora Core 4
(glibc-2.3.5-10.3) with our src/port/qsort.c. For those who weren't
following the pgsql-performance thread, the test case is just this
repeated a lot of times:
create table atest(i int4, r int4);
insert into atest (i,r) select generate_series(1,100000), 0;
insert into atest (i,r) select generate_series(1,100000), random()*100000;
create index idx on atest(r);
drop table atest;
I did this 100 times and sorted the reported runtimes. (Investigation
with trace_sort = on confirms that the runtime is almost entirely spent
in qsort() called from our performsort --- the Postgres overhead is
about 100msec on this machine.) Results are below.
It seems clear that our qsort.c is doing a pretty awful job of picking
qsort pivots, while glibc is mostly managing not to make that mistake.
I haven't looked at the glibc code yet to see what they are doing differently.
I'd say this puts a considerable damper on my enthusiasm for using our
qsort all the time, as was recently debated in this thread:
We need to fix our qsort.c before pushing ahead with that idea.
regards, tom lane
100 runtimes for glibc qsort, sorted ascending:
Time: 459.860 ms
Time: 460.209 ms
Time: 460.704 ms
Time: 461.317 ms
Time: 461.538 ms
Time: 461.652 ms
Time: 461.988 ms
Time: 462.573 ms
Time: 462.638 ms
Time: 462.716 ms
Time: 462.917 ms
Time: 463.219 ms
Time: 463.455 ms
Time: 463.650 ms
Time: 463.723 ms
Time: 463.737 ms
Time: 463.750 ms
Time: 463.852 ms
Time: 463.964 ms
Time: 463.988 ms
Time: 464.003 ms
Time: 464.135 ms
Time: 464.372 ms
Time: 464.458 ms
Time: 464.496 ms
Time: 464.551 ms
Time: 464.599 ms
Time: 464.655 ms
Time: 464.656 ms
Time: 464.722 ms
Time: 464.814 ms
Time: 464.827 ms
Time: 464.878 ms
Time: 464.899 ms
Time: 464.905 ms
Time: 464.987 ms
Time: 465.055 ms
Time: 465.138 ms
Time: 465.159 ms
Time: 465.194 ms
Time: 465.310 ms
Time: 465.316 ms
Time: 465.375 ms
Time: 465.450 ms
Time: 465.535 ms
Time: 465.595 ms
Time: 465.680 ms
Time: 465.769 ms
Time: 465.865 ms
Time: 465.892 ms
Time: 465.903 ms
Time: 466.003 ms
Time: 466.154 ms
Time: 466.164 ms
Time: 466.203 ms
Time: 466.305 ms
Time: 466.344 ms
Time: 466.364 ms
Time: 466.388 ms
Time: 466.502 ms
Time: 466.593 ms
Time: 466.725 ms
Time: 466.794 ms
Time: 466.798 ms
Time: 466.904 ms
Time: 466.971 ms
Time: 466.997 ms
Time: 467.122 ms
Time: 467.146 ms
Time: 467.221 ms
Time: 467.224 ms
Time: 467.244 ms
Time: 467.277 ms
Time: 467.587 ms
Time: 468.142 ms
Time: 468.207 ms
Time: 468.237 ms
Time: 468.471 ms
Time: 468.663 ms
Time: 468.700 ms
Time: 469.235 ms
Time: 469.840 ms
Time: 470.472 ms
Time: 471.140 ms
Time: 472.811 ms
Time: 472.959 ms
Time: 474.858 ms
Time: 477.210 ms
Time: 479.571 ms
Time: 479.671 ms
Time: 482.797 ms
Time: 488.852 ms
Time: 514.639 ms
Time: 529.287 ms
Time: 612.185 ms
Time: 660.748 ms
Time: 742.227 ms
Time: 866.814 ms
Time: 1234.848 ms
Time: 1267.398 ms
100 runtimes for port/qsort.c, sorted ascending:
Time: 418.905 ms
Time: 420.611 ms
Time: 420.764 ms
Time: 420.904 ms
Time: 421.706 ms
Time: 422.466 ms
Time: 422.627 ms
Time: 423.189 ms
Time: 423.302 ms
Time: 425.096 ms
Time: 425.731 ms
Time: 425.851 ms
Time: 427.253 ms
Time: 430.113 ms
Time: 432.756 ms
Time: 432.963 ms
Time: 440.502 ms
Time: 440.640 ms
Time: 450.452 ms
Time: 458.143 ms
Time: 459.212 ms
Time: 467.706 ms
Time: 468.006 ms
Time: 468.574 ms
Time: 470.003 ms
Time: 472.313 ms
Time: 483.622 ms
Time: 492.395 ms
Time: 509.564 ms
Time: 531.037 ms
Time: 533.366 ms
Time: 535.610 ms
Time: 575.523 ms
Time: 582.688 ms
Time: 593.545 ms
Time: 647.364 ms
Time: 660.612 ms
Time: 677.312 ms
Time: 680.288 ms
Time: 697.626 ms
Time: 833.066 ms
Time: 834.511 ms
Time: 851.819 ms
Time: 920.443 ms
Time: 926.731 ms
Time: 954.289 ms
Time: 1045.214 ms
Time: 1059.200 ms
Time: 1062.328 ms
Time: 1136.018 ms
Time: 1260.091 ms
Time: 1276.883 ms
Time: 1319.351 ms
Time: 1438.854 ms
Time: 1475.457 ms
Time: 1538.211 ms
Time: 1549.004 ms
Time: 1744.642 ms
Time: 1771.258 ms
Time: 1959.530 ms
Time: 2300.140 ms
Time: 2589.641 ms
Time: 2612.780 ms
Time: 3100.024 ms
Time: 3284.125 ms
Time: 3379.792 ms
Time: 3750.278 ms
Time: 4302.278 ms
Time: 4780.624 ms
Time: 5000.056 ms
Time: 5092.604 ms
Time: 5168.722 ms
Time: 5292.941 ms
Time: 5895.964 ms
Time: 7003.164 ms
Time: 7099.449 ms
Time: 7115.083 ms
Time: 7384.940 ms
Time: 8214.010 ms
Time: 8700.771 ms
Time: 9331.225 ms
Time: 10503.360 ms
Time: 12496.026 ms
Time: 12982.474 ms
Time: 15192.390 ms
Time: 15392.161 ms
Time: 15958.295 ms
Time: 18375.693 ms
Time: 18617.706 ms
Time: 18927.515 ms
Time: 19898.018 ms
Time: 20865.979 ms
Time: 21000.907 ms
Time: 21297.585 ms
Time: 21714.518 ms
Time: 25423.235 ms
Time: 27543.052 ms
Time: 28314.182 ms
Time: 29400.278 ms
Time: 34142.534 ms
From pgsql-hackers-owner+M79733@postgresql.org Wed Feb 15 20:22:07 2006
Return-path: <pgsql-hackers-owner+M79733@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k1G1M6529533
for <pgman@candle.pha.pa.us>; Wed, 15 Feb 2006 20:22:06 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id E5C5467B58F;
Wed, 15 Feb 2006 21:22:03 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 3DAA69DCACE;
Wed, 15 Feb 2006 21:21:34 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 76351-01; Wed, 15 Feb 2006 21:21:36 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id 2FBB59DCA3F;
Wed, 15 Feb 2006 21:21:31 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k1G1LXXi021616;
Wed, 15 Feb 2006 20:21:33 -0500 (EST)
To: Ron <rjpeace@earthlink.net>
cc: pgsql-performance@postgresql.org, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
In-Reply-To: <>
References: <43F38867.6010701@gpdnet.co.uk> <19510.1140036968@sss.pgh.pa.us> <19779.1140038874@sss.pgh.pa.us> <43F39E53.1020009@gpdnet.co.uk> <20781.1140046109@sss.pgh.pa.us> <>
Comments: In-reply-to Ron <rjpeace@earthlink.net>
message dated "Wed, 15 Feb 2006 19:57:51 -0500"
Date: Wed, 15 Feb 2006 20:21:33 -0500
Message-ID: <21615.1140052893@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.11 required=5 tests=[AWL=0.110]
X-Spam-Score: 0.11
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
Ron <rjpeace@earthlink.net> writes:
> How are we choosing our pivots?
See qsort.c: it looks like median of nine equally spaced inputs (ie,
the 1/8th points of the initial input array, plus the end points),
implemented as two rounds of median-of-three choices. With half of the
data inputs zero, it's not too improbable for two out of the three
samples to be zeroes in which case I think the med3 result will be zero
--- so choosing a pivot of zero is much more probable than one would
like, and doing so in many levels of recursion causes the problem.
I think. I'm not too sure if the code isn't just being sloppy about the
case where many data values are equal to the pivot --- there's a special
case there to switch to insertion sort, and maybe that's getting invoked
too soon. It'd be useful to get a line-level profile of the behavior of
this code in the slow cases...
regards, tom lane
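For anyone who wants to experiment with the pivot choice described above, here is a minimal Python sketch of a median-of-nine selection built from two rounds of median-of-three. It is modeled on the description in this message rather than copied from src/port/qsort.c; the names med3 and ninther_pivot and the exact sample positions are assumptions made for illustration.

```python
def med3(a, b, c):
    """Return the median of three values."""
    return sorted((a, b, c))[1]


def ninther_pivot(data):
    """Pick a pivot as the median of three medians-of-three.

    Samples are taken near the start, middle and end of the input,
    spaced n // 8 apart (an assumed layout, not the qsort.c source).
    """
    n = len(data)
    if n < 9:
        raise ValueError("need at least 9 elements for nine samples")
    d = n // 8
    lo, mid, hi = 0, n // 2, n - 1
    first = med3(data[lo], data[lo + d], data[lo + 2 * d])
    middle = med3(data[mid - d], data[mid], data[mid + d])
    last = med3(data[hi - 2 * d], data[hi - d], data[hi])
    return med3(first, middle, last)


if __name__ == "__main__":
    print(ninther_pivot(list(range(9))))               # distinct values
    print(ninther_pivot([0, 0, 0, 0, 0, 5, 6, 7, 8]))  # many zeroes
```

In the second call five of the nine values are zero, so two of the three first-round medians are zero and the final pivot is zero as well.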
From pgsql-performance-owner+M17282@postgresql.org Fri Feb 17 23:11:11 2006
Return-path: <pgsql-performance-owner+M17282@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k1I4BA515503
for <pgman@candle.pha.pa.us>; Fri, 17 Feb 2006 23:11:10 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 2825F67B5F5;
Sat, 18 Feb 2006 00:11:07 -0400 (AST)
X-Original-To: pgsql-performance-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 7BB8A9DCC4F;
Wed, 15 Feb 2006 21:37:57 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 79365-02; Wed, 15 Feb 2006 21:38:00 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from postal.corporate.connx.com (postal.corporate.connx.com [])
by postgresql.org (Postfix) with ESMTP id 33BEA9DCACE;
Wed, 15 Feb 2006 21:37:54 -0400 (AST)
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Date: Wed, 15 Feb 2006 17:37:58 -0800
Message-ID: <D425483C2C5C9F49B5B7A41F8944154757D54C@postal.corporate.connx.com>
Thread-Topic: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Thread-Index: AcYyl2fPgxfNXHIRRyOEN4ZGeHtA3wAAEaNQ
From: "Dann Corbit" <DCorbit@connx.com>
To: "Tom Lane" <tgl@sss.pgh.pa.us>, "Ron" <rjpeace@earthlink.net>
cc: <pgsql-performance@postgresql.org>, <pgsql-hackers@postgresql.org>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.075 required=5 tests=[AWL=0.075]
X-Spam-Score: 0.075
X-Mailing-List: pgsql-performance
List-Archive: <http://archives.postgresql.org/pgsql-performance>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-performance.postgresql.org>
List-Owner: <mailto:pgsql-performance-owner@postgresql.org>
List-Post: <mailto:pgsql-performance@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-performance>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-performance>
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id k1I4BA515503
Status: ORr
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Tom Lane
> Sent: Wednesday, February 15, 2006 5:22 PM
> To: Ron
> Cc: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
> behaviour)
> Ron <rjpeace@earthlink.net> writes:
> > How are we choosing our pivots?
> See qsort.c: it looks like median of nine equally spaced inputs (ie,
> the 1/8th points of the initial input array, plus the end points),
> implemented as two rounds of median-of-three choices. With half of the
> data inputs zero, it's not too improbable for two out of the three
> samples to be zeroes in which case I think the med3 result will be zero
> --- so choosing a pivot of zero is much more probable than one would
> like, and doing so in many levels of recursion causes the problem.
Adding some randomness to the selection of the pivot is a known
technique to fix the oddball partitions problem. However, Bentley and
Sedgewick proved that every quick sort algorithm has some input set that
makes it go quadratic (hence the recent popularity of introspective
sort, which switches to heapsort if quadratic behavior is detected. The
C++ template I submitted was an example of introspective sort, but
PostgreSQL does not use C++ so it was not helpful).
> I think. I'm not too sure if the code isn't just being sloppy about the
> case where many data values are equal to the pivot --- there's a special
> case there to switch to insertion sort, and maybe that's getting invoked
> too soon.
Here are some cases known to make qsort go quadratic:
1. Data already sorted
2. Data reverse sorted
3. Data organ-pipe sorted or ramp
4. Almost all data of the same value
There are probably other cases. Randomizing the pivot helps some, as
does checking for in-order or reverse-order partitions.
Imagine if 1/3 of the partitions fall into a category that causes
quadratic behavior (have one of the above formats and have more than
CUTOFF elements in them).
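The effect of the cases listed above is easy to reproduce with a deliberately naive quicksort. The Python sketch below always takes the first element as the pivot and counts comparisons; it is an illustration only and is not how port/qsort.c or glibc choose pivots. The names naive_quicksort and shuffled are made up for this example.

```python
import random


def naive_quicksort(items):
    """Sort a copy of items with a first-element pivot.

    Returns (sorted_list, comparisons). Uses an explicit stack so that
    degenerate inputs do not hit Python's recursion limit.
    """
    data = list(items)
    comparisons = 0
    stack = [(0, len(data))]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:
            continue
        pivot = data[lo]
        store = lo + 1
        for i in range(lo + 1, hi):
            comparisons += 1
            if data[i] < pivot:
                data[i], data[store] = data[store], data[i]
                store += 1
        # Move the pivot between the "< pivot" and ">= pivot" parts.
        data[lo], data[store - 1] = data[store - 1], data[lo]
        stack.append((lo, store - 1))
        stack.append((store, hi))
    return data, comparisons


def shuffled(n, seed):
    """Return 0..n-1 in a reproducible pseudo-random order."""
    values = list(range(n))
    random.Random(seed).shuffle(values)
    return values


if __name__ == "__main__":
    n = 200
    for label, values in (("sorted", list(range(n))),
                          ("all equal", [5] * n),
                          ("shuffled", shuffled(n, 0))):
        print(label, naive_quicksort(values)[1])
```

For sorted or all-equal input of N items this version performs N*(N-1)/2 comparisons, because every partition step peels off a single element.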
It is doubtful that the switch to insertion sort is causing any sort of
problems. It is only going to be invoked on tiny sets, for which it has
a fixed cost that is probably less than qsort() function calls on sets
of the same size.
>It'd be useful to get a line-level profile of the behavior of
> this code in the slow cases...
I guess that my in-order or presorted tests [which often arise when
there are very few distinct values] may solve the bad partition
problems. Don't forget that the algorithm is called recursively.
> regards, tom lane
From kleptog@svana.org Mon Dec 19 06:37:51 2005
Return-path: <kleptog@svana.org>
Received: from svana.org (mail@svana.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBJBboe20936
for <pgman@candle.pha.pa.us>; Mon, 19 Dec 2005 06:37:51 -0500 (EST)
Received: from kleptog by svana.org with local (Exim 3.35 #1 (Debian))
id 1EoJKc-00045V-00; Mon, 19 Dec 2005 22:37:30 +1100
Date: Mon, 19 Dec 2005 12:37:30 +0100
From: Martijn van Oosterhout <kleptog@svana.org>
To: Dann Corbit <DCorbit@connx.com>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Qingqing Zhou <zhouqq@cs.toronto.edu>,
Bruce Momjian <pgman@candle.pha.pa.us>,
Luke Lonergan <llonergan@greenplum.com>, Neil Conway <neilc@samurai.com>,
Subject: Re: [HACKERS] Re: Which qsort is used
Message-ID: <20051219113724.GD12251@svana.org>
Reply-To: Martijn van Oosterhout <kleptog@svana.org>
References: <D425483C2C5C9F49B5B7A41F8944154757D38D@postal.corporate.connx.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature"; boundary="5gxpn/Q6ypwruk0T"
Content-Disposition: inline
In-Reply-To: <D425483C2C5C9F49B5B7A41F8944154757D38D@postal.corporate.connx.com>
User-Agent: Mutt/1.3.28i
X-PGP-Key-ID: Length=1024; ID=0x0DC67BE6
X-PGP-Key-Fingerprint: 295F A899 A81A 156D B522 48A7 6394 F08A 0DC6 7BE6
X-PGP-Key-URL: <http://svana.org/kleptog/0DC67BE6.pgp.asc>
Status: OR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
On Fri, Dec 16, 2005 at 10:43:58PM -0800, Dann Corbit wrote:
> I am actually quite impressed with the excellence of Bentley's sort out
> of the box. It's definitely the best library implementation of a sort I
> have seen.
I'm not sure whether we have a conclusion here, but I do have one
question: is there a significant difference in the number of times the
comparison routines are called? Comparisons in PostgreSQL are fairly
expensive given the fmgr overhead and when comparing tuples it's even
more so.
We don't want to accidentally pick a routine that saves data shuffling by
adding extra comparisons. The stats at [1] don't say. They try to
factor in CPU cost but they seem to use unrealistically small values. I
would think a number around 50 (or higher) would be more realistic.
[1] http://www.cs.toronto.edu/~zhouqq/postgresql/sort/sort.html
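One way to answer the question about comparison counts is to wrap the comparator and count its calls. A minimal Python sketch follows; it uses Python's built-in sorted() purely as a stand-in for whichever sort routine is under test, and the names sort_counting_comparisons and int_cmp are hypothetical.

```python
from functools import cmp_to_key


def sort_counting_comparisons(items, cmp):
    """Sort items with cmp(a, b) and return (sorted_list, call_count)."""
    calls = 0

    def counting_cmp(a, b):
        nonlocal calls
        calls += 1
        return cmp(a, b)

    result = sorted(items, key=cmp_to_key(counting_cmp))
    return result, calls


def int_cmp(a, b):
    """Three-way comparison in the style of a C qsort comparator."""
    return (a > b) - (a < b)


if __name__ == "__main__":
    ordered, calls = sort_counting_comparisons([5, 3, 9, 1, 7], int_cmp)
    print(ordered, calls)
```

Multiplying the call count by an assumed per-comparison cost gives a comparison-weighted estimate that can be set against the data-movement cost of each candidate routine.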
Have a nice day,
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
From mkoi-pg@aon.at Wed Dec 21 19:44:03 2005
Return-path: <mkoi-pg@aon.at>
Received: from email.aon.at (warsl404pip5.highway.telekom.at [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBM0i2e05649
for <pgman@candle.pha.pa.us>; Wed, 21 Dec 2005 19:44:02 -0500 (EST)
Received: (qmail 12703 invoked from network); 22 Dec 2005 00:43:51 -0000
Received: from m148p015.dipool.highway.telekom.at (HELO Sokrates) ([])
(envelope-sender <mkoi-pg@aon.at>)
by smarthub78.highway.telekom.at (qmail-ldap-1.03) with SMTP
for <tgl@sss.pgh.pa.us>; 22 Dec 2005 00:43:51 -0000
From: Manfred Koizar <mkoi-pg@aon.at>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: "Dann Corbit" <DCorbit@connx.com>, "Qingqing Zhou" <zhouqq@cs.toronto.edu>,
"Bruce Momjian" <pgman@candle.pha.pa.us>,
"Luke Lonergan" <llonergan@greenplum.com>,
"Neil Conway" <neilc@samurai.com>, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Re: Which qsort is used
Date: Thu, 22 Dec 2005 01:43:34 +0100
Message-ID: <odqjq1tv6cb77ri4df0aehqal8o0ljtkar@4ax.com>
References: <D425483C2C5C9F49B5B7A41F8944154757D386@postal.corporate.connx.com> <3148.1134795805@sss.pgh.pa.us>
In-Reply-To: <3148.1134795805@sss.pgh.pa.us>
X-Mailer: Forte Agent 3.1/32.783
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR
On Sat, 17 Dec 2005 00:03:25 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
>I've still got a problem with these checks; I think they are a net
>waste of cycles on average. [...]
> and when they fail, those cycles are entirely wasted;
>you have not advanced the state of the sort at all.
How can we make the initial check "advance the state of the sort"?
One answer might be to exclude the sorted sequence at the start of the
array from the qsort, and merge the two sorted lists as the final
stage of the sort.
Qsorting N elements costs O(N*lnN), so excluding H elements from the
sort reduces the cost by at least O(H*lnN). The merge step costs O(N)
plus some (<=50%) more memory, unless someone knows a fast in-place
merge. So depending on the constant factors involved there might be a
usable solution.
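The proposal can be sketched in a few lines of Python. This is only an outline of the idea under stated assumptions: sorted() stands in for the qsort call on the remaining N - H items, heapq.merge performs the final merge using extra memory, and the function names are invented for the example.

```python
import heapq


def sorted_prefix_length(data):
    """Length H of the longest non-decreasing run at the start of data."""
    if not data:
        return 0
    h = 1
    while h < len(data) and data[h - 1] <= data[h]:
        h += 1
    return h


def sort_with_prefix_merge(items):
    """Keep the sorted prefix, sort only the rest, then merge the two."""
    data = list(items)
    h = sorted_prefix_length(data)
    tail = sorted(data[h:])                   # stand-in for qsort on N - H items
    return list(heapq.merge(data[:h], tail))  # linear-time merge of two sorted lists


if __name__ == "__main__":
    print(sort_with_prefix_merge([1, 4, 9, 3, 2, 8]))
```

Finding the prefix costs about H comparisons when the check succeeds for H items, and a single comparison when the first two items are already out of order.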
I've been playing with some numbers and assuming the constant factors
to be equal for all the O()'s this method starts to pay off at
     H    for N
    20      100   20%
   130     1000   13%
  8000   100000    8%
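One possible reading of these figures is the smallest H for which sorting N - H items and then merging costs no more than sorting all N, that is (N-H)*ln(N-H) + N <= N*ln(N) with every constant factor set to 1. That cost model is an assumption made here for illustration; it also leaves out the cost of the initial check. The function name break_even_prefix is hypothetical.

```python
import math


def break_even_prefix(n):
    """Smallest H with (n - H) * ln(n - H) + n <= n * ln(n), for n >= 1.

    Returns None when no prefix length satisfies the inequality.
    """
    full_cost = n * math.log(n)
    for h in range(1, n):
        rest = n - h
        if rest * math.log(rest) + n <= full_cost:
            return h
    return None


if __name__ == "__main__":
    for n in (100, 1000, 100000):
        print(n, break_even_prefix(n))
```

Under this model the thresholds come out close to the 20, 130 and 8000 quoted for N = 100, 1000 and 100000.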
From pgsql-hackers-owner+M77795=pgman=candle.pha.pa.us@postgresql.org Thu Dec 22 02:02:28 2005
Return-path: <pgsql-hackers-owner+M77795=pgman=candle.pha.pa.us@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBM72Re16910
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 02:02:28 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id A31E067AAA0
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 03:02:22 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 2C8EC9DCA92
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Thu, 22 Dec 2005 03:01:56 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 26033-04
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Thu, 22 Dec 2005 03:01:55 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from svana.org (svana.org [])
by postgresql.org (Postfix) with ESMTP id 800859DC81D
for <pgsql-hackers@postgresql.org>; Thu, 22 Dec 2005 03:01:51 -0400 (AST)
Received: from kleptog by svana.org with local (Exim 3.35 #1 (Debian))
id 1EpKRg-0005ox-00; Thu, 22 Dec 2005 18:01:00 +1100
Date: Thu, 22 Dec 2005 08:01:00 +0100
From: Martijn van Oosterhout <kleptog@svana.org>
To: Manfred Koizar <mkoi-pg@aon.at>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Dann Corbit <DCorbit@connx.com>,
Qingqing Zhou <zhouqq@cs.toronto.edu>,
Bruce Momjian <pgman@candle.pha.pa.us>,
Luke Lonergan <llonergan@greenplum.com>, Neil Conway <neilc@samurai.com>,
Subject: Re: [HACKERS] Re: Which qsort is used
Message-ID: <20051222070057.GA21783@svana.org>
Reply-To: Martijn van Oosterhout <kleptog@svana.org>
References: <D425483C2C5C9F49B5B7A41F8944154757D386@postal.corporate.connx.com> <3148.1134795805@sss.pgh.pa.us> <odqjq1tv6cb77ri4df0aehqal8o0ljtkar@4ax.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature"; boundary="FL5UXtIhxfXey3p5"
Content-Disposition: inline
In-Reply-To: <odqjq1tv6cb77ri4df0aehqal8o0ljtkar@4ax.com>
User-Agent: Mutt/1.3.28i
X-PGP-Key-ID: Length=1024; ID=0x0DC67BE6
X-PGP-Key-Fingerprint: 295F A899 A81A 156D B522 48A7 6394 F08A 0DC6 7BE6
X-PGP-Key-URL: <http://svana.org/kleptog/0DC67BE6.pgp.asc>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.065 required=5 tests=[AWL=0.065]
X-Spam-Score: 0.065
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
On Thu, Dec 22, 2005 at 01:43:34AM +0100, Manfred Koizar wrote:
> Qsorting N elements costs O(N*lnN), so excluding H elements from the
> sort reduces the cost by at least O(H*lnN). The merge step costs O(N)
> plus some (<=3D50%) more memory, unless someone knows a fast in-place
> merge. So depending on the constant factors involved there might be a
> usable solution.
But where are you including the cost to check how many cells are
already sorted? That would be O(H), right? This is where we come back
to the issue that comparisons in PostgreSQL are expensive. The cpu_cost
in the tests I saw so far is unrealistically low.
> I've been playing with some numbers and assuming the constant factors
> to be equal for all the O()'s this method starts to pay off at
>      H    for N
>     20      100   20%
>    130     1000   13%
>   8000   100000    8%
Hmm, what are the chances you have 100000 unordered items to sort and
that the first 8% will already be in order. ISTM that that probability
will be close enough to zero to not matter...
Have a nice day,
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
From pgsql-hackers-owner+M77831=pgman=candle.pha.pa.us@postgresql.org Thu Dec 22 16:59:19 2005
Return-path: <pgsql-hackers-owner+M77831=pgman=candle.pha.pa.us@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBMLxJe07480
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 16:59:19 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id D1DBE67AC1B
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 17:59:16 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id BE8249DCBEB
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Thu, 22 Dec 2005 17:58:53 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 64765-01
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Thu, 22 Dec 2005 17:58:54 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from email.aon.at (warsl404pip7.highway.telekom.at [])
by postgresql.org (Postfix) with ESMTP id 3E08E9DCA5C
for <pgsql-hackers@postgresql.org>; Thu, 22 Dec 2005 17:58:49 -0400 (AST)
Received: (qmail 6986 invoked from network); 22 Dec 2005 21:58:49 -0000
Received: from m150p015.dipool.highway.telekom.at (HELO Sokrates) ([])
(envelope-sender <mkoi-pg@aon.at>)
by smarthub76.highway.telekom.at (qmail-ldap-1.03) with SMTP
for <kleptog@svana.org>; 22 Dec 2005 21:58:49 -0000
From: Manfred Koizar <mkoi-pg@aon.at>
To: Martijn van Oosterhout <kleptog@svana.org>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Dann Corbit <DCorbit@connx.com>,
Qingqing Zhou <zhouqq@cs.toronto.edu>,
Bruce Momjian <pgman@candle.pha.pa.us>,
Luke Lonergan <llonergan@greenplum.com>, Neil Conway <neilc@samurai.com>,
Subject: Re: [HACKERS] Re: Which qsort is used
Date: Thu, 22 Dec 2005 22:58:31 +0100
Message-ID: <4r6mq19fe6937mu9130h45ip3oeg135qo3@4ax.com>
References: <D425483C2C5C9F49B5B7A41F8944154757D386@postal.corporate.connx.com> <3148.1134795805@sss.pgh.pa.us> <odqjq1tv6cb77ri4df0aehqal8o0ljtkar@4ax.com> <20051222070057.GA21783@svana.org>
In-Reply-To: <20051222070057.GA21783@svana.org>
X-Mailer: Forte Agent 3.1/32.783
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.398 required=5 tests=[AWL=0.398]
X-Spam-Score: 0.398
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
On Thu, 22 Dec 2005 08:01:00 +0100, Martijn van Oosterhout
<kleptog@svana.org> wrote:
>But where are you including the cost to check how many cells are
>already sorted? That would be O(H), right?
Yes. I didn't mention it, because H < N.
> This is where we come back
>to the issue that comparisons in PostgreSQL are expensive.
So we agree that we should try to reduce the number of comparisons.
How many comparisons does it take to sort 100000 items? 1.5 million?
>Hmm, what are the chances you have 100000 unordered items to sort and
>that the first 8% will already be in order. ISTM that that probability
>will be close enough to zero to not matter...
If the items are totally unordered, the check is so cheap you won't
even notice. OTOH in Tom's example ...
|What I think is much more probable in the Postgres environment
|is almost-but-not-quite-ordered inputs --- eg, a table that was
|perfectly ordered by key when filled, but some of the tuples have since
|been moved by UPDATEs.
... I'd not be surprised if H is 90% of N.
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
From DCorbit@connx.com Thu Dec 22 17:22:03 2005
Return-path: <DCorbit@connx.com>
Received: from postal.corporate.connx.com (postal.corporate.connx.com [])
by candle.pha.pa.us (8.11.6/8.11.6) with SMTP id jBMMLve11671
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 17:22:03 -0500 (EST)
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
Subject: RE: [HACKERS] Re: Which qsort is used
X-MimeOLE: Produced By Microsoft Exchange V6.5
Date: Thu, 22 Dec 2005 14:21:49 -0800
Message-ID: <D425483C2C5C9F49B5B7A41F8944154757D3AC@postal.corporate.connx.com>
Thread-Topic: [HACKERS] Re: Which qsort is used
Thread-Index: AcYHQuXJdKs8JVgmSKywUqld6KYccQAAfWAA
From: "Dann Corbit" <DCorbit@connx.com>
To: "Manfred Koizar" <mkoi-pg@aon.at>,
"Martijn van Oosterhout" <kleptog@svana.org>
cc: "Tom Lane" <tgl@sss.pgh.pa.us>, "Qingqing Zhou" <zhouqq@cs.toronto.edu>,
"Bruce Momjian" <pgman@candle.pha.pa.us>,
"Luke Lonergan" <llonergan@greenplum.com>,
"Neil Conway" <neilc@samurai.com>, <pgsql-hackers@postgresql.org>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id jBMMLve11671
Status: OR
An interesting article on sorting and comparison count:
Here is the article, the code, and an implementation that I have been
toying with:
Algorithm quickheap is especially interesting because it does not
require much additional space (just an array of integers up to size
log(element_count)) and, in addition, it has very few data movements.
> -----Original Message-----
> From: Manfred Koizar [mailto:mkoi-pg@aon.at]
> Sent: Thursday, December 22, 2005 1:59 PM
> To: Martijn van Oosterhout
> Cc: Tom Lane; Dann Corbit; Qingqing Zhou; Bruce Momjian; Luke Lonergan;
> Neil Conway; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Re: Which qsort is used
> On Thu, 22 Dec 2005 08:01:00 +0100, Martijn van Oosterhout
> <kleptog@svana.org> wrote:
> >But where are you including the cost to check how many cells are
> >already sorted? That would be O(H), right?
> Yes. I didn't mention it, because H < N.
> > This is where we come back
> >to the issue that comparisons in PostgreSQL are expensive.
> So we agree that we should try to reduce the number of comparisons.
> How many comparisons does it take to sort 100000 items? 1.5 million?
> >Hmm, what are the chances you have 100000 unordered items to sort and
> >that the first 8% will already be in order. ISTM that that probability
> >will be close enough to zero to not matter...
> If the items are totally unordered, the check is so cheap you won't
> even notice. OTOH in Tom's example ...
> |What I think is much more probable in the Postgres environment
> |is almost-but-not-quite-ordered inputs --- eg, a table that was
> |perfectly ordered by key when filled, but some of the tuples have since
> |been moved by UPDATEs.
> ... I'd not be surprised if H is 90% of N.
> Servus
> Manfred
From pgsql-hackers-owner@postgresql.org Mon Dec 19 13:36:58 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 1E0CC9DC810
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Mon, 19 Dec 2005 13:36:58 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 89341-07
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Mon, 19 Dec 2005 13:36:52 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from mail.mi8.com (d01gw02.mi8.com [])
by postgresql.org (Postfix) with ESMTP id 348A69DC9C2
for <pgsql-hackers@postgresql.org>; Mon, 19 Dec 2005 13:36:51 -0400 (AST)
Received: from by mail.mi8.com with ESMTP (- Welcome to Mi8
Corporation www.Mi8.com (D2)); Mon, 19 Dec 2005 12:36:45 -0500
X-Server-Uuid: 7829E76E-BB9E-4995-8473-3C0929DF7DD1
Received: from MI8NYCMAIL06.Mi8.com ([]) by
D01HOST03.Mi8.com with Microsoft SMTPSVC(6.0.3790.1830); Mon, 19 Dec
2005 12:36:44 -0500
Received: from ([]) by MI8NYCMAIL06.Mi8.com (
[]) via Exchange Front-End Server mi8owa.mi8.com (
[]) with Microsoft Exchange Server HTTP-DAV ; Mon, 19 Dec
2005 17:36:44 +0000
User-Agent: Microsoft-Entourage/
Date: Mon, 19 Dec 2005 09:36:44 -0800
Subject: Re: Re: Which qsort is used
From: "Luke Lonergan" <llonergan@greenplum.com>
To: "Martijn van Oosterhout" <kleptog@svana.org>,
"Dann Corbit" <DCorbit@connx.com>
cc: "Tom Lane" <tgl@sss.pgh.pa.us>,
"Qingqing Zhou" <zhouqq@cs.toronto.edu>,
"Bruce Momjian" <pgman@candle.pha.pa.us>,
"Neil Conway" <neilc@samurai.com>,
Message-ID: <BFCC2FAC.16CC0%llonergan@greenplum.com>
Thread-Topic: [HACKERS] Re: Which qsort is used
Thread-Index: AcYEkKvEA7duDr/yQneMyWGCfNr3rQAMhuDl
In-Reply-To: <20051219113724.GD12251@svana.org>
MIME-Version: 1.0
X-OriginalArrivalTime: 19 Dec 2005 17:36:44.0849 (UTC)
X-WSS-ID: 6FB830272346940585-01-01
Content-Type: text/plain;
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=1.253 required=5 tests=[AWL=0.000,
X-Spam-Score: 1.253
X-Spam-Level: *
X-Archive-Number: 200512/868
X-Sequence-Number: 77716
Status: OR
On 12/19/05 3:37 AM, "Martijn van Oosterhout" <kleptog@svana.org> wrote:
> I'm not sure whether we have a conclusion here, but I do have one
> question: is there a significant difference in the number of times the
> comparison routines are called? Comparisons in PostgreSQL are fairly
> expensive given the fmgr overhead and when comparing tuples it's even
> worse.
It would be interesting to note the comparison count of the different
routines.
Something that really grabbed me about the results though is that the
relative performance of the routines dramatically shifted when the indirect
references in the comparators went in. The first test I did sorted an array
of int4 - these tests that Qingqing did sorted arrays using an indirect
pointer list, at which point the same distributions performed very
differently.
I suspect that it is the number of comparisons that caused this, and further
that the indirection has disabled the compiler optimizations for memory
prefetch and other things that it could normally recognize. Given the usage
pattern in Postgres, where sorted things are a mix of strings and intrinsic
types, I'm not sure those optimizations could be done by one routine.
I haven't verified this, but it certainly seems that the NetBSD routine is
the overall winner for the type of use that Postgres has (sorting using
a pointer list).
- Luke
From pgsql-hackers-owner+M81165@postgresql.org Thu Mar 16 18:37:28 2006
Return-path: <pgsql-hackers-owner+M81165@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2GNbOu11277
for <pgman@candle.pha.pa.us>; Thu, 16 Mar 2006 18:37:25 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id A609567BADC;
Thu, 16 Mar 2006 19:37:21 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 8E8E19DC828
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Thu, 16 Mar 2006 19:36:50 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 31174-02
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Thu, 16 Mar 2006 19:36:52 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id 8CA419DC840
for <pgsql-hackers@postgresql.org>; Thu, 16 Mar 2006 19:36:46 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k2GNagfd023078;
Thu, 16 Mar 2006 18:36:42 -0500 (EST)
To: "Dann Corbit" <DCorbit@connx.com>
cc: "Jonah H. Harris" <jonah.harris@gmail.com>, pgsql-hackers@postgresql.org,
"Jerry Sievers" <jerry@jerrysievers.com>
Subject: Re: [HACKERS] qsort, once again
In-Reply-To: <D425483C2C5C9F49B5B7A41F8944154757D67F@postal.corporate.connx.com>
References: <D425483C2C5C9F49B5B7A41F8944154757D67F@postal.corporate.connx.com>
Comments: In-reply-to "Dann Corbit" <DCorbit@connx.com>
message dated "Thu, 16 Mar 2006 13:27:33 -0800"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0"
Content-ID: <23060.1142551929.0@sss.pgh.pa.us>
Date: Thu, 16 Mar 2006 18:36:42 -0500
Message-ID: <23077.1142552202@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113]
X-Spam-Score: 0.113
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <23060.1142551929.1@sss.pgh.pa.us>
>> So at least on randomized data, the swap_cnt thing is a serious loser.
>> Need to run some tests on special-case inputs though. Anyone have a
>> test suite they like?
> Here is a distribution maker that will create some torture tests for
> sorting programs.
I fleshed out the sort tester that Bentley & McIlroy give pseudocode for
in their paper (attached below in case anyone wants to hack on it). Not
very surprisingly, it shows the unmodified B&M algorithm as
significantly better than the BSD-lite version:
Our current BSD qsort:
distribution SAWTOOTH: max cratio 12.9259, average 0.870261 over 252 tests
distribution RAND: max cratio 1.07917, average 0.505924 over 252 tests
distribution STAGGER: max cratio 12.9259, average 1.03706 over 252 tests
distribution PLATEAU: max cratio 12.9259, average 0.632514 over 252 tests
distribution SHUFFLE: max cratio 12.9259, average 1.21631 over 252 tests
method COPY: max cratio 3.87533, average 0.666927 over 210 tests
method REVERSE: max cratio 5.6248, average 0.710284 over 210 tests
method FREVERSE: max cratio 12.9259, average 1.58323 over 210 tests
method BREVERSE: max cratio 5.72661, average 1.13674 over 210 tests
method SORT: max cratio 0.758625, average 0.350092 over 210 tests
method DITHER: max cratio 3.13417, average 0.667222 over 210 tests
Overall: average cratio 0.852415 over 1260 tests
without the swap_cnt code:
distribution SAWTOOTH: max cratio 5.6248, average 0.745818 over 252 tests
distribution RAND: max cratio 1.07917, average 0.510097 over 252 tests
distribution STAGGER: max cratio 5.6248, average 1.0494 over 252 tests
distribution PLATEAU: max cratio 3.57655, average 0.411549 over 252 tests
distribution SHUFFLE: max cratio 5.72661, average 1.05988 over 252 tests
method COPY: max cratio 3.87533, average 0.712122 over 210 tests
method REVERSE: max cratio 5.6248, average 0.751011 over 210 tests
method FREVERSE: max cratio 4.80869, average 0.690224 over 210 tests
method BREVERSE: max cratio 5.72661, average 1.13673 over 210 tests
method SORT: max cratio 0.806618, average 0.539829 over 210 tests
method DITHER: max cratio 3.13417, average 0.702174 over 210 tests
Overall: average cratio 0.755348 over 1260 tests
("cratio" is the ratio of the actual number of comparison function calls
to the theoretical expectation of N*lg2(N).) The insertion sort
switchover is a loser for both average and worst-case measurements.
I tried Dann's distributions too, with N = 100000:
Our current BSD qsort:
dist fib: cratio 0.0694229
dist camel: cratio 0.0903228
dist constant: cratio 0.0602126
dist five: cratio 0.132288
dist ramp: cratio 4.29937
dist random: cratio 1.09286
dist reverse: cratio 0.5663
dist sorted: cratio 0.18062
dist ten: cratio 0.174781
dist twenty: cratio 0.238098
dist two: cratio 0.090365
dist perverse: cratio 0.334503
dist trig: cratio 0.679846
Overall: max cratio 4.29937, average cratio 0.616076 over 13 tests
without the swap_cnt code:
dist fib: cratio 0.0694229
dist camel: cratio 0.0903228
dist constant: cratio 0.0602126
dist five: cratio 0.132288
dist ramp: cratio 4.29937
dist random: cratio 1.09286
dist reverse: cratio 0.89184
dist sorted: cratio 0.884907
dist ten: cratio 0.174781
dist twenty: cratio 0.238098
dist two: cratio 0.090365
dist perverse: cratio 0.334503
dist trig: cratio 0.679846
Overall: max cratio 4.29937, average cratio 0.695293 over 13 tests
In this set of tests the behavior is just about identical, except for
the case of already-sorted input, where the BSD coding runs in O(N)
instead of O(N lg2 N) time. So that evidently is why some unknown
person put in the special case.
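The special case under discussion can be sketched as follows. This is a
simplified illustration, not the BSD or B&M code: it sorts plain ints,
uses a middle-element pivot with a Hoare-style partition, and the names
sketch_qsort and use_swap_cnt are hypothetical. Only the idea is the
same: if a partitioning pass made no swaps, assume the subfile is nearly
sorted and hand it to insertion sort.

```c
/* Plain insertion sort: O(n) on sorted input, O(n^2) in the worst case. */
static void insertion_sort(int *a, long n)
{
    for (long i = 1; i < n; i++) {
        int v = a[i];
        long j = i;

        while (j > 0 && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}

/*
 * Simplified quicksort.  If use_swap_cnt is nonzero, a partitioning
 * pass that performed no swaps switches the whole subfile over to
 * insertion sort instead of recursing.
 */
static void sketch_qsort(int *a, long n, int use_swap_cnt)
{
    if (n < 2)
        return;

    int pivot = a[n / 2];
    long i = 0;
    long j = n - 1;
    int swap_cnt = 0;

    while (i <= j) {
        while (a[i] < pivot)
            i++;
        while (a[j] > pivot)
            j--;
        if (i <= j) {
            if (i != j) {
                int t = a[i];

                a[i] = a[j];
                a[j] = t;
                swap_cnt++;
            }
            i++;
            j--;
        }
    }

    if (use_swap_cnt && swap_cnt == 0) {
        /* No swaps: guess "nearly sorted" and finish with insertion sort. */
        insertion_sort(a, n);
        return;
    }

    sketch_qsort(a, j + 1, use_swap_cnt);       /* left part:  a[0..j]   */
    sketch_qsort(a + i, n - i, use_swap_cnt);   /* right part: a[i..n-1] */
}
```

In this sketch, fully sorted input triggers the shortcut at the top level
and finishes in O(N). The guess can be wrong, though: an input such as
{2, 1, 0, 3, 6, 5, 4} also partitions without a swap, yet both halves are
reversed, so the insertion sort does quadratic work on each half.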
Some further experimentation destroys my original proposal to limit the
size of subfile we'll use the swap_cnt code for: it turns out that that
eliminates the BSD code's advantage for presorted input (at least for
inputs bigger than the limit) without doing anything much in return.
So my feeling is we should just remove the swap_cnt code and return to
the original B&M algorithm. Being much faster than expected for
presorted input doesn't justify being far slower than expected for
other inputs, IMHO. In the context of Postgres I doubt that perfectly
sorted input shows up very often anyway.
regards, tom lane
------- =_aaaaaaaaaa0
Content-Type: application/octet-stream
Content-ID: <23060.1142551929.2@sss.pgh.pa.us>
Content-Description: sorttester.c
Content-Transfer-Encoding: base64
------- =_aaaaaaaaaa0
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
------- =_aaaaaaaaaa0--
From pgsql-hackers-owner+M81167@postgresql.org Thu Mar 16 18:48:37 2006
Return-path: <pgsql-hackers-owner+M81167@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2GNmbu12770
for <pgman@candle.pha.pa.us>; Thu, 16 Mar 2006 18:48:37 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 570D567BADC;
Thu, 16 Mar 2006 19:48:35 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id B49219DCBC2
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Thu, 16 Mar 2006 19:48:12 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 28142-10
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Thu, 16 Mar 2006 19:48:15 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id 9E95A9DCBAD
for <pgsql-hackers@postgresql.org>; Thu, 16 Mar 2006 19:48:10 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k2GNm9Kt023199;
Thu, 16 Mar 2006 18:48:09 -0500 (EST)
To: Darcy Buskermolen <darcy@wavefire.com>
cc: pgsql-hackers@postgresql.org, "Dann Corbit" <DCorbit@connx.com>,
"Jonah H. Harris" <jonah.harris@gmail.com>,
"Jerry Sievers" <jerry@jerrysievers.com>
Subject: Re: [HACKERS] qsort, once again
In-Reply-To: <200603161541.25929.darcy@wavefire.com>
References: <D425483C2C5C9F49B5B7A41F8944154757D67D@postal.corporate.connx.com> <19646.1142539750@sss.pgh.pa.us> <200603161541.25929.darcy@wavefire.com>
Comments: In-reply-to Darcy Buskermolen <darcy@wavefire.com>
message dated "Thu, 16 Mar 2006 15:41:24 -0800"
Date: Thu, 16 Mar 2006 18:48:09 -0500
Message-ID: <23198.1142552889@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113]
X-Spam-Score: 0.113
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
Darcy Buskermolen <darcy@wavefire.com> writes:
> On Thursday 16 March 2006 12:09, Tom Lane wrote:
>> So we still have a problem of software archaeology: who added the
>> insertion sort switch to the NetBSD version, and on what grounds?
> This is when that particular code was pushed in, as to why exactly, you'll
> have to ask mycroft.
> http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/stdlib/qsort.c.diff?r1=1.3&r2=1.4&only_with_tag=MAIN
Interesting. It looks to me like he replaced the former
vaguely-Knuth-based coding with B&M's code, but kept the insertion-
sort-after-no-swap special case that was in the previous code. I'll
betcha he didn't test to see whether this was actually such a great
idea ...
regards, tom lane
From pgsql-hackers-owner+M81168@postgresql.org Thu Mar 16 19:42:51 2006
Return-path: <pgsql-hackers-owner+M81168@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2H0gpu19805
for <pgman@candle.pha.pa.us>; Thu, 16 Mar 2006 19:42:51 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 5F20967BADE;
Thu, 16 Mar 2006 20:42:48 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id AA2239DCBAD
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Thu, 16 Mar 2006 20:42:20 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 46728-01
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Thu, 16 Mar 2006 20:42:23 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from postal.corporate.connx.com (postal.corporate.connx.com [])
by postgresql.org (Postfix) with ESMTP id 062EC9DCBA4
for <pgsql-hackers@postgresql.org>; Thu, 16 Mar 2006 20:42:17 -0400 (AST)
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
Subject: Re: [HACKERS] qsort, once again
Date: Thu, 16 Mar 2006 16:42:20 -0800
Message-ID: <D425483C2C5C9F49B5B7A41F8944154757D688@postal.corporate.connx.com>
Thread-Topic: [HACKERS] qsort, once again
Thread-Index: AcZJUnwKofAzJ+OKTcqF67UTsFWJEQACP7AA
From: "Dann Corbit" <DCorbit@connx.com>
To: "Tom Lane" <tgl@sss.pgh.pa.us>
cc: "Jonah H. Harris" <jonah.harris@gmail.com>, <pgsql-hackers@postgresql.org>,
"Jerry Sievers" <jerry@jerrysievers.com>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.096 required=5 tests=[AWL=0.096]
X-Spam-Score: 0.096
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id k2H0gpu19805
Status: OR
> So my feeling is we should just remove the swap_cnt code and return to
> the original B&M algorithm. Being much faster than expected for
> presorted input doesn't justify being far slower than expected for
> other inputs, IMHO. In the context of Postgres I doubt that perfectly
> sorted input shows up very often anyway.
> Comments?
Checking for presorted input is O(n).
If the input is random, an average of 3 elements will be tested.
So adding an in-order check of the data should not be too expensive.
I would benchmark several approaches and see which one is best when used
in-place.
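A minimal sketch of such an in-order check, assuming int keys; the
function name is_presorted and the ncmp out-parameter are hypothetical
and only serve to make the cost visible. The scan stops at the first
out-of-order pair, so the full N-1 comparisons are paid only when the
input really is sorted:

```c
#include <stddef.h>

/*
 * Return 1 if a[0..n-1] is in nondecreasing order, else 0.
 * *ncmp receives the number of comparisons actually made.
 */
static int is_presorted(const int *a, size_t n, size_t *ncmp)
{
    *ncmp = 0;
    for (size_t i = 1; i < n; i++) {
        (*ncmp)++;
        if (a[i - 1] > a[i])
            return 0;   /* first out-of-order pair: stop immediately */
    }
    return 1;
}
```

A real sort routine would call the user-supplied comparator here rather
than comparing ints directly, which is where the per-comparison cost in
PostgreSQL comes in.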
From pgsql-hackers-owner+M81169@postgresql.org Thu Mar 16 20:13:08 2006
Return-path: <pgsql-hackers-owner+M81169@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2H1D7u25008
for <pgman@candle.pha.pa.us>; Thu, 16 Mar 2006 20:13:07 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 6B48F67BADE;
Thu, 16 Mar 2006 21:13:04 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 19FAD9DCC5C
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Thu, 16 Mar 2006 21:12:36 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 53608-01
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Thu, 16 Mar 2006 21:12:37 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from postal.corporate.connx.com (postal.corporate.connx.com [])
by postgresql.org (Postfix) with ESMTP id 839069DCC2D
for <pgsql-hackers@postgresql.org>; Thu, 16 Mar 2006 21:12:32 -0400 (AST)
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
Subject: Re: [HACKERS] qsort, once again
Date: Thu, 16 Mar 2006 17:12:35 -0800
Message-ID: <D425483C2C5C9F49B5B7A41F8944154757D68B@postal.corporate.connx.com>
Thread-Topic: [HACKERS] qsort, once again
From: "Dann Corbit" <DCorbit@connx.com>
To: "Dann Corbit" <DCorbit@connx.com>, "Tom Lane" <tgl@sss.pgh.pa.us>
cc: "Jonah H. Harris" <jonah.harris@gmail.com>, <pgsql-hackers@postgresql.org>,
"Jerry Sievers" <jerry@jerrysievers.com>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.096 required=5 tests=[AWL=0.096]
X-Spam-Score: 0.096
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id k2H1D7u25008
Status: OR
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Dann Corbit
> Sent: Thursday, March 16, 2006 4:42 PM
> To: Tom Lane
> Cc: Jonah H. Harris; pgsql-hackers@postgresql.org; Jerry Sievers
> Subject: Re: [HACKERS] qsort, once again
> > So my feeling is we should just remove the swap_cnt code and return to
> > the original B&M algorithm. Being much faster than expected for
> > presorted input doesn't justify being far slower than expected for
> > other inputs, IMHO. In the context of Postgres I doubt that perfectly
> > sorted input shows up very often anyway.
> >
> > Comments?
> Checking for presorted input is O(n).
> If the input is random, an average of 3 elements will be tested.
> So adding an in-order check of the data should not be too expensive.
> I would benchmark several approaches and see which one is best when used
> in-place.
Even if "hunks" of the input are sorted, the test is a very good idea.
Recall that we are sorting recursively and so we divide the data into
partitions.
Consider an example...
Quicksort of a field that contains Sex as 'M' for male, 'F' for female,
or NULL for unknown.
The median selection is going to pick one of 'M', 'F', or NULL.
After pass 1 of qsort we will have two partitions. One partition will
have all of one type and the other partition will have the other two
types.
An in-order check will tell us that the monotone partition is sorted and
we are done with it.
Imagine also a table that was clustered but for which we have not
updated statistics. Perhaps it is 98% sorted. Checking for order in
our partitions is probably a good idea.
I think you could also get a good optimization if, while checking
partitions for order, you find that a big section of a partition is
ordered (even though the whole thing is not). If you could perk the
ordered size up the tree, you could just add another partition to the
merge list and sort the unordered part.
In "C Unleashed" I call this idea partition discovery mergesort.
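The "find a big ordered section" step can be sketched as a scan for
nondecreasing runs. This is only an illustration of run detection on int
keys, not the algorithm from "C Unleashed"; the names run_length and
longest_run are hypothetical:

```c
#include <stddef.h>

/* Length of the nondecreasing run starting at a[start]; needs start < n. */
static size_t run_length(const int *a, size_t n, size_t start)
{
    size_t end = start + 1;

    while (end < n && a[end - 1] <= a[end])
        end++;
    return end - start;
}

/*
 * Find the longest nondecreasing run in a[0..n-1].  Stores its start
 * index in *best_start and returns its length (0 if n == 0).
 */
static size_t longest_run(const int *a, size_t n, size_t *best_start)
{
    size_t best_len = 0;

    *best_start = 0;
    for (size_t pos = 0; pos < n;) {
        size_t len = run_length(a, n, pos);

        if (len > best_len) {
            best_len = len;
            *best_start = pos;
        }
        pos += len;     /* runs are disjoint, so skip past this one */
    }
    return best_len;
}
```

The whole scan costs N-1 comparisons, so whether it pays off depends on
how often a long run is actually found, which is the point in dispute in
this thread.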
From pgsql-hackers-owner+M81172@postgresql.org Fri Mar 17 00:27:41 2006
Return-path: <pgsql-hackers-owner+M81172@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2H5Reu14258
for <pgman@candle.pha.pa.us>; Fri, 17 Mar 2006 00:27:40 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id CE86767BAEA;
Fri, 17 Mar 2006 01:27:36 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 465549DC874
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Fri, 17 Mar 2006 01:27:11 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 76897-01
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Fri, 17 Mar 2006 01:27:10 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id 2456F9DC871
for <pgsql-hackers@postgresql.org>; Fri, 17 Mar 2006 01:27:08 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k2H5R6PF025295;
Fri, 17 Mar 2006 00:27:06 -0500 (EST)
To: "Dann Corbit" <DCorbit@connx.com>
cc: "Jonah H. Harris" <jonah.harris@gmail.com>, pgsql-hackers@postgresql.org,
"Jerry Sievers" <jerry@jerrysievers.com>
Subject: Re: [HACKERS] qsort, once again
In-Reply-To: <D425483C2C5C9F49B5B7A41F8944154757D68B@postal.corporate.connx.com>
References: <D425483C2C5C9F49B5B7A41F8944154757D68B@postal.corporate.connx.com>
Comments: In-reply-to "Dann Corbit" <DCorbit@connx.com>
message dated "Thu, 16 Mar 2006 17:12:35 -0800"
Date: Fri, 17 Mar 2006 00:27:05 -0500
Message-ID: <25294.1142573225@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113]
X-Spam-Score: 0.113
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
"Dann Corbit" <DCorbit@connx.com> writes:
>> So my feeling is we should just remove the swap_cnt code and return
>> to the original B&M algorithm.
> Even if "hunks" of the input are sorted, the test is a very good idea.
Yah know, guys, Bentley and McIlroy are each smarter than any five of
us, and I'm quite certain it occurred to them to try prechecking for
sorted input. If that case is not in their code then it's probably
because it's a net loss. Unless you have reason to think that sorted
input is *more* common than other cases for the Postgres environment,
which is certainly a fact not in evidence.
(Bentley was my thesis adviser for awhile before he went to Bell Labs,
so my respect for him is based on direct personal experience. McIlroy
I only know by reputation, but he's sure got a ton of that.)
> Imagine also a table that was clustered but for which we have not
> updated statistics. Perhaps it is 98% sorted. Checking for order in
> our partitions is probably a good idea.
If we are using the sort code rather than the recently-clustered index
for such a case, then we have problems elsewhere. This scenario is not
a good argument that the sort code needs to be specialized to handle
this case at the expense of other cases; the place to be fixing it is
the planner or the statistics-management code.
regards, tom lane
From pgsql-hackers-owner+M81173@postgresql.org Fri Mar 17 00:29:24 2006
Return-path: <pgsql-hackers-owner+M81173@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2H5TNu14505
for <pgman@candle.pha.pa.us>; Fri, 17 Mar 2006 00:29:23 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id A271D67BAEA;
Fri, 17 Mar 2006 01:29:19 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id C96D79DCA40
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Fri, 17 Mar 2006 01:28:55 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 44062-07
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Fri, 17 Mar 2006 01:28:54 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from postal.corporate.connx.com (postal.corporate.connx.com [])
by postgresql.org (Postfix) with ESMTP id CDE6C9DCA4B
for <pgsql-hackers@postgresql.org>; Fri, 17 Mar 2006 01:28:53 -0400 (AST)
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
Subject: Re: [HACKERS] qsort, once again
Date: Thu, 16 Mar 2006 21:28:52 -0800
Message-ID: <D425483C2C5C9F49B5B7A41F8944154757D6A1@postal.corporate.connx.com>
Thread-Topic: [HACKERS] qsort, once again
Thread-Index: AcZJg2/Nvc2IdeFUT0id8WtNczZvGQAACfYw
From: "Dann Corbit" <DCorbit@connx.com>
To: "Tom Lane" <tgl@sss.pgh.pa.us>
cc: "Jonah H. Harris" <jonah.harris@gmail.com>, <pgsql-hackers@postgresql.org>,
"Jerry Sievers" <jerry@jerrysievers.com>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.097 required=5 tests=[AWL=0.097]
X-Spam-Score: 0.097
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id k2H5TNu14505
Status: OR
Well, my point was that it is a snap to implement and test.
It will be better, worse, or the same.
I agree that Bentley is a bloody genius.
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, March 16, 2006 9:27 PM
> To: Dann Corbit
> Cc: Jonah H. Harris; pgsql-hackers@postgresql.org; Jerry Sievers
> Subject: Re: [HACKERS] qsort, once again
> "Dann Corbit" <DCorbit@connx.com> writes:
> >> So my feeling is we should just remove the swap_cnt code and return
> >> to the original B&M algorithm.
> > Even if "hunks" of the input are sorted, the test is a very good idea.
> Yah know, guys, Bentley and McIlroy are each smarter than any five of
> us, and I'm quite certain it occurred to them to try prechecking for
> sorted input. If that case is not in their code then it's probably
> because it's a net loss. Unless you have reason to think that sorted
> input is *more* common than other cases for the Postgres environment,
> which is certainly a fact not in evidence.
> (Bentley was my thesis adviser for awhile before he went to Bell Labs,
> so my respect for him is based on direct personal experience. McIlroy
> I only know by reputation, but he's sure got a ton of that.)
> > Imagine also a table that was clustered but for which we have not
> > updated statistics. Perhaps it is 98% sorted. Checking for order in
> > our partitions is probably a good idea.
> If we are using the sort code rather than the recently-clustered index
> for such a case, then we have problems elsewhere. This scenario is not
> a good argument that the sort code needs to be specialized to handle
> this case at the expense of other cases; the place to be fixing it is
> the planner or the statistics-management code.
> regards, tom lane
From pgsql-hackers-owner+M81277@postgresql.org Tue Mar 21 13:53:08 2006
Return-path: <pgsql-hackers-owner+M81277@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2LIr6M03797
for <pgman@candle.pha.pa.us>; Tue, 21 Mar 2006 13:53:06 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 01A8467BBF2;
Tue, 21 Mar 2006 14:53:00 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 0ED1D9DCCD2
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Tue, 21 Mar 2006 14:52:29 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 38232-04-3
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Tue, 21 Mar 2006 14:52:26 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id 1A0EE9DCC2C
for <pgsql-hackers@postgresql.org>; Tue, 21 Mar 2006 14:52:22 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k2LIqHPc018733;
Tue, 21 Mar 2006 13:52:17 -0500 (EST)
To: "Dann Corbit" <DCorbit@connx.com>
cc: "Jonah H. Harris" <jonah.harris@gmail.com>, pgsql-hackers@postgresql.org,
"Jerry Sievers" <jerry@jerrysievers.com>
Subject: Re: [HACKERS] qsort, once again
In-Reply-To: <D425483C2C5C9F49B5B7A41F8944154757D6A1@postal.corporate.connx.com>
References: <D425483C2C5C9F49B5B7A41F8944154757D6A1@postal.corporate.connx.com>
Comments: In-reply-to "Dann Corbit" <DCorbit@connx.com>
message dated "Thu, 16 Mar 2006 21:28:52 -0800"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0"
Content-ID: <18685.1142966822.0@sss.pgh.pa.us>
Date: Tue, 21 Mar 2006 13:52:17 -0500
Message-ID: <18732.1142967137@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113]
X-Spam-Score: 0.113
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <18685.1142966822.1@sss.pgh.pa.us>
"Dann Corbit" <DCorbit@connx.com> writes:
> Well, my point was that it is a snap to implement and test.
Well, having done this, I have to eat my words: it does seem to be a
pretty good idea.
The following test numbers are using Bentley & McIlroy's test framework,
but modified to test only the case N=10000 rather than the four smaller
N values they originally used. I did that because it exposes quadratic
behavior more obviously, and the variance in N made it harder to compare
comparison ratios for different cases. I also added a "NEARSORT" test
method, which sorts the input distribution and then exchanges two
elements chosen at random. I did that because I was concerned that
nearly sorted input would be the worst case for the presorted-input
check, as it would waste the most cycles before failing on such input.
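As an illustration of the NEARSORT method just described, here is a minimal
sketch. It is not the code from the attached test framework: the names
make_nearsort and cmp_int are hypothetical, and plain rand() is assumed as
the source of randomness.

```c
#include <stdlib.h>

/* Comparison function for ints, in the form qsort() expects. */
static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sort x[0..n-1] ascending, then exchange two
 * elements chosen at random.  If both picks land on the same index (or
 * on equal values), the array simply stays fully sorted.
 */
static void
make_nearsort(int *x, size_t n)
{
    size_t i, j;
    int    tmp;

    qsort(x, n, sizeof(int), cmp_int);
    if (n < 2)
        return;                 /* nothing to exchange */

    i = (size_t) rand() % n;
    j = (size_t) rand() % n;
    tmp = x[i];
    x[i] = x[j];
    x[j] = tmp;
}
```

Calling srand() with a fixed seed before make_nearsort() would make the
perturbed positions repeatable from run to run.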
With our existing qsort code, the results look like
distribution SAWTOOTH: max cratio 94.17, min 0.08, average 1.56 over 105 tests
distribution RAND: max cratio 1.06, min 0.08, average 0.51 over 105 tests
distribution STAGGER: max cratio 6.08, min 0.23, average 1.01 over 105 tests
distribution PLATEAU: max cratio 94.17, min 0.08, average 2.12 over 105 tests
distribution SHUFFLE: max cratio 94.17, min 0.23, average 1.92 over 105 tests
method COPY: max cratio 6.08, min 0.08, average 0.72 over 75 tests
method REVERSE: max cratio 5.34, min 0.08, average 0.69 over 75 tests
method FREVERSE: max cratio 94.17, min 0.08, average 5.71 over 75 tests
method BREVERSE: max cratio 3.86, min 0.08, average 1.41 over 75 tests
method SORT: max cratio 0.82, min 0.08, average 0.31 over 75 tests
method NEARSORT: max cratio 0.82, min 0.08, average 0.36 over 75 tests
method DITHER: max cratio 5.52, min 0.18, average 0.77 over 75 tests
Overall: average cratio 1.42 over 525 tests
("cratio" is the ratio of the actual number of comparison function calls
to the theoretical expectation, N log2(N))
That's pretty awful: there are several test cases that make it use
nearly 100 times the expected number of comparisons.
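For anyone wanting to reproduce the measure, the following is a rough
sketch of how a cratio could be computed by counting comparison-function
calls. The names counting_cmp, measure_cratio and log2_approx are
hypothetical, and the sketch calls the C library's qsort() rather than the
Postgres one. The hand-rolled base-2 logarithm is there only so the sketch
does not need the math library; log2() from <math.h> would be the usual
choice.

```c
#include <stdlib.h>

static long ncompares = 0;      /* comparison calls made so far */

/* int comparator that also counts how often it is called */
static int
counting_cmp(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    ncompares++;
    return (x > y) - (x < y);
}

/* Base-2 logarithm for v >= 1, computed one binary digit at a time. */
static double
log2_approx(double v)
{
    double result = 0.0;
    double frac = 0.5;
    int    i;

    while (v >= 2.0)            /* integer part */
    {
        v /= 2.0;
        result += 1.0;
    }
    for (i = 0; i < 40; i++)    /* 40 fractional bits */
    {
        v *= v;
        if (v >= 2.0)
        {
            v /= 2.0;
            result += frac;
        }
        frac /= 2.0;
    }
    return result;
}

/* Sort x[0..n-1] and return actual comparisons / (N log2(N)). */
static double
measure_cratio(int *x, size_t n)
{
    if (n < 2)
        return 0.0;             /* N log2(N) would be zero */

    ncompares = 0;
    qsort(x, n, sizeof(int), counting_cmp);
    return (double) ncompares / ((double) n * log2_approx((double) n));
}
```

The value returned depends entirely on which qsort() the platform
provides, so no particular number should be expected from this sketch.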
Removing the swap_cnt test to bring it close to B&M's original
recommendations, we get
distribution SAWTOOTH: max cratio 3.85, min 0.08, average 0.70 over 105 tests
distribution RAND: max cratio 1.06, min 0.08, average 0.52 over 105 tests
distribution STAGGER: max cratio 6.08, min 0.58, average 1.12 over 105 tests
distribution PLATEAU: max cratio 3.70, min 0.08, average 0.34 over 105 tests
distribution SHUFFLE: max cratio 3.86, min 0.86, average 1.24 over 105 tests
method COPY: max cratio 6.08, min 0.08, average 0.76 over 75 tests
method REVERSE: max cratio 5.34, min 0.08, average 0.75 over 75 tests
method FREVERSE: max cratio 4.56, min 0.08, average 0.73 over 75 tests
method BREVERSE: max cratio 3.86, min 0.08, average 1.41 over 75 tests
method SORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method NEARSORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method DITHER: max cratio 3.73, min 0.18, average 0.72 over 75 tests
Overall: average cratio 0.78 over 525 tests
which is a whole lot better as to both average and worst cases.
I then added some code to check for presorted input (just after the
n<7 insertion sort code):
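presorted = 1;
for (pm = (char *) a + es; pm < (char *) a + n * es; pm += es)
    if (cmp(pm - es, pm) > 0)
    {
        presorted = 0;
        break;      /* stop at the first out-of-order pair */
    }
if (presorted)
    return;         /* input already sorted, nothing to do */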
This gives
distribution SAWTOOTH: max cratio 3.88, min 0.08, average 0.62 over 105 tests
distribution RAND: max cratio 1.06, min 0.08, average 0.46 over 105 tests
distribution STAGGER: max cratio 6.15, min 0.08, average 0.98 over 105 tests
distribution PLATEAU: max cratio 3.79, min 0.08, average 0.31 over 105 tests
distribution SHUFFLE: max cratio 3.91, min 0.08, average 1.09 over 105 tests
method COPY: max cratio 6.15, min 0.08, average 0.72 over 75 tests
method REVERSE: max cratio 5.34, min 0.08, average 0.76 over 75 tests
method FREVERSE: max cratio 4.58, min 0.08, average 0.73 over 75 tests
method BREVERSE: max cratio 3.91, min 0.08, average 1.44 over 75 tests
method SORT: max cratio 0.08, min 0.08, average 0.08 over 75 tests
method NEARSORT: max cratio 0.89, min 0.08, average 0.39 over 75 tests
method DITHER: max cratio 3.73, min 0.18, average 0.72 over 75 tests
Overall: average cratio 0.69 over 525 tests
So the worst case seems only very marginally worse, and there is a
definite improvement in the average case, even for inputs that aren't
entirely sorted. Importantly, the "near sorted" case that I thought
might send it into quadratic behavior doesn't seem to do that.
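One plausible reading of why the check stays cheap: a single pass makes at
most n - 1 comparisons and gives up at the first out-of-order pair. The
standalone sketch below illustrates that property only. The names
is_presorted, cmp_int_counted and presort_calls are hypothetical, and this
is not the code that was committed.

```c
#include <stddef.h>

static long presort_calls = 0;  /* comparator calls, for illustration */

/* int comparator that counts its own invocations */
static int
cmp_int_counted(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    presort_calls++;
    return (x > y) - (x < y);
}

/*
 * Return 1 if the n elements of size es starting at a are already in
 * nondecreasing order according to cmp, else 0.  The scan stops at the
 * first out-of-order pair, so it never makes more than n - 1 calls.
 */
static int
is_presorted(const void *a, size_t n, size_t es,
             int (*cmp) (const void *, const void *))
{
    const char *pm;
    const char *end;

    if (n < 2)
        return 1;               /* zero or one element is sorted */

    end = (const char *) a + n * es;
    for (pm = (const char *) a + es; pm < end; pm += es)
    {
        if (cmp(pm - es, pm) > 0)
            return 0;
    }
    return 1;
}
```

On a fully sorted array of n elements the scan uses all n - 1 comparisons.
When the very first pair is out of order it returns after one.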
So, unless anyone wants to do further testing, I'll go ahead and commit
these changes.
regards, tom lane
PS: Just as a comparison point, here are the results when testing HPUX's
library qsort:
distribution SAWTOOTH: max cratio 7.00, min 0.08, average 0.76 over 105 tests
distribution RAND: max cratio 1.11, min 0.08, average 0.53 over 105 tests
distribution STAGGER: max cratio 7.05, min 0.58, average 1.24 over 105 tests
distribution PLATEAU: max cratio 7.00, min 0.08, average 0.43 over 105 tests
distribution SHUFFLE: max cratio 7.00, min 0.86, average 1.54 over 105 tests
method COPY: max cratio 6.70, min 0.08, average 0.79 over 75 tests
method REVERSE: max cratio 7.05, min 0.08, average 0.78 over 75 tests
method FREVERSE: max cratio 7.00, min 0.08, average 0.77 over 75 tests
method BREVERSE: max cratio 7.00, min 0.08, average 2.11 over 75 tests
method SORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method NEARSORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method DITHER: max cratio 4.06, min 0.16, average 0.74 over 75 tests
Overall: average cratio 0.90 over 525 tests
and here are the results using glibc's qsort, which of course isn't
quicksort at all but some kind of merge sort:
distribution SAWTOOTH: max cratio 0.90, min 0.49, average 0.65 over 105 tests
distribution RAND: max cratio 0.91, min 0.49, average 0.76 over 105 tests
distribution STAGGER: max cratio 0.92, min 0.49, average 0.70 over 105 tests
distribution PLATEAU: max cratio 0.84, min 0.49, average 0.54 over 105 tests
distribution SHUFFLE: max cratio 0.64, min 0.49, average 0.52 over 105 tests
method COPY: max cratio 0.92, min 0.49, average 0.66 over 75 tests
method REVERSE: max cratio 0.92, min 0.49, average 0.68 over 75 tests
method FREVERSE: max cratio 0.92, min 0.49, average 0.67 over 75 tests
method BREVERSE: max cratio 0.92, min 0.49, average 0.68 over 75 tests
method SORT: max cratio 0.49, min 0.49, average 0.49 over 75 tests
method NEARSORT: max cratio 0.55, min 0.49, average 0.51 over 75 tests
method DITHER: max cratio 0.92, min 0.50, average 0.74 over 75 tests
Overall: average cratio 0.63 over 525 tests
PPS: final version of test framework attached for the archives.
------- =_aaaaaaaaaa0
Content-Type: application/octet-stream
Content-ID: <18685.1142966822.2@sss.pgh.pa.us>
Content-Description: sorttester.c
Content-Transfer-Encoding: base64
------- =_aaaaaaaaaa0
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
------- =_aaaaaaaaaa0--
From pgsql-hackers-owner+M81283@postgresql.org Tue Mar 21 15:18:07 2006
Return-path: <pgsql-hackers-owner+M81283@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2LKI7M12970
for <pgman@candle.pha.pa.us>; Tue, 21 Mar 2006 15:18:07 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id AABA167BBF8;
Tue, 21 Mar 2006 16:18:04 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 3E6009DC827
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Tue, 21 Mar 2006 16:17:38 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 56406-07
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Tue, 21 Mar 2006 16:17:38 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from stark.xeocode.com (stark.xeocode.com [])
by postgresql.org (Postfix) with ESMTP id 21DCF9DC809
for <pgsql-hackers@postgresql.org>; Tue, 21 Mar 2006 16:17:35 -0400 (AST)
Received: from localhost ([] helo=stark.xeocode.com)
by stark.xeocode.com with smtp (Exim 3.36 #1 (Debian))
id 1FLnI7-0004xB-00; Tue, 21 Mar 2006 15:17:19 -0500
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: "Dann Corbit" <DCorbit@connx.com>,
"Jonah H. Harris" <jonah.harris@gmail.com>, pgsql-hackers@postgresql.org,
"Jerry Sievers" <jerry@jerrysievers.com>
Subject: Re: [HACKERS] qsort, once again
References: <D425483C2C5C9F49B5B7A41F8944154757D6A1@postal.corporate.connx.com>
In-Reply-To: <18732.1142967137@sss.pgh.pa.us>
From: Greg Stark <gsstark@mit.edu>
Organization: The Emacs Conspiracy; member since 1992
Date: 21 Mar 2006 15:17:19 -0500
Message-ID: <87lkv3mu28.fsf@stark.xeocode.com>
Lines: 13
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.128 required=5 tests=[AWL=0.128]
X-Spam-Score: 0.128
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
Tom Lane <tgl@sss.pgh.pa.us> writes:
> and here are the results using glibc's qsort, which of course isn't
> quicksort at all but some kind of merge sort:
> ...
> Overall: average cratio 0.63 over 525 tests
That looks better both on average and in the worst case. Are the time
constants that much worse that the merge sort still takes longer?
From pgsql-hackers-owner+M81285@postgresql.org Tue Mar 21 15:38:06 2006
Return-path: <pgsql-hackers-owner+M81285@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2LKc6M14799
for <pgman@candle.pha.pa.us>; Tue, 21 Mar 2006 15:38:06 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 0917767BBF8;
Tue, 21 Mar 2006 16:38:03 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 069389DC843
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Tue, 21 Mar 2006 16:37:39 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 60037-07
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Tue, 21 Mar 2006 16:37:39 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id BDC039DC827
for <pgsql-hackers@postgresql.org>; Tue, 21 Mar 2006 16:37:36 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k2LKbZid019858;
Tue, 21 Mar 2006 15:37:35 -0500 (EST)
To: Greg Stark <gsstark@mit.edu>
cc: "Dann Corbit" <DCorbit@connx.com>,
"Jonah H. Harris" <jonah.harris@gmail.com>, pgsql-hackers@postgresql.org,
"Jerry Sievers" <jerry@jerrysievers.com>
Subject: Re: [HACKERS] qsort, once again
In-Reply-To: <87lkv3mu28.fsf@stark.xeocode.com>
References: <D425483C2C5C9F49B5B7A41F8944154757D6A1@postal.corporate.connx.com> <18732.1142967137@sss.pgh.pa.us> <87lkv3mu28.fsf@stark.xeocode.com>
Comments: In-reply-to Greg Stark <gsstark@mit.edu>
message dated "21 Mar 2006 15:17:19 -0500"
Date: Tue, 21 Mar 2006 15:37:35 -0500
Message-ID: <19857.1142973455@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113]
X-Spam-Score: 0.113
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
Greg Stark <gsstark@mit.edu> writes:
> That looks better both on average and in the worst case. Are the time
> constants that much worse that the merge sort still takes longer?
Keep in mind that this is only counting the number of
comparison-function calls; it's not accounting for any other effects.
In particular, for a large sort operation quicksort might win because of
its more cache-friendly memory access patterns.
The whole question of our qsort vs the system library's qsort probably
needs to be revisited, however, now that we've identified and fixed this
particular performance issue.
regards, tom lane
From pgsql-hackers-owner+M81289@postgresql.org Tue Mar 21 16:27:30 2006
Return-path: <pgsql-hackers-owner+M81289@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2LLRTM20101
for <pgman@candle.pha.pa.us>; Tue, 21 Mar 2006 16:27:30 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id ACB4F67BBFD;
Tue, 21 Mar 2006 17:27:27 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 16E0E9DCA0F
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Tue, 21 Mar 2006 17:27:01 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 69903-02
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Tue, 21 Mar 2006 17:27:02 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from stark.xeocode.com (stark.xeocode.com [])
by postgresql.org (Postfix) with ESMTP id 107429DC867
for <pgsql-hackers@postgresql.org>; Tue, 21 Mar 2006 17:26:58 -0400 (AST)
Received: from localhost ([] helo=stark.xeocode.com)
by stark.xeocode.com with smtp (Exim 3.36 #1 (Debian))
id 1FLoNU-0006M4-00; Tue, 21 Mar 2006 16:26:56 -0500
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Greg Stark <gsstark@mit.edu>, "Dann Corbit" <DCorbit@connx.com>,
"Jonah H. Harris" <jonah.harris@gmail.com>, pgsql-hackers@postgresql.org,
"Jerry Sievers" <jerry@jerrysievers.com>
Subject: Re: [HACKERS] qsort, once again
References: <D425483C2C5C9F49B5B7A41F8944154757D6A1@postal.corporate.connx.com>
<18732.1142967137@sss.pgh.pa.us> <87lkv3mu28.fsf@stark.xeocode.com>
In-Reply-To: <19857.1142973455@sss.pgh.pa.us>
From: Greg Stark <gsstark@mit.edu>
Organization: The Emacs Conspiracy; member since 1992
Date: 21 Mar 2006 16:26:55 -0500
Message-ID: <87acbjmqu8.fsf@stark.xeocode.com>
Lines: 22
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.128 required=5 tests=[AWL=0.128]
X-Spam-Score: 0.128
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Greg Stark <gsstark@mit.edu> writes:
> > That looks better both on average and in the worst case. Are the time
> > constants that much worse that the merge sort still takes longer?
> Keep in mind that this is only counting the number of
> comparison-function calls; it's not accounting for any other effects.
> In particular, for a large sort operation quicksort might win because of
> its more cache-friendly memory access patterns.
My question explicitly recognized that possibility. I'm just a little
skeptical since the comparison function in Postgres is often not some simple
bit of tightly optimized C code, but rather a complex locale sensitive
comparison function or even a bit of SQL expression to evaluate.
Cache effectiveness may be a minimal factor anyway when the comparison is
executing more than a minimal amount of code. And one extra comparison is
going to cost a lot more too.
From pgsql-hackers-owner+M81290@postgresql.org Tue Mar 21 16:48:00 2006
Return-path: <pgsql-hackers-owner+M81290@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2LLlxM22215
for <pgman@candle.pha.pa.us>; Tue, 21 Mar 2006 16:47:59 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 28A9867BBFD;
Tue, 21 Mar 2006 17:47:57 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 1B4849DCC25
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Tue, 21 Mar 2006 17:47:27 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 72535-05
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
Tue, 21 Mar 2006 17:47:28 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id D27239DCC21
for <pgsql-hackers@postgresql.org>; Tue, 21 Mar 2006 17:47:24 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k2LLlNpJ002194;
Tue, 21 Mar 2006 16:47:23 -0500 (EST)
To: Greg Stark <gsstark@mit.edu>
cc: "Dann Corbit" <DCorbit@connx.com>,
"Jonah H. Harris" <jonah.harris@gmail.com>, pgsql-hackers@postgresql.org,
"Jerry Sievers" <jerry@jerrysievers.com>
Subject: Re: [HACKERS] qsort, once again
In-Reply-To: <87acbjmqu8.fsf@stark.xeocode.com>
References: <D425483C2C5C9F49B5B7A41F8944154757D6A1@postal.corporate.connx.com> <18732.1142967137@sss.pgh.pa.us> <87lkv3mu28.fsf@stark.xeocode.com> <19857.1142973455@sss.pgh.pa.us> <87acbjmqu8.fsf@stark.xeocode.com>
Comments: In-reply-to Greg Stark <gsstark@mit.edu>
message dated "21 Mar 2006 16:26:55 -0500"
Date: Tue, 21 Mar 2006 16:47:23 -0500
Message-ID: <2193.1142977643@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113]
X-Spam-Score: 0.113
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
Greg Stark <gsstark@mit.edu> writes:
> My question explicitly recognized that possibility. I'm just a little
> skeptical since the comparison function in Postgres is often not some simple
> bit of tightly optimized C code, but rather a complex locale sensitive
> comparison function or even a bit of SQL expression to evaluate.
Yeah, I'd guess the same way, but OTOH at least a few people have
reported that our qsort code is consistently faster than glibc's (and
that was before this fix). See this thread:
Currently I believe that we only use our qsort on Solaris, not any other
platform, so if you think that glibc's qsort is better then you've
already got your wish. It seems to need more investigation though.
In particular, I'm thinking that the various adjustments we've made
to the sort support code over the past month probably invalidate any
previous testing of the point, and that we ought to go back and redo
those comparisons.
regards, tom lane
From pgsql-hackers-owner+M81282@postgresql.org Tue Mar 21 14:09:22 2006
Return-path: <pgsql-hackers-owner+M81282@postgresql.org>
Received: from ams.hub.org (ams.hub.org [])
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k2LK9KM11902
for <pgman@candle.pha.pa.us>; Tue, 21 Mar 2006 15:09:21 -0500 (EST)
Received: from postgresql.org (postgresql.org [])
by ams.hub.org (Postfix) with ESMTP id 6B1CF67BBF6;
Tue, 21 Mar 2006 16:09:18 -0400 (AST)
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
Received: from localhost (av.hub.org [])
by postgresql.org (Postfix) with ESMTP id 0B2E19DCA0F;
Tue, 21 Mar 2006 16:08:50 -0400 (AST)
Received: from postgresql.org ([])
by localhost (av.hub.org []) (amavisd-new, port 10024)
with ESMTP id 54998-02; Tue, 21 Mar 2006 16:08:50 -0400 (AST)
X-Greylist: from auto-whitelisted by SQLgrey-
Received: from sss.pgh.pa.us (sss.pgh.pa.us [])
by postgresql.org (Postfix) with ESMTP id C39619DC9E6;
Tue, 21 Mar 2006 16:08:45 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [])
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k2LK8flq019571;
Tue, 21 Mar 2006 15:08:41 -0500 (EST)
To: Gary Doades <gpd@gpdnet.co.uk>
cc: pgsql-performance@postgresql.org, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
In-Reply-To: <20781.1140046109@sss.pgh.pa.us>
References: <43F38867.6010701@gpdnet.co.uk> <19510.1140036968@sss.pgh.pa.us> <19779.1140038874@sss.pgh.pa.us> <43F39E53.1020009@gpdnet.co.uk> <20781.1140046109@sss.pgh.pa.us>
Comments: In-reply-to Tom Lane <tgl@sss.pgh.pa.us>
message dated "Wed, 15 Feb 2006 18:28:29 -0500"
Date: Tue, 21 Mar 2006 15:08:40 -0500
Message-ID: <19570.1142971720@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113]
X-Spam-Score: 0.113
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
Last month I wrote:
> It seems clear that our qsort.c is doing a pretty awful job of picking
> qsort pivots, while glibc is mostly managing not to make that mistake.
I re-ran Gary's test script using the just-committed improvements to
qsort.c, and got pretty nice numbers (attached --- compare to
So it was wrong to blame his problems on the pivot selection --- the
culprit was that ill-considered switch to insertion sort.
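To show why a misjudged fallback to insertion sort can be so costly, here
is a small sketch that counts the comparisons insertion sort makes. It is
an illustration only, not the port/qsort.c code; insertion_sort_counted and
insertion_compares are hypothetical names. On input that really is sorted
the count is n - 1, but on reversed input it is n(n-1)/2, which is the kind
of quadratic blow-up the thread is concerned with.

```c
#include <stddef.h>

static long insertion_compares = 0;     /* element comparisons made */

/* Plain insertion sort on ints, counting every element comparison. */
static void
insertion_sort_counted(int *x, size_t n)
{
    size_t i, j;

    for (i = 1; i < n; i++)
    {
        /* move x[i] left until its left neighbour is not larger */
        for (j = i; j > 0; j--)
        {
            int tmp;

            insertion_compares++;
            if (x[j - 1] <= x[j])
                break;          /* element has reached its place */

            tmp = x[j - 1];
            x[j - 1] = x[j];
            x[j] = tmp;
        }
    }
}
```

With n = 100 the counter ends at 99 for ascending input and at 4950 for
descending input.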
regards, tom lane
100 runtimes for latest port/qsort.c, sorted ascending:
Time: 335.481 ms
Time: 335.606 ms
Time: 335.932 ms
Time: 336.039 ms
Time: 336.182 ms
Time: 336.231 ms
Time: 336.711 ms
Time: 336.721 ms
Time: 336.971 ms
Time: 336.982 ms
Time: 337.036 ms
Time: 337.190 ms
Time: 337.223 ms
Time: 337.312 ms
Time: 337.350 ms
Time: 337.423 ms
Time: 337.523 ms
Time: 337.528 ms
Time: 337.565 ms
Time: 337.566 ms
Time: 337.732 ms
Time: 337.741 ms
Time: 337.744 ms
Time: 337.786 ms
Time: 337.790 ms
Time: 337.898 ms
Time: 337.905 ms
Time: 337.952 ms
Time: 337.976 ms
Time: 338.017 ms
Time: 338.123 ms
Time: 338.206 ms
Time: 338.306 ms
Time: 338.514 ms
Time: 338.594 ms
Time: 338.597 ms
Time: 338.683 ms
Time: 338.705 ms
Time: 338.729 ms
Time: 338.748 ms
Time: 338.816 ms
Time: 338.958 ms
Time: 338.963 ms
Time: 338.997 ms
Time: 339.074 ms
Time: 339.106 ms
Time: 339.134 ms
Time: 339.159 ms
Time: 339.226 ms
Time: 339.260 ms
Time: 339.289 ms
Time: 339.341 ms
Time: 339.500 ms
Time: 339.585 ms
Time: 339.595 ms
Time: 339.774 ms
Time: 339.897 ms
Time: 339.927 ms
Time: 340.064 ms
Time: 340.133 ms
Time: 340.172 ms
Time: 340.219 ms
Time: 340.261 ms
Time: 340.323 ms
Time: 340.708 ms
Time: 340.761 ms
Time: 340.785 ms
Time: 340.900 ms
Time: 340.986 ms
Time: 341.339 ms
Time: 341.564 ms
Time: 341.707 ms
Time: 342.155 ms
Time: 342.213 ms
Time: 342.452 ms
Time: 342.515 ms
Time: 342.540 ms
Time: 342.928 ms
Time: 343.548 ms
Time: 343.663 ms
Time: 344.192 ms
Time: 344.952 ms
Time: 345.152 ms
Time: 345.174 ms
Time: 345.444 ms
Time: 346.848 ms
Time: 348.144 ms
Time: 348.842 ms
Time: 354.550 ms
Time: 356.877 ms
Time: 357.475 ms
Time: 358.487 ms
Time: 364.178 ms
Time: 370.730 ms
Time: 493.098 ms
Time: 648.009 ms
Time: 849.345 ms
Time: 860.616 ms
Time: 936.800 ms
Time: 1727.085 ms