[PATCH] Incremental sort (was: PoC: Partial sort)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
38 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[PATCH] Incremental sort (was: PoC: Partial sort)

Alexander Korotkov-3
Hi all!

I decided to start new thread for this patch for following two reasons.
 * It's renamed from "Partial sort" to "Incremental sort" per suggestion by Robert Haas [1].  New name much better characterizes the essence of algorithm.
 * I think it's not PoC anymore.  Patch received several rounds of review and now it's in the pretty good shape.

Attached revision of patch has following changes.
 * According to review [1], two new path and plan nodes are responsible for incremental sort: IncSortPath and IncSort which are inherited from SortPath and Sort correspondingly.  That allowed to get rid of set of hacks with minimal code changes.
 * According to review [1] and comment [2], previous tuple is stored in standalone tuple slot of SortState rather than just HeapTuple.
 * New GUC parameter enable_incsort is introduced to control planner ability to choose incremental sort.
 * Test of postgres_fdw with not pushed down cross join is corrected.  It appeared that with incremental sort such query is profitable to push down.  I changed ORDER BY columns so that index couldn't be used.  I think this solution is more elegant than setting enable_incsort = off.

Also patch has set of assorted code and comments improvements.

Links

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

incremental-sort-1.patch (135K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Robert Haas
On Sat, Feb 18, 2017 at 4:01 PM, Alexander Korotkov
<[hidden email]> wrote:

> I decided to start new thread for this patch for following two reasons.
>  * It's renamed from "Partial sort" to "Incremental sort" per suggestion by
> Robert Haas [1].  New name much better characterizes the essence of
> algorithm.
>  * I think it's not PoC anymore.  Patch received several rounds of review
> and now it's in the pretty good shape.
>
> Attached revision of patch has following changes.
>  * According to review [1], two new path and plan nodes are responsible for
> incremental sort: IncSortPath and IncSort which are inherited from SortPath
> and Sort correspondingly.  That allowed to get rid of set of hacks with
> minimal code changes.
>  * According to review [1] and comment [2], previous tuple is stored in
> standalone tuple slot of SortState rather than just HeapTuple.
>  * New GUC parameter enable_incsort is introduced to control planner ability
> to choose incremental sort.
>  * Test of postgres_fdw with not pushed down cross join is corrected.  It
> appeared that with incremental sort such query is profitable to push down.
> I changed ORDER BY columns so that index couldn't be used.  I think this
> solution is more elegant than setting enable_incsort = off.

I usually advocate for spelling things out instead of abbreviating, so
I guess I'll stay true to form here and suggest that abbreviating
incremental to inc doesn't seem like a great idea.  Is that sort
incrementing, incremental, incredible, incautious, or incorporated?

The first hunk in the patch, a change in the postgres_fdw regression
test output, looks an awful lot like a bug: now the query that
formerly returned various different numbers is returning all zeroes.
It might not actually be a bug, because you've also changed the test
query (not sure why), but anyway the new regression test output that
is all zeroes seems less useful for catching bugs in, say, the
ordering of the results than the old output where the different rows
were different.

I don't know of any existing cases where the same executor file is
responsible for executing more than 1 different type of executor node.
I was imagining a more-complete separation of the new executor node.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Alexander Korotkov-3
On Sun, Feb 19, 2017 at 2:18 PM, Robert Haas <[hidden email]> wrote:
On Sat, Feb 18, 2017 at 4:01 PM, Alexander Korotkov
<[hidden email]> wrote:
> I decided to start new thread for this patch for following two reasons.
>  * It's renamed from "Partial sort" to "Incremental sort" per suggestion by
> Robert Haas [1].  New name much better characterizes the essence of
> algorithm.
>  * I think it's not PoC anymore.  Patch received several rounds of review
> and now it's in the pretty good shape.
>
> Attached revision of patch has following changes.
>  * According to review [1], two new path and plan nodes are responsible for
> incremental sort: IncSortPath and IncSort which are inherited from SortPath
> and Sort correspondingly.  That allowed to get rid of set of hacks with
> minimal code changes.
>  * According to review [1] and comment [2], previous tuple is stored in
> standalone tuple slot of SortState rather than just HeapTuple.
>  * New GUC parameter enable_incsort is introduced to control planner ability
> to choose incremental sort.
>  * Test of postgres_fdw with not pushed down cross join is corrected.  It
> appeared that with incremental sort such query is profitable to push down.
> I changed ORDER BY columns so that index couldn't be used.  I think this
> solution is more elegant than setting enable_incsort = off.

I usually advocate for spelling things out instead of abbreviating, so
I guess I'll stay true to form here and suggest that abbreviating
incremental to inc doesn't seem like a great idea.  Is that sort
incrementing, incremental, incredible, incautious, or incorporated?

I'm not that sure about naming of GUCs, because we already have enable_hashagg instead of enable_hashaggregate, enable_material instead of enable_materialize, enable_nestloop instead of enable_nestedloop.  But anyway I renamed "inc" to "Incremental" everywhere in the code.  I renamed enable_incsort GUC into enable_incrementalsort as well, because I don't have strong opinion here.

The first hunk in the patch, a change in the postgres_fdw regression
test output, looks an awful lot like a bug: now the query that
formerly returned various different numbers is returning all zeroes.
It might not actually be a bug, because you've also changed the test
query (not sure why), but anyway the new regression test output that
is all zeroes seems less useful for catching bugs in, say, the
ordering of the results than the old output where the different rows
were different.

Yes, I've changed regression test query as I mentioned in the previous message.  With incremental sort feature original query can't serve anymore as an example of non-pushdown join.  However, you're right that query which returns all zeroes doesn't look good there either.  So, I changed that query to ordering by column "c3" which is actually non-indexed textual representation of "c1".
 
I don't know of any existing cases where the same executor file is
responsible for executing more than 1 different type of executor node.
I was imagining a more-complete separation of the new executor node.

Ok, I put incremental sort into separate executor node.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

incremental-sort-2.patch (150K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Mithun Cy
On Mon, Feb 27, 2017 at 8:29 PM, Alexander Korotkov
<[hidden email]> wrote:

This patch needs to be rebased.

1. It fails while applying as below

patching file src/test/regress/expected/sysviews.out
Hunk #1 FAILED at 70.
1 out of 1 hunk FAILED -- saving rejects to file
src/test/regress/expected/sysviews.out.rej
patching file src/test/regress/sql/inherit.sql

2. Also, there are compilation errors due to new commits.

-fwrapv -fexcess-precision=standard -O2 -I../../../../src/include
-D_GNU_SOURCE   -c -o createplan.o createplan.c
createplan.c: In function ‘create_gather_merge_plan’:
createplan.c:1510:11: warning: passing argument 3 of ‘make_sort’ makes
integer from pointer without a cast [enabled by default]
           gm_plan->nullsFirst);
           ^
createplan.c:235:14: note: expected ‘int’ but argument is of type ‘AttrNumber *’
 static Sort *make_sort(Plan *lefttree, int numCols, int skipCols,
              ^
createplan.c:1510:11: warning: passing argument 4 of ‘make_sort’ from
incompatible pointer type [enabled by default]
           gm_plan->nullsFirst);

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Alexander Korotkov-3
Dear Mithun,

On Mon, Mar 20, 2017 at 10:01 AM, Mithun Cy <[hidden email]> wrote:
On Mon, Feb 27, 2017 at 8:29 PM, Alexander Korotkov
<[hidden email]> wrote:

This patch needs to be rebased.

1. It fails while applying as below

patching file src/test/regress/expected/sysviews.out
Hunk #1 FAILED at 70.
1 out of 1 hunk FAILED -- saving rejects to file
src/test/regress/expected/sysviews.out.rej
patching file src/test/regress/sql/inherit.sql

2. Also, there are compilation errors due to new commits.

-fwrapv -fexcess-precision=standard -O2 -I../../../../src/include
-D_GNU_SOURCE   -c -o createplan.o createplan.c
createplan.c: In function ‘create_gather_merge_plan’:
createplan.c:1510:11: warning: passing argument 3 of ‘make_sort’ makes
integer from pointer without a cast [enabled by default]
           gm_plan->nullsFirst);
           ^
createplan.c:235:14: note: expected ‘int’ but argument is of type ‘AttrNumber *’
 static Sort *make_sort(Plan *lefttree, int numCols, int skipCols,
              ^
createplan.c:1510:11: warning: passing argument 4 of ‘make_sort’ from
incompatible pointer type [enabled by default]
           gm_plan->nullsFirst);

Thank you for the report.
Please, find rebased patch in the attachment.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

incremental-sort-3.patch (154K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Heikki Linnakangas
On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
> Please, find rebased patch in the attachment.

I had a quick look at this.

* I'd love to have an explanation of what an Incremental Sort is, in the
file header comment for nodeIncrementalSort.c.

* I didn't understand the maxMem stuff in tuplesort.c. The comments
there use the phrase "on-disk memory", which seems like an oxymoron.
Also, "maximum status" seems weird, as it assumes that there's a natural
order to the states.

* In the below example, the incremental sort is significantly slower
than the Seq Scan + Sort you get otherwise:

create table foo (a int4, b int4, c int4);
insert into sorttest select g, g, g from generate_series(1, 1000000) g;
vacuum foo;
create index i_sorttest on sorttest (a, b, c);
set work_mem='100MB';


postgres=# explain select count(*) from (select * from sorttest order by
a, c) as t;
                                               QUERY PLAN

-------------------------------------------------------------------------------------------------------
  Aggregate  (cost=138655.68..138655.69 rows=1 width=8)
    ->  Incremental Sort  (cost=610.99..124870.38 rows=1102824 width=12)
          Sort Key: sorttest.a, sorttest.c
          Presorted Key: sorttest.a
          ->  Index Only Scan using i_sorttest on sorttest
(cost=0.43..53578.79 rows=1102824 width=12)
(5 rows)

Time: 0.409 ms
postgres=# select count(*) from (select * from sorttest order by a, c) as t;
   count
---------
  1000000
(1 row)

Time: 387.091 ms


postgres=# explain select count(*) from (select * from sorttest order by
a, c) as t;
                                   QUERY PLAN

-------------------------------------------------------------------------------
  Aggregate  (cost=130063.84..130063.85 rows=1 width=8)
    ->  Sort  (cost=115063.84..117563.84 rows=1000000 width=12)
          Sort Key: sorttest.a, sorttest.c
          ->  Seq Scan on sorttest  (cost=0.00..15406.00 rows=1000000
width=12)
(4 rows)

Time: 0.345 ms
postgres=# select count(*) from (select * from sorttest order by a, c) as t;
   count
---------
  1000000
(1 row)

Time: 231.668 ms

According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To
alleviate that, it might be worthwhile to add a special case for when
the group contains exactly one group, and not put the tuple to the
tuplesort in that case. Or if we cannot ensure that the Incremental Sort
is actually faster, the cost model should probably be smarter, to avoid
picking an incremental sort when it's not a win.

- Heikki



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

David Steele
Hi Alexander,

On 3/20/17 10:19 AM, Heikki Linnakangas wrote:
> On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
>> Please, find rebased patch in the attachment.
>
> I had a quick look at this.

<...>

> According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To
> alleviate that, it might be worthwhile to add a special case for when
> the group contains exactly one group, and not put the tuple to the
> tuplesort in that case. Or if we cannot ensure that the Incremental Sort
> is actually faster, the cost model should probably be smarter, to avoid
> picking an incremental sort when it's not a win.

This thread has been idle for over a week.  Please respond with a new
patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked
"Returned with Feedback".

--
-David
[hidden email]


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Alexander Korotkov-3
On Tue, Mar 28, 2017 at 5:27 PM, David Steele <[hidden email]> wrote:
Hi Alexander,

On 3/20/17 10:19 AM, Heikki Linnakangas wrote:
On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
Please, find rebased patch in the attachment.

I had a quick look at this.

<...>

According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To
alleviate that, it might be worthwhile to add a special case for when
the group contains exactly one group, and not put the tuple to the
tuplesort in that case. Or if we cannot ensure that the Incremental Sort
is actually faster, the cost model should probably be smarter, to avoid
picking an incremental sort when it's not a win.

This thread has been idle for over a week.  Please respond with a new patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".

Thank you for reminder!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Alexander Korotkov-3
In reply to this post by Heikki Linnakangas
On Mon, Mar 20, 2017 at 5:19 PM, Heikki Linnakangas <[hidden email]> wrote:
On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
Please, find rebased patch in the attachment.

I had a quick look at this.

* I'd love to have an explanation of what an Incremental Sort is, in the file header comment for nodeIncrementalSort.c.
 
Done.

* I didn't understand the maxMem stuff in tuplesort.c. The comments there use the phrase "on-disk memory", which seems like an oxymoron. Also, "maximum status" seems weird, as it assumes that there's a natural order to the states.
 
Variables were renamed.

* In the below example, the incremental sort is significantly slower than the Seq Scan + Sort you get otherwise:

create table foo (a int4, b int4, c int4);
insert into sorttest select g, g, g from generate_series(1, 1000000) g;
vacuum foo;
create index i_sorttest on sorttest (a, b, c);
set work_mem='100MB';


postgres=# explain select count(*) from (select * from sorttest order by a, c) as t;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=138655.68..138655.69 rows=1 width=8)
   ->  Incremental Sort  (cost=610.99..124870.38 rows=1102824 width=12)
         Sort Key: sorttest.a, sorttest.c
         Presorted Key: sorttest.a
         ->  Index Only Scan using i_sorttest on sorttest (cost=0.43..53578.79 rows=1102824 width=12)
(5 rows)

Time: 0.409 ms
postgres=# select count(*) from (select * from sorttest order by a, c) as t;
  count
---------
 1000000
(1 row)

Time: 387.091 ms


postgres=# explain select count(*) from (select * from sorttest order by a, c) as t;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=130063.84..130063.85 rows=1 width=8)
   ->  Sort  (cost=115063.84..117563.84 rows=1000000 width=12)
         Sort Key: sorttest.a, sorttest.c
         ->  Seq Scan on sorttest  (cost=0.00..15406.00 rows=1000000 width=12)
(4 rows)

Time: 0.345 ms
postgres=# select count(*) from (select * from sorttest order by a, c) as t;
  count
---------
 1000000
(1 row)

Time: 231.668 ms

According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To alleviate that, it might be worthwhile to add a special case for when the group contains exactly one group, and not put the tuple to the tuplesort in that case.

I'm not sure we should do such optimization for one tuple per group, since it's similar situation with 2 or 3 tuples per group.
 
Or if we cannot ensure that the Incremental Sort is actually faster, the cost model should probably be smarter, to avoid picking an incremental sort when it's not a win.

I added to cost_sort() extra costing for incremental sort: cost of extra tuple copying and comparing as well as cost of tuplesort reset.
The only problem is that I made following estimate for tuplesort reset:

run_cost += 10.0 * cpu_tuple_cost * num_groups;

It makes ordinal sort to be selected in your example, but it contains constant 10 which is quite arbitrary.  It would be nice to evade such hard coded constants, but I don't know how could we calculate such cost realistically.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

incremental-sort-4.patch (154K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Andres Freund
In reply to this post by Alexander Korotkov-3
On 2017-03-29 00:17:02 +0300, Alexander Korotkov wrote:

> On Tue, Mar 28, 2017 at 5:27 PM, David Steele <[hidden email]> wrote:
>
> > Hi Alexander,
> >
> > On 3/20/17 10:19 AM, Heikki Linnakangas wrote:
> >
> >> On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
> >>
> >>> Please, find rebased patch in the attachment.
> >>>
> >>
> >> I had a quick look at this.
> >>
> >
> > <...>
> >
> > According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To
> >> alleviate that, it might be worthwhile to add a special case for when
> >> the group contains exactly one group, and not put the tuple to the
> >> tuplesort in that case. Or if we cannot ensure that the Incremental Sort
> >> is actually faster, the cost model should probably be smarter, to avoid
> >> picking an incremental sort when it's not a win.
> >>
> >
> > This thread has been idle for over a week.  Please respond with a new
> > patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked
> > "Returned with Feedback".

> Thank you for reminder!

I've just done so.  Please resubmit once updated, it's a cool feature.

- Andres


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Alexander Korotkov-3
On Mon, Apr 3, 2017 at 9:34 PM, Andres Freund <[hidden email]> wrote:
On 2017-03-29 00:17:02 +0300, Alexander Korotkov wrote:
> On Tue, Mar 28, 2017 at 5:27 PM, David Steele <[hidden email]> wrote:
> > On 3/20/17 10:19 AM, Heikki Linnakangas wrote:
> >
> >> On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
> >>
> >>> Please, find rebased patch in the attachment.
> >>>
> >>
> >> I had a quick look at this.
> >>
> >
> > <...>
> >
> > According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To
> >> alleviate that, it might be worthwhile to add a special case for when
> >> the group contains exactly one group, and not put the tuple to the
> >> tuplesort in that case. Or if we cannot ensure that the Incremental Sort
> >> is actually faster, the cost model should probably be smarter, to avoid
> >> picking an incremental sort when it's not a win.
> >>
> >
> > This thread has been idle for over a week.  Please respond with a new
> > patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked
> > "Returned with Feedback".

> Thank you for reminder!

I've just done so.  Please resubmit once updated, it's a cool feature.

Thank you!
I already sent version of patch after David's reminder.
Please find rebased patch in the attachment.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

incremental-sort-5.patch (155K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Andres Freund


On April 3, 2017 12:03:56 PM PDT, Alexander Korotkov <[hidden email]> wrote:

>On Mon, Apr 3, 2017 at 9:34 PM, Andres Freund <[hidden email]>
>wrote:
>
>> On 2017-03-29 00:17:02 +0300, Alexander Korotkov wrote:
>> > On Tue, Mar 28, 2017 at 5:27 PM, David Steele <[hidden email]>
>> wrote:
>> > > On 3/20/17 10:19 AM, Heikki Linnakangas wrote:
>> > >
>> > >> On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
>> > >>
>> > >>> Please, find rebased patch in the attachment.
>> > >>>
>> > >>
>> > >> I had a quick look at this.
>> > >>
>> > >
>> > > <...>
>> > >
>> > > According to 'perf', 85% of the CPU time is spent in
>ExecCopySlot(). To
>> > >> alleviate that, it might be worthwhile to add a special case for
>when
>> > >> the group contains exactly one group, and not put the tuple to
>the
>> > >> tuplesort in that case. Or if we cannot ensure that the
>Incremental
>> Sort
>> > >> is actually faster, the cost model should probably be smarter,
>to
>> avoid
>> > >> picking an incremental sort when it's not a win.
>> > >>
>> > >
>> > > This thread has been idle for over a week.  Please respond with a
>new
>> > > patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be
>> marked
>> > > "Returned with Feedback".
>>
>> > Thank you for reminder!
>>
>> I've just done so.  Please resubmit once updated, it's a cool
>feature.
>>
>
>Thank you!
>I already sent version of patch after David's reminder.
>Please find rebased patch in the attachment.

Cool. I think that's still a bit late for v10?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Alexander Korotkov-3
On Mon, Apr 3, 2017 at 10:05 PM, Andres Freund <[hidden email]> wrote:
On April 3, 2017 12:03:56 PM PDT, Alexander Korotkov <[hidden email]> wrote:
>On Mon, Apr 3, 2017 at 9:34 PM, Andres Freund <[hidden email]>
>wrote:
>
>> On 2017-03-29 00:17:02 +0300, Alexander Korotkov wrote:
>> > On Tue, Mar 28, 2017 at 5:27 PM, David Steele <[hidden email]>
>> wrote:
>> > > On 3/20/17 10:19 AM, Heikki Linnakangas wrote:
>> > >
>> > >> On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
>> > >>
>> > >>> Please, find rebased patch in the attachment.
>> > >>>
>> > >>
>> > >> I had a quick look at this.
>> > >>
>> > >
>> > > <...>
>> > >
>> > > According to 'perf', 85% of the CPU time is spent in
>ExecCopySlot(). To
>> > >> alleviate that, it might be worthwhile to add a special case for
>when
>> > >> the group contains exactly one group, and not put the tuple to
>the
>> > >> tuplesort in that case. Or if we cannot ensure that the
>Incremental
>> Sort
>> > >> is actually faster, the cost model should probably be smarter,
>to
>> avoid
>> > >> picking an incremental sort when it's not a win.
>> > >>
>> > >
>> > > This thread has been idle for over a week.  Please respond with a
>new
>> > > patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be
>> marked
>> > > "Returned with Feedback".
>>
>> > Thank you for reminder!
>>
>> I've just done so.  Please resubmit once updated, it's a cool
>feature.
>>
>
>Thank you!
>I already sent version of patch after David's reminder.
>Please find rebased patch in the attachment.

Cool. I think that's still a bit late for v10?

I don't know.  ISTM, that I addressed all the issues raised by reviewers.
Also, this patch is pending since late 2013.  It would be very nice to finally get it in...

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Andres Freund
Hi,

On 2017-04-04 00:04:09 +0300, Alexander Korotkov wrote:

> > >Thank you!
> > >I already sent version of patch after David's reminder.
> > >Please find rebased patch in the attachment.
> >
> > Cool. I think that's still a bit late for v10?
> >
>
> I don't know.  ISTM, that I addressed all the issues raised by reviewers.
> Also, this patch is pending since late 2013.  It would be very nice to
> finally get it in...

To me this hasn't gotten even remotely enough performance evaluation.
And I don't think it's fair to characterize it as pending since 2013,
given it was essentially "waiting on author" for most of that.

Greetings,

Andres Freund


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Alexander Korotkov-3
On Tue, Apr 4, 2017 at 12:09 AM, Andres Freund <[hidden email]> wrote:
On 2017-04-04 00:04:09 +0300, Alexander Korotkov wrote:
> > >Thank you!
> > >I already sent version of patch after David's reminder.
> > >Please find rebased patch in the attachment.
> >
> > Cool. I think that's still a bit late for v10?
> >
>
> I don't know.  ISTM, that I addressed all the issues raised by reviewers.
> Also, this patch is pending since late 2013.  It would be very nice to
> finally get it in...

To me this hasn't gotten even remotely enough performance evaluation.

I'm ready to put my efforts on that.
 
And I don't think it's fair to characterize it as pending since 2013,

Probably, this duration isn't good characteristic at all.
 
given it was essentially "waiting on author" for most of that.

What makes you think so?  Do you have some statistics?  Or is it just random assumption?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Robert Haas
In reply to this post by Andres Freund
On Mon, Apr 3, 2017 at 5:09 PM, Andres Freund <[hidden email]> wrote:
> To me this hasn't gotten even remotely enough performance evaluation.
> And I don't think it's fair to characterize it as pending since 2013,
> given it was essentially "waiting on author" for most of that.

This is undeniably a patch which has been kicking around for a lot of
time without getting a lot of attention, and if it just keeps getting
punted down the road, it's never going to become committable.
Alexander's questions upthread about what decisions the committer who
took an interest (Heikki) would prefer never really got an answer, for
example.  I don't deny that there may be some work left to do here,
but I think blaming the author for a week's delay when this has been
ignored so often for so long is unfair.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Andres Freund
On 2017-04-03 22:18:21 -0400, Robert Haas wrote:
> On Mon, Apr 3, 2017 at 5:09 PM, Andres Freund <[hidden email]> wrote:
> > To me this hasn't gotten even remotely enough performance evaluation.
> > And I don't think it's fair to characterize it as pending since 2013,
> > given it was essentially "waiting on author" for most of that.
>
> This is undeniably a patch which has been kicking around for a lot of
> time without getting a lot of attention, and if it just keeps getting
> punted down the road, it's never going to become committable.

Indeed, it's old.  And it hasn't gotten enough timely feedback.

But I don't think the wait time can meaningfully be measured by
subtracting two dates:
The first version of the patch, as a PoC, has been posted 2013-12-14,
which then got a good amount of feedback & revisions, and then stalled
till 2014-07-12.  There a few back-and forths yielded a new version.
From 2014-09-15 till 2015-10-16 the patch stalled, waiting on its
author.  That version had open todos ([1]), as had the version from
2016-03-13 [2], which weren't addressed 2016-03-30 - unfortunately that
was pretty much when the tree was frozen.  2016-09-13 a rebased patch
was sent, some minor points were raised 2016-10-02 (unaddressed), a
larger review was done 2016-12-01 ([5]), unaddressed till 2017-02-18.
At that point we're in this thread.

There's obviously some long waiting-on-author periods in there.  And
some long needs-review periods.


> Alexander's questions upthread about what decisions the committer who
> took an interest (Heikki) would prefer never really got an answer, for
> example.  I don't deny that there may be some work left to do here,
> but I think blaming the author for a week's delay when this has been
> ignored so often for so long is unfair.

I'm not trying to blame Alexander for a week's worth of delay, at all.
It's just that, well, we're past the original code-freeze date, three
days before the "final" code freeze. I don't think fairness is something
we can achieve at this point :(.  Given the risk of regressions -
demonstrated in this thread although partially adressed - and the very
limited amount of benchmarking done, it seems unlikely that this is
going to be merged.

Regards,

Andres


[1] http://archives.postgresql.org/message-id/CAPpHfdvhwMsG69exCRUGK3ms-ng0PSPcucH5FU6tAaM-qL-1%2Bw%40mail.gmail.com
[2] http://archives.postgresql.org/message-id/CAPpHfdvzjYGLTyA-8ib8UYnKLPrewd9Z%3DT4YJNCRWiHWHHweWw%40mail.gmail.com
[3] http://archives.postgresql.org/message-id/CAPpHfdtCcHZ-mLWzsFrRCvHpV1LPSaOGooMZ3sa40AkwR=7ouQ@...
[4] http://archives.postgresql.org/message-id/CAPpHfdvj1Tdi2WA64ZbBp5-yG-uzaRXzk3K7J7zt-cRX6YSd0A@...
[5] http://archives.postgresql.org/message-id/CA+TgmoZapyHRm7NVyuyZ+yAV=U1a070BOgRe7PkgyrAegR4JDA@...
[6] http://archives.postgresql.org/message-id/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@...


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Alexander Korotkov-3
In reply to this post by Alexander Korotkov-3
On Wed, Mar 29, 2017 at 5:14 PM, Alexander Korotkov <[hidden email]> wrote:
I added to cost_sort() extra costing for incremental sort: cost of extra tuple copying and comparing as well as cost of tuplesort reset.
The only problem is that I made following estimate for tuplesort reset:

run_cost += 10.0 * cpu_tuple_cost * num_groups;

It makes ordinal sort to be selected in your example, but it contains constant 10 which is quite arbitrary.  It would be nice to evade such hard coded constants, but I don't know how could we calculate such cost realistically.

That appears to be wrong.  I intended to make cost_sort prefer plain sort over incremental sort for this dataset size.  But, that appears to be not always right solution.  Quick sort is so fast only on presorted data.
On my laptop I have following numbers for test case provided by Heikki.

Presorted data – very fast.

# explain select count(*) from (select * from sorttest order by a, c) as t;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=147154.34..147154.35 rows=1 width=8)
   ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
         Sort Key: sorttest.a, sorttest.c
         ->  Seq Scan on sorttest  (cost=0.00..15406.00 rows=1000000 width=12)
(4 rows)

# select count(*) from (select * from sorttest order by a, c) as t;
  count
---------
 1000000
(1 row)

Time: 260,752 ms

Not presorted data – not so fast.  It's actually slower than incremental sort was.

# explain select count(*) from (select * from sorttest order by a desc, c desc) as t;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=130063.84..130063.85 rows=1 width=8)
   ->  Sort  (cost=115063.84..117563.84 rows=1000000 width=12)
         Sort Key: sorttest.a DESC, sorttest.c DESC
         ->  Seq Scan on sorttest  (cost=0.00..15406.00 rows=1000000 width=12)
(4 rows)

# select count(*) from (select * from sorttest order by a desc, c desc) as t;
  count
---------
 1000000
(1 row)

Time: 416,207 ms

Thus, it would be nice to reflect the fact that our quicksort implementation is very fast on presorted data.  Fortunately, we have corresponding statistics: STATISTIC_KIND_CORRELATION.  However, it probably should be a subject of a separate patch.

But I'd like to make incremental sort not slower than quicksort in case of presorted data.  New idea about it comes to my mind.  Since cause of incremental sort slowness in this case is too frequent reset of tuplesort, then what if we would artificially put data in larger groups.  Attached revision of patch implements this: it doesn't stop to accumulate tuples to tuplesort until we have MIN_GROUP_SIZE tuples.

# explain select count(*) from (select * from sorttest order by a, c) as t;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=85412.43..85412.43 rows=1 width=8)
   ->  Incremental Sort  (cost=0.46..72912.43 rows=1000000 width=12)
         Sort Key: sorttest.a, sorttest.c
         Presorted Key: sorttest.a
         ->  Index Only Scan using i_sorttest on sorttest  (cost=0.42..30412.42 rows=1000000 width=12)
(5 rows)

# select count(*) from (select * from sorttest order by a, c) as t;
  count
---------
 1000000
(1 row)

Time: 251,227 ms

# explain select count(*) from (select * from sorttest order by a desc, c desc) as t;
                                                   QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=85412.43..85412.43 rows=1 width=8)
   ->  Incremental Sort  (cost=0.46..72912.43 rows=1000000 width=12)
         Sort Key: sorttest.a DESC, sorttest.c DESC
         Presorted Key: sorttest.a
         ->  Index Only Scan Backward using i_sorttest on sorttest  (cost=0.42..30412.42 rows=1000000 width=12)
(5 rows)

# select count(*) from (select * from sorttest order by a desc, c desc) as t;
  count
---------
 1000000
(1 row)

Time: 253,270 ms

Now, incremental sort is not slower than quicksort.  And this seems to be cool.
However, in the LIMIT case we will pay the price of fetching some extra tuples from outer node.  But, that doesn't seem to hurt us too much.

# explain select * from sorttest order by a, c limit 10;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Limit  (cost=0.46..0.84 rows=10 width=12)
   ->  Incremental Sort  (cost=0.46..37500.78 rows=1000000 width=12)
         Sort Key: a, c
         Presorted Key: a
         ->  Index Only Scan using i_sorttest on sorttest  (cost=0.42..30412.42 rows=1000000 width=12)
(5 rows)

# select * from sorttest order by a, c limit 10;
 a  | b  | c
----+----+----
  1 |  1 |  1
  2 |  2 |  2
  3 |  3 |  3
  4 |  4 |  4
  5 |  5 |  5
  6 |  6 |  6
  7 |  7 |  7
  8 |  8 |  8
  9 |  9 |  9
 10 | 10 | 10
(10 rows)

Time: 0,903 ms

Any thoughts?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

incremental-sort-6.patch (155K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Peter Geoghegan-4
On Wed, Apr 26, 2017 at 8:39 AM, Alexander Korotkov
<[hidden email]> wrote:
> That appears to be wrong.  I intended to make cost_sort prefer plain sort
> over incremental sort for this dataset size.  But, that appears to be not
> always right solution.  Quick sort is so fast only on presorted data.

As you may know, I've often said that the precheck for sorted input
added to our quicksort implementation by a3f0b3d is misguided. It
sometimes throws away a ton of work if the presorted input isn't
*perfectly* presorted. This happens when the first out of order tuple
is towards the end of the presorted input.

I think that it isn't fair to credit our qsort with doing so well on a
100% presorted case, because it doesn't do the necessary bookkeeping
to not throw that work away completely in certain important cases.

--
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Incremental sort

Alexander Korotkov-3
On Wed, Apr 26, 2017 at 7:56 PM, Peter Geoghegan <[hidden email]> wrote:
On Wed, Apr 26, 2017 at 8:39 AM, Alexander Korotkov
<[hidden email]> wrote:
> That appears to be wrong.  I intended to make cost_sort prefer plain sort
> over incremental sort for this dataset size.  But, that appears to be not
> always right solution.  Quick sort is so fast only on presorted data.

As you may know, I've often said that the precheck for sorted input
added to our quicksort implementation by a3f0b3d is misguided. It
sometimes throws away a ton of work if the presorted input isn't
*perfectly* presorted. This happens when the first out of order tuple
is towards the end of the presorted input.

I think that it isn't fair to credit our qsort with doing so well on a
100% presorted case, because it doesn't do the necessary bookkeeping
to not throw that work away completely in certain important cases.

OK, I get it.  Our qsort is so fast not only on 100% presorted case.
However, that doesn't change many things in context of incremental sort.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
12
Previous Thread Next Thread