PoC: Partial sort

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
91 messages Options
12345
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

PoC: Partial sort

Alexander Korotkov
Hackers!

Currently when we need to get ordered result from table we have to choose one of two approaches: get results from index in exact order we need or do sort of tuples. However, it could be useful to mix both methods: get results from index in order which partially meets our requirements and do rest of work from heap.

Two attached patches are proof of concept for this approach.

partial-sort-1.patch

This patch allows to use index for order-by if order-by clause and index has non-empty common prefix. So, index gives right ordering for first n order-by columns. In order to provide right order for rest m columns, sort node is inserted. This sort node sorts groups of tuples where values of first n order-by columns are equal.

See an example.

create table test as (select id, (random()*10000)::int as v1, random() as v2 from generate_series(1,1000000) id);
create index test_v1_idx on test (v1);

We've index by v1 column, but we can get results ordered by v1, v2.

postgres=# select * from test order by v1, v2 limit 10;
   id   | v1 |         v2         
--------+----+--------------------
 390371 |  0 | 0.0284479795955122
 674617 |  0 | 0.0322008323855698
 881905 |  0 |  0.042586590629071
 972877 |  0 | 0.0531588457524776
 364903 |  0 | 0.0594307743012905
  82333 |  0 | 0.0666455538012087
 266488 |  0 |  0.072808934841305
 892215 |  0 | 0.0744258034974337
  13805 |  0 | 0.0794667331501842
 338435 |  0 |  0.171817752998322
(10 rows)

And it's fast using following plan.

                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual time=0.097..0.099 rows=10 loops=1)
   ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual time=0.096..0.097 rows=10 loops=1)
         Sort Key: v1, v2
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42 rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
 Total runtime: 0.125 ms
(6 rows)

For sure, this approach is effective only when first n order-by columns we selected provides enough count of unique values (so, sorted groups are small). Patch is only PoC because it doesn't contains any try to estimate right cost of using partial sort. 

partial-knn-1.patch

KNN-GiST provides ability to get ordered results from index, but this order is based only on index information. For instance, GiST index contains bounding rectangles for polygons, and we can't get exact distance to polygon from index (similar situation is in PostGIS). In attached patch, GiST distance method can set recheck flag (similar to consistent method). This flag means that distance method returned lower bound of distance and we should recheck it from heap.

See an example.

create table test as (select id, polygon(3+(random()*10)::int, circle(point(random(), random()), 0.0003 + random()*0.001)) as p from generate_series(1,1000000) id);
create index test_idx on test using gist (p);

We can get results ordered by distance from polygon to point.

postgres=# select id, p <-> point(0.5,0.5) from test order by p <-> point(0.5,0.5) limit 10;
   id   |       ?column?       
--------+----------------------
 755611 | 0.000405855808916853
 807562 | 0.000464123777564343
 437778 | 0.000738524708741959
 947860 |  0.00076250998760724
 389843 | 0.000886362723569568
  17586 | 0.000981960100555216
 411329 |  0.00145338112316853
 894191 |  0.00149399559703506
 391907 |   0.0016647896049741
 235381 |  0.00167554614889509
(10 rows)

It's fast using just index scan.

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230 rows=10 loops=1)
   ->  Index Scan using test_idx on test  (cost=0.29..157672.29 rows=1000000 width=36) (actual time=0.179..0.228 rows=10 loops=1)
         Order By: (p <-> '(0.5,0.5)'::point)
 Total runtime: 0.305 ms
(4 rows)

This patch is also only PoC because of following:
1) It's probably wrong at all to get heap tuple from index scan node. This work should be done from another node.
2) Assumption that order-by operator returns float8 comparable with GiST distance method result in general case is wrong.

------
With best regards,
Alexander Korotkov.


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

partial-sort-1.patch (29K) Download Attachment
partial-knn-1.patch (35K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Andres Freund-3
Hi,

Cool stuff.

On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
> Currently when we need to get ordered result from table we have to choose
> one of two approaches: get results from index in exact order we need or do
> sort of tuples. However, it could be useful to mix both methods: get
> results from index in order which partially meets our requirements and do
> rest of work from heap.

> ------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
> time=0.097..0.099 rows=10 loops=1)
>    ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> time=0.096..0.097 rows=10 loops=1)
>          Sort Key: v1, v2
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
> rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
>  Total runtime: 0.125 ms
> (6 rows)

Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.

> *partial-knn-1.patch*
>
> KNN-GiST provides ability to get ordered results from index, but this order
> is based only on index information. For instance, GiST index contains
> bounding rectangles for polygons, and we can't get exact distance to
> polygon from index (similar situation is in PostGIS). In attached patch,
> GiST distance method can set recheck flag (similar to consistent method).
> This flag means that distance method returned lower bound of distance and
> we should recheck it from heap.

> See an example.
>
> create table test as (select id, polygon(3+(random()*10)::int,
> circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
> generate_series(1,1000000) id);
> create index test_idx on test using gist (p);
>
> We can get results ordered by distance from polygon to point.
>
> postgres=# select id, p <-> point(0.5,0.5) from test order by p <->
> point(0.5,0.5) limit 10;

> ----------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
> rows=10 loops=1)
>    ->  Index Scan using test_idx on test  (cost=0.29..157672.29
> rows=1000000 width=36) (actual time=0.179..0.228 rows=10 loops=1)
>          Order By: (p <-> '(0.5,0.5)'::point)
>  Total runtime: 0.305 ms
> (4 rows)

Rechecking from the heap means adding a sort node though, which I don't
see here? Or am I misunderstanding something?
Greetings,

Andres Freund

--
 Andres Freund                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Alexander Korotkov
Hi!

Thanks for feedback!

On Sat, Dec 14, 2013 at 4:54 PM, Andres Freund <[hidden email]> wrote:
Hi,

Cool stuff.

On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
> Currently when we need to get ordered result from table we have to choose
> one of two approaches: get results from index in exact order we need or do
> sort of tuples. However, it could be useful to mix both methods: get
> results from index in order which partially meets our requirements and do
> rest of work from heap.

> ------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
> time=0.097..0.099 rows=10 loops=1)
>    ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> time=0.096..0.097 rows=10 loops=1)
>          Sort Key: v1, v2
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
> rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
>  Total runtime: 0.125 ms
> (6 rows)

Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.

In this patch I don't do full sort of dataset. For instance, index returns data ordered by first column and we need to order them also by second column. Then this node sorts groups (assumed to be small) where values of the first column are same by value of second column. And with limit clause only required number of such groups will be processed. But, I don't think we should expect pre-sorted values of second column inside a group.
 
> *partial-knn-1.patch*
>
> KNN-GiST provides ability to get ordered results from index, but this order
> is based only on index information. For instance, GiST index contains
> bounding rectangles for polygons, and we can't get exact distance to
> polygon from index (similar situation is in PostGIS). In attached patch,
> GiST distance method can set recheck flag (similar to consistent method).
> This flag means that distance method returned lower bound of distance and
> we should recheck it from heap.

> See an example.
>
> create table test as (select id, polygon(3+(random()*10)::int,
> circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
> generate_series(1,1000000) id);
> create index test_idx on test using gist (p);
>
> We can get results ordered by distance from polygon to point.
>
> postgres=# select id, p <-> point(0.5,0.5) from test order by p <->
> point(0.5,0.5) limit 10;

> ----------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
> rows=10 loops=1)
>    ->  Index Scan using test_idx on test  (cost=0.29..157672.29
> rows=1000000 width=36) (actual time=0.179..0.228 rows=10 loops=1)
>          Order By: (p <-> '(0.5,0.5)'::point)
>  Total runtime: 0.305 ms
> (4 rows)

Rechecking from the heap means adding a sort node though, which I don't
see here? Or am I misunderstanding something?

KNN-GiST contain RB-tree of scanned items. In this patch item is rechecked inside GiST and reinserted into same RB-tree. It appears to be much easier implementation for PoC and also looks very efficient. I'm not sure what is actually right design for it. This is what I like to discuss.

------
With best regards,
Alexander Korotkov.
 

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Jeremy Harris
In reply to this post by Andres Freund-3
On 14/12/13 12:54, Andres Freund wrote:

> On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
>> Currently when we need to get ordered result from table we have to choose
>> one of two approaches: get results from index in exact order we need or do
>> sort of tuples. However, it could be useful to mix both methods: get
>> results from index in order which partially meets our requirements and do
>> rest of work from heap.
>
>> ------------------------------------------------------------------------------------------------------------------------------------------
>>   Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
>> time=0.097..0.099 rows=10 loops=1)
>>     ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
>> time=0.096..0.097 rows=10 loops=1)
>>           Sort Key: v1, v2
>>           Sort Method: top-N heapsort  Memory: 25kB
>>           ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
>> rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
>>   Total runtime: 0.125 ms
>> (6 rows)
>
> Is that actually all that beneficial when sorting with a bog standard
> qsort() since that doesn't generally benefit from data being pre-sorted?
> I think we might need to switch to a different algorithm to really
> benefit from mostly pre-sorted input.

Eg:  http://www.postgresql.org/message-id/5291467E.6070807@...

Maybe Alexander and I should bash our heads together.

--
Cheers,
    Jeremy


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Martijn van Oosterhout
In reply to this post by Alexander Korotkov
On Sat, Dec 14, 2013 at 06:21:18PM +0400, Alexander Korotkov wrote:

> > Is that actually all that beneficial when sorting with a bog standard
> > qsort() since that doesn't generally benefit from data being pre-sorted?
> > I think we might need to switch to a different algorithm to really
> > benefit from mostly pre-sorted input.
> >
>
> In this patch I don't do full sort of dataset. For instance, index returns
> data ordered by first column and we need to order them also by second
> column. Then this node sorts groups (assumed to be small) where values of
> the first column are same by value of second column. And with limit clause
> only required number of such groups will be processed. But, I don't think
> we should expect pre-sorted values of second column inside a group.
Nice. I imagine this would be mostly beneficial for fast-start plans,
since you no longer need to sort the whole table prior to returning the
first tuple.

Reduced memory usage might be a factor, especially for large sorts
where you otherwise might need to spool to disk.

You can now use an index on (a) to improve sorting for (a,b).

Cost of sorting n groups of size l goes from O(nl log nl) to just O(nl
log l), useful for large n.

Minor comments:

I find cmpTuple a bad name. That's what it's doing but perhaps
cmpSkipColumns would be clearer.

I think it's worthwhile adding a seperate path for the skipCols = 0
case, to avoid extra copies.

Have a nice day,
--
Martijn van Oosterhout   <[hidden email]>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

signature.asc (845 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Andres Freund-3
In reply to this post by Alexander Korotkov
Hi,

> > >  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
> > > time=0.097..0.099 rows=10 loops=1)
> > >    ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> > > time=0.096..0.097 rows=10 loops=1)
> > >          Sort Key: v1, v2
> > >          Sort Method: top-N heapsort  Memory: 25kB
> > >          ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
> > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
> > >  Total runtime: 0.125 ms
> > > (6 rows)
> >
> > Is that actually all that beneficial when sorting with a bog standard
> > qsort() since that doesn't generally benefit from data being pre-sorted?
> > I think we might need to switch to a different algorithm to really
> > benefit from mostly pre-sorted input.
> >
>
> In this patch I don't do full sort of dataset. For instance, index returns
> data ordered by first column and we need to order them also by second
> column.

Ah, that makes sense.

> But, I don't think we should expect pre-sorted values of second column
> inside a group.

Yes, if you do it that way, there doesn't seem to any need to assume
that any more than we usually do.

I think you should make the explain output reflect the fact that we're
assuming v1 is presorted and just sorting v2. I'd be happy enough with:
Sort Key: v1, v2
Partial Sort: v2
or even just
"Partial Sort Key: [v1,] v2"
but I am sure others disagree.

> > > *partial-knn-1.patch*

> > Rechecking from the heap means adding a sort node though, which I don't
> > see here? Or am I misunderstanding something?

> KNN-GiST contain RB-tree of scanned items. In this patch item is rechecked
> inside GiST and reinserted into same RB-tree. It appears to be much easier
> implementation for PoC and also looks very efficient. I'm not sure what is
> actually right design for it. This is what I like to discuss.

I don't have enough clue about gist to say wether it's the right design,
but it doesn't look wrong to my eyes. It'd probably be useful to export
the knowledge that we are rechecking and how often that happens to the
outside.
While I didn't really look into the patch, I noticed in passing that you
pass a all_dead variable to heap_hot_search_buffer without using the
result - just pass NULL instead, that performs a bit less work.

Greetings,

Andres Freund

--
 Andres Freund                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Andreas Karlsson
In reply to this post by Alexander Korotkov
On 12/14/2013 10:59 AM, Alexander Korotkov wrote:
> This patch allows to use index for order-by if order-by clause and index
> has non-empty common prefix. So, index gives right ordering for first n
> order-by columns. In order to provide right order for rest m columns,
> sort node is inserted. This sort node sorts groups of tuples where
> values of first n order-by columns are equal.

I recently looked at the same problem. I see that you solved the
rescanning problem by simply forcing the sort to be redone on
ExecReScanSort if you have done a partial sort.

My idea for a solution was to modify tuplesort to allow storing the
already sorted keys in either memtuples or the sort result file, but
setting a field so it does not sort thee already sorted tuples again.
This would allow the rescan to work as it used to, but I am unsure how
clean or ugly this code would be. Was this something you considered?

--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Alexander Korotkov
In reply to this post by Martijn van Oosterhout
On Sat, Dec 14, 2013 at 6:39 PM, Martijn van Oosterhout <[hidden email]> wrote:
On Sat, Dec 14, 2013 at 06:21:18PM +0400, Alexander Korotkov wrote:
> > Is that actually all that beneficial when sorting with a bog standard
> > qsort() since that doesn't generally benefit from data being pre-sorted?
> > I think we might need to switch to a different algorithm to really
> > benefit from mostly pre-sorted input.
> >
>
> In this patch I don't do full sort of dataset. For instance, index returns
> data ordered by first column and we need to order them also by second
> column. Then this node sorts groups (assumed to be small) where values of
> the first column are same by value of second column. And with limit clause
> only required number of such groups will be processed. But, I don't think
> we should expect pre-sorted values of second column inside a group.

Nice. I imagine this would be mostly beneficial for fast-start plans,
since you no longer need to sort the whole table prior to returning the
first tuple.

Reduced memory usage might be a factor, especially for large sorts
where you otherwise might need to spool to disk.

You can now use an index on (a) to improve sorting for (a,b).

Cost of sorting n groups of size l goes from O(nl log nl) to just O(nl
log l), useful for large n.

Agree. Your reasoning looks correct.
 
Minor comments:

I find cmpTuple a bad name. That's what it's doing but perhaps
cmpSkipColumns would be clearer.

I think it's worthwhile adding a seperate path for the skipCols = 0
case, to avoid extra copies.

Thanks. I'll take care about.

------
With best regards,
Alexander Korotkov.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Alexander Korotkov
In reply to this post by Andres Freund-3
On Sat, Dec 14, 2013 at 7:04 PM, Andres Freund <[hidden email]> wrote:
Hi,

> > >  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
> > > time=0.097..0.099 rows=10 loops=1)
> > >    ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> > > time=0.096..0.097 rows=10 loops=1)
> > >          Sort Key: v1, v2
> > >          Sort Method: top-N heapsort  Memory: 25kB
> > >          ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
> > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
> > >  Total runtime: 0.125 ms
> > > (6 rows)
> >
> > Is that actually all that beneficial when sorting with a bog standard
> > qsort() since that doesn't generally benefit from data being pre-sorted?
> > I think we might need to switch to a different algorithm to really
> > benefit from mostly pre-sorted input.
> >
>
> In this patch I don't do full sort of dataset. For instance, index returns
> data ordered by first column and we need to order them also by second
> column.

Ah, that makes sense.

> But, I don't think we should expect pre-sorted values of second column
> inside a group.

Yes, if you do it that way, there doesn't seem to any need to assume
that any more than we usually do.

I think you should make the explain output reflect the fact that we're
assuming v1 is presorted and just sorting v2. I'd be happy enough with:
Sort Key: v1, v2
Partial Sort: v2
or even just
"Partial Sort Key: [v1,] v2"
but I am sure others disagree.

Sure, I just didn't change explain output yet. It should look like what you propose. 

> > > *partial-knn-1.patch*

> > Rechecking from the heap means adding a sort node though, which I don't
> > see here? Or am I misunderstanding something?

> KNN-GiST contain RB-tree of scanned items. In this patch item is rechecked
> inside GiST and reinserted into same RB-tree. It appears to be much easier
> implementation for PoC and also looks very efficient. I'm not sure what is
> actually right design for it. This is what I like to discuss.

I don't have enough clue about gist to say wether it's the right design,
but it doesn't look wrong to my eyes. It'd probably be useful to export
the knowledge that we are rechecking and how often that happens to the
outside.
While I didn't really look into the patch, I noticed in passing that you
pass a all_dead variable to heap_hot_search_buffer without using the
result - just pass NULL instead, that performs a bit less work.

Useful notice, thanks.

------
With best regards,
Alexander Korotkov.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Alexander Korotkov
In reply to this post by Andreas Karlsson
On Sat, Dec 14, 2013 at 11:47 PM, Andreas Karlsson <[hidden email]> wrote:
On 12/14/2013 10:59 AM, Alexander Korotkov wrote:
This patch allows to use index for order-by if order-by clause and index
has non-empty common prefix. So, index gives right ordering for first n
order-by columns. In order to provide right order for rest m columns,
sort node is inserted. This sort node sorts groups of tuples where
values of first n order-by columns are equal.

I recently looked at the same problem. I see that you solved the rescanning problem by simply forcing the sort to be redone on ExecReScanSort if you have done a partial sort.

Naturally, I'm sure I solved it at all :) I just get version of patch working for very limited use-cases.
 
My idea for a solution was to modify tuplesort to allow storing the already sorted keys in either memtuples or the sort result file, but setting a field so it does not sort thee already sorted tuples again. This would allow the rescan to work as it used to, but I am unsure how clean or ugly this code would be. Was this something you considered?

I'm not sure. I believe that best answer depends on particular parameter: how much memory we've for sort, how expensive is underlying node and how it performs rescan, how big are groups in partial sort.

------
With best regards,
Alexander Korotkov.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Andreas Karlsson
On 12/18/2013 01:02 PM, Alexander Korotkov wrote:

>     My idea for a solution was to modify tuplesort to allow storing the
>     already sorted keys in either memtuples or the sort result file, but
>     setting a field so it does not sort thee already sorted tuples
>     again. This would allow the rescan to work as it used to, but I am
>     unsure how clean or ugly this code would be. Was this something you
>     considered?
>
>
> I'm not sure. I believe that best answer depends on particular
> parameter: how much memory we've for sort, how expensive is underlying
> node and how it performs rescan, how big are groups in partial sort.

Yes, if one does not need a rescan your solution will use less memory
and about the same amount of CPU (if the tuplesort does not spill to
disk). While if we keep all the already sorted tuples in the tuplesort
rescans will be cheap but more memory will be used with an increased
chance of spilling to disk.

--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Alexander Korotkov
In reply to this post by Alexander Korotkov
Hi!

Next revision. It expected to do better work with optimizer. It introduces presorted_keys argument of cost_sort function which represent number of keys already sorted in Path. Then this function uses estimate_num_groups to estimate number of groups with different values of presorted keys and assumes that dataset is uniformly divided by groups. get_cheapest_fractional_path_for_pathkeys tries to select the path matching most part of path keys.
You can see it's working pretty good on single table queries.

create table test as (select id, (random()*5)::int as v1, (random()*1000)::int as v2 from generate_series(1,1000000) id);
create index test_v1_idx on test (v1);
create index test_v1_v2_idx on test (v1, v2);
create index test_v2_idx on test (v2);
vacuum analyze;

postgres=# explain analyze select * from test order by v1, id;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=149244.84..151744.84 rows=1000000 width=12) (actual time=2111.476..2586.493 rows=1000000 loops=1)
   Sort Key: v1, id
   Sort Method: external merge  Disk: 21512kB
   ->  Seq Scan on test  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.012..113.815 rows=1000000 loops=1)
 Total runtime: 2683.011 ms
(5 rows)

postgres=# explain analyze select * from test order by v1, id limit 10;
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11441.77..11442.18 rows=10 width=12) (actual time=79.980..79.982 rows=10 loops=1)
   ->  Partial sort  (cost=11441.77..53140.44 rows=1000000 width=12) (actual time=79.978..79.978 rows=10 loops=1)
         Sort Key: v1, id
         Presorted Key: v1
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using test_v1_idx on test  (cost=0.42..47038.83 rows=1000000 width=12) (actual time=0.031..38.275 rows=100213 loops=1)
 Total runtime: 81.786 ms
(7 rows)

postgres=# explain analyze select * from test order by v1, v2 limit 10;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.90 rows=10 width=12) (actual time=0.031..0.047 rows=10 loops=1)
   ->  Index Scan using test_v1_v2_idx on test  (cost=0.42..47286.28 rows=1000000 width=12) (actual time=0.029..0.043 rows=10 loops=1)
 Total runtime: 0.083 ms
(3 rows)

postgres=# explain analyze select * from test order by v2, id;
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Partial sort  (cost=97.75..99925.50 rows=1000000 width=12) (actual time=1.069..1299.481 rows=1000000 loops=1)
   Sort Key: v2, id
   Presorted Key: v2
   Sort Method: quicksort  Memory: 52kB
   ->  Index Scan using test_v2_idx on test  (cost=0.42..47603.79 rows=1000000 width=12) (actual time=0.030..812.083 rows=1000000 loops=1)
 Total runtime: 1393.850 ms
(6 rows)

However, work with joins needs more improvements.

------
With best regards,
Alexander Korotkov.


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

partial-sort-2.patch (69K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Martijn van Oosterhout
On Sun, Dec 22, 2013 at 07:38:05PM +0400, Alexander Korotkov wrote:

> Hi!
>
> Next revision. It expected to do better work with optimizer. It introduces
> presorted_keys argument of cost_sort function which represent number of
> keys already sorted in Path. Then this function uses estimate_num_groups to
> estimate number of groups with different values of presorted keys and
> assumes that dataset is uniformly divided by
> groups. get_cheapest_fractional_path_for_pathkeys tries to select the path
> matching most part of path keys.
> You can see it's working pretty good on single table queries.
Nice work! The plans look good and the calculated costs seem sane also.

I suppose the problem with the joins is generating the pathkeys?

Have a nice day,
--
Martijn van Oosterhout   <[hidden email]>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

signature.asc (845 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Alexander Korotkov
On Sun, Dec 22, 2013 at 8:12 PM, Martijn van Oosterhout <[hidden email]> wrote:
On Sun, Dec 22, 2013 at 07:38:05PM +0400, Alexander Korotkov wrote:
> Hi!
>
> Next revision. It expected to do better work with optimizer. It introduces
> presorted_keys argument of cost_sort function which represent number of
> keys already sorted in Path. Then this function uses estimate_num_groups to
> estimate number of groups with different values of presorted keys and
> assumes that dataset is uniformly divided by
> groups. get_cheapest_fractional_path_for_pathkeys tries to select the path
> matching most part of path keys.
> You can see it's working pretty good on single table queries.

Nice work! The plans look good and the calculated costs seem sane also.

I suppose the problem with the joins is generating the pathkeys?

In general, problem is that partial sort is alternative to do less restrictive merge join and filter it's results. As far as I can see, taking care about it require some rework of merge optimization. For now, I didn't get what it's going to look like. I'll try to dig more into details.

------
With best regards,
Alexander Korotkov.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Alexander Korotkov
In reply to this post by Jeremy Harris
On Sat, Dec 14, 2013 at 6:30 PM, Jeremy Harris <[hidden email]> wrote:
On 14/12/13 12:54, Andres Freund wrote:
On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
Currently when we need to get ordered result from table we have to choose
one of two approaches: get results from index in exact order we need or do
sort of tuples. However, it could be useful to mix both methods: get
results from index in order which partially meets our requirements and do
rest of work from heap.

------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
time=0.097..0.099 rows=10 loops=1)
    ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
time=0.096..0.097 rows=10 loops=1)
          Sort Key: v1, v2
          Sort Method: top-N heapsort  Memory: 25kB
          ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
  Total runtime: 0.125 ms
(6 rows)

Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.

Eg:  http://www.postgresql.org/message-id/5291467E.6070807@wizmail.org

Maybe Alexander and I should bash our heads together.

Partial sort patch is mostly optimizer/executor improvement rather than improvement of sort algorithm itself. But I would appreciate using enchantments of sorting algorithms in my work.

------
With best regards,
Alexander Korotkov. 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Andreas Karlsson
In reply to this post by Alexander Korotkov
On 12/22/2013 04:38 PM, Alexander Korotkov wrote:

> postgres=# explain analyze select * from test order by v1, id limit 10;
>                                                                    QUERY
> PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------
>   Limit  (cost=11441.77..11442.18 rows=10 width=12) (actual
> time=79.980..79.982 rows=10 loops=1)
>     ->  Partial sort  (cost=11441.77..53140.44 rows=1000000 width=12)
> (actual time=79.978..79.978 rows=10 loops=1)
>           Sort Key: v1, id
>           Presorted Key: v1
>           Sort Method: top-N heapsort  Memory: 25kB
>           ->  Index Scan using test_v1_idx on test  (cost=0.42..47038.83
> rows=1000000 width=12) (actual time=0.031..38.275 rows=100213 loops=1)
>   Total runtime: 81.786 ms
> (7 rows)

Have you thought about how do you plan to print which sort method and
how much memory was used? Several different sort methods may have been
use in the query. Should the largest amount of memory/disk be printed?

> However, work with joins needs more improvements.

That would be really nice to have, but the patch seems useful even
without the improvements to joins.

--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Alexander Korotkov
On Tue, Dec 24, 2013 at 6:02 AM, Andreas Karlsson <[hidden email]> wrote:
On 12/22/2013 04:38 PM, Alexander Korotkov wrote:
postgres=# explain analyze select * from test order by v1, id limit 10;
                                                                   QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=11441.77..11442.18 rows=10 width=12) (actual
time=79.980..79.982 rows=10 loops=1)
    ->  Partial sort  (cost=11441.77..53140.44 rows=1000000 width=12)
(actual time=79.978..79.978 rows=10 loops=1)
          Sort Key: v1, id
          Presorted Key: v1
          Sort Method: top-N heapsort  Memory: 25kB
          ->  Index Scan using test_v1_idx on test  (cost=0.42..47038.83
rows=1000000 width=12) (actual time=0.031..38.275 rows=100213 loops=1)
  Total runtime: 81.786 ms
(7 rows)

Have you thought about how do you plan to print which sort method and how much memory was used? Several different sort methods may have been use in the query. Should the largest amount of memory/disk be printed?

Apparently, now amount of memory for sorted last group is printed. Your proposal makes sense: largest amount of memory/disk should be printed.
 
However, work with joins needs more improvements.

That would be really nice to have, but the patch seems useful even without the improvements to joins.

Attached revision of patch implements partial sort usage in merge joins.

create table test1 as (
select id, 
(random()*100)::int as v1, 
(random()*10000)::int as v2
from generate_series(1,1000000) id);

create table test2 as (
select id, 
(random()*100)::int as v1, 
(random()*10000)::int as v2
from generate_series(1,1000000) id);
create index test1_v1_idx on test1 (v1);
create index test2_v1_idx on test2 (v1);

create index test1_v1_idx on test1 (v1);
create index test2_v1_idx on test2 (v1);

# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1 and t1.v2 = t2.v2;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=2257.67..255273.39 rows=983360 width=24)
   Merge Cond: ((t1.v1 = t2.v1) AND (t1.v2 = t2.v2))
   ->  Partial sort  (cost=1128.84..116470.79 rows=1000000 width=12)
         Sort Key: t1.v1, t1.v2
         Presorted Key: t1.v1
         ->  Index Scan using test1_v1_idx on test1 t1  (cost=0.42..47604.01 rows=1000000 width=12)
   ->  Materialize  (cost=1128.83..118969.00 rows=1000000 width=12)
         ->  Partial sort  (cost=1128.83..116469.00 rows=1000000 width=12)
               Sort Key: t2.v1, t2.v2
               Presorted Key: t2.v1
               ->  Index Scan using test2_v1_idx on test2 t2  (cost=0.42..47602.22 rows=1000000 width=12)

I believe now patch covers desired functionality. I'm going to focus on nailing down details, refactoring and documenting.

------
With best regards,
Alexander Korotkov.


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

partial-sort-3.patch (90K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

David Rowley
On Sat, Dec 28, 2013 at 9:28 PM, Alexander Korotkov <[hidden email]> wrote:
On Tue, Dec 24, 2013 at 6:02 AM, Andreas Karlsson <[hidden email]> wrote:
Attached revision of patch implements partial sort usage in merge joins.



I'm looking forward to doing a bit of testing on this patch. I think it is a really useful feature to get a bit more out of existing indexes.

I was about to test it tonight, but I'm having trouble getting the patch to compile... I'm really wondering which compiler you are using as it seems you're declaring your variables in some strange places.. See nodeSort.c line 101. These variables are declared after there has been an if statement in the same scope. That's not valid in C. (The patch did however apply without any complaints).

Here's a list of the errors I get when compiling with visual studios on windows.

"D:\Postgres\c\pgsql.sln" (default target) (1) ->
"D:\Postgres\c\postgres.vcxproj" (default target) (2) ->
(ClCompile target) ->
  src\backend\executor\nodeSort.c(101): error C2275: 'Sort' : illegal use of this type as an expression [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(101): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(102): error C2275: 'PlanState' : illegal use of this type as an expression [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(102): error C2065: 'outerNode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(103): error C2275: 'TupleDesc' : illegal use of this type as an expression [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(103): error C2146: syntax error : missing ';' before identifier 'tupDesc' [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(103): error C2065: 'tupDesc' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(120): error C2065: 'outerNode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(121): error C2065: 'tupDesc' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(121): error C2065: 'outerNode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(125): error C2065: 'tupDesc' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(126): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(126): error C2223: left of '->numCols' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(127): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(127): error C2223: left of '->sortColIdx' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(128): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(128): error C2223: left of '->sortOperators' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(129): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(129): error C2223: left of '->collations' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(130): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(130): error C2223: left of '->nullsFirst' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(132): error C2198: 'tuplesort_begin_heap' : too few arguments for call [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(143): error C2065: 'outerNode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(167): error C2065: 'tupDesc' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]

    13 Warning(s)
    24 Error(s)
 

Regards

David Rowley
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

Alexander Korotkov
On Sat, Dec 28, 2013 at 1:04 PM, David Rowley <[hidden email]> wrote:
On Sat, Dec 28, 2013 at 9:28 PM, Alexander Korotkov <[hidden email]> wrote:
On Tue, Dec 24, 2013 at 6:02 AM, Andreas Karlsson <[hidden email]> wrote:
Attached revision of patch implements partial sort usage in merge joins.



I'm looking forward to doing a bit of testing on this patch. I think it is a really useful feature to get a bit more out of existing indexes.

I was about to test it tonight, but I'm having trouble getting the patch to compile... I'm really wondering which compiler you are using as it seems you're declaring your variables in some strange places.. See nodeSort.c line 101. These variables are declared after there has been an if statement in the same scope. That's not valid in C. (The patch did however apply without any complaints).

Here's a list of the errors I get when compiling with visual studios on windows.

"D:\Postgres\c\pgsql.sln" (default target) (1) ->
"D:\Postgres\c\postgres.vcxproj" (default target) (2) ->
(ClCompile target) ->
  src\backend\executor\nodeSort.c(101): error C2275: 'Sort' : illegal use of this type as an expression [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(101): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(102): error C2275: 'PlanState' : illegal use of this type as an expression [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(102): error C2065: 'outerNode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(103): error C2275: 'TupleDesc' : illegal use of this type as an expression [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(103): error C2146: syntax error : missing ';' before identifier 'tupDesc' [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(103): error C2065: 'tupDesc' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(120): error C2065: 'outerNode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(121): error C2065: 'tupDesc' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(121): error C2065: 'outerNode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(125): error C2065: 'tupDesc' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(126): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(126): error C2223: left of '->numCols' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(127): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(127): error C2223: left of '->sortColIdx' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(128): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(128): error C2223: left of '->sortOperators' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(129): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(129): error C2223: left of '->collations' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(130): error C2065: 'plannode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(130): error C2223: left of '->nullsFirst' must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(132): error C2198: 'tuplesort_begin_heap' : too few arguments for call [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(143): error C2065: 'outerNode' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(167): error C2065: 'tupDesc' : undeclared identifier [D:\Postgres\c\postgres.vcxproj]

    13 Warning(s)
    24 Error(s)

I've compiled it with clang. Yeah, there was mixed declarations. I've rechecked it with gcc, now it gives no warnings. I didn't try it with visual studio, but I hope it will be OK.

------
With best regards,
Alexander Korotkov.


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

partial-sort-4.patch (90K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: PoC: Partial sort

David Rowley
On Sun, Dec 29, 2013 at 4:51 AM, Alexander Korotkov <[hidden email]> wrote:
I've compiled it with clang. Yeah, there was mixed declarations. I've rechecked it with gcc, now it gives no warnings. I didn't try it with visual studio, but I hope it will be OK.


Thanks for the patch. It now compiles without any problems.
I've been doing a bit of testing with the patch testing a few different workloads. One thing that I've found is that in my test case when the table only contains 1 tuple for any given presort columns that the query is actually slower than when there are say 100 tuples to sort for any given presort group.

Here is my test case:

DROP TABLE IF EXISTS temperature_readings;

CREATE TABLE temperature_readings (
  readingid SERIAL NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  locationid INT NOT NULL,
  temperature INT NOT NULL,
  PRIMARY KEY (readingid)
);

INSERT INTO temperature_readings (timestamp,locationid,temperature)
SELECT ts.timestamp, loc.locationid, -10 + random() * 40
FROM generate_series('1900-04-01','2000-04-01','1 day'::interval) ts(timestamp)
CROSS JOIN generate_series(1,1) loc(locationid);

VACUUM ANALYZE temperature_readings;

-- Warm buffers
SELECT AVG(temperature) FROM temperature_readings;

explain (buffers, analyze) select * from temperature_readings order by timestamp,locationid; -- (seqscan -> sort) 70.805ms

-- create an index to allow presorting on timestamp.
CREATE INDEX temperature_readings_timestamp_idx ON temperature_readings (timestamp);

-- warm index buffers
SELECT COUNT(*) FROM (SELECT timestamp FROM temperature_readings ORDER BY timestamp) c;

explain (buffers, analyze) select * from temperature_readings order by timestamp,locationid; -- index scan -> partial sort 253.032ms

The first query without the index to presort on runs in 70.805 ms, the 2nd query uses the index to presort and runs in 253.032 ms.

I ran the code through a performance profiler and found that about 80% of the time is spent in tuplesort_end and tuplesort_begin_heap. 

If it was possible to devise some way to reuse any previous tuplesortstate perhaps just inventing a reset method which clears out tuples, then we could see performance exceed the standard seqscan -> sort. The code the way it is seems to lookup the sort functions from the syscache for each group then allocate some sort space, so quite a bit of time is also spent in palloc0() and pfree()

If it was not possible to do this then maybe adding a cost to the number of sort groups would be better so that the optimization is skipped if there are too many sort groups.

Regards

David Rowley




 
------
With best regards,
Alexander Korotkov.

12345
Previous Thread Next Thread
Loading...