POC: GROUP BY optimization


POC: GROUP BY optimization

Teodor Sigaev
Hi!

As far as I know, the columns in GROUP BY can be reordered without loss of
correctness, but reordering can give a benefit in performance. The suggested
patch implements that in several ways:

1) Reorder GROUP BY columns by the number of unique values, in descending order.
A simple example shows a performance difference of about two times (see
opt_group_by_demostrator.sql; on my notebook the first two queries demonstrate
that, and a minimal sketch of the same idea is shown below). The idea is: if the
first column is unique, the sort comparator never has to compare the following
columns.

2) Reorder GROUP BY columns to match existing pathkeys that result from an index
scan or an ORDER BY clause. This avoids adding a Sort node - I assume that's a clear win.

3) The patch suggests using an index for the GROUP BY clause, but unlike the ORDER
BY or merge-join cases it doesn't pay attention to the actual order of the columns,
because of 2).

The patch doesn't change any cost estimation computation, although 1) could take
advantage of that.
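
For illustration, here is a minimal sketch of that effect (the table and column
names are made up for this message, not taken from the attached script):

    -- Hypothetical demo: "id" is unique, "flag" has only two values.
    CREATE TABLE demo AS
      SELECT i AS id, i % 2 AS flag
        FROM generate_series(1, 1000000) s(i);
    ANALYZE demo;
    SET enable_hashagg = off;   -- force a sort-based GroupAggregate

    -- flag first: the comparator usually has to fall through to id
    EXPLAIN ANALYZE SELECT count(*) FROM demo GROUP BY flag, id;
    -- id first: the comparator almost always resolves on the unique column
    EXPLAIN ANALYZE SELECT count(*) FROM demo GROUP BY id, flag;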

Some issues/problems/notices:

1) I didn't play with GROUPING SETS at all. As far as I can see, the current
coding prevents any optimization around them, and I suppose that should be
addressed in another patch.

2) I'm not completely satisfied with how the number of groups per column is
counted; it looks ugly without refactoring estimate_num_groups(). At the least,
it can now be called multiple times for each column. Maybe this number should be
added to the PathKey structure?

3) EquivalenceClass->ec_sortref. If the planner first encounters a column in the
WHERE clause, it doesn't fill the target reference field because it is not known
yet. The patch matches PathKeys (group_pathkeys) against SortGroupClauses
(groupClause) and fails in this case. So get_eclass_for_sort_expr() now updates
the ec_sortref field if it isn't initialized yet.

4) The algorithm that reorders columns is proportional to N^2, where N is the
number of columns, but I hope that isn't a problem because the number of GROUP BY
columns usually isn't very big.




--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Attachment: opt_group_by-v5.patch (25K)
Attachment: opt_group_by_demostrator.sql (920 bytes)

Re: POC: GROUP BY optimization

Teodor Sigaev
Cosmetic change: remove the find_sort_group_clause_by_sortref() function added in
the v5 patch because it duplicates the existing get_sortgroupref_clause().

--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Attachment: opt_group_by-v6.patch (24K)

Re: POC: GROUP BY optimization

Tomas Vondra-4
Hi Teodor,

Thanks for the patch. This (missing) optimization has popped up repeatedly
recently, and I was planning to look into it for PG12. So now I don't
have to, because you've done all the hard work ;-)

I've done only a very basic review so far, but ISTM the general
approach of considering additional indexes in create_index_paths is
correct. That being said, I do have a couple of comments:


1) add_path() ensures that we keep only the one cheapest sorted path
for each set of pathkeys. This patch somewhat defeats that because it
considers additional pathkeys (essentially any permutation of the group
keys) as interesting, so we may end up with more paths in the list.

I wonder if we should make the add_path() logic smarter to recognize
when two paths have different pathkeys but were generated to match the
same grouping, to reduce the number of paths added by this optimization.
Currently we do that for each pathkeys list independently, but we're
considering many more pathkeys orderings ...

That being said, maybe this concern is moot - to generate a large number
of new paths, there would have to be a lot of indexes matching the group
by clause (as a prefix), and the group by clause would need to be fairly
large (to have many permutations). That seems like a fairly rare case. I
do know a couple of people who create large numbers of indexes on
tables, but those are usually single-column indexes.


2) sort reordering based on ndistinct estimates

This part of the patch (reordering the columns in an explicit sort node)
seems to be largely independent of the "consider other indexes" part. So
I do suggest separating those two, and discussing them independently.

But thinking about this optimization, I'm worried it relies on a couple
of important assumptions. Until now those decisions could be made by
the person writing the SQL query, but this optimization makes that
impossible. So we really need to get this right.

For example, it seems to disregard that different data types have
different comparison costs. Comparing bytea values, for instance, will be
far more expensive than comparing int4, so it may be much more efficient
to compare int4 columns first, even if there are far fewer distinct values in them.

This costing is probably related to abbreviated keys too, which is a
huge improvement for text and other varlena-based types.
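
Just to make that concrete, here's a sketch one could use to measure it (an
illustration of my own, not something from the patch): a text column with many
distinct values but a long shared prefix (so abbreviated keys don't help much)
next to a cheap int4 column.

    -- Illustrative only: which GROUP BY order wins depends on comparison cost,
    -- not just on ndistinct.
    CREATE TABLE cmp_cost (i int4, label text);
    INSERT INTO cmp_cost
      SELECT mod(i, 100),
             repeat('x', 64) || mod(i, 10000)::text  -- long shared prefix, expensive compares
        FROM generate_series(1, 1000000) s(i);
    ANALYZE cmp_cost;
    SET enable_hashagg = off;
    -- label has more distinct values, so an ndistinct-based reordering would put
    -- it first, even though every text comparison costs much more than an int4 one:
    EXPLAIN ANALYZE SELECT count(*) FROM cmp_cost GROUP BY label, i;
    EXPLAIN ANALYZE SELECT count(*) FROM cmp_cost GROUP BY i, label;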

Also, simply sorting the columns by their ndistinct estimate is somewhat
naive, because it assumes the columns are independent. Imagine for
example a table with three columns:

    CREATE TABLE t (a INT, b INT, c INT);

    INSERT INTO t SELECT mod(i,100), mod(i,100)/2, 49*random()
      FROM generate_series(1,1000000) s(i);

Clearly, "a" has the most distinct values (100), while both "b" and "c"
have 50 distinct values. But "b" does not actually split any of the "a"
groups, while "c" does:

     select count(*) from (select 1 from t group by a,b) foo;
      count
     -------
        100
     (1 row)

     select count(*) from (select 1 from t group by a,c) foo;
      count
     -------
       5000
     (1 row)

So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to use
"(a,c,b)" than "(a,b,c)" but there is no way to decide this merely using
per-column ndistinct values. Luckily, we now have ndistinct multi-column
coefficients which could be used to decide this I believe (but I haven't
tried).
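
For reference, this is how the extended statistics would be declared on that
table (CREATE STATISTICS with the ndistinct option exists since PG 10; whether
and how the reordering heuristic would actually consult it is exactly the open
question):

    -- After ANALYZE, this records estimated group counts for the column
    -- combinations (a,b), (a,c), (b,c) and (a,b,c), not just per-column ndistinct,
    -- which is the information needed to tell (a,c) apart from (a,b) here.
    CREATE STATISTICS t_nd (ndistinct) ON a, b, c FROM t;
    ANALYZE t;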

The real issue however is that this decision can't be made entirely
locally. Consider for example this:

     explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
                                       QUERY PLAN
     ------------------------------------------------------------------------------
      Sort  (cost=156010.16..156260.16 rows=100000 width=20)
        Sort Key: c, b, a
        ->  GroupAggregate  (cost=132154.34..145654.34 rows=100000 width=20)
              Group Key: a, c, b
              ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
                    Sort Key: a, c, b
                    ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
     (7 rows)

while without the patch, this would happen:

     explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
                                    QUERY PLAN
     ------------------------------------------------------------------------
      GroupAggregate  (cost=132154.34..145654.34 rows=100000 width=20)
        Group Key: c, b, a
        ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
              Sort Key: c, b, a
              ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
     (5 rows)

Which is clearly cheaper (at least according to the optimizer) than
doing two separate sorts. So the optimization actually does the wrong
thing here, and it needs to somehow consider the other ordering
requirements (in this case ORDER BY) - either by generating multiple
paths with different orderings or some heuristics.

I'm also wondering how freely we can change the group by result
ordering. Assume you disable hash aggregate and parallel query -
currently we are guaranteed to use group aggregate that produces exactly
the ordering as specified in GROUP BY, but this patch removes that
"guarantee" and we can produce arbitrary permutation of the ordering.
But as I argued in other threads, such implicit guarantees are really
fragile, can silently break for arbitrary reasons (say, parallel query
will do just that) and can be easily fixed by adding a proper ORDER BY.
So I don't think this is an issue.

The other random thought is how this will/could interact with the
incremental sort patch. I know Alexander wanted to significantly limit
where we actually consider the incremental sort, but I'm not sure if
this was one of those places or not (it sure looks like a place where we
would greatly benefit from it).

So to summarize this initial review - I do suggest splitting the patch
into two parts. One that does the index magic, and one for this
reordering optimization. The first part (additional indexes) seems
fairly safe and likely to get committable soon. The other part (ndistinct
reordering) IMHO requires more thought regarding costing and interaction
with other query parts.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

Teodor Sigaev
> Thanks for the patch. This (missing) optimization popped-up repeatedly recently,
> and I was planning to look into it for PG12. So now I don't have to, because
> you've done all the hard work ;-)
You are welcome. Actually one of our customers faced the problem with GROUP BY
column order, specifically with reordering in the absence of any indexes - what
you refer to as part 2). I noticed the index optimization later. But following
the order you suggested, I split the patch into an index part and a non-index
part, with the second part depending on the first. They touch the same part of
the code, so they could not be made fully independent.

> 1) add_path() ensures that we only keep the one cheapest path sorted path for
> each pathkeys. This patch somewhat defeats that because it considers additional
> pathkeys (essentially any permutation of group keys) as interesting. So we may
> end up with more paths in the list.
An index scan could give a benefit here only if:
   1) it's an index-only scan
   2) it's a full index scan (as opposed to index-only) where the physical order
      of the heap is close to the logical index order (i.e. the table is clustered)

In other cases the cost of disk seeks will be very high. But at this stage of
planning we don't know these facts yet, so we can't make a good decision here
and should trust the add_path() logic.

 > I wonder if we should make the add_path() logic smarter to recognize when
 > two paths have different pathkeys but were generated to match the same
 > grouping, to reduce the number of paths added by this optimization.
 > Currently we do that for each pathkeys list independently, but we're
 > considering many more pathkeys orderings ...

Hm. I tend to say no.
select .. from t1 group by a, b
union
select .. from t2 group by a, b

t1 and t2 could have different sets of indexes and different distributions, so
locally it could be cheaper to use one index (for example, one defined as (b,a)
and another as (a,b,c,d) - the second being larger), while globally another
choice wins (for example, if the second table doesn't have a (b,a) index).

> 2) sort reordering based on ndistinct estimates

> But thinking about this optimization, I'm worried it relies on a couple of
> important assumptions. For now those decisions could have be made by the person
> writing the SQL query, but this optimization makes that impossible. So we really
> need to get this right.
Hm, SQL by design should not be used that way, but of course it is :(

> For example, it seems to disregard that different data types have different
> comparison costs. For example comparing bytea will be far more expensive
> compared to int4, so it may be much more efficient to compare int4 columns
> first, even if there are far fewer distinct values in them.
As far as I can see, cost_sort() doesn't pay attention to these details.
Improving that should be a separate patch.

> Also, simply sorting the columns by their ndistinct estimate is somewhat naive,
> because it assumes the columns are independent. Imagine for example a table with
> three columns:
> So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to use
> "(a,c,b)" than "(a,b,c)" but there is no way to decide this merely using
> per-column ndistinct values. Luckily, we now have ndistinct multi-column
> coefficients which could be used to decide this I believe (but I haven't tried).
Agreed, but I don't know how to use it here. Except, maybe:
1) first column - the column with the biggest estimated number of groups
2) second column - the one for which the pair (first, second) has the biggest
estimated number of groups
3) third column - the one for which the triple (first, second, third) has the
biggest ...

But it seems, ISTM, that even that algorithm could be implemented more cheaply.
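
Just to illustrate what that greedy procedure would look at, using your table t
from above (these queries only show the group counts the heuristic would have to
estimate; the planner would of course work from estimates rather than executing
anything):

    -- step 1: per-column group counts; pick the largest as the first key
    SELECT count(*) FROM (SELECT 1 FROM t GROUP BY a) s;    -- 100
    SELECT count(*) FROM (SELECT 1 FROM t GROUP BY b) s;    -- 50
    SELECT count(*) FROM (SELECT 1 FROM t GROUP BY c) s;    -- 50
    -- step 2: extend the chosen prefix (a) with each remaining column
    SELECT count(*) FROM (SELECT 1 FROM t GROUP BY a, b) s; -- 100   (b splits nothing)
    SELECT count(*) FROM (SELECT 1 FROM t GROUP BY a, c) s; -- ~5000 (c splits the groups)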


> The real issue however is that this decision can't be made entirely locally.
> Consider for example this:
>
>      explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
>
> Which is clearly cheaper (at least according to the optimizer) than doing two
> separate sorts. So the optimization actually does the wrong thing here, and it
> needs to somehow consider the other ordering requirements (in this case ORDER
> BY) - either by generating multiple paths with different orderings or some
> heuristics.
Hm, thank you. I consider that a bug in my implementation - the basic idea was
to try to match an already existing or needed order, and only if we fail, or are
left with an unmatched tail of the pathkey list, do we try to find the cheapest
column order. Fixed in v7 (0002-opt_group_by_index_and_order-v7.patch), though
perhaps in a naive way: if we don't have a path pathkey, we first try to reorder
the columns according to the ORDER BY clause. A test for your case is also added.


> I'm also wondering how freely we can change the group by result ordering. Assume
> you disable hash aggregate and parallel query - currently we are guaranteed to
> use group aggregate that produces exactly the ordering as specified in GROUP BY,
> but this patch removes that "guarantee" and we can produce arbitrary permutation
> of the ordering. But as I argued in other threads, such implicit guarantees are
> really fragile, can silently break for arbitrary reasons (say, parallel query
> will do just that) and can be easily fixed by adding a proper ORDER BY. So I
> don't think this is an issue.
Agreed. SQL by design doesn't guarantee any particular order without an
explicit ORDER BY clause.

> The other random thought is how will/could this interact with the incremental
> sorts patch. I know Alexander wanted to significantly limit where we actually
> consider the incremental sort, but I'm not sure if this was one of those places
> or not (is sure looks like a place where we would greatly benefit from it).
It seems they should not interact directly. This patch tries to find the cheapest
column order, while Alexander's patch tries to make the sort itself cheaper -
they are different tasks. But I will try.

> So to summarize this initial review - I do suggest splitting the patch into two
> parts. One that does the index magic, and one for this reordering optimization.
> The first part (additional indexes) seems quite fairly safe, likely to get
> committable soon. The other part (ndistinct reordering) IMHO requires more
> thought regarding costing and interaction with other query parts.
Thank you for review!

--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Attachment: 0002-opt_group_by_index_and_order-v7.patch (13K)
Attachment: 0001-opt_group_by_index-v7.patch (17K)

Re: POC: GROUP BY optimization

Tomas Vondra-4

On 06/05/2018 07:56 PM, Teodor Sigaev wrote:

>> Thanks for the patch. This (missing) optimization popped-up repeatedly
>> recently, and I was planning to look into it for PG12. So now I don't
>> have to, because you've done all the hard work ;-)
> You are welcome. Actually one of out customers faced the problem with  
> GROUP BY column order and exactly with reordering without any indexes,
> you mean it as problem 2). Index optimization was noticed by me later.
> But based on your suggested patch's order I split the patch to index  
> and non-index part and second part depends of first one. They touch the
> same part of code and they could not be independent
>

The way I see it the patch does two different things:

a) consider additional indexes by relaxing the pathkeys check

b) if there are no indexes, i.e. when an explicit Sort is needed,
consider reordering the columns by ndistinct

Not sure why those two parts couldn't be separated. I haven't tried
splitting the patch, of course, so I may be missing something.

In the worst case, one part will depend on the other, which is OK. It
still allows us to commit the first part and continue working on the
other one, for example.

>> 1) add_path() ensures that we only keep the one cheapest path sorted
>> path for each pathkeys. This patch somewhat defeats that because it
>> considers additional pathkeys (essentially any permutation of group
>> keys) as interesting. So we may end up with more paths in the list.
> Seems, index scan here could give benefits here only if:
>    1) it's a index only scan
>    2) it's a index full (opposite to only) scan but physical order of
> heap is
>       close to logical index order (ie table is clustered)
>
> In other cases costs of disk seeking will be very high. But on this
> stage of planing we don't know that facts yet. So we couldn't make a
> good decision here and should believe in add_path() logic.
>
Not sure what you mean? Surely we do costing of the paths at this stage,
so we can decide which one is cheaper etc. The decision which paths to
keep is done by add_path(), and it should stay like this, of course. I
wasn't suggesting to move the logic elsewhere.

>  > I wonder if we should make the add_path() logic smarter to recognize
> when two
>  > paths have different pathkeys but were generated to match the same
> grouping,
>  > to reduce the number of paths added by this optimization. Currently
> we do that > for each pathkeys list independently, but we're considering
> many more pathkeys > orderings ...
>
> Hm. I tend to say no.
> select .. from t1 group by a, b
> union
> select .. from t2 group by a, b
>
> t1 and t2 could have different set of indexes and different
> distribution, so locally it could be cheaper to use one index (for
> example, one defined as (b, a) and second as (a,b,c,d) - second is
> larger) but totally - another (example: second table doesn't have (b,a)
> index)
>

But add_path() treats each of the relations independently, why couldn't
we pick a different index for each of the two relations?

>> 2) sort reordering based on ndistinct estimates
>
>> But thinking about this optimization, I'm worried it relies on a
>> couple of important assumptions. For now those decisions could have be
>> made by the person writing the SQL query, but this optimization makes
>> that impossible. So we really need to get this right.
> Hm, sql by design should not be used that way, but, of course, it's used :(
>

Well, yes and no. I'm not worried about people relying on us to give
them some ordering - they can (and should) add an ORDER BY clause to fix
that. I'm more worried about the other stuff.

>> For example, it seems to disregard that different data types have
>> different comparison costs. For example comparing bytea will be far
>> more expensive compared to int4, so it may be much more efficient to
>> compare int4 columns first, even if there are far fewer distinct
>> values in them.
> as I can see cost_sort() doesn't pay attention to this details. And it
> should be a separated patch to improve that.
>

But sort also does not reorder columns.

Imagine you have a custom data type that is expensive for comparisons.
You know that, so you place it at the end of GROUP BY clauses, to reduce
the number of comparisons on that field. And then we come along and just
reorder the columns, placing it first, because it happens to have a high
ndistinct statistic. And there's no way to get the original behavior :-(

>> Also, simply sorting the columns by their ndistinct estimate is
>> somewhat naive, because it assumes the columns are independent.
>> Imagine for example a table with three columns:
>> So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to
>> use "(a,c,b)" than "(a,b,c)" but there is no way to decide this merely
>> using per-column ndistinct values. Luckily, we now have ndistinct
>> multi-column coefficients which could be used to decide this I believe
>> (but I haven't tried).
> Agree, but I don't know how to use it here. Except, may be:
> 1) first column - the column with bigger estimated number of groups
> 2) second column - the pair of (first, second) with bigger estimated
> number of groups
> 3) third column - the triple of (first, second, third) with bigger ...
>
> But seems even with that algorithm, ISTM, it could be implemented in
> cheaper manner.
>

Maybe. I do have some ideas, although I haven't tried implementing it.

If there's no extended statistic on the columns, you can do the current
thing (assuming independence etc.). There's not much we can do here.

If there's an extended statistic, you can do either a greedy search (get
the next column with the highest ndistinct coefficient) or exhaustive
search (computing the estimated number of comparisons).

Another challenge is that using only the ndistinct coefficient assumes
uniform distribution of the values. But you can have a column with 1M
distinct values, where a single value represents 99% of the rows. And
another column with 100k distinct values, with actual uniform
distribution. I'm pretty sure it'd be more efficient to place the 100k
column first.
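
A made-up, scaled-down illustration of that situation (the numbers are chosen
only to show the shape of the problem):

    -- "skewed" has ~10k distinct values but ~99% of rows share the value 0;
    -- "uniform" has only 1k distinct values, evenly distributed. Plain ndistinct
    -- would put "skewed" first, yet as a leading key it resolves almost nothing.
    CREATE TABLE skew_demo AS
      SELECT CASE WHEN i % 100 = 0 THEN i ELSE 0 END AS skewed,
             mod(i, 1000)                            AS uniform
        FROM generate_series(1, 1000000) s(i);
    ANALYZE skew_demo;
    SET enable_hashagg = off;
    EXPLAIN ANALYZE SELECT count(*) FROM skew_demo GROUP BY skewed, uniform;
    EXPLAIN ANALYZE SELECT count(*) FROM skew_demo GROUP BY uniform, skewed;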

>
>> The real issue however is that this decision can't be made entirely
>> locally. Consider for example this:
>>
>>      explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
>>
>> Which is clearly cheaper (at least according to the optimizer) than
>> doing two separate sorts. So the optimization actually does the wrong
>> thing here, and it needs to somehow consider the other ordering
>> requirements (in this case ORDER BY) - either by generating multiple
>> paths with different orderings or some heuristics.
> Hm, thank you. I consider it is a bug of my implementation - basic idea
> was that we try to match already existing or needed order and only if we
> fail or have unmatched tail of pathkey list than we will try to find
> cheapest column order.
> Fixed in v7 (0002-opt_group_by_index_and_order-v7.patch), but may be by
> naive way: if we don't have a path pathkey first try to reorder columns
> accordingly to order by clause. Test for your is also added.
>

OK. I'll give it a try.

>
>> I'm also wondering how freely we can change the group by result
>> ordering. Assume you disable hash aggregate and parallel query -
>> currently we are guaranteed to use group aggregate that produces
>> exactly the ordering as specified in GROUP BY, but this patch removes
>> that "guarantee" and we can produce arbitrary permutation of the
>> ordering. But as I argued in other threads, such implicit guarantees
>> are really fragile, can silently break for arbitrary reasons (say,
>> parallel query will do just that) and can be easily fixed by adding a
>> proper ORDER BY. So I don't think this is an issue.
> Agree. SQL by design doesn't give a warranty of particular order without
> explicit ORDER BY clause.
>
>> The other random thought is how will/could this interact with the
>> incremental sorts patch. I know Alexander wanted to significantly
>> limit where we actually consider the incremental sort, but I'm not
>> sure if this was one of those places or not (is sure looks like a
>> place where we would greatly benefit from it).
> Seems, they should not directly interact. Patch tries to find cheapest
> column order, Alexander's patch tries to make sort cheaper - they are a
> different tasks. But I will try.
>

Thanks.

>> So to summarize this initial review - I do suggest splitting the patch
>> into two parts. One that does the index magic, and one for this
>> reordering optimization. The first part (additional indexes) seems
>> quite fairly safe, likely to get committable soon. The other part
>> (ndistinct reordering) IMHO requires more thought regarding costing
>> and interaction with other query parts.
> Thank you for review!
>

;-)


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

Teodor Sigaev

>> problem 2). Index optimization was noticed by me later. But based on your
>> suggested patch's order I split the patch to index and non-index part and
>> second part depends of first one. They touch the same part of code and they
>> could not be independent
> The way I see it the patch does two different things:
Yes

> a) consider additional indexes by relaxing the pathkeys check
Yes, but not only that. The patch can also reorder the GROUP BY clause to match
existing pathkeys, which could come from an index scan (suggested by the patch or
not) or from some other way of getting ordered output.

> b) if there are no indexes, i.e. when an explicit Sort is needed, consider
> reordering the columns by ndistinct
Yes. But again, this description is a bit short. First, it runs after the first
patch and might get some preordered leading pathkeys. Second, it tries to match
the ORDER BY clause order if there are no preordered leading pathkeys from the
first patch (this was introduced in v7). And third, if a tail of pathkeys is left
unmatched by the previous stages, it reorders that tail.

> In the worst case, one part will depend on the other, which is OK. It still
> allows us to commit the first part and continue working on the other one, for
> example.
Exactly our case. Of course, it's possible to split the first patch in two again:
just the index suggestion (1.1) and reordering by existing pathkeys (1.2). Then
1.1 would be independent, but 2 would still have to be applied after 1.2. And
patch 1.1 on its own is rather useless.

> can decide which one is cheaper etc. The decision which paths to keep is done by
> add_path(), and it should stay like this, of course. I wasn't suggesting to move
> the logic elsewhere.
Cool, I have no intention of modifying it either.

>
>>  > I wonder if we should make the add_path() logic smarter to recognize when two
>>  > paths have different pathkeys but were generated to match the same grouping,
>>  > to reduce the number of paths added by this optimization. Currently we do
>> that > for each pathkeys list independently, but we're considering many more
>> pathkeys > orderings ...
>>
>> Hm. I tend to say no.
>> select .. from t1 group by a, b
>> union
>> select .. from t2 group by a, b
>>
>> t1 and t2 could have different set of indexes and different distribution, so
>> locally it could be cheaper to use one index (for example, one defined as (b,
>> a) and second as (a,b,c,d) - second is larger) but totally - another (example:
>> second table doesn't have (b,a) index)
>>
>
> But add_path() treats each of the relations independently, why couldn't we pick
> a different index for each of the two relations?
Having sorted output from both subselects could be cheaper than sorting one of
them, even if the other index scan were locally cheaper. But it seems to me that
is not the business of the optimization suggested here.


>>> For example, it seems to disregard that different data types have different
>>> comparison costs. For example comparing bytea will be far more expensive
>>> compared to int4, so it may be much more efficient to compare int4 columns
>>> first, even if there are far fewer distinct values in them.
>> as I can see cost_sort() doesn't pay attention to this details. And it should
>> be a separated patch to improve that.
>>
>
> But sort also does not reorder columns.
Yes. But the estimated cost of the comparison function and the estimated number
of unique values in a column may not be very accurate, so the planner could make
a wrong choice. I saw a 2x difference in a real-world application. Again,
improving sort cost estimation is a separate task.

>
> Imagine you have a custom data type that is expensive for comparisons. You know
> that, so you place it at the end of GROUP BY clauses, to reduce the number of
> comparisons on that field. And then we come along and just reorder the columns,
> placing it first, because it happens to have a high ndistinct statistic. And
> there's no way to get the original behavior :-(
Hm, that could happen, but I don't know how to improve it. The current
cost_sort() returns the same cost for any column order.

Okay, here we know an estimate of nrow; we could somehow find the comparison
function's OID and look up its procost field (see the sketch below). And then?
We would also need to know the tuple width here and mostly repeat cost_sort().
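
Just to sketch where that information lives, at the SQL level (the planner would
presumably reach it through the pathkey's opfamily rather than a catalog query
like this):

    -- btree comparison support function and its procost for int4
    -- (amprocnum = 1 is the ordering/comparison support function)
    SELECT amp.amproc::regproc AS cmp_func, p.procost
      FROM pg_amproc amp
      JOIN pg_proc p       ON p.oid = amp.amproc
      JOIN pg_opfamily opf ON opf.oid = amp.amprocfamily
      JOIN pg_am am        ON am.oid = opf.opfmethod
     WHERE am.amname = 'btree'
       AND amp.amproclefttype  = 'int4'::regtype
       AND amp.amprocrighttype = 'int4'::regtype
       AND amp.amprocnum = 1;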

Another option is to introduce an enable_groupby_reorder GUC variable, although
personally I don't like the idea of adding a new GUC.

>> Agree, but I don't know how to use it here. Except, may be:
>> 1) first column - the column with bigger estimated number of groups
>> 2) second column - the pair of (first, second) with bigger estimated number of
>> groups
>> 3) third column - the triple of (first, second, third) with bigger ...
>>
>> But seems even with that algorithm, ISTM, it could be implemented in cheaper
>> manner.
>>
>
> Maybe. I do have some ideas, although I haven't tried implementing it.
Implemented; please have a look.

>
> If there's no extended statistic on the columns, you can do the current thing
> (assuming independence etc.). There's not much we can do here.
Agree

>
> If there's an extended statistic, you can do either a greedy search (get the
> next column with the highest ndistinct coefficient) or exhaustive search
> (computing the estimated number of comparisons).
>
> Another challenge is that using only the ndistinct coefficient assumes uniform
> distribution of the values. But you can have a column with 1M distinct values,
> where a single value represents 99% of the rows. And another column with 100k
> distinct values, with actual uniform distribution. I'm pretty sure it'd be more
> efficient to place the 100k column first.
Interesting. I will think about it, thank you.


--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Attachment: 0001-opt_group_by_index-v8.patch (17K)
Attachment: 0002-opt_group_by_index_and_order-v8.patch (10K)

Re: POC: GROUP BY optimization

Tomas Vondra-4
On 06/06/2018 08:04 PM, Teodor Sigaev wrote:

>
>>> problem 2). Index optimization was noticed by me later. But based on
>>> your suggested patch's order I split the patch to index and non-index
>>> part and second part depends of first one. They touch the same part
>>> of code and they could not be independent
>> The way I see it the patch does two different things:
> Yes
>
>> a) consider additional indexes by relaxing the pathkeys check
> Yes, but not only. Patch could reorder GROUP BY clause to match existing
> pathkeys which could come from index scan (suggested by patch or not) or
> some another way to get ordered output.
>
>> b) if there are no indexes, i.e. when an explicit Sort is needed,
>> consider reordering the columns by ndistinct
> Yes. But again, this description is a bit short. First it works after
> first patch and might get some preordered leading pathkeys. Second, it
> tries to match ORDER BY clause order if there is no preordered leading
> pathkeys from first patch (it was introduced in v7). And third, if there
> is a tail of unmatched pathkeys on previous stages then it will reorder
> that tail.
>

OK, I haven't looked at v7 yet, but if I understand correctly it tries
to maintain the ordering as much as possible? Does that actually help? I
mean, the incremental sort patch allows the sorting to happen by pieces,
but here we still need to sort all the data, right?

Can you give an example demonstrating the benefit?

>> In the worst case, one part will depend on the other, which is OK. It
>> still allows us to commit the first part and continue working on the
>> other one, for example.
> Exactly it's our case. Of course, it's possible to split first patch for
> two again: just suggestion of index (1.1) and reordering by existing
> pathkeys (1.2). Then 1.1 will be independent but 2 still should be
> applied after 1.2. But patch 1.1 is rather useless.
>
>> can decide which one is cheaper etc. The decision which paths to keep
>> is done by add_path(), and it should stay like this, of course. I
>> wasn't suggesting to move the logic elsewhere.
> Cool, I haven't intention to modify it too.
>
>>
>>>  > I wonder if we should make the add_path() logic smarter to
>>> recognize when two
>>>  > paths have different pathkeys but were generated to match the same
>>> grouping,
>>>  > to reduce the number of paths added by this optimization.
>>> Currently we do that > for each pathkeys list independently, but
>>> we're considering many more pathkeys > orderings ...
>>>
>>> Hm. I tend to say no.
>>> select .. from t1 group by a, b
>>> union
>>> select .. from t2 group by a, b
>>>
>>> t1 and t2 could have different set of indexes and different
>>> distribution, so locally it could be cheaper to use one index (for
>>> example, one defined as (b, a) and second as (a,b,c,d) - second is
>>> larger) but totally - another (example: second table doesn't have
>>> (b,a) index)
>>>
>>
>> But add_path() treats each of the relations independently, why
>> couldn't we pick a different index for each of the two relations?
>
> Having of sorted output of both subselect could be cheaper that sorting
> one of them even if index scan was cheaper. But it seems to me that is
> not deal of suggested here optimization.
>

OK, I think I understand what you're trying to say. We may have a query
like this:

SELECT a, SUM(x) FROM
(
    SELECT a, b, COUNT(c) AS x FROM t1 GROUP BY a, b
    UNION ALL
    SELECT a, b, COUNT(c) AS x FROM t2 GROUP BY a, b
) foo GROUP BY a;

and indexes on (a,b) and (b,a) for both relations. The "deduplication"
by pathkeys I suggested would mean we might keep only index (a,b) on t1
and (b,a) on t2, which means the grouping by "a" can't leverage index
scans on both relations. But if we keep paths for both indexes on each
relation, we can.

That's a good point, yes.
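
To make that setup concrete, what I have in mind is roughly this (names made up):

    -- Hypothetical schema: both orderings indexed on both relations. Keeping one
    -- sorted path per ordering (rather than deduplicating per grouping) lets each
    -- relation use whichever index suits it.
    CREATE TABLE t1 (a int, b int, c int);
    CREATE TABLE t2 (a int, b int, c int);
    CREATE INDEX t1_a_b ON t1 (a, b);
    CREATE INDEX t1_b_a ON t1 (b, a);
    CREATE INDEX t2_a_b ON t2 (a, b);
    CREATE INDEX t2_b_a ON t2 (b, a);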

>
>>>> For example, it seems to disregard that different data types have
>>>> different comparison costs. For example comparing bytea will be far
>>>> more expensive compared to int4, so it may be much more efficient to
>>>> compare int4 columns first, even if there are far fewer distinct
>>>> values in them.
>>> as I can see cost_sort() doesn't pay attention to this details. And
>>> it should be a separated patch to improve that.
>>>
>>
>> But sort also does not reorder columns.
> Yes. But estimation of cost of comparing function/number of unique
> values in column could be not very accurate and so planner could make
> a wrong choice.

Isn't "estimation of cost of comparing function/number of unique values
in column could be not very accurate and so planner could make a wrong
choice" is more an argument against relying on it when doing these
optimizations?

FWIW it's one of the arguments Tom made in the incremental sort patch,
which relies on it too when computing cost of the incremental sort. I'm
sure it's going to be an obstacle there too.

> I saw 2 times difference in real-world application. Again, improving
> sort cost estimation is a separate task.
Sure. But we also need to ask the other question, i.e. how many people
would be negatively affected by the optimization. And I admit I don't
know the answer to that, the next example is entirely made up.

>>
>> Imagine you have a custom data type that is expensive for comparisons.
>> You know that, so you place it at the end of GROUP BY clauses, to
>> reduce the number of comparisons on that field. And then we come along
>> and just reorder the columns, placing it first, because it happens to
>> have a high ndistinct statistic. And there's no way to get the
>> original behavior :-(
> Hm. that it could be, but I don't know how to improve here.  Current
> cost_sort() will return the same cost for any columns order.
>
> Okay, here we know an estimation of nrow, we could somehow find a
> comparing function oid and find a its procost field. And then? we also
> need to know width of tuple here and mostly repeat the cost_sort.
>
> Another option is an introducing enable_groupby_reorder GUC variable
> although personally I don't like an idea to add new GUC variable.
>

I don't like the idea of yet another GUC either, as it's rather blunt
instrument. Which is why I'm suggesting to split the patch into two
parts. Then we can apply the index stuff (which seems relatively
straightforward) and keep working on this second part.

FWIW I think it would be useful to have "development GUC" that would
allow us to enable/disable these options during development, because
that makes experiments much easier. But then remove them before commit.

>>> Agree, but I don't know how to use it here. Except, may be:
>>> 1) first column - the column with bigger estimated number of groups
>>> 2) second column - the pair of (first, second) with bigger estimated
>>> number of groups
>>> 3) third column - the triple of (first, second, third) with bigger ...
>>>
>>> But seems even with that algorithm, ISTM, it could be implemented in
>>> cheaper manner.
>>>
>>
>> Maybe. I do have some ideas, although I haven't tried implementing it.
> Implemented, pls, have a look.
>
>>
>> If there's no extended statistic on the columns, you can do the
>> current thing (assuming independence etc.). There's not much we can do
>> here.
> Agree
>
>>
>> If there's an extended statistic, you can do either a greedy search
>> (get the next column with the highest ndistinct coefficient) or
>> exhaustive search (computing the estimated number of comparisons).
>>
>> Another challenge is that using only the ndistinct coefficient assumes
>> uniform distribution of the values. But you can have a column with 1M
>> distinct values, where a single value represents 99% of the rows. And
>> another column with 100k distinct values, with actual uniform
>> distribution. I'm pretty sure it'd be more efficient to place the 100k
>> column first.
>
> Interesting.  Will think, thank you
>

You're welcome.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

Claudio Freire
On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra
<[hidden email]> wrote:

>
> >>>> For example, it seems to disregard that different data types have
> >>>> different comparison costs. For example comparing bytea will be far
> >>>> more expensive compared to int4, so it may be much more efficient to
> >>>> compare int4 columns first, even if there are far fewer distinct
> >>>> values in them.
> >>> as I can see cost_sort() doesn't pay attention to this details. And
> >>> it should be a separated patch to improve that.
> >>>
> >>
> >> But sort also does not reorder columns.
> > Yes. But estimation of cost of comparing function/number of unique
> > values in column could be not very accurate and so planner could make
> > a wrong choice.

...

>
> >> Imagine you have a custom data type that is expensive for comparisons.
> >> You know that, so you place it at the end of GROUP BY clauses, to
> >> reduce the number of comparisons on that field. And then we come along
> >> and just reorder the columns, placing it first, because it happens to
> >> have a high ndistinct statistic. And there's no way to get the
> >> original behavior :-(
> > Hm. that it could be, but I don't know how to improve here.  Current
> > cost_sort() will return the same cost for any columns order.
> >
> > Okay, here we know an estimation of nrow, we could somehow find a
> > comparing function oid and find a its procost field. And then? we also
> > need to know width of tuple here and mostly repeat the cost_sort.
> >
> > Another option is an introducing enable_groupby_reorder GUC variable
> > although personally I don't like an idea to add new GUC variable.
> >
>
> I don't like the idea of yet another GUC either, as it's rather blunt
> instrument. Which is why I'm suggesting to split the patch into two
> parts. Then we can apply the index stuff (which seems relatively
> straightforward) and keep working on this second part.
>
> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.

Correct me if I'm wrong, but doesn't this patch concern itself with
precisely sort performance?

As such, estimating sort performance correctly in the various plan
variants being considered seems to be a very central aspect of it.

This means it's pretty much time to consider the effect of operator
cost *and* distinct values in the cost computation.

Assuming cost_sort does its thing, it's just a matter of building the
desired variants and picking the one with the smallest cost_sort. One
can use heuristics to build a subset of all possible column orderings,
with a guiding principle that tries to maximize the likelihood of
finding a better order than the one in the query (like sorting by
ndistinct), but I'd suggest:

- Always considering the sort order verbatim as given in the SQL (ie:
what the user requests)
- Relying on cost_sort to distinguish among them, and pick a winner, and
- When in doubt (if cost_sort doesn't produce a clear winner), use the
order given by the user

Comparison cost can be approximated probabilistically as:

cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))

Where cost_op_n is the cost of the comparison function for column N,
and ndistinct(col_1_to_n) is an estimation of the number of distinct
values for columns prior to N in the sort order.

You can approximate ndistinct as the product of ndistinct of previous
columns, or use extended statistics when available.

I think that should give a good enough approximation of
ndistinct-enriched sort costs that considers operator cost
appropriately. That operator cost is basically an average and can be
used as a constant, so it's just a matter of passing the right
comparison_cost to cost_sort.

Even if ndistinct estimates are off, the fact that operator cost is
involved should be a good enough tool for the user should the planner
pick the wrong path - it's only a matter of boosting operator cost to
let the planner know that sorting that way is expensive.

Prioritization of the user-provided order can be as simple as giving
that comparison_cost a small handicap.


Re: POC: GROUP BY optimization

Tomas Vondra-4
On 06/06/2018 11:22 PM, Claudio Freire wrote:

> On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra
> <[hidden email]> wrote:
>>
>>>>>> For example, it seems to disregard that different data types have
>>>>>> different comparison costs. For example comparing bytea will be far
>>>>>> more expensive compared to int4, so it may be much more efficient to
>>>>>> compare int4 columns first, even if there are far fewer distinct
>>>>>> values in them.
>>>>> as I can see cost_sort() doesn't pay attention to this details. And
>>>>> it should be a separated patch to improve that.
>>>>>
>>>>
>>>> But sort also does not reorder columns.
>>> Yes. But estimation of cost of comparing function/number of unique
>>> values in column could be not very accurate and so planner could make
>>> a wrong choice.
>
> ...
>>
>>>> Imagine you have a custom data type that is expensive for comparisons.
>>>> You know that, so you place it at the end of GROUP BY clauses, to
>>>> reduce the number of comparisons on that field. And then we come along
>>>> and just reorder the columns, placing it first, because it happens to
>>>> have a high ndistinct statistic. And there's no way to get the
>>>> original behavior :-(
>>> Hm. that it could be, but I don't know how to improve here.  Current
>>> cost_sort() will return the same cost for any columns order.
>>>
>>> Okay, here we know an estimation of nrow, we could somehow find a
>>> comparing function oid and find a its procost field. And then? we also
>>> need to know width of tuple here and mostly repeat the cost_sort.
>>>
>>> Another option is an introducing enable_groupby_reorder GUC variable
>>> although personally I don't like an idea to add new GUC variable.
>>>
>>
>> I don't like the idea of yet another GUC either, as it's rather blunt
>> instrument. Which is why I'm suggesting to split the patch into two
>> parts. Then we can apply the index stuff (which seems relatively
>> straightforward) and keep working on this second part.
>>
>> FWIW I think it would be useful to have "development GUC" that would
>> allow us to enable/disable these options during development, because
>> that makes experiments much easier. But then remove them before commit.
>
> Correct me if I'm wrong, but doesn't this patch concern itself with
> precisely sort performance?
>

Yes, the second part (reordering pathkeys by ndistinct) does.

> As such, estimating sort performance correctly in the various plan
> variants being considered seems to be a very central aspect of it.
>
> This means it's pretty much time to consider the effect of operator
> cost *and* distinct values in the cost computation.
>

Yes, until now that was not really needed because the optimizer does not
consider reordering of the pathkeys - it simply grabs either GROUP BY or
ORDER BY pathkeys and runs with that.

So the costing was fairly trivial, we simply do something like

    comparison_cost = 2.0 * cpu_operator_cost;

    sort_cost = comparison_cost * tuples * LOG2(tuples);

which essentially ignores that there might be multiple columns, or that
the columns may have sort operators with different costs.

The question is how reliable the heuristics can be. The current patch
uses just plain ndistinct, which seems rather unreliable, and I don't
have a clear idea how to improve that - we may have MCVs for the columns
and perhaps some extended statistics, but I'm not sure how far we can
run with that.

Essentially, we need to estimate the number of comparisons for each
column in order to compute a better comparison_cost.

>
> Assuming cost_sort does its thing, it's just a matter of building the
> desired variants and picking the one with the smallest cost_sort. One
> can use heuristics to build a subset of all possible column orderings
> with a guiding principle that tries to maximize the likelihook of
> finding a better order than the one in the query (like sorting by
> ndistinct), but I'd suggest:
>
> - Always considering the sort order verbatim as given in the SQL (ie:
> what the user requests)
> - Relying on cost_sort to distinguish among them, and pick a winner, and
> - When in doubt (if cost_sort doesn't produce a clear winner), use the
> order given by the user
>

I don't see why not to generate all possible orderings (keeping only the
cheapest path for each pathkey variant) and let the optimizer sort it
out. If the user specified an explicit ORDER BY, we'll slap an extra
Sort on top at the end, increasing the cost.

> Comparison cost can be approximated probabilistically as:
>
> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
>
> Where cost_op_n is the cost of the comparison function for column N,
> and ndistinct(col_1_to_n) is an estimation of the number of distinct
> values for columns prior to N in the sort order.
>
> You can approximate ndistinct as the product of ndistinct of previous
> columns, or use extended statistics when available.
>

Sure. The trouble is this also assumes uniform distribution, which is
not always the case.

> I think that should give a good enough approximation of
> ndistinct-enriched sort costs that considers operator cost
> appropriately. That operator cost is basically an average and can be
> used as a constant, so it's just a matter of passing the right
> comparison_cost to cost_sort.
>
> Even if ndistinct estimates are off, the fact that operator cost is
> involved should be a good enough tool for the user should the
> planner pick the wrong path - it's only a matter of boosting operator
> cost to let the planner know that sorting that way is expensive.
>

There are far too many "should"s in these two paragraphs.

> Priorization of the user-provided order can be as simple as giving
> that comparison_cost a small handicap.

I see no point in doing that, and I don't recall a single place in the
planner where we do that. If the user specified ORDER BY, we'll slap an
explicit Sort on top when needed (which acts as the handicap, but in a
clear way). Otherwise we don't do such things - it'd be just plain
confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
types, ndistinct etc. but unexpectedly different costs). Also, what
would be a good value for the handicap?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

Claudio Freire
On Wed, Jun 6, 2018 at 6:57 PM Tomas Vondra
<[hidden email]> wrote:

>
> On 06/06/2018 11:22 PM, Claudio Freire wrote:
> > On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra
> > As such, estimating sort performance correctly in the various plan
> > variants being considered seems to be a very central aspect of it.
> >
> > This means it's pretty much time to consider the effect of operator
> > cost *and* distinct values in the cost computation.
> >
>
> Yes, until now that was not really needed because the optimizer does not
> consider reordering of the pathkeys - it simply grabs either GROUP BY or
> ORDER BY pathkeys and runs with that.
>
> So the costing was fairly trivial, we simply do something like
>
>     comparison_cost = 2.0 * cpu_operator_cost;
>
>     sort_cost = comparison_cost * tuples * LOG2(tuples);
>
> which essentially ignores that there might be multiple columns, or that
> the columns may have sort operator with different costs.
>
> The question is how reliable the heuristics can be. The current patch
> uses just plain ndistinct, but that seems rather unreliable but I don't
> have a clear idea how to improve that - we may have MCV for the columns
> and perhaps some extended statistics, but I'm not sure how far we can
> run with that.
>
> Essentially what we need to estimate the number of comparisons for each
> column, to compute better comparison_cost.

This could be approached by adjusting statistics by any relevant
restrictions applicable to the columns being grouped, as is done for
selectivity estimations.

I'm not sure how far that would get us, though. It would be rather
easy to lose sight of those restrictions if there are complex
operations involved.

> > Assuming cost_sort does its thing, it's just a matter of building the
> > desired variants and picking the one with the smallest cost_sort. One
> > can use heuristics to build a subset of all possible column orderings
> > with a guiding principle that tries to maximize the likelihook of
> > finding a better order than the one in the query (like sorting by
> > ndistinct), but I'd suggest:
> >
> > - Always considering the sort order verbatim as given in the SQL (ie:
> > what the user requests)
> > - Relying on cost_sort to distinguish among them, and pick a winner, and
> > - When in doubt (if cost_sort doesn't produce a clear winner), use the
> > order given by the user
> >
>
> I don't see why not to generate all possible orderings (keeping only the
> cheapest path for each pathkey variant) and let the optimizer to sort it
> out.

I'm assuming costing the full N! possible orderings would be
prohibitively expensive.

> If the user specified an explicit ORDER BY, we'll slap an extra
> Sort by at the end, increasing the cost.

I think you misunderstood me. I'm saying the exact opposite.

When I mention handicap, I mean to give the explicitly requested group
by order a *reduced* cost, to give it an advantage over the
heuristics.

This is to try to attack the problem you mentioned, where users who are already
accounting for operator costs in their queries would get overridden by
the planner, perhaps to the detriment of overall execution time.

In essence:

- If the user requested that order, we assume it "somewhat
subjectively better" (by giving it a slightly reduced cost).
- If there is an explicit order by clause that differs from a
considered path, the required sort node will already penalize it
appropriately, nothing needs to be done in relation to sort costs.
- All other things being equal, cost_sort will make the decision. If a
plan beats the user-provided order in spite of the handicap, it will
be used. So it's still optimizing clear cases.

>
> > Comparison cost can be approximated probabilistically as:
> >
> > cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> >
> > Where cost_op_n is the cost of the comparison function for column N,
> > and ndistinct(col_1_to_n) is an estimation of the number of distinct
> > values for columns prior to N in the sort order.
> >
> > You can approximate ndistinct as the product of ndistinct of previous
> > columns, or use extended statistics when available.
> >
>
> Sure. The trouble is this also assumes uniform distribution, which is
> not always the case.

Well, (1.0 / ndistinct) = p(left == right).

If you can compute a better p(left == right) with an MCV, you can get
a better estimate. If you have an MCV. But that'd be a bit expensive,
and I'm not sure it's all that relevant.

To what degree does improving this produce better plans? As long as
average expected cost is relatively well estimated (as in, one sort
order relative to another sort order), it won't affect the decision.

But if needed, the MCV can be used for this.

So, in essence, you need to account for:

- restrictions on that column that constrain the domain
- distribution skew (MCV, nulls)
- ndistinct

To compute p(left == right).

Maybe something as simple as the following?

p_special = sum(mcv_i * mcv_i) + null_frac * null_frac
p_other = (1 - p_special) * (1 - p_special) / ndistinct(restr)

> > I think that should give a good enough approximation of
> > ndistinct-enriched sort costs that considers operator cost
> > appropriately. That operator cost is basically an average and can be
> > used as a constant, so it's just a matter of passing the right
> > comparison_cost to cost_sort.
> >
> > Even if ndistinct estimates are off, the fact that operator cost is
> > involved should be a good enough tool for the user should the
> > planner pick the wrong path - it's only a matter of boosting operator
> > cost to let the planner know that sorting that way is expensive.
> >
>
> There are far to many "should" in these two paragraphs.

Aren't all planner decisions riddled with "should"s?

Anyway, my point is that, since the user can control operator cost,
and operator cost is involved in the computation, and the ordering
given by the user is explicitly tested for and given a slight
advantage, it should (will?) be easy for the user to overcome any poor
planner decisions by giving the planner the relevant information.

> > Priorization of the user-provided order can be as simple as giving
> > that comparison_cost a small handicap.
>
> I see no point in doing that, and I don't recall a single place in the
> planner where we do that. If the user specified ORDER BY, we'll slap an
> explicit Sort on top when needed (which acts as the handicap, but in a
> clear way). Otherwise we don't do such things - it'd be just plain
> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
> types, ndistinct etc. but unexpectedly different costs). Also, what
> would be a good value for the handicap?

As I said, I'm not talking about explicit order by, but about the case
where the user knows which group by order is the most efficient
somehow.

I'm proposing the planner shouldn't override the user without clear
evidence that the other plan is actually better, above the noise
better. That's what I mean by handicap.

Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Claudio Freire
On Wed, Jun 6, 2018 at 7:18 PM Claudio Freire <[hidden email]> wrote:

> > > Comparison cost can be approximated probabilistically as:
> > >
> > > cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> > >
> > > Where cost_op_n is the cost of the comparison function for column N,
> > > and ndistinct(col_1_to_n) is an estimation of the number of distinct
> > > values for columns prior to N in the sort order.
> > >
> > > You can approximate ndistinct as the product of ndistinct of previous
> > > columns, or use extended statistics when available.
> > >
> >
> > Sure. The trouble is this also assumes uniform distribution, which is
> > not always the case.
>
> Well, (1.0 / ndistinct) = p(left == right).
>
> If you can compute a better p(left == right) with an MCV, you can get
> a better estimate. If you have an MCV. But that'd be a bit expensive,
> and I'm not sure it's all that relevant.
>
> To what degree does improving this produce better plans? As long as
> average expected cost is relatively well estimated (as in, one sort
> order relative to another sort order), it won't affect the decision.
>
> But if needed, the MCV can be used for this.
>
> So, in essence, you need to account for:
>
> - restrictions on that column that constrain the domain
> - distribution skew (MCV, nulls)
> - ndistinct
>
> To compute p(left == right).
>
> Maybe something as simple as the following?
>
> p_special = sum(mcv_i * mcv_i) + null_frac * null_frac
> p_other = (1 - p_special) * (1 - p_special) / ndistinct(restr)

Well, that came out with a slew of math errors, but the idea is clear:
compute p(left == right) given the available statistics and
constrained by any applicable restrictions.
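
For concreteness, a minimal standalone sketch of that computation (not patch
code; mcv_freqs[], null_frac and a restriction-adjusted ndistinct are assumed
to have been pulled out of the column's statistics already, and it takes the
non-MCV, non-NULL probability mass rather than 1 - p_special, per the
correction above):

/*
 * Sketch: estimate p(left == right) for one column from basic stats.
 * mcv_freqs[] are the MCV frequencies, null_frac the NULL fraction,
 * ndistinct the (restriction-adjusted) number of distinct values.
 */
double
estimate_eq_prob(const double *mcv_freqs, int n_mcv,
                 double null_frac, double ndistinct)
{
    double p_special = null_frac * null_frac;   /* NULL meets NULL */
    double rest = 1.0 - null_frac;              /* mass outside MCVs and NULLs */
    int    i;

    for (i = 0; i < n_mcv; i++)
    {
        p_special += mcv_freqs[i] * mcv_freqs[i];   /* same MCV on both sides */
        rest -= mcv_freqs[i];
    }

    if (rest < 0.0)
        rest = 0.0;
    if (ndistinct < 1.0)
        ndistinct = 1.0;

    /* remaining mass assumed uniform over the remaining distinct values */
    return p_special + (rest * rest) / ndistinct;
}

Each column's p(left == right) would then feed the per-column terms of the
cost_comp sum above.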

Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Tomas Vondra-4
In reply to this post by Claudio Freire

On 06/07/2018 12:18 AM, Claudio Freire wrote:

> On Wed, Jun 6, 2018 at 6:57 PM Tomas Vondra
> <[hidden email]> wrote:
>>
>> On 06/06/2018 11:22 PM, Claudio Freire wrote:
>>> On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra
>>> As such, estimating sort performance correctly in the various plan
>>> variants being considered seems to be a very central aspect of it.
>>>
>>> This means it's pretty much time to consider the effect of operator
>>> cost *and* distinct values in the cost computation.
>>>
>>
>> Yes, until now that was not really needed because the optimizer does not
>> consider reordering of the pathkeys - it simply grabs either GROUP BY or
>> ORDER BY pathkeys and runs with that.
>>
>> So the costing was fairly trivial, we simply do something like
>>
>>     comparison_cost = 2.0 * cpu_operator_cost;
>>
>>     sort_cost = comparison_cost * tuples * LOG2(tuples);
>>
>> which essentially ignores that there might be multiple columns, or that
>> the columns may have sort operator with different costs.
>>
>> The question is how reliable the heuristics can be. The current patch
>> uses just plain ndistinct, but that seems rather unreliable but I don't
>> have a clear idea how to improve that - we may have MCV for the columns
>> and perhaps some extended statistics, but I'm not sure how far we can
>> run with that.
>>
>> Essentially what we need to estimate the number of comparisons for
>> each column, to compute better comparison_cost.
>
> This could be approached by adjusting statistics by any relevant
> restrictions applicable to the columns being grouped, as is done for
> selectivity estimations.
>

I don't follow how this is related to restrictions. Can you elaborate?

> I'm not sure how far that would get us, though. It would be rather
> easy to lose sight of those restrictions if there are complex
> operations involved.
>
>>> Assuming cost_sort does its thing, it's just a matter of building the
>>> desired variants and picking the one with the smallest cost_sort. One
>>> can use heuristics to build a subset of all possible column orderings
>>> with a guiding principle that tries to maximize the likelihood of
>>> finding a better order than the one in the query (like sorting by
>>> ndistinct), but I'd suggest:
>>>
>>> - Always considering the sort order verbatim as given in the SQL (ie:
>>> what the user requests)
>>> - Relying on cost_sort to distinguish among them, and pick a winner, and
>>> - When in doubt (if cost_sort doesn't produce a clear winner), use the
>>> order given by the user
>>>
>>
>> I don't see why not to generate all possible orderings (keeping only the
>> cheapest path for each pathkey variant) and let the optimizer sort it
>> out.
>
> I'm assuming costing the full N! possible orderings would be
> prohibitively expensive.
>

That's possible, yes - particularly for large N values. Perhaps there's
a simpler algorithm to compute a sufficient approximation. In a greedy
way, starting from some likely-good combination of columns, or so. I
haven't thought about this part very much ...
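
For illustration, the simplest greedy pass might look like the sketch below.
This is only a sketch under the assumption that plain per-column ndistinct
estimates are all we have (no extended statistics, no MCVs), so it just
orders columns by descending ndistinct in O(N^2) rather than exploring N!
permutations:

#include <stdbool.h>
#include <stdlib.h>

/*
 * Greedy sketch: order GROUP BY columns by descending ndistinct, so the
 * leading columns resolve most comparisons and later columns are rarely
 * reached.  ndistinct[i] is the per-column estimate; order[] receives
 * the chosen column indexes.
 */
void
greedy_group_order(const double *ndistinct, int ncols, int *order)
{
    bool   *placed = calloc(ncols, sizeof(bool));

    for (int pos = 0; pos < ncols; pos++)
    {
        int best = -1;

        for (int col = 0; col < ncols; col++)
        {
            if (placed[col])
                continue;
            if (best < 0 || ndistinct[col] > ndistinct[best])
                best = col;
        }

        order[pos] = best;
        placed[best] = true;
    }

    free(placed);
}

With extended statistics available, the inner pick could instead maximize the
joint ndistinct of the already-placed columns plus the candidate, at the price
of more ndistinct estimations.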

>> If the user specified an explicit ORDER BY, we'll slap an extra
>> Sort at the end, increasing the cost.
>
> I think you misunderstood me. I'm saying the exact opposite.
>
> When I mention handicap, I mean to give the explicitly requested group
> by order a *reduced* cost, to give it an advantage over the
> heuristics.
>

I did understand you. I'm saying that if the ordering does not match the
one requested by the user (in ORDER BY), we end up adding an explicit
Sort node on top of it, which increases the cost of all the other paths,
acting as a disadvantage.

But now I realize this only works when there is an ORDER BY clause,
not for GROUP BY (in which case we don't add any Sort node).

> This is to try to attack the problem you mentioned where users already
> accounting for operator costs in their queries would get overridden by
> the planner, perhaps in detriment of overall execution time.
>
> In essence:
>
> - If the user requested that order, we assume it "somewhat
> subjectively better" (by giving it a slightly reduced cost).
> - If there is an explicit order by clause that differs from a
> considered path, the required sort node will already penalize it
> appropriately, nothing needs to be done in relation to sort costs.
> - All other things being equal, cost_sort will make the decision. If a
> plan beats the user-provided order in spite of the handicap, it will
> be used. So it's still optimizing clear cases.
>

OK. I still don't believe this is a good approach, because we don't know
how good our estimate of the costs is, so I have no idea how large
the handicap needs to be.

>>
>>> Comparison cost can be approximated probabilistically as:
>>>
>>> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
>>>
>>> Where cost_op_n is the cost of the comparison function for column N,
>>> and ndistinct(col_1_to_n) is an estimation of the number of distinct
>>> values for columns prior to N in the sort order.
>>>
>>> You can approximate ndistinct as the product of ndistinct of previous
>>> columns, or use extended statistics when available.
>>>
>>
>> Sure. The trouble is this also assumes uniform distribution, which is
>> not always the case.
>
> Well, (1.0 / ndistinct) = p(left == right).
>
> If you can compute a better p(left == right) with an MCV, you can get
> a better estimate. If you have an MCV. But that'd be a bit expensive,
> and I'm not sure it's all that relevant.
>
> To what degree does improving this produce better plans? As long as
> average expected cost is relatively well estimated (as in, one sort
> order relative to another sort order), it won't affect the decision.
>

I'd bet I can construct corner-case examples with vastly different
behavior depending on column data distributions, so it's not entirely
insignificant. How important it is in practice, I don't know.

> But if needed, the MCV can be used for this.
>
> So, in essence, you need to account for:
>
> - restrictions on that column that constrain the domain
> - distribution skew (MCV, nulls)
> - ndistinct
>
> To compute p(left == right).
>
> Maybe something as simple as the following?
>
> p_special = sum(mcv_i * mcv_i) + null_frac * null_frac
> p_other = (1 - p_special) * (1 - p_special) / ndistinct(restr)
>
>>> I think that should give a good enough approximation of
>>> ndistinct-enriched sort costs that considers operator cost
>>> appropriately. That operator cost is basically an average and can be
>>> used as a constant, so it's just a matter of passing the right
>>> comparison_cost to cost_sort.
>>>
>>> Even if ndistinct estimates are off, the fact that operator cost is
>>> involved should be a good enough tool for the user should the
>>> planner pick the wrong path - it's only a matter of boosting operator
>>> cost to let the planner know that sorting that way is expensive.
>>>
>>
>> There are far too many "should" in these two paragraphs.
>
> Aren't all planner decisions riddled with "should"s?
>
> Anyway, my point is that, since the user can control operator cost,
> and operator cost is involved in the computation, and the ordering
> given by the user is explicitly tested for and given a slight
> advantage, it should (will?) be easy for the user to overcome any poor
> planner decisions by giving the planner the relevant information.
>

I really don't want to force users to tweak operator cost in order to
tune queries. It's about an order of magnitude less convenient than a
GUC, for various reasons.

>>> Prioritization of the user-provided order can be as simple as giving
>>> that comparison_cost a small handicap.
>>
>> I see no point in doing that, and I don't recall a single place in the
>> planner where we do that. If the user specified ORDER BY, we'll slap an
>> explicit Sort on top when needed (which acts as the handicap, but in a
>> clear way). Otherwise we don't do such things - it'd be just plain
>> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
>> types, ndistinct etc. but unexpectedly different costs). Also, what
>> would be a good value for the handicap?
>
> As I said, I'm not talking about explicit order by, but about the case
> where the user knows which group by order is the most efficient
> somehow.
>
> I'm proposing the planner shouldn't override the user without clear
> evidence that the other plan is actually better, above the noise
> better. That's what I mean by handicap.
>

OK, gotcha. But I'm really unsure how large the handicap should be.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Claudio Freire
On Wed, Jun 6, 2018 at 8:06 PM Tomas Vondra
<[hidden email]> wrote:

> >>> Comparison cost can be approximated probabilistically as:
> >>>
> >>> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> >>>
> >>> Where cost_op_n is the cost of the comparison function for column N,
> >>> and ndistinct(col_1_to_n) is an estimation of the number of distinct
> >>> values for columns prior to N in the sort order.
> >>>
> >>> You can approximate ndistinct as the product of ndistinct of previous
> >>> columns, or use extended statistics when available.
> >>>
> >>
> >> Sure. The trouble is this also assumes uniform distribution, which is
> >> not always the case.
> >
> > Well, (1.0 / ndistinct) = p(left == right).
> >
> > If you can compute a better p(left == right) with an MCV, you can get
> > a better estimate. If you have an MCV. But that'd be a bit expensive,
> > and I'm not sure it's all that relevant.
> >
> > To what degree does improving this produce better plans? As long as
> > average expected cost is relatively well estimated (as in, one sort
> > order relative to another sort order), it won't affect the decision.
> >
>
> I'd bet I can construct corner-case examples with vastly different
> behavior depending on column data distributions, so it's not entirely
> insignificant. How important it is in practice, I don't know.

I guess that can only be answered by building that solution and
testing it against the corner cases.

Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Teodor Sigaev
In reply to this post by Tomas Vondra-4


> OK, I haven't looked at v7 yet, but if I understand correctly it tries
> to maintain the ordering as much as possible? Does that actually help? I
> mean, the incremental sort patch allows the sorting to happen by pieces,
> but here we still need to sort all the data, right?
>
> Can you give an example demonstrating the benefit?


>
> SELECT a, SUM(x) FROM
> (
>      SELECT a, b, COUNT(c) AS x FROM t1 GROUP BY a, b
>      UNION ALL
>      SELECT a, b, COUNT(c) AS x FROM t2 GROUP BY a, b
> ) foo GROUP BY a;
>
> and indexes on (a,b) and (b,a) for both relations. The "deduplication"
> by pathkeys I suggested would mean we might keep only index (a,b) on t1
> and (b,a) on t2, which means the grouping by "a" can't leverage index
> scans on both relations. But if we keep paths for both indexes on each
> relation, we can.
Yes, that's one of the options.

> Isn't "estimation of cost of comparing function/number of unique values
> in column could be not very accurate and so planner could make a wrong
> choice" is more an argument against relying on it when doing these
> optimizations?
>
> FWIW it's one of the arguments Tom made in the incremental sort patch,
> which relies on it too when computing cost of the incremental sort. I'm
> sure it's going to be an obstacle there too.
>> I saw 2 times difference in real-world application. Again, improving
>> sort cost estimation is a separate task.
> Sure. But we also need to ask the other question, i.e. how many people
> would be negatively affected by the optimization. And I admit I don't
> know the answer to that, the next example is entirely made up.
Hm, it seems the best way here is improving the cost_sort estimation. I will try,
but I think that is a separate patch.


> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.

Will do

--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Teodor Sigaev
In reply to this post by Tomas Vondra-4
> So the costing was fairly trivial, we simply do something like
>
>      comparison_cost = 2.0 * cpu_operator_cost;
>
>      sort_cost = comparison_cost * tuples * LOG2(tuples);
>
> which essentially ignores that there might be multiple columns, or that
> the columns may have sort operator with different costs.
Agreed. And the distribution of keys.
>
> The question is how reliable the heuristics can be. The current patch
> uses just plain ndistinct, but that seems rather unreliable but I don't
> have a clear idea how to improve that - we may have MCV for the columns
> and perhaps some extended statistics, but I'm not sure how far we can
> run with that.
v8 already uses another algorithm.

>
> Essentially what we need to estimate the number of comparisons for each
> column, to compute better comparison_cost.
Exactly

>> Prioritization of the user-provided order can be as simple as giving
>> that comparison_cost a small handicap.
>
> I see no point in doing that, and I don't recall a single place in the
> planner where we do that. If the user specified ORDER BY, we'll slap an
> explicit Sort on top when needed (which acts as the handicap, but in a
> clear way). Otherwise we don't do such things - it'd be just plain
> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
> types, ndistinct etc. but unexpectedly different costs). Also, what
> would be a good value for the handicap?

Again, agreed. If we have a fixed order of columns (ORDER BY) then we should not
try to reorder it. The current patch follows that, if I didn't make a mistake.

--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Teodor Sigaev
In reply to this post by Claudio Freire
>> I don't see why not to generate all possible orderings (keeping only the
>> cheapest path for each pathkey variant) and let the optimizer to sort it
>> out.
>
> I'm assuming costing the full N! possible orderings would be
> prohibitively expensive.

That's true, but as a first step we need to improve cost_sort, and only then try
to find the best pathkeys order in an optimal way.

> - If the user requested that order, we assume it "somewhat
> subjectively better" (by giving it a slightly reduced cost).
I don't think so. That's not the SQL way - the DB should determine the optimal
way to execute a query.


--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Tomas Vondra-4
In reply to this post by Teodor Sigaev
On 06/07/2018 03:41 PM, Teodor Sigaev wrote:
>>
>> ... snip ...
>>

>>> Prioritization of the user-provided order can be as simple as giving
>>> that comparison_cost a small handicap.
>>
>> I see no point in doing that, and I don't recall a single place in the
>> planner where we do that. If the user specified ORDER BY, we'll slap an
>> explicit Sort on top when needed (which acts as the handicap, but in a
>> clear way). Otherwise we don't do such things - it'd be just plain
>> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
>> types, ndistinct etc. but unexpectedly different costs). Also, what
>> would be a good value for the handicap?
>
> Again, agreed. If we have a fixed order of columns (ORDER BY) then we should
> not try to reorder it. The current patch follows that, if I didn't make a mistake.
>

This part seems to be more a misunderstanding between me and Claudio. I
believe Claudio was referring to the column order in a GROUP BY, not
ORDER BY. In which case we don't add any Sort, of course.

I'm still opposed to adding arbitrary handicap to prioritize the order
specified by user, for the reasons I explained before. We should aim to
make the heuristics/costing reliable enough to make this unnecessary.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Teodor Sigaev
>> Again, agreed. If we have a fixed order of columns (ORDER BY) then we should not
>> try to reorder it. The current patch follows that, if I didn't make a mistake.
>>
>
> This part seems to be more a misunderstanding between me and Claudio. I believe
> Claudio was referring to the column order in a GROUP BY, not ORDER BY. In which
> case we don't add any Sort, of course.
I hope so

>
> I'm still opposed to adding arbitrary handicap to prioritize the order specified
> by user, for the reasons I explained before. We should aim to make the
> heuristics/costing reliable enough to make this unnecessary.
+1

--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Teodor Sigaev
In reply to this post by Tomas Vondra-4
>> Yes. But again, this description is a bit short. First it works after
>> first patch and might get some preordered leading pathkeys. Second, it
>> tries to match ORDER BY clause order if there is no preordered leading
>> pathkeys from first patch (it was introduced in v7). And third, if there
>> is a tail of unmatched pathkeys on previous stages then it will reorder
>> that tail.
>>
>
> OK, I haven't looked at v7 yet, but if I understand correctly it tries
> to maintain the ordering as much as possible? Does that actually help? I
> mean, the incremental sort patch allows the sorting to happen by pieces,
> but here we still need to sort all the data, right?
>
> Can you give an example demonstrating the benefit?
See tst.sql; queries are marked with opt (optimization is on) and noopt (off).

Query 1: select count(*) from btg group by v, r;
Query 2: select count(*) from btg group by n, v, r order by n;

For both queries it's possible to reorder the v and r columns; the n column has a
single distinct value.

On my laptop:
Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms
       2opt vs 2noopt: 5800.307 ms vs 7486.967 ms

So, what we see:
1) For query 1 the optimization gives 2 times better performance, for query 2 only
30%. If column 'n' were unique then the times for queries 1 and 2 would be the
same. We could add a check for preordered pathkeys in
get_cheapest_group_keys_order(): if estimate_num_groups(reordered pathkeys) is
close to 1 then we could skip reordering the tail of the pathkeys.

2) The planning cost is the same for all queries. So cost_sort() doesn't even take
the number of columns into account.

> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.
Added in v9: debug_enable_group_by_match_order_by and
debug_enable_cheapest_group_by. I also checked compatibility with the incremental
sort patch, and everything works except a small merge conflict which could be
resolved right before committing.

Next, I had a look at cost_incremental_sort() provided by the incremental sort
patch and, unfortunately, it doesn't solve our problem with the impact of the cost
of the per-column comparison function and the number of its calls.


--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

0001-opt_group_by_index-v9.patch (17K) Download Attachment
0002-opt_group_by_index_and_order-v9.patch (17K) Download Attachment
tst.sql (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: GROUP BY optimization

Tomas Vondra-4


On 06/07/2018 06:22 PM, Teodor Sigaev wrote:

>>> Yes. But again, this description is a bit short. First it works after
>>> first patch and might get some preordered leading pathkeys. Second, it
>>> tries to match ORDER BY clause order if there is no preordered leading
>>> pathkeys from first patch (it was introduced in v7). And third, if there
>>> is a tail of unmatched pathkeys on previous stages then it will reorder
>>> that tail.
>>>
>>
>> OK, I haven't looked at v7 yet, but if I understand correctly it tries
>> to maintain the ordering as much as possible? Does that actually help? I
>> mean, the incremental sort patch allows the sorting to happen by pieces,
>> but here we still need to sort all the data, right?
>>
>> Can you give an example demonstrating the benefit?
>
> See tst.sql; queries are marked with opt (optimization is on) and noopt (off).
>
> Query 1: select count(*) from btg group by v, r;
> Query 2: select count(*) from btg group by n, v, r order by n;
>
> For both queries it's possible to reorder the v and r columns; the n column
> has a single distinct value.
>
> On my laptop:
> Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms
>       2opt vs 2noopt: 5800.307 ms vs 7486.967 ms
>
> So, what we see:
> 1) For query 1 the optimization gives 2 times better performance, for query
> 2 only 30%. If column 'n' were unique then the times for queries 1 and 2
> would be the same. We could add a check for preordered pathkeys in
> get_cheapest_group_keys_order(): if estimate_num_groups(reordered
> pathkeys) is close to 1 then we could skip reordering the tail of the
> pathkeys.
>

OK, those are nice improvements, although the examples are somewhat
extreme (only 1 or 2 distinct values). So yes, in some cases this
reordering clearly makes a big difference, but I still think we need to
improve the heuristics to minimize regressions.

I see v9 is now calling estimate_num_groups(), so it already benefits
from extended statistics. Nice! I think the algorithm is more or less
the greedy option I proposed in one of the earlier messages - I don't
know if it's sufficient or not, but the only other idea I have is
essentially an exhaustive search through all possible permutations.

So that's a nice improvement, although I think we should also consider
non-uniform distributions (using the per-column MCV lists).

> 2) The planning cost is the same for all queries. So cost_sort() doesn't
> even take the number of columns into account.
>

Yeah. As I wrote before, I think we'll need to come up with a model for
comparison_cost depending on the column order, which should make costs
for those paths different.
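
Something along the lines of the sketch below, say (standalone, not patch code;
cost_op[] and per-column ndistinct[] are assumed inputs, and column independence
is assumed where no better estimate is available):

/*
 * Sketch: order-dependent comparison cost.  Column i is only compared
 * when all earlier columns compared equal, which (assuming independence)
 * happens with probability 1 / (ndistinct[0] * ... * ndistinct[i-1]).
 */
double
ordered_comparison_cost(const double *cost_op, const double *ndistinct,
                        int ncols)
{
    double total = 0.0;
    double prefix_groups = 1.0;  /* distinct combinations of earlier columns */

    for (int i = 0; i < ncols; i++)
    {
        total += cost_op[i] / prefix_groups;
        prefix_groups *= (ndistinct[i] > 1.0) ? ndistinct[i] : 1.0;
    }

    return total;
}

The per-order result would then replace the flat 2.0 * cpu_operator_cost passed
to cost_sort(), so sort_cost = comparison_cost * tuples * LOG2(tuples) would
finally differ between column orders.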

>> FWIW I think it would be useful to have "development GUC" that would
>> allow us to enable/disable these options during development, because
>> that makes experiments much easier. But then remove them before commit.
> Added in v9: debug_enable_group_by_match_order_by and
> debug_enable_cheapest_group_by. I also checked compatibility with the
> incremental sort patch, and everything works except a small merge conflict
> which could be resolved right before committing.
>

OK. I think we should also have a debug GUC for the first patch (i.e.
one that would allow reordering group_pathkeys to match the input path).

Regarding the two GUCs introduced in v9, I'm not sure I 100% understand
what they are doing. Let me try explaining what I think they do:

1) debug_cheapest_group_by - Reorder GROUP BY clauses using ndistinct stats
for the columns, placing columns with more distinct values first (so that we
don't need to compare the following columns as often).

2) debug_group_by_match_order_by - Try to reorder the GROUP BY clauses
to match the ORDER BY, so that the group aggregate produces results in
the desired output order (and we don't need an explicit Sort).

FWIW I have doubts about the debug_group_by_match_order_by optimization. The
trouble is that it's only based on comparing the pathkeys lists, and
entirely ignores that the two result sets may operate on very different
numbers of rows.

Consider for example "SELECT * FROM t GROUP BY a,b,c,d ORDER BY c,d"
where table "t" has 1B rows, but there are only ~10k rows in the result
(which is fairly common in fact tables in star-schema DBs). IIUC the
optimization will ensure "c,d" is placed first in the GROUP BY, and then
we reorder "a,b" using ndistinct. But this may be much less efficient
than simply reordering (a,b,c,d) per ndistinct and then adding explicit
Sort on top of that (because on 10k rows that's going to be cheap).

So I think this somewhat contradicts the order-by-ndistinct optimization
and may easily undo its benefits.

The other "issue" I have with get_cheapest_group_keys_order() is how it
interacts with group_keys_reorder_by_pathkeys(). I mean, we first call

  n_preordered = group_keys_reorder_by_pathkeys(path->pathkeys, ...);

so the reordering tries to maintain ordering of the input path. Then if
(n_preordered == 0) we do this:

  group_keys_reorder_by_pathkeys(root->sort_pathkeys, ...);

Doesn't that mean that if we happen to have input path with partially
matching ordering (so still requiring explicit Sort before grouping) we
may end up ignoring the ORDER BY entirely? Instead of using Sort that
would match the ORDER BY? I might have misunderstood this, though.

I'm not sure why the first group_keys_reorder_by_pathkeys() call does
not simply return 0 when the input path ordering is not sufficient for
the grouping? Is there some reasoning behind that (e.g. that sorting
partially sorted data is faster)?

Overall I think this patch introduces four different optimizations that
are indeed very interesting:

1) consider additional sorted paths for grouping

2) reorder GROUP BY clauses per ndistinct to minimize comparisons

3) when adding Sort for grouping, maintain partial input ordering

4) when adding Sort for grouping, try producing the right output order
   (if the ORDER BY was specified)

But at this point it's kinda mashed together in ad-hoc manner, using
very simple heuristics / assumptions. We'll need to figure out how to
combine those optimizations.

BTW I get compiler warnings that n_preordered_groups may be used
uninitialized in add_paths_to_grouping_rel, because it's only set in an
if/else branch, and then used further down.

> Next, I had a look at cost_incremental_sort() provided by the
> incremental sort patch and, unfortunately, it doesn't solve our problem
> with the impact of the cost of the per-column comparison function and
> the number of its calls.

It currently doesn't, but that might be more of an issue in the incremental
sort patch and we may need to fix it there. Clearly the costing in that patch
needs more work in general, and I do recall Tom pointing out that the current
heuristics (estimating the number of sort groups using ndistincts) seem
rather weak.

It may not be exactly the same problem, but it seems kinda similar to
what this patch does. I have a hunch that those patches will end up
inventing something fairly similar.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
