Additional improvements to extended statistics

Tomas Vondra-4
Hi,

Now that I've committed [1] which allows us to use multiple extended
statistics per table, I'd like to start a thread discussing a couple of
additional improvements for extended statistics. I've considered
starting a separate patch for each, but that would be messy as those
changes will touch roughly the same places. So I've organized it into a
single patch series, with the simpler parts at the beginning.

There are three main improvements:

1) improve estimates of OR clauses

Until now, OR clauses pretty much ignored extended statistics, based on
the experience that they're less vulnerable to misestimates. But it's a
bit weird that AND clauses are handled while OR clauses are not, so this
extends the logic to OR clauses.

Status: I think this is fairly OK.
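
To make the OR case concrete, here is a minimal sketch (not code from the patch) of the arithmetic involved. It relies on P(A OR B) = P(A) + P(B) - P(A AND B) and compares the overlap term computed under an independence assumption with one read from a joint MCV list. The MCV contents and all function names are made up for illustration.

```python
# Hypothetical joint MCV list on columns (a, b): value pair -> frequency.
# The two columns are perfectly correlated here (a == b in every item).
mcv = {(1, 1): 0.5, (2, 2): 0.3, (3, 3): 0.2}


def sel(pred):
    """Selectivity of a predicate over (a, b), summed from the MCV list."""
    return sum(freq for (a, b), freq in mcv.items() if pred(a, b))


def or_independent(p_a, p_b):
    """OR estimate assuming the two clauses are independent."""
    return p_a + p_b - p_a * p_b


def or_with_mcv(pred_a, pred_b):
    """OR estimate using the joint frequency for the overlap term."""
    p_and = sel(lambda a, b: pred_a(a, b) and pred_b(a, b))
    return sel(pred_a) + sel(pred_b) - p_and


def clause_a(a, b):
    return a == 1


def clause_b(a, b):
    return b == 1


est_indep = or_independent(sel(clause_a), sel(clause_b))
est_mcv = or_with_mcv(clause_a, clause_b)
print(est_indep, est_mcv)
```

In this toy data set the independence-based estimate for (a = 1 OR b = 1) is 0.75, while the joint list gives 0.5. The error is modest, which fits the observation that OR clauses are less vulnerable to misestimates, but it is still an error.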


2) support estimating clauses (Var op Var)

Currently, we only support clauses with a single Var, i.e. clauses like

   - Var op Const
   - Var IS [NOT] NULL
   - [NOT] Var
   - ...

and AND/OR clauses built from those simple ones. This patch adds support
for clauses of the form (Var op Var), of course assuming both Vars come
from the same relation.

Status: This works, but it feels a bit hackish. Needs more work.
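
As a rough illustration of what estimating (Var op Var) from multi-column statistics could look like, the sketch below evaluates the operator on each item of a joint MCV list and sums the frequencies of the matching items. This is an assumption about the general approach, not a description of the attached patch; the list contents and the function name are hypothetical.

```python
import operator

# Hypothetical joint MCV list on columns (a, b) of a single table.
# Each entry is ((a_value, b_value), frequency). The frequencies add up
# to 0.95; rows not covered by the list are ignored in this sketch.
mcv_items = [
    ((1, 2), 0.40),
    ((2, 2), 0.25),
    ((3, 1), 0.20),
    ((5, 9), 0.10),
]


def var_op_var_selectivity(items, op):
    """Sum the frequencies of MCV items for which `a op b` holds."""
    total = 0.0
    for (a, b), freq in items:
        if op(a, b):
            total += freq
    return total


sel_lt = var_op_var_selectivity(mcv_items, operator.lt)  # a < b
sel_eq = var_op_var_selectivity(mcv_items, operator.eq)  # a = b
print(sel_lt, sel_eq)
```

Both Vars have to come from the same relation for this to work, because only then does a single statistics object carry both values side by side in one item.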


3) support extended statistics on expressions

Currently we only allow simple references to columns in extended stats,
so we can do

    CREATE STATISTICS s ON a, b, c FROM t;

but not

    CREATE STATISTICS s ON (a+b), (c + 1) FROM t;

This patch aims to allow this. At the moment it's a WIP - it does most
of the catalog changes and stats building, but with some hacks/bugs. And
it does not even try to use those statistics during estimation.

The first question is how to extend the current pg_statistic_ext catalog
to support expressions. I've been planning to do it the way we support
expressions for indexes, i.e. have two catalog fields - one for keys,
one for expressions.

One difference is that for statistics we don't care about the order of
the keys, so we don't need to bother with storing 0 keys as placeholders
for expressions - we can simply assume keys come first, then expressions.

And this is what the patch does now.
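
The difference between the two layouts can be sketched roughly as follows. This is a toy model, not the catalog code: the class names, field names and the "attnum N" rendering are invented for illustration. It assumes the index-style layout marks expression positions with a 0 key, as described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexLikeDef:
    """Index-style layout: column order matters, 0 marks an expression slot."""
    keys: List[int]
    exprs: List[str] = field(default_factory=list)

    def columns(self) -> List[str]:
        # Walk the keys in order and pull the next expression for each 0.
        remaining = iter(self.exprs)
        return [f"attnum {k}" if k != 0 else next(remaining) for k in self.keys]


@dataclass
class StatsLikeDef:
    """Statistics-style layout: order is irrelevant, so no 0 placeholders."""
    keys: List[int]
    exprs: List[str] = field(default_factory=list)

    def columns(self) -> List[str]:
        # Plain keys first, then all expressions.
        return [f"attnum {k}" for k in self.keys] + list(self.exprs)


# Something defined on (a, (a + b), c), with a and c being attnums 1 and 3.
idx = IndexLikeDef(keys=[1, 0, 3], exprs=["(a + b)"])
stats = StatsLikeDef(keys=[1, 3], exprs=["(a + b)"])
print(idx.columns())
print(stats.columns())
```

The statistics-style layout loses the original position of the expression, which is acceptable only because the order of the dimensions does not matter for statistics.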

I'm however wondering whether to keep this split - why not just treat
everything as expressions, and be done with it? A key just represents a
Var expression, after all. And it would massively simplify a lot of code
that now has to care about both keys and expressions.

Of course, expressions are a bit more expensive, but I wonder how
noticeable that would be.

Opinions?


regards

[1] https://commitfest.postgresql.org/26/2320/

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0001-Support-using-extended-stats-for-parts-of-O-20200113.patch (10K)
0002-Support-clauses-of-the-form-Var-op-Var-20200113.patch (6K)
0003-Support-for-extended-statistics-on-expressi-20200113.patch (70K)

Re: Additional improvements to extended statistics

Pavel Stehule


On Tue, Jan 14, 2020 at 0:00, Tomas Vondra <[hidden email]> wrote:
3) support extended statistics on expressions

Currently we only allow simple references to columns in extended stats,
so we can do

    CREATE STATISTICS s ON a, b, c FROM t;

but not

    CREATE STATISTICS s ON (a+b), (c + 1) FROM t;

+1 for statistics on expressions - it could be a great feature.

Pavel

