Fix to not check included columns in ANALYZE on indexes

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Fix to not check included columns in ANALYZE on indexes

Yugo Nagata
Hi,

I found that both key columns and included columns are checked when analyze
is run on indexes. This is almost harmless because non-expression columns
are not processed. However, this check is obviously unnecessary and we
can fix this to not check included columns. If we decide to support expressions
in included columns in future, this must be fixed eventually.

Attached is a patch to fix this.

Regards,

--
Yugo Nagata <[hidden email]>

analyze_only_index_key_attrs.patch (590 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Tom Lane-2
Yugo Nagata <[hidden email]> writes:
> I found that both key columns and included columns are checked when analyze
> is run on indexes. This is almost harmless because non-expression columns
> are not processed. However, this check is obviously unnecessary and we
> can fix this to not check included columns. If we decide to support expressions
> in included columns in future, this must be fixed eventually.

AFAICS, we'd just have to revert this patch later, so I don't see
much value in it.

Also, is it really true that we don't support included expression
columns now?  In what way would that not be a bug?

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Andres Freund


On June 28, 2018 4:18:36 PM PDT, Tom Lane <[hidden email]> wrote:

>Yugo Nagata <[hidden email]> writes:
>> I found that both key columns and included columns are checked when
>analyze
>> is run on indexes. This is almost harmless because non-expression
>columns
>> are not processed. However, this check is obviously unnecessary and
>we
>> can fix this to not check included columns. If we decide to support
>expressions
>> in included columns in future, this must be fixed eventually.
>
>AFAICS, we'd just have to revert this patch later, so I don't see
>much value in it.
>
>Also, is it really true that we don't support included expression
>columns now?  In what way would that not be a bug?

I don't think IOS supports expression columns, right?  Away from code for a bit, so can't check.  If indeed true, there'd be little point in allowing it, right?

Andres

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On June 28, 2018 4:18:36 PM PDT, Tom Lane <[hidden email]> wrote:
>> Also, is it really true that we don't support included expression
>> columns now?  In what way would that not be a bug?

> I don't think IOS supports expression columns, right?  Away from code for a bit, so can't check.  If indeed true, there'd be little point in allowing it, right?

The point of ANALYZE on an expression column is that you can direct
ANALYZE to collect stats on that expression.  This is potentially valuable
for rowcount estimation whether or not the planner notices that it can
fetch the expression value from the index, or chooses to do so even if it
did notice.

(In principle, CREATE STATISTICS might someday obsolete this use-case
for expression indexes, but it hasn't done so yet AFAIK.)

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Andres Freund


On June 28, 2018 4:28:39 PM PDT, Tom Lane <[hidden email]> wrote:

>Andres Freund <[hidden email]> writes:
>> On June 28, 2018 4:18:36 PM PDT, Tom Lane <[hidden email]> wrote:
>>> Also, is it really true that we don't support included expression
>>> columns now?  In what way would that not be a bug?
>
>> I don't think IOS supports expression columns, right?  Away from code
>for a bit, so can't check.  If indeed true, there'd be little point in
>allowing it, right?
>
>The point of ANALYZE on an expression column is that you can direct
>ANALYZE to collect stats on that expression.  This is potentially
>valuable
>for rowcount estimation whether or not the planner notices that it can
>fetch the expression value from the index, or chooses to do so even if
>it
>did notice.

The whole point of including additional columns in the index is that they allow IOSs. It seems more likely that people will expect that included expressions actually are usable, than enlarging the index just to get a bit better stats.


>(In principle, CREATE STATISTICS might someday obsolete this use-case
>for expression indexes, but it hasn't done so yet AFAIK.)

You mean stats on them, or the feature entirely? If the latter, how?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On June 28, 2018 4:28:39 PM PDT, Tom Lane <[hidden email]> wrote:
>> (In principle, CREATE STATISTICS might someday obsolete this use-case
>> for expression indexes, but it hasn't done so yet AFAIK.)

> You mean stats on them, or the feature entirely? If the latter, how?

No, just the collection-of-stats-on-an-expression part.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Yugo Nagata
In reply to this post by Tom Lane-2
On Thu, 28 Jun 2018 19:18:36 -0400
Tom Lane <[hidden email]> wrote:

> Yugo Nagata <[hidden email]> writes:
> > I found that both key columns and included columns are checked when analyze
> > is run on indexes. This is almost harmless because non-expression columns
> > are not processed. However, this check is obviously unnecessary and we
> > can fix this to not check included columns. If we decide to support expressions
> > in included columns in future, this must be fixed eventually.
>
> AFAICS, we'd just have to revert this patch later, so I don't see
> much value in it.

I'm sorry but I don't understand why we'd just have to revert this patch later.

Do you mean that if we decide to support expressions in included columns in future,
this patch would be reverted? This is wrong. To my understanding, statistics on
included  (= non-key) columns in index is never used by the planner whether this
is expression or not. So, we don't have to examin these columns in ANALYZE.

> Also, is it really true that we don't support included expression
> columns now?  In what way would that not be a bug?

Currently, included expression columns are not supported.

 postgres=# create index on test(i) include ((d+1));
 ERROR:  expressions are not supported in included columns

>
> regards, tom lane
>


--
Yugo Nagata <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Teodor Sigaev
In reply to this post by Tom Lane-2
> AFAICS, we'd just have to revert this patch later, so I don't see
> much value in it.
True, I suppose we should apply this patch just for consistency, because we
don't allow expression in included columns.

>
> Also, is it really true that we don't support included expression
> columns now?  In what way would that not be a bug?
Because index only scan doesn't support expression index and, hence, expressions
in included columns are not useful. It could be changed in future but, suppose,
not in v11.

--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Peter Geoghegan-4
In reply to this post by Yugo Nagata
On Fri, Jun 29, 2018 at 1:31 AM, Yugo Nagata <[hidden email]> wrote:
> I'm sorry but I don't understand why we'd just have to revert this patch later.
>
> Do you mean that if we decide to support expressions in included columns in future,
> this patch would be reverted? This is wrong. To my understanding, statistics on
> included  (= non-key) columns in index is never used by the planner whether this
> is expression or not. So, we don't have to examin these columns in ANALYZE.

I think that the argument Tom is making is that it might be useful to
have statistics on the expression regardless of this -- the expression
may be interesting in some general sense. For example, one can imagine
the planner creating a plan with a hash aggregate rather than a group
aggregate, but only when statistics on an expression are available,
somehow. Those statistics may only be available because of the
appearance of the expression as a non-key attribute (in a future world
where expressions as non-key attributes are accepted, and index-only
scans work with expressions).

In the past, we talked about adding a feature that made this happen
for expressions without requiring any index at all, which Tom referred
to in passing. That's why I understand his remarks in this way.

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Tom Lane-2
Peter Geoghegan <[hidden email]> writes:
> I think that the argument Tom is making is that it might be useful to
> have statistics on the expression regardless of this -- the expression
> may be interesting in some general sense. For example, one can imagine
> the planner creating a plan with a hash aggregate rather than a group
> aggregate, but only when statistics on an expression are available,
> somehow.

Right.  For instance, "select sum(x) from ... group by y+z" is only
suitable for hash aggregation if we can predict that there's a fairly
small number of distinct values of y+z.  This makes it useful to have
stats on the expression y+z, independently of whether any related index
actually gets used in the plan.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Yugo Nagata
On Sat, 30 Jun 2018 14:13:49 -0400
Tom Lane <[hidden email]> wrote:

> Peter Geoghegan <[hidden email]> writes:
> > I think that the argument Tom is making is that it might be useful to
> > have statistics on the expression regardless of this -- the expression
> > may be interesting in some general sense. For example, one can imagine
> > the planner creating a plan with a hash aggregate rather than a group
> > aggregate, but only when statistics on an expression are available,
> > somehow.
>
> Right.  For instance, "select sum(x) from ... group by y+z" is only
> suitable for hash aggregation if we can predict that there's a fairly
> small number of distinct values of y+z.  This makes it useful to have
> stats on the expression y+z, independently of whether any related index
> actually gets used in the plan.

Thank you for your explanation. I understand the usefulness of the statistics
on non-key expression attributions and that "CREATE INDEX ... INCLUDE" migth be
a means to collect the statistics on "non-key" expressions in future.

>
> regards, tom lane
>


--
Yugo Nagata <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: Fix to not check included columns in ANALYZE on indexes

Yugo Nagata
In reply to this post by Teodor Sigaev
On Fri, 29 Jun 2018 17:31:51 +0300
Teodor Sigaev <[hidden email]> wrote:

> > AFAICS, we'd just have to revert this patch later, so I don't see
> > much value in it.
> True, I suppose we should apply this patch just for consistency, because we
> don't allow expression in included columns.

Yes, this is what I intend in my patch, but I don't persist in this if there
is a reason to leave the code as it is, since the current code is alomot harmless.

Thanks,


--
Yugo Nagata <[hidden email]>