Surjective functional indexes

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
99 messages Options
12345
Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Álvaro Herrera
Konstantin Knizhnik wrote:

> If you still thing that additional 16 bytes per relation in statistic is too
> high overhead, then I will also remove autotune.

I think it's pretty clear that these additional bytes are excessive.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Michael Paquier
On Fri, Dec 15, 2017 at 6:15 AM, Alvaro Herrera <[hidden email]> wrote:
> Konstantin Knizhnik wrote:
>> If you still thing that additional 16 bytes per relation in statistic is too
>> high overhead, then I will also remove autotune.
>
> I think it's pretty clear that these additional bytes are excessive.

The bar to add new fields in PgStat_TableCounts in very high, and one
attempt to tackle its scaling problems with many relations is here by
Horiguchi-san:
https://www.postgresql.org/message-id/20171211.201523.24172046.horiguchi.kyotaro@...
His patch may be worth a look if you need more fields for your
feature. So it seems to me that the patch as currently presented has
close to zero chance to be committed as long as you keep your changes
to pgstat.c.
--
Michael

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 15.12.2017 01:21, Michael Paquier wrote:

> On Fri, Dec 15, 2017 at 6:15 AM, Alvaro Herrera <[hidden email]> wrote:
>> Konstantin Knizhnik wrote:
>>> If you still thing that additional 16 bytes per relation in statistic is too
>>> high overhead, then I will also remove autotune.
>> I think it's pretty clear that these additional bytes are excessive.
> The bar to add new fields in PgStat_TableCounts in very high, and one
> attempt to tackle its scaling problems with many relations is here by
> Horiguchi-san:
> https://www.postgresql.org/message-id/20171211.201523.24172046.horiguchi.kyotaro@...
> His patch may be worth a look if you need more fields for your
> feature. So it seems to me that the patch as currently presented has
> close to zero chance to be committed as long as you keep your changes
> to pgstat.c.

Ok, looks like everybody think that autotune based on statistic is bad idea.
Attached please find patch without autotune.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


projection-6.patch (27K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Stephen Frost
Greetings,

* Konstantin Knizhnik ([hidden email]) wrote:

> On 15.12.2017 01:21, Michael Paquier wrote:
> >On Fri, Dec 15, 2017 at 6:15 AM, Alvaro Herrera <[hidden email]> wrote:
> >>Konstantin Knizhnik wrote:
> >>>If you still thing that additional 16 bytes per relation in statistic is too
> >>>high overhead, then I will also remove autotune.
> >>I think it's pretty clear that these additional bytes are excessive.
> >The bar to add new fields in PgStat_TableCounts in very high, and one
> >attempt to tackle its scaling problems with many relations is here by
> >Horiguchi-san:
> >https://www.postgresql.org/message-id/20171211.201523.24172046.horiguchi.kyotaro@...
> >His patch may be worth a look if you need more fields for your
> >feature. So it seems to me that the patch as currently presented has
> >close to zero chance to be committed as long as you keep your changes
> >to pgstat.c.
>
> Ok, looks like everybody think that autotune based on statistic is bad idea.
> Attached please find patch without autotune.
This patch appears to apply with a reasonable amount of fuzz, builds,
and passes 'make check', at least, therefore I'm going to mark it
'Needs Review'.

I will note that the documentation doesn't currently build due to this:

/home/sfrost/git/pg/dev/cleanup/doc/src/sgml/ref/create_index.sgml:302: parser error : Opening and ending tag mismatch: literal line 302 and unparseable
    <term><literal>recheck_on_update</></term>

but I don't think it makes sense for that to stand in the way of someone
doing a review of the base patch.  Still, please do fix the
documentation build when you get a chance.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 07.01.2018 01:59, Stephen Frost wrote:

> Greetings,
>
> * Konstantin Knizhnik ([hidden email]) wrote:
>> On 15.12.2017 01:21, Michael Paquier wrote:
>>> On Fri, Dec 15, 2017 at 6:15 AM, Alvaro Herrera <[hidden email]> wrote:
>>>> Konstantin Knizhnik wrote:
>>>>> If you still thing that additional 16 bytes per relation in statistic is too
>>>>> high overhead, then I will also remove autotune.
>>>> I think it's pretty clear that these additional bytes are excessive.
>>> The bar to add new fields in PgStat_TableCounts in very high, and one
>>> attempt to tackle its scaling problems with many relations is here by
>>> Horiguchi-san:
>>> https://www.postgresql.org/message-id/20171211.201523.24172046.horiguchi.kyotaro@...
>>> His patch may be worth a look if you need more fields for your
>>> feature. So it seems to me that the patch as currently presented has
>>> close to zero chance to be committed as long as you keep your changes
>>> to pgstat.c.
>> Ok, looks like everybody think that autotune based on statistic is bad idea.
>> Attached please find patch without autotune.
> This patch appears to apply with a reasonable amount of fuzz, builds,
> and passes 'make check', at least, therefore I'm going to mark it
> 'Needs Review'.
>
> I will note that the documentation doesn't currently build due to this:
>
> /home/sfrost/git/pg/dev/cleanup/doc/src/sgml/ref/create_index.sgml:302: parser error : Opening and ending tag mismatch: literal line 302 and unparseable
>      <term><literal>recheck_on_update</></term>
>
> but I don't think it makes sense for that to stand in the way of someone
> doing a review of the base patch.  Still, please do fix the
> documentation build when you get a chance.
>
> Thanks!
>
> Stephen
Sorry, issue with documentation is fixed.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


projection-7.patch (27K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
On 10 January 2018 at 09:54, Konstantin Knizhnik
<[hidden email]> wrote:

> Sorry, issue with documentation is fixed.

OK, thanks.

Patch appears to work cleanly now.

I'm wondering now about automatically inferring "recheck_on_update =
true" for certain common datatype/operators. It doesn't need to be an
exhaustive list, but it would be useful if we detected the main use
case of

(JSONB datatype column)->>CONSTANT

Seems like we could do a test to see if the index function is
FUNCTION(COLUMNNAME, CONSTANTs...)
{JSONB, ->>} or
{jsonb_object_field_text(Columnname, Constant)}
{substring(Columname, Constants...)}

It would be a shame if people had to remember to use this for the
common and obvious cases.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 18.01.2018 11:38, Simon Riggs wrote:

> On 10 January 2018 at 09:54, Konstantin Knizhnik
> <[hidden email]> wrote:
>
>> Sorry, issue with documentation is fixed.
> OK, thanks.
>
> Patch appears to work cleanly now.
>
> I'm wondering now about automatically inferring "recheck_on_update =
> true" for certain common datatype/operators. It doesn't need to be an
> exhaustive list, but it would be useful if we detected the main use
> case of
>
> (JSONB datatype column)->>CONSTANT
>
> Seems like we could do a test to see if the index function is
> FUNCTION(COLUMNNAME, CONSTANTs...)
> {JSONB, ->>} or
> {jsonb_object_field_text(Columnname, Constant)}
> {substring(Columname, Constants...)}
>
> It would be a shame if people had to remember to use this for the
> common and obvious cases.
>
Right now by default index is considered as projective. So even if you
do not specify "recheck_on_update" option, then recheck will be done.
This decision is based on the assumption that most of functional indexes
are actually projective and looks likes (JSONB datatype column)->>CONSTANT.
So do you think that this assumption is not correct and we should switch
disable recheck_on_update by default?
If not, then there is an opposite challenge: find out class of functions
which definitely are not projective and recheck on them will have no sense.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
On 18 January 2018 at 08:59, Konstantin Knizhnik
<[hidden email]> wrote:

>
>
> On 18.01.2018 11:38, Simon Riggs wrote:
>>
>> On 10 January 2018 at 09:54, Konstantin Knizhnik
>> <[hidden email]> wrote:
>>
>>> Sorry, issue with documentation is fixed.
>>
>> OK, thanks.
>>
>> Patch appears to work cleanly now.
>>
>> I'm wondering now about automatically inferring "recheck_on_update =
>> true" for certain common datatype/operators. It doesn't need to be an
>> exhaustive list, but it would be useful if we detected the main use
>> case of
>>
>> (JSONB datatype column)->>CONSTANT
>>
>> Seems like we could do a test to see if the index function is
>> FUNCTION(COLUMNNAME, CONSTANTs...)
>> {JSONB, ->>} or
>> {jsonb_object_field_text(Columnname, Constant)}
>> {substring(Columname, Constants...)}
>>
>> It would be a shame if people had to remember to use this for the
>> common and obvious cases.
>>
> Right now by default index is considered as projective. So even if you do
> not specify "recheck_on_update" option, then recheck will be done.

That's good

> This decision is based on the assumption that most of functional indexes are
> actually projective and looks likes (JSONB datatype column)->>CONSTANT.
> So do you think that this assumption is not correct and we should switch
> disable recheck_on_update by default?

No thanks

> If not, then there is an opposite challenge: find out class of functions
> which definitely are not projective and recheck on them will have no sense.

If there are some.

Projective is not quite correct, since sin((col->>'angle')::numeric))
could stay same but the result is not a subset of the input.

I think it would be better to avoid the use of mathematical terms and
keep the description simple

"If the indexed value depends upon only a subset of the data, it is
possible that the function value will remain constant after an UPDATE
that changes the non-indexed data.
e.g.

If a column is updated from '/some/url/before' to '/some/url/after'
then the value of substing(col, 1, 5) will not change when updated

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
In reply to this post by konstantin knizhnik
On 10 January 2018 at 09:54, Konstantin Knizhnik
<[hidden email]> wrote:

> (new version attached)

Why this comment?

Current implementation of projection optimization has to calculate
index expression twice
in case of hit (value of index expression is not changed) and three
times if values are different.

Old + New for check = 2
plus calculate again in index = 3
?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 01.02.2018 03:10, Simon Riggs wrote:

> On 10 January 2018 at 09:54, Konstantin Knizhnik
> <[hidden email]> wrote:
>
>> (new version attached)
> Why this comment?
>
> Current implementation of projection optimization has to calculate
> index expression twice
> in case of hit (value of index expression is not changed) and three
> times if values are different.
>
> Old + New for check = 2
> plus calculate again in index = 3
> ?
>
Sorry, I do not completely understand your question.
You do not agree with this statement or you think that this comment is
irrelevant in this place?
Or you just want to understand why index expression is calculated 2/3
times? If so, then you have answered this question yourself:
> Old + New for check = 2
> plus calculate again in index = 3

Yes, we have to calculate the value of index expression for original and
updated version of the record. If them are equal, then it is all we have
to do with this index: hot update is applicable.
In this case we calculate index expression twice.
But if them are not equal, then hot update is not applicable and we have
to update index. In this case expression will be calculated one more
time. So totally three times.
This is why, if calculation of index expression is very expensive, then
effect of this optimization may be negative even if value of index
expression is not changed.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
On 1 February 2018 at 08:49, Konstantin Knizhnik
<[hidden email]> wrote:

>
>
> On 01.02.2018 03:10, Simon Riggs wrote:
>>
>> On 10 January 2018 at 09:54, Konstantin Knizhnik
>> <[hidden email]> wrote:
>>
>>> (new version attached)
>>
>> Why this comment?
>>
>> Current implementation of projection optimization has to calculate
>> index expression twice
>> in case of hit (value of index expression is not changed) and three
>> times if values are different.
>>
>> Old + New for check = 2
>> plus calculate again in index = 3
>> ?
>>
> Sorry, I do not completely understand your question.
> You do not agree with this statement or you think that this comment is
> irrelevant in this place?
> Or you just want to understand why index expression is calculated 2/3 times?
> If so, then you have answered this question yourself:
>>
>> Old + New for check = 2
>> plus calculate again in index = 3
>
>
> Yes, we have to calculate the value of index expression for original and
> updated version of the record. If them are equal, then it is all we have to
> do with this index: hot update is applicable.
> In this case we calculate index expression twice.
> But if them are not equal, then hot update is not applicable and we have to
> update index. In this case expression will be calculated one more time. So
> totally three times.
> This is why, if calculation of index expression is very expensive, then
> effect of this optimization may be negative even if value of index
> expression is not changed.

OK, thanks. Just checking my understanding and will add to the
comments in the patch.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
On 1 February 2018 at 09:32, Simon Riggs <[hidden email]> wrote:

> OK, thanks. Just checking my understanding and will add to the
> comments in the patch.

I'm feeling good about ths now, but for personal reasons won't be
committing this until late Feb/early March.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Andres Freund
In reply to this post by konstantin knizhnik
Hi,

I still don't think, as commented upon by Tom and me upthread, that we
want this feature in the current form.

Your arguments about layering violations seem to be counteracted by the
fact that ProjectionIsNotChanged() basically appears to do purely
executor work inside inside heapam.c.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 01.03.2018 22:48, Andres Freund wrote:
> Hi,
>
> I still don't think, as commented upon by Tom and me upthread, that we
> want this feature in the current form.
>
> Your arguments about layering violations seem to be counteracted by the
> fact that ProjectionIsNotChanged() basically appears to do purely
> executor work inside inside heapam.c.

ProjectionIsNotChanged doesn't perform evaluation itslef, is calls FormIndexDatum.
FormIndexDatum is also called in 13 other places in Postgres:
analyze.c, constraint.c, index.c, tuplesort, execIndexing.c

I still think that trying to store somewhere result odf index expression evaluation to be able to reuse in index update is not so good idea:
it complicates code and add extra dependencies between different modules.
And benefits of such optimization are not obvious: is index expression evaluation is so expensive, then recheck_on_update should be prohibited for such index at all.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
Please excuse my absence from this thread.

On 2 March 2018 at 12:34, Konstantin Knizhnik <[hidden email]> wrote:
>
>
> On 01.03.2018 22:48, Andres Freund wrote:
>>
>> Hi,
>>
>> I still don't think, as commented upon by Tom and me upthread, that we
>> want this feature in the current form.

The performance benefit of this patch is compelling. I don't see a
comment from Tom along those lines, just a queston as to whether the
code is in the right place.

>> Your arguments about layering violations seem to be counteracted by the
>> fact that ProjectionIsNotChanged() basically appears to do purely
>> executor work inside inside heapam.c.
>
>
> ProjectionIsNotChanged doesn't perform evaluation itslef, is calls
> FormIndexDatum.
> FormIndexDatum is also called in 13 other places in Postgres:
> analyze.c, constraint.c, index.c, tuplesort, execIndexing.c
>
> I still think that trying to store somewhere result odf index expression
> evaluation to be able to reuse in index update is not so good idea:
> it complicates code and add extra dependencies between different modules.
> And benefits of such optimization are not obvious: is index expression
> evaluation is so expensive, then recheck_on_update should be prohibited for
> such index at all.

Agreed. What we are doing is trying to avoid HOT updates.

We make the check for HOT update dynamically when it could be made
statically in some cases. Simplicity says just test it dynamically.

We could also do work to check for HOT update for functional indexes
earlier but for the same reason, I don't think its worth adding.

The patch itself is about as simple as it can be, so the code is in
the right place. My problems with it have been around the actual user
interface to request it.

Index option handling has changed (and this needs rebase!), but other
than that I think we want this and am planning to commit something
early next week.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Álvaro Herrera
In reply to this post by konstantin knizhnik
The rd_projidx (list of each nth element in the index list that is a
projection index) thing looks odd.  Wouldn't it make more sense to have
a list of index OIDs that are projection indexes?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Álvaro Herrera
In reply to this post by konstantin knizhnik
The whole IsProjectionFunctionalIndex looks kinda bogus/ugly to me.  Set
the boolean to false, but keep evaluating anyway?  But then, I thought
the idea was to do this based on the reloption, not by comparing the
expression cost to a magical (unmodifiable) value?

In RelationGetIndexAttrBitmap(), indexattrs no longer gets the columns
corresponding to projection indexes.  Isn't that weird/error
prone/confusing?  I think it'd be saner to add these bits to both
bitmaps.

Please update the comments ending in heapam.c:4188, and generally all
comments that you should update.

Please keep serial_schedule in sync with parallel_schedule.

Also, pgindent.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik
In reply to this post by Álvaro Herrera


On 22.03.2018 23:37, Alvaro Herrera wrote:
> The rd_projidx (list of each nth element in the index list that is a
> projection index) thing looks odd.  Wouldn't it make more sense to have
> a list of index OIDs that are projection indexes?
>

rd_projidx is not a list, it is Bitmapset. It is just one of many bitmap
sets in RelationData:

     Bitmapset  *rd_indexattr;    /* identifies columns used in indexes */
     Bitmapset  *rd_keyattr;        /* cols that can be ref'd by foreign
keys */
     Bitmapset  *rd_pkattr;        /* cols included in primary key */
     Bitmapset  *rd_idattr;        /* included in replica identity index */

Yes, it is possible to construct Oid list of projectional indexes
instead of having bitmap which marks some indexes in projectional, but I
am not sure that
it will be more efficient both from CPU and memory footprint point of
view (in most indexes bitmap will consists of just one integer). In any
case, I do not think that it can have some measurable impact on
performance or memory size:
number of indexes and especially projectional indexes will very rarely
exceed 10.


--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Álvaro Herrera
Konstantin Knizhnik wrote:

> rd_projidx is not a list, it is Bitmapset. It is just one of many bitmap
> sets in RelationData:

Yes, but the other bitmapsets are of AttrNumber of the involved column.
They new one is of list_nth() counters for items in the index list.
That seems weird and it scares me -- do we use that coding pattern
elsewhere?  Maybe use an array of OIDs instead?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
In reply to this post by konstantin knizhnik
On 23 March 2018 at 15:39, Konstantin Knizhnik
<[hidden email]> wrote:

>
>
> On 22.03.2018 23:37, Alvaro Herrera wrote:
>>
>> The rd_projidx (list of each nth element in the index list that is a
>> projection index) thing looks odd.  Wouldn't it make more sense to have
>> a list of index OIDs that are projection indexes?
>>
>
> rd_projidx is not a list, it is Bitmapset. It is just one of many bitmap
> sets in RelationData:
>
>     Bitmapset  *rd_indexattr;    /* identifies columns used in indexes */
>     Bitmapset  *rd_keyattr;        /* cols that can be ref'd by foreign keys
> */
>     Bitmapset  *rd_pkattr;        /* cols included in primary key */
>     Bitmapset  *rd_idattr;        /* included in replica identity index */
>
> Yes, it is possible to construct Oid list of projectional indexes instead of
> having bitmap which marks some indexes in projectional, but I am not sure
> that
> it will be more efficient both from CPU and memory footprint point of view
> (in most indexes bitmap will consists of just one integer). In any case, I
> do not think that it can have some measurable impact on performance or
> memory size:
> number of indexes and especially projectional indexes will very rarely
> exceed 10.

Alvaro's comment seems reasonable to me.

The patch adds both rd_projindexattr and rd_projidx
rd_projindexattr should be a Bitmapset, but rd_projidx looks strange
now he mentions it.

Above that in RelationData we have other structures that are List of
OIDs, so Alvaro's proposal make sense.

That would simplify the code in ProjectionIsNotChanged() by just
looping over the list of projection indexes rather than the list of
indexes

So please could you make the change?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

12345