Surjective functional indexes

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
99 messages Options
12345
Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 22.03.2018 23:53, Alvaro Herrera wrote:
> The whole IsProjectionFunctionalIndex looks kinda bogus/ugly to me.  Set
> the boolean to false, but keep evaluating anyway?  But then, I thought
> the idea was to do this based on the reloption, not by comparing the
> expression cost to a magical (unmodifiable) value?
The idea is very simple, I tried to explain it in the comments. Sorry if
my explanation was not so clear.
By default we assume all functional indexes as projectional. This is
done by first assignment is_projection = true.
Then we check cost of index function.
I agree that "magical (unmodifiable) value" used as cost threshold is
not a good idea. But I tried to avoid as much as possible adding yet
another configuration parameter.
I do no think that flexibility here is so important. I believe that in
99% cases functional indexes are projectional,
and in all other cases DBA cna explicitly specify it using
recheck_on_update index option.
Which should override cost threshold: if DBA thinks that recheck is
useful for this index, then it should be used despite to previsions
guess based on index expression cost.
This is why after setting "is_projection" to false because of too large
cost of index expression, we  continue work and check index options for
explicitly specified value.

>
> In RelationGetIndexAttrBitmap(), indexattrs no longer gets the columns
> corresponding to projection indexes.  Isn't that weird/error
> prone/confusing?  I think it'd be saner to add these bits to both
> bitmaps.
This bitmap is needed only to mark attributes, which modification
prevents hot update.
Should I rename rd_indexattr to  rd_hotindexattr or whatever else to
avoid confusion?
Please notice, that rd_indexattr is used only inside
RelationGetIndexAttrBitmap and is not visible from outside.
It is returned for INDEX_ATTR_BITMAP_HOT IndexAttrBitmapKind.

> Please update the comments ending in heapam.c:4188, and generally all
> comments that you should update.
Sorry, did you apply projection-7.patch?
I have checked all trailing spaces and other indentation issues.
At least there is no trailing space at heapam.c:4188...

>
> Please keep serial_schedule in sync with parallel_schedule.

Sorry, once again do not understand your complaint: I have added
func_index both to serial and parallel schedule.

> Also, pgindent.
>

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik
In reply to this post by Álvaro Herrera


On 23.03.2018 18:45, Alvaro Herrera wrote:
> Konstantin Knizhnik wrote:
>
>> rd_projidx is not a list, it is Bitmapset. It is just one of many bitmap
>> sets in RelationData:
> Yes, but the other bitmapsets are of AttrNumber of the involved column.
> They new one is of list_nth() counters for items in the index list.
> That seems weird and it scares me -- do we use that coding pattern
> elsewhere?  Maybe use an array of OIDs instead?
>
Using list or array instead of bitmap requires much more space...
And bitmaps are much more efficient for many operations: check for
element presence, combine, intersect,...
Check in bitmap has O(0) complexity so iterating through list of all
indexes or attributes with bitmap check doesn't add any essential overhead.

Sorry, I do not understand this: "They new one is of list_nth() counters
for items in the index list"

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
In reply to this post by Simon Riggs
On 23 March 2018 at 15:54, Simon Riggs <[hidden email]> wrote:

> So please could you make the change?

Committed, but I still think that change would be good.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 27.03.2018 22:08, Simon Riggs wrote:
> On 23 March 2018 at 15:54, Simon Riggs <[hidden email]> wrote:
>
>> So please could you make the change?
> Committed, but I still think that change would be good.
>
Thank you.
But I still not sure whether replacement of bitmap with List or array of
Oids is really good idea.

There are two aspects: size and speed. It seems to me that memory
overhead is most important.
At least you rejected your own idea about using autotune for this
parameters because of presence of extra 16 bytes in statistic.
But RelationData structure is even more space critical: its size is
multiplied by number of relations and backends.
Bitmap seems to provide the most compact representation of the
projective index list.

Concerning speed aspect, certainly iteration through the list of all
indexes with checking presence of index in the bitmap is more expensive
than just direct iteration through Oid
list or array. But since check of bitmap can be done in constant time,
both approaches have the same complexity. Also for typical case (< 5
normal indexes for a table and 1-2 functional indexes)
difference in performance can not be measured.

Also bitmap provides convenient interface for adding new members. To
construct array of Oids I need first to determine number of indexes, so
I have to perform two loops.

So if you and Alvaro insist on this change, then I will definitely do
it. But from my point of view, using bitmap here is acceptable solution.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Robert Haas
In reply to this post by Andres Freund
On Thu, Mar 1, 2018 at 2:48 PM, Andres Freund <[hidden email]> wrote:
> I still don't think, as commented upon by Tom and me upthread, that we
> want this feature in the current form.

Was this concern ever addressed, or did the patch just get committed anyway?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Andres Freund
On 2018-05-10 23:25:58 -0400, Robert Haas wrote:
> On Thu, Mar 1, 2018 at 2:48 PM, Andres Freund <[hidden email]> wrote:
> > I still don't think, as commented upon by Tom and me upthread, that we
> > want this feature in the current form.
>
> Was this concern ever addressed, or did the patch just get committed anyway?

No. Simon just claimed it's not actually a concern:
https://www.postgresql.org/message-id/CANP8+j+vtskPhEp_GmqmEqdWaKSt2KbOtee0yz-my+Agh0aRPw@...

And yes, it got committed without doing squat to address the
architectural concerns.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

David G Johnston
In reply to this post by konstantin knizhnik
On Thursday, February 1, 2018, Konstantin Knizhnik <[hidden email]> wrote:

Old + New for check = 2
plus calculate again in index = 3

Yes, we have to calculate the value of index expression for original and updated version of the record. If them are equal, then it is all we have to do with this index: hot update is applicable.
In this case we calculate index expression twice.
But if them are not equal, then hot update is not applicable and we have to update index. In this case expression will be calculated one more time. So totally three times.
This is why, if calculation of index expression is very expensive, then effect of this optimization may be negative even if value of index expression is not changed.

For the old/new comparison and the re-calculate if changed dynamics - is this a side effect of separation of concerns only or is there some other reason the old computed value already stored in the index isn't compared to the one and only function result of the new tuple which, if found to be different, is then stored.  One function invocation, which has to happen anyway, and one extra equality check.  Seems like this could be used for non-functional indexes too, so that mere presence in the update listing doesn't negate HOT if the column didn't actually change (if I'm not mis-remembering something here anyway...)

Also, create index page doc typo from site:  "with an total" s/b "with a total" (expression cost less than 1000) - maybe add a comma for 1,000

David J.


Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 11.05.2018 07:48, David G. Johnston wrote:
On Thursday, February 1, 2018, Konstantin Knizhnik <[hidden email]> wrote:

Old + New for check = 2
plus calculate again in index = 3

Yes, we have to calculate the value of index expression for original and updated version of the record. If them are equal, then it is all we have to do with this index: hot update is applicable.
In this case we calculate index expression twice.
But if them are not equal, then hot update is not applicable and we have to update index. In this case expression will be calculated one more time. So totally three times.
This is why, if calculation of index expression is very expensive, then effect of this optimization may be negative even if value of index expression is not changed.

For the old/new comparison and the re-calculate if changed dynamics - is this a side effect of separation of concerns only or is there some other reason the old computed value already stored in the index isn't compared to the one and only function result of the new tuple which, if found to be different, is then stored.  One function invocation, which has to happen anyway, and one extra equality check.  Seems like this could be used for non-functional indexes too, so that mere presence in the update listing doesn't negate HOT if the column didn't actually change (if I'm not mis-remembering something here anyway...)

Also, create index page doc typo from site:  "with an total" s/b "with a total" (expression cost less than 1000) - maybe add a comma for 1,000

David J.



Sorry, may be I do not completely understand you.
So whats happed before this patch:

- On update postgres compares old and new values of all changed attributes to determine whether them are actually changed.
- If value of some indexed attribute is changed,  then hot update is not applicable and we have to rebuild indexed.
- Otherwise hot update is used and indexes should not be updated.

What is changed:
  
-  When some of attributes, on which functional index depends, is changed, then we calculate value of index expression. It is done using existed FormIndexDatum function which stores calculated expression value in the provided slot. This evaluation of index expressions and their comparison is done in access/heap/heapam.c file.
- Only if old and new values of index expression are different, then hot update is really not applicable.
- In this case we have to rebuild indexes. It is done by ExecInsertIndexTuples in executor/execIndexing.c which calls FormIndexDatum once again to calculate index expression.

So in principle, it is certainly possible to store value of index expression calculated in ProjIndexIsUnchanged and reuse it ExecInsertIndexTuples.
But I do not know right place (context) where this value can be stored.
And also it will add some dependency between  heapam and execIndexing modules. Also it is necessary to take in account that ProjIndexIsUnchanged is not always called. So index expression value may be not present.

Finally, most of practically used index expressions are not expensive. It is usually something like extraction of JSON component. Cost of execution of this function is much smaller than cost of extracting and unpacking toasted JSON value. So I do not think that such optimization will be really useful, while it is obvious that it significantly complicate code.
Also please notice that FormIndexDatum is used in some other places:

  utils/adt/selfuncs.c
  utils/sort/tuplesort.c
  mmands/constraint.c

So this patch just adds two more calls of this function.
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
In reply to this post by Andres Freund
On 11 May 2018 at 05:32, Andres Freund <[hidden email]> wrote:

> On 2018-05-10 23:25:58 -0400, Robert Haas wrote:
>> On Thu, Mar 1, 2018 at 2:48 PM, Andres Freund <[hidden email]> wrote:
>> > I still don't think, as commented upon by Tom and me upthread, that we
>> > want this feature in the current form.
>>
>> Was this concern ever addressed, or did the patch just get committed anyway?
>
> No. Simon just claimed it's not actually a concern:
> https://www.postgresql.org/message-id/CANP8+j+vtskPhEp_GmqmEqdWaKSt2KbOtee0yz-my+Agh0aRPw@...
>
> And yes, it got committed without doing squat to address the
> architectural concerns.

"Squat" means "zero, nothing" to me. So that comment would be inaccurate.

I've spent a fair amount of time reviewing and grooming the patch, and
I am happy that Konstantin has changed his patch significantly in
response to those reviews, as well as explaining why the patch is fine
as it is. It's a useful patch that PostgreSQL needs, so all good.

I have no problem if you want to replace this with an even better
design in a later release.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

David G Johnston
In reply to this post by konstantin knizhnik
On Fri, May 11, 2018 at 4:58 AM, Konstantin Knizhnik <[hidden email]> wrote:

Sorry, may be I do not completely understand you.
So whats happed before this patch:

- On update postgres compares old and new values of all changed attributes to determine whether them are actually changed.
- If value of some indexed attribute is changed,  then hot update is not applicable and we have to rebuild indexed.
- Otherwise hot update is used and indexes should not be updated.

What is changed:
  
-  When some of attributes, on which functional index depends, is changed, then we calculate value of index expression. It is done using existed FormIndexDatum function which stores calculated expression value in the provided slot. This evaluation of index expressions and their comparison is done in access/heap/heapam.c file.
- Only if old and new values of index expression are different, then hot update is really not applicable.


​Thanks.

So, in my version of layman's terms, before this patch we treated simple column referenced indexes and expression indexes differently because in a functional index we didn't actually compare the expression results, only the original values of the depended-on columns.  With this patch both are treated the same - and in a manner that I would think would be normal/expected.  Treating expression indexes this way, however, is potentially more costly than before and thus we want to provide the user a way to say "hey, PG, don't bother with the extra effort of comparing the entire expression, it ain't gonna matter since if I change the depended-on values odds are the expression result will change as well."  Changing an option named "recheck_on_update" doesn't really communicate that to me (I dislike the word recheck, too implementation specific).  Something like "only_compare_dependencies_on_update (default false)" would do a better job at that - tell me what the abnormal behavior is, not the normal, and lets me enable it instead of disabling the default behavior and not know what it is falling back to.  The description for the parameter does a good job of describing this - just suggesting that the name match how the user might see things.

On the whole the change makes sense - and the general assumptions around runtime cost seem sound and support the trade-off simply re-computing the expression on the old tuple versus engineering an inexpensive way to retrieve the existing value from the index.

I don't have a problem with "total expression cost" being used as a heuristic - though maybe that would mean we should make this an enum with (off, on, never/always) with the third option overriding all heuristics.

David J.
Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Andres Freund
In reply to this post by Simon Riggs
On 2018-05-11 14:56:12 +0200, Simon Riggs wrote:
> On 11 May 2018 at 05:32, Andres Freund <[hidden email]> wrote:
> > No. Simon just claimed it's not actually a concern:
> > https://www.postgresql.org/message-id/CANP8+j+vtskPhEp_GmqmEqdWaKSt2KbOtee0yz-my+Agh0aRPw@...
> >
> > And yes, it got committed without doing squat to address the
> > architectural concerns.
>
> "Squat" means "zero, nothing" to me. So that comment would be inaccurate.

Yes, I know it means that. And I don't see how it is inaccurate.


> I have no problem if you want to replace this with an even better
> design in a later release.

Meh. The author / committer should get a patch into the right shape, not
other people that are concerned with the consequences.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Simon Riggs
On 11 May 2018 at 16:37, Andres Freund <[hidden email]> wrote:

> On 2018-05-11 14:56:12 +0200, Simon Riggs wrote:
>> On 11 May 2018 at 05:32, Andres Freund <[hidden email]> wrote:
>> > No. Simon just claimed it's not actually a concern:
>> > https://www.postgresql.org/message-id/CANP8+j+vtskPhEp_GmqmEqdWaKSt2KbOtee0yz-my+Agh0aRPw@...
>> >
>> > And yes, it got committed without doing squat to address the
>> > architectural concerns.
>>
>> "Squat" means "zero, nothing" to me. So that comment would be inaccurate.
>
> Yes, I know it means that. And I don't see how it is inaccurate.
>
>
>> I have no problem if you want to replace this with an even better
>> design in a later release.
>
> Meh. The author / committer should get a patch into the right shape

They have done, at length. Claiming otherwise is just trash talk.

As you pointed out, the design of the patch avoids layering violations
that could have led to architectural objections. Are you saying I
should have ignored your words and rewritten the patch to introduce a
layering violation? What other objection do you think has not been
addressed?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Robert Haas
On Fri, May 11, 2018 at 11:01 AM, Simon Riggs <[hidden email]> wrote:
>>> I have no problem if you want to replace this with an even better
>>> design in a later release.
>>
>> Meh. The author / committer should get a patch into the right shape
>
> They have done, at length. Claiming otherwise is just trash talk.

Some people might call it "honest disagreement".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Fri, May 11, 2018 at 11:01 AM, Simon Riggs <[hidden email]> wrote:
>>>> I have no problem if you want to replace this with an even better
>>>> design in a later release.

>>> Meh. The author / committer should get a patch into the right shape

>> They have done, at length. Claiming otherwise is just trash talk.

> Some people might call it "honest disagreement".

So where we are today is that this patch was lobotomized by commits
77366d90f/d06fe6ce2 as a result of this bug report:

https://postgr.es/m/20181106185255.776mstcyehnc63ty@...

We need to move forward, either by undertaking a more extensive
clean-out, or by finding a path to a version of the code that is
satisfactory.  I wanted to enumerate my concerns while yesterday's
events are still fresh in mind.  (Andres or Robert might have more.)

* I do not understand why this feature is on-by-default in the first
place.  It can only be a win for expression indexes that are many-to-one
mappings; for indexes that are one-to-one or few-to-one, it's a pretty
big loss.  I see no reason to assume that most expression indexes fall
into the first category.  I suggest that the design ought to be to use
this optimization only for indexes for which the user has explicitly
enabled recheck_on_update.  That would allow getting rid of the cost check
in IsProjectionFunctionalIndex, about which we'd otherwise have to have
an additional fight: I do not like its ad-hoc-ness, nor the modularity
violation (and potential circularity) involved in having the relcache call
cost_qual_eval.

* Having heapam.c calling the executor also seems like a
modularity/layering violation that will bite us in the future.

* The implementation seems incredibly inefficient.  ProjIndexIsUnchanged
is doing creation/destruction of an EState, index_open/index_close
(including acquisition of a lock we should already have), BuildIndexInfo,
and expression compilation, all of which are completely redundant with
work done elsewhere in the executor.  And it's doing that *over again for
every tuple*, which totally aside from wasted cycles probably means there
are large query-lifespan memory leaks in an UPDATE affecting many rows.
I think a minimum expectation should be that one-time work is done only
one time; but ideally none of those things would happen at all because
we could share work with the regular code path.  Perhaps it's too much
to hope that we could also avoid duplicate computation of the new index
expression values, but as long as I'm listing complaints ...

* As noted in the bug thread, the implementation of the new reloption
is broken because (a) it fails to work for some built-in index AMs
and (b) by design, it can't work for add-on AMs.  I agree with Andres
that that's not acceptable.

* This seems like bad data structure design:

+   Bitmapset  *rd_projidx;     /* Oids of projection indexes */

That comment is a lie, although IMO it'd be better if it were true;
a list of target index OIDs would be a far better design here.  The use
of rd_projidx as a set of indexes into the relation's indexlist is
inefficient and overcomplicated, plus it creates an unnecessary and not
very safe (even if it were documented) coupling between rd_indexlist and
rd_projidx.

* Having ProjIndexIsUnchanged examine relation->rd_projidx directly is
broken by design anyway, both from a modularity standpoint and because
its inner loop involves catalog accesses that could result in relcache
flushes.  I'm somewhat amazed that the regression tests passed on
CLOBBER_CACHE_ALWAYS buildfarm animals, although possibly this is
explained by the fact that they're only testing cases with a single
expression index, so that the bitmap isn't checked again after the cache
flush happens.  Again, this could be fixed if what was returned was just
a list of relevant index OIDs.

* This bit of coding is unsafe, for the reason explained in the existing
comment:

     /*
      * Now save copies of the bitmaps in the relcache entry.  We intentionally
      * set rd_indexattr last, because that's the one that signals validity of
      * the values; if we run out of memory before making that copy, we won't
      * leave the relcache entry looking like the other ones are valid but
      * empty.
      */
     oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
     relation->rd_keyattr = bms_copy(uindexattrs);
     relation->rd_pkattr = bms_copy(pkindexattrs);
     relation->rd_idattr = bms_copy(idindexattrs);
     relation->rd_indexattr = bms_copy(indexattrs);
+    relation->rd_projindexattr = bms_copy(projindexattrs);
+    relation->rd_projidx = bms_copy(projindexes);
     MemoryContextSwitchTo(oldcxt);

* There's a lot of other inattention to comments.  For example, I noticed
that this patch added new responsibilities to RelationGetIndexList without
updating its header comment to mention them.

* There's a lack of attention to terminology, too.  I do not think that
"projection index" is an appropriate term at all, nor do I think that
"recheck_on_update" is a particularly helpful option name, though we
may be stuck with the latter at this point.

* I also got annoyed by minor sloppiness like not adding the new
regression test in the same place in serial_schedule and
parallel_schedule.  The whole thing needed more careful review
than it got.

In short, it seems likely to me that large parts of this patch need to
be pulled out, rewritten, and then put back in different places than
they are today.  I'm not sure if a complete revert is the best next
step, or if we can make progress without that.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 07.11.2018 22:25, Tom Lane wrote:
Robert Haas [hidden email] writes:
On Fri, May 11, 2018 at 11:01 AM, Simon Riggs [hidden email] wrote:
I have no problem if you want to replace this with an even better
design in a later release.

      
Meh. The author / committer should get a patch into the right shape

      
They have done, at length. Claiming otherwise is just trash talk.

      
Some people might call it "honest disagreement".
So where we are today is that this patch was lobotomized by commits
77366d90f/d06fe6ce2 as a result of this bug report:

https://postgr.es/m/20181106185255.776mstcyehnc63ty@...

We need to move forward, either by undertaking a more extensive
clean-out, or by finding a path to a version of the code that is
satisfactory.  I wanted to enumerate my concerns while yesterday's
events are still fresh in mind.  (Andres or Robert might have more.)

* I do not understand why this feature is on-by-default in the first
place.  It can only be a win for expression indexes that are many-to-one
mappings; for indexes that are one-to-one or few-to-one, it's a pretty
big loss.  I see no reason to assume that most expression indexes fall
into the first category.  I suggest that the design ought to be to use
this optimization only for indexes for which the user has explicitly
enabled recheck_on_update.  That would allow getting rid of the cost check
in IsProjectionFunctionalIndex, about which we'd otherwise have to have
an additional fight: I do not like its ad-hoc-ness, nor the modularity
violation (and potential circularity) involved in having the relcache call
cost_qual_eval.

* Having heapam.c calling the executor also seems like a
modularity/layering violation that will bite us in the future.

* The implementation seems incredibly inefficient.  ProjIndexIsUnchanged
is doing creation/destruction of an EState, index_open/index_close
(including acquisition of a lock we should already have), BuildIndexInfo,
and expression compilation, all of which are completely redundant with
work done elsewhere in the executor.  And it's doing that *over again for
every tuple*, which totally aside from wasted cycles probably means there
are large query-lifespan memory leaks in an UPDATE affecting many rows.
I think a minimum expectation should be that one-time work is done only
one time; but ideally none of those things would happen at all because
we could share work with the regular code path.  Perhaps it's too much
to hope that we could also avoid duplicate computation of the new index
expression values, but as long as I'm listing complaints ...

* As noted in the bug thread, the implementation of the new reloption
is broken because (a) it fails to work for some built-in index AMs
and (b) by design, it can't work for add-on AMs.  I agree with Andres
that that's not acceptable.

* This seems like bad data structure design:

+   Bitmapset  *rd_projidx;     /* Oids of projection indexes */

That comment is a lie, although IMO it'd be better if it were true;
a list of target index OIDs would be a far better design here.  The use
of rd_projidx as a set of indexes into the relation's indexlist is
inefficient and overcomplicated, plus it creates an unnecessary and not
very safe (even if it were documented) coupling between rd_indexlist and
rd_projidx.

* Having ProjIndexIsUnchanged examine relation->rd_projidx directly is
broken by design anyway, both from a modularity standpoint and because
its inner loop involves catalog accesses that could result in relcache
flushes.  I'm somewhat amazed that the regression tests passed on
CLOBBER_CACHE_ALWAYS buildfarm animals, although possibly this is
explained by the fact that they're only testing cases with a single
expression index, so that the bitmap isn't checked again after the cache
flush happens.  Again, this could be fixed if what was returned was just
a list of relevant index OIDs.

* This bit of coding is unsafe, for the reason explained in the existing
comment:

     /*
      * Now save copies of the bitmaps in the relcache entry.  We intentionally
      * set rd_indexattr last, because that's the one that signals validity of
      * the values; if we run out of memory before making that copy, we won't
      * leave the relcache entry looking like the other ones are valid but
      * empty.
      */
     oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
     relation->rd_keyattr = bms_copy(uindexattrs);
     relation->rd_pkattr = bms_copy(pkindexattrs);
     relation->rd_idattr = bms_copy(idindexattrs);
     relation->rd_indexattr = bms_copy(indexattrs);
+    relation->rd_projindexattr = bms_copy(projindexattrs);
+    relation->rd_projidx = bms_copy(projindexes);
     MemoryContextSwitchTo(oldcxt);

* There's a lot of other inattention to comments.  For example, I noticed
that this patch added new responsibilities to RelationGetIndexList without
updating its header comment to mention them.

* There's a lack of attention to terminology, too.  I do not think that
"projection index" is an appropriate term at all, nor do I think that
"recheck_on_update" is a particularly helpful option name, though we
may be stuck with the latter at this point.

* I also got annoyed by minor sloppiness like not adding the new
regression test in the same place in serial_schedule and
parallel_schedule.  The whole thing needed more careful review
than it got.

In short, it seems likely to me that large parts of this patch need to
be pulled out, rewritten, and then put back in different places than
they are today.  I'm not sure if a complete revert is the best next
step, or if we can make progress without that.

			regards, tom lane
First of all I am sorry for this bug. It's a pity that I was not involved in this investigation: I am not subscribed on "postgresql-general" list.
I have found the same bug in yet another my code few month ago. Shame on me, but I  didn't realized that values returned by FormIndexDatum
data doesn't match with descriptor of index relation, because AM may convert them to own types. By the way, I will be please if somebody can suggest the best
way of building proper tuple descriptor.

If the idea of "projection" index optimization is still considered as valuable by community, I can continue improvement of this patch.
Tom, thank you for the critics. But if it's not too brazen of me, I'd like to receive more constructive critics:

>
Having heapam.c calling the executor also seems like a modularity/layering violation that will bite us in the future.
Function ProjIndexIsUnchanged is implemented and used in heapam.c because this is the place where decision weather to perform or no perform hot update is made.
Insertion of index is done later. So the only possibility is to somehow cache results of this operation to be able to reuse in future.

> The implementation seems incredibly inefficient. ProjIndexIsUnchanged is doing creation/destruction of an EState, index_open/index_close...

Well, please look for example at get_actual_variable_range function. It is doing almost the same. Is it also extremely inefficient?
Almost all functions dealing with indexes and calling FormIndexDatum are performing the same actions. But the main differences, is that most of them (like IndexCheckExclusion) initialize EState and create slot only once and then iterate through all tuples matching search condition. And ProjIndexIsUnchanged is doing it for each tuple. Once again, I will be pleased to receive from you (or somebody else) advice how to address this problem. I have no idea how it it possible to avoid to perform this check for each affected tuple. So the only solution I can imagine is to somehow cache and reused created executor state.

> Having ProjIndexIsUnchanged examine relation->rd_projidx directly is broken by design anyway, both from a modularity standpoint and because its inner loop involves catalog accesses that could result in relcache flushes.

Sorry, can you clarify it? As far as there are shared lock for correspondent relation and index, nobody else can alter this table or index. And there is rd_refcnt which should prevent destruction of Relation even if relcache is invalidated. There are a lot of other places where components of relation are accessed directly (most often relation->rd_rel stuff). I do not understand why replacing rd_projidx with list of Oid will solve the problem.

> There's a lack of attention to terminology, too. I do not think that "projection index" is an appropriate term at all, nor do I think that "recheck_on_update" is a particularly helpful option name, though we may be stuck with the latter at this point.

As I am not native english speaking, I will agree with any proposed terminology.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Laurenz Albe
In reply to this post by Tom Lane-2
Tom Lane wrote:

> I wanted to enumerate my concerns while yesterday's
> events are still fresh in mind.  (Andres or Robert might have more.)
>
> * I do not understand why this feature is on-by-default in the first
> place.  It can only be a win for expression indexes that are many-to-one
> mappings; for indexes that are one-to-one or few-to-one, it's a pretty
> big loss.  I see no reason to assume that most expression indexes fall
> into the first category.  I suggest that the design ought to be to use
> this optimization only for indexes for which the user has explicitly
> enabled recheck_on_update.  That would allow getting rid of the cost check
> in IsProjectionFunctionalIndex, about which we'd otherwise have to have
> an additional fight: I do not like its ad-hoc-ness, nor the modularity
> violation (and potential circularity) involved in having the relcache call
> cost_qual_eval.

That was my impression too when I had a closer look at this feature.

What about an option "hot_update_check" with values "off" (default),
"on" and "always"?

Yours,
Laurenz Albe


Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

konstantin knizhnik


On 08.11.2018 15:23, Laurenz Albe wrote:

> Tom Lane wrote:
>> I wanted to enumerate my concerns while yesterday's
>> events are still fresh in mind.  (Andres or Robert might have more.)
>>
>> * I do not understand why this feature is on-by-default in the first
>> place.  It can only be a win for expression indexes that are many-to-one
>> mappings; for indexes that are one-to-one or few-to-one, it's a pretty
>> big loss.  I see no reason to assume that most expression indexes fall
>> into the first category.  I suggest that the design ought to be to use
>> this optimization only for indexes for which the user has explicitly
>> enabled recheck_on_update.  That would allow getting rid of the cost check
>> in IsProjectionFunctionalIndex, about which we'd otherwise have to have
>> an additional fight: I do not like its ad-hoc-ness, nor the modularity
>> violation (and potential circularity) involved in having the relcache call
>> cost_qual_eval.
> That was my impression too when I had a closer look at this feature.
>
> What about an option "hot_update_check" with values "off" (default),
> "on" and "always"?
>
> Yours,
> Laurenz Albe
>
Before doing any other refactoring of projection indexes I want to
attach small bug fix patch which
fixes the original problem (SIGSEGV) and also disables recheck_on_update
by default.
As Laurenz has suggested, I replaced boolean recheck_on_update option
with "on","auto,"off" (default).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


projection-10.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Alvaro Herrera-9
On 2018-Nov-08, Konstantin Knizhnik wrote:

> Before doing any other refactoring of projection indexes I want to attach
> small bug fix patch which
> fixes the original problem (SIGSEGV) and also disables recheck_on_update by
> default.
> As Laurenz has suggested, I replaced boolean recheck_on_update option with
> "on","auto,"off" (default).

I think this causes an ABI break for GenericIndexOpts.  Not sure to what
extent that is an actual problem (i.e. how many modules were compiled
with 11.0 that are gonna be reading that struct with later Pg), but I
think it should be avoided anyhow.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2018-11-07 14:25:54 -0500, Tom Lane wrote:
> We need to move forward, either by undertaking a more extensive
> clean-out, or by finding a path to a version of the code that is
> satisfactory.
> [...]
> In short, it seems likely to me that large parts of this patch need to
> be pulled out, rewritten, and then put back in different places than
> they are today.  I'm not sure if a complete revert is the best next
> step, or if we can make progress without that.

I think the feature has merit, but I don't think it makes that much
sense to start with the current in-tree version. There's just too many
architectural issues.  So I think we should clean it out as much as we
can, and then have the feature re-submitted with proper review etc.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Surjective functional indexes

Tom Lane-2
In reply to this post by Alvaro Herrera-9
Alvaro Herrera <[hidden email]> writes:
> On 2018-Nov-08, Konstantin Knizhnik wrote:
>> Before doing any other refactoring of projection indexes I want to attach
>> small bug fix patch which
>> fixes the original problem (SIGSEGV) and also disables recheck_on_update by
>> default.
>> As Laurenz has suggested, I replaced boolean recheck_on_update option with
>> "on","auto,"off" (default).

> I think this causes an ABI break for GenericIndexOpts.  Not sure to what
> extent that is an actual problem (i.e. how many modules were compiled
> with 11.0 that are gonna be reading that struct with later Pg), but I
> think it should be avoided anyhow.

I do not see the point of having more than a boolean option anyway,
if the default is going to be "off".  If the user is going to the
trouble of marking a specific index for this feature, what else is
she likely to want other than having it "on"?

The bigger picture here, and the reason for my skepticism about having
any intelligence in the enabling logic, is that there is no scenario
in which this code can be smarter than the user about what to do.
We have no insight today, and are unlikely to have any in future, about
whether a specific index expression is many-to-one or not.  I have
exactly no faith in cost-estimate-based logic either, because of the
extremely poor quality of our function cost estimates --- very little
effort has been put into assigning trustworthy procost numbers to the
built-in functions, and as for user-defined ones, it's probably much
worse.  So that's why I'm on the warpath against the cost-based logic
that's there now; I think it's just garbage-in-garbage-out.

                        regards, tom lane

12345