Inadequate executor locking of indexes

Inadequate executor locking of indexes

Tom Lane-2
I discovered that it's possible to trigger relation_open's new assertion
about having a lock on the relation by the simple expedient of running
the core regression tests with plan_cache_mode = force_generic_plan.
(This doesn't give me a terribly warm feeling about how thoroughly we've
exercised the relevant code, but I don't know what to do about that.)
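
(For reference, the assertion in question, in relation_open(), reads
approximately as follows -- quoted from memory, so treat it as a sketch:)

    Assert(lockmode != NoLock ||
           IsBootstrapProcessingMode() ||
           CheckRelationLockedByMe(r, AccessShareLock, true));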

The reason for the crash is that nodeIndexscan.c and friends all open
their index-to-scan with code like

    /*
     * Open the index relation.
     *
     * If the parent table is one of the target relations of the query, then
     * InitPlan already opened and write-locked the index, so we can avoid
     * taking another lock here.  Otherwise we need a normal reader's lock.
     */
    relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
    indexstate->iss_RelationDesc = index_open(node->indexid,
                                              relistarget ? NoLock : AccessShareLock);

Now, at the time this code was written, InitPlan did indeed ensure that
indexes of a plan's result relation were already opened and locked,
because it calls InitResultRelInfo which used to call ExecOpenIndices.
Nowadays, the ExecOpenIndices part is postponed to ExecInitModifyTable,
which figures it can optimize matters:

        /*
         * If there are indices on the result relation, open them and save
         * descriptors in the result relation info, so that we can add new
         * index entries for the tuples we add/update.  We need not do this
         * for a DELETE, however, since deletion doesn't affect indexes. Also,
         * inside an EvalPlanQual operation, the indexes might be open
         * already, since we share the resultrel state with the original
         * query.
         */
        if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
            operation != CMD_DELETE &&
            resultRelInfo->ri_IndexRelationDescs == NULL)
            ExecOpenIndices(resultRelInfo,
                            node->onConflictAction != ONCONFLICT_NONE);

Therefore, in a plan consisting of a DELETE ModifyTable atop an indexscan
of the target table, we are opening the index without the executor
acquiring any lock on the index.  The only thing that saves us from
triggering that Assert is that in most cases the planner has already
taken a lock on any index it considered (and not released that lock).
But using a prepared plan breaks that.

This problem is ancient; it's unrelated to the recent changes to reduce
executor locking, because that only concerned table locks not index locks.
I'm not certain how much real-world impact it has, since holding a lock
on the index's parent table is probably enough to prevent dire things
from happening in most cases.  Still, it seems like a bug.

There are several things we might do to fix this:

1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
in ExecInitModifyTable.  We might be forced into that someday anyway if
we want to support non-heap-style tables, since most other peoples'
versions of indexes do want to know about deletions.

2. Drop the target-table optimization from nodeIndexscan.c and friends,
and just always open the scan index with AccessShareLock.  (BTW, it's
a bit inconsistent that these nodes don't release their index locks
at the end; ExecCloseIndices does.)

3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
exception, so that they get the lock for themselves in that case.  This
seems pretty ugly/fragile, but it's about the only option that won't end
in adding index lock-acquisition overhead in some cases.  (But given the
planner's behavior, it's not clear to me how often that really matters.)
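
(For concreteness, #1 and #2 would amount to small changes along these
lines -- a sketch against the code quoted above, not tested:)

    /* #1: in ExecInitModifyTable, open target-rel indexes for DELETE too */
    if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
        resultRelInfo->ri_IndexRelationDescs == NULL)
        ExecOpenIndices(resultRelInfo,
                        node->onConflictAction != ONCONFLICT_NONE);

    /*
     * #2: in nodeIndexscan.c and friends, drop the relistarget test and
     * unconditionally take a reader's lock on the scan index.
     */
    indexstate->iss_RelationDesc = index_open(node->indexid, AccessShareLock);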

BTW, another thing that is bothering me about this is that, both with
the current code and options #1 and #3, there's an assumption that any
indexscan on a query's target table must be underneath the relevant
ModifyTable plan node.  Given some other plan tree structure it might
be possible to initialize the indexscan node before the ModifyTable,
breaking the assumption that the latter already got index locks.  I'm
not sure if that's actually possible at present, but it doesn't seem
like I'd want to bet on it always being so in the future.  This might
be an argument for going with option #2, which eliminates all such
assumptions.

Another idea is to make things work more like we just made them work for
tables, which is to ensure that all index locks are taken before reaching
the executor, or at least as part of some other processing than the plan
tree walk.  That would be a good deal more work than any of the options
listed above, though, and it would definitely not be back-patchable if we
decide we need a back-patched fix.

Thoughts?

                        regards, tom lane


Re: Inadequate executor locking of indexes

David Rowley-3
On Thu, 8 Nov 2018 at 13:14, Tom Lane <[hidden email]> wrote:
>
> I discovered that it's possible to trigger relation_open's new assertion
> about having a lock on the relation by the simple expedient of running
> the core regression tests with plan_cache_mode = force_generic_plan.


> There are several things we might do to fix this:
>
> 1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
> in ExecInitModifyTable.  We might be forced into that someday anyway if
> we want to support non-heap-style tables, since most other peoples'
> versions of indexes do want to know about deletions.
>
> 2. Drop the target-table optimization from nodeIndexscan.c and friends,
> and just always open the scan index with AccessShareLock.  (BTW, it's
> a bit inconsistent that these nodes don't release their index locks
> at the end; ExecCloseIndices does.)
>
> 3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
> exception, so that they get the lock for themselves in that case.  This
> seems pretty ugly/fragile, but it's about the only option that won't end
> in adding index lock-acquisition overhead in some cases.  (But given the
> planner's behavior, it's not clear to me how often that really matters.)

Since I missed this and only discovered this was a problem when
someone else reported it today, and since I already did my own
analysis separately in [1], my vote is for #2.  For partitioned
table updates, there may be many result relations, which can cause
ExecRelationIsTargetRelation() to become very slow; in such a case, any
additional redundant lock would be cheap by comparison.
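
(For reference, ExecRelationIsTargetRelation() is a linear search over
the result relations -- approximately the following, hence O(N) work per
scan node when there are N result relations:)

    bool
    ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
    {
        ResultRelInfo *resultRelInfos = estate->es_result_relations;
        int         i;

        for (i = 0; i < estate->es_num_result_relations; i++)
        {
            if (resultRelInfos[i].ri_RangeTableIndex == scanrelid)
                return true;
        }
        return false;
    }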

Ideally, the locking code would realise we already hold a stronger
lock and skip the lock, but I don't see how that's realistically
possible without probing the hash table for all stronger lock types
first, which would likely damage the performance of locking.

[1] https://www.postgresql.org/message-id/CAKJS1f-DyKTYyMf9oxn1PQ%3DWyEOOjfVcV-dCc6eB9eat_MYPeA%40mail.gmail.com

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

Tom Lane-2
David Rowley <[hidden email]> writes:

> On Thu, 8 Nov 2018 at 13:14, Tom Lane <[hidden email]> wrote:
>> There are several things we might do to fix this:
>>
>> 1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
>> in ExecInitModifyTable.  We might be forced into that someday anyway if
>> we want to support non-heap-style tables, since most other peoples'
>> versions of indexes do want to know about deletions.
>>
>> 2. Drop the target-table optimization from nodeIndexscan.c and friends,
>> and just always open the scan index with AccessShareLock.  (BTW, it's
>> a bit inconsistent that these nodes don't release their index locks
>> at the end; ExecCloseIndices does.)
>>
>> 3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
>> exception, so that they get the lock for themselves in that case.  This
>> seems pretty ugly/fragile, but it's about the only option that won't end
>> in adding index lock-acquisition overhead in some cases.  (But given the
>> planner's behavior, it's not clear to me how often that really matters.)

> Since I missed this and only discovered this was a problem when
> someone else reported it today, and since I already did my own
> analysis separately in [1], my vote is for #2.

Thinking more about this, the problem I noted previously about two of
these solutions not working if the index scan node is not physically
underneath the ModifyTable node actually applies to all three :-(.
It's a slightly different issue for #2, namely that what we risk is
first taking AccessShareLock and then upgrading to RowExclusiveLock.
Since there are places (not many) that take ShareLock on indexes,
this would pose a deadlock risk.

Now, after more thought, I believe that it's probably impossible
to have a plan tree in which ExecRelationIsTargetRelation would
return true at an indexscan node that's not underneath the ModifyTable
node.  What *is* possible is that we have such an indexscan on a
different RTE for the same table.  I constructed this admittedly
artificial example in the regression database:

# explain with x1 as (select * from tenk1 t1 where unique1 = 42),
x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
select * from x1,x2 where x1.ten = x2.ten;
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=16.61..16.66 rows=1 width=488)
   Join Filter: (x1.ten = x2.ten)
   CTE x1
     ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..8.30 rows=1 width=244)
           Index Cond: (unique1 = 42)
   CTE x2
     ->  Update on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
           ->  Index Scan using tenk1_unique2 on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
                 Index Cond: (unique2 = 11)
   ->  CTE Scan on x1  (cost=0.00..0.02 rows=1 width=244)
   ->  CTE Scan on x2  (cost=0.00..0.02 rows=1 width=244)
(11 rows)

Because the CTEs will be initialized in order, this presents a case
where the lock-upgrade hazard exists today: the x1 indexscan will
open tenk1_unique1 with AccessShareLock and then x2's ModifyTable
node will upgrade that to RowExclusiveLock.  None of the proposed
fixes improve this.

I'm beginning to think that postponing target-index locking to
ExecInitModifyTable was a damfool idea and we should undo it.

> For partitioned
> table updates, there may be many result relations, which can cause
> ExecRelationIsTargetRelation() to become very slow; in such a case, any
> additional redundant lock would be cheap by comparison.

Yeah, it'd be nice to get rid of the need for that.

                        regards, tom lane


Re: Inadequate executor locking of indexes

David Rowley-3
On Sat, 24 Nov 2018 at 05:25, Tom Lane <[hidden email]> wrote:

> Now, after more thought, I believe that it's probably impossible
> to have a plan tree in which ExecRelationIsTargetRelation would
> return true at an indexscan node that's not underneath the ModifyTable
> node.  What *is* possible is that we have such an indexscan on a
> different RTE for the same table.  I constructed this admittedly
> artificial example in the regression database:
>
> # explain with x1 as (select * from tenk1 t1 where unique1 = 42),
> x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
> select * from x1,x2 where x1.ten = x2.ten;

Well, that problem exists with more than just indexes. Relations could
be problematic too. An even simpler artificial example would be:

select * from t1 inner join t1 t2 on t1.a=t2.a for update of t2;

We could fix that in the executor by processing the rangetable in the
planner: first throw the whole thing into a hash table by Oid, find
the strongest lock level, and apply that level to each (non-dummy)
matching RangeTblEntry's rellockmode.  While we're there, we could add
a new field for indexlockmode and apply the same process to that.
However... there might not be much point, as this does nothing for the
same problem that exists in parse analysis.  That may be much harder
or even impossible to fix.
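
(A sketch of that idea -- the function and struct names are invented for
illustration, and it leans on lock modes being numerically ordered by
strength:)

    typedef struct LockModeEntry
    {
        Oid         relid;          /* hash key */
        LOCKMODE    strongest;      /* strongest lockmode seen for relid */
    } LockModeEntry;

    static void
    finalize_lockmodes(PlannerGlobal *glob)
    {
        HASHCTL     ctl;
        HTAB       *htab;
        ListCell   *lc;

        MemSet(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
        ctl.entrysize = sizeof(LockModeEntry);
        htab = hash_create("lockmodes", 64, &ctl, HASH_ELEM | HASH_BLOBS);

        /* Pass 1: record the strongest lock level wanted for each rel OID */
        foreach(lc, glob->finalrtable)
        {
            RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
            LockModeEntry *entry;
            bool        found;

            if (rte->rtekind != RTE_RELATION)
                continue;
            entry = hash_search(htab, &rte->relid, HASH_ENTER, &found);
            if (!found || rte->rellockmode > entry->strongest)
                entry->strongest = rte->rellockmode;
        }

        /* Pass 2: apply the strongest level back to every matching RTE */
        foreach(lc, glob->finalrtable)
        {
            RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
            LockModeEntry *entry;

            if (rte->rtekind != RTE_RELATION)
                continue;
            entry = hash_search(htab, &rte->relid, HASH_FIND, NULL);
            rte->rellockmode = entry->strongest;
        }

        hash_destroy(htab);
    }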

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

Amit Langote-2
In reply to this post by Tom Lane-2
Sorry for jumping in late on this.

On 2018/11/24 1:25, Tom Lane wrote:

> David Rowley <[hidden email]> writes:
> Thinking more about this, the problem I noted previously about two of
> these solutions not working if the index scan node is not physically
> underneath the ModifyTable node actually applies to all three :-(.
> It's a slightly different issue for #2, namely that what we risk is
> first taking AccessShareLock and then upgrading to RowExclusiveLock.
> Since there are places (not many) that take ShareLock on indexes,
> this would pose a deadlock risk.
>
> Now, after more thought, I believe that it's probably impossible
> to have a plan tree in which ExecRelationIsTargetRelation would
> return true at an indexscan node that's not underneath the ModifyTable
> node.  What *is* possible is that we have such an indexscan on a
> different RTE for the same table.  I constructed this admittedly
> artificial example in the regression database:
>
> # explain with x1 as (select * from tenk1 t1 where unique1 = 42),
> x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
> select * from x1,x2 where x1.ten = x2.ten;
>                                           QUERY PLAN                                          
> ----------------------------------------------------------------------------------------------
>  Nested Loop  (cost=16.61..16.66 rows=1 width=488)
>    Join Filter: (x1.ten = x2.ten)
>    CTE x1
>      ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..8.30 rows=1 width=244)
>            Index Cond: (unique1 = 42)
>    CTE x2
>      ->  Update on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
>            ->  Index Scan using tenk1_unique2 on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
>                  Index Cond: (unique2 = 11)
>    ->  CTE Scan on x1  (cost=0.00..0.02 rows=1 width=244)
>    ->  CTE Scan on x2  (cost=0.00..0.02 rows=1 width=244)
> (11 rows)
>
> Because the CTEs will be initialized in order, this presents a case
> where the lock-upgrade hazard exists today: the x1 indexscan will
> open tenk1_unique1 with AccessShareLock and then x2's ModifyTable
> node will upgrade that to RowExclusiveLock.  None of the proposed
> fixes improve this.

Provided we want to keep ExecRelationIsTargetRelation going forward, how
about modifying it such that we compare the scan relation's OID with that
of the result relations, not the RT index?  Like in the attached.
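
(Roughly, that means changing the loop to compare table OIDs instead of
range table indexes -- a sketch only, the attached patch being
authoritative:)

    bool
    ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
    {
        Oid         relid = exec_rt_fetch(scanrelid, estate)->relid;
        ResultRelInfo *resultRelInfos = estate->es_result_relations;
        int         i;

        for (i = 0; i < estate->es_num_result_relations; i++)
        {
            if (RelationGetRelid(resultRelInfos[i].ri_RelationDesc) == relid)
                return true;
        }
        return false;
    }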

> I'm beginning to think that postponing target-index locking to
> ExecInitModifyTable was a damfool idea and we should undo it.

+1

Also as already proposed, InitPlan should lock result relation indexes
even for DELETEs.

>> For partitioned
>> table updates, there may be many result relations, which can cause
>> ExecRelationIsTargetRelation() to become very slow; in such a case, any
>> additional redundant lock would be cheap by comparison.
>
> Yeah, it'd be nice to get rid of the need for that.

As David mentioned elsewhere, can we add the ResultRelInfos to a hash
table if there are at least a certain number of result relations?
3f2393edefa did that for UPDATE tuple routing efficiency.

Thanks,
Amit

Attachment: ExecRelationIsTargetRelation-compare-OID.patch (1K)

Re: Inadequate executor locking of indexes

David Rowley-3
On Mon, 26 Nov 2018 at 17:37, Amit Langote
<[hidden email]> wrote:
> On 2018/11/24 1:25, Tom Lane wrote:
> > I'm beginning to think that postponing target-index locking to
> > ExecInitModifyTable was a damfool idea and we should undo it.
>
> +1
>
> Also as already proposed, InitPlan should lock result relation indexes
> even for DELETEs.

I'd rather see it fixed another way.  The reason being, if we get [1],
then that opens the door to run-time partition pruning for
UPDATE/DELETE, which is currently blocked due to lack of knowledge
about baserestrictinfos for the base partitioned relation because
inheritance_planner() does not plan for the partitioned table, only
its partitions.  Doing the index opening work during InitPlan() means
we do that for all partitions, even the ones that will later be
run-time pruned. If we can do it during init of a ModifyTable node,
then we can likely do it after the run-time pruning takes place.
Since Amit and I are both working towards making partitioning faster,
I imagined he would also not want to do something that could slow it
down significantly, if there was some alternative way to fix it, at
least.

I'm making efforts to delay per-partition work even further in the
executor; for example, delaying the locking of the per-partition result
relations until after run-time pruning would be a significant win for
partitioned tables with many partitions when generic plans are in use.
Moving things back to InitPlan() flies in the face of that work.

[1] https://commitfest.postgresql.org/20/1778/

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

Amit Langote-2
On 2018/11/26 14:25, David Rowley wrote:

> On Mon, 26 Nov 2018 at 17:37, Amit Langote
> <[hidden email]> wrote:
>> On 2018/11/24 1:25, Tom Lane wrote:
>>> I'm beginning to think that postponing target-index locking to
>>> ExecInitModifyTable was a damfool idea and we should undo it.
>>
>> +1
>>
>> Also as already proposed, InitPlan should lock result relation indexes
>> even for DELETEs.
>
> I'd rather see it fixed another way.  The reason being, if we get [1],
> then that opens the door to run-time partition pruning for
> UPDATE/DELETE, which is currently blocked due to lack of knowledge
> about baserestrictinfos for the base partitioned relation because
> inheritance_planner() does not plan for the partitioned table, only
> its partitions.  Doing the index opening work during InitPlan() means
> we do that for all partitions, even the ones that will later be
> run-time pruned. If we can do it during init of a ModifyTable node,
> then we can likely do it after the run-time pruning takes place.
> Since Amit and I are both working towards making partitioning faster,
> I imagined he would also not want to do something that could slow it
> down significantly, if there was some alternative way to fix it, at
> least.
>
> I'm making efforts to delay per-partition work even further in the
> executor; for example, delaying the locking of the per-partition result
> relations until after run-time pruning would be a significant win for
> partitioned tables with many partitions when generic plans are in use.
> Moving things back to InitPlan() flies in the face of that work.
>
> [1] https://commitfest.postgresql.org/20/1778/

That's an interesting point.  Although, considering the concerns that Tom
raised about the same index relation being locked such that a lock-strength
upgrade occurs (his example containing two CTEs), we'll have to find a way
to do the ModifyTable run-time pruning such that result relations and
their indexes (possibly under multiple ModifyTable nodes) are locked with
RowExclusiveLock before they're locked with AccessShareLock as scan
relations.  For example, we might be able to find a way to do the
ModifyTable run-time pruning in InitPlan before initializing plan trees.

That said, I don't quite understand how the lock-strength upgrade
occurring in the way being discussed here (AccessShareLock for scanning to
RowExclusiveLock for modifying) leads to a deadlock hazard?

Thanks,
Amit



Re: Inadequate executor locking of indexes

David Rowley-3
On Mon, 26 Nov 2018 at 18:57, Amit Langote
<[hidden email]> wrote:

> On 2018/11/26 14:25, David Rowley wrote:
> > I'm making efforts to delay per-partition work even further in the
> > executor; for example, delaying the locking of the per-partition result
> > relations until after run-time pruning would be a significant win for
> > partitioned tables with many partitions when generic plans are in use.
> > Moving things back to InitPlan() flies in the face of that work.
> >
> > [1] https://commitfest.postgresql.org/20/1778/
>
> That's an interesting point.  Although, considering the concerns that Tom
> raised about the same index relation being locked such that a lock-strength
> upgrade occurs (his example containing two CTEs), we'll have to find a way
> to do the ModifyTable run-time pruning such that result relations and
> their indexes (possibly under multiple ModifyTable nodes) are locked with
> RowExclusiveLock before they're locked with AccessShareLock as scan
> relations.  For example, we might be able to find a way to do the
> ModifyTable run-time pruning in InitPlan before initializing plan trees.

I thought my idea of processing the rangetable at the end of
planning to determine the strongest lock per relation would have
solved that.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

Amit Langote-2
On 2018/11/27 6:19, David Rowley wrote:

> On Mon, 26 Nov 2018 at 18:57, Amit Langote
> <[hidden email]> wrote:
>> On 2018/11/26 14:25, David Rowley wrote:
>>> I'm making efforts to delay per-partition work even further in the
>>> executor; for example, delaying the locking of the per-partition result
>>> relations until after run-time pruning would be a significant win for
>>> partitioned tables with many partitions when generic plans are in use.
>>> Moving things back to InitPlan() flies in the face of that work.
>>>
>>> [1] https://commitfest.postgresql.org/20/1778/
>>
>> That's an interesting point.  Although, considering the concerns that Tom
>> raised about the same index relation being locked such that a lock-strength
>> upgrade occurs (his example containing two CTEs), we'll have to find a way
>> to do the ModifyTable run-time pruning such that result relations and
>> their indexes (possibly under multiple ModifyTable nodes) are locked with
>> RowExclusiveLock before they're locked with AccessShareLock as scan
>> relations.  For example, we might be able to find a way to do the
>> ModifyTable run-time pruning in InitPlan before initializing plan trees.
>
> I thought my idea of processing the rangetable at the end of
> planning to determine the strongest lock per relation would have
> solved that.

Yeah, would be nice to make that work.

Thanks,
Amit



Re: Inadequate executor locking of indexes

David Rowley-3
On Tue, 27 Nov 2018 at 19:00, Amit Langote
<[hidden email]> wrote:

> On 2018/11/27 6:19, David Rowley wrote:
> > On Mon, 26 Nov 2018 at 18:57, Amit Langote
> >> That's an interesting point.  Although, considering the concerns that Tom
> >> raised about the same index relation being locked such that a lock-strength
> >> upgrade occurs (his example containing two CTEs), we'll have to find a way
> >> to do the ModifyTable run-time pruning such that result relations and
> >> their indexes (possibly under multiple ModifyTable nodes) are locked with
> >> RowExclusiveLock before they're locked with AccessShareLock as scan
> >> relations.  For example, we might be able to find a way to do the
> >> ModifyTable run-time pruning in InitPlan before initializing plan trees.
> >
> > I thought my idea of processing the rangetable at the end of
> > planning to determine the strongest lock per relation would have
> > solved that.
>
> Yeah, would be nice to make that work.

Here's a very rough and incomplete patch just to demo what I had in
mind. The finalize_lockmodes() function is likely more or less complete,
just minus my testing that it works.  What's mostly missing is changing
all the places that grab index locks to use the new idxlockmode field.
I've really just changed index scan and index only scan, and just stared
a bit at nodeModifyTable.c wondering what I should do with that
operation != CMD_DELETE test before the ExecOpenIndices() call.

The patch also includes code to determine the strongest rellockmode
per relation.  Perhaps it's not really worth doing that, since parse
analysis could still cause some lock upgrade hazards. The code that's
there just fixes things for the executor, so it really would only have
an effect when executing cached plans.

If this looks like a good path to go in, then I can produce something
a bit more finished. I'm just a bit unsure when exactly I can do that
as I'm on leave and have other commitments to take care of.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: wip_idxlockmode_and_lock_upgrade_fix.patch (12K)

Re: Inadequate executor locking of indexes

akapila
In reply to this post by Tom Lane-2
On Fri, Nov 23, 2018 at 9:55 PM Tom Lane <[hidden email]> wrote:
>
> Thinking more about this, the problem I noted previously about two of
> these solutions not working if the index scan node is not physically
> underneath the ModifyTable node actually applies to all three :-(.
> It's a slightly different issue for #2, namely that what we risk is
> first taking AccessShareLock and then upgrading to RowExclusiveLock.
> Since there are places (not many) that take ShareLock on indexes,
> this would pose a deadlock risk.
>

Can you be a bit more specific about what exact deadlock risk you are
seeing here?  Amit L asked about it, and I am also curious to know.
One way I could see it happen is:

Session-1
begin;
Lock table foo in Access Share Mode;

Session-2
begin;
Lock table foo in Share Mode;

Session-1
Lock table foo in Row Exclusive Mode;  --here it will wait for session-2

Session-2
Lock table foo in Access Exclusive Mode;  --here it will lead to deadlock


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Inadequate executor locking of indexes

David Rowley-3
In reply to this post by David Rowley-3
On Wed, 28 Nov 2018 at 01:55, David Rowley <[hidden email]> wrote:
> If this looks like a good path to go in, then I can produce something
> a bit more finished. I'm just a bit unsure when exactly I can do that
> as I'm on leave and have other commitments to take care of.

This patch is still on my list, so I had another look at what I did
back in November...

I've changed a couple of things:

1. Changed nodeBitmapIndexscan.c so that it now properly uses the
RangeTblEntry's idxlockmode field.
2. Renamed a few variables in finalize_lockmodes().

I'm keen to get some feedback on whether we should go about fixing things
this way.  One thing that's still on my mind is that the parser is still at
risk of lock upgrade hazards. This patch only fixes the executor.  I
don't quite see how it would be possible to fix the same in the
parser.

I was also looking at each call site that calls ExecOpenIndices(). I
don't think it's great that ExecInitModifyTable() has its own logic to
skip calling that function for DELETE.  I wondered if it shouldn't
somehow depend on what the idxlockmode is set to.  I also saw that
apply_handle_delete() makes a call to ExecOpenIndices(). I don't think
that one is needed, but I didn't test anything to make sure. Maybe
that's for another thread anyway.

Updated patch is attached.

Adding to the March commitfest as a bug fix.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: idxlockmode_and_lock_upgrade_fix_v2.patch (14K)

Re: Inadequate executor locking of indexes

Julien Rouhaud
Hi,

On Tue, Feb 5, 2019 at 5:16 AM David Rowley
<[hidden email]> wrote:

>
> I've changed a couple of things:
>
> 1. Changed nodeBitmapIndexscan.c so that it now properly uses the
> RangeTblEntry's idxlockmode field.
> 2. Renamed a few variables in finalize_lockmodes().
>
> I'm keen to get some feedback on whether we should go about fixing things
> this way.  One thing that's still on my mind is that the parser is still at
> risk of lock upgrade hazards. This patch only fixes the executor.  I
> don't quite see how it would be possible to fix the same in the
> parser.

+        /*
+         * If there are multiple instances of the same rel with varying lock
+         * strengths then set the strongest lock level to each instance of
+         * that relation.
+         */
+        if (applystrongest)
[...]

The patch is quite straightforward, so I don't have general comments
on it.  However, I think that the idxlockmode initialization is
problematic: you're using the statement's commandType, so this doesn't
work with CTEs.  For instance, this artificial query

WITH s AS (UPDATE test set id = 1 WHERE id = 1) select 1;

will take an AccessShareLock on test's index while it should take a
RowExclusiveLock.  I guess that you have to code similar lock upgrade
logic for the indexes, inspecting the planTree and subplans to find
the correct command type.

>
> I was also looking at each call site that calls ExecOpenIndices(). I
> don't think it's great that ExecInitModifyTable() has its own logic to
> skip calling that function for DELETE.  I wondered if it shouldn't
> somehow depend on what the idxlockmode is set to.

I don't think that it's great either.  However, for DELETE we shouldn't
simply call ExecOpenIndices(), but open only the used indexes, right?


Re: Inadequate executor locking of indexes

David Rowley-3
On Mon, 11 Feb 2019 at 01:22, Julien Rouhaud <[hidden email]> wrote:

> The patch is quite straightforward, so I don't have general comments
> on it.  However, I think that the idxlockmode initialization is
> problematic: you're using the statement's commandType, so this doesn't
> work with CTEs.  For instance, this artificial query
>
> WITH s AS (UPDATE test set id = 1 WHERE id = 1) select 1;
>
> will take an AccessShareLock on test's index while it should take a
> RowExclusiveLock.  I guess that you have to code similar lock upgrade
> logic for the indexes, inspecting the planTree and subplans to find
> the correct command type.

Good catch. I'm a bit stuck on the best way to fix this.  So far I can
only think of either:

1. Adding a new field to RangeTblEntry to indicate the operation type
that's being performed on the relation; or
2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
indexes of RangeTblEntry items that belong to DELETEs and ignore these
when setting resultRelids in finalize_lockmodes().

For #2, the only place I can see to do this is
add_rtes_to_flat_rtable(), which would require either passing the
PlannerInfo into the function, or at least its parse's commandType.

I don't really like either, but don't have any other ideas at the moment.

> > I was also looking at each call site that calls ExecOpenIndices(). I
> > don't think it's great that ExecInitModifyTable() has its own logic to
> > skip calling that function for DELETE.  I wondered if it shouldn't
> > somehow depend on what the idxlockmode is set to.
>
> I don't think that it's great either.  However, for DELETE we shouldn't
> simply call ExecOpenIndices(), but open only the used indexes, right?

No, I don't think so. The "used" index(es) will be opened in the scan
node(s).   The reason I didn't like it much is that I wanted to keep
the logic for deciding what lock level to use in the planner.  The
executor seems to have more knowledge than I think maybe it should.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

Julien Rouhaud
On Mon, Feb 11, 2019 at 5:32 AM David Rowley
<[hidden email]> wrote:

>
> On Mon, 11 Feb 2019 at 01:22, Julien Rouhaud <[hidden email]> wrote:
> > The patch is quite straightforward, so I don't have general comments
> > on it.  However, I think that the idxlockmode initialization is
> > problematic: you're using the statement's commandType, so this doesn't
> > work with CTEs.  For instance, this artificial query
> >
> > WITH s AS (UPDATE test set id = 1 WHERE id = 1) select 1;
> >
> > will take an AccessShareLock on test's index while it should take a
> > RowExclusiveLock.  I guess that you have to code similar lock upgrade
> > logic for the indexes, inspecting the planTree and subplans to find
> > the correct command type.
>
> Good catch. I'm a bit stuck on the best way to fix this.  So far I can
> only think of either:
>
> 1. Adding a new field to RangeTblEntry to indicate the operation type
> that's being performed on the relation; or
> 2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
> indexes of RangeTblEntry items that belong to DELETEs and ignore these
> when setting resultRelids in finalize_lockmodes().
>
> For #2, the only place I can see to do this is
> add_rtes_to_flat_rtable(), which would require either passing the
> PlannerInfo into the function, or at least its parse's commandType.
>
> I don't really like either, but don't have any other ideas at the moment.

But we would still need the same lock level upgrade logic on indexes
for cases like a CTE with a mix of INSERT, UPDATE and DELETE on the same
relation, I think.  #1 seems like a better solution to me.

> > > I was also looking at each call site that calls ExecOpenIndices(). I
> > > don't think it's great that ExecInitModifyTable() has its own logic to
> > > skip calling that function for DELETE.  I wondered if it shouldn't
> > > somehow depend on what the idxlockmode is set to.
> >
> > I don't think that it's great either.  However, for DELETE we shouldn't
> > simply call ExecOpenIndices(), but open only the used indexes, right?
>
> No, I don't think so. The "used" index(es) will be opened in the scan
> node(s).   The reason I didn't like it much is that I wanted to keep
> the logic for deciding what lock level to use in the planner.  The
> executor seems to have more knowledge than I think maybe it should.

Ah, I see.  Thanks.


Re: Inadequate executor locking of indexes

David Rowley-3
On Tue, 12 Feb 2019 at 09:58, Julien Rouhaud <[hidden email]> wrote:

>
> On Mon, Feb 11, 2019 at 5:32 AM David Rowley
> <[hidden email]> wrote:
> > 1. Adding a new field to RangeTblEntry to indicate the operation type
> > that's being performed on the relation; or
> > 2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
> > indexes of RangeTblEntry items that belong to DELETEs and ignore these
> > when setting resultRelids in finalize_lockmodes().
> >
> > For #2, the only place I can see to do this is
> > add_rtes_to_flat_rtable(), which would require either passing the
> > PlannerInfo into the function, or at least its parse's commandType.
> >
> > I don't really like either, but don't have any other ideas at the moment.
>
> But we would still need the same lock level upgrade logic on indexes
> for cases like a CTE with a mix of INSERT, UPDATE and DELETE on the same
> relation, I think.  #1 seems like a better solution to me.

I think I'd rather find some way to do it that didn't denormalise the
parse nodes like that.  It seems very strange to have a CmdType in the
Query struct, and then another set of them in RangeTblEntry. Besides
bloating the size of the RangeTblEntry struct a bit, it also could
lead to inconsistency bugs where the two CmdTypes differ, for some
reason.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

David Rowley-3
On Tue, 19 Feb 2019 at 12:13, David Rowley <[hidden email]> wrote:

>
> On Tue, 12 Feb 2019 at 09:58, Julien Rouhaud <[hidden email]> wrote:
> >
> > On Mon, Feb 11, 2019 at 5:32 AM David Rowley
> > <[hidden email]> wrote:
> > > 1. Adding a new field to RangeTblEntry to indicate the operation type
> > > that's being performed on the relation; or
> > > 2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
> > > indexes of RangeTblEntry items that belong to DELETEs and ignore these
> > > when setting resultRelids in finalize_lockmodes().
> > >
> > > For #2, the only place I can see to do this is
> > > add_rtes_to_flat_rtable(), which would require either passing the
> > > PlannerInfo into the function, or at least its parse's commandType.
> > >
> > > I don't really like either, but don't have any other ideas at the moment.
> >
> > But we would still need the same lock level upgrade logic on indexes
> > for cases like a CTE with a mix of INSERT, UPDATE and DELETE on the same
> > relation, I think.  #1 seems like a better solution to me.
>
> I think I'd rather find some way to do it that didn't denormalise the
> parse nodes like that.  It seems very strange to have a CmdType in the
> Query struct, and then another set of them in RangeTblEntry. Besides
> bloating the size of the RangeTblEntry struct a bit, it also could
> lead to inconsistency bugs where the two CmdTypes differ, for some
> reason.

I had another go at this patch and fixed the problem by just setting
the idxlockmode inside the planner just before the call to
expand_inherited_tables().  This allows the lockmode to be copied into
child RTEs.  Doing it later in planning is more of a problem since we
don't really store all the result relations in PlannerInfo; we only
store the one that was written in the query.  The others are stored in
the ModifyTable path and then later in the ModifyTable plan.  The only
other place I could see to do it (without adding an extra field in
PlannerInfo) was during createplan... which does not at all seem like
the right place, and is also too late for get_relation_info(), see
below.
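
(Approximately -- idxlockmode is the field added by the patch, and the
placement and lock levels here are reconstructed from memory, so this is
only a sketch:)

    /* In the planner, before inheritance expansion, so children copy it */
    rte = planner_rt_fetch(parse->resultRelation, root);
    rte->idxlockmode = (parse->commandType == CMD_DELETE) ?
        AccessShareLock : RowExclusiveLock;

    expand_inherited_tables(root);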

The finalize_lockmodes() now does the same thing for idxlockmode as it
does for rellockmode, i.e., just setting the strongest lock for each unique
rel oid.

However, during my work on this, I saw a few things that made me
wonder if all this is over-engineered and whether it's worth the
trouble.  The first thing I saw was that in get_relation_info() we
obtain a RowExclusiveLock if the relation is the result relation. This
means we obtain a RowExclusiveLock for DELETE targets too.  This is
inconsistent with what the executor tries to do and results in us
taking a weaker lock during execution than we do during planning.  It
also means we don't bump into the lock in the local lock table, which
results in slower locking. That problem would be exaggerated with
partitioned tables with a large number of partitions or inheritance
parents with lots of children; however, that will likely be drowned
out by all the other slow stuff around there.

Another thing that's on my mind is that in the patch in
nodeModifyTable.c we still do:

    if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
        operation != CMD_DELETE &&
        resultRelInfo->ri_IndexRelationDescs == NULL)
        ExecOpenIndices(resultRelInfo,
                        node->onConflictAction != ONCONFLICT_NONE);

i.e. don't open the indexes for DELETEs.  I had ideas that maybe this
could be changed to check the idxlockmode and open the indexes if it's
above AccessShareLock.  There didn't seem to be a very nice way to
fetch the RangeTblEntry from the ResultRelInfo though, so I didn't do
that, but leaving it as is does not really work well with the patch, as
I really want the planner to be the thing that decides what happens here.


My final thoughts were that I see a lot of work going on to implement
pluggable storage.  I ended up having an offlist conversation with
Thomas Munro about zheap and what its requirements were for locking
indexes during deletes. It seems modifying the index during delete is
not something that happens with the current version, but is planned
for future versions.  In any case, we can't really assume zheap is
going to be the only additional storage engine implementation that's
going to get hooked into this.

Current thoughts are, do we:

1. Allow the storage engine to communicate its locking needs via an
API function call and add code to check that in the
determine_index_lockmode function (new in the attached patch)
2. Just get rid of the "operation != CMD_DELETE &&" line in the above
code and always lock indexes for DELETE targets in RowExclusiveLock
mode.
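
(A sketch of what #1 might look like -- the table-AM callback name is
invented purely for illustration; determine_index_lockmode() is the
function added by the patch:)

    static LOCKMODE
    determine_index_lockmode(CmdType operation, Relation resultRel)
    {
        /*
         * Hypothetical AM callback: does this storage format maintain its
         * indexes during DELETE (as zheap is expected to, someday)?
         */
        if (operation == CMD_DELETE &&
            !table_indexes_need_delete_maintenance(resultRel))
            return AccessShareLock;     /* heap-style: deletes skip indexes */

        return RowExclusiveLock;        /* index entries will be modified */
    }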

#2 would not address Tom's mention of there possibly being some way to
have the index scan node initialise before the modify table node
(currently we pass NoLock for indexes belonging to result rels in the
index scan nodes).  I can't quite imagine at the moment how that's
possible, but maybe my imagination is just not good enough.  We could
fix that by passing RowExclusiveLock instead of NoLock. It just means
we'd discover the lock already exists in the local lock table...
unless of course there is a case where the index scan gets initialised
before the modify table node is.

I'm adding Andres, Robert, Thomas, Amit and Haribabu due to the
involvement with pluggable storage and zheap.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: idxlockmode_and_lock_upgrade_fix_v3.patch (25K)

Re: Inadequate executor locking of indexes

Amit Langote-2
(this is not a reply to your full proposal, just something I thought I'd
point out)

On 2019/03/13 10:38, David Rowley wrote:
> i.e. don't open the indexes for DELETEs.  I had ideas that maybe this
> could be changed to check the idxlockmode and open the indexes if it's
> above AccessShareLock.  There didn't seem to be a very nice way to
> fetch the RangeTblEntry from the ResultRelInfo though,

Did you miss ri_RangeTableIndex?  It's the range table index of the result
relation for which a given ResultRelInfo is created.

Thanks,
Amit



Re: Inadequate executor locking of indexes

David Rowley-3
On Wed, 13 Mar 2019 at 14:55, Amit Langote
<[hidden email]> wrote:
> Did you miss ri_RangeTableIndex?  It's the range table index of the result
> relation for which a given ResultRelInfo is created.

I did indeed. I'll hold off modifying the patch in favour of seeing
what other people think about what should be done here.

Thanks for pointing out ri_RangeTableIndex.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

Amit Langote-2
In reply to this post by David Rowley-3
Hi David,

On 2019/03/13 10:38, David Rowley wrote:
> I had another go at this patch and fixed the problem by just setting
> the idxlockmode inside the planner just before the call to
> expand_inherited_tables().  This allows the lockmode to be copied into
> child RTEs.

I have one question about the relation between idxlockmode and
rellockmode.  From skimming the patch, it appears that they're almost
always set to the same value.  If so, why not use rellockmode for index
locking too?

> #2 would not address Tom's mention of there possibly being some way to
> have the index scan node initialise before the modify table node
> (currently we pass NoLock for indexes belonging to result rels in the
> index scan nodes).  I can't quite imagine at the moment how that's
> possible, but maybe my imagination is just not good enough.  We could
> fix that by passing RowExclusiveLock instead of NoLock. It just means
> we'd discover the lock already exists in the local lock table...
> unless of course there is a case where the index scan gets initialised
> before the modify table node is.

Tom gave an example upthread that looked like this:

explain (costs off) with x1 as materialized (select * from foo where a =
1), x2 as (update foo set a = a where a = 1 returning *) select * from x1,
x2 where x1.a = x2.a;
                     QUERY PLAN
────────────────────────────────────────────────────
 Hash Join
   Hash Cond: (x1.a = x2.a)
   CTE x1
     ->  Bitmap Heap Scan on foo
           Recheck Cond: (a = 1)
           ->  Bitmap Index Scan on foo_a_idx
                 Index Cond: (a = 1)
   CTE x2
     ->  Update on foo foo_1
           ->  Bitmap Heap Scan on foo foo_1
                 Recheck Cond: (a = 1)
                 ->  Bitmap Index Scan on foo_a_idx
                       Index Cond: (a = 1)
   ->  CTE Scan on x1
   ->  Hash
         ->  CTE Scan on x2
(16 rows)

When InitPlan() invokes ExecInitNode on the subplans of x1 and x2, it does
so in that order.  So, ExecInitBitmapIndexScan for x1 is called before
ExecInitModifyTable for x2.

But if finalize_lockmodes() in your patch sets the lockmodes correctly,
ExecInitBitmapIndexScan() called for x1 ought to lock the index with
RowExclusiveLock, no?

Thanks,
Amit

