GSoC 2017: weekly progress reports (week 4) and patch for hash index


GSoC 2017: weekly progress reports (week 4) and patch for hash index

Shubham Barai

Project: Explicitly support predicate locks in index AMs besides b-tree

Hi,

During this week, I continued my work on predicate locking in the hash index and created a patch for it. As described in my proposal, every scan operation acquires a predicate lock on the primary page of the bucket.
Since hash indexes support only equality comparisons, this can still generate false positives when a scan operation and an insert operation access the same bucket but are looking for different tuples. Let's see that with an example.

setup:

create table hash_tbl(id int4, p integer);

create index hash_pointidx on hash_tbl using hash(p);

insert into hash_tbl (id, p)
select g, g*2 from generate_series(1, 10000000) g;

read operation:
select * from hash_tbl where p=78988658;

insert operation:
insert into hash_tbl values(99999999, 546789888); 

If these two hash keys (78988658 and 546789888) map to the same bucket, this will result in a false serialization failure.
Please correct me if this assumption about false positives is wrong.
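A toy model of the scheme makes the failure mode concrete (this is not PostgreSQL code; the bucket count and modulus mapping below are simplifications for illustration only):

```python
# Simplified model: under the proposed scheme the predicate lock is
# taken on the bucket's primary page, not on the hash value itself,
# so two distinct keys landing in the same bucket look identical to SSI.
NBUCKETS = 1024  # hypothetical bucket count, for illustration only

def bucket_of(key: int) -> int:
    # Stand-in for PostgreSQL's hash function; any deterministic hash
    # works for the illustration.
    return hash(key) % NBUCKETS

read_key = 78988658
insert_key = read_key + 7 * NBUCKETS  # a distinct key in the same bucket

# Page-level predicate locking cannot tell the two keys apart, so a
# serializable conflict is reported even though the tuples differ.
assert read_key != insert_key
assert bucket_of(read_key) == bucket_of(insert_key)
```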


In summary, I have done the following things this week.


1) found appropriate places in the hash index AM to insert calls to existing functions (PredicateLockPage(), CheckForSerializableConflictIn(), etc.)


2) created tests to check serialization failures and false positives


3) tested my patch on the current head

    

Regards,

Shubham







--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment: Predicate-Locking-in-hash-index_3.patch (39K)

Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Thomas Munro-3
Hi Shubham,

On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <[hidden email]> wrote:
> If these two hash keys (78988658 and 546789888) mapped to the same bucket, this will result in false serialization failure.
> Please correct me if this assumption about false positives is wrong.

I wonder if there is an opportunity to use computed hash values
directly in predicate lock tags, rather than doing it on the basis of
buckets.  Perhaps I'm missing something important about the way that
locks need to escalate that would prevent that from working.

> 3) tested my patch on the current head

This no longer applies, but it's in "Needs review" status in the
Commitfest.  Could you please post a rebased version?

--
Thomas Munro
http://www.enterprisedb.com



Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Shubham Barai
Hi Thomas,

I have attached the rebased version of the patch here.


Kind Regards,
Shubham

On 8 September 2017 at 06:37, Thomas Munro <[hidden email]> wrote:
Hi Shubham,

On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <[hidden email]> wrote:
> If these two hash keys (78988658 and 546789888) mapped to the same bucket, this will result in false serialization failure.
> Please correct me if this assumption about false positives is wrong.

I wonder if there is an opportunity to use computed hash values
directly in predicate lock tags, rather than doing it on the basis of
buckets.  Perhaps I'm missing something important about the way that
locks need to escalate that would prevent that from working.

> 3) tested my patch on the current head

This no longer applies, but it's in "Needs review" status in the
Commitfest.  Could you please post a rebased version?

--
Thomas Munro
http://www.enterprisedb.com




Attachment: Predicate-locking-in-hash-index_4.patch (41K)

Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Alexander Korotkov
In reply to this post by Thomas Munro-3
On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro <[hidden email]> wrote:
Hi Shubham,

On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <[hidden email]> wrote:
> If these two hash keys (78988658 and 546789888) mapped to the same bucket, this will result in false serialization failure.
> Please correct me if this assumption about false positives is wrong.

I wonder if there is an opportunity to use computed hash values
directly in predicate lock tags, rather than doing it on the basis of
buckets.  Perhaps I'm missing something important about the way that
locks need to escalate that would prevent that from working.

+1,
Very nice idea!  Locking hash values directly seems superior to locking hash index pages.
Shubham, do you have any comment on this?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Michael Paquier
In reply to this post by Shubham Barai
On Mon, Sep 25, 2017 at 10:34 PM, Shubham Barai
<[hidden email]> wrote:
> I have attached the rebased version of patch here.

The patch does not apply, and there have been no reviews either. In
consequence, I am moving it to the next CF with "waiting on author" as
status. Please provide a rebased patch.
--
Michael


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Stephen Frost
Greetings Shubham, all,

* Michael Paquier ([hidden email]) wrote:
> On Mon, Sep 25, 2017 at 10:34 PM, Shubham Barai
> <[hidden email]> wrote:
> > I have attached the rebased version of patch here.
>
> The patch does not apply and there has been no reviews as well. In
> consequence, I am moving it to next CF with "waiting on author" as
> status. Please provide a rebased patch.

Shubham, would you mind providing an updated patch which applies
cleanly, so we can change this to Needs Review and hopefully get someone
looking at it?  Also, it would be good to respond to Alexander's
suggestion as to whether it would work or not (and perhaps update the
patch accordingly).  Otherwise, I'm afraid this patch may end up just
getting bumped to the next CF or ending up as 'returned with feedback'.
It would be great to get this up to a point where it could be committed.

Thanks!

Stephen


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Shubham Barai


On 15 January 2018 at 08:03, Stephen Frost <[hidden email]> wrote:
Greeting Shubham, all,

* Michael Paquier ([hidden email]) wrote:
> On Mon, Sep 25, 2017 at 10:34 PM, Shubham Barai
> <[hidden email]> wrote:
> > I have attached the rebased version of patch here.
>
> The patch does not apply and there has been no reviews as well. In
> consequence, I am moving it to next CF with "waiting on author" as
> status. Please provide a rebased patch.

Shubham, would you mind providing an updated patch which applies
cleanly, so we can change this to Needs Review and hopefully get someone
looking at it?  Also, it would be good to respond to Alexander's as to
if it would work or not (and perhaps updating the patch accordingly).
Otherwise, I'm afriad this patch may end up just getting bumped to the
next CF or ending up as 'returned with feedback'.  Would be great to get
this up to a point where it could be committed.



Hi Stephen,

The new approach was suggested after the completion of GSoC, so I didn't get
enough time to implement it. Also, I was constantly updating my
patches for the gist and gin indexes based on reviews.

Here, I am attaching the rebased version of the patch (which is based on the old approach:
acquiring a predicate lock on the primary page of a bucket).

Kind Regards,
Shubham


Attachment: Predicate-Locking-in-hash-index_v5.patch (39K)

Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

akapila
In reply to this post by Alexander Korotkov
On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov
<[hidden email]> wrote:

> On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro <[hidden email]>
> wrote:
>>
>> Hi Shubham,
>>
>> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <[hidden email]>
>> wrote:
>> > If these two hash keys (78988658 and 546789888) mapped to the same
>> > bucket, this will result in false serialization failure.
>> > Please correct me if this assumption about false positives is wrong.
>>
>> I wonder if there is an opportunity to use computed hash values
>> directly in predicate lock tags, rather than doing it on the basis of
>> buckets.  Perhaps I'm missing something important about the way that
>> locks need to escalate that would prevent that from working.
>
>
> +1,
> Very nice idea!  Locking hash values directly seems to be superior over
> locking hash index pages.
> Shubham, do you have any comment on this?
>

As Shubham seems to be running out of time, I thought of helping him
by looking into the above-suggested idea.  I think one way to lock a
particular hash value is to use the TID of the heap tuple associated with
each index entry (the index entry for the hash index will be the hash value).
However, do we really need it for implementing predicate locking for
hash indexes?  If we look at the "Index AM implementations" section of
README-SSI, it doesn't seem to be required.  Basically, if we look at
the strategy of predicate locks in btree [1], locking at the page level
for hash indexes seems to be the right direction, as, similar
to btree, the corresponding heap tuple read will be locked.

What do you think?

[1] -
    * B-tree index searches acquire predicate locks only on the index
*leaf* pages needed to lock the appropriate index range. If, however,
a search discovers that no root page has yet been created, a predicate
lock on the index relation is required.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Alexander Korotkov
On Sat, Jan 20, 2018 at 4:24 PM, Amit Kapila <[hidden email]> wrote:
On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov
<[hidden email]> wrote:
> On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro <[hidden email]>
> wrote:
>>
>> Hi Shubham,
>>
>> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <[hidden email]>
>> wrote:
>> > If these two hash keys (78988658 and 546789888) mapped to the same
>> > bucket, this will result in false serialization failure.
>> > Please correct me if this assumption about false positives is wrong.
>>
>> I wonder if there is an opportunity to use computed hash values
>> directly in predicate lock tags, rather than doing it on the basis of
>> buckets.  Perhaps I'm missing something important about the way that
>> locks need to escalate that would prevent that from working.
>
>
> +1,
> Very nice idea!  Locking hash values directly seems to be superior over
> locking hash index pages.
> Shubham, do you have any comment on this?
>

As Shubham seems to be running out of time, I thought of helping him
by looking into the above-suggested idea.  I think one way to lock a
particular hash value is we can use TID of heap tuple associated with
each index entry (index entry for the hash index will be hash value).

Sorry, I didn't get what you mean exactly.  If we lock either the TID of the
associated heap tuple or the TID of the hash index tuple, then what will we lock
in the case when nothing is found?  Even if we find nothing, we have
to place some lock according to the search key in order to detect cases when
somebody has inserted a row which we might see according to that search key.
 
However, do we really need it for implementing predicate locking for
hash indexes?  If we look at the "Index AM implementations" section of
README-SSI, it doesn't seem to be required.  Basically, if we look at
the strategy of predicate locks in btree [1], it seems to me locking
at page level for hash index seems to be a right direction as similar
to btree, the corresponding heap tuple read will be locked.

Btree uses leaf-page locking because it supports both range searches
and exact value searches, and it needs to detect overlaps between
these two kinds of searches.  Therefore, btree locks leaf pages in both
cases.  The hash index case is different.  A hash index doesn't support and
isn't going to support range searches.  Assuming that a hash index
supports only exact value searches, locking hash values would be
superior to page locking (unless I'm missing something), because
it would provide better locality of locks.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

akapila
On Thu, Jan 25, 2018 at 7:29 PM, Alexander Korotkov
<[hidden email]> wrote:

> On Sat, Jan 20, 2018 at 4:24 PM, Amit Kapila <[hidden email]>
> wrote:
>>
>> On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov
>> <[hidden email]> wrote:
>> > +1,
>> > Very nice idea!  Locking hash values directly seems to be superior over
>> > locking hash index pages.
>> > Shubham, do you have any comment on this?
>> >
>>
>> As Shubham seems to be running out of time, I thought of helping him
>> by looking into the above-suggested idea.  I think one way to lock a
>> particular hash value is we can use TID of heap tuple associated with
>> each index entry (index entry for the hash index will be hash value).
>
>
> Sorry, I didn't get what do you particularly mean.  If locking either TID of
> associated heap tuple or TID of hash index tuple, then what will we lock
> in the case when nothing found?  Even if we found nothing, we have
> to place some lock according to search key in order to detect cases when
> somebody has inserted the row which we might see according to that search
> key.
>

Okay, but if you use the hash value as the lock tag (which is possible), how
will we deal with things like page splits?  I think even if we use the
block number/page/bucket number corresponding to the hash value along
with the hash value in the lock tag, it still doesn't appear to work.  I
think using page-level locks for the index makes sense, especially because
it will be convenient to deal with page splits.  Also, as predicate
locks stay in memory, creating too many such locks doesn't sound
like a nice strategy, even though we have a way to upgrade them to the next
level (page), as that has a separate cost to it.

>>
>> However, do we really need it for implementing predicate locking for
>> hash indexes?  If we look at the "Index AM implementations" section of
>> README-SSI, it doesn't seem to be required.  Basically, if we look at
>> the strategy of predicate locks in btree [1], it seems to me locking
>> at page level for hash index seems to be a right direction as similar
>> to btree, the corresponding heap tuple read will be locked.
>
>
> Btree uses leaf-pages locking because it supports both range searches
> and exact value searches.  And it needs to detect overlaps between
> these two kinds of searches.  Therefore, btree locks leaf-pages in both
> cases.
>

Also, using page-level locks probably makes it easier to deal with index
operations like page splits.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

akapila
On Sun, Jan 28, 2018 at 7:28 AM, Amit Kapila <[hidden email]> wrote:

> On Thu, Jan 25, 2018 at 7:29 PM, Alexander Korotkov
> <[hidden email]> wrote:
>>> As Shubham seems to be running out of time, I thought of helping him
>>> by looking into the above-suggested idea.  I think one way to lock a
>>> particular hash value is we can use TID of heap tuple associated with
>>> each index entry (index entry for the hash index will be hash value).
>>
>>
>> Sorry, I didn't get what do you particularly mean.  If locking either TID of
>> associated heap tuple or TID of hash index tuple, then what will we lock
>> in the case when nothing found?  Even if we found nothing, we have
>> to place some lock according to search key in order to detect cases when
>> somebody has inserted the row which we might see according to that search
>> key.
>>
>
> Okay, but if you use hash value as lock tag (which is possible) how
> will we deal with things like page split?  I think even if use
> blocknumber/page/bucketnumber corresponding to the hash value along
> with hash value in lock tag, then also it doesn't appear to work.  I
> think using page level locks for index makes sense, especially because
> it will be convinient to deal with page splits.
>

What I intend to say here is that we already have a mechanism like
PredicateLockPageSplit() which can deal with predicate locks during a
page split if we operate at the page level.  However, if we want to go for
the hash value locking technique, it can be quite complex and expensive to
make it work, as we would have to search all the locks that belong to the
bucket being split and then move them to the new bucket.

Alexander/Thomas, do you have better ideas to make it work?  Otherwise,
I think we can proceed to review Shubham's current approach/patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Thomas Munro-3
On Tue, Feb 13, 2018 at 4:28 PM, Amit Kapila <[hidden email]> wrote:

> On Sun, Jan 28, 2018 at 7:28 AM, Amit Kapila <[hidden email]> wrote:
>> On Thu, Jan 25, 2018 at 7:29 PM, Alexander Korotkov
>> <[hidden email]> wrote:
>>>> As Shubham seems to be running out of time, I thought of helping him
>>>> by looking into the above-suggested idea.  I think one way to lock a
>>>> particular hash value is we can use TID of heap tuple associated with
>>>> each index entry (index entry for the hash index will be hash value).
>>>
>>>
>>> Sorry, I didn't get what do you particularly mean.  If locking either TID of
>>> associated heap tuple or TID of hash index tuple, then what will we lock
>>> in the case when nothing found?  Even if we found nothing, we have
>>> to place some lock according to search key in order to detect cases when
>>> somebody has inserted the row which we might see according to that search
>>> key.
>>>
>>
>> Okay, but if you use hash value as lock tag (which is possible) how
>> will we deal with things like page split?  I think even if use
>> blocknumber/page/bucketnumber corresponding to the hash value along
>> with hash value in lock tag, then also it doesn't appear to work.  I
>> think using page level locks for index makes sense, especially because
>> it will be convinient to deal with page splits.
>>
>
> What I intend to say here is that we already have a mechanism like
> PredicateLockPageSplit() which can deal with predicate locks during
> page split if we operate at page level.  However, if we want to go for
> has value locking technique, it can be quite complex and expensive to
> make it work as we have to search all the locks that belong to the
> bucket being split and then move them for the new bucket.

True.

One way to avoid all that might be to use pseudo page numbers that
don't suffer from splits.  I don't know how you'd choose the
constant, but it could be something like pseudo page number = hash
value % 1024.  In other words, you'd use the full hash value for the
'tuple' part of the predicate lock tag, and a shorter hash value for
the 'page' part of the predicate lock tag, so you'd never have to
handle splits, and promotion from fine-grained 'tuple' (= hash value)
locks to coarse-grained 'page' (= short hash value) locks would still
work automatically when space runs out.
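A minimal sketch of that tag scheme (hypothetical constants; real predicate lock tags in PostgreSQL are C structs, not Python tuples):

```python
PSEUDO_PAGES = 1024  # hypothetical constant from the idea above

def predicate_lock_tag(hash_value: int) -> tuple:
    # 'page' part: short hash, never affected by bucket splits;
    # 'tuple' part: the full hash value, for fine-grained locking.
    return (hash_value % PSEUDO_PAGES, hash_value)

def promote_to_page_tag(tag: tuple) -> int:
    # When the lock table fills up, fine-grained 'tuple' locks collapse
    # to their coarse 'page' component, as in ordinary SSI promotion.
    return tag[0]

# The tag depends only on the hash value, so a physical bucket split
# never requires transferring predicate locks between pages.
tag = predicate_lock_tag(0xDEADBEEF)
assert tag == predicate_lock_tag(0xDEADBEEF)
assert promote_to_page_tag(tag) == 0xDEADBEEF % PSEUDO_PAGES
```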

> Alexander/Thomas, do you have better ideas to make it work, otherwise,
> I think we can proceed to review the Shubham's current approach/patch?

I think we should proceed to review the current patch.  As far as I
can see, adding more fine-grained locking would remain a possibility
for future improvement.  With this patch we get page-level hash index
SIREAD locks, and perhaps in a future patch we could add fine grained
hash value SIREAD locks (maybe as described above, if that works...)

--
Thomas Munro
http://www.enterprisedb.com


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Thomas Munro-3
On Tue, Feb 13, 2018 at 7:47 PM, Thomas Munro
<[hidden email]> wrote:
> One way to avoid all that might be to use pseudo page numbers that
> don't suffer from splits.   I don't know how you'd choose the
> constant, but it could be something like pseudo page number = hash
> value % 1024.  In other words, you'd use the full hash value for the
> 'tuple' part of the predicate lock tag, and a shorter hash value for
> the 'page' part of the predicate lock tag, so you'd never have to
> handle split, and promotion from fine grained 'tuple' (= hash value)
> locks to coarse grained 'page' = (short hash value) locks would still
> work automatically when space runs out.

Thinking about how to tune that got me thinking about a simple middle
way we could perhaps consider...

What if we just always locked pseudo page numbers using hash_value %
max_predicate_locks_per_relation (= effectively 31 by default)?  Then
you'd have lower collision rates on small hash indexes, you'd never
have to deal with page splits, and you'd never exceed
max_predicate_locks_per_relation, so you'd never escalate to
relation-level locks on busy systems.  On the downside, you'd have
e.g. a ~3% chance of collision instead of a 1/hash_maxbucket chance of
collision, so it gets a bit worse for large indexes on systems that are
not busy enough to exceed max_predicate_locks_per_relation.  You win
some, you lose some...
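The ~3% figure is easy to sanity-check with a quick simulation (a sketch; 31 is the effective default modulus mentioned above):

```python
import random

MODULUS = 31  # effective default max_predicate_locks_per_relation above

def collision_rate(trials: int = 100_000) -> float:
    # Chance that two independent uniform hash values fall on the same
    # pseudo page; the expected value is 1/31, roughly 3.2%.
    rng = random.Random(42)
    hits = sum(
        rng.getrandbits(32) % MODULUS == rng.getrandbits(32) % MODULUS
        for _ in range(trials)
    )
    return hits / trials

print(f"empirical: {collision_rate():.4f}, expected: {1 / MODULUS:.4f}")
```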

--
Thomas Munro
http://www.enterprisedb.com


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Robert Haas
On Mon, Feb 26, 2018 at 7:51 PM, Thomas Munro
<[hidden email]> wrote:

> Thinking about how to tune that got me thinking about a simple middle
> way we could perhaps consider...
>
> What if we just always locked pseudo page numbers using hash_value %
> max_predicate_locks_per_relation (= effectively 31 by default)?  Then
> you'd have lower collision rates on small hash indexes, you'd never
> have to deal with page splits, and you'd never exceed
> max_predicate_locks_per_relation so you'd never escalate to relation
> level locks on busy systems.  On the downside, you'd have eg ~3%
> chance of collision instead of a 1/hash_maxbucket chance of collision,
> so it gets a bit worse for large indexes on systems that are not busy
> enough to exceed max_predicate_locks_per_relation.  You win some, you
> lose some...

Hmm, yeah, that definitely has some appeal.  On the other hand,
there's a lot of daylight between locking hv % 2^32 and locking hv %
31; the former is going to potentially blow out the lock table really
fast, while the latter is potentially going to create an uncomfortable
number of false collisions.  One could imagine a modulus larger than
31 and smaller than 4294967296, although I don't have a principled
suggestion for how to pick it.  On the third hand, people who are
using SSI heavily may well have increased
max_predicate_locks_per_relation and with this proposal that just
kinda does what you want.

I don't really know how we can judge the merits of any particular
modulus (or of committing the patch at all) without some test results
showing that it helps reduce rollbacks or increase performance or
something.  Otherwise we're just guessing.  It does however seem to me
that locking the hash value % (something) is better than basing the
locking on bucket or page numbers.  Basing it on page numbers strictly
speaking cannot work, since the same tuple could be present in any
page in the bucket chain; you'd have to lock the page number of the
head of the bucket chain.  There is however no advantage of doing that
over locking the bucket number directly.  Moreover, locking the bucket
number directly forces you to worry about splits, whereas if you lock
hv % (something) then you don't have to care.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Andres Freund
Hi,

Based on this sub-thread this patch's status of 'needs review' doesn't
quite seem accurate and 'waiting on author' and then 'returned with
feedback' would be more fitting?

Greetings,

Andres Freund


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Thomas Munro-3
On Fri, Mar 2, 2018 at 3:57 PM, Andres Freund <[hidden email]> wrote:
> Based on this sub-thread this patch's status of 'needs review' doesn't
> quite seem accurate and 'waiting on author' and then 'returned with
> feedback' would be more fitting?

I personally think this patch is really close to RFC.  Shubham has
fulfilled the project requirement, it's a tidy and short patch, it has
tests.  I think we really just need to verify that the split case
works correctly.

Hmm.  I notice that this calls PredicateLockPageSplit() after both
calls to _hash_splitbucket() (the one in _hash_finish_split() and the
one in _hash_expandtable()) instead of doing it inside that function,
and that _hash_splitbucket() unlocks bucket_nbuf before returning.
What if someone else accesses bucket_nbuf between
LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and
PredicateLockPageSplit()?  Doesn't that mean that another session can
read a newly created page and miss a predicate lock that is about to
be transferred to it?  If that is indeed a race, could it be fixed by
calling PredicateLockPageSplit() at the start of _hash_splitbucket()
instead?

Could we get a few days to mull over this and Shubham's other patches?
 It'd be really great to get some of these into 11.

My thought experiments about pseudo-pages and avoiding the split stuff
were not intended to get the patch kicked out.  I thought for a while
that hash indexes were a special case and could benefit from
dispensing with those trickier problems.  Upon further reflection, for
interesting size hash indexes pure hash value predicate tags wouldn't
be much better.  Furthermore, if we do decide we want to use using x %
max_predicate_locks_per_relation to avoid having to escalate to
relation predicate locks at the cost of slightly higher collision rate
then we should consider that for the whole system (including heap page
predicate locking), not just hash indexes.  Please consider those
ideas parked for now.

--
Thomas Munro
http://www.enterprisedb.com


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

akapila
On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro
<[hidden email]> wrote:

> On Fri, Mar 2, 2018 at 3:57 PM, Andres Freund <[hidden email]> wrote:
>> Based on this sub-thread this patch's status of 'needs review' doesn't
>> quite seem accurate and 'waiting on author' and then 'returned with
>> feedback' would be more fitting?
>
> I personally think this patch is really close to RFC.  Shubham has
> fulfilled the project requirement, it's a tidy and short patch, it has
> tests.  I think we really just need to verify that the split case
> works correctly.
>
> Hmm.  I notice that this calls PredicateLockPageSplit() after both
> calls to _hash_splitbucket() (the one in _hash_finish_split() and the
> one in _hash_expandtable()) instead of doing it inside that function,
> and that _hash_splitbucket() unlocks bucket_nbuf before returning.
> What if someone else accesses bucket_nbuf between
> LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and
> PredicateLockPageSplit()?  Doesn't that mean that another session can
> read a newly created page and miss a predicate lock that is about to
> be transferred to it?
>

Yes.  I think you are primarily worried about the case where there is an
insert into the new bucket from another session, as scans will take the
predicate lock anyway, right?

>  If that is indeed a race, could it be fixed by
> calling PredicateLockPageSplit() at the start of _hash_splitbucket()
> instead?
>

Yes, but I think it would be better if we call this once we are sure
that at least one tuple from the old bucket has been transferred
(consider if all tuples in the old bucket are dead).  Apart from this,
I think this patch has missed handling the cases where we scan the
buckets when the split is in progress.  In such cases, we scan both
old and new bucket, so I think we need to ensure that we take
PredicateLock on both the buckets during such scans.

> Could we get a few days to mull over this and Shubham's other patches?
>

I would also like to see this patch go into v11.  So, I can try to
address the remaining review comments if Shubham is not able to spare
the time, and you can help with the review.  I am also okay with reviewing
if anyone other than me can handle the remaining points.

>  It'd be really great to get some of these into 11.
>

+1.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

Thomas Munro-3
On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <[hidden email]> wrote:

> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro
> <[hidden email]> wrote:
>> Hmm.  I notice that this calls PredicateLockPageSplit() after both
>> calls to _hash_splitbucket() (the one in _hash_finish_split() and the
>> one in _hash_expandtable()) instead of doing it inside that function,
>> and that _hash_splitbucket() unlocks bucket_nbuf before returning.
>> What if someone else accesses bucket_nbuf between
>> LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and
>> PredicateLockPageSplit()?  Doesn't that mean that another session can
>> read a newly created page and miss a predicate lock that is about to
>> be transferred to it?
>
> Yes.  I think you are primarily worried about if there is an insert on
> new bucket from another session as scans will anyway take the
> predicate lock, right?

Yeah.

>>  If that is indeed a race, could it be fixed by
>> calling PredicateLockPageSplit() at the start of _hash_splitbucket()
>> instead?
>
> Yes, but I think it would be better if we call this once we are sure
> that at least one tuple from the old bucket has been transferred
> (consider if all tuples in the old bucket are dead).  Apart from this,
> I think this patch has missed handling the cases where we scan the
> buckets when the split is in progress.  In such cases, we scan both
> old and new bucket, so I think we need to ensure that we take
> PredicateLock on both the buckets during such scans.

Hmm.  Yeah.

So, in _hash_first(), do you think we might just need this?

      if (H_BUCKET_BEING_POPULATED(opaque))
      {
          ...
          old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
          ...
          old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
+         PredicateLockPage(rel, BufferGetBlockNumber(old_buf),
+                           scan->xs_snapshot);
          TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));

That is, if you begin scanning a 'new' bucket, we remember the old
bucket and go and scan that too, so we'd better predicate-lock both up
front (or I suppose we could do it later when we visit that page, but
here it can be done in a single place).

What if we begin scanning an 'old' bucket that is being split?  I
think we'll only do that for tuples that actually belong in the old
bucket after the split, so no need to double-lock?  And I don't think
a split could begin while we are scanning.  Do I have that right?

As for inserting, I don't think any special treatment is needed, as
long as the scan code path (above) and the split code path are
correct.  I'm not sure though.

I'm wondering how to test all this.  I'm thinking about a program that
repeatedly creates a hash index and then slowly adds more things to it
so that buckets split (maybe using distinct keys carefully crafted to
hit the same bucket?), while concurrently hammering it with a ton of
scans and then ... somehow checking correctness...

--
Thomas Munro
http://www.enterprisedb.com


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

akapila
On Mon, Mar 5, 2018 at 8:58 AM, Thomas Munro
<[hidden email]> wrote:

> On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <[hidden email]> wrote:
>> Yes, but I think it would be better if we call this once we are sure
>> that at least one tuple from the old bucket has been transferred
>> (consider if all tuples in the old bucket are dead).  Apart from this,
>> I think this patch has missed handling the cases where we scan the
>> buckets when the split is in progress.  In such cases, we scan both
>> old and new bucket, so I think we need to ensure that we take
>> PredicateLock on both the buckets during such scans.
>
> Hmm.  Yeah.
>
> So, in _hash_first(), do you think we might just need this?
>
>       if (H_BUCKET_BEING_POPULATED(opaque))
>       {
>           ...
>           old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
>           ...
>           old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
> +         PredicateLockPage(rel, BufferGetBlockNumber(old_buf),
> scan->xs_snapshot);
>           TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));
>
> That is, if you begin scanning a 'new' bucket, we remember the old
> bucket and go and scan that too, so we'd better predicate-lock both up
> front (or I suppose we could do it later when we visit that page, but
> here it can be done in a single place).
>

Yeah, that can work, but I am slightly worried that we might never
actually scan the old bucket (say, for queries with a LIMIT clause), in
which case it could give false positives for insertions into the old
bucket.

> What if we begin scanning an 'old' bucket that is being split?  I
> think we'll only do that for tuples that actually belong in the old
> bucket after the split, so no need to double-lock?  And I don't think
> a split could begin while we are scanning.  Do I have that right?
>

Right.

> As for inserting, I'm not sure if any special treatment is needed, as
> long as the scan code path (above) and the split code path are
> correct.  I'm not sure though.
>

I also don't think we need any special handling for insertions.

> I'm wondering how to test all this.  I'm thinking about a program that
> repeatedly creates a hash index and then slowly adds more things to it
> so that buckets split (maybe using distinct keys carefully crafted to
> hit the same bucket?), while concurrently hammering it with a ton of
> scans and then ... somehow checking correctness...
>

Yeah, that will generate the required errors, but I am not sure how to
verify correctness.  One idea is that while a split is in progress, we
stop it partway (say, via a cancel request) and then run targeted
selects and inserts on the bucket being scanned and the bucket being
populated.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index

akapila
On Tue, Mar 6, 2018 at 11:26 AM, Amit Kapila <[hidden email]> wrote:

> On Mon, Mar 5, 2018 at 8:58 AM, Thomas Munro
> <[hidden email]> wrote:
>> On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <[hidden email]> wrote:
>>> Yes, but I think it would be better if we call this once we are sure
>>> that at least one tuple from the old bucket has been transferred
>>> (consider if all tuples in the old bucket are dead).  Apart from this,
>>> I think this patch has missed handling the cases where we scan the
>>> buckets when the split is in progress.  In such cases, we scan both
>>> old and new bucket, so I think we need to ensure that we take
>>> PredicateLock on both the buckets during such scans.
>>
>> Hmm.  Yeah.
>>
>> So, in _hash_first(), do you think we might just need this?
>>
>>       if (H_BUCKET_BEING_POPULATED(opaque))
>>       {
>>           ...
>>           old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
>>           ...
>>           old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
>> +         PredicateLockPage(rel, BufferGetBlockNumber(old_buf),
>> scan->xs_snapshot);
>>           TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));
>>
>> That is, if you begin scanning a 'new' bucket, we remember the old
>> bucket and go and scan that too, so we'd better predicate-lock both up
>> front (or I suppose we could do it later when we visit that page, but
>> here it can be done in a single place).
>>
>
> Yeah, that can work, but I am slightly worried that we might actually
> never scan the old bucket (say for queries with Limit clause) in which
> case it might give false positives for insertions in old buckets.
>
I have changed the patch to address this point by acquiring the
predicate lock in _hash_readnext(), so the lock is taken only when the
scan actually moves to the old bucket.  I have also addressed another
problem related to the transfer of predicate locks during a split:
locks are now transferred only when at least one tuple has been moved
from the old bucket to the new one.
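For readers following along, the shape of that _hash_readnext() change is presumably something like the fragment below — a sketch of the intent only, using the existing hash AM scan-opaque fields and predicate-lock API, not the actual diff from the attached patch:

```c
/*
 * Sketch only: when a scan of a 'new' bucket continues into the 'old'
 * bucket of an in-progress split, acquire the predicate lock on the old
 * bucket's primary page at that point, so a scan that stops early
 * (e.g. under LIMIT) never locks a bucket it did not read.
 */
else if (so->hashso_buc_populated && !so->hashso_buc_split)
{
    /* move on to the old bucket kept pinned for this in-progress split */
    *bufp = so->hashso_split_bucket_buf;
    LockBuffer(*bufp, BUFFER_LOCK_SHARE);
    PredicateLockPage(rel, BufferGetBlockNumber(*bufp), scan->xs_snapshot);
    ...
}
```

The point of deferring the lock to this code path, rather than taking it up front in _hash_first(), is exactly the LIMIT concern raised above: the lock is acquired only if the scan genuinely reads the old bucket.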

>
>> I'm wondering how to test all this.  I'm thinking about a program that
>> repeatedly creates a hash index and then slowly adds more things to it
>> so that buckets split (maybe using distinct keys carefully crafted to
>> hit the same bucket?), while concurrently hammering it with a ton of
>> scans and then ... somehow checking correctness...
>>
>
> Yeah, that will generate the required errors, but not sure how to
> verify correctness.  One idea could be that when the split is in
> progress, we somehow stop it in-between (say by cancel request) and
> then run targeted selects and inserts on the bucket being scanned and
> bucket being populated.
>
I have verified the bucket split scenario manually as below:

Setup
------------
create table hash_tbl(id int4, p integer);
insert into hash_tbl (id, p) select g, 10 from generate_series(1, 10) g;
Analyze hash_tbl;
create index hash_idx on hash_tbl using hash(p);

Session-1
----------------
begin isolation level serializable;
set enable_seqscan=off;
set enable_bitmapscan=off;
set enable_indexonlyscan=on;
select sum(p) from hash_tbl where p=10;
 sum
-----
 100
(1 row)

insert into hash_tbl (id, p) select g, 10 from generate_series(10, 1000) g;
-- Via debugger, stop in _hash_splitbucket at line 1283:
-- { ... LockBuffer(bucket_obuf, BUFFER_LOCK_EXCLUSIVE); ... }

By this time, the split of bucket 1 is done but the split flag is not
yet cleared.  So, the predicate lock from bucket 1 has been transferred
to bucket 3 (the new bucket).

Session-2
----------------
begin isolation level serializable;
set enable_seqscan=off;
set enable_bitmapscan=off;
set enable_indexonlyscan=on;
select sum(p) from hash_tbl where p=10;
 sum
-----
 100
(1 row)

Session-1
--------------
Commit;

Session-2
----------
postgres=# insert into hash_tbl (id, p) select g, 10 from
generate_series(51, 60) g;
ERROR:  could not serialize access due to read/write dependencies
among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.

It got a conflict while inserting into the new bucket (bucket number 3).

I think this patch now addresses all the open issues.  Let me know what
you think about it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Predicate-Locking-in-hash-index_v6.patch (38K) Download Attachment