Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

Tomas Vondra-4
Hi,

It seems that Q19 from TPC-H is consistently failing with segfaults due
to calling tbm_prepare_shared_iterate() with (tbm->dsa==NULL).

I'm not very familiar with how the dsa is initialized and passed around,
but I only see the failures when the bitmap is constructed by a mix of
BitmapAnd and BitmapOr operations.

Another interesting observation is that setting force_parallel_mode=on
may not be enough: there really need to be multiple parallel workers,
which is why the simple query sets cpu_tuple_cost=1.

Attached is a bunch of files:

1) details for "full" query:

* query.sql
* plan.txt
* backtrace.txt

2) details for the "minimal" query triggering the issue:

* query-minimal.sql
* plan-minimal.txt
* backtrace-minimal.txt



regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments:

* backtrace-simple.txt (4K)
* plan-simple.txt (2K)
* query-simple.txt (480 bytes)
* query.sql (1K)
* plan.txt (7K)
* backtrace.txt (5K)

Re: Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

Dilip Kumar-2
On Thu, Oct 12, 2017 at 4:31 PM, Tomas Vondra
<[hidden email]> wrote:
> Hi,
>
> It seems that Q19 from TPC-H is consistently failing with segfaults due
> to calling tbm_prepare_shared_iterate() with (tbm->dsa==NULL).
>
> I'm not very familiar with how the dsa is initialized and passed around,
> but I only see the failures when the bitmap is constructed by a mix of
> BitmapAnd and BitmapOr operations.
>
I think I have found the issue: bitmap_subplan_mark_shared is not
properly pushing the isshared flag down to the lower-level bitmap index
node, and because of that tbm_create is passed a NULL dsa when creating
the tidbitmap.  So this problem occurs only in a very specific
combination of BitmapOr and BitmapAnd, when a BitmapAnd is the first
subplan of the BitmapOr.  If a BitmapIndex is the first subplan under
the BitmapOr then there is no problem, because the BitmapOr node creates
the tbm by itself and isshared is set for the BitmapOr.

The attached patch fixes the issue for me.  I will test it thoroughly
with other scenarios as well.  Thanks for reporting.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Attachment: parallel_bitmap_fix_v1.patch (868 bytes)

Re: Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

Tomas Vondra-4


On 10/12/2017 02:40 PM, Dilip Kumar wrote:

> The attached patch fixes the issue for me.  I will test it thoroughly
> with other scenarios as well.  Thanks for reporting.
>

Yep, this fixes the failures for me.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

Dilip Kumar-2
On Thu, Oct 12, 2017 at 6:37 PM, Tomas Vondra
<[hidden email]> wrote:

> Yep, this fixes the failures for me.
>
Thanks for confirming.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

Robert Haas
On Thu, Oct 12, 2017 at 9:14 AM, Dilip Kumar <[hidden email]> wrote:
>> Yep, this fixes the failures for me.
>>
> Thanks for confirming.

Committed and back-patched to v10.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
