Wrong example in the bloom documentation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Wrong example in the bloom documentation

Daniel Westermann (DWE)
Hi,

as this does not get any attention on the docs-list, once again here.
Any thoughts on this?

Regards
Daniel





From: Daniel Westermann (DWE)
Sent: Sunday, September 27, 2020 17:58
To: Pg Docs <[hidden email]>
Subject: Wrong example in the bloom documentation
 
Hi,

I've briefly discussed this with Bruce some time ago in [1].
Replaying the example referenced in the documentation does not give a Bitmap Heap Scan on tbloom but a parallel seq scan with the default configuration:

-- tested on head
postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                         QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..127220.00 rows=250 width=24) (actual time=2134.851..2221.836 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbloom  (cost=0.00..126195.00 rows=104 width=24) (actual time=1770.691..1770.692 rows=0 loops=3)
         Filter: ((i2 = 898732) AND (i5 = 123451))
         Rows Removed by Filter: 3333333
 Planning Time: 0.895 ms
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 65.512 ms, Inlining 0.000 ms, Optimization 46.328 ms, Emission 40.658 ms, Total 152.499 ms
 Execution Time: 2288.056 ms
(12 rows)


As bloom was introduced in 9.6 I quickly tried with 9.6.17 and indeed for this version the example is correct:
postgres=# select version();
                                                 version                                                 
----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
(1 row)
postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=178436.06..179392.83 rows=250 width=24) (actual time=2279.363..2279.363 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2329
   Heap Blocks: exact=2288
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178436.00 rows=250 width=0) (actual time=994.406..994.406 rows=2329 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning time: 282.059 ms
 Execution time: 2286.138 ms
(8 rows)

The reason is that parallel execution is disabled by default in 9.6, and if that is turned on the plan changes there as well:

postgres=# set max_parallel_workers_per_gather = 2;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..127194.29 rows=1 width=24) (actual time=1148.047..1148.206 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbloom  (cost=0.00..126194.19 rows=1 width=24) (actual time=1039.501..1039.501 rows=0 loops=3)
         Filter: ((i2 = 898732) AND (i5 = 123451))
         Rows Removed by Filter: 3333333
 Planning time: 0.580 ms
 Execution time: 1148.247 ms
(8 rows)

Starting with PostgreSQL 10 the example in the documentation is therefore wrong. Attached a proposal to fix this. The new example starts with 100x reduced rows (as suggested by Bruce in [1] and adds a note that the behavior changes as soon as parallel execution is cheaper than the index access.

Thoughts?

Regards
Daniel

[1] https://www.postgresql.org/message-id/flat/20191105231854.GA26542%40momjian.us#7859b106ce614dd9530941196dad5bbc

bloom-doc-fix-v01.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wrong example in the bloom documentation

Bruce Momjian
On Thu, Oct  8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:
> Hi,
>
> as this does not get any attention on the docs-list, once again here.
> Any thoughts on this?

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: Wrong example in the bloom documentation

Daniel Westermann (DWE)
Hi Bruce,
 
>On Thu, Oct  8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:
>> Hi,
>>
>> as this does not get any attention on the docs-list, once again here.
>> Any thoughts on this?

>I was hoping someone more experienced with this would comment, but
>seeing none, I will apply it in a day or two to all supported versions?
>Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

Regards
Daniel
Reply | Threaded
Open this post in threaded view
|

Re: Wrong example in the bloom documentation

Bruce Momjian
On Thu, Oct  8, 2020 at 06:12:47PM +0000, Daniel Westermann (DWE) wrote:

> Hi Bruce,
>  
> >On Thu, Oct  8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:
> >> Hi,
> >>
> >> as this does not get any attention on the docs-list, once again here.
> >> Any thoughts on this?
>
> >I was hoping someone more experienced with this would comment, but
> >seeing none, I will apply it in a day or two to all supported versions?
> >Have you tested this output back to 9.5?
>
> I hoped that as well. No, I tested down to 9.6 because the change happened in
> 10.

OK, so the patch should be applied only to PG 10 and later?

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: Wrong example in the bloom documentation

Tom Lane-2
In reply to this post by Daniel Westermann (DWE)
"Daniel Westermann (DWE)" <[hidden email]> writes:
>> I was hoping someone more experienced with this would comment, but
>> seeing none, I will apply it in a day or two to all supported versions?
>> Have you tested this output back to 9.5?

> I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

The patch assumes that parallel query is enabled, which is not true by
default before v10, so it should certainly not be applied before v10
(at least not without significant revisions).

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Wrong example in the bloom documentation

Bruce Momjian
On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:

> "Daniel Westermann (DWE)" <[hidden email]> writes:
> >> I was hoping someone more experienced with this would comment, but
> >> seeing none, I will apply it in a day or two to all supported versions?
> >> Have you tested this output back to 9.5?
>
> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
>
> The patch assumes that parallel query is enabled, which is not true by
> default before v10, so it should certainly not be applied before v10
> (at least not without significant revisions).

I think we should just go for simple application cases for this.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: Wrong example in the bloom documentation

Daniel Westermann (DWE)
Hi Bruce, Tom,

On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:

>> "Daniel Westermann (DWE)" <[hidden email]> writes:
>> >> I was hoping someone more experienced with this would comment, but
>> >> seeing none, I will apply it in a day or two to all supported versions?
>> >> Have you tested this output back to 9.5?
>>
>> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
>>
>> The patch assumes that parallel query is enabled, which is not true by
>> default before v10, so it should certainly not be applied before v10
>> (at least not without significant revisions).

Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwise the whole example needs to be rewritten.

Regards
Daniel



Reply | Threaded
Open this post in threaded view
|

Re: Wrong example in the bloom documentation

Bruce Momjian
On Fri, Oct  9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote:

> Hi Bruce, Tom,
>
> On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:
> >> "Daniel Westermann (DWE)" <[hidden email]> writes:
> >> >> I was hoping someone more experienced with this would comment, but
> >> >> seeing none, I will apply it in a day or two to all supported versions?
> >> >> Have you tested this output back to 9.5?
> >>
> >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
> >>
> >> The patch assumes that parallel query is enabled, which is not true by
> >> default before v10, so it should certainly not be applied before v10
> >> (at least not without significant revisions).
>
> Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwise the whole example needs to be rewritten.

Agreed.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: Wrong example in the bloom documentation

Bruce Momjian
On Fri, Oct  9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:

> On Fri, Oct  9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote:
> > Hi Bruce, Tom,
> >
> > On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:
> > >> "Daniel Westermann (DWE)" <[hidden email]> writes:
> > >> >> I was hoping someone more experienced with this would comment, but
> > >> >> seeing none, I will apply it in a day or two to all supported versions?
> > >> >> Have you tested this output back to 9.5?
> > >>
> > >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
> > >>
> > >> The patch assumes that parallel query is enabled, which is not true by
> > >> default before v10, so it should certainly not be applied before v10
> > >> (at least not without significant revisions).
> >
> > Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwise the whole example needs to be rewritten.
>
> Agreed.

This is not applying to PG 12 or earlier because the patch mentions JIT,
which was only mentioned in the PG bloom docs in PG 13+.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: Wrong example in the bloom documentation

Daniel Westermann (DWE)
On Fri, Oct  9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:

> On Fri, Oct  9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote:
> > Hi Bruce, Tom,
> >
> > On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:
> > >> "Daniel Westermann (DWE)" <[hidden email]> writes:
> > >> >> I was hoping someone more experienced with this would comment, but
> > >> >> seeing none, I will apply it in a day or two to all supported versions?
> > >> >> Have you tested this output back to 9.5?
> > >>
> > >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
> > >>
> > >> The patch assumes that parallel query is enabled, which is not true by
> > >> default before v10, so it should certainly not be applied before v10
> > >> (at least not without significant revisions).
> >>
> >> Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwise the whole example needs to be rewritten.
>>
>> Agreed.

>This is not applying to PG 12 or earlier because the patch mentions JIT,
>which was only mentioned in the PG bloom docs in PG 13+.

Does that mean we need separate patches for each release starting with 10?
As I am not frequently writing patches, I would need some help here.

Regards
Daniel