Bitmap scan is undercosted?


Vitaliy Garnashevich
Hi,

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

I've found some cases where similar issues with bitmap scans were
reported before:

https://www.postgresql.org/message-id/flat/1456154321.976561.528310154.6A623C0E%40webmail.messagingengine.com

https://www.postgresql.org/message-id/flat/CA%2BwPC0MRMhF_8fD9dc8%2BQWZQzUvHahPRSv%3DxMtCmsVLSsy-p0w%40mail.gmail.com

I've made a synthetic test, which kind of reproduces the issue:

shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB

set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";9999985;44248;226
"i1";9999985;27422;365
"i2";9999985;27422;365

I've been running the same query while enabling and disabling different
kinds of scans:

1) set enable_bitmapscan = on;  set enable_indexscan = off; set
enable_seqscan = off;
2) set enable_bitmapscan = off; set enable_indexscan = on;  set
enable_seqscan = off;
3) set enable_bitmapscan = off; set enable_indexscan = off; set
enable_seqscan = on;

The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;

Here are the results for PostgreSQL 9.6 (for 9.3 and 10.1 the results
are very similar):

1) Aggregate  (cost=24821.70..24821.71 rows=1 width=8) (actual
time=184.591..184.591 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=47259
   ->  Bitmap Heap Scan on public.aaa  (cost=13038.21..24796.22
rows=10189 width=0) (actual time=122.492..178.006 rows=100000 loops=1)
         Output: num, flag
         Recheck Cond: (aaa.num = 1)
         Filter: aaa.flag
         Heap Blocks: exact=44248
         Buffers: shared hit=47259
         ->  BitmapAnd  (cost=13038.21..13038.21 rows=10189 width=0)
(actual time=110.699..110.699 rows=0 loops=1)
               Buffers: shared hit=3011
               ->  Bitmap Index Scan on i1  (cost=0.00..1158.94
rows=99667 width=0) (actual time=19.600..19.600 rows=100000 loops=1)
                     Index Cond: (aaa.num = 1)
                     Buffers: shared hit=276
               ->  Bitmap Index Scan on i2  (cost=0.00..11873.92
rows=1022332 width=0) (actual time=81.676..81.676 rows=1000000 loops=1)
                     Index Cond: (aaa.flag = true)
                     Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 184.988 ms

2) Aggregate  (cost=67939.09..67939.10 rows=1 width=8) (actual
time=67.510..67.510 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=44524
   ->  Index Scan using i1 on public.aaa  (cost=0.44..67910.95
rows=11256 width=0) (actual time=0.020..61.180 rows=100000 loops=1)
         Output: num, flag
         Index Cond: (aaa.num = 1)
         Filter: aaa.flag
         Buffers: shared hit=44524
Planning time: 0.096 ms
Execution time: 67.543 ms

3) Aggregate  (cost=169276.49..169276.50 rows=1 width=8) (actual
time=977.063..977.063 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=44248
   ->  Seq Scan on public.aaa  (cost=0.00..169248.35 rows=11256 width=0)
(actual time=0.018..969.294 rows=100000 loops=1)
         Output: num, flag
         Filter: (aaa.flag AND (aaa.num = 1))
         Rows Removed by Filter: 9900000
         Buffers: shared hit=44248
Planning time: 0.099 ms
Execution time: 977.094 ms


The bitmap scan version runs more than twice as slow as the index scan
version, while being costed at less than half as much.

I've tried increasing cpu_tuple_cost and cpu_index_tuple_cost, and this
behavior persists even after a 6x increase in those values, although the
difference in costs becomes much smaller. After increasing the settings
by more than 6x, PostgreSQL switches to a different plan for bitmap
scans, so it's hard to draw conclusions beyond that point.
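
A 6x increase over the settings above would look roughly like this
(illustrative values):

set cpu_tuple_cost = 0.06;
set cpu_index_tuple_cost = 0.03;
set cpu_operator_cost = 0.015;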

Could such cases be fixed by tuning cost settings, or is that just how
PostgreSQL estimates bitmap scans, so it can't be fixed without
modifying the optimizer? Or am I missing something and this is the
expected behavior? Thoughts?

Regards,
Vitaliy



Re: Bitmap scan is undercosted?

Justin Pryzby
On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> We recently had an issue in production, where a bitmap scan was chosen
> instead of an index scan. Despite being 30x slower, the bitmap scan had
> about the same cost as the index scan.

Me too, see also:
https://www.postgresql.org/message-id/flat/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA%40mail.gmail.com#CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA@...

> drop table if exists aaa;
> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
> generate_series(1, 10000000) id;
> create index i1 on aaa  (num);
> create index i2 on aaa  (flag);
> analyze aaa;
>
> select relname, reltuples::bigint, relpages::bigint,
> (reltuples/relpages)::bigint tpp from pg_class where relname
> in('aaa','i1','i2') order by relname;
> "aaa";9999985;44248;226
> "i1";9999985;27422;365
> "i2";9999985;27422;365
>
> The query was:
> explain (analyze,verbose,costs,buffers)
> select count(*) from aaa where num = 1 and flag = true;

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x.  It makes sense that
this causes the index scan to be more expensive than expected, but that's only
somewhat important, since there are no joins involved.

The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read.  num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table.  But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page).  Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too.  You could confirm that by connecting strace to the
backend before starting the query.
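
Both points are easy to check on the synthetic table (illustrative
queries, assuming the setup above):

select count(*) from aaa where num = 1 and not flag;
-- 0: every num=1 row also has flag=true
select ctid from aaa where num = 1 limit 4;
-- e.g. (0,1), (0,101), (0,201), (1,75): ~2 matching tuples per heap page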

Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case).  That's an elegant
test case, so thanks.

I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows.  I see the table+index is 559MB.
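
One way to reproduce that measurement (pg_total_relation_size counts
indexes and TOAST too):

select pg_size_pretty(pg_total_relation_size('aaa'));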

I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan.  Note that
some things (like max()) that can use a btree index cannot use BRIN.  PG10.1
has WITH autosummarize, which was important for our use, since we rarely do
UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).
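
For illustration, the PG10 syntax would be something like this (hypothetical
index name; BRIN helps only on a column whose values correlate with physical
row order, e.g. an append-only date column):

create index aaa_brin on aaa using brin (num) with (autosummarize = on);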

Justin

[0] I'm borrowing Jeff's language from here:
https://www.postgresql.org/message-id/CAMkU%3D1xwGn%2BO0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q%40mail.gmail.com
"density" wasn't our problem, but it's a perfect description of this issue.


Re: Bitmap scan is undercosted?

Vitaliy Garnashevich
On 01/12/2017 20:34, Justin Pryzby wrote:

> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
>> We recently had an issue in production, where a bitmap scan was chosen
>> instead of an index scan. Despite being 30x slower, the bitmap scan had
>> about the same cost as the index scan.
> Me too, see also:
> https://www.postgresql.org/message-id/flat/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA%40mail.gmail.com#CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA@...
>
>> drop table if exists aaa;
>> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
>> generate_series(1, 10000000) id;
>> create index i1 on aaa  (num);
>> create index i2 on aaa  (flag);
>> analyze aaa;
>>
>> select relname, reltuples::bigint, relpages::bigint,
>> (reltuples/relpages)::bigint tpp from pg_class where relname
>> in('aaa','i1','i2') order by relname;
>> "aaa";9999985;44248;226
>> "i1";9999985;27422;365
>> "i2";9999985;27422;365
>>
>> The query was:
>> explain (analyze,verbose,costs,buffers)
>> select count(*) from aaa where num = 1 and flag = true;
> Note that id%100==1 implies flag='t', so the planner anticipates retrieving
> fewer rows than it will ultimately read, probably by 2x.  It makes sense that
> this causes the index scan to be more expensive than expected, but that's only
> somewhat important, since there are no joins involved.
I don't think the planner is smart enough to account for correlation
between values in different columns. When different values are used in
the filter (num=2, num=39, num=74), the query actually runs faster, while
still being about twice as slow as the index scan. But the cost
does not change much. It jumps up and down for different values, but
it's still close to the initial value.

Aggregate  (cost=24239.02..24239.03 rows=1 width=8) (actual
time=105.239..105.239 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=3011
   ->  Bitmap Heap Scan on public.aaa  (cost=12812.05..24214.48
rows=9816 width=0) (actual time=105.236..105.236 rows=0 loops=1)
         Output: num, flag
         Recheck Cond: (aaa.num = 39)
         Filter: aaa.flag
         Buffers: shared hit=3011
         ->  BitmapAnd  (cost=12812.05..12812.05 rows=9816 width=0)
(actual time=105.157..105.157 rows=0 loops=1)
               Buffers: shared hit=3011
               ->  Bitmap Index Scan on i1  (cost=0.00..1134.94
rows=97667 width=0) (actual time=15.725..15.725 rows=100000 loops=1)
                     Index Cond: (aaa.num = 39)
                     Buffers: shared hit=276
               ->  Bitmap Index Scan on i2  (cost=0.00..11671.96
rows=1005003 width=0) (actual time=77.920..77.920 rows=1000000 loops=1)
                     Index Cond: (aaa.flag = true)
                     Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 105.553 ms

Aggregate  (cost=65785.99..65786.00 rows=1 width=8) (actual
time=48.587..48.587 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=44524
   ->  Index Scan using i1 on public.aaa  (cost=0.44..65761.45 rows=9816
width=0) (actual time=48.583..48.583 rows=0 loops=1)
         Output: num, flag
         Index Cond: (aaa.num = 39)
         Filter: aaa.flag
         Rows Removed by Filter: 100000
         Buffers: shared hit=44524
Planning time: 0.097 ms
Execution time: 48.620 ms

>
> The reason why it's more than a bit slower is due to the "density" [0] of the
> heap pages read.  num=1 is more selective than flag=true, so it scans i1,
> reading 1% of the whole table.  But it's not reading the first 1% or
> some other 1% of the table, it reads tuples evenly distributed across the
> entire table (226*0.01 = ~2 rows of each page).  Since the index was created
> after the INSERT, the repeated keys (logical value: id%100) are read in
> physical order on the heap, so this is basically doing a seq scan, but with the
> additional overhead of reading the index, and maybe doing an fseek() before
> each/some read()s, too.  You could confirm that by connecting strace to the
> backend before starting the query.
>
> Since you did that using % and with indices added after the INSERT, you can't
> improve it by reindexing (as I was able to for my case).  That's an elegant
> test case, so thanks.
>
> I think shared_buffers=512MB is just small enough for this test to be painful
> for 1e7 rows.  I see the table+index is 559MB.
           table            | ~count  |  size  |  toast  |  idx   | size + toast + idx
----------------------------+---------+--------+---------+--------+--------------------
 aaa                        | 9999994 | 346 MB | 0 bytes | 428 MB | 774 MB

But the plan says all buffers are "shared hit", and none "read", so
that's probably not an issue.

>
> I don't know if that's really similar to your production use case, but I would
> recommend trying BRIN indices, which always require a bitmap scan.  Note that
> some things (like max()) that can use a btree index cannot use BRIN.  PG10.1
> has WITH autosummarize, which was important for our use, since we rarely do
> UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).
Yes, BRIN indexes should be beneficial for our case, because there is a
date column which grows over time (not strictly regularly, but still).
Unfortunately, we're still migrating our databases from 9.3 to 9.6.
Anyway, thanks for the advice.
>
> Justin
>
> [0] I'm borrowing Jeff's language from here:
> https://www.postgresql.org/message-id/CAMkU%3D1xwGn%2BO0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q%40mail.gmail.com
> "density" wasn't our problem, but it's a perfect description of this issue.
>



Re: Bitmap scan is undercosted?

Justin Pryzby
In reply to this post by Justin Pryzby
I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> > We recently had an issue in production, where a bitmap scan was chosen
> > instead of an index scan. Despite being 30x slower, the bitmap scan had
> > about the same cost as the index scan.
>
> > drop table if exists aaa;
> > create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
> > generate_series(1, 10000000) id;
> > create index i1 on aaa  (num);
> > create index i2 on aaa  (flag);
> > analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while
testing.

> Note that id%100==1 implies flag='t', so the planner anticipates retrieving
> fewer rows than it will ultimately read, probably by 2x.  It makes sense that
> this causes the index scan to be more expensive than expected, but that's only
> somewhat important, since there are no joins involved.

I changed the query from COUNT(*) to * to make the EXPLAIN output easier to read:

CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX i1 ON aaa(num);
CREATE INDEX i2 ON aaa (flag);
ANALYZE VERBOSE aaa;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Bitmap Heap Scan on public.aaa  (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=100000 loops=1)
   ->  BitmapAnd  (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
         ->  Bitmap Index Scan on i1  (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=100000 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=1000000 loops=1)

..which is what's wanted with no planner hints (PG10.1 here).

Same on PG95:
postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Bitmap Heap Scan on public.aaa  (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=100000 loops=1)
   ->  BitmapAnd  (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
         ->  Bitmap Index Scan on i1  (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=100000 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=1000000 loops=1)

The rowcount is off, but not a critical issue without a join.

Justin


Re: Bitmap scan is undercosted?

Vitaliy Garnashevich
On 02/12/2017 01:11, Justin Pryzby wrote:

> I tried to reproduce this issue and couldn't, under PG95 and 10.1:
>
> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
>> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
>>> We recently had an issue in production, where a bitmap scan was chosen
>>> instead of an index scan. Despite being 30x slower, the bitmap scan had
>>> about the same cost as the index scan.
>>> drop table if exists aaa;
>>> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
>>> generate_series(1, 10000000) id;
>>> create index i1 on aaa  (num);
>>> create index i2 on aaa  (flag);
>>> analyze aaa;
> What is:
> effective_io_concurrency
> max_parallel_workers_per_gather (I gather you don't have this)
effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For this test I'm using SSD and Windows (if that matters). On production
we also use SSD, hence lower random_page_cost. But with the default
random_page_cost=4.0, the difference in cost between the index scan plan
and the bitmap scan plan is even bigger.

>
> Note:
> postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
> correlation | 0.00710112
>
> ..so this is different from the issue corrected by the patch I created while
> testing.
>
>> Note that id%100==1 implies flag='t', so the planner anticipates retrieving
>> fewer rows than it will ultimately read, probably by 2x.  It makes sense that
>> this causes the index scan to be more expensive than expected, but that's only
>> somewhat important, since there are no joins involved.
> I changed the query from COUNT(*) to * to make the EXPLAIN output easier to read:
>
> CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
> CREATE INDEX i1 ON aaa(num);
> CREATE INDEX i2 ON aaa (flag);
> ANALYZE VERBOSE aaa;
> EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
>   Bitmap Heap Scan on public.aaa  (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=100000 loops=1)
>     ->  BitmapAnd  (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
>           ->  Bitmap Index Scan on i1  (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=100000 loops=1)
>           ->  Bitmap Index Scan on i2  (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=1000000 loops=1)
>
> ..which is what's wanted with no planner hints (PG10.1 here).
Yes, that's what you get without planner hints, but it's strange to get
this plan when there is another one which runs 2-3 times faster, yet
happens to be estimated at twice the cost of the one with bitmap scans:

# set enable_bitmapscan = off; set enable_indexscan = on;  set
enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5)
(actual time=0.020..57.765 rows=100000 loops=1)

vs.

# set enable_bitmapscan = on;  set enable_indexscan = off; set
enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5)
(actual time=122.137..182.811 rows=100000 loops=1)
   ->  BitmapAnd  (cost=13099.33..13099.33 rows=10428 width=0) (actual
time=110.168..110.168 rows=0 loops=1)
         ->  Bitmap Index Scan on i1  (cost=0.00..1181.44 rows=101667
width=0) (actual time=20.845..20.845 rows=100000 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..11912.43 rows=1025666
width=0) (actual time=80.323..80.323 rows=1000000 loops=1)

>
> Same on PG95:
> postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
>   Bitmap Heap Scan on public.aaa  (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=100000 loops=1)
>     ->  BitmapAnd  (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
>           ->  Bitmap Index Scan on i1  (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=100000 loops=1)
>           ->  Bitmap Index Scan on i2  (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=1000000 loops=1)
>
> The rowcount is off, but not a critical issue without a join.
>
> Justin
>



Re: Bitmap scan is undercosted?

Justin Pryzby
In reply to this post by Justin Pryzby
On Fri, Dec 01, 2017 at 05:11:04PM -0600, Justin Pryzby wrote:
> I tried to reproduce this issue and couldn't, under PG95 and 10.1:

I'm embarrassed to say that I misread your message, despite your amply clear
subject.  You're getting a bitmap scan but you'd prefer to get an index scan.
I anticipated the opposite problem (which is what I've had issues with myself).

> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
> > On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> > > We recently had an issue in production, where a bitmap scan was chosen
> > > instead of an index scan. Despite being 30x slower, the bitmap scan had
> > > about the same cost as the index scan.
>
> Note:
> postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
> correlation | 0.00710112
>
> ..so this is different from the issue corrected by the patch I created while
> testing.

Actually, the fact that the table is "not correlated" on the "num" column is
maybe the primary reason why PG avoids using an index scan.  It (more or less
correctly) deduces that it's going to have to "read" a large fraction of the
pages (even if only to process a small fraction of the rows), which is costly,
except that here it's all cached.  In your case, that over-penalizes the
index scan.

This is cost_index() and cost_bitmap_heap_scan() in costsize.c.  Since the
index is uncorrelated, it's returning something close to max_IO_cost.  It looks
like effective_cache_size only affects index_pages_fetched().

I'm going to try to dig some more into it.  Maybe there's evidence to
re-evaluate one of these:

cost_index()
| run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
or
cost_bitmap_heap_scan()
| cost_per_page = spc_random_page_cost -
|               (spc_random_page_cost - spc_seq_page_cost)
|               * sqrt(pages_fetched / T);
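
Plugging illustrative numbers into the bitmap formula for this test
(T = 44248 heap pages; suppose the planner expects ~11000 pages fetched):

select 1.5 - (1.5 - 1.0) * sqrt(11000.0 / 44248);
-- ~1.25: each bitmap heap page is costed well below random_page_cost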

Justin


Re: Bitmap scan is undercosted?

Jeff Janes
In reply to this post by Vitaliy Garnashevich
On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich <[hidden email]> wrote:
> On 02/12/2017 01:11, Justin Pryzby wrote:
>> I tried to reproduce this issue and couldn't, under PG95 and 10.1:
>>
>> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
>> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
>> We recently had an issue in production, where a bitmap scan was chosen
>> instead of an index scan. Despite being 30x slower, the bitmap scan had
>> about the same cost as the index scan.
>> drop table if exists aaa;
>> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
>> generate_series(1, 10000000) id;
>> create index i1 on aaa  (num);
>> create index i2 on aaa  (flag);
>> analyze aaa;
>> What is:
>> effective_io_concurrency
>> max_parallel_workers_per_gather (I gather you don't have this)
> effective_io_concurrency = 0
> max_parallel_workers_per_gather = 0
>
> Did you notice random_page_cost = 1.5 ?

For the aaa.num = 39 case, the faster index scan actually does hit 15 times more buffers than the bitmap scan does.  While 1.5 is a lot lower than 4.0, it is still much higher than the true cost of reading a page from the buffer cache.  This is why the index scan is getting punished.  You could lower random_page_cost and seq_page_cost to 0, to remove those considerations.  (I'm not saying you should do this on your production system, but rather that you should do it as a way to investigate the issue.  But it might make sense on production as well.)


> For this test I'm using SSD and Windows (if that matters). On production we also use SSD, hence lower random_page_cost. But with the default random_page_cost=4.0, the difference in cost between the index scan plan and the bitmap scan plan is even bigger.

Since it is all shared buffers hits, it doesn't matter if you have SSD for this particular test case.
 
Cheers,

Jeff

Re: Bitmap scan is undercosted?

Justin Pryzby
In reply to this post by Vitaliy Garnashevich
On Sat, Dec 02, 2017 at 01:54:09AM +0200, Vitaliy Garnashevich wrote:

> On 02/12/2017 01:11, Justin Pryzby wrote:
> >..which is what's wanted with no planner hints (PG10.1 here).
> Yes, that's what you get without planner hints, but it's strange to get this
> plan when there is another one which runs 2-3 times faster, yet happens to
> be estimated at twice the cost of the one with bitmap scans:
>
> # set enable_bitmapscan = off; set enable_indexscan = on;  set enable_seqscan = off;
> # explain analyze select * from aaa where num = 1 and flag = true;
> Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5) (actual time=0.020..57.765 rows=100000 loops=1)
>
> vs.
>
> # set enable_bitmapscan = on;  set enable_indexscan = off; set enable_seqscan = off;
> # explain analyze select * from aaa where num = 1 and flag = true;
> Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5) (actual time=122.137..182.811 rows=100000 loops=1)

I was able to get an index plan with:

SET random_page_cost=1; SET cpu_index_tuple_cost=.04; -- default: 0.005; see selfuncs.c
postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Index Scan using i1 on public.aaa  (cost=0.43..50120.71 rows=10754 width=5) (actual time=0.040..149.580 rows=100000 loops=1)

Or with:
SET random_page_cost=1; SET cpu_operator_cost=0.03; -- default: 0.0025 see cost_bitmap_tree_node()
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag= true;  
 Index Scan using i1 on public.aaa  (cost=5.22..49328.00 rows=10754 width=5) (actual time=0.051..109.082 rows=100000 loops=1)

Or a combination trying to minimize the cost of the index scan:
postgres=# SET random_page_cost=1; SET cpu_index_tuple_cost=.0017; SET cpu_operator_cost=0.03; EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag= true;  
 Index Scan using i1 on public.aaa  (cost=5.22..48977.10 rows=10754 width=5) (actual time=0.032..86.883 rows=100000 loops=1)

Not sure if those settings are reasonable, but maybe they help to understand the behavior.

Justin


Re: Bitmap scan is undercosted?

Vitaliy Garnashevich
In reply to this post by Jeff Janes
On 02/12/2017 07:51, Jeff Janes wrote:
> On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich <[hidden email]> wrote:
>> On 02/12/2017 01:11, Justin Pryzby wrote:
>> I tried to reproduce this issue and couldn't, under PG95 and 10.1:
>>
>> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
>> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
>> We recently had an issue in production, where a bitmap scan was chosen
>> instead of an index scan. Despite being 30x slower, the bitmap scan had
>> about the same cost as the index scan.
>> drop table if exists aaa;
>> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
>> generate_series(1, 10000000) id;
>> create index i1 on aaa  (num);
>> create index i2 on aaa  (flag);
>> analyze aaa;
>> What is:
>> effective_io_concurrency
>> max_parallel_workers_per_gather (I gather you don't have this)
>> effective_io_concurrency = 0
>> max_parallel_workers_per_gather = 0
>>
>> Did you notice random_page_cost = 1.5 ?
>
> For the aaa.num = 39 case, the faster index scan actually does hit 15 times more buffers than the bitmap scan does.  While 1.5 is a lot lower than 4.0, it is still much higher than the true cost of reading a page from the buffer cache.  This is why the index scan is getting punished.  You could lower random_page_cost and seq_page_cost to 0, to remove those considerations.  (I'm not saying you should do this on your production system, but rather that you should do it as a way to investigate the issue.  But it might make sense on production as well.)
seq_page_cost = 1.0
random_page_cost = 1.0
explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=11536.74..20856.96 rows=10257 width=5) (actual time=108.338..108.338 rows=0 loops=1)
  ->  BitmapAnd  (cost=11536.74..11536.74 rows=10257 width=0) (actual time=108.226..108.226 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1025.43 rows=100000 width=0) (actual time=18.563..18.563 rows=100000 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..10505.93 rows=1025666 width=0) (actual time=78.493..78.493 rows=1000000 loops=1)

Index Scan using i1 on aaa  (cost=0.44..44663.58 rows=10257 width=5) (actual time=51.264..51.264 rows=0 loops=1)

Here I've used the filter num = 2, which produces rows=0 at the BitmapAnd node, and thus avoids a lot of work at the "Bitmap Heap Scan" node, while still keeping about the same proportion between bitmap and index plans - the bitmap plan is twice as slow but half the cost. It does not matter much which value is used for the filter, as long as it's something other than num = 1.


seq_page_cost = 0.0
random_page_cost = 0.0
explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=753.00..2003.00 rows=10257 width=5) (actual time=82.212..82.212 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..750.43 rows=100000 width=0) (actual time=17.401..17.401 rows=100000 loops=1)

Index Scan using i1 on aaa  (cost=0.44..1750.43 rows=10257 width=5) (actual time=49.766..49.766 rows=0 loops=1)

The bitmap plan was reduced to use only one bitmap index scan, and finally it costs more than the index plan. But I doubt that the settings seq_page_cost = random_page_cost = 0.0 should actually be used. Probably it should instead be something like 1.0/1.0 or 1.0/1.1, with the other costs increased to carry more weight.


# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

Bitmap Heap Scan on aaa  (cost=36882.97..46587.82 rows=10257 width=5) (actual time=106.045..106.045 rows=0 loops=1)
  ->  BitmapAnd  (cost=36882.97..36882.97 rows=10257 width=0) (actual time=105.966..105.966 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..3276.74 rows=100000 width=0) (actual time=15.977..15.977 rows=100000 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..33584.72 rows=1025666 width=0) (actual time=79.208..79.208 rows=1000000 loops=1)

Index Scan using i1 on aaa  (cost=1.74..49914.89 rows=10257 width=5) (actual time=50.144..50.144 rows=0 loops=1)


# x5 tuple/operator costs - switched to single bitmap index scan, but now it costs more than the index scan
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.05;
set cpu_index_tuple_cost = 0.025;
set cpu_operator_cost = 0.0125;

Bitmap Heap Scan on aaa  (cost=4040.00..54538.00 rows=10257 width=5) (actual time=82.338..82.338 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..4027.18 rows=100000 width=0) (actual time=19.541..19.541 rows=100000 loops=1)

Index Scan using i1 on aaa  (cost=2.17..51665.32 rows=10257 width=5) (actual time=49.545..49.545 rows=0 loops=1)


I've also tried seq_page_cost = 1.0, random_page_cost = 1.1, but that would require more than a 10x increase in tuple/operator costs to make the bitmap plan more costly than the index plan.



>> For this test I'm using SSD and Windows (if that matters). On production we also use SSD, hence lower random_page_cost. But with the default random_page_cost=4.0, the difference in cost between the index scan plan and the bitmap scan plan is even bigger.
>
> Since it is all shared buffers hits, it doesn't matter if you have SSD for this particular test case.

Agreed. I was just trying to justify using a random_page_cost value lower than the usual ~2.0.

 
Cheers,

Jeff



Re: Bitmap scan is undercosted?

Jeff Janes
On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <[hidden email]> wrote:


> seq_page_cost = 0.0
> random_page_cost = 0.0
> explain analyze select * from aaa where num = 2 and flag = true;
>
> Bitmap Heap Scan on aaa  (cost=753.00..2003.00 rows=10257 width=5) (actual time=82.212..82.212 rows=0 loops=1)
>   ->  Bitmap Index Scan on i1  (cost=0.00..750.43 rows=100000 width=0) (actual time=17.401..17.401 rows=100000 loops=1)
>
> Index Scan using i1 on aaa  (cost=0.44..1750.43 rows=10257 width=5) (actual time=49.766..49.766 rows=0 loops=1)
>
> The bitmap plan was reduced to use only one bitmap index scan, and finally it costs more than the index plan.

Right, so there is a cpu costing problem (which could only be fixed by hacking postgresql and recompiling it), but it is a much smaller problem than the IO cost being inaccurate due to the high hit rate.  Fixing the CPU costing problem is unlikely to make a difference to your real query.  If you set the page costs to zero, what happens to your real query?
 
> But I doubt that the settings seq_page_cost = random_page_cost = 0.0 should actually be used.

Why not?  If your production server really has everything in memory during normal operation, that is the correct course of action.  If you ever restart the server, then you could have some unpleasant time getting it back up to speed again, but pg_prewarm could help with that.  
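
For example (pg_prewarm ships as a contrib extension; an illustrative warm-up of this test's objects):

create extension pg_prewarm;
select pg_prewarm('aaa'), pg_prewarm('i1'), pg_prewarm('i2');  -- returns blocks read into shared buffers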
 
> Probably it should instead be something like 1.0/1.0 or 1.0/1.1, with the other costs increased to carry more weight.

This doesn't make any sense to me.  Halving the page costs is mathematically the same as doubling all the other constants.  But the first way of doing things says what you are doing, and the second way is an obfuscation of what you are doing.
 

> # x4 tuple/operator costs - bitmap scan still a bit cheaper
> set seq_page_cost = 1.0;
> set random_page_cost = 1.0;
> set cpu_tuple_cost = 0.04;
> set cpu_index_tuple_cost = 0.02;
> set cpu_operator_cost = 0.01;

If you really want to target the plan with the BitmapAnd, you should increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase cpu_tuple_cost.  That is because the unselective bitmap index scan does not incur any cpu_tuple_cost, but does incur index_tuple and operator costs.  Unfortunately all other index scans in the system will also be skewed by such a change if you make the change system-wide.
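
For instance (illustrative values, 4x the defaults):

set cpu_index_tuple_cost = 0.02;  -- default 0.005
set cpu_operator_cost = 0.01;     -- default 0.0025
-- leave cpu_tuple_cost at its default of 0.01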

Incidentally, the "actual rows" field of BitmapAnd is always zero.  That field is not implemented for that node type.  


Why do you have an index on flag in the first place?  What does the index accomplish, other than enticing the planner into bad plans?  I don't know how this translates back into your real query, but dropping that index should be considered.  Or replace both indexes with one on (num,flag).
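
A sketch of that alternative (hypothetical index name):

drop index i2;
create index aaa_num_flag on aaa (num, flag);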

Or you can re-write the part of the WHERE clause in a way that it can't use an index, something like:

and flag::text ='t'

Cheers,

Jeff

Re: Bitmap scan is undercosted?

Tom Lane-2
Jeff Janes <[hidden email]> writes:
> On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
> [hidden email]> wrote:
>> # x4 tuple/operator costs - bitmap scan still a bit cheaper
>> set seq_page_cost = 1.0;
>> set random_page_cost = 1.0;
>> set cpu_tuple_cost = 0.04;
>> set cpu_index_tuple_cost = 0.02;
>> set cpu_operator_cost = 0.01;

> If you really want to target the plan with the BitmapAnd, you should
> increase  cpu_index_tuple_cost and/or cpu_operator_cost but not increase
> cpu_tuple_cost.  That is because the  unselective bitmap index scan does
> not incur any cpu_tuple_cost, but does incur index_tuple and operator
> costs.  Unfortunately all other index scans in the system will also be
> skewed by such a change if you make the change system-wide.

I think it'd be a serious error to screw around with your cost settings
on the basis of a single case in which the rowcount estimates are so
far off.  It's really those estimates that are the problem AFAICS.

The core issue in this example is that, the way the test data is set up,
the "flag = true" condition actually adds no selectivity at all, because
every row with "num = 1" is certain to have "flag = true".  If the planner
realized that, it would certainly not bother with BitmapAnd'ing the flag
index onto the results of the num index.  But it doesn't know that those
columns are correlated, so it supposes that adding the extra index will
give a 10x reduction in the number of heap rows that have to be visited
(since it knows that only 1/10th of the rows have "flag = true").
*That* is what causes the overly optimistic cost estimate for the
two-index bitmapscan, and no amount of fiddling with the cost parameters
will make that better.

I tried creating multiple-column statistics using the v10 facility for
that:

regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE

but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again.  But I've not
looked closer yet.
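
One way to inspect what the statistics object actually stored (v10 catalog; shown for illustration):

select stxname, stxkind, stxdependencies from pg_statistic_ext where stxname = 's1';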

                        regards, tom lane


Re: Bitmap scan is undercosted?

Jeff Janes
On Sat, Dec 2, 2017 at 3:44 PM, Tom Lane <[hidden email]> wrote:
> Jeff Janes <[hidden email]> writes:
>> On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
>> [hidden email]> wrote:
>>> # x4 tuple/operator costs - bitmap scan still a bit cheaper
>>> set seq_page_cost = 1.0;
>>> set random_page_cost = 1.0;
>>> set cpu_tuple_cost = 0.04;
>>> set cpu_index_tuple_cost = 0.02;
>>> set cpu_operator_cost = 0.01;
>
>> If you really want to target the plan with the BitmapAnd, you should
>> increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
>> cpu_tuple_cost.  That is because the unselective bitmap index scan does
>> not incur any cpu_tuple_cost, but does incur index_tuple and operator
>> costs.  Unfortunately all other index scans in the system will also be
>> skewed by such a change if you make the change system-wide.
>
> I think it'd be a serious error to screw around with your cost settings
> on the basis of a single case in which the rowcount estimates are so
> far off.  It's really those estimates that are the problem AFAICS.
>
> The core issue in this example is that, the way the test data is set up,
> the "flag = true" condition actually adds no selectivity at all, because
> every row with "num = 1" is certain to have "flag = true".  If the planner
> realized that, it would certainly not bother with BitmapAnd'ing the flag
> index onto the results of the num index.  But it doesn't know that those
> columns are correlated, so it supposes that adding the extra index will
> give a 10x reduction in the number of heap rows that have to be visited
> (since it knows that only 1/10th of the rows have "flag = true").
> *That* is what causes the overly optimistic cost estimate for the
> two-index bitmapscan, and no amount of fiddling with the cost parameters
> will make that better.

But he also tested with num=2 and num=39, which reverses the situation, so the bitmap is 100% selective rather than the 90% the planner thinks it will be.

But it is still slower for him (I am having trouble replicating that exact behavior), so building the bitmap to rule out 100% of the rows is empirically not worth it; I don't see how building it to rule out 90%, as the planner thinks, would be any better.


> I tried creating multiple-column statistics using the v10 facility for
> that:
>
> regression=# create statistics s1 on num, flag from aaa;
> CREATE STATISTICS
> regression=# analyze aaa;
> ANALYZE
>
> but that changed the estimate not at all, which surprised me because
> dependency statistics are supposed to fix exactly this type of problem.
> I suspect there may be something in the extended-stats code that causes it
> not to work right for boolean columns --- this wouldn't be excessively
> surprising because of the way the planner messes around with converting
> "flag = true" to just "flag" and sometimes back again.  But I've not
> looked closer yet.

I think the non-extended stats code also has trouble with booleans.  pg_stats gives me a correlation of 0.8 or higher for the flag column.

Due to that, when I disable bitmapscans and seqscans, I start getting slow index scans on the wrong index, i2 rather than i1.  I don't know why he doesn't see that in his example.
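
For reference, the check is the same pg_stats query Justin showed above:

select attname, correlation from pg_stats where tablename = 'aaa' and attname in ('num', 'flag');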
 
Cheers,

Jeff

Re: Bitmap scan is undercosted? - boolean correlation

Justin Pryzby
On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:
> I think the non-extended stats code also has trouble with booleans.
> pg_stats gives me a correlation  of 0.8 or higher for the flag column.

It's not due to the boolean though; you see the same thing if you do:
CREATE INDEX aaa_f ON aaa((flag::text));
ANALYZE aaa;
correlation | 0.81193

or:
ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
correlation | 0.81193

I think it's caused by having so few (2) values to correlate.

most_common_vals       | {f,t}
most_common_freqs      | {0.9014,0.0986}
correlation            | 0.822792

It thinks there's somewhat-high correlation since it gets a list of x and y
values (integer positions by logical and physical sort order) and 90% of the x
list (logical value) are the same value ('t'), and the CTIDs are in order on
the new index, so 90% of the values are 100% correlated.

It improves (by which I mean here that it spits out a lower number) if it's not
a 90/10 split:

CREATE TABLE aaa5 AS SELECT (id%100)::int num, (id%10>5)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX ON aaa5 (flag);

tablename   | aaa5
attname     | flag
correlation | 0.522184

Justin


Re: Bitmap scan is undercosted?

Vitaliy Garnashevich
In reply to this post by Jeff Janes
On 02/12/2017 23:17, Jeff Janes wrote:
> Right, so there is a cpu costing problem (which could only be fixed by hacking postgresql and recompiling it), but it is a much smaller problem than the IO cost being inaccurate due to the high hit rate.  Fixing the CPU costing problem is unlikely to make a difference to your real query.  If you set the page costs to zero, what happens to your real query?
I can't reproduce the exact issue on the real database any more. The query started to use the slow bitmap scan recently, and had been doing so for some time, but now it has switched back to the index scan. The table involved in the query gets modified a lot. It has hundreds of millions of rows. Lots of new rows are appended to it every day, and the oldest rows are sometimes removed. The table is analyzed at least daily. It's possible that the statistics were updated and that caused the query to run differently. But I would still like to understand why the issue happened, and how to properly fix it, in case it returns.
 
>> But I doubt that the settings seq_page_cost = random_page_cost = 0.0 should actually be used.
>
> Why not?  If your production server really has everything in memory during normal operation, that is the correct course of action.  If you ever restart the server, then you could have some unpleasant time getting it back up to speed again, but pg_prewarm could help with that.
In the real database, not everything is in memory. There are 200GB+ of RAM, but the DB is 500GB+. The table involved in the query is itself 60GB+ of data and 100GB+ of indexes. I'm running the test case in a way where all reads are done from RAM, only to make it easier to reproduce and to avoid unrelated effects.

As far as I know, costs in Postgres were designed to be relative to seq_page_cost, which for that reason is usually defined as 1.0. Even if everything were in RAM, accesses to the pages would still not have zero cost. Setting 0.0 just seems too extreme, as all other non-zero costs would become infinitely bigger.
> If you really want to target the plan with the BitmapAnd, you should increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase cpu_tuple_cost.  That is because the unselective bitmap index scan does not incur any cpu_tuple_cost, but does incur index_tuple and operator costs.  Unfortunately all other index scans in the system will also be skewed by such a change if you make the change system-wide.

Exactly. I'd like to understand why the worse plan is being chosen, and 1) if it's fixable by tuning costs, to figure out the right settings to use in production, 2) if there is a bug in the Postgres optimizer, then to bring some attention to it, so that it's eventually fixed in a future release, 3) if Postgres is supposed to work this way, then at least I (and whoever reads this thread) would understand it better.

Regards,
Vitaliy



Re: Bitmap scan is undercosted?

Vitaliy Garnashevich
In reply to this post by Tom Lane-2
On 03/12/2017 01:44, Tom Lane wrote:

> I think it'd be a serious error to screw around with your cost settings
> on the basis of a single case in which the rowcount estimates are so
> far off.  It's really those estimates that are the problem AFAICS.
>
> The core issue in this example is that, the way the test data is set up,
> the "flag = true" condition actually adds no selectivity at all, because
> every row with "num = 1" is certain to have "flag = true".  If the planner
> realized that, it would certainly not bother with BitmapAnd'ing the flag
> index onto the results of the num index.  But it doesn't know that those
> columns are correlated, so it supposes that adding the extra index will
> give a 10x reduction in the number of heap rows that have to be visited
> (since it knows that only 1/10th of the rows have "flag = true").
> *That* is what causes the overly optimistic cost estimate for the
> two-index bitmapscan, and no amount of fiddling with the cost parameters
> will make that better.
Here I've tried to make a test which would not have correlation between
the two columns.

shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB

set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;

drop table if exists aaa;
create table aaa as select floor(random()*100)::int num, (random()*10 <
1)::bool flag from generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";10000033;44248;226
"i1";10000033;27422;365
"i2";10000033;27422;365

select flag, count(*) from aaa group by flag order by flag;
f;9000661
t;999339

select num, count(*) from aaa group by num order by num;
0;99852
1;99631
2;99699
3;100493
...
96;100345
97;99322
98;100013
99;100030

explain (analyze,verbose,costs,buffers)
select * from aaa where num = 1 and flag = true;

Bitmap Heap Scan on public.aaa  (cost=12829.83..24729.85 rows=10340
width=5) (actual time=104.941..112.649 rows=9944 loops=1)
   Output: num, flag
   Recheck Cond: (aaa.num = 1)
   Filter: aaa.flag
   Heap Blocks: exact=8922
   Buffers: shared hit=11932
   ->  BitmapAnd  (cost=12829.83..12829.83 rows=10340 width=0) (actual
time=102.926..102.926 rows=0 loops=1)
         Buffers: shared hit=3010
         ->  Bitmap Index Scan on i1  (cost=0.00..1201.44 rows=103334
width=0) (actual time=15.459..15.459 rows=99631 loops=1)
               Index Cond: (aaa.num = 1)
               Buffers: shared hit=276
         ->  Bitmap Index Scan on i2  (cost=0.00..11622.97 rows=1000671
width=0) (actual time=76.906..76.906 rows=999339 loops=1)
               Index Cond: (aaa.flag = true)
               Buffers: shared hit=2734
Planning time: 0.110 ms
Execution time: 113.272 ms

Index Scan using i1 on public.aaa  (cost=0.44..66621.56 rows=10340
width=5) (actual time=0.027..47.075 rows=9944 loops=1)
   Output: num, flag
   Index Cond: (aaa.num = 1)
   Filter: aaa.flag
   Rows Removed by Filter: 89687
   Buffers: shared hit=39949
Planning time: 0.104 ms
Execution time: 47.351 ms

>
> I tried creating multiple-column statistics using the v10 facility for
> that:
>
> regression=# create statistics s1 on num, flag from aaa;
> CREATE STATISTICS
> regression=# analyze aaa;
> ANALYZE
>
> but that changed the estimate not at all, which surprised me because
> dependency statistics are supposed to fix exactly this type of problem.
> I suspect there may be something in the extended-stats code that causes it
> not to work right for boolean columns --- this wouldn't be excessively
> surprising because of the way the planner messes around with converting
> "flag = true" to just "flag" and sometimes back again.  But I've not
> looked closer yet.
>
> regards, tom lane
> .
>



Re: Bitmap scan is undercosted?

Vitaliy Garnashevich
In reply to this post by Jeff Janes
On 03/12/2017 03:27, Jeff Janes wrote:
> Due to that, when I disable bitmapscans and seqscans, I start getting slow index scans on the wrong index, i2 rather than i1.  I don't know why he doesn't see that in his example.
When I increase effective_cache_size to 1024MB, I start getting the plan with the slower index i2, too.

Bitmap Heap Scan on public.aaa  (cost=12600.90..23688.70 rows=9488 width=5) (actual time=107.529..115.902 rows=9976 loops=1)
  ->  BitmapAnd  (cost=12600.90..12600.90 rows=9488 width=0) (actual time=105.133..105.133 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1116.43 rows=96000 width=0) (actual time=16.313..16.313 rows=100508 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..11479.47 rows=988338 width=0) (actual time=77.950..77.950 rows=1000200 loops=1)

Index Scan using i2 on public.aaa  (cost=0.44..48227.31 rows=9488 width=5) (actual time=0.020..285.695 rows=9976 loops=1)

Seq Scan on public.aaa  (cost=0.00..169248.54 rows=9488 width=5) (actual time=0.024..966.469 rows=9976 loops=1)

This way the estimates and the actual times make more sense. But then there's the question - maybe it's the i1 scan that runs too fast and is estimated incorrectly? Why does that happen?

Here are the complete plans with the two different kinds of index scans once again:

Index Scan using i1 on public.aaa  (cost=0.44..66621.56 rows=10340 width=5) (actual time=0.027..47.075 rows=9944 loops=1)
  Output: num, flag
  Index Cond: (aaa.num = 1)
  Filter: aaa.flag
  Rows Removed by Filter: 89687
  Buffers: shared hit=39949
Planning time: 0.104 ms
Execution time: 47.351 ms

Index Scan using i2 on public.aaa  (cost=0.44..48227.31 rows=9488 width=5) (actual time=0.020..285.695 rows=9976 loops=1)
  Output: num, flag
  Index Cond: (aaa.flag = true)
  Filter: (aaa.flag AND (aaa.num = 1))
  Rows Removed by Filter: 990224
  Buffers: shared hit=46984
Planning time: 0.098 ms
Execution time: 286.081 ms


// The test DB was populated with: create table aaa as select floor(random()*100)::int num, (random()*10 < 1)::bool flag from generate_series(1, 10000000) id;

Regards,
Vitaliy


Re: Bitmap scan is undercosted? - boolean correlation

Jeff Janes
In reply to this post by Justin Pryzby
On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <[hidden email]> wrote:
> On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:
>> I think the non-extended stats code also has trouble with booleans.
>> pg_stats gives me a correlation of 0.8 or higher for the flag column.
>
> It's not due to the boolean though; you see the same thing if you do:
> CREATE INDEX aaa_f ON aaa((flag::text));
> ANALYZE aaa;
> correlation | 0.81193
>
> or:
> ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
> correlation | 0.81193
>
> I think it's caused by having so few (2) values to correlate.
>
> most_common_vals       | {f,t}
> most_common_freqs      | {0.9014,0.0986}
> correlation            | 0.822792
>
> It thinks there's somewhat-high correlation since it gets a list of x and y
> values (integer positions by logical and physical sort order) and 90% of the x
> list (logical value) are the same value ('t'), and the CTIDs are in order on
> the new index, so 90% of the values are 100% correlated.

But there is no index involved (except in the case of the functional index).  The correlation of table columns to physical order of the table doesn't depend on the existence of an index, or the physical order within an index.

But I do see that ties within the logical order of the column values are broken to agree with the physical order.  That is wrong, right?  Is there any argument that this is desirable?

It looks like it could be fixed with a few extra double calcs per distinct value.  Considering we already sorted the sample values using SQL-callable collation-dependent comparators, I doubt a few C-level double calcs are going to be meaningful.

Cheers,

Jeff

Re: Bitmap scan is undercosted? - boolean correlation

Tom Lane-2
Jeff Janes <[hidden email]> writes:
> On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <[hidden email]> wrote:
>> It thinks there's somewhat-high correlation since it gets a list of x
>> and y values (integer positions by logical and physical sort order) and
>> 90% of the x list (logical value) are the same value ('t'), and the
>> CTIDs are in order on the new index, so 90% of the values are 100%
>> correlated.

> But there is no index involved (except in the case of the functional
> index).  The correlation of table columns to physical order of the table
> doesn't depend on the existence of an index, or the physical order within
> an index.

> But I do see that ties within the logical order of the column values are
> broken to agree with the physical order.  That is wrong, right?  Is there
> any argument that this is desirable?

Uh ... what do you propose doing instead?  We'd have to do something with
ties, and it's not so obvious this way is wrong.

                        regards, tom lane


Re: Bitmap scan is undercosted? - boolean correlation

Jeff Janes


On Dec 3, 2017 15:31, "Tom Lane" <[hidden email]> wrote:
> Jeff Janes <[hidden email]> writes:
>> On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <[hidden email]> wrote:
>>> It thinks there's somewhat-high correlation since it gets a list of x
>>> and y values (integer positions by logical and physical sort order) and
>>> 90% of the x list (logical value) are the same value ('t'), and the
>>> CTIDs are in order on the new index, so 90% of the values are 100%
>>> correlated.
>
>> But there is no index involved (except in the case of the functional
>> index).  The correlation of table columns to physical order of the table
>> doesn't depend on the existence of an index, or the physical order within
>> an index.
>
>> But I do see that ties within the logical order of the column values are
>> broken to agree with the physical order.  That is wrong, right?  Is there
>> any argument that this is desirable?
>
> Uh ... what do you propose doing instead?  We'd have to do something with
> ties, and it's not so obvious this way is wrong.

Let them be tied.  If there are 10 distinct values, number the values 0 to 9, and all rows of a given distinct value get the same number for the logical order axis.

Calling the correlation 0.8 when it is really 0.0 seems obviously wrong to me.  Although if we switched btree to store duplicate values with tid as a tie breaker, then maybe it wouldn't be as obviously wrong.

Cheers,

Jeff 

Re: Bitmap scan is undercosted? - boolean correlation

Tom Lane-2
Jeff Janes <[hidden email]> writes:
> On Dec 3, 2017 15:31, "Tom Lane" <[hidden email]> wrote:
>> Jeff Janes <[hidden email]> writes:
>>> But I do see that ties within the logical order of the column values are
>>> broken to agree with the physical order.  That is wrong, right?  Is there
>>> any argument that this is desirable?

>> Uh ... what do you propose doing instead?  We'd have to do something with
>> ties, and it's not so obvious this way is wrong.

> Let them be tied.  If there are 10 distinct values, number the values 0 to
> 9, and all rows of a given distinct values get the same number for the
> logical order axis.
> Calling the correlation 0.8 when it is really 0.0 seems obviously wrong to
> me.  Although if we switched btree to store duplicate values with tid as a
> tie breaker, then maybe it wouldn't be as obviously wrong.

I thought some more about this.  What we really want the correlation stat
to do is help us estimate how randomly an index-ordered scan will access
the heap.  If the values we've sampled are all unequal then there's no
particular issue.  However, if we have some group of equal values, we
do not really know what order an indexscan will visit them in.  The
existing correlation calculation is making the *most optimistic possible*
assumption, that such a group will be visited exactly in heap order ---
and that assumption isn't too defensible.  IIRC, a freshly built b-tree
will behave that way, because the initial sort of a btree breaks ties
using heap TIDs; but we don't maintain that property during later
insertions.  In any case, given that we do this calculation without regard
to any specific index, we can't reasonably expect to model exactly what
the index will do.  It would be best to adopt some middle-of-the-road
assumption about what the heap visitation order will be for a set of
duplicate values: not exactly heap order, but I think we should not use
a worst-case assumption either, since the btree may retain some amount
of its initial ordering.

BTW, I disagree that "correlation = zero" is the right answer for this
particular example.  If the btree is freshly built, then an index-order
scan would visit all the heap pages in sequence to fetch "f" rows, and
then visit them all in sequence again to fetch "t" rows, which is a whole
lot better than the purely random access that zero correlation implies.
So I think 0.8 or so is actually a perfectly reasonable answer when the
index is fresh.  The trouble is just that it'd behoove us to derate that
answer somewhat for the probability that the index isn't fresh.

My first thought for a concrete fix was to use the mean position of
a group of duplicates for the purposes of the correlation calculation,
but on reflection that's clearly wrong.  We do have an idea, from the
data we have, whether the duplicates are close together in the heap
or spread all over.  Using only mean position would fail to distinguish
those cases, but really we'd better penalize the spread-all-over case.
I'm not sure how to do that.

Or we could leave this calculation alone and try to move towards keeping
equal values in TID order in btrees.  Not sure about the downsides of
that, though.

                        regards, tom lane
