How does the planner determine plan_rows ?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

How does the planner determine plan_rows ?

Donald Dong
Hi,

I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found
the results have different row numbers, but the tables are all empty.

=# CREATE TABLE t1(id INT, data INT);
=# EXPLAIN ANALYZE SELECT * FROM t1;
  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
  time=0.003..0.003 rows=0 loops=1)

=# CREATE TABLE t2(data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t2;
  Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
  time=0.002..0.002 rows=0 loops=1)

=# CREATE TABLE t3(id INT, data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t3;
  Seq Scan on t3  (cost=0.00..22.70 rows=1270 width=36) (actual
  time=0.001..0.001 rows=0 loops=1)

I found this behavior unexpected. I'm still trying to find out how/where the planner
determines the plan_rows. Any help will be appreciated!

Thank you,
Donald Dong
Reply | Threaded
Open this post in threaded view
|

Re: How does the planner determine plan_rows ?

Andrew Gierth
>>>>> "Donald" == Donald Dong <[hidden email]> writes:

 Donald> Hi,
 Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
 Donald> `SELECT * `. I found the results have different row numbers,
 Donald> but the tables are all empty.

Empty tables are something of a special case, because the planner
doesn't assume that they will _stay_ empty, and using an estimate of 0
or 1 rows would tend to create a distorted plan that would likely blow
up in runtime as soon as you insert a second row.

The place to look for info would be estimate_rel_size in
optimizer/util/plancat.c, from which you can see that empty tables get
a default size estimate of 10 pages. Thus:

 Donald> =# CREATE TABLE t1(id INT, data INT);
 Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
 Donald>   Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
 Donald>   time=0.003..0.003 rows=0 loops=1)

An (int,int) tuple takes about 36 bytes, so you can get about 226 of
them on a page, so 10 pages is 2260 rows.

 Donald> =# CREATE TABLE t2(data VARCHAR);
 Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
 Donald>   Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
 Donald>   time=0.002..0.002 rows=0 loops=1)

Size of a varchar with no specified length isn't known, so the planner
determines an average length of 32 by the time-honoured method of rectal
extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
page.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: How does the planner determine plan_rows ?

Donald Dong
Thank you for the great explanation!

> On Jan 10, 2019, at 7:48 PM, Andrew Gierth <[hidden email]> wrote:
>
>>>>>> "Donald" == Donald Dong <[hidden email]> writes:
>
> Donald> Hi,
> Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
> Donald> `SELECT * `. I found the results have different row numbers,
> Donald> but the tables are all empty.
>
> Empty tables are something of a special case, because the planner
> doesn't assume that they will _stay_ empty, and using an estimate of 0
> or 1 rows would tend to create a distorted plan that would likely blow
> up in runtime as soon as you insert a second row.
>
> The place to look for info would be estimate_rel_size in
> optimizer/util/plancat.c, from which you can see that empty tables get
> a default size estimate of 10 pages. Thus:
>
> Donald> =# CREATE TABLE t1(id INT, data INT);
> Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
> Donald>   Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
> Donald>   time=0.003..0.003 rows=0 loops=1)
>
> An (int,int) tuple takes about 36 bytes, so you can get about 226 of
> them on a page, so 10 pages is 2260 rows.
>
> Donald> =# CREATE TABLE t2(data VARCHAR);
> Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
> Donald>   Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
> Donald>   time=0.002..0.002 rows=0 loops=1)
>
> Size of a varchar with no specified length isn't known, so the planner
> determines an average length of 32 by the time-honoured method of rectal
> extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
> page.
>
> --
> Andrew (irc:RhodiumToad)


Reply | Threaded
Open this post in threaded view
|

Re: How does the planner determine plan_rows ?

Tom Lane-2
In reply to this post by Donald Dong
Donald Dong <[hidden email]> writes:
> I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found
> the results have different row numbers, but the tables are all empty.

This isn't a terribly interesting case, since you've neither loaded
any data nor vacuumed/analyzed the table, but ...

> I found this behavior unexpected. I'm still trying to find out how/where the planner
> determines the plan_rows.

... estimate_rel_size() in plancat.c is where to look to find out
about the planner's default estimates when it's lacking hard data.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: How does the planner determine plan_rows ?

Donald Dong

> On Jan 10, 2019, at 8:01 PM, Tom Lane <[hidden email]> wrote:
>
> ... estimate_rel_size() in plancat.c is where to look to find out
> about the planner's default estimates when it's lacking hard data.

Thank you! Now I see how the planner uses the rows to estimate the cost and
generates the best_plan.

To me, tracing the function calls is not a simple task. I'm using cscope, and I
use printf when I'm not entirely sure. I was considering to use gbd, but I'm
having issues referencing the source code in gdb.

I'm very interested to learn how the professionals explore the codebase!