Weird seqscan node plan


Weird seqscan node plan

Игорь Выскорко
Hi all!
I'm confused by the planner's choice: a seq scan instead of an index scan, even though the index scan cost is much cheaper.
1st plan: https://explain.depesz.com/s/Cti#l8
2nd plan (with "set enable_seqscan = off"): https://explain.depesz.com/s/qn0I#l8

Look at the 8th row: the nested loop decided to join its second input using a seq scan (1st plan) even though an index is available (2nd plan). The index scan is much cheaper (0.430 vs. 257.760).

What am I missing?

And thanks for any reply!
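
A minimal sketch of how the two plans above can be compared in one session; the actual query isn't shown in the thread, so a trivial statement stands in for it:

  -- 1st plan: the planner is free to choose a seq scan
  EXPLAIN ANALYZE SELECT count(*) FROM pg_class;

  -- 2nd plan: heavily penalize seq scans for this session only
  SET enable_seqscan = off;
  EXPLAIN ANALYZE SELECT count(*) FROM pg_class;
  RESET enable_seqscan;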



Re: Weird seqscan node plan

Andrei Zhidenkov
How many tables do you have in your query? If there are too many, the Genetic Query Optimizer (GEQO) might be used in your case (https://www.postgresql.org/docs/10/geqo-pg-intro.html).





Re: Weird seqscan node plan

Игорь Выскорко
 
 
26.11.2019, 16:02, "Andrei Zhidenkov" <[hidden email]>:
How many tables do you have in your query? If there are too many, the Genetic Query Optimizer (GEQO) might be used in your case (https://www.postgresql.org/docs/10/geqo-pg-intro.html).
 
 
 
Hm... about 12 tables.
I tried disabling GEQO (set geqo = off;), but the plan didn't change. Thanks for the suggestion, though :)

The version I'm using (in case it matters):
select version();
PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
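
For reference, a quick sketch of checking the GEQO-related settings involved here (the GUC names are real; the comments show the 9.6 defaults):

  SHOW geqo;                 -- on by default
  SHOW geqo_threshold;       -- 12 by default: GEQO kicks in at 12 or more FROM items
  SHOW from_collapse_limit;  -- 8 by default
  SHOW join_collapse_limit;  -- 8 by default
  SET geqo = off;            -- what was tried above: disables the genetic optimizer for the session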

RE: Weird seqscan node plan

Igor Neyman


Try increasing the following parameters to 14 (or even 16, if you are not sure about the number of tables involved):

geqo_threshold = 14
from_collapse_limit = 14
join_collapse_limit = 14

“About 12” is too close to the default geqo_threshold of 12.
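
A sketch of trying these values for the current session only, without touching postgresql.conf:

  SET geqo_threshold = 14;
  SET from_collapse_limit = 14;
  SET join_collapse_limit = 14;
  -- then re-run EXPLAIN ANALYZE on the query and compare the plan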

 

Regards,

Igor Neyman

 


Re: Weird seqscan node plan

Игорь Выскорко


> Try increasing the following parameters to 14 (or even 16, if you are not sure about the number of tables involved):
>
> geqo_threshold = 14
> from_collapse_limit = 14
> join_collapse_limit = 14
>
> “About 12” is too close to the default geqo_threshold of 12.

Hi Igor,

Is "set geqo = off;" not enough to fully disable geqo?
I know what GEQO is and I know about the threshold at which it kicks in. Moreover, I actually tried setting these parameters to 100 and it didn't help.

> Hi Игорь,
>
> I suggest running EXPLAIN ANALYZE to see what the actual query results vs. the planner's estimates are.
>
> Post the SQL code.

Hi Justin,
Let me show 2 variants of EXPLAIN ANALYZE which differ only in the actual number of rows returned by the inner node (9th row):

1st case:  https://explain.depesz.com/s/lA4f
45358 rows are actually returned, and Postgres decided to join each of those 45358 rows with every row in yankee_foxtrot using a seq scan:
  Seq Scan on yankee_foxtrot foxtrot_bravo (cost=0.000..267.670 rows=7,467 width=13) (actual time=0.003..1.090 rows=7,467 loops=45,358)
 and then filter it:
  Rows Removed by Join Filter: 338685224
(45,358 loops × 7,467 rows per loop is about 338.7 million candidate rows, almost all of which are thrown away by the join filter.)
It was an awful choice =)

2nd case: https://explain.depesz.com/s/zkKY
4130 rows are returned, and now an index only scan is in action.


Why the planner misestimates the number of rows in this step (every time it expects only 1 row) I can understand: the inner nodes do some joins (inner and outer, with filtering) and it's hard to predict the result.
But what I can't understand is why a seq scan is chosen when it is always slower than the index here. I forgot to mention that the join condition is on a unique key. So, when the planner expects only 1 row, it should join only one row from the second table!
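
To make the expectation concrete, a sketch of probing the inner table on its own; "uniq_col" is a hypothetical stand-in for the unique join key, which the anonymized plans don't name:

  EXPLAIN ANALYZE
  SELECT * FROM yankee_foxtrot WHERE uniq_col = 42;
  -- with a unique index on uniq_col this should be a single-row index scan,
  -- which is what makes the seq scan inside the nested loop so surprising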

>
> Also pull this part of the query out and run it by itself to see if the "Planner" changes  how  it joins these two tables.  
>
> May need to increase the statistics collected
> https://www.postgresql.org/docs/12/planner-stats.html
>
> also read this
> https://www.postgresql.org/docs/12/explicit-joins.html

At this step of planning, statistics can't help (let me know if I'm wrong). Using stats, the planner knows everything about the second table (yankee_foxtrot), but it can only guess approximate stats for the first set (and it is actually wrong about them).
And yes, I know about the possibility of explicit join ordering.
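
For completeness, this is the kind of change the planner-stats link describes, should anyone want to try it anyway (a sketch; the column name is hypothetical):

  ALTER TABLE yankee_foxtrot ALTER COLUMN uniq_col SET STATISTICS 1000;
  ANALYZE yankee_foxtrot;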

> forgot to state
>
> Generally, it's index scan -> bitmap index scan -> sequential scan: the higher the ratio of rows expected to be retrieved to the number of rows in the table, the more likely the planner will go for a sequential scan.

The key word here is "expected", I assume, and according to the planner's expectation there should be only 1 row...


So, to conclude: I know how to make my query faster (how to get rid of this seq scan), but what I really want to understand is: WHY is the seq scan in my plan? Why does the planner think it's the best choice?



Re: Weird seqscan node plan

Andrei Zhidenkov
At this point I disagree. It can be faster to fetch one row using a seq scan than using an index scan, and fetching a number of consecutive rows can also be faster via a seq scan. An index scan is not always faster.

On 27. Nov 2019, at 04:53, Игорь Выскорко <[hidden email]> wrote:

Why the planner misestimates the number of rows in this step (every time it expects only 1 row) I can understand: the inner nodes do some joins (inner and outer, with filtering) and it's hard to predict the result.
But what I can't understand is why a seq scan is chosen when it is always slower than the index here.


Re: Weird seqscan node plan

Игорь Выскорко


27.11.2019, 15:42, "Andrei Zhidenkov" <[hidden email]>:
> At this point I disagree. It can be faster to fetch one row using a seq scan than using an index scan, and fetching a number of consecutive rows can also be faster via a seq scan. An index scan is not always faster.
>

Yes, you are right in general: an index scan is not always faster.
But in my current case I have a table with ~8k tuples (309 relpages), and to find 1 row in the worst case (when that row is in the last page) we need 309 heap fetches.
For the same table, to find one unique tuple via the index we need about 4 index fetches (not sure about this number) and 1 heap fetch. That's why I decided that the index scan is faster.

While thinking about your point, I looked at the pg_class table to determine relpages for both the index and the table. The index had more than 700 pages... Index bloat?
So, I dropped and recreated the index. Now it takes only 33 pages. And yes, my plan is now using an index only scan.
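
A sketch of the check and the fix described above; the index name is hypothetical, since the anonymized plans don't show it:

  SELECT relname, relkind, relpages, reltuples
  FROM pg_class
  WHERE relname IN ('yankee_foxtrot', 'yankee_foxtrot_uniq_idx');

  -- rebuild in place instead of dropping and recreating by hand
  REINDEX INDEX yankee_foxtrot_uniq_idx;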

OK, last question here: even with the bloated index, the overall number of index fetches should be much lower than 309. Am I wrong?