BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16885
Logged by:          Tavin Cole
Email address:      [hidden email]
PostgreSQL version: 13.2
Operating system:   Amazon Linux 2
Description:        

We're trying to upgrade from the version 9 series and have a deal-breaking
slow query, which runs okay in 10 and 11 but is many times slower in 12+.
I've tested across minor versions from 12.1 through 12.6, and the minor
version doesn't affect it. I've also found the same behavior in 13.2.

The problem lies in the planner choosing a nested loop join instead of the
hash join it should use. Please see this stackoverflow post for the explain
analyze query plans illustrating the difference:
https://stackoverflow.com/questions/66248669/postgresql-upgrade-to-12-changes-hash-join-to-slow-nested-loop

I note the nested loop join filter removes some 60 million rows. This is 60x
the row count of the tables involved.

Postgres is installed from the official pgdg rpm. The upgrade in our test
environment is done with pg_upgrade. Then a full analyze is done before
testing the query. I've tried setting default_statistics_target as high as
500 with no improvement. In fact the query only runs fast before the
statistics are gathered.
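For reference, the kind of commands involved (a sketch, not my exact session):

-- Sketch only: raise the per-column statistics target (default 100),
-- then re-gather statistics, as was done after pg_upgrade.
ALTER SYSTEM SET default_statistics_target = 500;
SELECT pg_reload_conf();
ANALYZE;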

Thanks for your attention!


Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

Magnus Hagander
On Mon, Feb 22, 2021 at 11:46 AM PG Bug reporting form
<[hidden email]> wrote:

> We're trying to upgrade from the version 9 series and have a deal-breaking
> slow query, which runs okay in 10 and 11 but is many times slower in 12+.
>
> The problem lies in the planner choosing a nested loop join instead of the
> hash join it should use.

You'll have to post your actual query to get proper feedback.

That said, a typical cause of this, specifically when moving to version
12, is that CTEs are no longer optimization barriers. If you do
use CTEs in your query, try changing them from "WITH x AS (...)" to
"WITH x AS MATERIALIZED (...)" to return to the previous behaviour and
see if that improves things.
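For example, with a made-up query (not yours), the change looks like this:

-- Made-up example. In 12+ a plain CTE like this can be inlined into the
-- outer query; adding MATERIALIZED restores the old behaviour of
-- computing it once behind an optimization barrier.
WITH active_customers AS MATERIALIZED (
    SELECT id, name
    FROM customers
    WHERE active
)
SELECT o.id, a.name
FROM orders o
JOIN active_customers a ON a.id = o.customer_id;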

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

Tavin Cole
On Mon, Feb 22, 2021 at 11:56 AM Magnus Hagander <[hidden email]> wrote:

> You'll have to post your actual query to get proper feedback.
>
> That said, a typical cause of this, specifically when moving to version
> 12, is that CTEs are no longer optimization barriers. If you do
> use CTEs in your query, try changing them from "WITH x AS (...)" to
> "WITH x AS MATERIALIZED (...)" to return to the previous behaviour and
> see if that improves things.

I don't know if I will be allowed to publicly post the actual query. I can say it's quite complicated, and involves the following features:

* an inner SELECT DISTINCT from a particular primary table and some joins
* a join to the same primary table in the outer query, among others
* a join to a view in the outer query
* said view uses CTEs like you say, though based on unrelated tables
* then said view makes a UNION of 2 different SELECTs
* each of these again joins to the same primary table

So I'm not surprised the query planner gets confused. And your suggestion works: materializing those CTEs makes the planner choose the hash join again.
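To give a rough idea of the shape, with entirely made-up names since I can't share the real definitions:

-- Rough sketch of the shape only; all names are invented.
CREATE VIEW combined_view AS
WITH helper AS MATERIALIZED (      -- the CTEs I had to re-materialize
    SELECT ref_id FROM unrelated_table WHERE some_flag
)
SELECT p.id, p.category
FROM primary_table p
JOIN helper h ON h.ref_id = p.id
WHERE p.kind = 'a'
UNION
SELECT p.id, p.category
FROM primary_table p
JOIN helper h ON h.ref_id = p.id
WHERE p.kind <> 'a';

-- Outer query: an inner SELECT DISTINCT on the primary table, a join back
-- to the primary table, and a join to the view.
SELECT p.*, v.category
FROM (SELECT DISTINCT p2.id
      FROM primary_table p2
      JOIN other_table o ON o.p_id = p2.id) AS inner_q
JOIN primary_table p ON p.id = inner_q.id
JOIN combined_view v ON v.id = p.id;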

I still think there is a bug here though. The query planner is getting something really wrong, and at least now we know it's a consequence of trying to optimize through the CTEs.

I will see what I can do about sharing the actual query. If this is interesting to a pg developer, you're welcome to contact me.

Best,
Tavin

Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

Pavel Stehule


On Mon, Feb 22, 2021 at 3:16 PM Tavin Cole <[hidden email]> wrote:
> I still think there is a bug here though. The query planner is getting
> something really wrong, and at least now we know it's a consequence of
> trying to optimize through the CTEs.
>
> I will see what I can do about sharing the actual query. If this is
> interesting to a pg developer, you're welcome to contact me.

The optimizer is driven by estimates. With bad estimates, the planner's result is essentially random: sometimes it is better, sometimes it is worse. A materialized CTE works as an optimization barrier. In cases like this it can help, and in other cases a materialized CTE can make things worse.

You don't need to share the query - just send an execution plan. https://explain.depesz.com/ can anonymize the execution plan if necessary.

Regards

Pavel

Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

Tavin Cole
On Mon, Feb 22, 2021 at 3:21 PM Pavel Stehule <[hidden email]> wrote:
> You don't need to share the query - just send an execution plan. https://explain.depesz.com/ can anonymize the execution plan if necessary.


Cheers,
Tavin 

Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

Pavel Stehule


On Mon, Feb 22, 2021 at 3:25 PM Tavin Cole <[hidden email]> wrote:
> On Mon, Feb 22, 2021 at 3:21 PM Pavel Stehule <[hidden email]> wrote:
> > You don't need to share the query - just send an execution plan. https://explain.depesz.com/ can anonymize the execution plan if necessary.


You can see a lot of underestimates and overestimates:

Hash Join (cost=15,018.84..89,129.05 rows=84,049 width=220) (actual time=2,930.751..7,516.315 rows=778,056 loops=1) 

But the most significant problem is probably the underestimate of the aggregation:

HashAggregate (cost=275,174.24..278,605.86 rows=343,162 width=487) (actual time=269.775..1,943.691 rows=1,053,462 loops=57)

The estimated cost is lower than the real cost, and that can make the nested loop look cheaper.
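For illustration, one quick check is to temporarily disable nested loops for a test run and compare the plans:

-- Diagnostic only, not a fix: forbid nested loops for this session and
-- compare the resulting plan and runtime with the nested-loop plan.
SET enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the slow query goes here
RESET enable_nestloop;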

Regards

Pavel

Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

Tavin Cole
On Mon, Feb 22, 2021 at 4:37 PM Pavel Stehule <[hidden email]> wrote:

> You can see a lot of underestimates and overestimates:
>
> Hash Join (cost=15,018.84..89,129.05 rows=84,049 width=220) (actual time=2,930.751..7,516.315 rows=778,056 loops=1)
>
> But the most significant problem is probably the underestimate of the aggregation:
>
> HashAggregate (cost=275,174.24..278,605.86 rows=343,162 width=487) (actual time=269.775..1,943.691 rows=1,053,462 loops=57)
>
> The estimated cost is lower than the real cost, and that can make the nested loop look cheaper.


Yes, this must be the UNION in the view I mentioned. Both halves of the UNION select from the same table with different WHERE clauses, in such a way that together they return virtually all of the 1+ million rows. But it seems the planner estimates that a majority of the rows will be filtered out before the UNION is applied.
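Schematically (made-up columns, not the real view), the UNION looks something like this:

-- The two WHERE clauses are nearly complementary, so together the branches
-- return almost every row, yet each branch is estimated as if its filter
-- were highly selective.
SELECT id, category FROM primary_table WHERE status = 'open'
UNION
SELECT id, category FROM primary_table WHERE status <> 'open';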

Now I see that the planner makes the same miscalculation in the v11 plan, but at a point in the execution plan where it is harmless.

It seems Postgres was never able to estimate this query correctly, but in the past we got lucky with the optimization barrier created by the CTEs.

I wonder whether this is a situation the planner might be able to estimate correctly in the future, with some improvements, or whether it is simply impossible because the SQL is poorly optimized?

Thanks all for the help.

Best,
Tavin


Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

Pavel Stehule


On Mon, Feb 22, 2021 at 8:31 PM Tavin Cole <[hidden email]> wrote:
> I wonder whether this is a situation the planner might be able to estimate
> correctly in the future, with some improvements, or whether it is simply
> impossible because the SQL is poorly optimized?

It is a difficult question - some estimation errors can be fixed, some cannot. It depends on the data and on the data model (some data models are unhappily designed). Sometimes it is better to break a query into subqueries, store the results in temp tables, ANALYZE those temp tables, and then continue with the query. This usually fixes estimation errors well, but temp tables come with their own overhead.
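For example, a sketch of that approach with made-up names:

-- Materialize the hard-to-estimate part, gather real statistics on it,
-- and let the rest of the query plan against accurate row counts.
CREATE TEMP TABLE view_rows AS
SELECT id, category FROM combined_view;

ANALYZE view_rows;

SELECT p.*, v.category
FROM primary_table p
JOIN view_rows v ON v.id = p.id;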

Tomas Vondra is working on expression statistics, and I can say that estimation has made big progress in the last three years. But the precision of any estimation will always be limited.
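Extended statistics (available since PostgreSQL 10) can already describe correlated columns; expression statistics extend this further. A sketch with made-up names:

-- Declare that two columns are correlated so combined WHERE clauses on
-- them are not treated as independent filters, then re-gather statistics.
CREATE STATISTICS primary_table_stats (dependencies, ndistinct)
    ON kind, status FROM primary_table;
ANALYZE primary_table;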


