15x slower PreparedStatement vs raw query

15x slower PreparedStatement vs raw query

Alex-10

PreparedStatement: 15s
Raw query with embedded params: 1s

See issue on GitHub with query and explain analyze:
https://github.com/pgjdbc/pgjdbc/issues/2145


Re: 15x slower PreparedStatement vs raw query

Justin Pryzby
On Sun, May 02, 2021 at 07:45:26PM +0000, Alex wrote:
> PreparedStatement: 15s
> Raw query with embedded params: 1s
> See issue on github with query and explain analyze:
> https://github.com/pgjdbc/pgjdbc/issues/2145

| ..PostgreSQL Version?  12
|Prepared statement
|...
|Planning Time: 11.596 ms
|Execution Time: 14799.266 ms
|
|Raw statement
|Planning Time: 22.685 ms
|Execution Time: 1012.992 ms

The prepared statement has 2x faster planning time, which is what it's meant to
improve.

The execution time is slower, and I think you can improve it with this.
https://www.postgresql.org/docs/12/runtime-config-query.html#GUC-PLAN-CACHE_MODE
|plan_cache_mode (enum)
|    Prepared statements (either explicitly prepared or implicitly generated, for example by PL/pgSQL) can be executed using custom or generic plans. Custom plans are made afresh for each execution using its specific set of parameter values, while generic plans do not rely on the parameter values and can be re-used across executions. Thus, use of a generic plan saves planning time, but if the ideal plan depends strongly on the parameter values then a generic plan may be inefficient. The choice between these options is normally made automatically, but it can be overridden with plan_cache_mode. The allowed values are auto (the default), force_custom_plan and force_generic_plan. This setting is considered when a cached plan is to be executed, not when it is prepared. For more information see PREPARE.
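For example, a minimal sketch (run in the same session as the prepared statement; the statement name here is hypothetical):

```sql
-- Force a custom plan for every execution, so the planner always
-- plans with the actual parameter values instead of a cached
-- generic plan:
SET plan_cache_mode = force_custom_plan;

-- Then re-run the prepared statement and compare timings:
-- EXPLAIN ANALYZE EXECUTE your_statement(...);
```

This can also be set per-user or per-database with ALTER ROLE / ALTER DATABASE ... SET if changing the application is impractical.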

--
Justin



Re: 15x slower PreparedStatement vs raw query

Alex-10
Shouldn't this process be automatic, based on some heuristics?

Saving 10 ms of planning but costing 14 s of execution is catastrophic.

For example, statistics could be used to limit planner time to some percentage of previous execution times.
That way, if the query is fast, planning is fast; but if the query is slow, more planning can save a huge amount of execution time.
This would be a better general-purpose default, and users who want fast planning could set the variable to use the generic plan.



Justin Pryzby wrote:
> [...]

Re: 15x slower PreparedStatement vs raw query

Rick Otten-2


On Tue, May 4, 2021 at 6:05 AM Alex <[hidden email]> wrote:
Shouldn't this process be automatic based on some heuristics?

Saving 10ms planning but costing 14s execution is catastrophic.

For example, using some statistics to limit planner time to some percentage of previous executions.
This way, if the query is fast, planning is fast, but if the query is slow, more planning can save huge execution time.
This is a better general usage option and should be enabled by default, and users who want fast planning should set the variable to use the generic plan.



"fast" and "slow" are relative things.  There are many queries that I would be overjoyed with if they completed in 5 _minutes_.  And others where they have to complete within 100ms or something is really wrong.  We don't really know what the execution time is until the query actually executes.  Planning is a guess for the best approach.

Another factor is whether the data is in cache or out on disk.  Sometimes you don't really know until you try to go get it.  That can significantly change query performance and plans - especially if some of the tables in a query with a lot of joins are in cache and some aren't and maybe some have to be swapped out to pick up others.

If you are running the same dozen queries with different but similarly scoped parameters over and over, one would hope that the system would slowly tune itself to be highly optimized for those dozen queries.  That is a pretty narrow use case for a powerful general purpose relational database though.


Re: 15x slower PreparedStatement vs raw query

Alex-10
"Powerful general purpose relational database" but not smart... 

I propose a feature that uses information from previously executed queries to adjust planning time accordingly.
Reusing the same generic plan can and will lead to very long execution times.



Rick Otten wrote:
> [...]


Re: 15x slower PreparedStatement vs raw query

Laurenz Albe
On Tue, 2021-05-04 at 13:59 +0000, Alex wrote:
> "Powerful general purpose relational database" but not smart...

Too smart can easily become slow...

> I propose a feature to use information from previously executed queries to adjust the query plan time accordingly.
> Reusing the same generic plan may and will lead to very long execution times.

AI can go wrong too, and I personally would be worried that such cases
are very hard to debug...

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com




Re: 15x slower PreparedStatement vs raw query

Vijaykumar Jain-2
I am not an expert on this, but I would like to take a shot :)

Is it possible to share your prepared statement and parameter types? I mean something like this:

PREPARE usrrptplan (int) AS
    SELECT * FROM users u, logs l WHERE u.usrid=$1 AND u.usrid=l.usrid
    AND l.date = $2;
EXECUTE usrrptplan(1, current_date);

It's just that sometimes the datatypes of the prepared-statement parameters are not the same as the datatype of the field in the join, and as a result they may add some overhead.
There was one more thread where a person had similar issues, which were resolved by using the relevant field type in the prepared statement.
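To illustrate the type-mismatch point (a hypothetical table and column types, not from the original post): declaring the parameter with the column's exact type avoids an implicit cast in the filter condition.

```sql
-- Suppose users.usrid is bigint. If the parameter is declared with a
-- different type, the comparison needs a cast, which in some cases can
-- change which plan (and index) the planner considers:
PREPARE rpt_untyped (int)    AS SELECT count(*) FROM users WHERE usrid = $1;
PREPARE rpt_typed   (bigint) AS SELECT count(*) FROM users WHERE usrid = $1;

-- Compare the two plans:
-- EXPLAIN ANALYZE EXECUTE rpt_untyped(42);
-- EXPLAIN ANALYZE EXECUTE rpt_typed(42);
```

Whether this matters depends on the specific types involved; the safest rule of thumb is to declare (or bind, in JDBC) parameters with exactly the column's type.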


bPMA | explain.depesz.com -> slow (prepared), Row 8
TsNn | explain.depesz.com -> fast (direct), Row 8

It seems the join filters in the prepared version are doing a lot of work massaging the fields, which may add the cost overhead.

Also, if the above does not work, can you try the plan GUC below to check if you see any improvements?

Thanks,
Vijay

On Tue, 4 May 2021 at 20:52, Laurenz Albe <[hidden email]> wrote:
> [...]





--
Thanks,
Vijay
Mumbai, India

Re: 15x slower PreparedStatement vs raw query

David Rowley
In reply to this post by Alex-10
On Tue, 4 May 2021 at 22:05, Alex <[hidden email]> wrote:
> Shouldn't this process be automatic based on some heuristics?

When plan_cache_mode is set to "auto", then the decision to use a
generic or custom plan is cost-based. See [1]. There's a fairly crude
method there for estimating the effort required to replan the query.
The remainder is based on the average cost of the previous custom
plans + estimated planning effort vs cost of the generic plan.  The
cheaper one wins.

Certainly, what's there is far from perfect.  There are various
problems with it.  The estimated planning cost is pretty crude and
could do with an overhaul.   There are also issues with the plan costs
not being true to the cost of the query.  One problem there is that
run-time partition pruning is not costed into the plan.  This might
cause choose_custom_plan() to pick a custom plan when a generic one
with run-time pruning might have been better.

In order to get a better idea of where things are going wrong for you,
we'd need to see the EXPLAIN ANALYZE output for both the custom and
the generic plan.

David

[1] https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c#L1019
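One way to watch this decision happen (a sketch in psql; the query itself is arbitrary): for the first five executions the planner builds custom plans, and from the sixth on it compares the average custom-plan cost plus estimated replanning effort against the generic plan's cost.

```sql
-- Any parameterized prepared statement will do:
PREPARE q (int) AS SELECT * FROM pg_class WHERE relpages > $1;

-- Run this repeatedly (six or more times) and watch the plan:
EXPLAIN EXECUTE q(10);
-- While custom plans are used, the filter shows the literal value (10);
-- once the generic plan is chosen, it shows the parameter symbol ($1).
```

That flip is the point at which a poorly-costed generic plan starts being reused, which matches the behavior reported in this thread.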



Re: 15x slower PreparedStatement vs raw query

Justin Pryzby
In reply to this post by Justin Pryzby
On Mon, May 03, 2021 at 03:18:11PM -0500, Justin Pryzby wrote:

> On Sun, May 02, 2021 at 07:45:26PM +0000, Alex wrote:
> > PreparedStatement: 15s
> > Raw query with embedded params: 1s
> > See issue on github with query and explain analyze:
> > https://github.com/pgjdbc/pgjdbc/issues/2145
>
> | ..PostgreSQL Version?  12
> |Prepared statement
> |...
> |Planning Time: 11.596 ms
> |Execution Time: 14799.266 ms
> |
> |Raw statement
> |Planning Time: 22.685 ms
> |Execution Time: 1012.992 ms
>
> The prepared statement has 2x faster planning time, which is what it's meant to
> improve.
>
> The execution time is slower, and I think you can improve it with this.
> https://www.postgresql.org/docs/12/runtime-config-query.html#GUC-PLAN-CACHE_MODE

Also, the rowcount estimates are way off starting with the scan nodes.

  ->  Bitmap Heap Scan on category_property_name cpn_limits  (cost=32.13..53.55 rows=14 width=29) (actual time=0.665..8.822 rows=2650 loops=1)
        Recheck Cond: ((lexeme = ANY ('{''rata'',""''polling'' ''rata'' ''ratez''"",""''polling'' ''rata''"",""''rata'' ''ratez'' ''semnal'' ''usb-ul''""}'::tsvector[])) OR (lexeme = '''frecventa'' ''frecventez'''::tsvector) OR (lexeme = '''raportare'' ''rata'' ''ratez'''::tsvector) OR (lexeme = ANY ('{''latime'',""''latime'' ''placi''"",""''compatibila'' ''latime'' ''telefon''""}'::tsvector[])) OR (lexeme = '''lungime'''::tsvector) OR (lexeme = '''cablu'' ''lungime'''::tsvector) OR (lexeme = '''inaltime'''::tsvector) OR (lexeme = '''rezolutie'''::tsvector) OR (lexeme = '''greutate'''::tsvector))
        Heap Blocks: exact=85
        ->  BitmapOr  (cost=32.13..32.13 rows=14 width=0) (actual time=0.574..0.577 rows=0 loops=1)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..9.17 rows=4 width=0) (actual time=0.088..0.089 rows=10 loops=1)
                    Index Cond: (lexeme = ANY ('{''rata'',""''polling'' ''rata'' ''ratez''"",""''polling'' ''rata''"",""''rata'' ''ratez'' ''semnal'' ''usb-ul''""}'::tsvector[]))
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.047..0.047 rows=171 loops=1)
                    Index Cond: (lexeme = '''frecventa'' ''frecventez'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.015..0.015 rows=1 loops=1)
                    Index Cond: (lexeme = '''raportare'' ''rata'' ''ratez'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..6.88 rows=3 width=0) (actual time=0.097..0.097 rows=547 loops=1)
                    Index Cond: (lexeme = ANY ('{''latime'',""''latime'' ''placi''"",""''compatibila'' ''latime'' ''telefon''""}'::tsvector[]))
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.107..0.107 rows=604 loops=1)
                    Index Cond: (lexeme = '''lungime'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.030..0.030 rows=137 loops=1)
                    Index Cond: (lexeme = '''cablu'' ''lungime'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.079..0.079 rows=479 loops=1)
                    Index Cond: (lexeme = '''inaltime'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.020..0.020 rows=40 loops=1)
                    Index Cond: (lexeme = '''rezolutie'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.088..0.088 rows=661 loops=1)
                    Index Cond: (lexeme = '''greutate'''::tsvector)



Re: 15x slower PreparedStatement vs raw query

Alex-10
In reply to this post by David Rowley
This is exactly my issue.

Using the raw query, planning takes 22 ms (custom plan); using a PreparedStatement, planning takes 11 ms (generic plan).
It chose the faster-to-plan generic plan, winning 11 ms over the custom plan, but losing 14 seconds in execution!

The automatic choice algorithm should be changed to include execution time in the decision.

On Wednesday, May 5, 2021, 9:57:20 AM GMT+3, David Rowley <[hidden email]> wrote:


> [...]