Simple IN vs IN values performance


Simple IN vs IN values performance

Oleksandr Voytsekhovskyy
Greetings,

We have queries with IN filters containing long lists of INT values.

Sometimes they run extremely slowly, and I have found a suggestion to use the syntax

Field IN (VALUES (1465), (1478), ...)

instead of

Field IN (1465, 1478, ...)

In some cases it helps, but in others it makes the query run 1000+ times slower.
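
Spelled out as complete statements, the two forms look like this (the table name some_table and the values here are just illustrative):

-- plain IN list
SELECT * FROM some_table WHERE field IN (1465, 1478, 1499);

-- IN + VALUES: the list shows up as a Values Scan node in the plan
SELECT * FROM some_table WHERE field IN (VALUES (1465), (1478), (1499));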

So the issue is:
- some queries are much faster with simple IN
- some queries are much faster with IN + VALUES

Here is the EXPLAIN output for a case where it is extremely slow:

->  HashAggregate  (cost=5.78..9.62 rows=385 width=4)
      Group Key: "*VALUES*".column1
      ->  Values Scan on "*VALUES*"  (cost=0.00..4.81 rows=385 width=4)
What is the right way to pass a long list of INT values to an IN filter?

I am using PostgreSQL 13.1 on Ubuntu.

Re: Simple IN vs IN values performance

David G Johnston
On Monday, February 22, 2021, Oleksandr Voytsekhovskyy <[hidden email]> wrote:
What is the right way to pass a long list of INT values to an IN filter

Don’t.

Pass in a delimited string, then parse that string into an array and use “= any(array)”.

This has the primary benefit of making the input a single parameter.
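
A minimal sketch of the idea, assuming the list arrives as a single text parameter $1 like '1465,1478,1499' (table and column names are made up):

SELECT *
FROM some_table
WHERE field = ANY (string_to_array($1, ',')::int[]);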

David J.
 

Re: Simple IN vs IN values performance

Michael Lewis
Wouldn't using “= any(array)” change how the query is planned? Or is the concern just parsing the values?

Re: Simple IN vs IN values performance

Ron-2
In reply to this post by Oleksandr Voytsekhovskyy
On 2/22/21 9:00 AM, Oleksandr Voytsekhovskyy wrote:
Greetings,

We have queries with IN filters containing long lists of INT values.

Sometimes they run extremely slowly, and I have found a suggestion to use the syntax

Field IN (VALUES (1465), (1478), ...)

instead of

Field IN (1465, 1478, ...)

In some cases it helps, but in others it makes the query run 1000+ times slower.

Insert the values into a TEMPORARY TABLE, then join that to your main table?
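
Roughly like this (table and column names are made up):

-- load the id list into a temp table
CREATE TEMPORARY TABLE filter_ids (id int PRIMARY KEY);
INSERT INTO filter_ids VALUES (1465), (1478), (1499);

-- then join instead of using IN
SELECT t.*
FROM some_table t
JOIN filter_ids f ON f.id = t.field;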



--
Angular momentum makes the world go 'round.

Re: Simple IN vs IN values performance

Michael Lewis
Insert the values into a TEMPORARY TABLE, then join that to your main table?

In my experience, this is very performant but needs an analyze command after populating the temp table to ensure there are statistics so the plan doesn't go awry. Otherwise, I'm not sure it is different from a materialized CTE or a subquery with OFFSET 0 at the end.
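
To sketch the three variants I mean (names are made up):

-- temp table, followed by ANALYZE so the planner has statistics
CREATE TEMPORARY TABLE filter_ids (id int);
INSERT INTO filter_ids VALUES (1465), (1478), (1499);
ANALYZE filter_ids;

-- materialized CTE: AS MATERIALIZED (v12+) keeps the planner from inlining it
WITH ids AS MATERIALIZED (VALUES (1465), (1478), (1499))
SELECT t.* FROM some_table t JOIN ids ON ids.column1 = t.field;

-- subquery with OFFSET 0 at the end, which acts as an optimization fence
SELECT t.*
FROM some_table t
JOIN (SELECT v.column1 FROM (VALUES (1465), (1478), (1499)) v OFFSET 0) ids
  ON ids.column1 = t.field;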

Re: Simple IN vs IN values performance

Alexander Voytsekhovskyy
Greetings,

I didn't get your idea about a materialized CTE or a subquery with OFFSET 0.

Could you please show a simple example?

On Feb 23, 2021, at 04:33, Michael Lewis <[hidden email]> wrote:

Insert the values into a TEMPORARY TABLE, then join that to your main table?

In my experience, this is very performant but needs an analyze command after populating the temp table to ensure there are statistics so the plan doesn't go awry. Otherwise, I'm not sure it is different from a materialized CTE or a subquery with OFFSET 0 at the end.


Re: Simple IN vs IN values performance

Alexander Voytsekhovskyy
In reply to this post by Ron-2
Thanks for the suggestion.

With temp tables there is another issue: the query already joins 5-6 tables and has 2-3 IN filters. If I replace the IN filters with temp tables, it may hit the query planner's limits and start producing terrible query plans, for example when the genetic query optimizer kicks in.


Re: Simple IN vs IN values performance

Pavel Stehule
Hi

On Tue, Feb 23, 2021 at 10:36, Alexander Voytsekhovskyy <[hidden email]> wrote:
Thanks for the suggestion.

With temp tables there is another issue: the query already joins 5-6 tables and has 2-3 IN filters. If I replace the IN filters with temp tables, it may hit the query planner's limits and start producing terrible query plans, for example when the genetic query optimizer kicks in.

You can increase those limits; the defaults are relatively low and can be raised on a modern CPU.
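
The limits in question, with their defaults (a session-level sketch; raise to taste):

SET from_collapse_limit = 12;  -- default 8
SET join_collapse_limit = 12;  -- default 8
SET geqo_threshold = 14;       -- default 12; GEQO takes over at this many FROM items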


Regards

Pavel



Re: Simple IN vs IN values performance

Alexander Voytsekhovskyy
In reply to this post by David G Johnston
Greetings,

I have tested both options, ANY with a string plus parsing and ANY with a plain array, and there are cases where execution time is 100+ times worse than with IN.
