PostgreSQL 10.0 SELECT LIMIT performance problem

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

PostgreSQL 10.0 SELECT LIMIT performance problem

Mareks Kalnačs

Hi!

 

We have some serious performance problem with SELECTS when add limit, for example, execute time without limit ~250msec (316 rows returned), when add limit 20, execute time 15 – 50secs.

 

We have select with subselect filter:

select a.id, a.jdata

from oss_alarms a

where

                a.jdata->>'dn' in

                (

                    select o.jdata->>'ossDn'   

                    from oss_objects o, tvc_entity e

                    where e.jtype='object'   

                                                and o.jdata->>'sid'=e.jdata->>'siteId'      

                                                and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')

                )

order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc

limit 20;

 

Select used to get data for user interface table view window. Table view has several filters what users can apply and subselect realize one of them. Subselect execution time always fine, but main select have some serous performance problems. When subselect replaced with static values, select execute time is fine.

Table oss_alarms very often updated but only new records, there is about 10`000 to 30`000 new records per day and, when they processed, there are no more changes.

 

Version string    PostgreSQL 10.0 on powerpc64le-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065., 64-bit

 

I’m gathered some technical information about DB structure and execution (see attachment).

 

 

 

Mareks Kalnačs

Software Engineer

Software Development Department

T:             +371 67628888

M:            +371 26479242
@:            [hidden email]

www.datakom.lv, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058

 



Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija

This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia



technical-data.txt (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

Victor Yegorov
ср, 12 сент. 2018 г. в 11:39, Mareks Kalnačs <[hidden email]>:

We have some serious performance problem with SELECTS when add limit, for example, execute time without limit ~250msec (316 rows returned), when add limit 20, execute time 15 – 50secs.


Hi, Mareks.

As this is not actually a bug, it's better to use pgsql-performance or pgsql-general for such questions next time.


We have select with subselect filter:

select a.id, a.jdata

from oss_alarms a

where

                a.jdata->>'dn' in

                (

                    select o.jdata->>'ossDn'   

                    from oss_objects o, tvc_entity e

                    where e.jtype='object'   

                                                and o.jdata->>'sid'=e.jdata->>'siteId'      

                                                and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')

                )

order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc

limit 20;


As execution plan without limit shows, your join conditions yield 316 rows, explicit sort is fast for this amount

For the plan with the limit, planner prefers to use `idx_oss_alarms_alarm_time` index, to avoid extra sort.
This leads to:
        Rows Removed by Join Filter: 10717797

I.e. you're reading 10M rows via index scan and later throw them away, as they do not match your join condition:
        Join Filter: ((a.jdata ->> 'dn'::text) = (o.jdata ->> 'ossDn'::text))

It looks like a.jdata->'dn' and a.jdata->'alarmTime' are correlated, although planner doesn't knows that.
PostgreSQL 10 has `CREATE STATISTICS`, but it works on table columns, not expressions.
You can try disabling `idx_oss_alarms_alarm_time` by using expression in `ORDER BY`:

    order by (idx_oss_alarms_alarm_timetvc_convert_array_to_date(a.jdata -> 'alarmTime'))+INTERVAL '0' desc

But be warned — this can help in some situations and make things worse in others.

I would recommend to move JOIN and ORDER BY columns out of JSON and make them direct table columns.

 
--
Victor Yegorov

image003.png (21K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

Pavel Stehule
In reply to this post by Mareks Kalnačs


2018-09-12 10:31 GMT+02:00 Mareks Kalnačs <[hidden email]>:

Hi!

 

We have some serious performance problem with SELECTS when add limit, for example, execute time without limit ~250msec (316 rows returned), when add limit 20, execute time 15 – 50secs.

 

We have select with subselect filter:

select a.id, a.jdata

from oss_alarms a

where

                a.jdata->>'dn' in

                (

                    select o.jdata->>'ossDn'   

                    from oss_objects o, tvc_entity e

                    where e.jtype='object'   

                                                and o.jdata->>'sid'=e.jdata->>'siteId'      

                                                and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')

                )

order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc

limit 20;

 

Select used to get data for user interface table view window. Table view has several filters what users can apply and subselect realize one of them. Subselect execution time always fine, but main select have some serous performance problems. When subselect replaced with static values, select execute time is fine.

Table oss_alarms very often updated but only new records, there is about 10`000 to 30`000 new records per day and, when they processed, there are no more changes.

 

Version string    PostgreSQL 10.0 on powerpc64le-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065., 64-bit

 

I’m gathered some technical information about DB structure and execution (see attachment).


Sometimes LIMIT clause can confuse optimizator, when data are not uniform.

You can try OFFSET 0 trick:

Original query: SELECT * FROM t ORDER BY c LIMIT 10

transform to:

SELECT * FROM (original query without limit OFFSET 0) x LIMIT 10;



 

 

 

Mareks Kalnačs

Software Engineer

Software Development Department

T:             +371 67628888

M:            +371 26479242
@:            [hidden email]

www.datakom.lv, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058

 



Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija

This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia



Reply | Threaded
Open this post in threaded view
|

RE: PostgreSQL 10.0 SELECT LIMIT performance problem

Mareks Kalnačs

Hi Pavel!

 

He know how to trick SQL, but this not a solution. We have different filters with different distribution fields, for example, jdata->>'dn' distribution is 2.8%, but jdata->>’tech’ 25%, this means we must get count and based on count modify select. This is DB server job, not an application server job.

 

Best regards,

 

Mareks Kalnačs

Software Engineer

Software Development Department

T:             +371 67628888

M:            +371 26479242
@:            [hidden email]

www.datakom.lv, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058

 

 

 

 

From: Pavel Stehule <[hidden email]>
Sent: trešdiena, 2018. gada 12. septembris 15:43
To: Mareks Kalnačs <[hidden email]>
Cc: [hidden email]; Māris Rucis <[hidden email]>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

 

 

 

2018-09-12 10:31 GMT+02:00 Mareks Kalnačs <[hidden email]>:

Hi!

 

We have some serious performance problem with SELECTS when add limit, for example, execute time without limit ~250msec (316 rows returned), when add limit 20, execute time 15 – 50secs.

 

We have select with subselect filter:

select a.id, a.jdata

from oss_alarms a

where

                a.jdata->>'dn' in

                (

                    select o.jdata->>'ossDn'   

                    from oss_objects o, tvc_entity e

                    where e.jtype='object'   

                                                and o.jdata->>'sid'=e.jdata->>'siteId'      

                                                and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')

                )

order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc

limit 20;

 

Select used to get data for user interface table view window. Table view has several filters what users can apply and subselect realize one of them. Subselect execution time always fine, but main select have some serous performance problems. When subselect replaced with static values, select execute time is fine.

Table oss_alarms very often updated but only new records, there is about 10`000 to 30`000 new records per day and, when they processed, there are no more changes.

 

Version string    PostgreSQL 10.0 on powerpc64le-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065., 64-bit

 

I’m gathered some technical information about DB structure and execution (see attachment).

 

Sometimes LIMIT clause can confuse optimizator, when data are not uniform.

 

You can try OFFSET 0 trick:

 

Original query: SELECT * FROM t ORDER BY c LIMIT 10

 

transform to:

 

SELECT * FROM (original query without limit OFFSET 0) x LIMIT 10;

 

 

 

 

 

 

Mareks Kalnačs

Software Engineer

Software Development Department

T:             +371 67628888

M:            +371 26479242
@:            [hidden email]

www.datakom.lv, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058

 



Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija

This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia


 



Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija

This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia


Reply | Threaded
Open this post in threaded view
|

RE: PostgreSQL 10.0 SELECT LIMIT performance problem

Mareks Kalnačs
In reply to this post by Victor Yegorov

Hi Victor!

 

I will disagree with you about bug. I understand that foreign keys will solve some issues and we already making changes add FK, but this only bypass problem, not solve it and in different conditions we got problem again. Main issue, optimizer not take in account index distribution. In current case main cause actually is subselect. I little bit played around with indexes and different select conditions and got very strange behavior. What we done, add new index for tvc_entity table (idx_tvc_entity_sid_u) and run full analyze for all tables and got problem in reverse, now optimizer thinks that he will get less rows from subselect.

 

Subselect:

select o.jdata->>'ossDn'   

from oss_objects o, tvc_entity e

where e.jtype='object'   

     and e.jdata->>'siteId' IS NOT NULL

     and o.jdata->>'sid'=e.jdata->>'siteId'      

     and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%za%')

 

oss_objects.jdata->>’sid’ distribution is 4.2% , null value count 319, total rows 31693, but execution plan always think that real rows returned will be less than actual:

Gather  (cost=1014.16..5280.19 rows=9 width=32) (actual time=1.304..18.982 rows=1291 loops=1)

  ->  Nested Loop  (cost=14.16..4279.28 rows=5 width=32) (actual time=3.172..14.497 rows=646 loops=2)

        ->  Parallel Bitmap Heap Scan on tvcis.tvc_entity e  (cost=13.87..1694.08 rows=149 width=208) (actual time=3.017..11.597 rows=36 loops=2)

        ->  Index Scan using idx_oss_objects_sid on tvcis.oss_objects o  (cost=0.29..17.12 rows=23 width=951) (actual time=0.015..0.054 rows=18 loops=71)

 

If oss_objects.jdata->>’sid’ distribution is 4.2% then for every unique tvc_entity.jdata->>’siteId’ will be selected 31693 * 4.2% = 23 rows, this means join must return more rows than select from tvc_entity, what is real result, but optimized think in reverse.

 

We have another example, where we don’t understand how to solve it, and this may not a bug. In this example we have full text search index within one table, and, when we hit value with low hit count, select runs slow:

select a.id, a.jdata

from tvcis.oss_alarms a

where (

                                to_tsvector(

                                                'simple'::regconfig,

                                                a.ts_vector_fields

                                ) @@ to_tsquery(

                                                'simple',

                                                ' (decodedType5b6f0753ossAlarmsGenerator) '

                                )

)

order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) DESC

limit 20;

 

Bad case:

Sort  (cost=8787.37..8799.58 rows=4885 width=809) (actual time=7.351..7.423 rows=586 loops=1)

Without limit:

Planning time: 0.301 ms

Execution time: 7.574 ms

With limit:

Planning time: 0.312 ms

Execution time: 4972.245 ms

Good case:

From count:

->  Parallel Bitmap Heap Scan on tvcis.oss_alarms a  (cost=5080.23..455246.09 rows=248615 width=0) (actual time=1142.280..175706.097 rows=199195 loops=3)

Without limit:

Too many rows to test, get count instead

With limit:

Planning time: 0.370 ms

Execution time: 43.131 ms

 

But counts is a nightmare:

Planning time: 0.236 ms

Execution time: 176343.730 ms

 

Couple examples of explain plan attached.

 

Best regards,

Mareks Kalnačs

Software Engineer

Software Development Department

T:             +371 67628888

M:            +371 26479242
@:            [hidden email]

www.datakom.lv, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058

 

 

 

From: Victor Yegorov <[hidden email]>
Sent: trešdiena, 2018. gada 12. septembris 15:32
To: Mareks Kalnačs <[hidden email]>
Cc: [hidden email]; Māris Rucis <[hidden email]>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

 

ср, 12 сент. 2018 г. в 11:39, Mareks Kalnačs <[hidden email]>:

We have some serious performance problem with SELECTS when add limit, for example, execute time without limit ~250msec (316 rows returned), when add limit 20, execute time 15 – 50secs.

 

Hi, Mareks.

 

As this is not actually a bug, it's better to use pgsql-performance or pgsql-general for such questions next time.

 

 

We have select with subselect filter:

select a.id, a.jdata

from oss_alarms a

where

                a.jdata->>'dn' in

                (

                    select o.jdata->>'ossDn'   

                    from oss_objects o, tvc_entity e

                    where e.jtype='object'   

                                                and o.jdata->>'sid'=e.jdata->>'siteId'      

                                                and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')

                )

order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc

limit 20;

 

As execution plan without limit shows, your join conditions yield 316 rows, explicit sort is fast for this amount

 

For the plan with the limit, planner prefers to use `idx_oss_alarms_alarm_time` index, to avoid extra sort.

This leads to:

        Rows Removed by Join Filter: 10717797

 

I.e. you're reading 10M rows via index scan and later throw them away, as they do not match your join condition:

        Join Filter: ((a.jdata ->> 'dn'::text) = (o.jdata ->> 'ossDn'::text))

 

It looks like a.jdata->'dn' and a.jdata->'alarmTime' are correlated, although planner doesn't knows that.

PostgreSQL 10 has `CREATE STATISTICS`, but it works on table columns, not expressions.

You can try disabling `idx_oss_alarms_alarm_time` by using expression in `ORDER BY`:

 

    order by (idx_oss_alarms_alarm_timetvc_convert_array_to_date(a.jdata -> 'alarmTime'))+INTERVAL '0' desc

 

But be warned — this can help in some situations and make things worse in others.

 

I would recommend to move JOIN and ORDER BY columns out of JSON and make them direct table columns.

 

--

Victor Yegorov



Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija

This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia



technical-data-2.txt (34K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

Victor Yegorov
ср, 12 сент. 2018 г. в 19:21, Mareks Kalnačs <[hidden email]>:

I will disagree with you about bug. I understand that foreign keys will solve some issues and we already making changes add FK, but this only bypass problem, not solve it and in different conditions we got problem again. Main issue, optimizer not take in account index distribution. In current case main cause actually is subselect. I little bit played around with indexes and different select conditions and got very strange behavior. What we done, add new index for tvc_entity table (idx_tvc_entity_sid_u) and run full analyze for all tables and got problem in reverse, now optimizer thinks that he will get less rows from subselect.


The fact, that planner is not accurate on the estimates of JSON internal keys is expected, PostgreSQL is not parsing JSON values when gathering stats.
You cannot expect planner to be picky about all possible corner cases, it would make planning time enormously huge.
That is the reason I outlined, that important keys should be extracted into plain columns.

If you still consider this is a bug, please — send isolated reproducible test case that demonstrates the bug.

--
Victor Yegorov

image002.png (21K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

Tom Lane-2
Victor Yegorov <[hidden email]> writes:
> The fact, that planner is not accurate on the estimates of JSON internal
> keys is expected, PostgreSQL is not parsing JSON values when gathering
> stats.
> You cannot expect planner to be picky about all possible corner cases, it
> would make planning time enormously huge.

Right.  The fact that it doesn't make the right guesses without help
can't be considered to be a bug in all cases.  These are engineering
tradeoffs we have to make.

> That is the reason I outlined, that important keys should be extracted into
> plain columns.

If that seems infeasible from an application standpoint, another
possibility is to make expression indexes on those important keys.
ANALYZE will gather stats on the values of indexed expressions, and then
perhaps the planner will have enough info to make better decisions.

In the other case mentioned, where the problem is a poor guess about
the selectivity of

where (
                                to_tsvector(
                                                'simple'::regconfig,
                                                a.ts_vector_fields
                                ) @@ to_tsquery(
                                                'simple',
                                                ' (decodedType5b6f0753ossAlarmsGenerator) '
                                )
)

it's the same problem: the planner has no stats that would let it
figure out the selectivity.  It can't reasonably extract an
estimate on the fly --- if it did, you'd be complaining that
planning time was too long.  The only way to get reasonable behavior
is to set things up so that ANALYZE will accumulate stats about
the values of "to_tsvector('simple',a.ts_vector_fields)".  You
can arrange that either by extracting that into a column, or by
making an index on it.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

Adrien Nayrat-2
On 9/12/18 7:04 PM, Tom Lane wrote:

> Victor Yegorov <[hidden email]> writes:
>> The fact, that planner is not accurate on the estimates of JSON internal
>> keys is expected, PostgreSQL is not parsing JSON values when gathering
>> stats.
>> You cannot expect planner to be picky about all possible corner cases, it
>> would make planning time enormously huge.
> Right.  The fact that it doesn't make the right guesses without help
> can't be considered to be a bug in all cases.  These are engineering
> tradeoffs we have to make.
>
>> That is the reason I outlined, that important keys should be extracted into
>> plain columns.
> If that seems infeasible from an application standpoint, another
> possibility is to make expression indexes on those important keys.
> ANALYZE will gather stats on the values of indexed expressions, and then
> perhaps the planner will have enough info to make better decisions.

Hi,

For what it can help, I wrote this article explaining that:
https://blog.anayrat.info/en/2017/11/26/postgresql---jsonb-and-statistics/

Regards,


signature.asc (499 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: PostgreSQL 10.0 SELECT LIMIT performance problem

Mareks Kalnačs
In reply to this post by Victor Yegorov

Hi!

 

But we are using value indexes not a json index:

 

CREATE INDEX idx_oss_alarms_dn

  ON oss_alarms

  USING btree

  ((jdata ->> 'dn'::text) COLLATE pg_catalog."default");

 

CREATE INDEX idx_oss_objects_sid

  ON oss_objects

  USING btree

  ((jdata ->> 'sid'::text) COLLATE pg_catalog."default");

 

CREATE UNIQUE INDEX idx_tvc_entity_sid_u

  ON tvc_entity

  USING btree

  ((jdata ->> 'siteId'::text) COLLATE pg_catalog."default")

  WHERE (jdata ->> 'siteId'::text) IS NOT NULL;

 

May be we don’t understand this index behavior?

 

Mareks

 

From: Victor Yegorov <[hidden email]>
Sent: trešdiena, 2018. gada 12. septembris 19:43
To: Mareks Kalnačs <[hidden email]>
Cc: [hidden email]; Māris Rucis <[hidden email]>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

 

ср, 12 сент. 2018 г. в 19:21, Mareks Kalnačs <[hidden email]>:

I will disagree with you about bug. I understand that foreign keys will solve some issues and we already making changes add FK, but this only bypass problem, not solve it and in different conditions we got problem again. Main issue, optimizer not take in account index distribution. In current case main cause actually is subselect. I little bit played around with indexes and different select conditions and got very strange behavior. What we done, add new index for tvc_entity table (idx_tvc_entity_sid_u) and run full analyze for all tables and got problem in reverse, now optimizer thinks that he will get less rows from subselect.

 

The fact, that planner is not accurate on the estimates of JSON internal keys is expected, PostgreSQL is not parsing JSON values when gathering stats.

You cannot expect planner to be picky about all possible corner cases, it would make planning time enormously huge.

That is the reason I outlined, that important keys should be extracted into plain columns.

 

If you still consider this is a bug, please — send isolated reproducible test case that demonstrates the bug.

 

--

Victor Yegorov



Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija

This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia


Reply | Threaded
Open this post in threaded view
|

RE: PostgreSQL 10.0 SELECT LIMIT performance problem

Mareks Kalnačs
In reply to this post by Tom Lane-2

Hi!

 

My point is, why planned ignore table hits and, when joining tables, think that rows will be less than in reality:

Table1: Index Scan using idx_tvc_entity_sid_u on tvcis.tvc_entity e  (cost=0.28..1856.72 rows=253 width=208) (actual time=0.091..13.944 rows=71 loops=1)

Table2: Index Scan using idx_oss_objects_sid on tvcis.oss_objects o  (cost=0.29..16.84 rows=23 width=951) (actual time=0.009..0.033 rows=18 loops=71)

Join: Nested Loop  (cost=0.57..6175.55 rows=9 width=951) (actual time=0.112..17.355 rows=1291 loops=1)

 

Why planned don’t take in account results from each table:

Join rows = Table1 rows * Table2 rows == 253 * 23 = 5819? In this case difference will be 22%, but planner got 9 rows what is 0.7% from actual result

As we see, actual rows almost equal with subquery row multiplication 71 * 18 ~= 1291

 

Mareks

 

-----Original Message-----
From: Tom Lane <[hidden email]>
Sent: trešdiena, 2018. gada 12. septembris 20:04
To: Victor Yegorov <[hidden email]>
Cc: Mareks Kalnačs <[hidden email]>; [hidden email]; Māris Rucis <[hidden email]>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

 

Victor Yegorov <[hidden email]> writes:

> The fact, that planner is not accurate on the estimates of JSON

> internal keys is expected, PostgreSQL is not parsing JSON values when

> gathering stats.

> You cannot expect planner to be picky about all possible corner cases,

> it would make planning time enormously huge.

 

Right.  The fact that it doesn't make the right guesses without help can't be considered to be a bug in all cases.  These are engineering tradeoffs we have to make.

 

> That is the reason I outlined, that important keys should be extracted

> into plain columns.

 

If that seems infeasible from an application standpoint, another possibility is to make expression indexes on those important keys.

ANALYZE will gather stats on the values of indexed expressions, and then perhaps the planner will have enough info to make better decisions.

 

In the other case mentioned, where the problem is a poor guess about the selectivity of

 

where (

                                to_tsvector(

                                                'simple'::regconfig,

                                                a.ts_vector_fields

                                ) @@ to_tsquery(

                                                'simple',

                                                ' (decodedType5b6f0753ossAlarmsGenerator) '

                                )

)

 

it's the same problem: the planner has no stats that would let it figure out the selectivity.  It can't reasonably extract an estimate on the fly --- if it did, you'd be complaining that planning time was too long.  The only way to get reasonable behavior is to set things up so that ANALYZE will accumulate stats about the values of "to_tsvector('simple',a.ts_vector_fields)".  You can arrange that either by extracting that into a column, or by making an index on it.

 

                                                regards, tom lane



Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija

This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia


Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

Tom Lane-2
In reply to this post by Mareks Kalnačs
=?utf-8?B?TWFyZWtzIEthbG5hxI1z?= <[hidden email]> writes:
> But we are using value indexes not a json index:

> CREATE INDEX idx_oss_alarms_dn
>   ON oss_alarms
>   USING btree
>   ((jdata ->> 'dn'::text) COLLATE pg_catalog."default");

I think you're outsmarting yourself by including those COLLATE clauses.
They don't do anything, since they're just selecting the default behavior
--- but they're enough to make the planner not realize that stats
collected on the index expression would be applicable to a plain
reference to oss_alarms.jdata ->> 'dn'.  In general you want the index
expression to be spelled exactly the same way that you refer to the
value in queries, else the system may not realize it's relevant.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

RE: PostgreSQL 10.0 SELECT LIMIT performance problem

Mareks Kalnačs
Actually Postgres trying to outsmart himself, I do not add COLLATE ;)

Original trigger text
CREATE INDEX idx_oss_alarms_dn ON oss_alarms ((jdata ->> 'dn'));

Mareks

-----Original Message-----
From: Tom Lane <[hidden email]>
Sent: ceturtdiena, 2018. gada 13. septembris 20:42
To: Mareks Kalnačs <[hidden email]>
Cc: Victor Yegorov <[hidden email]>; [hidden email]; Māris Rucis <[hidden email]>; Pāvels Koržs <[hidden email]>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

=?utf-8?B?TWFyZWtzIEthbG5hxI1z?= <[hidden email]> writes:
> But we are using value indexes not a json index:

> CREATE INDEX idx_oss_alarms_dn
>   ON oss_alarms
>   USING btree
>   ((jdata ->> 'dn'::text) COLLATE pg_catalog."default");

I think you're outsmarting yourself by including those COLLATE clauses.
They don't do anything, since they're just selecting the default behavior
--- but they're enough to make the planner not realize that stats collected on the index expression would be applicable to a plain reference to oss_alarms.jdata ->> 'dn'.  In general you want the index expression to be spelled exactly the same way that you refer to the value in queries, else the system may not realize it's relevant.

regards, tom lane
________________________________

Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija

This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia

________________________________