Query slow for new participants

Query slow for new participants

Kim
Hello,

I have an article query which returns the articles enabled for a participant.
There are three tables: an Article table, a Participant table, and a link
table in between which stores the links between articles and participants,
including characteristics such as enabled.
It is possible to search the articles by number, description, etc.
For all of my existing participants, the articles are returned in up to 3 seconds.
However, when I add a new participant, which in fact has very few articles
enabled, the query takes up to 30 seconds.
When running EXPLAIN ANALYZE, I can see that the execution plan for all
existing participants uses indexes and joins the tables in the same order.
For the new participant, indexes are also used, but the tables are joined in
a different order, which makes the query very slow.
Is there any way I can make the queries fast for new participants? This
is a big problem, because for new participants, speed is even more
important.

Thank you for your help.




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html


Re: Query slow for new participants

Justin Pryzby
On Mon, Feb 25, 2019 at 03:41:18AM -0700, Kim wrote:
> Is there any way how I can make the queries fast for new participants? This
> is a big problem, because for new participants, speed is even more
> important.
>
> Thank you for your help.

Could you include the information requested here?
https://wiki.postgresql.org/wiki/Slow_Query_Questions

Justin


Re: Query slow for new participants

Kim
Hi,

Thank you for your reply.
Yes, I will go through this page.

Regards,
Kim


Re: Query slow for new participants

Kim
In reply to this post by Justin Pryzby
Hello,

Things to Try Before You Post

I went through these steps and they made no difference.


Information You Need To Include

Postgres version

"PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit"

Full Table and Index Schema

The difference is very noticeable for the new company, even on the simplest query:

   SELECT * FROM CompanyArticleDB
     WHERE CompanyId = '77'
       AND ArticleId = '7869071'

                              Table "public.companyarticledb"
           Column           |            Type             | Collation | Nullable | Default
----------------------------+-----------------------------+-----------+----------+---------
 companyid                  | integer                     |           | not null |
 articleid                  | integer                     |           | not null |
 price                      | numeric(19,4)               |           |          |
 contractstartdate          | timestamp without time zone |           |          |
 contractenddate            | timestamp without time zone |           |          |
 enabled                    | boolean                     |           |          |
 visible                    | boolean                     |           |          |
 sheid                      | integer                     |           |          |
 inmassbalance              | boolean                     |           |          |
 internalwastetype          | character varying(50)       |           |          |
 buom                       | character varying(50)       |           |          |
 stockunit                  | numeric(18,2)               |           |          |
 priceperbuom               | numeric(19,4)               |           |          |
 purchaseunit               | numeric(18,2)               |           |          |
 preventioncounselorid      | integer                     |           |          |
 licenseprovided            | boolean                     |           |          |
 licensevaliduntil          | timestamp without time zone |           |          |
 authorisationlocationid    | integer                     |           |          |
 priceagreementreference    | character varying(50)       |           |          |
 interfaceaccountid         | integer                     |           |          |
 createdon                  | timestamp without time zone |           |          |
 modifiedby                 | integer                     |           |          |
 createdby                  | integer                     |           |          |
 modifiedon                 | timestamp without time zone |           |          |
 createdonsupplier          | timestamp without time zone |           |          |
 modifiedbysupplier         | integer                     |           |          |
 createdbysupplier          | integer                     |           |          |
 modifiedonsupplier         | timestamp without time zone |           |          |
 newprice                   | numeric(19,4)               |           |          |
 newcontractstartdate       | timestamp without time zone |           |          |
 newcontractenddate         | timestamp without time zone |           |          |
 newpriceagreementreference | character varying(50)       |           |          |
 licensereference           | character varying(50)       |           |          |
 purchasercomment           | character varying(500)      |           |          |
 reportingunit              | character varying(5)        |           |          |
 articlecode                | character varying(50)       |           |          |
 participantdescription     | character varying(500)      |           |          |
 motivationneeded           | boolean                     |           |          |
 photourl                   | character varying(500)      |           |          |
 reviewedshe                | boolean                     |           |          |
 noinspectionuntil          | timestamp without time zone |           |          |
 priority                   | boolean                     |           |          |
 needschecking              | boolean                     |           |          |
 role                       | character varying(20)       |           |          |
Indexes:
    "pk_pricedb" PRIMARY KEY, btree (companyid, articleid)
    "EnabledIndex" btree (enabled)
    "ix_companyarticledb_article" btree (articleid)
    "ix_companyarticledb_company" btree (companyid)
    "participantarticlecodeindex" btree (articlecode)
    "participantdescriptionindex" gin (participantdescription gin_trgm_ops)
Foreign-key constraints:
    "fk_companyarticledb_accountsdb" FOREIGN KEY (modifiedby) REFERENCES accountsdb(id)
    "fk_companyarticledb_accountsdb1" FOREIGN KEY (createdby) REFERENCES accountsdb(id)
    "fk_companyarticledb_accountsdb2" FOREIGN KEY (preventioncounselorid) REFERENCES accountsdb(id)
    "fk_companyarticledb_articledb" FOREIGN KEY (articleid) REFERENCES articledb(id)
    "fk_companyarticledb_companydb" FOREIGN KEY (companyid) REFERENCES companydb(id)
    "fk_companyarticledb_interfaceaccountdb" FOREIGN KEY (interfaceaccountid) REFERENCES interfaceaccountdb(id)
    "fk_companyarticledb_supplieraccountdb" FOREIGN KEY (createdbysupplier) REFERENCES supplieraccountdb(id)
    "fk_companyarticledb_supplieraccountdb1" FOREIGN KEY (modifiedbysupplier) REFERENCES supplieraccountdb(id)

Table Metadata

     relname      | relpages |  reltuples  | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size(oid)
 companyarticledb | 6191886  | 5.40276e+08 | 6188459       | r       | 44       | false          |            | 50737979392


EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN

"Index Scan using ix_companyarticledb_company on companyarticledb  (cost=0.57..2.80 rows=1 width=193) (actual time=1011.335..1011.454 rows=1 loops=1)"
"  Index Cond: (companyid = 77)"
"  Filter: (articleid = 7869071)"
"  Rows Removed by Filter: 2674361"
"  Buffers: shared hit=30287"
"Planning time: 0.220 ms"
"Execution time: 1011.502 ms"

History


For all other participants this returns much faster; for this new participant it is very slow.

Example for another participant, where another index is used:

"Index Scan using pk_pricedb on companyarticledb  (cost=0.57..2.79 rows=1 width=193) (actual time=0.038..0.039 rows=0 loops=1)"
"  Index Cond: ((companyid = 39) AND (articleid = 7869071))"
"  Buffers: shared hit=4"
"Planning time: 0.233 ms"
"Execution time: 0.087 ms"



This applies to all queries joining companyarticledb on companyid = '77' for this participant.
I do not know why this participant is different from the others, except that it was recently added.


Hardware

Standard DS15 v2 (20 vcpus, 140 GB memory) 



Maintenance Setup

I ran VACUUM on the database just before executing the queries.
I reindexed the indexes on companyarticledb.

GUC Settings


"application_name" "pgAdmin 4 - CONN:6235249" "client"
"bytea_output" "escape" "session"
"checkpoint_completion_target" "0.7" "configuration file"
"client_encoding" "UNICODE" "session"
"client_min_messages" "notice" "session"
"DateStyle" "ISO, MDY" "session"
"default_statistics_target" "100" "configuration file"
"effective_cache_size" "105GB" "configuration file"
"effective_io_concurrency" "200" "configuration file"
"external_pid_file" "/opt/bitnami/postgresql/tmp/postgresql.pid" "command line"
"hot_standby" "on" "configuration file"
"listen_addresses" "*" "configuration file"
"maintenance_work_mem" "2GB" "configuration file"
"max_connections" "200" "configuration file"
"max_parallel_workers" "20" "configuration file"
"max_parallel_workers_per_gather" "10" "configuration file"
"max_stack_depth" "2MB" "environment variable"
"max_wal_senders" "16" "configuration file"
"max_wal_size" "2GB" "configuration file"
"max_worker_processes" "20" "configuration file"
"min_wal_size" "1GB" "configuration file"
"random_page_cost" "1.1" "configuration file"
"shared_buffers" "35GB" "configuration file"
"wal_buffers" "16MB" "configuration file"
"wal_keep_segments" "32" "configuration file"
"wal_level" "replica" "configuration file"
"work_mem" "18350kB" "configuration file"
 

Thank you for your help

Regards,
Kim



Re: Query slow for new participants

Justin Pryzby
On Tue, Feb 26, 2019 at 12:22:39AM +0100, [hidden email] wrote:

> Hardware
> Standard DS15 v2 (20 vcpus, 140 GB memory)

> "effective_cache_size" "105GB" "configuration file"
> "effective_io_concurrency" "200" "configuration file"
> "maintenance_work_mem" "2GB" "configuration file"
> "max_parallel_workers" "20" "configuration file"
> "max_parallel_workers_per_gather" "10" "configuration file"
> "max_worker_processes" "20" "configuration file"
> "random_page_cost" "1.1" "configuration file"
> "shared_buffers" "35GB" "configuration file"
> "work_mem" "18350kB" "configuration file"

I don't know for sure, but 35GB is very possibly too large for shared_buffers.  The
rule of thumb is "start at 25% of RAM", but I think anything over 10-15GB is
frequently too large, unless you can keep the whole DB in RAM (can you?)

> Table Metadata
> relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid)
> "companyarticledb" 6191886 "5.40276e+08" 6188459 "r" 44 false "50737979392"

work_mem could probably benefit from being larger (just be careful that you
don't end up with 20x parallel workers running complex plans each node of which
using 100MB work_mem).

> Full Table and Index Schema
> The difference is very bad for the new company,  even on the simplest query
>
>    SELECT * FROM CompanyArticleDB
>      WHERE CompanyId = '77'
>      AND ArticleId= '7869071'

It sounds to me like the planner thinks that the distributions of companyID and
articleID are independent, when they're not.  For example, it thinks that
companyID=33 filters out 99% of the rows.
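One way to tell the planner about such cross-column correlation on PostgreSQL 10+ is an extended-statistics object; a minimal sketch (the statistics object name is illustrative, not from the thread):

```sql
-- Hedged sketch: let the planner model the dependency between the two
-- columns instead of assuming their selectivities are independent.
CREATE STATISTICS companyarticle_dep (dependencies)
    ON companyid, articleid FROM companyarticledb;

-- Extended statistics are only collected by a subsequent ANALYZE.
ANALYZE companyarticledb;
```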

>  companyid                  | integer                     |           | not null |
>  articleid                  | integer                     |           | not null |

> EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN
>   SELECT * FROM CompanyArticleDB
>     WHERE CompanyId = '77'
>     AND ArticleId= '7869071'
> "Index Scan using ix_companyarticledb_company on companyarticledb (cost=0.57..2.80 rows=1 width=193) (actual time=1011.335..1011.454 rows=1 loops=1)"
> "  Index Cond: (companyid = 77)"
> "  Filter: (articleid = 7869071)"
> "  Rows Removed by Filter: 2674361"
> "  Buffers: shared hit=30287"

> Example for another participant, there another index is used.
> "Index Scan using pk_pricedb on companyarticledb  (cost=0.57..2.79 rows=1 width=193) (actual time=0.038..0.039 rows=0 loops=1)"
> "  Index Cond: ((companyid = 39) AND (articleid = 7869071))"
> "  Buffers: shared hit=4"

> I do not know why this participant is different than the others except that
> it was recently added.

Were the tables ANALYZEd since then ?  You could check:
SELECT * FROM pg_stat_user_tables WHERE relname='companyarticledb';

If you have a small number of companyIDs (~100), then the table statistics may
include a most-common-values list, and companies not in the MCV list may end up
with different query plans, even without correlation issues.
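Whether a given companyid made it into the MCV list can be checked directly against the standard pg_stats view; a sketch:

```sql
-- If the new company's id is absent from most_common_vals, the planner
-- falls back to a generic estimate based on n_distinct and the histogram.
SELECT most_common_vals, most_common_freqs, n_distinct
FROM pg_stats
WHERE tablename = 'companyarticledb'
  AND attname = 'companyid';
```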

It looks like the NEW company has ~3e6 articles, out of a total ~5e8 articles.
The planner may think that companyID doesn't exist at all, so scanning the idx
on companyID will be slightly faster than using the larger, composite index on
companyID,articleID.

Justin

> Indexes:
>     "pk_pricedb" PRIMARY KEY, btree (companyid, articleid)
>     "EnabledIndex" btree (enabled)
>     "ix_companyarticledb_article" btree (articleid)
>     "ix_companyarticledb_company" btree (companyid)
>     "participantarticlecodeindex" btree (articlecode)
>     "participantdescriptionindex" gin (participantdescription gin_trgm_ops)
> Foreign-key constraints:
>     "fk_companyarticledb_accountsdb" FOREIGN KEY (modifiedby) REFERENCES accountsdb(id)
>     "fk_companyarticledb_accountsdb1" FOREIGN KEY (createdby) REFERENCES accountsdb(id)
>     "fk_companyarticledb_accountsdb2" FOREIGN KEY (preventioncounselorid) REFERENCES accountsdb(id)
>     "fk_companyarticledb_articledb" FOREIGN KEY (articleid) REFERENCES articledb(id)
>     "fk_companyarticledb_companydb" FOREIGN KEY (companyid) REFERENCES companydb(id)
>     "fk_companyarticledb_interfaceaccountdb" FOREIGN KEY (interfaceaccountid) REFERENCES interfaceaccountdb(id)
>     "fk_companyarticledb_supplieraccountdb" FOREIGN KEY (createdbysupplier) REFERENCES supplieraccountdb(id)
>     "fk_companyarticledb_supplieraccountdb1" FOREIGN KEY (modifiedbysupplier) REFERENCES supplieraccountdb(id)


Re: Query slow for new participants

Michaeldba@sqlexec.com
Regarding shared_buffers, please install the pg_buffercache extension and run the recommended queries with it during high-load times to really get an idea of the right value for shared_buffers.  Let's take the guesswork out of it.

Regards,
Michael Vitale



Re: Query slow for new participants

Michael Lewis
In reply to this post by Kim

Indexes:
    "pk_pricedb" PRIMARY KEY, btree (companyid, articleid)
    "EnabledIndex" btree (enabled)
    "ix_companyarticledb_article" btree (articleid)
    "ix_companyarticledb_company" btree (companyid)

I'd say drop ix_companyarticledb_company, since pk_pricedb can be used instead even when other queries filter only on the companyid field, and it will certainly be faster for this case, since it targets the row you want directly from the index without the "Rows Removed by Filter: 2674361".

I doubt the default_statistics_target = 100 default is doing you any favors. You may want to try increasing it to 500 or 1000, if you can afford a small increase in planning cost and more storage for the larger sampling of stats.
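A hedged sketch of both suggestions (CONCURRENTLY avoids blocking writes during the drop; the ALTER SYSTEM route assumes superuser access, and a plain session- or table-level setting would work too):

```sql
-- The composite primary-key index on (companyid, articleid) already covers
-- lookups on companyid alone, so the single-column index is redundant.
DROP INDEX CONCURRENTLY ix_companyarticledb_company;

-- Larger statistics sample; needs a config reload and a fresh ANALYZE
-- before plans change.
ALTER SYSTEM SET default_statistics_target = 1000;
SELECT pg_reload_conf();
ANALYZE companyarticledb;
```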

Re: Query slow for new participants

Kim

Hello All,

Thank you very much for your help. You have really helped me out!
The query is now as fast as the others.

The indexes ix_companyarticledb_article and ix_companyarticledb_company were removed.
The parameter default_statistics_target was set to 1000.
ANALYZE was performed on the database.

I am so happy this worked out.

The pg_buffercache extension is now installed, and I will be working with it in the coming days to improve my settings.
The first time I ran the query (in the evening, not at peak usage):

SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b INNER JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY 2 DESC
 LIMIT 10;

"pk_pricedb"                         "1479655"
"companyarticledb"                   "1378549"
"articledb"                          "780821"
"pricedb"                            "280771"
"descriptionindex"                   "138514"
"ix_pricedb"                         "122833"
"pk_articledb"                       "47290"
"EnabledIndex"                       "29958"
"strippedmanufacturernumberindex"    "25604"
"strippedcataloguenumberindex"       "24360"

How can I see if the whole DB is kept in RAM?
How do I define the best setting for work_mem?

Thanks for your help!

Regards,
Kim
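Kim's first question can be answered by comparing the database size against physical RAM (140 GB on this machine); a minimal sketch:

```sql
-- If this total is comfortably below physical RAM, the whole database
-- can in principle stay cached between the buffer cache and the OS cache.
SELECT pg_size_pretty(pg_database_size(current_database()));
```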


Re: Query slow for new participants

Laurenz Albe
In reply to this post by Kim
[hidden email] wrote:
> EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN
> "Index Scan using ix_companyarticledb_company on companyarticledb  (cost=0.57..2.80 rows=1 width=193) (actual time=1011.335..1011.454 rows=1 loops=1)"
> "  Index Cond: (companyid = 77)"
> "  Filter: (articleid = 7869071)"
> "  Rows Removed by Filter: 2674361"
> "  Buffers: shared hit=30287"
> "Planning time: 0.220 ms"
> "Execution time: 1011.502 ms"

Your problem is the "Rows Removed by Filter: 2674361".

The first thing I would try is:

  ALTER TABLE public.companyarticledb
     ALTER companyid SET STATISTICS 1000;

  ALTER TABLE public.companyarticledb
     ALTER articleid SET STATISTICS 1000;

  ANALYZE public.companyarticledb;

Then PostgreSQL has a better idea which condition is selective.

You can set STATISTICS up to 10000, but don't forget that high values
make ANALYZE and planning slower.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com