Performance issue


Performance issue

Nagaraj Raj
Hello


My PostgreSQL 10.11 server, running on Windows, is very slow. The DB has two tables with ~200 million records each. User queries are very slow; even EXPLAIN ANALYZE takes a long time.


Could you please help me tune this query? Any suggestions to improve system performance would also be appreciated.

Table structures:

Table1:

-- Records 213621151

CREATE TABLE test1
(
    individual_entity_proxy_id bigint NOT NULL,
    household_entity_proxy_id bigint,
    individual_personal_link_sid bigint NOT NULL,
    city_name character varying(100) COLLATE pg_catalog."default",
    state_prov_cd character varying(40) COLLATE pg_catalog."default",
    pstl_code character varying(40) COLLATE pg_catalog."default",
    npa integer,
    nxx integer,
    email_domain character varying(400) COLLATE pg_catalog."default",
    email_preference character varying(40) COLLATE pg_catalog."default",
    direct_mail_preference character varying(40) COLLATE pg_catalog."default",
    profane_wrd_ind character(1) COLLATE pg_catalog."default",
    tmo_ofnsv_name_ind character(1) COLLATE pg_catalog."default",
    census_block_id character varying(40) COLLATE pg_catalog."default",
    has_first_name character(1) COLLATE pg_catalog."default",
    has_middle_name character(1) COLLATE pg_catalog."default",
    has_last_name character(1) COLLATE pg_catalog."default",
    has_email_address character(1) COLLATE pg_catalog."default",
    has_individual_address character(1) COLLATE pg_catalog."default",
    email_address_sid bigint,
    person_name_sid bigint,
    physical_address_sid bigint,
    telephone_number_sid bigint,
    shared_email_with_customer_ind character(1) COLLATE pg_catalog."default",
    shared_paddr_with_customer_ind character(1) COLLATE pg_catalog."default",
    last_contacted_email_datetime timestamp without time zone,
    last_contacted_dm_datetime timestamp without time zone,
    last_contacted_digital_datetime timestamp without time zone,
    last_contacted_anychannel_dttm timestamp without time zone,
    hard_bounce_ind integer,
    src_sys_id integer NOT NULL,
    insrt_prcs_id bigint,
    updt_prcs_id bigint,
    stg_prcs_id bigint,
    load_dttm timestamp without time zone NOT NULL,
    updt_dttm timestamp without time zone,
    md5_chk_sum character varying(200) COLLATE pg_catalog."default",
    deld_from_src_ind character(1) COLLATE pg_catalog."default" NOT NULL,
    orphan_ind character(1) COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT pk_i_entity_proxy_id PRIMARY KEY (individual_entity_proxy_id)

);
CREATE INDEX indx_prospect_indv_entty_id
    ON test1 USING btree
    (individual_entity_proxy_id);


Table 2:
-- Records 260652202

CREATE TABLE test2
(
    individual_entity_proxy_id bigint NOT NULL,
    cstmr_prspct_ind character varying(40) COLLATE pg_catalog."default",
    last_appnd_dttm timestamp without time zone,
    last_sprsn_dttm timestamp without time zone,
    infrrd_gender_code character varying(40) COLLATE pg_catalog."default",
    govt_prison_ind character(1) COLLATE pg_catalog."default",
    tax_bnkrpt_dcsd_ind character(1) COLLATE pg_catalog."default",
    underbank_rank_nbr integer,
    hvy_txn_rank_nbr integer,
    prominence_nbr integer,
    ocptn_code character varying(40) COLLATE pg_catalog."default",
    educ_lvl_nbr integer,
    gender_code character varying(40) COLLATE pg_catalog."default",
    infrrd_hh_rank_nbr integer,
    econmc_stable_nbr integer,
    directv_sbscrbr_propnsty_code character varying(40) COLLATE pg_catalog."default",
    dish_sbscrbr_propnsty_code character varying(40) COLLATE pg_catalog."default",
    iphone_user_propnsty_code character varying(40) COLLATE pg_catalog."default",
    smrt_hm_devc_propnsty_code character varying(40) COLLATE pg_catalog."default",
    sml_busi_ownr_propnsty_code character varying(40) COLLATE pg_catalog."default",
    tv_internet_bndl_propnsty_code character varying(40) COLLATE pg_catalog."default",
    dog_ownr_propnsty_code character varying(40) COLLATE pg_catalog."default",
    cat_ownr_propnsty_code character varying(40) COLLATE pg_catalog."default",
    dine_out_propnsty_code character varying(40) COLLATE pg_catalog."default",
    taco_bell_diner_propnsty_code character varying(40) COLLATE pg_catalog."default",
    auto_insrnc_byr_propnsty_code character varying(40) COLLATE pg_catalog."default",
    src_sys_id integer NOT NULL,
    insrt_prcs_id bigint,
    updt_prcs_id bigint,
    stg_prcs_id bigint,
    load_dttm timestamp without time zone NOT NULL,
    updt_dttm timestamp without time zone,
    deld_from_src_ind character(1) COLLATE pg_catalog."default" NOT NULL,
    orphan_ind character(1) COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT pk_entity_proxy_id PRIMARY KEY (individual_entity_proxy_id)
);


User query:

explain analyze select COUNT(*) as "DII_1"
  from ( select distinct table0."individual_entity_proxy_id" as "INDIVIDUAL_ENTITY_PROXY_ID"
                                from test1 table0
                                inner join test2 table1
                                on table0."individual_entity_proxy_id" = table1."individual_entity_proxy_id"
                                where ((table0."shared_paddr_with_customer_ind" = 'N')
                                                   and (table0."profane_wrd_ind" = 'N')
                                                   and (table0."tmo_ofnsv_name_ind" = 'N')
                                                   and ((table0."last_contacted_dm_datetime" is null)
                                                   or (table0."last_contacted_dm_datetime" <  TIMESTAMP '2020-03-15  0:00:00.000000'))
                                                   and (table0."has_individual_address" = 'Y')
                                                   and (table0."has_last_name" = 'Y')
                                                   and (table0."has_first_name" = 'Y')
                                                   and (table0."direct_mail_preference" is null))
                                                   and ((table1."tax_bnkrpt_dcsd_ind" = 'N')
                                                  and (table1."cstmr_prspct_ind" = 'Prospect')
                                                  and (table1."govt_prison_ind" = 'N')) ) TXT_1;

Explain Analyze :

"Aggregate  (cost=5345632.91..5345632.92 rows=1 width=8) (actual time=442688.462..442688.462 rows=1 loops=1)"
"  ->  Unique  (cost=150.13..4943749.39 rows=32150682 width=8) (actual time=0.022..439964.214 rows=32368180 loops=1)"
"        ->  Merge Join  (cost=150.13..4863372.68 rows=32150682 width=8) (actual time=0.021..435818.276 rows=32368180 loops=1)"
"              Merge Cond: (table0.individual_entity_proxy_id = table1.individual_entity_proxy_id)"
"              ->  Index Scan using indx_prospect_indv_entty_id on test1 table0  (cost=0.56..2493461.92 rows=32233405 width=8) (actual time=0.011..63009.551 rows=32368180 loops=1)"
"                    Filter: ((direct_mail_preference IS NULL) AND ((last_contacted_dm_datetime IS NULL) OR (last_contacted_dm_datetime < '2020-03-15 00:00:00'::timestamp without time zone)) AND (shared_paddr_with_customer_ind = 'N'::bpchar) AND (profane_wrd_ind = 'N'::bpchar) AND (tmo_ofnsv_name_ind = 'N'::bpchar) AND (has_individual_address = 'Y'::bpchar) AND (has_last_name = 'Y'::bpchar) AND (has_first_name = 'Y'::bpchar))"
"                    Rows Removed by Filter: 7709177"
"              ->  Index Scan using pk_entity_proxy_id on test2 table1  (cost=0.56..1867677.94 rows=40071417 width=8) (actual time=0.008..363534.437 rows=40077727 loops=1)"
"                    Filter: ((tax_bnkrpt_dcsd_ind = 'N'::bpchar) AND (govt_prison_ind = 'N'::bpchar) AND ((cstmr_prspct_ind)::text = 'Prospect'::text))"
"                    Rows Removed by Filter: 94756"
"Planning time: 0.400 ms"
"Execution time: 442688.523 ms"

Server config:

PostgreSQL v10.11
RAM: 380GB
vCore: 32
shared_buffers: 65GB
work_mem: 104857kB
maintenance_work_mem: 256MB
effective_cache_size: 160GB




https://dba.stackexchange.com/questions/269138/postgresql-server-running-very-slow-at-minimal-work-load


Thanks,
Raj



Re: Performance issue

David Rowley
On Mon, 15 Jun 2020 at 10:46, Nagaraj Raj <[hidden email]> wrote:
> CREATE TABLE test1
> (
...

>     CONSTRAINT pk_i_entity_proxy_id PRIMARY KEY (individual_entity_proxy_id)
>
> );

> CREATE TABLE test2
> (
...

>     CONSTRAINT pk_entity_proxy_id PRIMARY KEY (individual_entity_proxy_id)
> );
>
>
> User query:
>
> explain analyze select COUNT(*) as "DII_1"
>   from ( select distinct table0."individual_entity_proxy_id" as "INDIVIDUAL_ENTITY_PROXY_ID"
>                                 from test1 table0
>                                 inner join test2 table1
>                                 on table0."individual_entity_proxy_id" = table1."individual_entity_proxy_id"

Why do you use "select distinct"? It seems to me that you're putting a
distinct clause on the primary key of test1 and joining to another
table in a way that cannot cause duplicates.

I imagine dropping that distinct will speed up the query quite a bit.
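For example, since the join is PK-to-PK, the subquery and the distinct can both go away. A sketch of the simplified query (untested, same predicates as the original):

```sql
-- Joining test2 on its primary key returns at most one row per test1 row,
-- so no duplicates are possible and no Unique node is needed.
explain analyze
select COUNT(*) as "DII_1"
from test1 table0
inner join test2 table1
  on table0.individual_entity_proxy_id = table1.individual_entity_proxy_id
where table0.shared_paddr_with_customer_ind = 'N'
  and table0.profane_wrd_ind = 'N'
  and table0.tmo_ofnsv_name_ind = 'N'
  and (table0.last_contacted_dm_datetime is null
       or table0.last_contacted_dm_datetime < TIMESTAMP '2020-03-15 00:00:00')
  and table0.has_individual_address = 'Y'
  and table0.has_last_name = 'Y'
  and table0.has_first_name = 'Y'
  and table0.direct_mail_preference is null
  and table1.tax_bnkrpt_dcsd_ind = 'N'
  and table1.cstmr_prspct_ind = 'Prospect'
  and table1.govt_prison_ind = 'N';
```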

David



Re: Performance issue

Justin Pryzby
In reply to this post by Nagaraj Raj
On Sun, Jun 14, 2020 at 10:45:52PM +0000, Nagaraj Raj wrote:
> My PostgreSQL 10.11 server, running on Windows, is very slow. The DB has two tables with ~200 million records each. User queries are very slow; even EXPLAIN ANALYZE takes a long time.
>
> Could you please help me to tune this query and any suggestions to improve system performance?

> CREATE TABLE test1
> (
>     individual_entity_proxy_id bigint NOT NULL,
...
>     CONSTRAINT pk_i_entity_proxy_id PRIMARY KEY (individual_entity_proxy_id)
>
> );
> CREATE INDEX indx_prospect_indv_entty_id ON test1 USING btree (individual_entity_proxy_id )

This index is redundant with the primary key, which implicitly creates a unique
index.
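Dropping the duplicate would save disk space and per-row write overhead, e.g.:

```sql
-- pk_i_entity_proxy_id already provides a unique btree index
-- on individual_entity_proxy_id, so this one adds nothing.
DROP INDEX indx_prospect_indv_entty_id;
```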

The table structure seems strange: you have two tables with the same PK column,
which is how they're being joined.  It seems like that's better expressed as a
single table with all the columns rather than separate tables (but see below).

> explain analyze select COUNT(*) as "DII_1"
>   from ( select distinct table0."individual_entity_proxy_id" as "INDIVIDUAL_ENTITY_PROXY_ID"
>                                 from test1 table0
>                                 inner join test2 table1

I think this may be better written as something like:

| SELECT COUNT(id) FROM t0 WHERE EXISTS (SELECT 1 FROM t1 WHERE t1.id=t0.id AND ...) AND ...

It's guaranteed to be distinct since it's a PK column, so it doesn't need a
"Unique" node.

I think it might prefer a seq scan on t0, which might be good since it seems
to be returning over 10% of the records.
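Filled out with the predicates from the original query, that sketch might look like this (untested):

```sql
-- COUNT(*) over test1 rows that have a qualifying match in test2;
-- EXISTS avoids producing join duplicates, so no DISTINCT/Unique step.
SELECT COUNT(*) AS "DII_1"
FROM test1 t0
WHERE t0.shared_paddr_with_customer_ind = 'N'
  AND t0.profane_wrd_ind = 'N'
  AND t0.tmo_ofnsv_name_ind = 'N'
  AND (t0.last_contacted_dm_datetime IS NULL
       OR t0.last_contacted_dm_datetime < TIMESTAMP '2020-03-15 00:00:00')
  AND t0.has_individual_address = 'Y'
  AND t0.has_last_name = 'Y'
  AND t0.has_first_name = 'Y'
  AND t0.direct_mail_preference IS NULL
  AND EXISTS (SELECT 1 FROM test2 t1
              WHERE t1.individual_entity_proxy_id = t0.individual_entity_proxy_id
                AND t1.tax_bnkrpt_dcsd_ind = 'N'
                AND t1.cstmr_prspct_ind = 'Prospect'
                AND t1.govt_prison_ind = 'N');
```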

> Explain Analyze :
>
> "Aggregate  (cost=5345632.91..5345632.92 rows=1 width=8) (actual time=442688.462..442688.462 rows=1 loops=1)"
> "  ->  Unique  (cost=150.13..4943749.39 rows=32150682 width=8) (actual time=0.022..439964.214 rows=32368180 loops=1)"
> "        ->  Merge Join  (cost=150.13..4863372.68 rows=32150682 width=8) (actual time=0.021..435818.276 rows=32368180 loops=1)"
> "              Merge Cond: (table0.individual_entity_proxy_id = table1.individual_entity_proxy_id)"
> "              ->  Index Scan using indx_prospect_indv_entty_id on test1 table0  (cost=0.56..2493461.92 rows=32233405 width=8) (actual time=0.011..63009.551 rows=32368180 loops=1)"
> "                    Filter: ((direct_mail_preference IS NULL) AND ((last_contacted_dm_datetime IS NULL) OR (last_contacted_dm_datetime < '2020-03-15 00:00:00'::timestamp without time zone)) AND (shared_paddr_with_customer_ind = 'N'::bpchar) AND (profane_wrd_ind = 'N'::bpchar) AND (tmo_ofnsv_name_ind = 'N'::bpchar) AND (has_individual_address = 'Y'::bpchar) AND (has_last_name = 'Y'::bpchar) AND (has_first_name = 'Y'::bpchar))"
> "                    Rows Removed by Filter: 7709177"
> "              ->  Index Scan using pk_entity_proxy_id on test2 table1  (cost=0.56..1867677.94 rows=40071417 width=8) (actual time=0.008..363534.437 rows=40077727 loops=1)"
> "                    Filter: ((tax_bnkrpt_dcsd_ind = 'N'::bpchar) AND (govt_prison_ind = 'N'::bpchar) AND ((cstmr_prspct_ind)::text = 'Prospect'::text))"
> "                    Rows Removed by Filter: 94756"

It might help to show explain(ANALYZE,BUFFERS).

It looks like test2/table1 index scan is a lot slower than table0.
Maybe table1 gets lots of updates, so isn't clustered on its primary key, so
the index scan is highly random.  You could check the "correlation" of its PK
ID column:
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
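For instance, the correlation can be read straight from pg_stats (assuming ANALYZE has gathered statistics):

```sql
-- Correlation near 1.0: heap order closely follows this column, index
-- scans read the heap mostly sequentially. Near 0: heap fetches from
-- an index scan on this column are highly random.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'test2'
  AND attname = 'individual_entity_proxy_id';
```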

If true, that would be a good reason to have separate tables.

> vCore: 32

Possibly it would be advantageous to use parallel query.
A better query+plan might allow that.
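The relevant v10 settings, for experimenting per session (values here are illustrative, not recommendations):

```sql
-- Parallel seq scans, joins and aggregates exist in v10; whether the
-- planner uses them depends on these GUCs and the plan's cost.
SHOW max_parallel_workers_per_gather;          -- default is 2
SET max_parallel_workers_per_gather = 8;       -- session-level experiment
-- Also relevant: max_worker_processes, max_parallel_workers,
-- min_parallel_table_scan_size, parallel_setup_cost.
```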

--
Justin