Performance Issue (Not using Index when joining two tables).

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Performance Issue (Not using Index when joining two tables).

Gopisetty, Ramesh
Hi,

Good Morning!

Postgres Version :  11.6  (AWS Native Postgres/AWS Aurora  tried on both flavours).

When i'm joining two tables the primary index is not being used.  While is use  in clause with values then the index is being used.  I have reindexed all the tables,  run the auto vaccum as well. 


pgwfc01q=> select count(*) from chr_simple_val;
 count
-------
 13158
(1 row)

pgwfc01q=> select count(*) from chr_emp_position;
 count
-------
   228
(1 row)


The primary key for the table chr_Simple_val  contains OID.   Still not using the index.

I'm sharing the explain plan over here..  

pgwfc01q=> explain analyze select cep.HOME_DEPT_OID,ctc.oid,ctc.category,ctc.code from chr_emp_position cep inner join chr_Simple_Val ctc on ctc.oid=cep.HOME_DEPT_OID;
                                                                                                             QUERY P
LAN
--------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=49299.91..51848.83 rows=651 width=42) (actual time=3512.692..3797.583 rows=228 loops=1)
   Hash Cond: ((cep.home_dept_oid)::text = (ctc.oid)::text)
   ->  Seq Scan on chr_emp_position cep  (cost=0.00..2437.77 rows=436 width=11) (actual time=44.713..329.435 rows=22
8 loops=1)
         Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_all(vpd_key, 'CH
R_EMP_POSITION'::character varying) AND f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))
         Rows Removed by Filter: 3695
   ->  Hash  (cost=49176.40..49176.40 rows=9881 width=31) (actual time=3467.907..3467.908 rows=13158 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 1031kB
         ->  Seq Scan on chr_simple_val ctc  (cost=0.00..49176.40 rows=9881 width=31) (actual time=2.191..3460.929 r
ows=13158 loops=1)
               Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_ty_static(
vpd_key) AND f_sel_policy_prod_locale((ctc.*)::character varying, prod_locale_code))
               Rows Removed by Filter: 75771
 Planning Time: 0.297 ms
 Execution Time: 3797.768 ms
(12 rows)


Thank you.. 

Regards,
Ramesh G
Reply | Threaded
Open this post in threaded view
|

Re: Performance Issue (Not using Index when joining two tables).

Tomas Vondra-4
On Sun, Sep 13, 2020 at 02:58:15PM +0000, Gopisetty, Ramesh wrote:

>Hi,
>
>Good Morning!
>
>Postgres Version :  11.6  (AWS Native Postgres/AWS Aurora  tried on both flavours).
>
>When i'm joining two tables the primary index is not being used.  While is use  in clause with values then the index is being used.  I have reindexed all the tables,  run the auto vaccum as well.
>
>
>pgwfc01q=> select count(*) from chr_simple_val;
> count
>-------
> 13158
>(1 row)
>
>pgwfc01q=> select count(*) from chr_emp_position;
> count
>-------
>   228
>(1 row)
>
>
>The primary key for the table chr_Simple_val  contains OID.   Still not using the index.
>
>I'm sharing the explain plan over here..
>
>pgwfc01q=> explain analyze select cep.HOME_DEPT_OID,ctc.oid,ctc.category,ctc.code from chr_emp_position cep inner join chr_Simple_Val ctc on ctc.oid=cep.HOME_DEPT_OID;
>                                                                                                             QUERY P
>LAN
>--------------------------------------------------------------------------------------------------------------------
>----------------------------------------------------------------------------------------------------------------
> Hash Join  (cost=49299.91..51848.83 rows=651 width=42) (actual time=3512.692..3797.583 rows=228 loops=1)
>   Hash Cond: ((cep.home_dept_oid)::text = (ctc.oid)::text)
>   ->  Seq Scan on chr_emp_position cep  (cost=0.00..2437.77 rows=436 width=11) (actual time=44.713..329.435 rows=22
>8 loops=1)
>         Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_all(vpd_key, 'CH
>R_EMP_POSITION'::character varying) AND f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))
>         Rows Removed by Filter: 3695
>   ->  Hash  (cost=49176.40..49176.40 rows=9881 width=31) (actual time=3467.907..3467.908 rows=13158 loops=1)
>         Buckets: 16384  Batches: 1  Memory Usage: 1031kB
>         ->  Seq Scan on chr_simple_val ctc  (cost=0.00..49176.40 rows=9881 width=31) (actual time=2.191..3460.929 r
>ows=13158 loops=1)
>               Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_ty_static(
>vpd_key) AND f_sel_policy_prod_locale((ctc.*)::character varying, prod_locale_code))
>               Rows Removed by Filter: 75771
> Planning Time: 0.297 ms
> Execution Time: 3797.768 ms
>(12 rows)
>

Most of the time (3460ms) is spent in the sequential scan on
chr_simple_val, and the seqscan on chr_emp_position is taking ~330ms).
Combined that's 3790ms out of 3797ms, so the join is pretty much
irrelevant.

Either the seqscans are causing a lot of I/O, or maybe the f_sel_*
functions in the filter are expensive. Judging by how few rows are in
the tables (not sure how large the tables are), I'd guess it's the
latter ... Hard to say without knowing what the functions do etc.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Performance Issue (Not using Index when joining two tables).

Tom Lane-2
Tomas Vondra <[hidden email]> writes:
> Most of the time (3460ms) is spent in the sequential scan on
> chr_simple_val, and the seqscan on chr_emp_position is taking ~330ms).
> Combined that's 3790ms out of 3797ms, so the join is pretty much
> irrelevant.

> Either the seqscans are causing a lot of I/O, or maybe the f_sel_*
> functions in the filter are expensive. Judging by how few rows are in
> the tables (not sure how large the tables are), I'd guess it's the
> latter ... Hard to say without knowing what the functions do etc.

I think the OP is wishing that the filter functions for the larger table
would be postponed till after the join condition is applied.  I'm a
little dubious that that's going to save anything meaningful; but maybe
increasing the cost attributed to those functions would persuade the
planner to try it that way.

First though, does forcing a nestloop plan (turn off enable_hashjoin,
and enable_mergejoin too if needed) produce the shape of plan you
want?  And if so, is it actually faster?  Only if those things are
true is it going to be worth messing with costing parameters.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Performance Issue (Not using Index when joining two tables).

Michael Lewis
Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_all(vpd_key, 'CH
R_EMP_POSITION'::character varying) AND f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))

This looks like some stuff for row level security perhaps. My understanding is limited, but perhaps those restrictions are influencing the planners access or reliance on stats.

Also, it would seem like you need the entire table since you don't have an explicit where clause. Why would scanning an index and then also visiting every row in the table be faster than just going directly to the table?
Reply | Threaded
Open this post in threaded view
|

Re: Performance Issue (Not using Index when joining two tables).

Gopisetty, Ramesh

Hi,

Thanks for looking into the problem/issue.    Let me give more details about the functions...    Yes,  we are using row level security. 

Actually, we have converted an Oracle VPD database (Virtual Private Databases -  In short row level security)  into postgresql.   We have several functions available to filter or to provide the row level security.

f_sel_policy_ty_static;  f_sel_policy_all  filters the tables where the vpd_key is provided initially.
f_sel_policy_prod_locale  filters the table where the prod_locale_code is provided initially.

Before running any queries in the database, we will set the context settings/row level security  based on the function below.. 

 CALLvpd_filter(vpd_key=>'XXXX',mod_user=>'XXXXX',user_locale=>'en_XX',prod_locale=>'XX');

This will set the context variables and provide row level security.   All the tables in our database consists of vpd_key which is a filter for to run the queries for a given client. 

The tables mentioned below chr_emp_position and chr_simple_val consists of many rows and the functions filter them based on the vpd_key and prod_user_locale_code.
Once after providing the row level security we executed the query joining the tables..    And where the index is not being utlitized/ the query runs slower i.e., greater than 8seconds.

The normal structure of the tables will be like this.. 

chr_emp_position  --- has columns  vpd_key,oid, home_Dept_oid, eff_date, start_Date,.....etc.,    (almost having 200+ columns).   -- primary key is  vpd_key and oid.
chr_simple_Val   --- has columns   vpd_key, oid , category, description..et.c,     (almost has around 70 columns).    (primary key is  vpd_key and oid)

The rows mentioned below are after setting the row level security on those tables .. 

i.e,  after executing the function 

 CALL vpd_filter(spv_vpd_key=>'XXXX',spv_mod_usr=>'XXXXX',spv_user_locale=>'en_XX',spv_prod_locale=>'XX');


pgwfc01q=> select count(*) from chr_simple_val;
 count
-------
 13158
(1 row)

pgwfc01q=> select count(*) from chr_emp_position;
 count
-------
   228
(1 row)


The primary key for the table chr_Simple_val  contains OID.   Still not using the index.

I'm sharing the explain plan over here..

pgwfc01q=> explain analyze select cep.HOME_DEPT_OID,ctc.oid,ctc.category,ctc.code from chr_emp_position cep inner join chr_Simple_Val ctc on ctc.oid=cep.HOME_DEPT_OID;
                                                                                                             QUERY P
LAN
--------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=49299.91..51848.83 rows=651 width=42) (actual time=3512.692..3797.583 rows=228 loops=1)
   Hash Cond: ((cep.home_dept_oid)::text = (ctc.oid)::text)
   ->  Seq Scan on chr_emp_position cep  (cost=0.00..2437.77 rows=436 width=11) (actual time=44.713..329.435 rows=22
8 loops=1)
         Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_all(vpd_key, 'CH
R_EMP_POSITION'::character varying) AND f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))
         Rows Removed by Filter: 3695
   ->  Hash  (cost=49176.40..49176.40 rows=9881 width=31) (actual time=3467.907..3467.908 rows=13158 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 1031kB
         ->  Seq Scan on chr_simple_val ctc  (cost=0.00..49176.40 rows=9881 width=31) (actual time=2.191..3460.929 r
ows=13158 loops=1)
               Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_ty_static(
vpd_key) AND f_sel_policy_prod_locale((ctc.*)::character varying, prod_locale_code))
Planning Time: 0.297 ms
 Execution Time: 3797.768 ms
(12 rows)


If i don't set the context and run as a root user the explain plan is as below..   And it executes in milliseconds even without the index having the full table scan.  

  1.  I'm not sure if my filters are time consuming.  Most of the queries works except few.  We hadn't seen the problem in Oracle.  I'm not comparing between Oracle and Postgres here.   I see both are two different flavors. but trying to get my query runs less than 8seconds.
  2.  I'm not sure why the index on chr_simple_val is not being used here  vpd_key,oid.   I'm confident if it uses index, it will/might  be faster as it is looking for 2 or  3 home departments based on oid.
  3.  I'm not sure why even having the full scan it worked for the root user.  
  4.  I'm not sure why the bitmap heap scan was not followed after setting the row level security.   How to make the bitmap heap scan on chr_emp_position as i observed here.

fyi.,

Running as a root user.

pgwfc01q=> explain analyze select cep.HOME_DEPT_OID,ctc.oid,ctc.category,ctc.code from xxxx.chr_emp_position cep inner join wfnsch001.chr_Simple_Val ctc on ctc.oid=cep.HOME_DEPT_OID and (ctc.vpd_key='COMMON' or ctc.vpd_key=cep.vpd_key) and cep.vpd_key='xxxxxxxxxx';

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------
-------------------
 Hash Join  (cost=5503.95..6742.82 rows=453 width=42) (actual time=131.241..154.201 rows=228 loops=1)
   Hash Cond: ((cep.home_dept_oid)::text = (ctc.oid)::text)
   Join Filter: (((ctc.vpd_key)::text = 'NG_COMMON'::text) OR ((ctc.vpd_key)::text = (cep.vpd_key)::text))
   Rows Removed by Join Filter: 19770
   ->  Bitmap Heap Scan on chr_emp_position cep  (cost=10.05..362.25 rows=228 width=28) (actual time=0.056..0.253 ro
ws=228 loops=1)
         Recheck Cond: ((vpd_key)::text = 'xxxxxxxxxx'::text)
         Heap Blocks: exact=26
         ->  Bitmap Index Scan on uq1_chr_emp_position  (cost=0.00..9.99 rows=228 width=0) (actual time=0.041..0.041
 rows=228 loops=1)
               Index Cond: ((vpd_key)::text = 'xxxxxxxxxx'::text)
   ->  Hash  (cost=3600.29..3600.29 rows=88929 width=48) (actual time=130.826..130.826 rows=88929 loops=1)
         Buckets: 65536 (originally 65536)  Batches: 4 (originally 2)  Memory Usage: 3585kB
         ->  Seq Scan on chr_simple_val ctc  (cost=0.00..3600.29 rows=88929 width=48) (actual time=0.005..33.356 row
s=88929 loops=1)
 Planning Time: 3.977 ms
 Execution Time: 154.535 ms
(14 rows)

pgwfc01q=> select count(*) from wfnsch001.chr_emp_position;
 count
-------
  3923
(1 row)

pgwfc01q=> select count(*) from wfnsch001.chr_Simple_Val;
 count
-------
 88929
(1 row)

 

I'm not sure if i'm thinking in the right way or not. (As of safety purpose, i have  rebuilded indexes, analyzed, did vaccum on those tables).   Sorry for the lengthy email and i'm trying to explain my best on this.

Thank you.

Regards,
Ramesh G



From: Michael Lewis <[hidden email]>
Sent: Sunday, September 13, 2020 10:51 PM
To: Tom Lane <[hidden email]>
Cc: Tomas Vondra <[hidden email]>; Gopisetty, Ramesh <[hidden email]>; [hidden email] <[hidden email]>
Subject: Re: Performance Issue (Not using Index when joining two tables).
 
Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_all(vpd_key, 'CH
R_EMP_POSITION'::character varying) AND f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))

This looks like some stuff for row level security perhaps. My understanding is limited, but perhaps those restrictions are influencing the planners access or reliance on stats.

Also, it would seem like you need the entire table since you don't have an explicit where clause. Why would scanning an index and then also visiting every row in the table be faster than just going directly to the table?
Reply | Threaded
Open this post in threaded view
|

Re: Performance Issue (Not using Index when joining two tables).

Tom Lane-2
"Gopisetty, Ramesh" <[hidden email]> writes:
> Thanks for looking into the problem/issue.    Let me give more details about the functions...    Yes,  we are using row level security.

Hm.  If those expensive filter functions are being injected by RLS on the
target tables (rather than by something like an intermediate view), then
the planner is constrained to ensure that they execute before any query
conditions that it doesn't know to be "leakproof".  So unless your join
operator is leakproof, the shape of plan that you're hoping for will not
be allowed.  Since you haven't mentioned anything about data types, it's
hard to know whether that's the issue.  (The hash condition seems to be
texteq, which is leakproof, but there are also casts involved which
might not be.)

The two queries you provided explain plans for are not the same, so
comparing their plans is a fairly pointless activity.  *Of course*
the query runs faster when you restrict it to fetch fewer rows.  The
original query has no restriction clause that corresponds to the
clauses being used for index conditions in the second query, so it's
hardly a surprise that you do not get that plan, RLS or no RLS.

                        regards, tom lane