Join push down on FDW partitions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Join push down on FDW partitions

Anders Svensson
Hi.

I have not found any in-depth examples regarding join push down and the support for FDWs but from what I have read my
test of it should work but is does not. I have tested on both PG 11 and PG 13

I have two remote shards one on server cloud1 and the other cloud2

I do the following

CREATE SCHEMA cloud1;
CREATE SERVER cloud1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'x.y.z.w', port '5436', dbname 'mydb');
CREATE USER MAPPING FOR user1 SERVER cloud1 OPTIONS (user 'user1', password 'igP3tasdsad');
IMPORT FOREIGN SCHEMA public LIMIT TO (psg_240 ,pm) FROM SERVER cloud1 INTO cloud1;

CREATE SCHEMA cloud2;
CREATE SERVER cloud2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'x.y.z.w', port '5436', dbname 'mydb');
CREATE USER MAPPING FOR agama SERVER cloud2 OPTIONS (user 'agama', password 'igP3tasdsad');
IMPORT FOREIGN SCHEMA public LIMIT TO (psg_240, pm) FROM SERVER cloud2 INTO cloud2;

DROP TABLE IF EXISTS pm;
CREATE TABLE pm (LIKE cloud1.pm INCLUDING ALL) PARTITION BY HASH(pid);
ALTER TABLE pm ATTACH PARTITION cloud1.pm FOR VALUES WITH (modulus 2, remainder 0);
ALTER TABLE pm ATTACH PARTITION cloud2.pm FOR VALUES WITH (modulus 2, remainder 1);

DROP TABLE IF EXISTS psg_240;
CREATE TABLE psg_240 (LIKE cloud1.psg_240 INCLUDING ALL) PARTITION BY HASH(pid);
ALTER TABLE psg_240 ATTACH PARTITION cloud1.psg_240 FOR VALUES WITH (modulus 2, remainder 0);
ALTER TABLE psg_240 ATTACH PARTITION cloud2.psg_240 FOR VALUES WITH (modulus 2, remainder 1);

Then the query 

explain verbose select pm.pid from pm left join psg_240 on (psg_240.pid = pm.pid) where param1='fooBar';                                                                          

Gives

                                                  QUERY PLAN                                                  
----------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=384.97..710.03 rows=666 width=8)
   Output: pm.pid
   Hash Cond: (psg.pid = pm.pid)
   ->  Append  (cost=100.00..399.20 rows=5120 width=8)
         ->  Foreign Scan on cloud1.psg_240 psg_1  (cost=100.00..186.80 rows=2560 width=8)
               Output: psg_1.pid
               Remote SQL: SELECT pid FROM public.psg_240
         ->  Foreign Scan on cloud2.psg_240 psg_2  (cost=100.00..186.80 rows=2560 width=8)
               Output: psg_2.probe_id
               Remote SQL: SELECT pid FROM public.psg_240
   ->  Hash  (cost=284.65..284.65 rows=26 width=8)
         Output: pm.pid
         ->  Append  (cost=100.00..284.65 rows=26 width=8)
               ->  Foreign Scan on cloud1.pm pm_1  (cost=100.00..142.26 rows=13 width=8)
                     Output: pm_1.pid
                     Remote SQL: SELECT pid FROM public.probe_metadata WHERE ((param1 = 'fooBar'::text))
               ->  Foreign Scan on cloud2.pm pm_2  (cost=100.00..142.26 rows=13 width=8)
                     Output: pm_2.pid
                     Remote SQL: SELECT pid FROM public.pm WHERE ((param1 = 'fooBar'::text))
(19 rows)


I would expect that the join where push:ed down to the shards.

Does Postgres support this kind of usecase? If so what do I do wrong here?

best regards,
  Anders
Reply | Threaded
Open this post in threaded view
|

Re: Join push down on FDW partitions

Samed YILDIRIM
Hi Anders,
 
I haven't tested with hash partitioning. But, I think enabling enable_partitionwise_join can help you. Could you please try to enable and check its query plan?
 
set enable_partitionwise_join to on;
explain verbose select pm.pid from pm left join psg_240 on (psg_240.pid = pm.pid) where param1='fooBar';  
 
Note: It will enable this option only for your session. You need to update your postgresql.conf and reload it to enable for everybody.
* https://www.postgresql.org/docs/13/runtime-config-query.html
Best regards.
Samed YILDIRIM
 
02.12.2020, 17:35, "Anders Svensson" <[hidden email]>:
Hi.

I have not found any in-depth examples regarding join push down and the support for FDWs but from what I have read my
test of it should work but is does not. I have tested on both PG 11 and PG 13

I have two remote shards one on server cloud1 and the other cloud2

I do the following
 
CREATE SCHEMA cloud1;
CREATE SERVER cloud1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'x.y.z.w', port '5436', dbname 'mydb');
CREATE USER MAPPING FOR user1 SERVER cloud1 OPTIONS (user 'user1', password 'igP3tasdsad');
IMPORT FOREIGN SCHEMA public LIMIT TO (psg_240 ,pm) FROM SERVER cloud1 INTO cloud1;
 
CREATE SCHEMA cloud2;
CREATE SERVER cloud2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'x.y.z.w', port '5436', dbname 'mydb');
CREATE USER MAPPING FOR agama SERVER cloud2 OPTIONS (user 'agama', password 'igP3tasdsad');
IMPORT FOREIGN SCHEMA public LIMIT TO (psg_240, pm) FROM SERVER cloud2 INTO cloud2;
 
DROP TABLE IF EXISTS pm;
CREATE TABLE pm (LIKE cloud1.pm INCLUDING ALL) PARTITION BY HASH(pid);
ALTER TABLE pm ATTACH PARTITION cloud1.pm FOR VALUES WITH (modulus 2, remainder 0);
ALTER TABLE pm ATTACH PARTITION cloud2.pm FOR VALUES WITH (modulus 2, remainder 1);
 
DROP TABLE IF EXISTS psg_240;
CREATE TABLE psg_240 (LIKE cloud1.psg_240 INCLUDING ALL) PARTITION BY HASH(pid);
ALTER TABLE psg_240 ATTACH PARTITION cloud1.psg_240 FOR VALUES WITH (modulus 2, remainder 0);
ALTER TABLE psg_240 ATTACH PARTITION cloud2.psg_240 FOR VALUES WITH (modulus 2, remainder 1);

Then the query 

explain verbose select pm.pid from pm left join psg_240 on (psg_240.pid = pm.pid) where param1='fooBar';                                                                          

Gives
 
                                                  QUERY PLAN                                                  
----------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=384.97..710.03 rows=666 width=8)
   Output: pm.pid
   Hash Cond: (psg.pid = pm.pid)
   ->  Append  (cost=100.00..399.20 rows=5120 width=8)
         ->  Foreign Scan on cloud1.psg_240 psg_1  (cost=100.00..186.80 rows=2560 width=8)
               Output: psg_1.pid
               Remote SQL: SELECT pid FROM public.psg_240
         ->  Foreign Scan on cloud2.psg_240 psg_2  (cost=100.00..186.80 rows=2560 width=8)
               Output: psg_2.probe_id
               Remote SQL: SELECT pid FROM public.psg_240
   ->  Hash  (cost=284.65..284.65 rows=26 width=8)
         Output: pm.pid
         ->  Append  (cost=100.00..284.65 rows=26 width=8)
               ->  Foreign Scan on cloud1.pm pm_1  (cost=100.00..142.26 rows=13 width=8)
                     Output: pm_1.pid
                     Remote SQL: SELECT pid FROM public.probe_metadata WHERE ((param1 = 'fooBar'::text))
               ->  Foreign Scan on cloud2.pm pm_2  (cost=100.00..142.26 rows=13 width=8)
                     Output: pm_2.pid
                     Remote SQL: SELECT pid FROM public.pm WHERE ((param1 = 'fooBar'::text))
(19 rows)


I would expect that the join where push:ed down to the shards.

Does Postgres support this kind of usecase? If so what do I do wrong here?

best regards,
  Anders
Reply | Threaded
Open this post in threaded view
|

Re: Join push down on FDW partitions

Anders Svensson
Hi Samed.

Thanks again for the hint to set the right option! 🙂

I have another question that you also may know the answer to. I assume that a PG goal of having these join push downs etc is to 
get as much work done as possible on the remote servers which is great. However, it is very hard to get the Postgres planner to 
take the "right" decisions 🙁. If I add a group by it will not push down the group by which I think it should as the group by:s typically reduces
the nr of returned tuple. Regardless of the tuple cost I set it refuses to push the group by down and the query is super simple

Do you know any way to force it?

select pm.type, sum(metric1) from pm left join psg_240 on (psg_240.pid = pm.pid) where time > x and time < y group by 1;                                                     
best regards,
  Anders


From: Samed YILDIRIM <[hidden email]>
Sent: Wednesday, December 2, 2020 15:44
To: Anders Svensson <[hidden email]>; [hidden email] <[hidden email]>
Subject: Re: Join push down on FDW partitions
 
Hi Anders,
 
I haven't tested with hash partitioning. But, I think enabling enable_partitionwise_join can help you. Could you please try to enable and check its query plan?
 
set enable_partitionwise_join to on;
explain verbose select pm.pid from pm left join psg_240 on (psg_240.pid = pm.pid) where param1='fooBar';  
 
Note: It will enable this option only for your session. You need to update your postgresql.conf and reload it to enable for everybody.
* https://www.postgresql.org/docs/13/runtime-config-query.html
Best regards.
Samed YILDIRIM
 
02.12.2020, 17:35, "Anders Svensson" <[hidden email]>:
Hi.

I have not found any in-depth examples regarding join push down and the support for FDWs but from what I have read my
test of it should work but is does not. I have tested on both PG 11 and PG 13

I have two remote shards one on server cloud1 and the other cloud2

I do the following
 
CREATE SCHEMA cloud1;
CREATE SERVER cloud1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'x.y.z.w', port '5436', dbname 'mydb');
CREATE USER MAPPING FOR user1 SERVER cloud1 OPTIONS (user 'user1', password 'igP3tasdsad');
IMPORT FOREIGN SCHEMA public LIMIT TO (psg_240 ,pm) FROM SERVER cloud1 INTO cloud1;
 
CREATE SCHEMA cloud2;
CREATE SERVER cloud2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'x.y.z.w', port '5436', dbname 'mydb');
CREATE USER MAPPING FOR agama SERVER cloud2 OPTIONS (user 'agama', password 'igP3tasdsad');
IMPORT FOREIGN SCHEMA public LIMIT TO (psg_240, pm) FROM SERVER cloud2 INTO cloud2;
 
DROP TABLE IF EXISTS pm;
CREATE TABLE pm (LIKE cloud1.pm INCLUDING ALL) PARTITION BY HASH(pid);
ALTER TABLE pm ATTACH PARTITION cloud1.pm FOR VALUES WITH (modulus 2, remainder 0);
ALTER TABLE pm ATTACH PARTITION cloud2.pm FOR VALUES WITH (modulus 2, remainder 1);
 
DROP TABLE IF EXISTS psg_240;
CREATE TABLE psg_240 (LIKE cloud1.psg_240 INCLUDING ALL) PARTITION BY HASH(pid);
ALTER TABLE psg_240 ATTACH PARTITION cloud1.psg_240 FOR VALUES WITH (modulus 2, remainder 0);
ALTER TABLE psg_240 ATTACH PARTITION cloud2.psg_240 FOR VALUES WITH (modulus 2, remainder 1);

Then the query 

explain verbose select pm.pid from pm left join psg_240 on (psg_240.pid = pm.pid) where param1='fooBar';                                                                          

Gives
 
                                                  QUERY PLAN                                                  
----------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=384.97..710.03 rows=666 width=8)
   Output: pm.pid
   Hash Cond: (psg.pid = pm.pid)
   ->  Append  (cost=100.00..399.20 rows=5120 width=8)
         ->  Foreign Scan on cloud1.psg_240 psg_1  (cost=100.00..186.80 rows=2560 width=8)
               Output: psg_1.pid
               Remote SQL: SELECT pid FROM public.psg_240
         ->  Foreign Scan on cloud2.psg_240 psg_2  (cost=100.00..186.80 rows=2560 width=8)
               Output: psg_2.probe_id
               Remote SQL: SELECT pid FROM public.psg_240
   ->  Hash  (cost=284.65..284.65 rows=26 width=8)
         Output: pm.pid
         ->  Append  (cost=100.00..284.65 rows=26 width=8)
               ->  Foreign Scan on cloud1.pm pm_1  (cost=100.00..142.26 rows=13 width=8)
                     Output: pm_1.pid
                     Remote SQL: SELECT pid FROM public.probe_metadata WHERE ((param1 = 'fooBar'::text))
               ->  Foreign Scan on cloud2.pm pm_2  (cost=100.00..142.26 rows=13 width=8)
                     Output: pm_2.pid
                     Remote SQL: SELECT pid FROM public.pm WHERE ((param1 = 'fooBar'::text))
(19 rows)


I would expect that the join where push:ed down to the shards.

Does Postgres support this kind of usecase? If so what do I do wrong here?

best regards,
  Anders