Hi,
The attached patches add support for filtering rows in the publisher. The
output plugin will do the work if a filter was defined in the CREATE
PUBLICATION command. An optional WHERE clause can be added after the table
name in CREATE PUBLICATION, such as:

CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);

Rows that don't match the WHERE clause will not be sent to the subscribers.

Patches 0001 and 0002 are only refactors and can be applied independently.
0003 doesn't include row filtering on initial synchronization.

Comments?

--
Euler Taveira                        Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
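P.S. For anyone who wants to try it, a minimal end-to-end sketch using the
proposed syntax (table, publication and connection names here are only
illustrative):

-- publisher (running the patched server)
CREATE TABLE departments (id int PRIMARY KEY, name text);
CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);

-- subscriber (the subscriber-side commands are unchanged)
CREATE TABLE departments (id int PRIMARY KEY, name text);
CREATE SUBSCRIPTION foo_sub
    CONNECTION 'host=pubhost dbname=postgres'
    PUBLICATION foo;

-- rows inserted on the publisher with id outside (2000, 3000]
-- should never arrive on the subscriber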
On Wed, Feb 28, 2018 at 08:03:02PM -0300, Euler Taveira wrote:
> Hi,
>
> The attached patches add support for filtering rows in the publisher.
> The output plugin will do the work if a filter was defined in CREATE
> PUBLICATION command. An optional WHERE clause can be added after the
> table name in the CREATE PUBLICATION such as:
>
> CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);
>
> Row that doesn't match the WHERE clause will not be sent to the subscribers.
>
> Patches 0001 and 0002 are only refactors and can be applied
> independently. 0003 doesn't include row filtering on initial
> synchronization.
>
> Comments?

Great feature!  I think a lot of people will like to have the option
of trading a little extra CPU on the pub side for a bunch of network
traffic and some work on the sub side.

I noticed that the WHERE clause applies to all tables in the
publication.  Is that actually the right thing?  I'm thinking of a
case where we have foo(id, ...) and bar(foo_id, ....).  To slice that
correctly, we'd want to do the ids in the foo table and the foo_ids in
the bar table.  In the system as written, that would entail, at least
potentially, writing a lot of publications by hand.

Something like

WHERE (
    (table_1,..., table_N) HAS (/* WHERE clause here */) AND
    (table_N+1,..., table_M) HAS (/* WHERE clause here */) AND
    ...
)

could be one way to specify it.

I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
which it probably should.

Does it need regression tests?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 1 March 2018 at 07:03, Euler Taveira <[hidden email]> wrote:

Hi,

Good idea. I haven't read this yet, but one thing to make sure you've
handled is limiting the clause to referencing only the current tuple and
the catalogs. user-catalog tables are OK, too, anything that is
RelationIsAccessibleInLogicalDecoding().

This means only immutable functions may be invoked, since a stable or
volatile function might attempt to access a table. And views must be
prohibited or recursively checked. (We have tree walkers that would help
with this).

It might be worth looking at the current logic for CHECK expressions,
since the requirements are similar. In my opinion you could safely not
bother with allowing access to user catalog tables in the filter
expressions and limit them strictly to immutable functions and the tuple
itself.
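To make that concrete, here is a rough sketch of the sort of filters such
a check would have to accept and reject (table and column names are
invented for illustration; whether the current patch already enforces
these rules is exactly the open question):

-- OK: references only the current tuple, uses immutable operators
CREATE PUBLICATION pub_ok FOR TABLE orders WHERE (amount > 100 AND status <> 'draft');

-- should be rejected: random() is volatile
CREATE PUBLICATION pub_bad FOR TABLE orders WHERE (random() > 0.5);

-- should be rejected: the subquery reaches into another user table
CREATE PUBLICATION pub_bad2 FOR TABLE orders
    WHERE (customer_id IN (SELECT id FROM vip_customers));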
On 2018-03-01 00:03, Euler Taveira wrote:
> The attached patches add support for filtering rows in the publisher.
> 001-Refactor-function-create_estate_for_relation.patch
> 0002-Rename-a-WHERE-node.patch
> 0003-Row-filtering-for-logical-replication.patch
> Comments?

Very, very useful.  I really do hope this patch survives the
late-arrival-cull.

I built this functionality into a test program I have been using and in
simple cascading replication tests it works well.

I did find what I think is a bug (a bug easy to avoid but also easy to run
into):

The test I used was to cascade 3 instances (all on one machine) from
A->B->C.  I ran a pgbench session in instance A, and used:

in A: alter publication pub0_6515 add table pgbench_accounts
      where (aid between 40000 and 60000-1);
in B: alter publication pub1_6516 add table pgbench_accounts;

The above worked well, but when I did the same but used the filter in both
publications:

in A: alter publication pub0_6515 add table pgbench_accounts
      where (aid between 40000 and 60000-1);
in B: alter publication pub1_6516 add table pgbench_accounts
      where (aid between 40000 and 60000-1);

then the replication only worked for (pgbench-)scale 1 (hence: very little
data); with larger scales it became slow (taking many minutes where the
above had taken less than 1 minute), and ended up using far too much
memory (or blowing up/crashing altogether).  Something not quite right
there.

Nevertheless, I am much in favour of acquiring this functionality as soon
as possible.

Thanks,

Erik Rijkers
2018-02-28 21:47 GMT-03:00 David Fetter <[hidden email]>:
> I noticed that the WHERE clause applies to all tables in the
> publication.  Is that actually the right thing?  I'm thinking of a
> case where we have foo(id, ...) and bar(foo_id, ....).  To slice that
> correctly, we'd want to do the ids in the foo table and the foo_ids in
> the bar table.  In the system as written, that would entail, at least
> potentially, writing a lot of publications by hand.
>
I didn't make it clear in my previous email and I think you misread the
attached docs.  Each table can have an optional WHERE clause.  I'll make
it clear when I rewrite the tests.  Something like:

CREATE PUBLICATION tap_pub FOR TABLE
    tab_rowfilter_1 WHERE (a > 1000 AND b <> 'filtered'),
    tab_rowfilter_2 WHERE (c % 2 = 0),
    tab_rowfilter_3;

Such syntax will not block another future feature that will publish only a
few columns of a table.

> I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
> which it probably should.
>
Yeah, it could be added but I'm afraid of such long WHERE clauses.

> Does it need regression tests?
>
I included some tests just to demonstrate the feature but I'm planning to
add a separate test file for it.

--
Euler Taveira                        Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
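P.S. Applied to David's foo/bar example, the per-table form would look
something like this (column names and ranges are, of course, only an
illustration):

CREATE PUBLICATION slice_1 FOR TABLE
    foo WHERE (id     BETWEEN 1 AND 100000),
    bar WHERE (foo_id BETWEEN 1 AND 100000);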
On Thu, Mar 01, 2018 at 12:41:04PM -0300, Euler Taveira wrote:
> 2018-02-28 21:47 GMT-03:00 David Fetter <[hidden email]>:
> > I noticed that the WHERE clause applies to all tables in the
> > publication.  Is that actually the right thing?  I'm thinking of a
> > case where we have foo(id, ...) and bar(foo_id, ....).  To slice that
> > correctly, we'd want to do the ids in the foo table and the foo_ids in
> > the bar table.  In the system as written, that would entail, at least
> > potentially, writing a lot of publications by hand.
> >
> I didn't make it clear in my previous email and I think you misread
> the attached docs. Each table can have an optional WHERE clause. I'll
> made it clear when I rewrite the tests. Something like:

Sorry I misunderstood.

> CREATE PUBLICATION tap_pub FOR TABLE tab_rowfilter_1 WHERE (a > 1000
> AND b <> 'filtered'), tab_rowfilter_2 WHERE (c % 2 = 0),
> tab_rowfilter_3;

That's great!

> Such syntax will not block another future feature that will publish
> only few columns of the table.
>
> > I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
> > which it probably should.
> >
> Yea, it could be added be I'm afraid of such long WHERE clauses.

I think of + as signifying, "I am ready to get a LOT of output in
order to see more detail."  Perhaps that's just me.

> > Does it need regression tests?
> >
> I included some tests just to demonstrate the feature but I'm
> planning to add a separate test file for it.

Excellent.

This feature looks like a nice big chunk of the user-space
infrastructure needed for sharding, among other things.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
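P.S. As a hedged sketch of that sharding angle, assuming the per-table
WHERE syntax above (publication and table names are hypothetical): a table
could be split across nodes simply by giving each node its own filtered
publication.

-- on the publisher: one publication per shard
CREATE PUBLICATION shard_even FOR TABLE accounts WHERE (id % 2 = 0);
CREATE PUBLICATION shard_odd  FOR TABLE accounts WHERE (id % 2 = 1);

-- each shard node subscribes only to its slice
CREATE SUBSCRIPTION sub_even
    CONNECTION 'host=pubhost dbname=postgres'
    PUBLICATION shard_even;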
On 2018-03-01 16:27, Erik Rijkers wrote:
> On 2018-03-01 00:03, Euler Taveira wrote:
>> The attached patches add support for filtering rows in the publisher.
>
>> 001-Refactor-function-create_estate_for_relation.patch
>> 0002-Rename-a-WHERE-node.patch
>> 0003-Row-filtering-for-logical-replication.patch
>
>> Comments?
>
> Very, very useful.  I really do hope this patch survives the
> late-arrival-cull.
>
> I built this functionality into a test program I have been using and
> in simple cascading replication tests it works well.
>
> I did find what I think is a bug (a bug easy to avoid but also easy to
> run into):
> The test I used was to cascade 3 instances (all on one machine) from
> A->B->C
> I ran a pgbench session in instance A, and used:
> in A: alter publication pub0_6515 add table pgbench_accounts where
> (aid between 40000 and 60000-1);
> in B: alter publication pub1_6516 add table pgbench_accounts;
>
> The above worked well, but when I did the same but used the filter in
> both publications:
> in A: alter publication pub0_6515 add table pgbench_accounts where
> (aid between 40000 and 60000-1);
> in B: alter publication pub1_6516 add table pgbench_accounts where
> (aid between 40000 and 60000-1);
>
> then the replication only worked for (pgbench-)scale 1 (hence: very
> little data); with larger scales it became slow (taking many minutes
> where the above had taken less than 1 minute), and ended up using far
> too much memory (or blowing up/crashing altogether).  Something not
> quite right there.
>
> Nevertheless, I am much in favour of acquiring this functionality as
> soon as possible.

Attached is 'logrep_rowfilter.sh', a demonstration of the above-described
bug.  The program runs initdb for 3 instances in /tmp (using ports 6515,
6516, and 6517) and sets up logical replication from 1->2->3.

It can be made to work by removing the where-clause on the second 'create
publication' (i.e., comment out the $where2 variable).

> Thanks,
>
>
> Erik Rijkers
Hi,
On 2018-03-01 16:27:11 +0100, Erik Rijkers wrote:
> Very, very useful.  I really do hope this patch survives the
> late-arrival-cull.

FWIW, I don't think it'd be fair or prudent.  There's definitely some
issues (see e.g. Craig's reply), and I don't see why this patch'd
deserve an exemption from the "nontrivial patches shouldn't be submitted
to the last CF" policy?

- Andres
Hi,
On 3/1/18 4:27 PM, Andres Freund wrote:
> On 2018-03-01 16:27:11 +0100, Erik Rijkers wrote:
>> Very, very useful.  I really do hope this patch survives the
>> late-arrival-cull.
>
> FWIW, I don't think it'd be fair or prudent.  There's definitely some
> issues (see e.g. Craig's reply), and I don't see why this patch'd
> deserve an exemption from the "nontrivial patches shouldn't be submitted
> to the last CF" policy?

I'm unable to find this in the CF under the title or author name.  If it
didn't get entered then it is definitely out.

If it does have an entry, then I agree with Andres that it should be
pushed to the next CF.

--
-David
[hidden email]
2018-03-01 18:27 GMT-03:00 Andres Freund <[hidden email]>:
> FWIW, I don't think it'd be fair or prudent.  There's definitely some
> issues (see e.g. Craig's reply), and I don't see why this patch'd
> deserve an exemption from the "nontrivial patches shouldn't be submitted
> to the last CF" policy?
>
I forgot to mention but this feature is for v12.  I know the rules and
that is why I didn't add it to the in progress CF.

--
Euler Taveira                        Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
2018-03-01 18:25 GMT-03:00 Erik Rijkers <[hidden email]>:
> Attached is 'logrep_rowfilter.sh', a demonstration of above-described bug.
>
Thanks for testing.  I will figure out what is happening.  There are some
leaks around.  I'll post another version when I fix some of those bugs.

--
Euler Taveira                        Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
2018-02-28 21:54 GMT-03:00 Craig Ringer <[hidden email]>:
> Good idea. I haven't read this yet, but one thing to make sure you've
> handled is limiting the clause to referencing only the current tuple and the
> catalogs. user-catalog tables are OK, too, anything that is
> RelationIsAccessibleInLogicalDecoding().
>
> This means only immutable functions may be invoked, since a stable or
> volatile function might attempt to access a table. And views must be
> prohibited or recursively checked. (We have tree walkers that would help
> with this).
>
> It might be worth looking at the current logic for CHECK expressions, since
> the requirements are similar. In my opinion you could safely not bother with
> allowing access to user catalog tables in the filter expressions and limit
> them strictly to immutable functions and the tuple its self.
>
these rules.

--
Euler Taveira                        Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On 3/1/18 6:00 PM, Euler Taveira wrote:
> 2018-03-01 18:27 GMT-03:00 Andres Freund <[hidden email]>:
>> FWIW, I don't think it'd be fair or prudent.  There's definitely some
>> issues (see e.g. Craig's reply), and I don't see why this patch'd
>> deserve an exemption from the "nontrivial patches shouldn't be submitted
>> to the last CF" policy?
>>
> I forgot to mention but this feature is for v12.  I know the rules and
> that is why I didn't add it to the in progress CF.

That was the right thing to do, thank you!

--
-David
[hidden email]
On Thu, Mar 01, 2018 at 06:16:17PM -0500, David Steele wrote:
> That was the right thing to do, thank you!

This patch has been waiting on author for a couple of months and does not
apply anymore, so I am marking it as returned with feedback.  If you can
rebase, please feel free to resubmit.
--
Michael
On Wed, Feb 28, 2018 at 20:03, Euler Taveira
<[hidden email]> wrote:
> The attached patches add support for filtering rows in the publisher.
>
I rebased the patch.  I added row filtering for initial synchronization,
pg_dump support and psql support.  0001 removes unused code.  0002 reduces
memory use.  0003 passes only the structure member that is used in
create_estate_for_relation.  0004 reuses a parser node for row filtering.
0005 is the feature.  0006 prints the WHERE expression in psql.  0007 adds
pg_dump support.  0008 is only for debug purposes (I'm not sure some of
these messages will be part of the final patch).  0001, 0002, 0003 and
0008 are not mandatory for this feature.

Comments?

--
Euler Taveira                        Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On 2018-11-01 01:29, Euler Taveira wrote:
> On Wed, Feb 28, 2018 at 20:03, Euler Taveira
> <[hidden email]> wrote:
>> The attached patches add support for filtering rows in the publisher.
>>

I ran pgbench-over-logical-replication with a WHERE-clause and could not
get it to replicate correctly.  Below is the output of the attached test
program.

$ ./logrep_rowfilter.sh
-- /home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb --pgdata=/tmp/cascade/instance1/data --encoding=UTF8 --pwfile=/tmp/bugs
-- /home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb --pgdata=/tmp/cascade/instance2/data --encoding=UTF8 --pwfile=/tmp/bugs
-- /home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb --pgdata=/tmp/cascade/instance3/data --encoding=UTF8 --pwfile=/tmp/bugs
sleep 3s
dropping old tables...
creating tables...
generating data...
100000 of 100000 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
create publication pub_6515_to_6516;
alter publication pub_6515_to_6516 add table pgbench_accounts where (aid between 40000 and 60000-1) ;   --> where 1
alter publication pub_6515_to_6516 add table pgbench_branches;
alter publication pub_6515_to_6516 add table pgbench_tellers;
alter publication pub_6515_to_6516 add table pgbench_history;
create publication pub_6516_to_6517;
alter publication pub_6516_to_6517 add table pgbench_accounts ;  -- where (aid between 40000 and 60000-1) ;   --> where 2
alter publication pub_6516_to_6517 add table pgbench_branches;
alter publication pub_6516_to_6517 add table pgbench_tellers;
alter publication pub_6516_to_6517 add table pgbench_history;
create subscription pub_6516_from_6515 connection 'port=6515 application_name=rowfilter' publication pub_6515_to_6516 with(enabled=false);
alter subscription pub_6516_from_6515 enable;
create subscription pub_6517_from_6516 connection 'port=6516 application_name=rowfilter' publication pub_6516_to_6517 with(enabled=false);
alter subscription pub_6517_from_6516 enable;
-- pgbench -p 6515 -c 16 -j 8 -T 5 -n postgres   # scale 1
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 8
duration: 5 s
number of transactions actually processed: 80
latency average = 1178.106 ms
tps = 13.581120 (including connections establishing)
tps = 13.597443 (excluding connections establishing)
       accounts  branches  tellers   history
       --------- --------- --------- ---------
6515   6546b1f0f 2d328ed28 7406473b0 7c1351523 e8c07347b
6516   6546b1f0f 2d328ed28 d41d8cd98 d41d8cd98 e7235f541
6517   f7c0791c8 d9c63e471 d41d8cd98 d41d8cd98 30892eea1   NOK
6515   6546b1f0f 2d328ed28 7406473b0 7c1351523 e8c07347b
6516   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5 191ae1af3
6517   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5 191ae1af3   NOK
6515   6546b1f0f 2d328ed28 7406473b0 7c1351523 e8c07347b
6516   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5 191ae1af3
6517   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5 191ae1af3   NOK
[...]

I let that run for 10 minutes or so, but the pgbench_history md5 values
(of ports 6516 and 6517) do not change anymore, which shows that the table
remains different from the original pgbench_history table in 6515.

When there is a where-clause this always goes wrong.  Without a
where-clause all logical replication tests were OK.  Perhaps the error is
not in the patch itself but somewhere in logical replication.

Attached is the test program (it will need some tweaking of PATHs and PG
variables (PGPASSFILE), etc.).

This is the same program I used in March when you first posted a version
of this patch, although the error is different.

thanks,

Erik Rijkers
On 2018-11-01 08:56, Erik Rijkers wrote:
> On 2018-11-01 01:29, Euler Taveira wrote:
>> On Wed, Feb 28, 2018 at 20:03, Euler Taveira
>> <[hidden email]> wrote:
>>> The attached patches add support for filtering rows in the publisher.
>>>
>
> I ran pgbench-over-logical-replication with a WHERE-clause and could
> not get it to replicate correctly.  Below is the output of the
> attached test program.
>
> $ ./logrep_rowfilter.sh

I have noticed that the failure to replicate correctly can be avoided by
putting a wait state of (on my machine) at least 3 seconds between the
setting up of the subscription and the start of pgbench.  See the bash
program I attached in my previous mail: the bug can be avoided by a
'sleep 5' just before the start of the actual pgbench run.

So it seems this bug is due to some timing error in your patch (or
possibly in logical replication itself).

Erik Rijkers
On Thu, Nov 1, 2018 at 05:30, Erik Rijkers <[hidden email]> wrote:
> > I ran pgbench-over-logical-replication with a WHERE-clause and could
> > not get this to do a correct replication.  Below is the output of the
> > attached test program.
> >
> >
> > $ ./logrep_rowfilter.sh
>
Erik, thanks for testing.

> So it seems this bug is due to some timing error in your patch (or
> possibly in logical replication itself).
>
It is a bug in the new synchronization code.  I'm doing some code
cleanup/review and will post a new patchset after I finish it.  If you
want to give it a try again, apply the following patch.

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index e0eb73c..4797e0b 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -757,7 +757,7 @@ fetch_remote_table_info(char *nspname, char *relname,

        /* Fetch row filtering info */
        resetStringInfo(&cmd);
-       appendStringInfo(&cmd, "SELECT pg_get_expr(prrowfilter, prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (", MyLogicalRepWorker->relid);
+       appendStringInfo(&cmd, "SELECT pg_get_expr(prrowfilter, prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (", lrel->remoteid);

--
Euler Taveira                        Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
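P.S. If I read the diff right, the problem was that the query sent to the
publisher used the subscriber-side relation OID (MyLogicalRepWorker->relid)
where the publisher-side OID (lrel->remoteid) belongs.  Expanded, the query
the tablesync worker runs on the publisher would look roughly like the
following; the OID and publication name below are only placeholders:

SELECT pg_get_expr(prrowfilter, prrelid)
FROM pg_publication p
INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid)
WHERE pr.prrelid = 16384              -- lrel->remoteid: the table's OID on the publisher
  AND p.pubname IN ('pub_6515_to_6516');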
On 2018-11-02 02:59, Euler Taveira wrote:
> On Thu, Nov 1, 2018 at 05:30, Erik Rijkers <[hidden email]>
> wrote:
>> > I ran pgbench-over-logical-replication with a WHERE-clause and could
>> > not get this to do a correct replication.  Below is the output of the
>> > attached test program.
>> >
>> > $ ./logrep_rowfilter.sh
>>
> Erik, thanks for testing.
>
>> So it seems this bug is due to some timing error in your patch (or
>> possibly in logical replication itself).
>>
> It is a bug in the new synchronization code.  I'm doing some code
> cleanup/review and will post a new patchset after I finish it.  If you
> want to give it a try again, apply the following patch.
>
> diff --git a/src/backend/replication/logical/tablesync.c
> b/src/backend/replication/logical/tablesync.c
> index e0eb73c..4797e0b 100644
> --- a/src/backend/replication/logical/tablesync.c
> +++ b/src/backend/replication/logical/tablesync.c
> [...]

That does indeed fix it.

Thank you,

Erik Rijkers
On 2018/11/01 0:29, Euler Taveira wrote:
> On Wed, Feb 28, 2018 at 20:03, Euler Taveira
> <[hidden email]> wrote:
>> The attached patches add support for filtering rows in the publisher.
>>
> I rebased the patch.  I added row filtering for initial
> synchronization, pg_dump support and psql support.  0001 removes unused
> code.  0002 reduces memory use.  0003 passes only structure member that
> is used in create_estate_for_relation.  0004 reuses a parser node for
> row filtering.  0005 is the feature.  0006 prints WHERE expression in
> psql.  0007 adds pg_dump support.  0008 is only for debug purposes (I'm
> not sure some of these messages will be part of the final patch).
> 0001, 0002, 0003 and 0008 are not mandatory for this feature.
>
> Comments?

Hi,

I reviewed your patches and I found a bug when I tested the ALTER
PUBLICATION statement.  In short, ALTER PUBLICATION SET with a WHERE
clause does not apply the new WHERE clause.

I describe the outline of the test I did and my conclusion.

[TEST]
I show the test case I tried below.

(1) Publisher and Subscriber
I executed each statement on the publisher and the subscriber.

```
testdb=# CREATE PUBLICATION pub_testdb_t FOR TABLE t WHERE (id > 10);
CREATE PUBLICATION
```

```
testdb=# CREATE SUBSCRIPTION sub_testdb_t CONNECTION 'dbname=testdb
port=5432 user=postgres' PUBLICATION pub_testdb_t;
NOTICE:  created replication slot "sub_testdb_t" on publisher
CREATE SUBSCRIPTION
```

(2) Publisher
I executed the statements shown below.

testdb=# INSERT INTO t VALUES (1,1);
INSERT 0 1
testdb=# INSERT INTO t VALUES (11,11);
INSERT 0 1

(3) Subscriber
I confirmed that the CREATE PUBLICATION statement worked well.

```
testdb=# SELECT * FROM t;
 id | data
----+------
 11 |   11
(1 row)
```

(4) Publisher
After that, I executed ALTER PUBLICATION with a WHERE clause and inserted
a new row.

```
testdb=# ALTER PUBLICATION pub_testdb_t SET TABLE t WHERE (id > 5);
ALTER PUBLICATION

testdb=# INSERT INTO t VALUES (7,7);
INSERT 0 1

testdb=# SELECT * FROM t;
 id | data
----+------
  1 |    1
 11 |   11
  7 |    7
(3 rows)
```

(5) Subscriber
I confirmed that the change of the WHERE clause set by the ALTER
PUBLICATION statement was ignored.

```
testdb=# SELECT * FROM t;
 id | data
----+------
 11 |   11
(1 row)
```

[Conclusion]
I think AlterPublicationTables()@publicationcmds.c has a bug.

In the foreach(oldlc, oldrelids) loop, oldrel must be appended to delrels
if oldrel or newrel has a WHERE clause.  However, the current
implementation does not do this; therefore, the old WHERE clause is not
deleted and the new WHERE clause is ignored.

This is my speculation and it may not be correct, but, at least, it is a
fact that ALTER PUBLICATION with a WHERE clause does not function in my
environment with the operation described above.

Best regards,
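P.S. A quick way to check whether ALTER PUBLICATION ... SET actually
updated the stored filter (as opposed to the change simply not being
replicated) might be to look at the catalog directly, along the lines of
the tablesync query earlier in this thread.  This is only a sketch and
assumes the prrowfilter column added by the 0005 patch:

SELECT p.pubname,
       pr.prrelid::regclass AS tablename,
       pg_get_expr(pr.prrowfilter, pr.prrelid) AS row_filter
FROM pg_publication p
JOIN pg_publication_rel pr ON p.oid = pr.prpubid
WHERE pr.prrelid = 't'::regclass;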