row filtering for logical replication

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
84 messages Options
12345
Reply | Threaded
Open this post in threaded view
|

row filtering for logical replication

Euler Taveira
Hi,

The attached patches add support for filtering rows in the publisher.
The output plugin will do the work if a filter was defined in CREATE
PUBLICATION command. An optional WHERE clause can be added after the
table name in the CREATE PUBLICATION such as:

CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);

Row that doesn't match the WHERE clause will not be sent to the subscribers.

Patches 0001 and 0002 are only refactors and can be applied
independently. 0003 doesn't include row filtering on initial
synchronization.

Comments?


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

0001-Refactor-function-create_estate_for_relation.patch (3K) Download Attachment
0002-Rename-a-WHERE-node.patch (2K) Download Attachment
0003-Row-filtering-for-logical-replication.patch (43K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

David Fetter
On Wed, Feb 28, 2018 at 08:03:02PM -0300, Euler Taveira wrote:

> Hi,
>
> The attached patches add support for filtering rows in the publisher.
> The output plugin will do the work if a filter was defined in CREATE
> PUBLICATION command. An optional WHERE clause can be added after the
> table name in the CREATE PUBLICATION such as:
>
> CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);
>
> Row that doesn't match the WHERE clause will not be sent to the subscribers.
>
> Patches 0001 and 0002 are only refactors and can be applied
> independently. 0003 doesn't include row filtering on initial
> synchronization.
>
> Comments?

Great feature!  I think a lot of people will like to have the option
of trading a little extra CPU on the pub side for a bunch of network
traffic and some work on the sub side.

I noticed that the WHERE clause applies to all tables in the
publication.  Is that actually the right thing?  I'm thinking of a
case where we have foo(id, ...) and bar(foo_id, ....).  To slice that
correctly, we'd want to do the ids in the foo table and the foo_ids in
the bar table.  In the system as written, that would entail, at least
potentially, writing a lot of publications by hand.

Something like
    WHERE (
        (table_1,..., table_N) HAS (/* WHERE clause here */) AND
        (table_N+1,..., table_M) HAS (/* WHERE clause here */) AND
        ...
    )

could be one way to specify.

I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
which it probably should.

Does it need regression tests?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Craig Ringer-3
In reply to this post by Euler Taveira
On 1 March 2018 at 07:03, Euler Taveira <[hidden email]> wrote:
Hi,

The attached patches add support for filtering rows in the publisher.
The output plugin will do the work if a filter was defined in CREATE
PUBLICATION command. An optional WHERE clause can be added after the
table name in the CREATE PUBLICATION such as:

CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);

Row that doesn't match the WHERE clause will not be sent to the subscribers.

Patches 0001 and 0002 are only refactors and can be applied
independently. 0003 doesn't include row filtering on initial
synchronization.


Good idea. I haven't read this yet, but one thing to make sure you've handled is limiting the clause to referencing only the current tuple and the catalogs. user-catalog tables are OK, too, anything that is RelationIsAccessibleInLogicalDecoding().

This means only immutable functions may be invoked, since a stable or volatile function might attempt to access a table. And views must be prohibited or recursively checked. (We have tree walkers that would help with this).

It might be worth looking at the current logic for CHECK expressions, since the requirements are similar. In my opinion you could safely not bother with allowing access to user catalog tables in the filter expressions and limit them strictly to immutable functions and the tuple its self.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Erik Rijkers
In reply to this post by Euler Taveira
On 2018-03-01 00:03, Euler Taveira wrote:
> The attached patches add support for filtering rows in the publisher.

> 001-Refactor-function-create_estate_for_relation.patch
> 0002-Rename-a-WHERE-node.patch
> 0003-Row-filtering-for-logical-replication.patch

> Comments?

Very, very useful.  I really do hope this patch survives the
late-arrival-cull.

I built this functionality into a test program I have been using and in
simple cascading replication tests it works well.

I did find what I think is a bug (a bug easy to avoid but also easy to
run into):
The test I used was to cascade 3 instances (all on one machine) from
A->B->C
I ran a pgbench session in instance A, and used:
   in A: alter publication pub0_6515 add table pgbench_accounts where
(aid between 40000 and 60000-1);
   in B: alter publication pub1_6516 add table pgbench_accounts;

The above worked well, but when I did the same but used the filter in
both publications:
   in A: alter publication pub0_6515 add table pgbench_accounts where
(aid between 40000 and 60000-1);
   in B: alter publication pub1_6516 add table pgbench_accounts where
(aid between 40000 and 60000-1);

then the replication only worked for (pgbench-)scale 1 (hence: very
little data); with larger scales it became slow (taking many minutes
where the above had taken less than 1 minute), and ended up using far
too much memory (or blowing up/crashing altogether).  Something not
quite right there.

Nevertheless, I am much in favour of acquiring this functionality as
soon as possible.


Thanks,


Erik Rijkers











Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Euler Taveira
In reply to this post by David Fetter
2018-02-28 21:47 GMT-03:00 David Fetter <[hidden email]>:
> I noticed that the WHERE clause applies to all tables in the
> publication.  Is that actually the right thing?  I'm thinking of a
> case where we have foo(id, ...) and bar(foo_id, ....).  To slice that
> correctly, we'd want to do the ids in the foo table and the foo_ids in
> the bar table.  In the system as written, that would entail, at least
> potentially, writing a lot of publications by hand.
>
I didn't make it clear in my previous email and I think you misread
the attached docs. Each table can have an optional WHERE clause. I'll
made it clear when I rewrite the tests. Something like:

CREATE PUBLICATION tap_pub FOR TABLE tab_rowfilter_1 WHERE (a > 1000
AND b <> 'filtered'), tab_rowfilter_2 WHERE (c % 2 = 0),
tab_rowfilter_3;

Such syntax will not block another future feature that will publish
only few columns of the table.

> I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
> which it probably should.
>
Yea, it could be added be I'm afraid of such long WHERE clauses.

> Does it need regression tests?
>
I included some tests just to demonstrate the feature but I'm planning
to add a separate test file for it.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

David Fetter
On Thu, Mar 01, 2018 at 12:41:04PM -0300, Euler Taveira wrote:

> 2018-02-28 21:47 GMT-03:00 David Fetter <[hidden email]>:
> > I noticed that the WHERE clause applies to all tables in the
> > publication.  Is that actually the right thing?  I'm thinking of a
> > case where we have foo(id, ...) and bar(foo_id, ....).  To slice that
> > correctly, we'd want to do the ids in the foo table and the foo_ids in
> > the bar table.  In the system as written, that would entail, at least
> > potentially, writing a lot of publications by hand.
> >
> I didn't make it clear in my previous email and I think you misread
> the attached docs. Each table can have an optional WHERE clause. I'll
> made it clear when I rewrite the tests. Something like:

Sorry I misunderstood.

> CREATE PUBLICATION tap_pub FOR TABLE tab_rowfilter_1 WHERE (a > 1000
> AND b <> 'filtered'), tab_rowfilter_2 WHERE (c % 2 = 0),
> tab_rowfilter_3;

That's great!

> Such syntax will not block another future feature that will publish
> only few columns of the table.
>
> > I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
> > which it probably should.
> >
> Yea, it could be added be I'm afraid of such long WHERE clauses.

I think of + as signifying, "I am ready to get a LOT of output in
order to see more detail."  Perhaps that's just me.

> > Does it need regression tests?
> >
> I included some tests just to demonstrate the feature but I'm
> planning to add a separate test file for it.

Excellent. This feature looks like a nice big chunk of the user-space
infrastructure needed for sharding, among other things.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Erik Rijkers
In reply to this post by Erik Rijkers
On 2018-03-01 16:27, Erik Rijkers wrote:

> On 2018-03-01 00:03, Euler Taveira wrote:
>> The attached patches add support for filtering rows in the publisher.
>
>> 001-Refactor-function-create_estate_for_relation.patch
>> 0002-Rename-a-WHERE-node.patch
>> 0003-Row-filtering-for-logical-replication.patch
>
>> Comments?
>
> Very, very useful.  I really do hope this patch survives the
> late-arrival-cull.
>
> I built this functionality into a test program I have been using and
> in simple cascading replication tests it works well.
>
> I did find what I think is a bug (a bug easy to avoid but also easy to
> run into):
> The test I used was to cascade 3 instances (all on one machine) from
> A->B->C
> I ran a pgbench session in instance A, and used:
>   in A: alter publication pub0_6515 add table pgbench_accounts where
> (aid between 40000 and 60000-1);
>   in B: alter publication pub1_6516 add table pgbench_accounts;
>
> The above worked well, but when I did the same but used the filter in
> both publications:
>   in A: alter publication pub0_6515 add table pgbench_accounts where
> (aid between 40000 and 60000-1);
>   in B: alter publication pub1_6516 add table pgbench_accounts where
> (aid between 40000 and 60000-1);
>
> then the replication only worked for (pgbench-)scale 1 (hence: very
> little data); with larger scales it became slow (taking many minutes
> where the above had taken less than 1 minute), and ended up using far
> too much memory (or blowing up/crashing altogether).  Something not
> quite right there.
>
> Nevertheless, I am much in favour of acquiring this functionality as
> soon as possible.

Attached is 'logrep_rowfilter.sh', a demonstration of above-described
bug.

The program runs initdb for 3 instances in /tmp (using ports 6515, 6516,
and 6517) and sets up logical replication from 1->2->3.

It can be made to work by removing de where-clause on the second 'create
publication' ( i.e., outcomment the $where2 variable ).


> Thanks,
>
>
> Erik Rijkers

logrep_rowfilter.sh (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Andres Freund
In reply to this post by Erik Rijkers
Hi,

On 2018-03-01 16:27:11 +0100, Erik Rijkers wrote:
> Very, very useful.  I really do hope this patch survives the
> late-arrival-cull.

FWIW, I don't think it'd be fair or prudent. There's definitely some
issues (see e.g. Craig's reply), and I don't see why this patch'd
deserve an exemption from the "nontrivial patches shouldn't be submitted
to the last CF" policy?

- Andres

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

David Steele
Hi,

On 3/1/18 4:27 PM, Andres Freund wrote:
> On 2018-03-01 16:27:11 +0100, Erik Rijkers wrote:
>> Very, very useful.  I really do hope this patch survives the
>> late-arrival-cull.
>
> FWIW, I don't think it'd be fair or prudent. There's definitely some
> issues (see e.g. Craig's reply), and I don't see why this patch'd
> deserve an exemption from the "nontrivial patches shouldn't be submitted
> to the last CF" policy?

I'm unable to find this in the CF under the title or author name.  If it
didn't get entered then it is definitely out.

If it does have an entry, then I agree with Andres that it should be
pushed to the next CF.

--
-David
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Euler Taveira
In reply to this post by Andres Freund
2018-03-01 18:27 GMT-03:00 Andres Freund <[hidden email]>:
> FWIW, I don't think it'd be fair or prudent. There's definitely some
> issues (see e.g. Craig's reply), and I don't see why this patch'd
> deserve an exemption from the "nontrivial patches shouldn't be submitted
> to the last CF" policy?
>
I forgot to mention but this feature is for v12. I know the rules and
that is why I didn't add it to the in progress CF.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Euler Taveira
In reply to this post by Erik Rijkers
2018-03-01 18:25 GMT-03:00 Erik Rijkers <[hidden email]>:
> Attached is 'logrep_rowfilter.sh', a demonstration of above-described bug.
>
Thanks for testing. I will figure out what is happening. There are
some leaks around. I'll post another version when I fix some of those
bugs.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Euler Taveira
In reply to this post by Craig Ringer-3
2018-02-28 21:54 GMT-03:00 Craig Ringer <[hidden email]>:

> Good idea. I haven't read this yet, but one thing to make sure you've
> handled is limiting the clause to referencing only the current tuple and the
> catalogs. user-catalog tables are OK, too, anything that is
> RelationIsAccessibleInLogicalDecoding().
>
> This means only immutable functions may be invoked, since a stable or
> volatile function might attempt to access a table. And views must be
> prohibited or recursively checked. (We have tree walkers that would help
> with this).
>
> It might be worth looking at the current logic for CHECK expressions, since
> the requirements are similar. In my opinion you could safely not bother with
> allowing access to user catalog tables in the filter expressions and limit
> them strictly to immutable functions and the tuple its self.
>
IIRC implementation is similar to RLS expressions. I'll check all of
these rules.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

David Steele
In reply to this post by Euler Taveira
On 3/1/18 6:00 PM, Euler Taveira wrote:
> 2018-03-01 18:27 GMT-03:00 Andres Freund <[hidden email]>:
>> FWIW, I don't think it'd be fair or prudent. There's definitely some
>> issues (see e.g. Craig's reply), and I don't see why this patch'd
>> deserve an exemption from the "nontrivial patches shouldn't be submitted
>> to the last CF" policy?
>>
> I forgot to mention but this feature is for v12. I know the rules and
> that is why I didn't add it to the in progress CF.

That was the right thing to do, thank you!

--
-David
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Michael Paquier-2
On Thu, Mar 01, 2018 at 06:16:17PM -0500, David Steele wrote:
> That was the right thing to do, thank you!

This patch has been waiting on author for a couple of months and does
not apply anymore, so I am marking as returned with feedback.  If you
can rebase, please feel free to resubmit.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Euler Taveira
In reply to this post by Euler Taveira
Em qua, 28 de fev de 2018 às 20:03, Euler Taveira
<[hidden email]> escreveu:
> The attached patches add support for filtering rows in the publisher.
>
I rebased the patch. I added row filtering for initial
synchronization, pg_dump support and psql support. 0001 removes unused
code. 0002 reduces memory use. 0003 passes only structure member that
is used in create_estate_for_relation. 0004 reuses a parser node for
row filtering. 0005 is the feature. 0006 prints WHERE expression in
psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
not sure some of these messages will be part of the final patch).
0001, 0002, 0003 and 0008 are not mandatory for this feature.

Comments?


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

0001-Remove-unused-atttypmod-column-from-initial-table-sy.patch (2K) Download Attachment
0002-Store-number-of-tuples-in-WalRcvExecResult.patch (4K) Download Attachment
0003-Refactor-function-create_estate_for_relation.patch (3K) Download Attachment
0004-Rename-a-WHERE-node.patch (2K) Download Attachment
0005-Row-filtering-for-logical-replication.patch (55K) Download Attachment
0006-Print-publication-WHERE-condition-in-psql.patch (1K) Download Attachment
0007-Publication-where-condition-support-for-pg_dump.patch (3K) Download Attachment
0008-Debug-for-row-filtering.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Erik Rijkers
On 2018-11-01 01:29, Euler Taveira wrote:
> Em qua, 28 de fev de 2018 às 20:03, Euler Taveira
> <[hidden email]> escreveu:
>> The attached patches add support for filtering rows in the publisher.
>>

I ran pgbench-over-logical-replication with a WHERE-clause and could not
get this to do a correct replication.  Below is the output of the
attached test program.


$ ./logrep_rowfilter.sh
--
/home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb
--pgdata=/tmp/cascade/instance1/data --encoding=UTF8 --pwfile=/tmp/bugs
--
/home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb
--pgdata=/tmp/cascade/instance2/data --encoding=UTF8 --pwfile=/tmp/bugs
--
/home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb
--pgdata=/tmp/cascade/instance3/data --encoding=UTF8 --pwfile=/tmp/bugs
sleep 3s
dropping old tables...
creating tables...
generating data...
100000 of 100000 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
create publication pub_6515_to_6516;
alter publication pub_6515_to_6516 add table pgbench_accounts where (aid
between 40000 and 60000-1) ; --> where 1
alter publication pub_6515_to_6516 add table pgbench_branches;
alter publication pub_6515_to_6516 add table pgbench_tellers;
alter publication pub_6515_to_6516 add table pgbench_history;
create publication pub_6516_to_6517;
alter publication pub_6516_to_6517 add table pgbench_accounts ; -- where
(aid between 40000 and 60000-1) ; --> where 2
alter publication pub_6516_to_6517 add table pgbench_branches;
alter publication pub_6516_to_6517 add table pgbench_tellers;
alter publication pub_6516_to_6517 add table pgbench_history;

create subscription pub_6516_from_6515 connection 'port=6515
application_name=rowfilter'
        publication pub_6515_to_6516 with(enabled=false);
alter subscription pub_6516_from_6515 enable;
create subscription pub_6517_from_6516 connection 'port=6516
application_name=rowfilter'
        publication pub_6516_to_6517 with(enabled=false);
alter subscription pub_6517_from_6516 enable;
-- pgbench -p 6515 -c 16 -j 8 -T 5 -n postgres    #  scale 1
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 8
duration: 5 s
number of transactions actually processed: 80
latency average = 1178.106 ms
tps = 13.581120 (including connections establishing)
tps = 13.597443 (excluding connections establishing)

        accounts  branches   tellers   history
        --------- --------- --------- ---------
6515   6546b1f0f 2d328ed28 7406473b0 7c1351523    e8c07347b
6516   6546b1f0f 2d328ed28 d41d8cd98 d41d8cd98    e7235f541
6517   f7c0791c8 d9c63e471 d41d8cd98 d41d8cd98    30892eea1   NOK

6515   6546b1f0f 2d328ed28 7406473b0 7c1351523    e8c07347b
6516   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5    191ae1af3
6517   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5    191ae1af3   NOK

6515   6546b1f0f 2d328ed28 7406473b0 7c1351523    e8c07347b
6516   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5    191ae1af3
6517   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5    191ae1af3   NOK

[...]

I let that run for 10 minutes or so but that pgbench_history table
md5-values (of ports 6516 and 6517) do not change anymore, which shows
that it is and remains different from the original pgbench_history table
in 6515.


When there is a where-clause this goes *always* wrong.

Without a where-clause all logical replication tests were OK.  Perhaps
the error is not in our patch but something in logical replication.

Attached is the test program (will need some tweaking of PATHs,
PG-variables (PGPASSFILE) etc).  This is the same program I used in
march when you first posted a version of this patch alhough the error is
different.


thanks,


Erik Rijkers






logrep_rowfilter.sh (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Erik Rijkers
On 2018-11-01 08:56, Erik Rijkers wrote:

> On 2018-11-01 01:29, Euler Taveira wrote:
>> Em qua, 28 de fev de 2018 às 20:03, Euler Taveira
>> <[hidden email]> escreveu:
>>> The attached patches add support for filtering rows in the publisher.
>>>
>
> I ran pgbench-over-logical-replication with a WHERE-clause and could
> not get this to do a correct replication.  Below is the output of the
> attached test program.
>
>
> $ ./logrep_rowfilter.sh

I have noticed that the failure to replicate correctly can be avoided by
putting a wait state of (on my machine) at least 3 seconds between the
setting up of the subscription and the start of pgbench.  See the bash
program I attached in my previous mail.  The bug can be avoided by a
'sleep 5' just before the start of the actual pgbench run.

So it seems this bug is due to some timing error in your patch (or
possibly in logical replication itself).


Erik Rijkers



Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Euler Taveira
Em qui, 1 de nov de 2018 às 05:30, Erik Rijkers <[hidden email]> escreveu:
> > I ran pgbench-over-logical-replication with a WHERE-clause and could
> > not get this to do a correct replication.  Below is the output of the
> > attached test program.
> >
> >
> > $ ./logrep_rowfilter.sh
>
Erik, thanks for testing.

> So it seems this bug is due to some timing error in your patch (or
> possibly in logical replication itself).
>
It is a bug in the new synchronization code. I'm doing some code
cleanup/review and will post a new patchset after I finish it. If you
want to give it a try again, apply the following patch.

diff --git a/src/backend/replication/logical/tablesync.c
b/src/backend/replication/logical/tablesync.c
index e0eb73c..4797e0b 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -757,7 +757,7 @@ fetch_remote_table_info(char *nspname, char *relname,

        /* Fetch row filtering info */
        resetStringInfo(&cmd);
-       appendStringInfo(&cmd, "SELECT pg_get_expr(prrowfilter,
prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON
(p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (",
MyLogicalRepWorker->relid);
+       appendStringInfo(&cmd, "SELECT pg_get_expr(prrowfilter,
prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON
(p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (",
lrel->remoteid);


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Erik Rijkers
On 2018-11-02 02:59, Euler Taveira wrote:

> Em qui, 1 de nov de 2018 às 05:30, Erik Rijkers <[hidden email]>
> escreveu:
>> > I ran pgbench-over-logical-replication with a WHERE-clause and could
>> > not get this to do a correct replication.  Below is the output of the
>> > attached test program.
>> >
>> >
>> > $ ./logrep_rowfilter.sh
>>
> Erik, thanks for testing.
>
>> So it seems this bug is due to some timing error in your patch (or
>> possibly in logical replication itself).
>>
> It is a bug in the new synchronization code. I'm doing some code
> cleanup/review and will post a new patchset after I finish it. If you
> want to give it a try again, apply the following patch.
>
> diff --git a/src/backend/replication/logical/tablesync.c
> b/src/backend/replication/logical/tablesync.c
> index e0eb73c..4797e0b 100644
> --- a/src/backend/replication/logical/tablesync.c
> +++ b/src/backend/replication/logical/tablesync.c
> [...]


That does indeed fix it.

Thank you,

Erik Rijkers


Reply | Threaded
Open this post in threaded view
|

Re: row filtering for logical replication

Hironobu SUZUKI
In reply to this post by Euler Taveira
On 2018/11/01 0:29, Euler Taveira wrote:

> Em qua, 28 de fev de 2018 às 20:03, Euler Taveira
> <[hidden email]> escreveu:
>> The attached patches add support for filtering rows in the publisher.
>>
> I rebased the patch. I added row filtering for initial
> synchronization, pg_dump support and psql support. 0001 removes unused
> code. 0002 reduces memory use. 0003 passes only structure member that
> is used in create_estate_for_relation. 0004 reuses a parser node for
> row filtering. 0005 is the feature. 0006 prints WHERE expression in
> psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
> not sure some of these messages will be part of the final patch).
> 0001, 0002, 0003 and 0008 are not mandatory for this feature.
>
> Comments?
>
>

Hi,

I reviewed your patches and I found a bug when I tested ALTER
PUBLICATION statement.

In short, ALTER PUBLICATION SET with a WHERE clause does not applied new
WHERE clause.

I describe the outline of the test I did and my conclusion.

[TEST]
I show the test case I tried in below.

(1)Publisher and Subscriber

I executed each statement on the publisher and the subscriber.

```
testdb=# CREATE PUBLICATION pub_testdb_t FOR TABLE t WHERE (id > 10);
CREATE PUBLICATION
```

```
testdb=# CREATE SUBSCRIPTION sub_testdb_t CONNECTION 'dbname=testdb
port=5432 user=postgres' PUBLICATION pub_testdb_t;
NOTICE:  created replication slot "sub_testdb_t" on publisher
CREATE SUBSCRIPTION
```

(2)Publisher

I executed these statements shown below.

testdb=# INSERT INTO t VALUES (1,1);
INSERT 0 1
testdb=# INSERT INTO t VALUES (11,11);
INSERT 0 1

(3)Subscriber

I confirmed that the CREATE PUBLICATION statement worked well.

```
testdb=# SELECT * FROM t;
  id | data
----+------
  11 |   11
(1 row)
```

(4)Publisher
After that, I executed ALTER PUBLICATION with a WHERE clause and
inserted a new row.

```
testdb=# ALTER  PUBLICATION pub_testdb_t SET TABLE t WHERE (id > 5);
ALTER PUBLICATION

testdb=# INSERT INTO t VALUES (7,7);
INSERT 0 1

testdb=# SELECT * FROM t;
  id | data
----+------
   1 |    1
  11 |   11
   7 |    7
(3 rows)
```

(5)Subscriber
I confirmed that the change of WHERE clause set by ALTER PUBLICATION
statement was ignored.

```
testdb=# SELECT * FROM t;
  id | data
----+------
  11 |   11
(1 row)
```

[Conclusion]
I think AlterPublicationTables()@publicationcmds.c has a bug.

In the foreach(oldlc, oldrelids) loop, oldrel must be appended to
delrels if oldrel or newrel has a WHERE clause. However, the current
implementation does not, therefore, old WHERE clause is not deleted and
the new WHERE clause is ignored.

This is my speculation. It may not be correct, but , at least, it is a
fact that ALTER PUBLICATION with a WHERE clause is not functioned in my
environment and my operation described in above.


Best regards,

12345