Added schema level support for publication.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Added schema level support for publication.

vignesh C
Hi,

This feature adds schema option while creating publication. Users will
be able to specify one or more schemas while creating publication,
when the user specifies schema option, then the data changes for the
tables present in the schema specified by the user will be replicated
to the subscriber. Few examples have been listed below:

Create a publication that publishes all changes for all the tables
present in production schema:
CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;

Create a publication that publishes all changes for all the tables
present in marketing and sales schemas:
CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;

Add some schemas to the publication:
ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;

Drop some schema from the publication:
ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;

Attached is a POC patch for the same. I felt this feature would be quite useful.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

v1-0001-Added-schema-level-support-for-publication.patch (38K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

akapila
On Thu, Jan 7, 2021 at 10:03 PM vignesh C <[hidden email]> wrote:

>
> This feature adds schema option while creating publication. Users will
> be able to specify one or more schemas while creating publication,
> when the user specifies schema option, then the data changes for the
> tables present in the schema specified by the user will be replicated
> to the subscriber. Few examples have been listed below:
>
> Create a publication that publishes all changes for all the tables
> present in production schema:
> CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;
>
> Create a publication that publishes all changes for all the tables
> present in marketing and sales schemas:
> CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;
>
> Add some schemas to the publication:
> ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
>
> Drop some schema from the publication:
> ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;
>
> Attached is a POC patch for the same. I felt this feature would be quite useful.
>

What do we do if the user Drops the schema? Do we automatically remove
it from the publication?

I see some use of such a feature but you haven't described the use
case or how did you arrive at the conclusion that it would be quite
useful?

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

vignesh C
On Fri, Jan 8, 2021 at 4:32 PM Amit Kapila <[hidden email]> wrote:

>
> On Thu, Jan 7, 2021 at 10:03 PM vignesh C <[hidden email]> wrote:
> >
> > This feature adds schema option while creating publication. Users will
> > be able to specify one or more schemas while creating publication,
> > when the user specifies schema option, then the data changes for the
> > tables present in the schema specified by the user will be replicated
> > to the subscriber. Few examples have been listed below:
> >
> > Create a publication that publishes all changes for all the tables
> > present in production schema:
> > CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;
> >
> > Create a publication that publishes all changes for all the tables
> > present in marketing and sales schemas:
> > CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;
> >
> > Add some schemas to the publication:
> > ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
> >
> > Drop some schema from the publication:
> > ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;
> >
> > Attached is a POC patch for the same. I felt this feature would be quite useful.
> >
>
> What do we do if the user Drops the schema? Do we automatically remove
> it from the publication?
>
I have not yet handled this scenario yet, I will handle this and
adding of tests in the next patch.

> I see some use of such a feature but you haven't described the use
> case or how did you arrive at the conclusion that it would be quite
> useful?
>
Currently there are a couple of options "FOR All TABLES" and "FOR
TABLE" when a user creates a publication, 1) either to subscribe to
the changes of all the tables or 2) subscribe to a few tables. There
is no option for users to subscribe to relations present in the
schemas. User has to manually identify the list of tables present in
the schema and specify the list of tables in that schema using the
"FOR TABLE" option. Similarly if a user wants to subscribe to n number
of schemas, the user has to do this for the required schemas, this is
a tedious process. This feature helps the user to take care of this
internally using schema option.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

Bharath Rupireddy
On Sat, Jan 9, 2021 at 5:21 PM vignesh C <[hidden email]> wrote:

> > What do we do if the user Drops the schema? Do we automatically remove
> > it from the publication?
> >
> I have not yet handled this scenario yet, I will handle this and
> adding of tests in the next patch.
>
> > I see some use of such a feature but you haven't described the use
> > case or how did you arrive at the conclusion that it would be quite
> > useful?
> >
> Currently there are a couple of options "FOR All TABLES" and "FOR
> TABLE" when a user creates a publication, 1) either to subscribe to
> the changes of all the tables or 2) subscribe to a few tables. There
> is no option for users to subscribe to relations present in the
> schemas. User has to manually identify the list of tables present in
> the schema and specify the list of tables in that schema using the
> "FOR TABLE" option. Similarly if a user wants to subscribe to n number
> of schemas, the user has to do this for the required schemas, this is
> a tedious process. This feature helps the user to take care of this
> internally using schema option.

I think this feature can be useful, in case a user has a lot of tables
to publish inside a schema. Having said that, I wonder if this feature
mandates users to create the same schema with same
permissions/authorizations manually on the subscriber, because logical
replication doesn't propagate any ddl's so are the schema or schema
changes? Or is it that the list of tables from the publisher can go
into a different schema on the subscriber?

Since the schema can have other objects such as data types, functions,
operators, I'm sure with your feature, non-table objects will be
skipped.

As Amit pointed out earlier, the behaviour when schema dropped, I
think we should also consider when schema is altered, say altered to a
different name, maybe we should change that in the publication too.

In general, what happens if we have some temporary tables or foreign
tables inside the schema, will they be allowed to send the data to
subscribers?

And, with this feature, since there can be many huge tables inside a
schema, the initial table sync phase of the replication can take a
while.

Say a user has created a publication for a schema with hundreds of
tables in it, at some point later, can he stop replicating a single or
some tables from that schema?

IMO, it's better to have the syntax - CREATE PUBLICATION
production_publication FOR ALL TABLES IN SCHEMA production - just
added IN between for all tables and schema.

Say a user has a schema with 121 tables in it, and wants to replicate
only 120 or 199 or even lesser tables out of it, so can we have some
skip option to the new syntax, something like below?
CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA
production WITH skip = marketing, accounts, sales;  --> meaning is,
replicate all the tables in the schema production except marketing,
accounts, sales tables.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

Bharath Rupireddy
On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
<[hidden email]> wrote:

> I think this feature can be useful, in case a user has a lot of tables
> to publish inside a schema. Having said that, I wonder if this feature
> mandates users to create the same schema with same
> permissions/authorizations manually on the subscriber, because logical
> replication doesn't propagate any ddl's so are the schema or schema
> changes? Or is it that the list of tables from the publisher can go
> into a different schema on the subscriber?
>
> Since the schema can have other objects such as data types, functions,
> operators, I'm sure with your feature, non-table objects will be
> skipped.
>
> As Amit pointed out earlier, the behaviour when schema dropped, I
> think we should also consider when schema is altered, say altered to a
> different name, maybe we should change that in the publication too.
>
> In general, what happens if we have some temporary tables or foreign
> tables inside the schema, will they be allowed to send the data to
> subscribers?
>
> And, with this feature, since there can be many huge tables inside a
> schema, the initial table sync phase of the replication can take a
> while.
>
> Say a user has created a publication for a schema with hundreds of
> tables in it, at some point later, can he stop replicating a single or
> some tables from that schema?
>
> IMO, it's better to have the syntax - CREATE PUBLICATION
> production_publication FOR ALL TABLES IN SCHEMA production - just
> added IN between for all tables and schema.
>
> Say a user has a schema with 121 tables in it, and wants to replicate
> only 120 or 199 or even lesser tables out of it, so can we have some
> skip option to the new syntax, something like below?
> CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA
> production WITH skip = marketing, accounts, sales;  --> meaning is,
> replicate all the tables in the schema production except marketing,
> accounts, sales tables.

One more point - if the publication is created for a schema with no or
some initial tables, will all the future tables that may get added to
the schema will be replicated too?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

Dilip Kumar-2
On Sat, Jan 9, 2021 at 8:14 PM Bharath Rupireddy
<[hidden email]> wrote:

>
> On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
> <[hidden email]> wrote:
> > I think this feature can be useful, in case a user has a lot of tables
> > to publish inside a schema. Having said that, I wonder if this feature
> > mandates users to create the same schema with same
> > permissions/authorizations manually on the subscriber, because logical
> > replication doesn't propagate any ddl's so are the schema or schema
> > changes? Or is it that the list of tables from the publisher can go
> > into a different schema on the subscriber?
> >
> > Since the schema can have other objects such as data types, functions,
> > operators, I'm sure with your feature, non-table objects will be
> > skipped.
> >
> > As Amit pointed out earlier, the behaviour when schema dropped, I
> > think we should also consider when schema is altered, say altered to a
> > different name, maybe we should change that in the publication too.
> >
> > In general, what happens if we have some temporary tables or foreign
> > tables inside the schema, will they be allowed to send the data to
> > subscribers?
> >
> > And, with this feature, since there can be many huge tables inside a
> > schema, the initial table sync phase of the replication can take a
> > while.
> >
> > Say a user has created a publication for a schema with hundreds of
> > tables in it, at some point later, can he stop replicating a single or
> > some tables from that schema?
> >
> > IMO, it's better to have the syntax - CREATE PUBLICATION
> > production_publication FOR ALL TABLES IN SCHEMA production - just
> > added IN between for all tables and schema.
> >
> > Say a user has a schema with 121 tables in it, and wants to replicate
> > only 120 or 199 or even lesser tables out of it, so can we have some
> > skip option to the new syntax, something like below?
> > CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA
> > production WITH skip = marketing, accounts, sales;  --> meaning is,
> > replicate all the tables in the schema production except marketing,
> > accounts, sales tables.
>
> One more point - if the publication is created for a schema with no or
> some initial tables, will all the future tables that may get added to
> the schema will be replicated too?
>

I expect this should be the behavior.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

vignesh C
In reply to this post by Bharath Rupireddy
Thanks for your comments Bharath, please find my opinion below.

On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
<[hidden email]> wrote:
> I think this feature can be useful, in case a user has a lot of tables
> to publish inside a schema. Having said that, I wonder if this feature
> mandates users to create the same schema with same
> permissions/authorizations manually on the subscriber, because logical
> replication doesn't propagate any ddl's so are the schema or schema
> changes? Or is it that the list of tables from the publisher can go
> into a different schema on the subscriber?
>

DDL's will not be propagated to the subscriber. Users have to create
the schema & tables in the subscriber. No change in
Permissions/authorizations handling, it will be the same as the
existing behavior for relations.

> Since the schema can have other objects such as data types, functions,
> operators, I'm sure with your feature, non-table objects will be
> skipped.
>

Yes, only table data will be sent to subscribers, non-table objects
will be skipped.

> As Amit pointed out earlier, the behaviour when schema dropped, I
> think we should also consider when schema is altered, say altered to a
> different name, maybe we should change that in the publication too.
>

I agree that when schema is altered the renamed schema should be
reflected in the publication.

> In general, what happens if we have some temporary tables or foreign
> tables inside the schema, will they be allowed to send the data to
> subscribers?
>

Temporary tables & foreign tables will not be added to the publications.

> And, with this feature, since there can be many huge tables inside a
> schema, the initial table sync phase of the replication can take a
> while.
>

Yes this is required.

> Say a user has created a publication for a schema with hundreds of
> tables in it, at some point later, can he stop replicating a single or
> some tables from that schema?
>

There is no provision for this currently.

> IMO, it's better to have the syntax - CREATE PUBLICATION
> production_publication FOR ALL TABLES IN SCHEMA production - just
> added IN between for all tables and schema.
>

I'm ok with the proposed syntax, I would like others' opinion too
before making the change.

> Say a user has a schema with 121 tables in it, and wants to replicate
> only 120 or 199 or even lesser tables out of it, so can we have some
> skip option to the new syntax, something like below?
> CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA
> production WITH skip = marketing, accounts, sales;  --> meaning is,
> replicate all the tables in the schema production except marketing,
> accounts, sales tables.
>

Yes this is a good use case, will include this change.

Thanks for the comments, I will handle the comments and post a patch for this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

vignesh C
In reply to this post by Bharath Rupireddy
On Sat, Jan 9, 2021 at 8:14 PM Bharath Rupireddy
<[hidden email]> wrote:
>
> One more point - if the publication is created for a schema with no or
> some initial tables, will all the future tables that may get added to
> the schema will be replicated too?
>

I agree on this, when a relation is added to the schema it should be
added to the publication.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

Bharath Rupireddy
In reply to this post by vignesh C
On Sun, Jan 10, 2021 at 11:21 PM vignesh C <[hidden email]> wrote:

> On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
> <[hidden email]> wrote:
> > I think this feature can be useful, in case a user has a lot of tables
> > to publish inside a schema. Having said that, I wonder if this feature
> > mandates users to create the same schema with same
> > permissions/authorizations manually on the subscriber, because logical
> > replication doesn't propagate any ddl's so are the schema or schema
> > changes? Or is it that the list of tables from the publisher can go
> > into a different schema on the subscriber?
> >
>
> DDL's will not be propagated to the subscriber. Users have to create
> the schema & tables in the subscriber. No change in
> Permissions/authorizations handling, it will be the same as the
> existing behavior for relations.

Looks like the existing behaviour already requires users to create the
schema on the subscriber when publishing the tables from that schema.
Otherwise, an error is thrown on the subscriber [1].

[1] on publisher:
CREATE SCHEMA myschema;
CREATE TABLE myschema.t1(a1 int, b1 int);
INSERT INTO myschema.t1_myschema SELECT i, i+10 FROM generate_series(1,10) i;
CREATE PUBLICATION testpub FOR TABLE myschema.t1;

on subscriber:
postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
dbname=postgres user=bharath port=5432' PUBLICATION testpub;
ERROR:  schema "myschema" does not exist
CREATE SCHEMA myschema;
CREATE TABLE myschema.t1(a1 int, b1 int);
postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
dbname=postgres user=bharath port=5432' PUBLICATION testpub;
NOTICE:  created replication slot "testsub" on publisher
CREATE SUBSCRIPTION

> > Since the schema can have other objects such as data types, functions,
> > operators, I'm sure with your feature, non-table objects will be
> > skipped.
> >
>
> Yes, only table data will be sent to subscribers, non-table objects
> will be skipped.

Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is
for all the tables in the database, does this i.e. skips non-table
objects and temporary tables, foreign tables and so on. So, your
feature also can behave the same way, but within the scope of the
given schema/s.

> > As Amit pointed out earlier, the behaviour when schema dropped, I
> > think we should also consider when schema is altered, say altered to a
> > different name, maybe we should change that in the publication too.
> >
>
> I agree that when schema is altered the renamed schema should be
> reflected in the publication.

I think, it's not only making sure that the publisher side has the new
altered schema, but also the subscriber needs those alters. Having
said that, since these alters come under DDL changes and in logical
replication we don't publish the scheme changes to the subscriber, we
may not need to anything extra for informing the schema alters to the
subscriber from the publisher, the users might have to do the same
schema alter on the subscriber and then a ALTER SUBSCRIPTION testsub
REFRESH PUBLICATION;  should work for them? If this understanding is
correct, then we should document this.

> > In general, what happens if we have some temporary tables or foreign
> > tables inside the schema, will they be allowed to send the data to
> > subscribers?
> >
>
> Temporary tables & foreign tables will not be added to the publications.

Yes the existing logical replication framework doesn't allow
replication of temporary, unlogged, foreign tables and other non-table
relations such as materialized views, indexes etc [1]. The CREATE
PUBLICATION statement either fails in check_publication_add_relation
or before that.

CREATE PUBLICATION testpub FOR TABLE tab1, throwing the error if the
single table tab1 is any of the above restricted tables, seems fine.
But, if there's a list of tables with CREATE PUBLICATION testpub FOR
TABLE normal_tab1, temp_tab2, normal_tab3, foreign_tab4,
unlogged_tab5, normal_tab6, normal_tab7 ......; This query fails on
first encounter of the restricted table, say at temp_tab2. Whereas,
CREATE PUBLICATION testpub FOR ALL TABLES; would skip the restricted
tables and continue to add the accepted tables into the publication
within the database.

IMHO, if there's a list of tables specified with FOR TABLE, then
instead of throwing an error in case of any restricted table, we can
issue a warning and continue with the addition of accepted tables into
the publication. If done, this behaviour will be in sync with FOR ALL
TABLES;

Thoughts? If okay, I can work on a patch.

Related to error messages: when foreign table is specified in CREATE
PUBLICATION statement, then "ERROR:  "f1" is not a table", is thrown
[1], how about the error message "ERROR: foerign table "f1" cannot be
replicated". In general, it would be good if we could have the error
messages something like in [2] instead of the existing [1].

Thoughts? If okay, I can work on a patch.

[1]
t1 is a temporary table:
postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
ERROR:  table "t1" cannot be replicated
DETAIL:  Temporary and unlogged relations cannot be replicated.

t1 is an unlogged table:
postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
ERROR:  table "t1" cannot be replicated
DETAIL:  Temporary and unlogged relations cannot be replicated.

f1 is a foreign table:
postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
ERROR:  "f1" is not a table
DETAIL:  Only tables can be added to publications.

mv1 is a mat view:
postgres=# CREATE PUBLICATION testpub FOR TABLE mv1;
ERROR:  "mv1" is not a table

idx1 is an index:
postgres=# CREATE PUBLICATION testpub FOR TABLE idx1;
ERROR:  "idx1" is an index

[2]
t1 is a temporary table:
postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
ERROR:  temporary table "t1" cannot be replicated
DETAIL:  Temporary, unlogged and foreign relations cannot be replicated.

t1 is an unlogged table:
postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
ERROR:  unlogged table "t1" cannot be replicated
DETAIL:  Temporary, unlogged and foreign relations cannot be replicated.

f1 is a foreign table:
postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
ERROR:  foreign table "f1" cannot be replicated
DETAIL:  Temporary, unlogged and foreign relations cannot be replicated.

> > Say a user has created a publication for a schema with hundreds of
> > tables in it, at some point later, can he stop replicating a single or
> > some tables from that schema?
> >
>
> There is no provision for this currently.

The documentation [1] says, we can ALTER PUBLICATION testpub DROP
TABLE t1; which removes the table from the list of published tables,
but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
PUBLICATION; for the changes to become effective on the subscriber. I
have done some testing for this case:
1) created publication for table t1, see \d+ t1, the associated
publication is visible in the output
2) created subscription on the subscriber, initial available data from
the publisher for table t1 is received
3) insert into table t1 on the publisher
4) inserted data in (3) is received in the subscriber table t1
5) alter publication to drop the table t1 on the publisher, see \d+
t1, there will not be any associated publication in the output
6) execute alter subscription refresh publication on the subscriber,
with the expectation that it should not receive the data from the
publisher for the table t1 since it's dropped from the publication in
(5)
7) insert into table t1 on the publisher
8) still the newly inserted data in (7) from the publisher, will be
received into the table t1 in the subscriber

IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
the above use case, it looks like a bug to me. If I'm wrong, can
someone correct me?

Thoughts?

[1] - https://www.postgresql.org/docs/devel/sql-alterpublication.html

> > IMO, it's better to have the syntax - CREATE PUBLICATION
> > production_publication FOR ALL TABLES IN SCHEMA production - just
> > added IN between for all tables and schema.
> >
>
> I'm ok with the proposed syntax, I would like others' opinion too
> before making the change.

Thanks.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

japin

On Mon, 11 Jan 2021 at 14:15, Bharath Rupireddy wrote:

> On Sun, Jan 10, 2021 at 11:21 PM vignesh C <[hidden email]> wrote:
>> On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
>> <[hidden email]> wrote:
>> > I think this feature can be useful, in case a user has a lot of tables
>> > to publish inside a schema. Having said that, I wonder if this feature
>> > mandates users to create the same schema with same
>> > permissions/authorizations manually on the subscriber, because logical
>> > replication doesn't propagate any ddl's so are the schema or schema
>> > changes? Or is it that the list of tables from the publisher can go
>> > into a different schema on the subscriber?
>> >
>>
>> DDL's will not be propagated to the subscriber. Users have to create
>> the schema & tables in the subscriber. No change in
>> Permissions/authorizations handling, it will be the same as the
>> existing behavior for relations.
>
> Looks like the existing behaviour already requires users to create the
> schema on the subscriber when publishing the tables from that schema.
> Otherwise, an error is thrown on the subscriber [1].
>
> [1] on publisher:
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> INSERT INTO myschema.t1_myschema SELECT i, i+10 FROM generate_series(1,10) i;
> CREATE PUBLICATION testpub FOR TABLE myschema.t1;
>
> on subscriber:
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> ERROR:  schema "myschema" does not exist
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> NOTICE:  created replication slot "testsub" on publisher
> CREATE SUBSCRIPTION
>
>> > Since the schema can have other objects such as data types, functions,
>> > operators, I'm sure with your feature, non-table objects will be
>> > skipped.
>> >
>>
>> Yes, only table data will be sent to subscribers, non-table objects
>> will be skipped.
>
> Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is
> for all the tables in the database, does this i.e. skips non-table
> objects and temporary tables, foreign tables and so on. So, your
> feature also can behave the same way, but within the scope of the
> given schema/s.
>
>> > As Amit pointed out earlier, the behaviour when schema dropped, I
>> > think we should also consider when schema is altered, say altered to a
>> > different name, maybe we should change that in the publication too.
>> >
>>
>> I agree that when schema is altered the renamed schema should be
>> reflected in the publication.
>
> I think, it's not only making sure that the publisher side has the new
> altered schema, but also the subscriber needs those alters. Having
> said that, since these alters come under DDL changes and in logical
> replication we don't publish the scheme changes to the subscriber, we
> may not need to anything extra for informing the schema alters to the
> subscriber from the publisher, the users might have to do the same
> schema alter on the subscriber and then a ALTER SUBSCRIPTION testsub
> REFRESH PUBLICATION;  should work for them? If this understanding is
> correct, then we should document this.
>
>> > In general, what happens if we have some temporary tables or foreign
>> > tables inside the schema, will they be allowed to send the data to
>> > subscribers?
>> >
>>
>> Temporary tables & foreign tables will not be added to the publications.
>
> Yes the existing logical replication framework doesn't allow
> replication of temporary, unlogged, foreign tables and other non-table
> relations such as materialized views, indexes etc [1]. The CREATE
> PUBLICATION statement either fails in check_publication_add_relation
> or before that.
>
> CREATE PUBLICATION testpub FOR TABLE tab1, throwing the error if the
> single table tab1 is any of the above restricted tables, seems fine.
> But, if there's a list of tables with CREATE PUBLICATION testpub FOR
> TABLE normal_tab1, temp_tab2, normal_tab3, foreign_tab4,
> unlogged_tab5, normal_tab6, normal_tab7 ......; This query fails on
> first encounter of the restricted table, say at temp_tab2. Whereas,
> CREATE PUBLICATION testpub FOR ALL TABLES; would skip the restricted
> tables and continue to add the accepted tables into the publication
> within the database.
>
> IMHO, if there's a list of tables specified with FOR TABLE, then
> instead of throwing an error in case of any restricted table, we can
> issue a warning and continue with the addition of accepted tables into
> the publication. If done, this behaviour will be in sync with FOR ALL
> TABLES;
>
> Thoughts? If okay, I can work on a patch.
>
> Related to error messages: when foreign table is specified in CREATE
> PUBLICATION statement, then "ERROR:  "f1" is not a table", is thrown
> [1], how about the error message "ERROR: foerign table "f1" cannot be
> replicated". In general, it would be good if we could have the error
> messages something like in [2] instead of the existing [1].
>
> Thoughts? If okay, I can work on a patch.
>
> [1]
> t1 is a temporary table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR:  table "t1" cannot be replicated
> DETAIL:  Temporary and unlogged relations cannot be replicated.
>
> t1 is an unlogged table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR:  table "t1" cannot be replicated
> DETAIL:  Temporary and unlogged relations cannot be replicated.
>
> f1 is a foreign table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
> ERROR:  "f1" is not a table
> DETAIL:  Only tables can be added to publications.
>
> mv1 is a mat view:
> postgres=# CREATE PUBLICATION testpub FOR TABLE mv1;
> ERROR:  "mv1" is not a table
>
> idx1 is an index:
> postgres=# CREATE PUBLICATION testpub FOR TABLE idx1;
> ERROR:  "idx1" is an index
>
> [2]
> t1 is a temporary table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR:  temporary table "t1" cannot be replicated
> DETAIL:  Temporary, unlogged and foreign relations cannot be replicated.
>
> t1 is an unlogged table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR:  unlogged table "t1" cannot be replicated
> DETAIL:  Temporary, unlogged and foreign relations cannot be replicated.
>
> f1 is a foreign table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
> ERROR:  foreign table "f1" cannot be replicated
> DETAIL:  Temporary, unlogged and foreign relations cannot be replicated.
>
>> > Say a user has created a publication for a schema with hundreds of
>> > tables in it, at some point later, can he stop replicating a single or
>> > some tables from that schema?
>> >
>>
>> There is no provision for this currently.
>
> The documentation [1] says, we can ALTER PUBLICATION testpub DROP
> TABLE t1; which removes the table from the list of published tables,
> but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
> PUBLICATION; for the changes to become effective on the subscriber. I
> have done some testing for this case:
> 1) created publication for table t1, see \d+ t1, the associated
> publication is visible in the output
> 2) created subscription on the subscriber, initial available data from
> the publisher for table t1 is received
> 3) insert into table t1 on the publisher
> 4) inserted data in (3) is received in the subscriber table t1
> 5) alter publication to drop the table t1 on the publisher, see \d+
> t1, there will not be any associated publication in the output
> 6) execute alter subscription refresh publication on the subscriber,
> with the expectation that it should not receive the data from the
> publisher for the table t1 since it's dropped from the publication in
> (5)
> 7) insert into table t1 on the publisher
> 8) still the newly inserted data in (7) from the publisher, will be
> received into the table t1 in the subscriber
>
> IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
> the above use case, it looks like a bug to me. If I'm wrong, can
> someone correct me?
>

Yes, if we modify the publication, we should refresh the subscription on
each subscriber.  It looks strange for me, especially for partitioned
tables [1].

> Thoughts?
>

Can we trace the different between publication and subscription, and
auto-refresh subscription on subscriber?

[1]
https://www.postgresql.org/message-id/flat/1D6DCFD2-0F44-4A18-BF67-17C2697B1631%40hotmail.com

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

Bharath Rupireddy
On Mon, Jan 11, 2021 at 1:29 PM japin <[hidden email]> wrote:

> >> > Say a user has created a publication for a schema with hundreds of
> >> > tables in it, at some point later, can he stop replicating a single or
> >> > some tables from that schema?
> >> >
> >>
> >> There is no provision for this currently.
> >
> > The documentation [1] says, we can ALTER PUBLICATION testpub DROP
> > TABLE t1; which removes the table from the list of published tables,
> > but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
> > PUBLICATION; for the changes to become effective on the subscriber. I
> > have done some testing for this case:
> > 1) created publication for table t1, see \d+ t1, the associated
> > publication is visible in the output
> > 2) created subscription on the subscriber, initial available data from
> > the publisher for table t1 is received
> > 3) insert into table t1 on the publisher
> > 4) inserted data in (3) is received in the subscriber table t1
> > 5) alter publication to drop the table t1 on the publisher, see \d+
> > t1, there will not be any associated publication in the output
> > 6) execute alter subscription refresh publication on the subscriber,
> > with the expectation that it should not receive the data from the
> > publisher for the table t1 since it's dropped from the publication in
> > (5)
> > 7) insert into table t1 on the publisher
> > 8) still the newly inserted data in (7) from the publisher, will be
> > received into the table t1 in the subscriber
> >
> > IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
> > the above use case, it looks like a bug to me. If I'm wrong, can
> > someone correct me?
> >
>
> Yes, if we modify the publication, we should refresh the subscription on
> each subscriber.  It looks strange for me, especially for partitioned
> tables [1].
>
> > Thoughts?
> >
>
> Can we trace the different between publication and subscription, and
> auto-refresh subscription on subscriber?
>
> [1]
> https://www.postgresql.org/message-id/flat/1D6DCFD2-0F44-4A18-BF67-17C2697B1631%40hotmail.com

As Amit stated in your thread [1], DDLs like creation of the new
tables or partitions, schema changes etc. on the publisher can not be
replicated automatically by the logical replication framework to the
subscriber. Users have to perform those DDLs on the subscribers by
themselves.

If your point is to at least issue the ALTER SUBSCRIPTION testsub
REFRESH PUBLICATION; from the publication whenever the publication is
altered i.e. added or dropped tables, IMO, we cannot do this, because
running this command on the subscriber only makes sense, after user
runs the same DDLs (which were run on the publisher) also on the
subscriber. To illustrate this:
1) create a new table or partition on the publisher and add it to
publisher, note that the same table has not yet been created on the
subscriber
2) imagine the publisher issuing an auto refresh command to all the
subscribers, then, no point in that right, because the new table or
the partition is not yet created on all the subscribers.

So, IMO, we can not have an auto refresh mechanism, until we have the
feature to replicate the DDL changes to all the subscribers.

What I stated in my earlier mail [1] is that even though we drop the
table from the publication in the publisher and run a refresh
publication on the subscriber, still the data is being replicated from
the publisher to the subscriber table. I just wanted to know whether
this is the expected behaviour or what exactly means. a user running
ALTER PUBLICATION mypub DROP TABLE mytable;

[1] - https://www.postgresql.org/message-id/CALj2ACWAxO3vSToT0o5nXL%3Drz5cNx90zaV-at%3DcvM14Tag4%3DcQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

japin

On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy <[hidden email]> wrote:

On Mon, Jan 11, 2021 at 1:29 PM japin <[hidden email]> wrote:
Say a user has created a publication for a schema with hundreds of
tables in it, at some point later, can he stop replicating a single or
some tables from that schema?


There is no provision for this currently.

The documentation [1] says, we can ALTER PUBLICATION testpub DROP
TABLE t1; which removes the table from the list of published tables,
but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
PUBLICATION; for the changes to become effective on the subscriber. I
have done some testing for this case:
1) created publication for table t1, see \d+ t1, the associated
publication is visible in the output
2) created subscription on the subscriber, initial available data from
the publisher for table t1 is received
3) insert into table t1 on the publisher
4) inserted data in (3) is received in the subscriber table t1
5) alter publication to drop the table t1 on the publisher, see \d+
t1, there will not be any associated publication in the output
6) execute alter subscription refresh publication on the subscriber,
with the expectation that it should not receive the data from the
publisher for the table t1 since it's dropped from the publication in
(5)
7) insert into table t1 on the publisher
8) still the newly inserted data in (7) from the publisher, will be
received into the table t1 in the subscriber

IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
the above use case, it looks like a bug to me. If I'm wrong, can
someone correct me?


Yes, if we modify the publication, we should refresh the subscription on
each subscriber.  It looks strange for me, especially for partitioned
tables [1].

Thoughts?


Can we trace the different between publication and subscription, and
auto-refresh subscription on subscriber?

[1]
https://www.postgresql.org/message-id/flat/1D6DCFD2-0F44-4A18-BF67-17C2697B1631%40hotmail.com

As Amit stated in your thread [1], DDLs like creation of the new
tables or partitions, schema changes etc. on the publisher can not be
replicated automatically by the logical replication framework to the
subscriber. Users have to perform those DDLs on the subscribers by
themselves.

Yeah, DDLs is not supported now. On publisher, the partitions are added to the
publication automatically.  However, even if we created the partitions on subscriber,
it will not sync the new partitions, because it likes normal table, we must execute
ALTER SUBSCRIPTION my_test REFRESH PUBLICATION; 
I preferred it will automatically add to subscription when we create the new partitions
if the partitions is already in publication.

If your point is to at least issue the ALTER SUBSCRIPTION testsub
REFRESH PUBLICATION; from the publication whenever the publication is
altered i.e. added or dropped tables, IMO, we cannot do this, because
running this command on the subscriber only makes sense, after user
runs the same DDLs (which were run on the publisher) also on the
subscriber. To illustrate this:
1) create a new table or partition on the publisher and add it to
publisher, note that the same table has not yet been created on the
subscriber
2) imagine the publisher issuing an auto refresh command to all the
subscribers, then, no point in that right, because the new table or
the partition is not yet created on all the subscribers.

So, IMO, we can not have an auto refresh mechanism, until we have the
feature to replicate the DDL changes to all the subscribers.

Thanks for clarification. 

What I stated in my earlier mail [1] is that even though we drop the
table from the publication in the publisher and run a refresh
publication on the subscriber, still the data is being replicated from
the publisher to the subscriber table. I just wanted to know whether
this is the expected behaviour or what exactly means. a user running
ALTER PUBLICATION mypub DROP TABLE mytable;

[1] - https://www.postgresql.org/message-id/CALj2ACWAxO3vSToT0o5nXL%3Drz5cNx90zaV-at%3DcvM14Tag4%3DcQ%40mail.gmail.com

Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we do not insert data
between step (5) and (6), it will not ship the new records, however, if we insert
data between step (5) and (6), it will ship the new records.

(1) created publication for table t1, t2
postgres[8765]=# CREATE TABLE t1 (a int);
CREATE TABLE
postgres[8765]=# CREATE TABLE t2 (a int);
CREATE TABLE
postgres[8765]=# INSERT INTO t1 VALUES (1);
INSERT 0 1
postgres[8765]=# INSERT INTO t2 VALUES (1);
INSERT 0 1
postgres[8765]=# CREATE PUBLICATION mypub1 FOR TABLE t1;
CREATE PUBLICATION
postgres[8765]=# CREATE PUBLICATION mypub2 FOR TABLE t2;
CREATE PUBLICATION

(2) created subscription on the subscriber
postgres[9812]=# CREATE TABLE t1 (a int);
CREATE TABLE
postgres[9812]=# CREATE TABLE t2 (a int);
CREATE TABLE
postgres[9812]=# CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub1;
NOTICE:  created replication slot "mysub1" on publisher
CREATE SUBSCRIPTION
postgres[9812]=# CREATE SUBSCRIPTION mysub2 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub2;
NOTICE:  created replication slot "mysub2" on publisher
CREATE SUBSCRIPTION
postgres[9812]=# TABLE t1;
 a
---
 1
(1 row)

postgres[9812]=# TABLE t2;
 a
---
 1
(1 row)

(3) insert into table t1, t2 on the publisher
postgres[8765]=# INSERT INTO t1 VALUES (2);
INSERT 0 1
postgres[8765]=# INSERT INTO t2 VALUES (2);
INSERT 0 1

(4) inserted data in (3) is received in the subscriber table t1, t2
postgres[9812]=# TABLE t1;
 a
---
 1
 2
(2 rows)

postgres[9812]=# TABLE t2;
 a
---
 1
 2
(2 rows)

(5) alter publication to drop table, we insert a record into t1 on publisher
postgres[8765]=# ALTER PUBLICATION mypub1 DROP TABLE t1;
ALTER PUBLICATION
postgres[8765]=# ALTER PUBLICATION mypub2 DROP TABLE t2;
ALTER PUBLICATION
postgres[8765]=# INSERT INTO t1 VALUES (3);
INSERT 0 1

(6) check the data on subscriber
postgres[9812]=# TABLE t1;
 a
---
 1
 2
 3
(3 rows)

postgres[9812]=# TABLE t2;
 a
---
 1
 2
(2 rows)

(7) refresh subscription on the subscriber
postgres[9812]=# ALTER SUBSCRIPTION mysub1 REFRESH PUBLICATION;
ALTER SUBSCRIPTION
postgres[9812]=# ALTER SUBSCRIPTION mysub2 REFRESH PUBLICATION;
ALTER SUBSCRIPTION

(8) insert into table t1, t2 on the publisher
postgres[8765]=# INSERT INTO t1 VALUES (4);
INSERT 0 1
postgres[8765]=# INSERT INTO t2 VALUES (4);
INSERT 0 1

(9) the newly inserted data in (5), (7) for table t1 shipped to subscriber, however
t2 doesn’t
postgres[9812]=# TABLE t1;
 a
---
 1
 2
 3
 4
(4 rows)

postgres[9812]=# TABLE t2;
 a
---
 1
 2
(2 rows)

It might be a bug.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

akapila
On Mon, Jan 11, 2021 at 4:29 PM Li Japin <[hidden email]> wrote:

>
>
> On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy <[hidden email]> wrote:
>
> On Mon, Jan 11, 2021 at 1:29 PM japin <[hidden email]> wrote:
>
> Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we do not insert data
> between step (5) and (6), it will not ship the new records, however, if we insert
> data between step (5) and (6), it will ship the new records.
>
>
..
> It might be a bug.
>

Can you check pg_publication_rel and pg_subscription_rel? Also, this
is not related to the feature proposed in this thread, so it is better
to start a new thread to conclude whether this is a bug or not.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

Bharath Rupireddy
On Mon, Jan 11, 2021 at 5:25 PM Amit Kapila <[hidden email]> wrote:

>
> On Mon, Jan 11, 2021 at 4:29 PM Li Japin <[hidden email]> wrote:
> >
> >
> > On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy <[hidden email]> wrote:
> >
> > On Mon, Jan 11, 2021 at 1:29 PM japin <[hidden email]> wrote:
> >
> > Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we do not insert data
> > between step (5) and (6), it will not ship the new records, however, if we insert
> > data between step (5) and (6), it will ship the new records.
> >
> >
> ..
> > It might be a bug.
> >
>
> Can you check pg_publication_rel and pg_subscription_rel? Also, this
> is not related to the feature proposed in this thread, so it is better
> to start a new thread to conclude whether this is a bug or not.

Thanks Amit, sure I will verify and start a new thread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

Bharath Rupireddy
On Mon, Jan 11, 2021 at 5:28 PM Bharath Rupireddy
<[hidden email]> wrote:

>
> On Mon, Jan 11, 2021 at 5:25 PM Amit Kapila <[hidden email]> wrote:
> >
> > On Mon, Jan 11, 2021 at 4:29 PM Li Japin <[hidden email]> wrote:
> > >
> > >
> > > On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy <[hidden email]> wrote:
> > >
> > > On Mon, Jan 11, 2021 at 1:29 PM japin <[hidden email]> wrote:
> > >
> > > Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we do not insert data
> > > between step (5) and (6), it will not ship the new records, however, if we insert
> > > data between step (5) and (6), it will ship the new records.
> > >
> > >
> > ..
> > > It might be a bug.
> > >
> >
> > Can you check pg_publication_rel and pg_subscription_rel? Also, this
> > is not related to the feature proposed in this thread, so it is better
> > to start a new thread to conclude whether this is a bug or not.
>
> Thanks Amit, sure I will verify and start a new thread.

I started a new thread [1] for this, please have a look.

[1] - https://www.postgresql.org/message-id/CALj2ACV%2B0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

Rahila Syed-2
In reply to this post by vignesh C
Hi Vignesh,

I had a look at the patch, please consider following comments.

On Thu, Jan 7, 2021 at 10:03 PM vignesh C <[hidden email]> wrote:
Hi,

This feature adds schema option while creating publication. Users will
be able to specify one or more schemas while creating publication,
when the user specifies schema option, then the data changes for the
tables present in the schema specified by the user will be replicated
to the subscriber. Few examples have been listed below:

Create a publication that publishes all changes for all the tables
present in production schema:
CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;

Should it be FOR TABLES IN SCHEMA instead of FOR ALL TABLES SCHEMA?
 
Create a publication that publishes all changes for all the tables
present in marketing and sales schemas:
CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;

Add some schemas to the publication:
ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;

As per current implementation this command fails even if one of the schemas does not 
exist. I think this is counterintuitive, it should throw a warning and continue adding the rest.
 
Drop some schema from the publication:
ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;

Same for drop schema, if one of these schemas does not exist in publication,
the entire DROP operation is aborted.
 
Thank you,
Rahila Syed
 
Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

vignesh C
In reply to this post by Bharath Rupireddy
On Mon, Jan 11, 2021 at 11:45 AM Bharath Rupireddy
<[hidden email]> wrote:

>
> On Sun, Jan 10, 2021 at 11:21 PM vignesh C <[hidden email]> wrote:
> > On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
> > <[hidden email]> wrote:
> > > I think this feature can be useful, in case a user has a lot of tables
> > > to publish inside a schema. Having said that, I wonder if this feature
> > > mandates users to create the same schema with same
> > > permissions/authorizations manually on the subscriber, because logical
> > > replication doesn't propagate any ddl's so are the schema or schema
> > > changes? Or is it that the list of tables from the publisher can go
> > > into a different schema on the subscriber?
> > >
> >
> > DDL's will not be propagated to the subscriber. Users have to create
> > the schema & tables in the subscriber. No change in
> > Permissions/authorizations handling, it will be the same as the
> > existing behavior for relations.
>
> Looks like the existing behaviour already requires users to create the
> schema on the subscriber when publishing the tables from that schema.
> Otherwise, an error is thrown on the subscriber [1].
>
> [1] on publisher:
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> INSERT INTO myschema.t1_myschema SELECT i, i+10 FROM generate_series(1,10) i;
> CREATE PUBLICATION testpub FOR TABLE myschema.t1;
>
> on subscriber:
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> ERROR:  schema "myschema" does not exist
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> NOTICE:  created replication slot "testsub" on publisher
> CREATE SUBSCRIPTION
>

Yes this feature will also have the same behavior, DDL creation should
be taken care of by DBA similar to how it is handled may be using
pg_dump or use sql scripts/statements to update.

> > > Since the schema can have other objects such as data types, functions,
> > > operators, I'm sure with your feature, non-table objects will be
> > > skipped.
> > >
> >
> > Yes, only table data will be sent to subscribers, non-table objects
> > will be skipped.
>
> Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is
> for all the tables in the database, does this i.e. skips non-table
> objects and temporary tables, foreign tables and so on. So, your
> feature also can behave the same way, but within the scope of the
> given schema/s.
>

Yes, it will support only normal tables. Non table objects, foreign
tables & temporary tables will not be supported.

> > > As Amit pointed out earlier, the behaviour when schema dropped, I
> > > think we should also consider when schema is altered, say altered to a
> > > different name, maybe we should change that in the publication too.
> > >
> >
> > I agree that when schema is altered the renamed schema should be
> > reflected in the publication.
>
> I think, it's not only making sure that the publisher side has the new
> altered schema, but also the subscriber needs those alters. Having
> said that, since these alters come under DDL changes and in logical
> replication we don't publish the scheme changes to the subscriber, we
> may not need to anything extra for informing the schema alters to the
> subscriber from the publisher, the users might have to do the same
> schema alter on the subscriber and then a ALTER SUBSCRIPTION testsub
> REFRESH PUBLICATION;  should work for them? If this understanding is
> correct, then we should document this.
>

Yes, alter schema changes will be reflected in the publication, the
corresponding change needs to be done by the user on the subscriber
side. Once a user does ALTER SUBSCRIPTION testsub REFRESH PUBLICATION,
the new altered schema changes will be reflected in the subscriber. I
will update the documentation that user need to take care for
subscription refresh.

> > > In general, what happens if we have some temporary tables or foreign
> > > tables inside the schema, will they be allowed to send the data to
> > > subscribers?
> > >
> >
> > Temporary tables & foreign tables will not be added to the publications.
>
> Yes the existing logical replication framework doesn't allow
> replication of temporary, unlogged, foreign tables and other non-table
> relations such as materialized views, indexes etc [1]. The CREATE
> PUBLICATION statement either fails in check_publication_add_relation
> or before that.
>
> CREATE PUBLICATION testpub FOR TABLE tab1, throwing the error if the
> single table tab1 is any of the above restricted tables, seems fine.
> But, if there's a list of tables with CREATE PUBLICATION testpub FOR
> TABLE normal_tab1, temp_tab2, normal_tab3, foreign_tab4,
> unlogged_tab5, normal_tab6, normal_tab7 ......; This query fails on
> first encounter of the restricted table, say at temp_tab2. Whereas,
> CREATE PUBLICATION testpub FOR ALL TABLES; would skip the restricted
> tables and continue to add the accepted tables into the publication
> within the database.
>
> IMHO, if there's a list of tables specified with FOR TABLE, then
> instead of throwing an error in case of any restricted table, we can
> issue a warning and continue with the addition of accepted tables into
> the publication. If done, this behaviour will be in sync with FOR ALL
> TABLES;
>
> Thoughts? If okay, I can work on a patch.
>

I feel we can start a new thread for this to seek opinion if this base
change is required and reach consensus.

> Related to error messages: when foreign table is specified in CREATE
> PUBLICATION statement, then "ERROR:  "f1" is not a table", is thrown
> [1], how about the error message "ERROR: foerign table "f1" cannot be
> replicated". In general, it would be good if we could have the error
> messages something like in [2] instead of the existing [1].
>
> Thoughts? If okay, I can work on a patch.
>

I feel we can start a new thread for this to seek opinion and reach consensus.

> [1]
> t1 is a temporary table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR:  table "t1" cannot be replicated
> DETAIL:  Temporary and unlogged relations cannot be replicated.
>
> t1 is an unlogged table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR:  table "t1" cannot be replicated
> DETAIL:  Temporary and unlogged relations cannot be replicated.
>
> f1 is a foreign table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
> ERROR:  "f1" is not a table
> DETAIL:  Only tables can be added to publications.
>
> mv1 is a mat view:
> postgres=# CREATE PUBLICATION testpub FOR TABLE mv1;
> ERROR:  "mv1" is not a table
>
> idx1 is an index:
> postgres=# CREATE PUBLICATION testpub FOR TABLE idx1;
> ERROR:  "idx1" is an index
>
> [2]
> t1 is a temporary table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR:  temporary table "t1" cannot be replicated
> DETAIL:  Temporary, unlogged and foreign relations cannot be replicated.
>
> t1 is an unlogged table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR:  unlogged table "t1" cannot be replicated
> DETAIL:  Temporary, unlogged and foreign relations cannot be replicated.
>
> f1 is a foreign table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
> ERROR:  foreign table "f1" cannot be replicated
> DETAIL:  Temporary, unlogged and foreign relations cannot be replicated.
>
> > > Say a user has created a publication for a schema with hundreds of
> > > tables in it, at some point later, can he stop replicating a single or
> > > some tables from that schema?
> > >
> >
> > There is no provision for this currently.
>
> The documentation [1] says, we can ALTER PUBLICATION testpub DROP
> TABLE t1; which removes the table from the list of published tables,
> but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
> PUBLICATION; for the changes to become effective on the subscriber. I
> have done some testing for this case:
> 1) created publication for table t1, see \d+ t1, the associated
> publication is visible in the output
> 2) created subscription on the subscriber, initial available data from
> the publisher for table t1 is received
> 3) insert into table t1 on the publisher
> 4) inserted data in (3) is received in the subscriber table t1
> 5) alter publication to drop the table t1 on the publisher, see \d+
> t1, there will not be any associated publication in the output
> 6) execute alter subscription refresh publication on the subscriber,
> with the expectation that it should not receive the data from the
> publisher for the table t1 since it's dropped from the publication in
> (5)
> 7) insert into table t1 on the publisher
> 8) still the newly inserted data in (7) from the publisher, will be
> received into the table t1 in the subscriber
>
> IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
> the above use case, it looks like a bug to me. If I'm wrong, can
> someone correct me?
>
> Thoughts?

I think you started a new thread for this, let's conclude on this there.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

vignesh C
In reply to this post by Rahila Syed-2
Thanks Rahila for your comments, please find my thoughts below.

On Tue, Jan 12, 2021 at 5:16 PM Rahila Syed <[hidden email]> wrote:

>
> Hi Vignesh,
>
> I had a look at the patch, please consider following comments.
>
> On Thu, Jan 7, 2021 at 10:03 PM vignesh C <[hidden email]> wrote:
>>
>> Hi,
>>
>> This feature adds schema option while creating publication. Users will
>> be able to specify one or more schemas while creating publication,
>> when the user specifies schema option, then the data changes for the
>> tables present in the schema specified by the user will be replicated
>> to the subscriber. Few examples have been listed below:
>>
>> Create a publication that publishes all changes for all the tables
>> present in production schema:
>> CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;
>>
> Should it be FOR TABLES IN SCHEMA instead of FOR ALL TABLES SCHEMA?
>  

For adding tables into publication we have syntax like:
CREATE PUBLICATION mypub FOR TABLE tbl1, tbl2;
For all tables we have syntax like:
CREATE PUBLICATION mypub FOR ALL TABLES;

Initial syntax that I proposed was:
CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;

I feel the below syntax is better, as it is consistent with others:
CREATE PUBLICATION mypub FOR SCHEMA sch1, sch2;

>>
>> Create a publication that publishes all changes for all the tables
>> present in marketing and sales schemas:
>> CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;
>>
>> Add some schemas to the publication:
>> ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
>>
> As per current implementation this command fails even if one of the schemas does not
> exist. I think this is counterintuitive, it should throw a warning and continue adding the rest.
>  

We have the similar behavior in case of adding non-existent table while creating a publication:
CREATE PUBLICATION mypub3 FOR TABLE non_existent_table;
ERROR:  relation "non_existent_table" does not exist
I feel we can keep the behavior similarly to maintain the consistency.

>>
>> Drop some schema from the publication:
>> ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;
>>
> Same for drop schema, if one of these schemas does not exist in publication,
> the entire DROP operation is aborted.

We have similar behavior in case of dropping non-existent table while altering publication
alter publication mypub5 drop table test1,testx;
ERROR:  relation "testx" does not exist
I feel we can keep the behavior similarly to maintain the consistency.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

vignesh C
In reply to this post by vignesh C
On Sat, Jan 9, 2021 at 5:21 PM vignesh C <[hidden email]> wrote:

>
> On Fri, Jan 8, 2021 at 4:32 PM Amit Kapila <[hidden email]> wrote:
> >
> > On Thu, Jan 7, 2021 at 10:03 PM vignesh C <[hidden email]> wrote:
> > >
> > > This feature adds schema option while creating publication. Users will
> > > be able to specify one or more schemas while creating publication,
> > > when the user specifies schema option, then the data changes for the
> > > tables present in the schema specified by the user will be replicated
> > > to the subscriber. Few examples have been listed below:
> > >
> > > Create a publication that publishes all changes for all the tables
> > > present in production schema:
> > > CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;
> > >
> > > Create a publication that publishes all changes for all the tables
> > > present in marketing and sales schemas:
> > > CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;
> > >
> > > Add some schemas to the publication:
> > > ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
> > >
> > > Drop some schema from the publication:
> > > ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;
> > >
> > > Attached is a POC patch for the same. I felt this feature would be quite useful.
> > >
> >
> > What do we do if the user Drops the schema? Do we automatically remove
> > it from the publication?
> >
> I have not yet handled this scenario yet, I will handle this and
> adding of tests in the next patch.
>
I have handled the above scenario(drop schema should automatically
remove the schema entry from publication schema relation) & addition
of tests in the new v2 patch attached.
Thoughts?


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

v2-0001-Added-schema-level-support-for-publication.patch (127K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Added schema level support for publication.

vignesh C
In reply to this post by Bharath Rupireddy
On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy <[hidden email]> wrote:
>
>
> As Amit pointed out earlier, the behaviour when schema dropped, I
> think we should also consider when schema is altered, say altered to a
> different name, maybe we should change that in the publication too.
>

This scenario is handled now in the patch posted at [1].

>
> Say a user has a schema with 121 tables in it, and wants to replicate
> only 120 or 199 or even lesser tables out of it, so can we have some
> skip option to the new syntax, something like below?
> CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA
> production WITH skip = marketing, accounts, sales;  --> meaning is,
> replicate all the tables in the schema production except marketing,
> accounts, sales tables.

I have not yet handled this, I'm working on this and will try post a patch for this in the next version.
12