Question about logically replicating a multi-TB database

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Question about logically replicating a multi-TB database

Ron-2

https://www.postgresql.org/docs/12/logical-replication-architecture.html

"30.5.1. Initial Snapshot
The initial data in existing subscribed tables are snapshotted and copied in
a parallel instance of a special kind of apply process. This process will
create its own temporary replication slot and copy the existing data. Once
existing data is copied, the worker enters synchronization mode, which
ensures that the table is brought up to a synchronized state with the main
apply process by streaming any changes that happened during the initial data
copy using standard logical replication. Once the synchronization is done,
the control of the replication of the table is given back to the main apply
process where the replication continues as normal."

We've got a Large and busy database which we need to migrate to AWS, and a
shared 1Gbps pipe between the source and AWS.

How does one set up publish/subscribe in such a case?  All the examples I've
seen are with trivially small databases.

--
Angular momentum makes the world go 'round.


Reply | Threaded
Open this post in threaded view
|

Re: Question about logically replicating a multi-TB database

Avinash Kumar
Hi,


On Sat, Jan 23, 2021 at 7:28 PM Ron <[hidden email]> wrote:

https://www.postgresql.org/docs/12/logical-replication-architecture.html

"30.5.1. Initial Snapshot
The initial data in existing subscribed tables are snapshotted and copied in
a parallel instance of a special kind of apply process. This process will
create its own temporary replication slot and copy the existing data. Once
existing data is copied, the worker enters synchronization mode, which
ensures that the table is brought up to a synchronized state with the main
apply process by streaming any changes that happened during the initial data
copy using standard logical replication. Once the synchronization is done,
the control of the replication of the table is given back to the main apply
process where the replication continues as normal."

We've got a Large and busy database which we need to migrate to AWS, and a
shared 1Gbps pipe between the source and AWS.

How does one set up publish/subscribe in such a case?  All the examples I've
seen are with trivially small databases.
 
If the speed is the concern, You should try avoiding the copy of the initial snapshot and rather do it manually through a parallel pg_dump and pg_restore. 
And rather disable the initial snapshot copy once the parallel restore is completed. 

--
Angular momentum makes the world go 'round.




--
Regards,
Avinash Vallarapu (Avi)

Reply | Threaded
Open this post in threaded view
|

Re: Question about logically replicating a multi-TB database

Ron-2
On 1/23/21 5:37 PM, Avinash Kumar wrote:
Hi,


On Sat, Jan 23, 2021 at 7:28 PM Ron <[hidden email]> wrote:

https://www.postgresql.org/docs/12/logical-replication-architecture.html

"30.5.1. Initial Snapshot
The initial data in existing subscribed tables are snapshotted and copied in
a parallel instance of a special kind of apply process. This process will
create its own temporary replication slot and copy the existing data. Once
existing data is copied, the worker enters synchronization mode, which
ensures that the table is brought up to a synchronized state with the main
apply process by streaming any changes that happened during the initial data
copy using standard logical replication. Once the synchronization is done,
the control of the replication of the table is given back to the main apply
process where the replication continues as normal."

We've got a Large and busy database which we need to migrate to AWS, and a
shared 1Gbps pipe between the source and AWS.

How does one set up publish/subscribe in such a case?  All the examples I've
seen are with trivially small databases.
 
If the speed is the concern, You should try avoiding the copy of the initial snapshot and rather do it manually through a parallel pg_dump and pg_restore. 
And rather disable the initial snapshot copy once the parallel restore is completed.

Is this what you mean?

source> CREATE PUBLICATION ...
target> CREATE SUBSCRIPTION ... WITH (COPY_DATA='false');
source> pg_dump -d database ...
transport dump files to target system
target> pg_restore ...
target> ALTER SUBSCRIPTION ... REFRESH PUBLICATION


--
Angular momentum makes the world go 'round.