WIP/PoC for parallel backup

Re: WIP/PoC for parallel backup

Robert Haas
On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <[hidden email]> wrote:
The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000.
The errors were generated for the workers starting from backup worker 1017.

It wasn't the fact that you set max_wal_senders to 2000. It was the fact that you specified 1990 parallel workers. By so doing, you overloaded the machine, which is why everything failed. That's to be expected.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Kashif Zeeshan-2


On Thu, Apr 2, 2020 at 4:48 PM Robert Haas <[hidden email]> wrote:
On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <[hidden email]> wrote:
The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000.
The errors were generated for the workers starting from backup worker 1017.

It wasn't the fact that you set max_wal_senders to 2000. It was the fact that you specified 1990 parallel workers. By so doing, you overloaded the machine, which is why everything failed. That's to be expected.

Thanks a lot, Robert.
In this case, the backup folder was not emptied after the backup failed; the cleanup should be done in this case too.
 
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company


Re: WIP/PoC for parallel backup

Robert Haas
On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
<[hidden email]> wrote:
> Thanks a lot, Robert.
> In this case, the backup folder was not emptied after the backup failed; the cleanup should be done in this case too.

Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: WIP/PoC for parallel backup

Kashif Zeeshan-2


On Thu, Apr 2, 2020 at 6:23 PM Robert Haas <[hidden email]> wrote:
On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
<[hidden email]> wrote:
> Thanks a lot, Robert.
> In this case, the backup folder was not emptied after the backup failed; the cleanup should be done in this case too.

Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?
The cleanup is done in the cases I have seen so far with the base pg_basebackup functionality (not including the parallel backup feature), with the message "pg_basebackup: removing contents of data directory".
A similar case, reported by Rajkumar, was also fixed for parallel backup, where the contents of the backup folder were not cleaned up after the error.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company


Re: WIP/PoC for parallel backup

Robert Haas
On Thu, Apr 2, 2020 at 9:46 AM Kashif Zeeshan <[hidden email]> wrote:
Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?
The cleanup is done in the cases I have seen so far with the base pg_basebackup functionality (not including the parallel backup feature), with the message "pg_basebackup: removing contents of data directory".
A similar case, reported by Rajkumar, was also fixed for parallel backup, where the contents of the backup folder were not cleaned up after the error.

What I'm saying is that it's unclear whether there's a bug here or whether it just failed because of the very extreme test scenario you created. Spawning >1000 processes on a small machine can easily make a lot of things fail. 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman


On Thu, Apr 2, 2020 at 4:47 PM Robert Haas <[hidden email]> wrote:
On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <[hidden email]> wrote:
> Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one
> go, that's why the name.

I don't see why it should work that way. If we're fetching individual
files, why have an unused capability to fetch multiple files?

Okay, I will rename it and modify the function to send a single file as well.


> 1- parallel backup does not work with a standby server. In parallel backup, the server
> spawns multiple processes and there is no shared state being maintained. So currently,
> no way to tell multiple processes if the standby was promoted during the backup since
> the START_BACKUP was called.

Why would you need to do that? As long as the process where
STOP_BACKUP can do the check, that seems good enough.


Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
in progress. So if the backup is a large one, early error detection would be very beneficial.
This is the current behavior of non-parallel backup as well.
 

> 2- throttling. Robert previously suggested that we implement throttling on the client-side.
> However, I found a previous discussion where it was advocated to be added to the
> backend instead[1].
>
> So, it was better to have a consensus before moving the throttle function to the client.
> That’s why for the time being I have disabled it and have asked for suggestions on it
> to move forward.
>
> It seems to me that we have to maintain a shared state in order to support taking backup
> from standby. Also, there is a new feature recently committed for backup progress
> reporting in the backend (pg_stat_progress_basebackup). This functionality was recently
> added via this commit ID: e65497df. For parallel backup to update these stats, a shared
> state will be required.

I've come around to the view that a shared state is a good idea and
that throttling on the server-side makes more sense. I'm not clear on
whether we need shared state only for throttling or whether we need it
for more than that. Another possible reason might be for the
progress-reporting stuff that just got added.

Okay, then I will add the shared state. And since we are adding the shared state, we can use
that for throttling, progress-reporting and standby early error checking.


> Since multiple pg_basebackup instances can be running at the same time, maintaining a shared state
> can become a little complex, unless we disallow taking multiple parallel backups.

I do not see why it would be necessary to disallow taking multiple
parallel backups. You just need to have multiple copies of the shared
state and a way to decide which one to use for any particular backup.
I guess that is a little complex, but only a little.

There are two possible options:

(1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
(2) (Preferred Option) Use the WAL start location as the BackupID.


This BackupID should be given back as a response to start backup command. All client workers
must append this ID to all parallel backup replication commands. So that we can use this identifier
to search for that particular backup. Does that sound good?
 

 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


Re: WIP/PoC for parallel backup

Robert Haas
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <[hidden email]> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be very beneficial.
> This is the current behavior of non-parallel backup as well.

Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.

TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.

Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.

> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?

Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.

I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:

START_BACKUP unique_backup_id [options]...

And then others do:

JOIN_BACKUP unique_backup_id

My thought is that you will have a number of shared memory structures
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.

If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader process's PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.
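
Roughly, and just to illustrate (an untested sketch, assuming pg_strong_random() and the
backend's hex_encode() routine are usable in this context; the function name and buffer
handling are mine, not from any posted patch):

#define PARALLEL_BACKUP_ID_BYTES 16

/* Illustrative helper: build a random, hex-encoded backup ID. */
static void
generate_backup_id(char *out, size_t outlen)
{
    uint8       raw[PARALLEL_BACKUP_ID_BYTES];

    /* pg_strong_random() draws from a cryptographically strong source */
    if (!pg_strong_random(raw, sizeof(raw)))
        ereport(ERROR,
                (errmsg("could not generate random backup ID")));

    /* hex encoding doubles the length; leave room for the terminating NUL */
    Assert(outlen >= sizeof(raw) * 2 + 1);
    hex_encode((const char *) raw, sizeof(raw), out);
    out[sizeof(raw) * 2] = '\0';
}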

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: WIP/PoC for parallel backup

Asif Rehman


On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <[hidden email]> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <[hidden email]> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be very beneficial.
> This is the current behavior of non-parallel backup as well.

Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.

TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir():


/*
 * Check if the postmaster has signaled us to exit, and abort with an
 * error in that case. The error handler further up will call
 * do_pg_abort_backup() for us. Also check that if the backup was
 * started while still in recovery, the server wasn't promoted.
 * do_pg_stop_backup() will check that too, but it's better to stop
 * the backup early than continue to the end and fail there.
 */
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("the standby was promoted during online backup"),
             errhint("This means that the backup being taken is corrupt "
                     "and should not be used. "
                     "Try taking another online backup.")));



> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.

Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.

> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?

Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.

I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:

START_BACKUP unique_backup_id [options]...

And then others do:

JOIN_BACKUP unique_backup_id

My thought is that you will have a number of shared memory structures
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.

If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader process's PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.


Okay, so if we are to add another replication command ‘JOIN_BACKUP unique_backup_id’
to make workers find the relevant shared state, there won't be any need for changing
the grammar for any other command. START_BACKUP can return the unique_backup_id
in the result set.

I am thinking of the following struct for shared state:

typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;

    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;



The shared state structure entries would be maintained by a shared hash table.
There will be one structure per parallel backup. Since a single parallel backup
can engage more than one WAL sender, I think max_wal_senders might be a little
too much; perhaps max_wal_senders/2, since there will be at least 2 connections
per parallel backup? Alternatively, we can add a new GUC that defines the maximum
number of concurrent parallel backups, i.e. ‘max_concurrent_backups_allowed = 10’
perhaps.

The key would be “backupid=hex_encode(pg_strong_random(16))”
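
As a rough sketch of the lookup side (names like BackupHashTable and BackupHashLock are
placeholders here, not final), JOIN_BACKUP could resolve the ID like this:

/* Illustrative sketch: resolve a backup ID to its shared state entry. */
static BackupSharedState *
find_backup_state(const char *backupid)
{
    BackupSharedState *state;
    bool        found;

    LWLockAcquire(BackupHashLock, LW_SHARED);
    state = (BackupSharedState *) hash_search(BackupHashTable,
                                              backupid,
                                              HASH_FIND,
                                              &found);
    LWLockRelease(BackupHashLock);

    if (!found)
        ereport(ERROR,
                (errmsg("parallel backup \"%s\" does not exist", backupid)));

    return state;
}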

Checking for Standby Promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery
and keep checking it whenever send_file() is called to send a new file.
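
A rough sketch of that per-file check, mirroring the sendDir() snippet above but reading
the flag from the shared state (the helper name is illustrative only):

/* Illustrative only: per-file promotion check against the shared state. */
static void
check_standby_promotion(BackupSharedState *state)
{
    CHECK_FOR_INTERRUPTS();

    if (RecoveryInProgress() != state->backup_started_in_recovery)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("the standby was promoted during online backup"),
                 errhint("This means that the backup being taken is corrupt "
                         "and should not be used. "
                         "Try taking another online backup.")));
}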

Throttling:
BackupSharedState.throttling_counter - The throttling logic remains the same
as for non-parallel backup, with the exception that multiple WAL senders will now
be updating it. In parallel backup it represents the overall bytes that have been
transferred, and the workers sleep once they have exceeded the limit. Hence, the
shared state carries a lock to update the throttling value atomically.
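
A rough sketch of the counter update (the interval bookkeeping would stay close to the
existing throttle() logic in basebackup.c; throttling_sample is reused from there and
throttling_sleep_time is an assumed name):

/* Illustrative sketch: account for bytes sent and sleep when over the quota. */
static void
throttle_parallel(BackupSharedState *state, int64 bytes_sent)
{
    bool        need_sleep = false;

    SpinLockAcquire(&state->lock);
    state->throttling_counter += bytes_sent;
    if (state->throttling_counter >= throttling_sample)
    {
        state->throttling_counter %= throttling_sample;
        need_sleep = true;
    }
    SpinLockRelease(&state->lock);

    /* the real sleep time would be derived from elapsed time, as in throttle() */
    if (need_sleep)
        pg_usleep(throttling_sleep_time);
}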

Progress Reporting:
I think we should add progress reporting for parallel backup as a
separate patch. The relevant entries for progress reporting, such as
‘backup_total’ and ‘backup_streamed’, would then be added to this structure
as well.
 
 
Grammar:
There is a change in the result set returned by the START_BACKUP command;
unique_backup_id is added. Additionally, the JOIN_BACKUP replication command is
added, and SEND_FILES has been renamed to SEND_FILE. There are no other changes
to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
  - returns startptr, tli, backup_label, unique_backup_id
STOP_BACKUP [NOWAIT]
  - returns startptr, tli, backup_label
JOIN_BACKUP ‘unique_backup_id’
  - attaches a shared state identified by ‘unique_backup_id’ to a backend process.
 
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
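
To illustrate how a worker connection might drive these commands over libpq (a sketch
only; which result status JOIN_BACKUP and SEND_FILE actually return, and the variable
names, are my assumptions, and reading the COPY stream with PQgetCopyData() is omitted):

/* Illustrative worker-side flow; error handling trimmed for brevity. */
static void
worker_fetch_file(PGconn *conn, const char *backup_id, const char *filename)
{
    PGresult   *res;
    char        cmd[MAXPGPATH + 64];

    /* attach this connection to the shared state created by the leader */
    snprintf(cmd, sizeof(cmd), "JOIN_BACKUP '%s'", backup_id);
    res = PQexec(conn, cmd);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        pg_log_error("could not join backup: %s", PQerrorMessage(conn));
    PQclear(res);

    /* request one file; the server streams it back as COPY data */
    snprintf(cmd, sizeof(cmd), "SEND_FILE ('%s') NOVERIFY_CHECKSUMS", filename);
    res = PQexec(conn, cmd);
    if (PQresultStatus(res) != PGRES_COPY_OUT)
        pg_log_error("could not fetch \"%s\": %s", filename, PQerrorMessage(conn));
    PQclear(res);
}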


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


Re: WIP/PoC for parallel backup

Kashif Zeeshan-2
Hi Asif

When a non-existent slot is used with a tablespace, the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup.

Steps
=======

[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#


[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 2 -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$

backup folder not cleaned

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact               postgresql.conf
base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal      postgresql.auto.conf
[edb@localhost bin]$




If the same case is executed without the parallel backup patch, the backup folder is cleaned up after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone


--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company


Re: WIP/PoC for parallel backup

Jeevan Chalke
Asif,

After the recent backup manifest addition, the patches need to be rebased, and
a few things need to be reconsidered, such as making sure that parallel backup
creates the manifest file correctly.

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: WIP/PoC for parallel backup

Kashif Zeeshan-2


Hi Asif

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup. I think one bug fix will solve all these cases where cleanup is not done when a parallel backup fails.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

When the same case is executed with pg_basebackup without the parallel backup patch, proper cleanup is done.

[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks



--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company


Re: WIP/PoC for parallel backup

Asif Rehman
Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar changes
are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    - This will generate a unique backupid using pg_strong_random(16), hex-encode it,
      and return it in the result set.
    - It will also create a shared state and add it to the hashtable. The hash table size is set
      to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think that is a
      sufficient initial size. max_wal_senders is not used, because it can be set to quite a
      large value.

JOIN_BACKUP 'backup_id'
    - finds 'backup_id' in the hashtable and attaches it to the server process.


SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    - renamed SEND_FILES to SEND_FILE
    - removed START_WAL_LOCATION from this because 'startptr' is now accessible through
      shared state.

There is no change in other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled
by default, so I have disabled it for parallel backup mode and added a warning so that the
user is aware of it and does not expect it in the backup.
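
To sketch how the shared hash table could be set up (illustrative only; BACKUP_HASH_SIZE
and BackupSharedState follow the description above, the remaining names are placeholders,
and the default string hashing for the NAMEDATALEN key is assumed):

/* Illustrative sketch: create the shared hash table at postmaster startup. */
static HTAB *BackupHashTable = NULL;

#define BACKUP_HASH_SIZE 10

void
ParallelBackupShmemInit(void)
{
    HASHCTL     info;

    memset(&info, 0, sizeof(info));
    info.keysize = NAMEDATALEN;                 /* the hex-encoded backup ID */
    info.entrysize = sizeof(BackupSharedState);

    BackupHashTable = ShmemInitHash("Parallel Backup Hash",
                                    BACKUP_HASH_SIZE,
                                    BACKUP_HASH_SIZE,
                                    &info,
                                    HASH_ELEM);
}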


On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <[hidden email]> wrote:


On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <[hidden email]> wrote:
Hi Asif

When a non-existent slot is used with tablespace then correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup.

Steps
=======

edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#


[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 2 -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$

backup folder not cleaned

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact               postgresql.conf
base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal      postgresql.auto.conf
[edb@localhost bin]$




If the same case is executed without the parallel backup patch then the backup folder is cleaned after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone


Hi Asif

A similar case is when DB Server is shut down while the Parallel Backup is in progress then the correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup. I think one bug fix will solve all these cases where clean up is not done when parallel backup is failed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

Same case when executed on pg_basebackup without the Parallel backup patch then proper clean up is done.

[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks



On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <[hidden email]> wrote:


On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <[hidden email]> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <[hidden email]> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.

Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.

TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended

to make parallel behave the same as non-parallel here. So, I agree with

you that the behavior of parallel backup should be consistent with the

non-parallel one.  Please see the code snippet below from 

basebackup.c:sendDir()


/*

 * Check if the postmaster has signaled us to exit, and abort with an

 * error in that case. The error handler further up will call

 * do_pg_abort_backup() for us. Also check that if the backup was

 * started while still in recovery, the server wasn't promoted.

 * do_pg_stop_backup() will check that too, but it's better to stop

 * the backup early than continue to the end and fail there.

 */

CHECK_FOR_INTERRUPTS();

if (RecoveryInProgress() != backup_started_in_recovery)

ereport(ERROR,

(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

errmsg("the standby was promoted during online backup"),

errhint("This means that the backup being taken is corrupt "

"and should not be used. "

"Try taking another online backup.")));



> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.

Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.

> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?

Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.

I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:

START_BACKUP unique_backup_id [options]...

And then others do:

JOIN_BACKUP unique_backup_id

My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.

If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader process's PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.
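
For illustration only, a server-side sketch of generating such an ID might
look like the following (this is an assumption about how it could be done,
not code from the patch; pg_strong_random() is the existing PostgreSQL
routine, and the hex loop is written out to keep the example self-contained):

#define BACKUP_ID_RAW_LEN	16
#define BACKUP_ID_STR_LEN	(2 * BACKUP_ID_RAW_LEN + 1)

static void
generate_backup_id(char *out)	/* out must hold BACKUP_ID_STR_LEN bytes */
{
	uint8		raw[BACKUP_ID_RAW_LEN];
	int			i;

	if (!pg_strong_random(raw, sizeof(raw)))
		ereport(ERROR,
				(errmsg("could not generate random backup ID")));

	/* hex-encode, NUL-terminating as we go */
	for (i = 0; i < BACKUP_ID_RAW_LEN; i++)
		snprintf(out + 2 * i, 3, "%02x", raw[i]);
}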


Okay, so if we add another replication command ‘JOIN_BACKUP unique_backup_id’
to let workers find the relevant shared state, there won't be any need to change
the grammar of any other command. START_BACKUP can return the unique_backup_id
in the result set.

I am thinking of the following struct for shared state:

typedef struct
{
	char		backupid[NAMEDATALEN];
	XLogRecPtr	startptr;

	slock_t		lock;
	int64		throttling_counter;
	bool		backup_started_in_recovery;
} BackupSharedState;



The shared state structure entries would be maintained by a shared hash table.
There will be one structure per parallel backup. Since a single parallel backup
can engage more than one wal sender, I think max_wal_senders might be a little
too much; perhaps max_wal_senders/2, since there will be at least 2 connections
per parallel backup? Alternatively, we can add a new GUC that defines the maximum
number of concurrent parallel backups, e.g. ‘max_concurrent_backups_allowed = 10’,
so that it is user-configurable.

The key would be “backupid=hex_encode(pg_strong_random(16))”

Checking for Standby Promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery,
and every send_file() call re-checks it against RecoveryInProgress() before sending
a new file.
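
For what it's worth, the per-file check could then mirror the sendDir() snippet
quoted above, just reading the flag from the shared state (hypothetical sketch;
'backup_state' is an assumed pointer to the attached BackupSharedState):

	/* At the top of send_file(), before streaming the next file */
	CHECK_FOR_INTERRUPTS();
	if (RecoveryInProgress() != backup_state->backup_started_in_recovery)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("the standby was promoted during online backup"),
				 errhint("This means that the backup being taken is corrupt "
						 "and should not be used. "
						 "Try taking another online backup.")));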

Throttling:
BackupSharedState.throttling_counter - The throttling logic remains the same
as for non-parallel backup, except that multiple wal sender processes will now
be updating it. In parallel backup it therefore represents the overall bytes
transferred, and a worker sleeps once the limit has been exceeded. Hence, the
shared state carries a lock so that the throttling value can be updated safely.
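
As a rough sketch of that idea (not the patch's code; 'throttling_sample' and
'sleep_usecs' are assumed to be derived from MAX_RATE as in the existing
non-parallel throttle() logic):

static void
parallel_throttle(BackupSharedState *state, int64 bytes_sent,
				  int64 throttling_sample, long sleep_usecs)
{
	bool		need_sleep = false;

	/* Update the shared counter under the spinlock */
	SpinLockAcquire(&state->lock);
	state->throttling_counter += bytes_sent;
	if (state->throttling_counter >= throttling_sample)
	{
		state->throttling_counter %= throttling_sample;
		need_sleep = true;
	}
	SpinLockRelease(&state->lock);

	/* Sleep outside the lock so other workers are not blocked */
	if (need_sleep)
		pg_usleep(sleep_usecs);
}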

Progress Reporting:
I think we should add progress reporting for parallel backup as a separate
patch. The relevant progress-reporting entries, such as ‘backup_total’ and
‘backup_streamed’, would then be added to this structure as well.
 
 
Grammar:
There is a change in the result set returned by the START_BACKUP command;
unique_backup_id is added. Additionally, a JOIN_BACKUP replication command is
added, and SEND_FILES has been renamed to SEND_FILE. There are no other changes
to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
  - returns startptr, tli, backup_label, unique_backup_id
STOP_BACKUP [NOWAIT]
  - returns startptr, tli, backup_label
JOIN_BACKUP ‘unique_backup_id’
  - attaches a shared state identified by ‘unique_backup_id’ to a backend process.
 
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
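
To illustrate how the commands are meant to fit together, a hypothetical
session could look like this (the values are made up):

	-- leader connection
	START_BACKUP LABEL 'nightly' FAST
		=> startptr, tli, backup_label, unique_backup_id = '9f2c4d...'
	LIST_TABLESPACES
	LIST_FILES

	-- each worker connection
	JOIN_BACKUP '9f2c4d...'
	SEND_FILE ( 'base/13593/16384' ) NOVERIFY_CHECKSUMS

	-- leader connection, once all workers have finished
	STOP_BACKUP NOWAIT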


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca



--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company






--
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


parallel_backup_v11.zip (60K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP/PoC for parallel backup

Jeevan Chalke


On Tue, Apr 7, 2020 at 10:14 PM Asif Rehman <[hidden email]> wrote:
Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar changes
are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    - This will generate a unique backupid using pg_strong_random(16), hex-encode it,
      and return it in the result set.
    - It will also create a shared state and add it to the hashtable. The hash table size is set
      to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think that is
      a sufficient initial size. max_wal_senders is not used, because it can be set to quite a
      large value.

JOIN_BACKUP 'backup_id'
    - finds 'backup_id' in hashtable and attaches it to server process.


SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    - renamed SEND_FILES to SEND_FILE
    - removed START_WAL_LOCATION from this because 'startptr' is now accessible through
      shared state.

There is no change in other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled
by default, so I have disabled it for parallel backup mode and have generated a warning so that
user is aware of it and not expect it in the backup.

So, are you working on making it work? I don't think a parallel backup feature should create a backup with no manifest.





--
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca



--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: WIP/PoC for parallel backup

Asif Rehman


On Tue, Apr 7, 2020 at 10:03 PM Jeevan Chalke <[hidden email]> wrote:


On Tue, Apr 7, 2020 at 10:14 PM Asif Rehman <[hidden email]> wrote:
Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar changes
are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    - This will generate a unique backupid using pg_strong_random(16), hex-encode it,
      and return it in the result set.
    - It will also create a shared state and add it to the hashtable. The hash table size is set
      to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think that is
      a sufficient initial size. max_wal_senders is not used, because it can be set to quite a
      large value.

JOIN_BACKUP 'backup_id'
    - finds 'backup_id' in hashtable and attaches it to server process.


SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    - renamed SEND_FILES to SEND_FILE
    - removed START_WAL_LOCATION from this because 'startptr' is now accessible through
      shared state.

There is no change in other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled
by default, so I have disabled it for parallel backup mode and have generated a warning so that
user is aware of it and not expect it in the backup.

So, are you working on making it work? I don't think a parallel backup feature should create a backup with no manifest.

I will; however, parallel backup is already quite a large patch, so I think we should
first agree on the current work before adding backup manifest and progress-reporting support.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Reply | Threaded
Open this post in threaded view
|

Re: WIP/PoC for parallel backup

Robert Haas
In reply to this post by Asif Rehman
On Fri, Apr 3, 2020 at 4:46 AM Asif Rehman <[hidden email]> wrote:
> Non-parallel backup already does the early error checking. I only intended
> to make parallel behave the same as non-parallel here. So, I agree with
> you that the behavior of parallel backup should be consistent with the
> non-parallel one.  Please see the code snippet below from
> basebackup.c:sendDir()

Oh, OK. So then we need to preserve that behavior, I think. Sorry, I
didn't realize the check was happening there.

> I am thinking of the following struct for shared state:
>> typedef struct
>> {
>> char backupid[NAMEDATALEN];
>> XLogRecPtr startptr;
>> slock_t lock;
>> int64 throttling_counter;
>> bool backup_started_in_recovery;
>> } BackupSharedState;

Looks broadly reasonable. Can anything other than lock and
throttling_counter change while it's running? If not, how about using
pg_atomic_uint64 for the throttling counter, and dropping lock? If
that gets too complicated it's OK to keep it as you have it.
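
For reference, a sketch of that alternative (illustrative only; names follow the
struct above, and the counter would become a pg_atomic_uint64 instead of
int64 plus slock_t):

static void
parallel_throttle_atomic(BackupSharedState *state, int64 bytes_sent,
						 uint64 throttling_sample, long sleep_usecs)
{
	uint64		total;

	/* lock-free accumulation of the bytes sent by this worker */
	total = pg_atomic_fetch_add_u64(&state->throttling_counter, bytes_sent) + bytes_sent;

	if (total >= throttling_sample)
	{
		pg_usleep(sleep_usecs);
		/* bring the counter back under the sample size */
		pg_atomic_fetch_sub_u64(&state->throttling_counter, throttling_sample);
	}
}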

> The shared state structure entries would be maintained by a shared hash table.
> There will be one structure per parallel backup. Since a single parallel backup
> can engage more than one wal sender, so I think max_wal_senders might be a little
> too much; perhaps max_wal_senders/2 since there will be at least 2 connections
> per parallel backup? Alternatively, we can set a new GUC that defines the maximum
> number of for concurrent parallel backups i.e. ‘max_concurent_backups_allowed = 10’
> perhaps, or we can make it user-configurable.

I don't think you need a hash table. Linear search should be fine. And
I see no point in dividing max_wal_senders by 2 either. The default is
*10*. You'd need to increase that by more than an order of magnitude
for a hash table to be needed, and more than that for the shared
memory consumption to matter.
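
A sketch of what that lookup might look like (illustrative only; 'in_use' and
'SharedBackupStates' are assumed names, not taken from the patch):

static BackupSharedState *
find_backup_state(const char *backupid)
{
	int			i;

	for (i = 0; i < max_wal_senders; i++)
	{
		BackupSharedState *slot = &SharedBackupStates[i];

		if (slot->in_use && strcmp(slot->backupid, backupid) == 0)
			return slot;
	}
	return NULL;				/* caller reports that the backup does not exist */
}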

> The key would be “backupid=hex_encode(pg_strong_random(16))”

wfm

> Progress Reporting:
> Although I think we should add progress-reporting for parallel backup as a
> separate patch. The relevant entries for progress-reporting such as
> ‘backup_total’ and ‘backup_streamed’ would be then added to this structure
> as well.

I mean, you can separate it for review if you wish, but it would need
to be committed together.

> START_BACKUP [LABEL '<label>'] [FAST]
>   - returns startptr, tli, backup_label, unique_backup_id

OK. But what if I want to use this interface for a non-parallel backup?

> STOP_BACKUP [NOWAIT]
>   - returns startptr, tli, backup_label

I don't think it makes sense for STOP_BACKUP to return the same values
that START_BACKUP already returned. Presumably STOP_BACKUP should
return the end LSN. It could also return the backup label and
tablespace map files, as the corresponding SQL function does, unless
there's some better way of returning those in this case.

> JOIN_BACKUP ‘unique_backup_id’
>   - attaches a shared state identified by ‘unique_backup_id’ to a backend process.

OK.

> LIST_TABLESPACES [PROGRESS]

OK.

> LIST_FILES [TABLESPACE]

OK.

> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

Why not just LIST_WAL_FILES 'startptr' 'endptr'?

> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]

Why parens? That seems useless.

Maybe it would make sense to have SEND_DATA_FILE 'datafilename' and
SEND_WAL_FILE 'walfilename' as separate commands. But not sure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: WIP/PoC for parallel backup

Robert Haas
In reply to this post by Asif Rehman
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <[hidden email]> wrote:
> I will, however parallel backup is already quite a large patch. So I think we should first
> agree on the current work before adding a backup manifest and progress-reporting support.

It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it.

I'm gonna try to find some time for that as soon as I can.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: WIP/PoC for parallel backup

rajkumar.raghuwanshi
Hi Asif,

Thanks for new patches.

Patches need to be rebased on head. Getting a failure while applying the 0003 patch.
edb@localhost postgresql]$ git apply  v11/0003-Parallel-Backup-Backend-Replication-commands.patch
error: patch failed: src/backend/storage/ipc/ipci.c:147
error: src/backend/storage/ipc/ipci.c: patch does not apply

I have applied v11 patches on commit - 23ba3b5ee278847e4fad913b80950edb2838fd35 to test further.

pg_basebackup has a new option "--no-estimate-size"; pg_basebackup crashes when using this option.

[edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2
Segmentation fault (core dumped)

--stacktrace
[edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
Loaded symbols for /lib64/libselinux.so.1
Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2'.
Program terminated with signal 11, Segmentation fault.
#0  ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
298  while (ISSPACE (*s))
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
#1  0x0000003921233b30 in atoi (nptr=<value optimized out>) at atoi.c:28
#2  0x000000000040841e in main (argc=5, argv=0x7ffeaa6fb968) at pg_basebackup.c:2526

Thanks & Regards,
Rajkumar Raghuwanshi


On Tue, Apr 7, 2020 at 11:07 PM Robert Haas <[hidden email]> wrote:
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <[hidden email]> wrote:
> I will, however parallel backup is already quite a large patch. So I think we should first
> agree on the current work before adding a backup manifest and progress-reporting support.

It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it.

I'm gonna try to find some time for that as soon as I can.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Reply | Threaded
Open this post in threaded view
|

Re: WIP/PoC for parallel backup

Asif Rehman

Rebased and updated to current master (d025cf88ba). v12 is attached.

Also, changed the grammar for LIST_WAL_FILES and SEND_FILE to:

- LIST_WAL_FILES 'startptr' 'endptr'
- SEND_FILE 'FILE'  [NOVERIFY_CHECKSUMS]
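
For example (illustrative values only):

- LIST_WAL_FILES '0/2000028' '0/2E000100'
- SEND_FILE 'base/13593/16384' NOVERIFY_CHECKSUMS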


On Wed, Apr 8, 2020 at 10:48 AM Rajkumar Raghuwanshi <[hidden email]> wrote:
Hi Asif,

Thanks for new patches.

Patches need to be rebased on head. Getting a failure while applying the 0003 patch.
edb@localhost postgresql]$ git apply  v11/0003-Parallel-Backup-Backend-Replication-commands.patch
error: patch failed: src/backend/storage/ipc/ipci.c:147
error: src/backend/storage/ipc/ipci.c: patch does not apply

I have applied v11 patches on commit - 23ba3b5ee278847e4fad913b80950edb2838fd35 to test further.

pg_basebackup has a new option "--no-estimate-size",  pg_basebackup crashes when using this option.

[edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2
Segmentation fault (core dumped)

--stacktrace
[edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
Loaded symbols for /lib64/libselinux.so.1
Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2'.
Program terminated with signal 11, Segmentation fault.
#0  ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
298  while (ISSPACE (*s))
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
#1  0x0000003921233b30 in atoi (nptr=<value optimized out>) at atoi.c:28
#2  0x000000000040841e in main (argc=5, argv=0x7ffeaa6fb968) at pg_basebackup.c:2526

Thanks & Regards,
Rajkumar Raghuwanshi


On Tue, Apr 7, 2020 at 11:07 PM Robert Haas <[hidden email]> wrote:
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <[hidden email]> wrote:
> I will, however parallel backup is already quite a large patch. So I think we should first
> agree on the current work before adding a backup manifest and progress-reporting support.

It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it.

I'm gonna try to find some time for that as soon as I can.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


parallel_backup_v12.zip (60K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP/PoC for parallel backup

Kashif Zeeshan-2
In reply to this post by Asif Rehman


On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <[hidden email]> wrote:
Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar changes
are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    - This will generate a unique backupid using pg_strong_random(16), hex-encode it,
      and return it in the result set.
    - It will also create a shared state and add it to the hashtable. The hash table size is set
      to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think that is
      a sufficient initial size. max_wal_senders is not used, because it can be set to quite a
      large value.

JOIN_BACKUP 'backup_id'
    - finds 'backup_id' in hashtable and attaches it to server process.


SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    - renamed SEND_FILES to SEND_FILE
    - removed START_WAL_LOCATION from this because 'startptr' is now accessible through
      shared state.

There is no change in other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled
by default, so I have disabled it for parallel backup mode and have generated a warning so that
user is aware of it and not expect it in the backup.

Hi Asif

I have verified the bug fixes; one bug is fixed and now works as expected.

While verifying the other bug fixes I faced the following issues, please have a look.


1) The following bug fixes are generating a segmentation fault.

Please note that, for reference, I have added only a description of each bug I tried to verify, as the steps were given in previous emails. A backtrace is also added with each case; both backtraces point to the same bug.

a) The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when the max_wal_senders was set to 2000.


[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D  /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$


[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47  if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3  0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)




b) When executing two backups at the same time, a FATAL error was raised due to max_wal_senders but, instead of exiting, the backup completed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL:  number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47  if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3  0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)




2) The following bug is not fixed yet

A similar case: when the DB server is shut down while the parallel backup is in progress, the correct error is displayed, but the backup folder is not cleaned up and a corrupt backup is left behind.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$



[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base         pg_hba.conf    pg_logical    pg_notify    pg_serial     pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$




Thanks
Kashif Zeeshan


On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <[hidden email]> wrote:


On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <[hidden email]> wrote:
Hi Asif

When a non-existent slot is used with tablespace then correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup.

Steps
=======

edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#


[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 2 -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$

backup folder not cleaned

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact               postgresql.conf
base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal      postgresql.auto.conf
[edb@localhost bin]$




If the same case is executed without the parallel backup patch then the backup folder is cleaned after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone


Hi Asif

A similar case is when DB Server is shut down while the Parallel Backup is in progress then the correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup. I think one bug fix will solve all these cases where clean up is not done when parallel backup is failed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

Same case when executed on pg_basebackup without the Parallel backup patch then proper clean up is done.

[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks



--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Reply | Threaded
Open this post in threaded view
|

Re: WIP/PoC for parallel backup

Asif Rehman


On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <[hidden email]> wrote:


On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <[hidden email]> wrote:
Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar changes
are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    - This will generate a unique backupid using pg_strong_random(16), hex-encode it,
      and return it in the result set.
    - It will also create a shared state and add it to the hashtable. The hash table size is set
      to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think that is
      a sufficient initial size. max_wal_senders is not used, because it can be set to quite a
      large value.

JOIN_BACKUP 'backup_id'
    - finds 'backup_id' in hashtable and attaches it to server process.


SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    - renamed SEND_FILES to SEND_FILE
    - removed START_WAL_LOCATION from this because 'startptr' is now accessible through
      shared state.

There is no change in other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled
by default, so I have disabled it for parallel backup mode and have generated a warning so that
user is aware of it and not expect it in the backup.

Hi Asif

I have verified the bug fixes, one bug is fixed and working now as expected

For the verification of the other bug fixes faced following issues, please have a look.


1) Following bug fixes mentioned below are generating segmentation fault.

Please note for reference I have added a description only as steps were given in previous emails of each bug I tried to verify the fix. Backtrace is also added with each case which points to one bug for both the cases.

a) The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when the max_wal_senders was set to 2000.


[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D  /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$


[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47  if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3  0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)




b) When executing two backups at the same time, getting FATAL error due to max_wal_senders and instead of exit  Backup got completed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL:  number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47  if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3  0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)




2) The following bug is not fixed yet

A similar case is when DB Server is shut down while the Parallel Backup is in progress then the correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$



[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base         pg_hba.conf    pg_logical    pg_notify    pg_serial     pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$




Thanks
Kashif Zeeshan


On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <[hidden email]> wrote:


On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <[hidden email]> wrote:
Hi Asif

When a non-existent slot is used with tablespace then correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup.

Steps
=======

edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#


[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 2 -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$

backup folder not cleaned

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact               postgresql.conf
base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal      postgresql.auto.conf
[edb@localhost bin]$




If the same case is executed without the parallel backup patch then the backup folder is cleaned after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone


Hi Asif

A similar case is when DB Server is shut down while the Parallel Backup is in progress then the correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup. I think one bug fix will solve all these cases where clean up is not done when parallel backup is failed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

Same case when executed on pg_basebackup without the Parallel backup patch then proper clean up is done.

[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks



On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <[hidden email]> wrote:


On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <[hidden email]> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <[hidden email]> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.

Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.

TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended to make parallel behave the same as non-parallel here. So, I agree with you that the behavior of parallel backup should be consistent with the non-parallel one. Please see the code snippet below from basebackup.c:sendDir():


/*
 * Check if the postmaster has signaled us to exit, and abort with an
 * error in that case. The error handler further up will call
 * do_pg_abort_backup() for us. Also check that if the backup was
 * started while still in recovery, the server wasn't promoted.
 * do_pg_stop_backup() will check that too, but it's better to stop
 * the backup early than continue to the end and fail there.
 */
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("the standby was promoted during online backup"),
             errhint("This means that the backup being taken is corrupt "
                     "and should not be used. "
                     "Try taking another online backup.")));



> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.

Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.

> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?

Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.

I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:

START_BACKUP unique_backup_id [options]...

And then others do:

JOIN_BACKUP unique_backup_id

My thought is that you will have a number of shared memory structures
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.

If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.


Okay, so if we add another replication command ‘JOIN_BACKUP unique_backup_id’ to let workers find the relevant shared state, there won't be any need to change the grammar of any other command. The START_BACKUP command can return the unique_backup_id in the result set.

I am thinking of the following struct for shared state:

typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;

    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;



The shared state structure entries would be maintained in a shared hash table.
There will be one structure per parallel backup. Since a single parallel backup
can engage more than one WAL sender, max_wal_senders might be a little too much;
perhaps max_wal_senders/2, since there will be at least 2 connections per
parallel backup? Alternatively, we can add a new GUC that defines the maximum
number of concurrent parallel backups, e.g. ‘max_concurrent_backups_allowed = 10’.

The key would be “backupid = hex_encode(pg_strong_random(16))”
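
A minimal sketch of how such a key could be produced on the server side, assuming the existing pg_strong_random() helper; the hex-encoding loop and the function name are illustrative only, not taken from the patch:

static void
generate_backup_id(char *out, size_t outlen)
{
    uint8   raw[16];
    int     i;

    if (!pg_strong_random(raw, sizeof(raw)))
        ereport(ERROR,
                (errmsg("could not generate random backup id")));

    /* outlen must be at least 2 * sizeof(raw) + 1 for the hex string */
    for (i = 0; i < (int) sizeof(raw); i++)
        snprintf(out + i * 2, outlen - i * 2, "%02x", raw[i]);
}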

Checking for Standby Promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery 
and keep checking it whenever send_file() is called to send a new file.

Throttling:
BackupSharedState.throttling_counter - The throttling logic remains the same
as for non-parallel backup, with the exception that multiple threads will now
be updating it. In parallel backup, this counter represents the overall bytes
that have been transferred, and the workers sleep once they have exceeded the
limit. Hence, the shared state carries a lock so that the throttling value can
be updated atomically.
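
A rough sketch of what the worker-side update could look like, assuming the struct above; throttling_sample and the sleep delay are borrowed conceptually from the existing non-parallel throttle() and are assumptions here:

static void
parallel_throttle(BackupSharedState *state, size_t increment)
{
    bool    over_limit;

    /* Update the shared counter under the spinlock from BackupSharedState. */
    SpinLockAcquire(&state->lock);
    state->throttling_counter += increment;
    over_limit = (state->throttling_counter >= throttling_sample);
    if (over_limit)
        state->throttling_counter %= throttling_sample;
    SpinLockRelease(&state->lock);

    /* Sleep outside the spinlock, as the non-parallel code path does. */
    if (over_limit)
        pg_usleep(1000000L);    /* placeholder; real code derives the delay */
}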

Progress Reporting:
I think we should add progress reporting for parallel backup as a separate
patch; the relevant entries, such as ‘backup_total’ and ‘backup_streamed’,
would then be added to this structure as well.
 
 
Grammar:
There is a change in the result set returned by the START_BACKUP command;
unique_backup_id is added. Additionally, the JOIN_BACKUP replication command is
added, and SEND_FILES has been renamed to SEND_FILE. There are no other changes
to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
  - returns startptr, tli, backup_label, unique_backup_id
STOP_BACKUP [NOWAIT]
  - returns startptr, tli, backup_label
JOIN_BACKUP ‘unique_backup_id’
  - attaches a shared state identified by ‘unique_backup_id’ to a backend process.
 
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
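
To illustrate how a client might drive this grammar, here is a hypothetical libpq-level sequence; the connection variables, result handling, and the backup id value are assumptions, not taken from the patch:

/* Leader: start the backup and read unique_backup_id from the result set. */
PGresult *res = PQexec(conn_leader, "START_BACKUP LABEL 'mybackup' FAST");
/* ... extract startptr, tli, backup_label and unique_backup_id ... */
PQclear(res);

/* Each worker: attach to the same backup using the id the leader received. */
res = PQexec(conn_worker, "JOIN_BACKUP '2f9c4a1d0b6e83f7'");
PQclear(res);

/* Workers then fetch their share of files with SEND_FILE, and the leader
 * finishes with STOP_BACKUP once everything has been copied. */
res = PQexec(conn_leader, "STOP_BACKUP NOWAIT");
PQclear(res);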




Hi,

rebased and updated to the current master (8128b0c1). v13 is attached.

- Fixes the above reported issues.

- Added progress-reporting support for parallel backup:
For this, 'backup_streamed' is moved to a shared structure (BackupState) as a
pg_atomic_uint64 variable. The worker processes keep incrementing this
variable.

While files are being transferred from server to client, the main process
remains idle. So after each increment, the worker process signals the main
process to update the stats in the pg_stat_progress_basebackup view.

The 'tablespace_streamed' column is not updated and will remain empty. This is
because multiple workers may be copying files from different tablespaces.
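
As a rough sketch of the worker-side bookkeeping described above (the variable names and the signalling mechanism are assumptions, not the patch's actual code):

/* Count the bytes this worker just sent towards the shared total. */
pg_atomic_fetch_add_u64(&backup_state->backup_streamed, nbytes);

/* One possible way to nudge the idle main process so that it refreshes
 * pg_stat_progress_basebackup; the real patch may use a different mechanism. */
kill(leader_pid, SIGUSR1);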


- Added backup manifest:
The backend workers maintain their own manifest file, which contains a list of
the files being transferred by that worker. Once all backup files are
transferred, each worker creates a temp file ('pg_tempdir/temp_file_prefix_backupid.workerid')
and writes the content of its manifest from the BufFile into it. The workers do
not add the header or the WAL information to their manifests; these two are
added by the main process while merging all worker manifest files.

The main process will read these individual files and concatenate them into a
single file, which is then sent back to the client.

The manifest file is created when the following command is received:
    BUILD_MANIFEST 'backupid'

This is a new replication command. It is sent when pg_basebackup has copied all the
$PGDATA files including WAL files.
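
For illustration, here is a sketch of how the main process might merge the per-worker manifest files when BUILD_MANIFEST arrives; the temp-file naming follows the description above, and the helper name, worker count, and error handling are assumptions:

static void
merge_worker_manifests(const char *backupid, int nworkers, FILE *out)
{
    int     i;

    /* Header and WAL information are assumed to be written by the caller. */
    for (i = 0; i < nworkers; i++)
    {
        char    path[MAXPGPATH];
        FILE   *in;
        char    buf[8192];
        size_t  nread;

        snprintf(path, sizeof(path), "%s/%s_%s.%d",
                 PG_TEMP_FILES_DIR, PG_TEMP_FILE_PREFIX, backupid, i);
        in = fopen(path, "rb");
        if (in == NULL)
            continue;           /* a real implementation would raise an error */

        while ((nread = fread(buf, 1, sizeof(buf), in)) > 0)
            fwrite(buf, 1, nread, out);

        fclose(in);
        unlink(path);           /* the temp file is no longer needed */
    }
}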



--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


parallel_backup_v13.zip (78K)