base backup client as auxiliary backend process

base backup client as auxiliary backend process

Peter Eisentraut-6
Setting up a standby instance is still quite complicated.  You need to
run pg_basebackup with all the right options.  You need to make sure
pg_basebackup has the right permissions for the target directories.  The
created instance has to be integrated into the operating system's start
scripts.  There is this slightly awkward business of the --recovery-conf
option and how it interacts with other features.  And you should
probably run pg_basebackup under screen.  And then how do you get
notified when it's done.  And when it's done you have to log back in and
finish up.  Too many steps.

My idea is that the postmaster can launch a base backup worker, wait
till it's done, then proceed with the rest of the startup.  initdb gets
a special option to create a "minimal" data directory with only a few
files, directories, and the usual configuration files.  Then you create
a $PGDATA/basebackup.signal, start the postmaster as normal.  It sees
the signal file, launches an auxiliary process that runs the base
backup, then proceeds with normal startup in standby mode.
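
For illustration, a minimal sketch of the postmaster-side check (not the
actual patch code; BASEBACKUP_SIGNAL_FILE, StartBaseBackupProcess(), and
BaseBackupPID are used here as placeholder names):

    #include <sys/stat.h>

    #define BASEBACKUP_SIGNAL_FILE "basebackup.signal"

    /* Sketch: run in the postmaster before normal startup proceeds. */
    static void
    maybe_run_base_backup(void)
    {
        struct stat stat_buf;

        if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) != 0)
            return;             /* no signal file, start up as usual */

        /* launch the auxiliary process that runs the base backup */
        BaseBackupPID = StartBaseBackupProcess();

        /* wait until it is done before reading pg_control and so on */
        while (BaseBackupPID != 0)
            pg_usleep(1000000L);    /* cleared when the child exits */
    }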

This makes a whole bunch of things much nicer: The connection
information for where to get the base backup from comes from
postgresql.conf, so you only need to specify it in one place.
pg_basebackup is completely out of the picture; no need to deal with
command-line options, --recovery-conf, screen, monitoring for
completion, etc.  If something fails, the base backup process can
automatically be restarted (maybe).  Operating system integration is
much easier: You only call initdb and then pg_ctl or postgres, as you
are already doing.  Automated deployment systems don't need to wait for
pg_basebackup to finish: You only call initdb, then start the server,
and then you're done -- waiting for the base backup to finish can be
done by the regular monitoring system.

Attached is a very hackish patch to implement this.  It works like this:

    # (assuming you have a primary already running somewhere)
    initdb -D data2 --minimal
    $EDITOR data2/postgresql.conf  # set primary_conninfo
    pg_ctl -D data2 start

(Curious side note: If you don’t set primary_conninfo in these steps,
then libpq defaults apply, so the default behavior might end up being
that a given instance attempts to replicate from itself.)
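
To make that concrete, a tiny standalone libpq sketch (illustration only,
not part of the patch) showing what an empty connection string resolves to:

    #include <stdio.h>
    #include "libpq-fe.h"

    int
    main(void)
    {
        /*
         * An empty conninfo is effectively what an unset primary_conninfo
         * gives you: libpq falls back to PGHOST/PGPORT and friends, which
         * usually point at the local server.
         */
        PGconn     *conn = PQconnectdb("");

        printf("libpq would connect to host=%s port=%s\n",
               PQhost(conn) ? PQhost(conn) : "(default socket directory)",
               PQport(conn) ? PQport(conn) : "(default)");
        PQfinish(conn);
        return 0;
    }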

It works for basic cases.  It's missing tablespace support, proper
fsyncing, progress reporting, probably more.  Those would be pretty
straightforward I think.  The interesting bit is the delicate ordering
of the postmaster startup:  Normally, the pg_control file is read quite
early, but if starting from a minimal data directory, we need to wait
until the base backup is done.  There is also the question what you do
if the base backup fails halfway through.  Currently you probably need
to delete the whole data directory and start again with initdb.  Better
might be a way to start again and overwrite any existing files, but that
can clearly also be dangerous.  All this needs some careful analysis,
but I think it's doable.

Any thoughts?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: v1-0001-Base-backup-client-as-auxiliary-backend-process.patch (36K)

Re: base backup client as auxiliary backend process

Thomas Munro-5
On Sun, Jun 30, 2019 at 8:05 AM Peter Eisentraut
<[hidden email]> wrote:
> Attached is a very hackish patch to implement this.  It works like this:
>
>     # (assuming you have a primary already running somewhere)
>     initdb -D data2 --minimal
>     $EDITOR data2/postgresql.conf  # set primary_conninfo
>     pg_ctl -D data2 start

+1, very nice.  How about --replica?

FYI, Windows doesn't like your patch:

src/backend/postmaster/postmaster.c(1396): warning C4013: 'sleep'
undefined; assuming extern returning int
[C:\projects\postgresql\postgres.vcxproj]

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.45930
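
A sketch of the obvious portable fix, assuming the patch calls sleep()
directly somewhere in postmaster.c (the actual call site and interval are
not shown here):

    /* instead of the POSIX-only sleep(), which MSVC does not declare: */
    sleep(1);

    /* ...use the backend's portable helper, which takes microseconds: */
    pg_usleep(1000000L);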

--
Thomas Munro
https://enterprisedb.com



Re: base backup client as auxiliary backend process

Sergei Kornilov
Hello

>>  Attached is a very hackish patch to implement this. It works like this:
>>
>>      # (assuming you have a primary already running somewhere)
>>      initdb -D data2 --minimal
>>      $EDITOR data2/postgresql.conf # set primary_conninfo
>>      pg_ctl -D data2 start
>
> +1, very nice. How about --replica?

+1

Also, it doesn't work with -DEXEC_BACKEND for me.

> There is also the question what you do
> if the base backup fails halfway through. Currently you probably need
> to delete the whole data directory and start again with initdb. Better
> might be a way to start again and overwrite any existing files, but that
> can clearly also be dangerous.

I think having to delete the directory and rerun initdb is better than overwriting files.

- We need to check the major version.  A base backup can work across different versions, but it would be useless to copy a cluster that we cannot run.  (A rough sketch of such a check follows this list.)
- The base backup silently overwrites config files (pg_hba.conf, postgresql.conf, postgresql.auto.conf) in $PGDATA.  This is OK for pg_basebackup but not for a backend process.
- I think we need to start a walreceiver, ideally without interrupting startup replay (if possible).
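
A rough sketch of such a version check, assuming an established
WalReceiverConn (wrconn) and the walrcv_server_version() callback (the
error wording is made up):

    /*
     * Sketch only: refuse to copy a cluster from a primary whose major
     * version differs from ours, since we could not start it anyway.
     */
    {
        int         remote_version = walrcv_server_version(wrconn);

        if (remote_version / 100 != PG_VERSION_NUM / 100)
            ereport(FATAL,
                    (errmsg("cannot take a base backup from a server of version %d into a server of version %d",
                            remote_version, PG_VERSION_NUM)));
    }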

> XXX Is there a use for
> * switching into (non-standby) recovery here?

I think not.

regards, Sergei



Re: base backup client as auxiliary backend process

Euler Taveira
In reply to this post by Peter Eisentraut-6
On Sat, Jun 29, 2019 at 17:05, Peter Eisentraut
<[hidden email]> wrote:

>
> Setting up a standby instance is still quite complicated.  You need to
> run pg_basebackup with all the right options.  You need to make sure
> [...]
> Attached is a very hackish patch to implement this.  It works like this:
>
>     # (assuming you have a primary already running somewhere)
>     initdb -D data2 --minimal
>     $EDITOR data2/postgresql.conf  # set primary_conninfo
>     pg_ctl -D data2 start
>
Great! The main complaints about pg_basebackup usage in TB clusters
are: (a) it can't be restarted and (b) it can't be parallelized.
AFAICS your proposal doesn't solve them.  It would be nice if they could
be solved in future releases (using rsync or another in-house tool is as
fragile as using pg_basebackup).


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: base backup client as auxiliary backend process

Robert Haas
In reply to this post by Peter Eisentraut-6
On Sat, Jun 29, 2019 at 4:05 PM Peter Eisentraut
<[hidden email]> wrote:
> My idea is that the postmaster can launch a base backup worker, wait
> till it's done, then proceed with the rest of the startup.  initdb gets
> a special option to create a "minimal" data directory with only a few
> files, directories, and the usual configuration files.

Why do we even have to do that much?  Can we remove the need for an
initdb altogether?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: base backup client as auxiliary backend process

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Sat, Jun 29, 2019 at 4:05 PM Peter Eisentraut
> <[hidden email]> wrote:
>> My idea is that the postmaster can launch a base backup worker, wait
>> till it's done, then proceed with the rest of the startup.  initdb gets
>> a special option to create a "minimal" data directory with only a few
>> files, directories, and the usual configuration files.

> Why do we even have to do that much?  Can we remove the need for an
> initdb altogether?

Gotta have config files in place already, no?

                        regards, tom lane



Re: base backup client as auxiliary backend process

Robert Haas
On Thu, Jul 11, 2019 at 10:36 AM Tom Lane <[hidden email]> wrote:
> Gotta have config files in place already, no?

Why?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: base backup client as auxiliary backend process

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Thu, Jul 11, 2019 at 10:36 AM Tom Lane <[hidden email]> wrote:
>> Gotta have config files in place already, no?

> Why?

How's the postmaster to know that it's supposed to run pg_basebackup
rather than start normally?  Where will it get the connection information?
Seem to need configuration data *somewhere*.

                        regards, tom lane



Re: base backup client as auxiliary backend process

Robert Haas
On Thu, Jul 11, 2019 at 4:10 PM Tom Lane <[hidden email]> wrote:
> Robert Haas <[hidden email]> writes:
> > On Thu, Jul 11, 2019 at 10:36 AM Tom Lane <[hidden email]> wrote:
> >> Gotta have config files in place already, no?
>
> > Why?
>
> How's the postmaster to know that it's supposed to run pg_basebackup
> rather than start normally?  Where will it get the connection information?
> Seem to need configuration data *somewhere*.

Maybe just:

./postgres --replica='connstr' -D createme

?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: base backup client as auxiliary backend process

Peter Eisentraut-6
On 2019-07-11 22:20, Robert Haas wrote:

> On Thu, Jul 11, 2019 at 4:10 PM Tom Lane <[hidden email]> wrote:
>> Robert Haas <[hidden email]> writes:
>>> On Thu, Jul 11, 2019 at 10:36 AM Tom Lane <[hidden email]> wrote:
>>>> Gotta have config files in place already, no?
>>
>>> Why?
>>
>> How's the postmaster to know that it's supposed to run pg_basebackup
>> rather than start normally?  Where will it get the connection information?
>> Seem to need configuration data *somewhere*.
>
> Maybe just:
>
> ./postgres --replica='connstr' -D createme

What you are describing is of course theoretically possible, but it
doesn't really fit with how existing tooling normally deals with this,
which is one of the problems I want to address.

initdb has all the knowledge of how to create the data *directory*, how
to set permissions, deal with existing and non-empty directories, how to
set up a separate WAL directory.  Packaged environments might wrap this
further by using the correct OS users, creating the directory first as
root, then changing owner, etc.  This is all logic that we can reuse and
probably don't want to duplicate elsewhere.

Furthermore, we have for the longest time encouraged packagers *not* to
create data directories automatically when a service is started, because
this might store data in places that will be hidden by a later mount.
Keeping this property requires making the initialization of the data
directory a separate step somehow.  That step doesn't have to be called
"initdb", it could be a new "pg_mkdirs", but for the reasons described
above, this would create a fair amount of code duplication and not really
gain anything.

Finally, many installations want to have the configuration files under
control of some centralized configuration management system.  The way
those want to work is usually: (1) create file system structures, (2)
install configuration files from some templates, (3) start service.
This is of course how setting up a primary works.  Having such a system
set up a standby is currently seemingly impossible in an elegant way,
because the order and timing of how things work is all wrong.  My
proposed change would fix this because things would be set up in the
same three-step process.  (As has been pointed out, this would require
that the base backup does not copy over the configuration files from the
remote, which my patch currently doesn't do correctly.)

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: base backup client as auxiliary backend process

Tom Lane-2
Peter Eisentraut <[hidden email]> writes:
> On 2019-07-11 22:20, Robert Haas wrote:
>> On Thu, Jul 11, 2019 at 4:10 PM Tom Lane <[hidden email]> wrote:
>>> How's the postmaster to know that it's supposed to run pg_basebackup
>>> rather than start normally?  Where will it get the connection information?
>>> Seem to need configuration data *somewhere*.
>>
>> Maybe just:
>>
>> ./postgres --replica='connstr' -D createme

> What you are describing is of course theoretically possible, but it
> doesn't really fit with how existing tooling normally deals with this,
> which is one of the problems I want to address.

I don't care for Robert's suggestion for a different reason: it presumes
that all data that can possibly be needed to set up a new replica is
feasible to cram onto the postmaster command line, and always will be.

An immediate counterexample is that's not where you want to be specifying
the password for a replication connection.  But even without that sort of
security issue, this approach won't scale.  It also does not work even a
little bit nicely for tooling in which the postmaster is not supposed to
be started directly by the user.  (Which is to say, all postgres-service
tooling everywhere.)

                        regards, tom lane



Re: base backup client as auxiliary backend process

Kyotaro Horiguchi-4
In reply to this post by Peter Eisentraut-6
Hello.

At Sat, 29 Jun 2019 22:05:22 +0200, Peter Eisentraut <[hidden email]> wrote in <[hidden email]>

> Setting up a standby instance is still quite complicated.  You need to
> run pg_basebackup with all the right options.  You need to make sure
> pg_basebackup has the right permissions for the target directories.  The
> created instance has to be integrated into the operating system's start
> scripts.  There is this slightly awkward business of the --recovery-conf
> option and how it interacts with other features.  And you should
> probably run pg_basebackup under screen.  And then how do you get
> notified when it's done.  And when it's done you have to log back in and
> finish up.  Too many steps.
>
> My idea is that the postmaster can launch a base backup worker, wait
> till it's done, then proceed with the rest of the startup.  initdb gets
> a special option to create a "minimal" data directory with only a few
> files, directories, and the usual configuration files.  Then you create
> a $PGDATA/basebackup.signal, start the postmaster as normal.  It sees
> the signal file, launches an auxiliary process that runs the base
> backup, then proceeds with normal startup in standby mode.
>
> This makes a whole bunch of things much nicer: The connection
> information for where to get the base backup from comes from
> postgresql.conf, so you only need to specify it in one place.
> pg_basebackup is completely out of the picture; no need to deal with
> command-line options, --recovery-conf, screen, monitoring for
> completion, etc.  If something fails, the base backup process can
> automatically be restarted (maybe).  Operating system integration is
> much easier: You only call initdb and then pg_ctl or postgres, as you
> are already doing.  Automated deployment systems don't need to wait for
> pg_basebackup to finish: You only call initdb, then start the server,
> and then you're done -- waiting for the base backup to finish can be
> done by the regular monitoring system.
>
> Attached is a very hackish patch to implement this.  It works like this:
>
>     # (assuming you have a primary already running somewhere)
>     initdb -D data2 --minimal
>     $EDITOR data2/postgresql.conf  # set primary_conninfo
>     pg_ctl -D data2 start

Nice idea!

> (Curious side note: If you don’t set primary_conninfo in these steps,
> then libpq defaults apply, so the default behavior might end up being
> that a given instance attempts to replicate from itself.)

We might be able to have different settings for primary and replica
if we could have sections in the configuration file; defining, say, a
[replica] section would give us more flexibility.  Though it is a bit
far from the topic, a dedicated command-line configuration editor that
can find and replace a specified parameter would eliminate the subtle
editing step.  It is annoying to find the specific setting in the conf
file, trim it, and then add the new content.

> It works for basic cases.  It's missing tablespace support, proper
> fsyncing, progress reporting, probably more.  Those would be pretty

While the replica is catching up with the master, connections to it
are accepted but then fail with a FATAL error.  I receive inquiries
about that now and then.  With the new feature, we would get that
FATAL also during the base backup phase, which could alarm users more
often.

> straightforward I think.  The interesting bit is the delicate ordering
> of the postmaster startup:  Normally, the pg_control file is read quite
> early, but if starting from a minimal data directory, we need to wait
> until the base backup is done.  There is also the question what you do
> if the base backup fails halfway through.  Currently you probably need
> to delete the whole data directory and start again with initdb.  Better
> might be a way to start again and overwrite any existing files, but that
> can clearly also be dangerous.  All this needs some careful analysis,
> but I think it's doable.
>
> Any thoughts?

Just overwriting won't work, since files removed just before retrying
would be left behind on the replica.  I think it should work similarly
to initdb, that is, remove everything and then retry.

It's easy if we don't consider reducing startup time: just do initdb,
then start the existing postmaster internally.  But melding them
together leaves room for reducing the startup time.  We could even
redirect read-only queries to the master while setting up the server.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Re: base backup client as auxiliary backend process

Peter Eisentraut-6
In reply to this post by Peter Eisentraut-6
> Attached is a very hackish patch to implement this.  It works like this:
>
>     # (assuming you have a primary already running somewhere)
>     initdb -D data2 --replica
>     $EDITOR data2/postgresql.conf  # set primary_conninfo
>     pg_ctl -D data2 start

Attached is an updated patch for this.  I have changed the initdb option
name per suggestion.  The WAL receiver is now started concurrently with
the base backup.  There is progress reporting (ps display), fsyncing.
Configuration files are not copied anymore.  There is a simple test
suite.  Tablespace support is still missing, but it would be
straightforward.

It's still all to be considered experimental, but it's taking shape and
working pretty well.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: v2-0001-Base-backup-client-as-auxiliary-backend-process.patch (60K)

Re: base backup client as auxiliary backend process

Alvaro Herrera-9
On 2019-Aug-30, Peter Eisentraut wrote:

> Attached is an updated patch for this.  I have changed the initdb option
> name per suggestion.  The WAL receiver is now started concurrently with
> the base backup.  There is progress reporting (ps display), fsyncing.
> Configuration files are not copied anymore.  There is a simple test
> suite.  Tablespace support is still missing, but it would be
> straightforward.

This is an amazing feature.  How come we don't have people cramming to
review this?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: base backup client as auxiliary backend process

Kyotaro Horiguchi-4
Hello, thanks for pinging.

At Wed, 11 Sep 2019 19:15:24 -0300, Alvaro Herrera <[hidden email]> wrote in <[hidden email]>

> On 2019-Aug-30, Peter Eisentraut wrote:
>
> > Attached is an updated patch for this.  I have changed the initdb option
> > name per suggestion.  The WAL receiver is now started concurrently with
> > the base backup.  There is progress reporting (ps display), fsyncing.
> > Configuration files are not copied anymore.  There is a simple test
> > suite.  Tablespace support is still missing, but it would be
> > straightforward.
>
> This is an amazing feature.  How come we don't have people cramming to
> review this?

I love it, too.  As for me, the reason for hesitating to review this
is that the patch is said to be experimental.  That means 'the details
don't matter, let's discuss its design/outline.'  So I wanted to see
what the past reviewers would say about the revised shape before I
stirred up the discussion with maybe-pointless comments.  (And then I
forgot about it...)

I'll take another look at this.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: base backup client as auxiliary backend process

Michael Paquier-2
In reply to this post by Peter Eisentraut-6
On Fri, Aug 30, 2019 at 09:10:10PM +0200, Peter Eisentraut wrote:

> > Attached is a very hackish patch to implement this.  It works like this:
> >
> >     # (assuming you have a primary already running somewhere)
> >     initdb -D data2 --replica
> >     $EDITOR data2/postgresql.conf  # set primary_conninfo
> >     pg_ctl -D data2 start
>
> Attached is an updated patch for this.  I have changed the initdb option
> name per suggestion.  The WAL receiver is now started concurrently with
> the base backup.  There is progress reporting (ps display), fsyncing.
> Configuration files are not copied anymore.  There is a simple test
> suite.  Tablespace support is still missing, but it would be
> straightforward.
I find this idea and this spec neat.

-    * Verify XLOG status looks valid.
+    * Check that contents look valid.
     */
-   if (ControlFile->state < DB_SHUTDOWNED ||
-       ControlFile->state > DB_IN_PRODUCTION ||
-       !XRecOffIsValid(ControlFile->checkPoint))
+   if (!XRecOffIsValid(ControlFile->checkPoint))
             ereport(FATAL,
Doesn't seem like a good idea to me to remove this sanity check for
normal deployments, but actually you moved that down into StartupXLOG().
It seems to me that this is unrelated and could be a separate patch, so
that the errors produced are more verbose.  I think that we should also
change that code to use a switch/case on ControlFile->state.
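
Something along these lines is what I have in mind (a sketch only, using
the existing DBState values; the error wording is not important):

    switch (ControlFile->state)
    {
        case DB_SHUTDOWNED:
        case DB_SHUTDOWNED_IN_RECOVERY:
        case DB_SHUTDOWNING:
        case DB_IN_CRASH_RECOVERY:
        case DB_IN_ARCHIVE_RECOVERY:
        case DB_IN_PRODUCTION:
            /* known states, nothing to complain about */
            break;
        default:
            ereport(FATAL,
                    (errmsg("control file contains invalid database cluster state")));
    }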

The current defaults of pg_basebackup have been chosen so that the
backups taken have good stability and so that monitoring is eased,
thanks to --wal-method=stream and the use of replication slots.
Shouldn't the use of at least a temporary replication slot be mandatory
for the stability of the copy?  It seems to me that there is a good
argument for having a second process which streams WAL on top of the
main backup process, and just using a WAL receiver for that.
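
For reference, the client-side equivalent of what pg_basebackup does today
is roughly the following; a standalone libpq sketch (conninfo and slot name
are placeholders):

    #include <stdio.h>
    #include <stdlib.h>
    #include "libpq-fe.h"

    int
    main(void)
    {
        /* replication connection to the primary; details are made up */
        PGconn     *conn = PQconnectdb("host=primary user=rep replication=true");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            exit(1);
        }

        /*
         * A temporary physical slot that reserves WAL keeps the segments
         * needed by the copy from being recycled; it disappears when the
         * connection ends.
         */
        res = PQexec(conn,
                     "CREATE_REPLICATION_SLOT \"tmp_backup\" TEMPORARY PHYSICAL RESERVE_WAL");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "could not create slot: %s", PQerrorMessage(conn));
        PQclear(res);

        /* ... BASE_BACKUP / START_REPLICATION would run on this connection ... */

        PQfinish(conn);
        return 0;
    }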

One problem which is not tackled here is what to do for the tablespace
map.  pg_basebackup has its own specific trick for that, and with that
new feature we may want something equivalent?  Not something to
consider as a first stage of course.

  */
-static void
+void
 WriteControlFile(void)
[...]
-static void
+void
 ReadControlFile(void)
[...]
If you begin to publish those routines, it seems to me that there
could be more consolidation with controldata_utils.c which includes
now a routine to update a control file.

+#ifndef FRONTEND
+extern void InitControlFile(uint64 sysidentifier);
+extern void WriteControlFile(void);
+extern void ReadControlFile(void);
+#endif
It would be nice to avoid that.

-extern char *cluster_name;
+extern PGDLLIMPORT char *cluster_name;
Separate patch here?

+   if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) == 0)
+   {
+       int         fd;
+
+       fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY,
+                              S_IRUSR | S_IWUSR);
+       if (fd >= 0)
+       {
+           (void) pg_fsync(fd);
+           close(fd);
+       }
+       basebackup_signal_file_found = true;
+   }
I would put that in a different routine.

+       /*
+        * Wait until done.  Start WAL receiver in the meantime, once base
+        * backup has received the starting position.
+        */
+       while (BaseBackupPID != 0)
+       {
+           PG_SETMASK(&UnBlockSig);
+           pg_usleep(1000000L);
+           PG_SETMASK(&BlockSig);

+   primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10);
No more strtol with base 10 stuff please :)
--
Michael


Re: base backup client as auxiliary backend process

Peter Eisentraut-6
Updated patch attached.

On 2019-09-18 10:31, Michael Paquier wrote:

> -    * Verify XLOG status looks valid.
> +    * Check that contents look valid.
>       */
> -   if (ControlFile->state < DB_SHUTDOWNED ||
> -       ControlFile->state > DB_IN_PRODUCTION ||
> -       !XRecOffIsValid(ControlFile->checkPoint))
> +   if (!XRecOffIsValid(ControlFile->checkPoint))
>               ereport(FATAL,
> Doesn't seem like a good idea to me to remove this sanity check for
> normal deployments, but actually you moved that down in StartupXLOG().
> It seems to me tha this is unrelated and could be a separate patch so
> as the errors produced are more verbose.  I think that we should also
> change that code to use a switch/case on ControlFile->state.
Done.  Yes, this was really a change made to get more precise error
messages during debugging.  It could be committed separately.

> The current defaults of pg_basebackup have been thought so as the
> backups taken have a good stability and so as monitoring is eased
> thanks to --wal-method=stream and the use of replication slots.
> Shouldn't the use of a least a temporary replication slot be mandatory
> for the stability of the copy?  It seems to me that there is a good
> argument for having a second process which streams WAL on top of the
> main backup process, and just use a WAL receiver for that.

Is this something that the walreceiver should be doing independent of
this patch?

> One problem which is not tackled here is what to do for the tablespace
> map.  pg_basebackup has its own specific trick for that, and with that
> new feature we may want something equivalent?  Not something to
> consider as a first stage of course.

The updated patch has support for tablespaces without mapping.  I'm thinking
about putting the mapping specification into a GUC list somehow.
Shouldn't be too hard.

>    */
> -static void
> +void
>   WriteControlFile(void)
> [...]
> -static void
> +void
>   ReadControlFile(void)
> [...]
> If you begin to publish those routines, it seems to me that there
> could be more consolidation with controldata_utils.c which includes
> now a routine to update a control file.
Hmm, maybe long-term, but it seems like too much dangerous surgery for this
patch.

> +#ifndef FRONTEND
> +extern void InitControlFile(uint64 sysidentifier);
> +extern void WriteControlFile(void);
> +extern void ReadControlFile(void);
> +#endif
> It would be nice to avoid that.

Fixed by renaming a function in pg_resetwal.c.

> +       /*
> +        * Wait until done.  Start WAL receiver in the meantime, once base
> +        * backup has received the starting position.
> +        */
> +       while (BaseBackupPID != 0)
> +       {
> +           PG_SETMASK(&UnBlockSig);
> +           pg_usleep(1000000L);
> +           PG_SETMASK(&BlockSig);
>
> +   primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10);
> No more strtol with base 10 stuff please :)
Hmm, why not?  What's the replacement?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: v3-0001-Base-backup-client-as-auxiliary-backend-process.patch (54K)

Re: base backup client as auxiliary backend process

Michael Paquier-2
On Mon, Oct 28, 2019 at 09:30:52AM +0100, Peter Eisentraut wrote:

> On 2019-09-18 10:31, Michael Paquier wrote:
>> -    * Verify XLOG status looks valid.
>> +    * Check that contents look valid.
>>       */
>> -   if (ControlFile->state < DB_SHUTDOWNED ||
>> -       ControlFile->state > DB_IN_PRODUCTION ||
>> -       !XRecOffIsValid(ControlFile->checkPoint))
>> +   if (!XRecOffIsValid(ControlFile->checkPoint))
>>               ereport(FATAL,
>> Doesn't seem like a good idea to me to remove this sanity check for
>> normal deployments, but actually you moved that down into StartupXLOG().
>> It seems to me that this is unrelated and could be a separate patch, so
>> that the errors produced are more verbose.  I think that we should also
>> change that code to use a switch/case on ControlFile->state.
>
> Done.  Yes, this was really a change made to get more precise error messages
> during debugging.  It could be committed separately.
If you wish to do so now, that's fine by me.

>> The current defaults of pg_basebackup have been chosen so that the
>> backups taken have good stability and so that monitoring is eased,
>> thanks to --wal-method=stream and the use of replication slots.
>> Shouldn't the use of at least a temporary replication slot be mandatory
>> for the stability of the copy?  It seems to me that there is a good
>> argument for having a second process which streams WAL on top of the
>> main backup process, and just using a WAL receiver for that.
>
> Is this something that the walreceiver should be doing independent of this
> patch?
There could be an argument for switching a WAL receiver to use a
temporary replication slot by default.  Still, it seems to me that
this backup solution suffers from the same set of problems we have
spent years to fix with pg_basebackup with missing WAL files caused by
concurrent checkpoints removing things needed while the copy of the
main data folder and other tablespaces happens.

>> One problem which is not tackled here is what to do for the tablespace
>> map.  pg_basebackup has its own specific trick for that, and with that
>> new feature we may want something equivalent?  Not something to
>> consider as a first stage of course.
>
> The updated patch has support for tablespaces without mapping.  I'm thinking about
> putting the mapping specification into a GUC list somehow. Shouldn't be too
> hard.

That may become ugly if there are many tablespaces to take care of.
Another idea I can come up with would be to pass the new mapping to
initdb, though this requires an extra intermediate step to store the
new map and then compare it with the mapping received at BASE_BACKUP
time.  But perhaps you are looking for an experience different from
pg_basebackup.  The first version of the patch does not actually
require that anyway.

>> No more strtol with base 10 stuff please :)
>
> Hmm, why not?  What's the replacement?

I was referring to this patch:
https://commitfest.postgresql.org/25/2272/
It happens that all our calls of strtol in core use a base of 10.  But
please just ignore this part.

ReceiveAndUnpackTarFile() is in both libpqwalreceiver.c and
pg_basebackup.c.  It would be nice to refactor that.
--
Michael


Re: base backup client as auxiliary backend process

Peter Eisentraut-6
On 2019-11-07 05:16, Michael Paquier wrote:

>>> The current defaults of pg_basebackup have been chosen so that the
>>> backups taken have good stability and so that monitoring is eased,
>>> thanks to --wal-method=stream and the use of replication slots.
>>> Shouldn't the use of at least a temporary replication slot be mandatory
>>> for the stability of the copy?  It seems to me that there is a good
>>> argument for having a second process which streams WAL on top of the
>>> main backup process, and just using a WAL receiver for that.
>> Is this something that the walreceiver should be doing independent of this
>> patch?
> There could be an argument for switching a WAL receiver to use a
> temporary replication slot by default.  Still, it seems to me that
> this backup solution suffers from the same set of problems we have
> spent years to fix with pg_basebackup with missing WAL files caused by
> concurrent checkpoints removing things needed while the copy of the
> main data folder and other tablespaces happens.
I looked into this.  It seems trivial to make walsender create and use a
temporary replication slot by default if no permanent replication slot
is specified.  This is basically the logic that pg_basebackup has but
done server-side.  See attached patch for a demonstration.  Any reason
not to do that?
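
The gist of it is something like the following; a sketch only, not the
attached patch (the slot-name scheme is made up, and the exact hook point
in walsender.c may differ):

    /*
     * Sketch: when streaming starts without a named slot, create a
     * temporary physical slot so the WAL required by this walsender is
     * retained; the slot goes away when the connection ends.
     */
    if (cmd->slotname == NULL || cmd->slotname[0] == '\0')
    {
        char        slotname[NAMEDATALEN];

        snprintf(slotname, sizeof(slotname), "pg_walsender_%d", MyProcPid);
        ReplicationSlotCreate(slotname, false, RS_TEMPORARY);
        ReplicationSlotReserveWal();
    }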

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: 0001-walsender-uses-a-temporary-replication-slot-by-defau.patch (2K)

Re: base backup client as auxiliary backend process

Sergei Kornilov
Hello

Could you rebase the patch, please?  I get errors when applying it.  The CFbot checks the latest demonstration patch.

> I looked into this. It seems trivial to make walsender create and use a
> temporary replication slot by default if no permanent replication slot
> is specified. This is basically the logic that pg_basebackup has but
> done server-side. See attached patch for a demonstration. Any reason
> not to do that?

Seems this would break the pg_basebackup --no-slot option?

> +          Do not copy configuration files, that is, files that end in
> +          <filename>.conf</filename>.

Possibly we need to ignore *.signal files too?

> +/*
> + * XXX copied from pg_basebackup.c
> + */
> +
> +unsigned long long totaldone;
> +unsigned long long totalsize_kb;
> +int tablespacenum;
> +int tablespacecount;

Is declaring variables in the middle of the file correct per the coding style?  Not a problem for me, I just want to clarify.
Should they not be declared "static"?
Also, how about tablespacedone instead of tablespacenum?

> The updated has support for tablespaces without mapping.  I'm thinking
> about putting the mapping specification into a GUC list somehow.
> Shouldn't be too hard.

I think we can leave tablespace mapping to pg_basebackup only: a more powerful tool for less common scenarios.  Or leave it for another future patch.

regards, Sergei

