backup manifests

Robert Haas
In the lengthy thread on block-level incremental backup,[1] both
Vignesh C[2] and Stephen Frost[3] have suggested storing a manifest as
part of each backup, something that could be useful not only for
incremental backups but also for full backups. I initially didn't
think this was necessary,[4] but some of my colleagues figured out
that my design was broken, because my proposal was to detect new
blocks just using LSN, and that ignores the fact that CREATE DATABASE
and ALTER TABLE .. SET TABLESPACE do physical copies without bumping
page LSNs, which I knew but somehow forgot about.  Fortunately, they
caught the mistake in testing.[5] Because of this
problem, for an LSN-based approach to work, we'll need to send not
only an LSN, but also a list of files (and file sizes) that exist in
the previous full backup; so, some kind of backup manifest now seems
like a good idea to me.[6] That whole approach might still be dead on
arrival if it's possible to add new blocks with old LSNs to existing
files,[7] but there seems to be room to hope that there are no such
cases.[8]

So, let's suppose we invent a backup manifest. What should it contain?
I imagine that it would consist of a list of files, and the lengths of
those files, and a checksum for each file. I think you should have a
choice of what kind of checksums to use, because algorithms that used
to seem like good choices (e.g. MD5) no longer do; this trend can
probably be expected to continue. Even if we initially support only
one kind of checksum -- presumably SHA-something since we have code
for that already for SCRAM -- I think that it would also be a good
idea to allow for future changes. And maybe it's best to just allow a
choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
gate, so that we can avoid bikeshedding over which one is secure
enough. I guess we'll still have to argue about the default. I also
think that it should be possible to build a manifest with no
checksums, so that one need not pay the overhead of computing
checksums if one does not wish. Of course, such a manifest is of much
less utility for checking backup integrity, but you can still check
that you've got the right files, which is noticeably better than
nothing.  The manifest should probably also contain a checksum of its
own contents so that the integrity of the manifest itself can be
verified. And maybe a few other bits of metadata, but I'm not sure
exactly what.  Ideas?
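
To make the question concrete, here is a purely hypothetical sketch of
the kind of information being discussed -- the layout and field names are
invented for illustration, not a proposed syntax:

    Manifest-Version 1
    Checksum-Algorithm SHA-256
    File global/pg_control 8192 <sha256-of-file-contents>
    File base/1/1259 688128 <sha256-of-file-contents>
    ...
    Manifest-Checksum <sha256-of-all-lines-above>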

Once we invent the concept of a backup manifest, what do we need to do
with them? I think we'd want three things initially:

(1) When taking a backup, have the option (perhaps enabled by default)
to include a backup manifest.
(2) Given an existing backup that has not got a manifest, construct one.
(3) Cross-check a manifest against a backup and complain about extra
files, missing files, size differences, or checksum mismatches.
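
As a very rough sketch of what (3) might look like -- assuming the
manifest has already been parsed into an array of entries; the type and
function names here are hypothetical, not from any patch:

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/stat.h>

    typedef struct manifest_entry
    {
        const char *path;       /* path relative to the backup directory */
        long long   size;       /* expected size in bytes */
        const char *checksum;   /* expected checksum in hex, or NULL */
    } manifest_entry;

    /* Report missing files and size mismatches; checksum comparison and
     * detection of extra files (walking the directory and looking each
     * file up in the manifest) are omitted for brevity. */
    static bool
    verify_backup(const manifest_entry *entries, int nentries)
    {
        bool    ok = true;
        int     i;

        for (i = 0; i < nentries; i++)
        {
            struct stat st;

            if (stat(entries[i].path, &st) != 0)
            {
                fprintf(stderr, "missing file: %s\n", entries[i].path);
                ok = false;
            }
            else if (st.st_size != entries[i].size)
            {
                fprintf(stderr, "size mismatch: %s\n", entries[i].path);
                ok = false;
            }
        }
        return ok;
    }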

One thing I'm not quite sure about is where to store the backup
manifest. If you take a base backup in tar format, you get base.tar,
pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
Does the backup manifest go into base.tar? Get written into a separate
file outside of any tar archive? Something else? And what about a
plain-format backup? I suppose then we should just write the manifest
into the top level of the main data directory, but perhaps someone has
another idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] https://www.postgresql.org/message-id/flat/CA%2BTgmoYxQLL%3DmVyN90HZgH0X_EUrw%2BaZ0xsXJk7XV3-3LygTvA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm310fUZ72nM2n%3DcD0eSHKRAoJPuCyvvR0dhTEZ9Oytyzg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/20190916143817.GA6962%40tamriel.snowman.net
[4] https://www.postgresql.org/message-id/CA%2BTgmoaj-zw4Mou4YBcJSkHmQM%2BJA-dAVJnRP8zSASP1S4ZVgw%40mail.gmail.com
[5] https://www.postgresql.org/message-id/CAM2%2B6%3DXfJX%3DKXvpTgDvgd1rQjya_Am27j4UvJtL3nA%2BJMCTGVQ%40mail.gmail.com
[6] https://www.postgresql.org/message-id/CA%2BTgmoYg9i8TZhyjf8MqCyU8unUVuW%2B03FeBF1LGDu_-eOONag%40mail.gmail.com
[7] https://www.postgresql.org/message-id/CA%2BTgmoYT9xODgEB6y6j93hFHqobVcdiRCRCp0dHh%2BfFzZALn%3Dw%40mail.gmail.com
and nearby messages
[8] https://www.postgresql.org/message-id/20190916173933.GE6962%40tamriel.snowman.net



Re: backup manifests

David Steele
Hi Robert,

On 9/18/19 1:48 PM, Robert Haas wrote:
> That whole approach might still be dead on
> arrival if it's possible to add new blocks with old LSNs to existing
> files,[7] but there seems to be room to hope that there are no such
> cases.[8]

I sure hope there are no such cases, but we should be open to the idea
just in case.

> So, let's suppose we invent a backup manifest. What should it contain?
> I imagine that it would consist of a list of files, and the lengths of
> those files, and a checksum for each file.

These are essential.

Also consider adding the timestamp.  You have justifiable concerns about
using timestamps for deltas and I get that.  However, there are a number
of methods that can be employed to make it *much* safer.  I won't go
into that here since it is an entire thread in itself.  Suffice to say
we can detect many anomalies in the timestamps and require a checksum
backup when we see them.  I'm really interested in scanning the WAL for
changed files but that method is very complex and getting it right might
be harder than ensuring FS checksums are reliable.  Still worth trying,
though, since the benefits are enormous.  We are planning to use
timestamp + size + wal data to do incrementals if we get there.

Consider adding a reference to each file that specifies where the file
can be found if it is not in this backup.  As I understand the
pg_basebackup proposal, it would only be implementing differential
backups, i.e. an incremental that is *only* based on the last full
backup.  So, the reference can be inferred in this case.  However, if
the user selects the wrong full backup on restore, and we have labeled
each backup, then a differential restore with references against the
wrong full backup would result in a hard error rather than corruption.
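
For example (the syntax is purely illustrative), a file entry in a
differential manifest might carry a reference field such as

    File base/16384/16397 <size> <checksum> reference=<label-of-full-backup>

so that a restore attempted against any other full backup can fail
cleanly instead of silently producing a corrupt cluster.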

> I think you should have a
> choice of what kind of checksums to use, because algorithms that used
> to seem like good choices (e.g. MD5) no longer do; this trend can
> probably be expected to continue. Even if we initially support only
> one kind of checksum -- presumably SHA-something since we have code
> for that already for SCRAM -- I think that it would also be a good
> idea to allow for future changes. And maybe it's best to just allow a
> choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
> gate, so that we can avoid bikeshedding over which one is secure
> enough. I guess we'll still have to argue about the default.

Based on my original calculations (which sadly I don't have anymore),
the combination of SHA1, size, and file name is *extremely* unlikely to
generate a collision.  As in, unlikely to happen before the end of the
universe kind of unlikely.  Though, I guess it depends on your
expectations for the lifetime of the universe.

These checksums don't have to be cryptographically secure, in the sense
of preventing anyone from inferring the original data from the checksum.
They just need to
have a suitably low collision rate.  These days I would choose something
with more bits because the computation time is similar, though the
larger size requires more storage.

> I also
> think that it should be possible to build a manifest with no
> checksums, so that one need not pay the overhead of computing
> checksums if one does not wish.

Our benchmarks have indicated that checksums only account for about 1%
of total cpu time when gzip -6 compression is used.  Without compression
the percentage may be higher of course, but in that case we find network
latency is the primary bottleneck.

For S3 backups we do a SHA1 hash for our manifest, a SHA256 hash for
authv4 and a good-old-fashioned MD5 checksum for each upload part.  This
is barely noticeable when compression is enabled.

> Of course, such a manifest is of much
> less utility for checking backup integrity, but you can still check
> that you've got the right files, which is noticeably better than
> nothing.  

Absolutely -- and yet.  There was a time when we made checksums optional
but eventually gave up on that once we profiled and realized how low the
cost was vs. the benefit.

> The manifest should probably also contain a checksum of its
> own contents so that the integrity of the manifest itself can be
> verified.

This is a good idea.  Amazingly we've never seen a manifest checksum
error in the field but it's only a matter of time.

> And maybe a few other bits of metadata, but I'm not sure
> exactly what.  Ideas?

A backup label for sure.  You can also use this as the directory/tar
name to save the user coming up with one.  We use YYYYMMDDHH24MMSSF for
full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
incrementals and have logic to prevent two backups from having the same
label.  This is unlikely outside of testing but still a good idea.
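
Reading that pattern literally, labels might look like this (the values
are made up):

    20190918225500F                      full backup
    20190918225500F_20190919093000I      incremental based on that full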

Knowing the start/stop time of the backup is useful in all kinds of
ways, especially monitoring and time-targeted PITR.  Start/stop LSN is
also good.  I know this is also in backup_label but having it all in one
place is nice.

We include the version/sysid of the cluster to avoid mixups.  It's a
great extra check on top of references to be sure everything is kosher.

A manifest version is good in case we change the format later.  I'd
recommend JSON for the format since it is so ubiquitous and easily
handles escaping which can be gotchas in a home-grown format.  We
currently have a format that is a combination of Windows INI and JSON
(for human-readability in theory) and we have become painfully aware of
escaping issues.  Really, why would you drop files with '=' in their
name in PGDATA?  And yet it happens.
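
For illustration only -- this is not pgBackRest's actual manifest layout,
and the keys are invented -- a JSON rendering might look something like:

    {
      "backup-label": "20190918225500F",
      "files": [
        {"name": "global/pg_control", "size": 8192, "checksum": "..."},
        {"name": "base/1/odd=name",   "size": 1024, "checksum": "..."}
      ]
    }

The '=' needs no special treatment, and a quote or backslash in a file
name is handled by the standard JSON escaping rules rather than by
anything we have to invent.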

> Once we invent the concept of a backup manifest, what do we need to do
> with them? I think we'd want three things initially:
>
> (1) When taking a backup, have the option (perhaps enabled by default)
> to include a backup manifest.

Manifests are cheap to build, so I wouldn't make it an option.

> (2) Given an existing backup that has not got a manifest, construct one.

Might be too late to be trusted and we'd have to write extra code for
it.  I'd leave this for a project down the road, if at all.

> (3) Cross-check a manifest against a backup and complain about extra
> files, missing files, size differences, or checksum mismatches.

Verification is the best part of the manifest.  Plus, you can do
verification pretty cheaply on restore.  We also restore pg_control last
so clusters that have a restore error won't start.

> One thing I'm not quite sure about is where to store the backup
> manifest. If you take a base backup in tar format, you get base.tar,
> pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
> Does the backup manifest go into base.tar? Get written into a separate
> file outside of any tar archive? Something else? And what about a
> plain-format backup? I suppose then we should just write the manifest
> into the top level of the main data directory, but perhaps someone has
> another idea.

We do:

[backup_label]/
    backup.manifest
    pg_data/
    pg_tblspc/

In general, having the manifest easily accessible is ideal.

Regards,
--
-David
[hidden email]



Re: backup manifests

Robert Haas
On Wed, Sep 18, 2019 at 9:11 PM David Steele <[hidden email]> wrote:
> Also consider adding the timestamp.

Sounds reasonable, even if only for the benefit of humans who might
look at the file.  We can decide later whether to use it for anything
else (and third-party tools could make different decisions from core).
I assume we're talking about file mtime here, not file ctime or file
atime or the time the manifest was generated, but let me know if I'm
wrong.

> Consider adding a reference to each file that specifies where the file
> can be found if it is not in this backup.  As I understand the
> pg_basebackup proposal, it would only be implementing differential
> backups, i.e. an incremental that is *only* based on the last full
> backup.  So, the reference can be inferred in this case.  However, if
> the user selects the wrong full backup on restore, and we have labeled
> each backup, then a differential restore with references against the
> wrong full backup would result in a hard error rather than corruption.

I intend that we should be able to support incremental backups based
either on a previous full backup or based on a previous incremental
backup. I am not aware of a technical reason why we need to identify
the specific backup that must be used. If incremental backup B is
taken based on a pre-existing backup A, then I think that B can be
restored using either A or *any other backup taken after A and before
B*. In the normal case, there probably wouldn't be any such backup,
but AFAICS the start-LSNs are a sufficient cross-check that the chosen
base backup is legal.

> Based on my original calculations (which sadly I don't have anymore),
> the combination of SHA1, size, and file name is *extremely* unlikely to
> generate a collision.  As in, unlikely to happen before the end of the
> universe kind of unlikely.  Though, I guess it depends on your
> expectations for the lifetime of the universe.

Somebody once said that we should be prepared for it to end at any
time, or not, and that the time at which it actually was due to end
would not be disclosed in advance. This is probably good life advice
which I ought to take more frequently than I do, but I think we can
finesse the issue for purposes of this discussion. What I'd say is: if
the probability of getting a collision is demonstrably many orders of
magnitude less than the probability of the disk writing the block
incorrectly, then I think we're probably reasonably OK. Somebody might
differ, which is perhaps a mild point in favor of LSN-based
approaches, but as a practical matter, if a bad block is a billion
times more likely to be the result of a disk error than a checksum
mismatch, then it's a negligible risk.
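
To put rough numbers on that (assuming the checksum behaves like a random
function over the file contents): if a file is corrupted, an n-bit
checksum will still happen to match with probability about 2^-n, which
even for SHA-224 is around 10^-68.  Uncorrectable disk read errors, by
contrast, are typically specified at something like one per 10^14 to
10^16 bits, so the checksum is nowhere near the weakest link.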

> And maybe a few other bits of metadata, but I'm not sure
> > exactly what.  Ideas?
>
> A backup label for sure.  You can also use this as the directory/tar
> name to save the user coming up with one.  We use YYYYMMDDHH24MMSSF for
> full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
> incrementals and have logic to prevent two backups from having the same
> label.  This is unlikely outside of testing but still a good idea.
>
> Knowing the start/stop time of the backup is useful in all kinds of
> ways, especially monitoring and time-targeted PITR.  Start/stop LSN is
> also good.  I know this is also in backup_label but having it all in one
> place is nice.
>
> We include the version/sysid of the cluster to avoid mixups.  It's a
> great extra check on top of references to be sure everything is kosher.

I don't think it's a good idea to duplicate the information that's
already in the backup_label. Storing two copies of the same
information is just an invitation to having to worry about what
happens if they don't agree.

> A manifest version is good in case we change the format later.

Yeah.

> I'd
> recommend JSON for the format since it is so ubiquitous and easily
> handles escaping which can be gotchas in a home-grown format.  We
> currently have a format that is a combination of Windows INI and JSON
> (for human-readability in theory) and we have become painfully aware of
> escaping issues.  Really, why would you drop files with '=' in their
> name in PGDATA?  And yet it happens.

I am not crazy about JSON because it requires that I get a json parser
into src/common, which I could do, but given the possibly-imminent end
of the universe, I'm not sure it's the greatest use of time. You're
right that if we pick an ad-hoc format, we've got to worry about
escaping, which isn't lovely.

> > (1) When taking a backup, have the option (perhaps enabled by default)
> > to include a backup manifest.
>
> Manifests are cheap to build, so I wouldn't make it an option.

Huh. That's an interesting idea. Thanks.

> > (3) Cross-check a manifest against a backup and complain about extra
> > files, missing files, size differences, or checksum mismatches.
>
> Verification is the best part of the manifest.  Plus, you can do
> verification pretty cheaply on restore.  We also restore pg_control last
> so clusters that have a restore error won't start.

There's no "restore" operation here, really. A backup taken by
pg_basebackup can be "restored" by copying the whole thing, but it can
also be used just where it is. If we were going to build something
into some in-core tool to copy backups around, this would be a smart
way to implement said tool, but I'm not planning on that myself.

> > One thing I'm not quite sure about is where to store the backup
> > manifest. If you take a base backup in tar format, you get base.tar,
> > pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
> > Does the backup manifest go into base.tar? Get written into a separate
> > file outside of any tar archive? Something else? And what about a
> > plain-format backup? I suppose then we should just write the manifest
> > into the top level of the main data directory, but perhaps someone has
> > another idea.
>
> We do:
>
> [backup_label]/
>     backup.manifest
>     pg_data/
>     pg_tblspc/
>
> In general, having the manifest easily accessible is ideal.

That's a fine choice for a tool, but I'm talking about something
that is part of the actual backup format supported by PostgreSQL, not
what a tool might wrap around it. The choice is whether, for a
tar-format backup, the manifest goes inside a tar file or as a
separate file. To put that another way, a patch adding backup
manifests does not get to redesign where pg_basebackup puts anything
else; it only gets to decide where to put the manifest.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

Robert Haas
On Thu, Sep 19, 2019 at 9:51 AM Robert Haas <[hidden email]> wrote:
> I intend that we should be able to support incremental backups based
> either on a previous full backup or based on a previous incremental
> backup. I am not aware of a technical reason why we need to identify
> the specific backup that must be used. If incremental backup B is
> taken based on a pre-existing backup A, then I think that B can be
> restored using either A or *any other backup taken after A and before
> B*. In the normal case, there probably wouldn't be any such backup,
> but AFAICS the start-LSNs are a sufficient cross-check that the chosen
> base backup is legal.

Scratch that: there can be overlapping backups, so you have to
cross-check both start and stop LSNs.
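
A hypothetical rendering of that check (my reading of the rule above, not
code from any patch): the incremental records the start LSN of the backup
it was taken against, and a candidate base backup qualifies only if it
started at or after that LSN and finished before the incremental began.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */

    static bool
    base_backup_is_usable(XLogRecPtr base_start_lsn, XLogRecPtr base_stop_lsn,
                          XLogRecPtr incr_threshold_lsn,
                          XLogRecPtr incr_start_lsn)
    {
        return base_start_lsn >= incr_threshold_lsn &&
               base_stop_lsn <= incr_start_lsn;
    }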

> > > (3) Cross-check a manifest against a backup and complain about extra
> > > files, missing files, size differences, or checksum mismatches.
> >
> > Verification is the best part of the manifest.  Plus, you can do
> > verification pretty cheaply on restore.  We also restore pg_control last
> > so clusters that have a restore error won't start.
>
> There's no "restore" operation here, really. A backup taken by
> pg_basebackup can be "restored" by copying the whole thing, but it can
> also be used just where it is. If we were going to build something
> into some in-core tool to copy backups around, this would be a smart
> way to implement said tool, but I'm not planning on that myself.

Scratch that: incremental backups need a restore tool, so we can use
this technique there. And it can work for full backups too, because
why not?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

David Steele
In reply to this post by Robert Haas
Hi Robert,

On 9/19/19 9:51 AM, Robert Haas wrote:
> On Wed, Sep 18, 2019 at 9:11 PM David Steele <[hidden email]> wrote:
>> Also consider adding the timestamp.
>
> Sounds reasonable, even if only for the benefit of humans who might
> look at the file.  We can decide later whether to use it for anything
> else (and third-party tools could make different decisions from core).
> I assume we're talking about file mtime here, not file ctime or file
> atime or the time the manifest was generated, but let me know if I'm
> wrong.

In my experience only mtime is useful.

>> Based on my original calculations (which sadly I don't have anymore),
>> the combination of SHA1, size, and file name is *extremely* unlikely to
>> generate a collision.  As in, unlikely to happen before the end of the
>> universe kind of unlikely.  Though, I guess it depends on your
>> expectations for the lifetime of the universe.

> What I'd say is: if
> the probability of getting a collision is demonstrably many orders of
> magnitude less than the probability of the disk writing the block
> incorrectly, then I think we're probably reasonably OK. Somebody might
> differ, which is perhaps a mild point in favor of LSN-based
> approaches, but as a practical matter, if a bad block is a billion
> times more likely to be the result of a disk error than a checksum
> mismatch, then it's a negligible risk.

Agreed.

>> We include the version/sysid of the cluster to avoid mixups.  It's a
>> great extra check on top of references to be sure everything is kosher.
>
> I don't think it's a good idea to duplicate the information that's
> already in the backup_label. Storing two copies of the same
> information is just an invitation to having to worry about what
> happens if they don't agree.

OK, but now we have backup_label, tablespace_map,
XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.backup (in the WAL) and now perhaps a
backup.manifest file.  I feel like we may be drowning in backup info files.

>> I'd
>> recommend JSON for the format since it is so ubiquitous and easily
>> handles escaping which can be gotchas in a home-grown format.  We
>> currently have a format that is a combination of Windows INI and JSON
>> (for human-readability in theory) and we have become painfully aware of
>> escaping issues.  Really, why would you drop files with '=' in their
>> name in PGDATA?  And yet it happens.
>
> I am not crazy about JSON because it requires that I get a json parser
> into src/common, which I could do, but given the possibly-imminent end
> of the universe, I'm not sure it's the greatest use of time. You're
> right that if we pick an ad-hoc format, we've got to worry about
> escaping, which isn't lovely.

My experience is that JSON is simple to implement and has already dealt
with escaping and data structure considerations.  A home-grown solution
will be at least as complex but have the disadvantage of being non-standard.

>>> One thing I'm not quite sure about is where to store the backup
>>> manifest. If you take a base backup in tar format, you get base.tar,
>>> pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
>>> Does the backup manifest go into base.tar? Get written into a separate
>>> file outside of any tar archive? Something else? And what about a
>>> plain-format backup? I suppose then we should just write the manifest
>>> into the top level of the main data directory, but perhaps someone has
>>> another idea.
>>
>> We do:
>>
>> [backup_label]/
>>      backup.manifest
>>      pg_data/
>>      pg_tblspc/
>>
>> In general, having the manifest easily accessible is ideal.
>
> That's a fine choice for a tool, but I'm talking about something
> that is part of the actual backup format supported by PostgreSQL, not
> what a tool might wrap around it. The choice is whether, for a
> tar-format backup, the manifest goes inside a tar file or as a
> separate file. To put that another way, a patch adding backup
> manifests does not get to redesign where pg_basebackup puts anything
> else; it only gets to decide where to put the manifest.

Fair enough.  The point is to make the manifest easily accessible.

I'd keep it in the data directory for file-based backups and as a
separate file for tar-based backups.  The advantage here is that we can
pick a file name that becomes reserved, which a tool can't do.

Regards,
--
-David
[hidden email]



Re: backup manifests

David Steele
In reply to this post by Robert Haas
On 9/19/19 11:00 AM, Robert Haas wrote:

> On Thu, Sep 19, 2019 at 9:51 AM Robert Haas <[hidden email]> wrote:
>> I intend that we should be able to support incremental backups based
>> either on a previous full backup or based on a previous incremental
>> backup. I am not aware of a technical reason why we need to identify
>> the specific backup that must be used. If incremental backup B is
>> taken based on a pre-existing backup A, then I think that B can be
>> restored using either A or *any other backup taken after A and before
>> B*. In the normal case, there probably wouldn't be any such backup,
>> but AFAICS the start-LSNs are a sufficient cross-check that the chosen
>> base backup is legal.
>
> Scratch that: there can be overlapping backups, so you have to
> cross-check both start and stop LSNs.

Overall we have found it's much simpler to label each backup and
cross-check that against the pg version and system id.  Start LSN is
pretty unique, but backup labels work really well and are more widely
understood.

>>>> (3) Cross-check a manifest against a backup and complain about extra
>>>> files, missing files, size differences, or checksum mismatches.
>>>
>>> Verification is the best part of the manifest.  Plus, you can do
>>> verification pretty cheaply on restore.  We also restore pg_control last
>>> so clusters that have a restore error won't start.
>>
>> There's no "restore" operation here, really. A backup taken by
>> pg_basebackup can be "restored" by copying the whole thing, but it can
>> also be used just where it is. If we were going to build something
>> into some in-core tool to copy backups around, this would be a smart
>> way to implement said tool, but I'm not planning on that myself.
>
> Scratch that: incremental backups need a restore tool, so we can use
> this technique there. And it can work for full backups too, because
> why not?

Agreed, once we have a restore tool, use it for everything.

--
-David
[hidden email]



Re: backup manifests

Michael Paquier-2
On Thu, Sep 19, 2019 at 11:10:46PM -0400, David Steele wrote:

> On 9/19/19 11:00 AM, Robert Haas wrote:
>> On Thu, Sep 19, 2019 at 9:51 AM Robert Haas <[hidden email]> wrote:
>> > I intend that we should be able to support incremental backups based
>> > either on a previous full backup or based on a previous incremental
>> > backup. I am not aware of a technical reason why we need to identify
>> > the specific backup that must be used. If incremental backup B is
>> > taken based on a pre-existing backup A, then I think that B can be
>> > restored using either A or *any other backup taken after A and before
>> > B*. In the normal case, there probably wouldn't be any such backup,
>> > but AFAICS the start-LSNs are a sufficient cross-check that the chosen
>> > base backup is legal.
>>
>> Scratch that: there can be overlapping backups, so you have to
>> cross-check both start and stop LSNs.
>
> Overall we have found it's much simpler to label each backup and cross-check
> that against the pg version and system id.  Start LSN is pretty unique, but
> backup labels work really well and are more widely understood.

Warning.  The start LSN could be the same for multiple backups when
taken from a standby.
--
Michael


Re: backup manifests

Robert Haas
In reply to this post by David Steele
On Thu, Sep 19, 2019 at 11:10 PM David Steele <[hidden email]> wrote:
> Overall we have found it's much simpler to label each backup and
> cross-check that against the pg version and system id.  Start LSN is
> pretty unique, but backup labels work really well and are more widely
> understood.

I see your point, but part of my point is that uniqueness is not a
technical requirement. However, it may be a requirement for user
comprehension.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

Robert Haas
In reply to this post by David Steele
On Thu, Sep 19, 2019 at 11:06 PM David Steele <[hidden email]> wrote:
> > I am not crazy about JSON because it requires that I get a json parser
> > into src/common, which I could do, but given the possibly-imminent end
> > of the universe, I'm not sure it's the greatest use of time. You're
> > right that if we pick an ad-hoc format, we've got to worry about
> > escaping, which isn't lovely.
>
> My experience is that JSON is simple to implement and has already dealt
> with escaping and data structure considerations.  A home-grown solution
> will be at least as complex but have the disadvantage of being non-standard.

I think that's fair and just spent a little while investigating how
difficult it would be to disentangle the JSON parser from the backend.
It has dependencies on the following bits of backend-only
functionality:

- check_stack_depth(). No problem, I think.  Just skip it for frontend code.

- pg_mblen() / GetDatabaseEncoding(). Not sure what to do about this.
Some of our infrastructure for dealing with encoding is available in
the frontend and backend, but this part is backend-only.

- elog() / ereport(). Kind of a pain. We could just kill the program
if an error occurs, but that seems a bit ham-fisted. Refactoring the
code so that the error is returned rather than thrown might be the way
to go, but it's not simple, because you're not just passing a string.

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
             errmsg("invalid input syntax for type %s", "json"),
             errdetail("Character with value 0x%02x must be escaped.",
                       (unsigned char) *s),
             report_json_context(lex)));

- appendStringInfo et. al. I don't think it would be that hard to move
this to src/common, but I'm also not sure it really solves the
problem, because StringInfo has a 1GB limit, and there's no rule at
all that a backup manifest has got to be less than 1GB.

https://www.pgcon.org/2013/schedule/events/595.en.html

This gets at another problem that I just started to think about. If
the file is just a series of lines, you can parse it one line and a
time and do something with that line, then move on. If it's a JSON
blob, you have to parse the whole file and get a potentially giant
data structure back, and then operate on that data structure. At
least, I think you do. There's probably some way to create a callback
structure that lets you presuppose that the toplevel data structure is
an array (or object) and get back each element of that array (or
key/value pair) as it's parsed, but that sounds pretty annoying to get
working. Or we could just decide that you have to have enough memory
to hold the parsed version of the entire manifest file in memory all
at once, and if you don't, maybe you should drop some tables or buy
more RAM. That still leaves you with bypassing the 1GB size limit on
StringInfo, maybe by having a "huge" option, or perhaps by
memory-mapping the file and then making the StringInfo point directly
into the mapped region. Perhaps I'm overthinking this and maybe you
have a simpler idea in mind about how it can be made to work, but I
find all this complexity pretty unappealing.

Here's a competing proposal: let's decide that lines consist of
tab-separated fields. If a field contains a \t, \r, or \n, put a " at
the beginning, a " at the end, and double any " that appears in the
middle. This is easy to generate and easy to parse. It lets us
completely ignore encoding considerations. Incremental parsing is
straightforward. Quoting will rarely be needed because there's very
little reason to create a file inside a PostgreSQL data directory that
contains a tab or a newline, but if you do it'll still work.  The lack
of quoting is nice for humans reading the manifest, and nice in terms
of keeping the manifest succinct; in contrast, note that using JSON
doubles every backslash.
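
A minimal sketch of that quoting rule as stated (StringInfo is used here
purely for brevity; as discussed above, it is not yet available to
frontend code):

    #include "postgres.h"
    #include "lib/stringinfo.h"

    /*
     * Append one manifest field to buf: quote only if the field contains
     * a tab, carriage return, or newline, doubling any embedded quotes.
     */
    static void
    append_manifest_field(StringInfo buf, const char *field)
    {
        const char *p;

        if (strpbrk(field, "\t\r\n") == NULL)
        {
            appendStringInfoString(buf, field);
            return;
        }

        appendStringInfoChar(buf, '"');
        for (p = field; *p != '\0'; p++)
        {
            if (*p == '"')
                appendStringInfoChar(buf, '"');   /* double embedded quotes */
            appendStringInfoChar(buf, *p);
        }
        appendStringInfoChar(buf, '"');
    }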

I hear you saying that this is going to end up being just as complex
in the end, but I don't think I believe it.  It sounds to me like the
difference between spending a couple of hours figuring this out and
spending a couple of months trying to figure it out and maybe not
actually getting anywhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

Robert Haas
In reply to this post by David Steele
On Thu, Sep 19, 2019 at 11:06 PM David Steele <[hidden email]> wrote:
> > I don't think it's a good idea to duplicate the information that's
> > already in the backup_label. Storing two copies of the same
> > information is just an invitation to having to worry about what
> > happens if they don't agree.
>
> OK, but now we have backup_label, tablespace_map,
> XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.backup (in the WAL) and now perhaps a
> backup.manifest file.  I feel like we may be drowning in backup info files.

I agree!

I'm not sure what to do about it, though.  The information that is
present in the tablespace_map file could have been stored in the
backup_label file, I think, and that would have made sense, because
both files are serving a very similar purpose: they tell the server
that it needs to do some non-standard stuff when it starts up, and
they give it instructions for what those things are. And, as a
secondary purpose, humans or third-party tools can read them and use
that information for whatever purpose they wish.

The proposed backup_manifest file is a little different. I don't think
that anyone is proposing that the server should read that file: it is
there solely for the purpose of helping our own tools or third-party
tools or human beings who are, uh, acting like tools.[1] We're also
proposing to put it in a different place: the backup_label goes into
one of the tar files, but the backup_manifest would sit outside of any
tar file.

If we were designing this from scratch, maybe we'd roll all of this
into one file that serves as backup manifest, tablespace map, backup
label, and backup history file, but then again, maybe separating the
instructions-to-the-server part from the backup-integrity-checking
part makes sense.  At any rate, even if we knew for sure that's the
direction we wanted to go, getting there from here looks a bit rough.
If we just add a backup manifest, people who don't care can mostly
ignore it and then should be mostly fine. If we start trying to create
the one backup information system to rule them all, we're going to
break people's tools. Maybe that's worth doing someday, but the paint
isn't even dry on removing recovery.conf yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] There are a surprising number of installations where, in effect,
the DBA is the backup-and-restore tool, performing all the steps by
hand and hoping not to mess any of them up. The fact that nearly every
PostgreSQL company offers tools to make this easier does not seem to
have done a whole lot to diminish the number of people using ad-hoc
solutions.



Re: backup manifests

Robert Haas
In reply to this post by Robert Haas
On Fri, Sep 20, 2019 at 9:46 AM Robert Haas <[hidden email]> wrote:
> - appendStringInfo et. al. I don't think it would be that hard to move
> this to src/common, but I'm also not sure it really solves the
> problem, because StringInfo has a 1GB limit, and there's no rule at
> all that a backup manifest has got to be less than 1GB.

Hmm.  That's actually going to be a problem on the server side, no
matter what we do on the client side.  We have to send the manifest
after we send everything else, so that we know what we sent. But if we
sent a lot of files, the manifest might be really huge. I had been
thinking that we would generate the manifest on the server and send it
to the client after everything else, but maybe this is an argument for
generating the manifest on the client side and writing it
incrementally. That would require the client to peek at the contents
of every tar file it receives all the time, which it currently doesn't
need to do, but it does peek inside them a little bit, so maybe it's
OK.

Another alternative would be to have the server spill the manifest in
progress to a temp file and then stream it from there to the client.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

David Steele
In reply to this post by Robert Haas
On 9/20/19 9:46 AM, Robert Haas wrote:

> On Thu, Sep 19, 2019 at 11:06 PM David Steele <[hidden email]> wrote:
>
>> My experience is that JSON is simple to implement and has already dealt
>> with escaping and data structure considerations.  A home-grown solution
>> will be at least as complex but have the disadvantage of being non-standard.
>
> I think that's fair and just spent a little while investigating how
> difficult it would be to disentangle the JSON parser from the backend.
> It has dependencies on the following bits of backend-only
> functionality:

> - elog() / ereport(). Kind of a pain. We could just kill the program
> if an error occurs, but that seems a bit ham-fisted. Refactoring the
> code so that the error is returned rather than thrown might be the way
> to go, but it's not simple, because you're not just passing a string.

Seems to me we are overdue for elog()/ereport() compatible
error-handling in the front end.  Plus mem contexts.

It sucks to make that a prereq for this project but the longer we kick
that can down the road...

> https://www.pgcon.org/2013/schedule/events/595.en.html

This talk was good fun.  The largest number of tables we've seen is a
few hundred thousand, but that still adds up to more than a million
files to backup.

> This gets at another problem that I just started to think about. If
> the file is just a series of lines, you can parse it one line and a
> time and do something with that line, then move on. If it's a JSON
> blob, you have to parse the whole file and get a potentially giant
> data structure back, and then operate on that data structure. At
> least, I think you do.

JSON can definitely be parsed incrementally, but for practical reasons
certain structures work better than others.

> There's probably some way to create a callback
> structure that lets you presuppose that the toplevel data structure is
> an array (or object) and get back each element of that array (or
> key/value pair) as it's parsed, but that sounds pretty annoying to get
> working.

And that's how we do it.  It's annoying and yeah it's complicated but it
is very fast and memory-efficient.
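
To give a feel for the shape of such an interface -- this is an invented
sketch, not pgBackRest's actual API and not anything in PostgreSQL -- the
caller supplies callbacks and the parser invokes them as values complete,
so no more than one value ever has to be held in memory at a time:

    #include <stddef.h>

    /* Hypothetical incremental JSON parser interface. */
    typedef struct JsonParseCallbacks
    {
        void    (*object_start)(void *state);
        void    (*object_field)(void *state, const char *name);
        void    (*scalar)(void *state, const char *value);
        void    (*object_end)(void *state);
        void    (*array_start)(void *state);
        void    (*array_end)(void *state);
    } JsonParseCallbacks;

    /* Feed the document in chunks; returns nonzero on a syntax error. */
    extern int parse_json_chunk(const char *chunk, size_t len,
                                const JsonParseCallbacks *cb, void *state);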

> Or we could just decide that you have to have enough memory
> to hold the parsed version of the entire manifest file in memory all
> at once, and if you don't, maybe you should drop some tables or buy
> more RAM.

I assume you meant "un-parsed" here?

> That still leaves you with bypassing the 1GB size limit on
> StringInfo, maybe by having a "huge" option, or perhaps by
> memory-mapping the file and then making the StringInfo point directly
> into the mapped region. Perhaps I'm overthinking this and maybe you
> have a simpler idea in mind about how it can be made to work, but I
> find all this complexity pretty unappealing.

Our String object has the same 1GB limit.  Partly because it works and
saves a bit of memory per object, but also because if we find ourselves
exceeding that limit we know we've probably made a design error.

Parsing in stream means that you only need to store the final in-memory
representation of the manifest which can be much more compact.  Yeah,
it's complicated, but the memory and time savings are worth it.

Note that our Perl implementation took the naive approach and has worked
pretty well for six years, but can choke on really large manifests with
out of memory errors.  Overall, I'd say getting the format right is more
important than having the perfect initial implementation.

> Here's a competing proposal: let's decide that lines consist of
> tab-separated fields. If a field contains a \t, \r, or \n, put a " at
> the beginning, a " at the end, and double any " that appears in the
> middle. This is easy to generate and easy to parse. It lets us
> completely ignore encoding considerations. Incremental parsing is
> straightforward. Quoting will rarely be needed because there's very
> little reason to create a file inside a PostgreSQL data directory that
> contains a tab or a newline, but if you do it'll still work.  The lack
> of quoting is nice for humans reading the manifest, and nice in terms
> of keeping the manifest succinct; in contrast, note that using JSON
> doubles every backslash.

There's other information you'll want to store that is not strictly file
info so you need a way to denote that.  It gets complicated quickly.

> I hear you saying that this is going to end up being just as complex
> in the end, but I don't think I believe it.  It sounds to me like the
> difference between spending a couple of hours figuring this out and
> spending a couple of months trying to figure it out and maybe not
> actually getting anywhere.

Maybe the initial implementation will be easier but I am confident we'll
pay for it down the road.  Also, don't we want users to be able to read
this file?  Do we really want them to need to cook up a custom parser in
Perl, Go, Python, etc.?

--
-David
[hidden email]



Re: backup manifests

David Steele
In reply to this post by Robert Haas
On 9/20/19 10:59 AM, Robert Haas wrote:

> On Fri, Sep 20, 2019 at 9:46 AM Robert Haas <[hidden email]> wrote:
>> - appendStringInfo et. al. I don't think it would be that hard to move
>> this to src/common, but I'm also not sure it really solves the
>> problem, because StringInfo has a 1GB limit, and there's no rule at
>> all that a backup manifest has got to be less than 1GB.
>
> Hmm.  That's actually going to be a problem on the server side, no
> matter what we do on the client side.  We have to send the manifest
> after we send everything else, so that we know what we sent. But if we
> sent a lot of files, the manifest might be really huge. I had been
> thinking that we would generate the manifest on the server and send it
> to the client after everything else, but maybe this is an argument for
> generating the manifest on the client side and writing it
> incrementally. That would require the client to peek at the contents
> of every tar file it receives all the time, which it currently doesn't
> need to do, but it does peek inside them a little bit, so maybe it's
> OK.
>
> Another alternative would be to have the server spill the manifest in
> progress to a temp file and then stream it from there to the client.

This seems reasonable to me.

We keep an in-memory representation which is just an array of structs
and is fairly compact -- 1 million files uses ~150MB of memory.  We just
format and stream this to storage when saving.  Saving is easier than
loading, of course.
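
For a sense of how that adds up (this struct is a guess on my part, not
pgBackRest's actual definition): a fixed-size record plus a modest
allocation for the path lands in roughly the 150-bytes-per-file range
mentioned above.

    #include <stdint.h>
    #include <time.h>

    /* Hypothetical per-file manifest entry. */
    typedef struct ManifestFile
    {
        char       *name;           /* relative path, allocated separately */
        uint64_t    size;           /* file size in bytes */
        time_t      mtime;          /* last modification time */
        char        checksum[65];   /* hex SHA-256 plus terminating NUL */
        char       *reference;      /* prior backup holding the file, or NULL */
    } ManifestFile;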

--
-David
[hidden email]



Re: backup manifests

Chapman Flack
In reply to this post by Robert Haas
On 9/20/19 9:46 AM, Robert Haas wrote:

> least, I think you do. There's probably some way to create a callback
> structure that lets you presuppose that the toplevel data structure is
> an array (or object) and get back each element of that array (or
> key/value pair) as it's parsed,

If a JSON parser does find its way into src/common, it probably wants
to have such an incremental mode available, similar to [2] offered
in the "Jackson" library for Java.

The Jackson developer has propounded a thesis[1] that such a parsing
library ought to offer "Three -- and Only Three" different styles of
API corresponding to three ways of organizing the code using the
library ([2], [3], [4], which also resemble the different APIs
supplied in Java for XML processing).

Regards,
-Chap


[1] http://www.cowtowncoder.com/blog/archives/2009/01/entry_132.html
[2] http://www.cowtowncoder.com/blog/archives/2009/01/entry_137.html
[3] http://www.cowtowncoder.com/blog/archives/2009/01/entry_153.html
[4] http://www.cowtowncoder.com/blog/archives/2009/01/entry_152.html



Re: backup manifests

Robert Haas
In reply to this post by David Steele
On Fri, Sep 20, 2019 at 11:09 AM David Steele <[hidden email]> wrote:
> Seems to me we are overdue for elog()/ereport() compatible
> error-handling in the front end.  Plus mem contexts.
>
> It sucks to make that a prereq for this project but the longer we kick
> that can down the road...

There are no doubt many patches that would benefit from having more
backend infrastructure exposed in frontend contexts, and I think we're
slowly moving in that direction, but I generally do not believe in
burdening feature patches with major infrastructure improvements.
Sometimes it's necessary, as in the case of parallel query, which
required upgrading a whole lot of backend infrastructure in order to
have any chance of doing something useful. In most cases, however,
there's a way of getting the patch done that dodges the problem.

For example, I think there's a pretty good argument that Heikki's
design for relation forks was a bad one. It's proven to scale poorly
and create performance problems and extra complexity in quite a few
places. It would likely have been better, from a strictly theoretical
point of view, to insist on a design where the FSM and VM pages got
stored inside the relation itself, and the heap was responsible for
figuring out how various pages were being used. When BRIN came along,
we insisted on precisely that design, because it was clear that
further straining the relation fork system was not a good plan.
However, if we'd insisted on that when Heikki did the original work,
it might have delayed the arrival of the free space map for one or
more releases, and we got big benefits out of having that done sooner.
There's nothing stopping someone from writing a patch to get rid of
relation forks and allow a heap AM to have multiple relfilenodes (with
the extra ones used for the FSM and VM) or with multiplexing all the
data inside of a single file. Nobody has, though, because it's hard,
and the problems with the status quo are not so bad as to justify the
amount of development effort that would be required to fix it. At some
point, that problem is probably going to work its way to the top of
somebody's priority list, but it's already been about 10 years since
that all happened and everyone has so far dodged dealing with the
problem, which in turn has enabled them to work on other things that
are perhaps more important.

I think the same principle applies here. It's reasonable to ask the
author of a feature patch to fix issues that are closely related to
the feature in question, or even problems that are not new but would
be greatly exacerbated by the addition of the feature. It's not
reasonable to stack up a list of infrastructure upgrades that somebody
has to do as a condition of having a feature patch accepted that does
not necessarily require those upgrades. I am not convinced that JSON
is actually a better format for a backup manifest (more on that
below), but even if I were, I believe that getting a backup manifest
functionality into PostgreSQL 13, and perhaps incremental backup on
top of that, is valuable enough to justify making some compromises to
make that happen. And I don't mean "compromises" as in "let's commit
something that doesn't work very well;" rather, I mean making design
choices that are aimed at making the project something that is
feasible and can be completed in reasonable time, rather than not.

And saying, well, the backup manifest format *has* to be JSON because
everything else suxxor is not that. We don't have a single other
example of a file that we read and write in JSON format. Extension
control files use a custom format. Backup labels and backup history
files and timeline history files and tablespace map files use custom
formats. postgresql.conf, pg_hba.conf, and pg_ident.conf use custom
formats. postmaster.opts and postmaster.pid use custom formats. If
JSON is better and easier, at least one of the various people who
coded those things up would have chosen to use it, but none of them
did, and nobody's made a serious attempt to convert them to use it.
That might be because we lack the infrastructure for dealing with JSON
and building it is more work than anybody's willing to do, or it might
be because JSON is not actually better for these kinds of use cases,
but either way, it's hard to see why this particular patch should be
burdened with a requirement that none of the previous ones had to
satisfy.

Personally, I'd be intensely unhappy if a motion to convert
postgresql.conf or pg_hba.conf to JSON format gathered enough steam to
be adopted.  It would be darn useful, because you could specify
complex values for options instead of being limited to scalars, but it
would also make the configuration files a lot harder for human beings
to read and grep and the quality of error reporting would probably
decline significantly.  Also, appending a setting to the file,
something which is currently quite simple, would get a lot harder.
Ad-hoc file formats can be problematic, but they can also have real
advantages in terms of readability, brevity, and fitness for purpose.

> This talk was good fun.  The largest number of tables we've seen is a
> few hundred thousand, but that still adds up to more than a million
> files to backup.

A quick survey of some of my colleagues turned up a few examples of
people with 2-4 million files to backup, so similar kind of ballpark.
Probably not big enough for the manifest to hit the 1GB mark, but
getting close.
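
Rough arithmetic, just to size the problem: a manifest line holding a
path of a few dozen characters, a 64-character hex SHA-256, a size, and
an mtime runs somewhere around 120-160 bytes, so 4 million files works
out to roughly 500-650 MB of manifest -- uncomfortably close to a 1GB
StringInfo, as noted.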

> > Or we could just decide that you have to have enough memory
> > to hold the parsed version of the entire manifest file in memory all
> > at once, and if you don't, maybe you should drop some tables or buy
> > more RAM.
>
> I assume you meant "un-parsed" here?

I don't think I meant that, although it seems like you might need to
store either all the parsed data or all the unparsed data or even
both, depending on exactly what you are trying to do.

> > I hear you saying that this is going to end up being just as complex
> > in the end, but I don't think I believe it.  It sounds to me like the
> > difference between spending a couple of hours figuring this out and
> > spending a couple of months trying to figure it out and maybe not
> > actually getting anywhere.
>
> Maybe the initial implementation will be easier but I am confident we'll
> pay for it down the road.  Also, don't we want users to be able to read
> this file?  Do we really want them to need to cook up a custom parser in
> Perl, Go, Python, etc.?

Well, I haven't heard anybody complain that they can't read a
backup_label file because it's too hard to cook up a parser.  And I
think the reason is pretty clear: such files are not hard to parse.
Similarly for a pg_hba.conf file.  This case is a little more
complicated than those, but AFAICS, not enormously so. Actually, it
seems like a combination of those two cases: it has some fixed
metadata fields that can be represented with one line per field, like
a backup_label, and then a bunch of entries for files that are
somewhat like entries in a pg_hba.conf file, in that they can be
represented by a line per record with a certain number of fields on
each line.

I attach here a couple of patches.  The first one does some
refactoring of relevant code in pg_basebackup, and the second one adds
checksum manifests using a format that I pulled out of my ear. It
probably needs some adjustment but I don't think it's crazy.  Each
file gets a line that looks like this:

File $FILENAME $FILESIZE $FILEMTIME $FILECHECKSUM

Right now, the file checksums are computed using SHA-256 but it could
be changed to anything else for which we've got code. On my system,
shasum -a256 $FILE produces the same answer that shows up here.  At
the bottom of the manifest there's a checksum of the manifest itself,
which looks like this:

Manifest-Checksum
385fe156a8c6306db40937d59f46027cc079350ecf5221027d71367675c5f781

That's a SHA-256 checksum of the file contents excluding the final
line. It can be verified by feeding all the file contents except the
last line to shasum -a256. I can't help but observe that if the file
were defined to be a JSONB blob, it's not very clear how you would
include a checksum of the blob contents in the blob itself, but with a
format based on a bunch of lines of data, it's super-easy to generate
and super-easy to write tools that verify it.

This is just a prototype so I haven't written a verification tool, and
there's a bunch of testing and documentation and so forth that would
need to be done aside from whatever we've got to hammer out in terms
of design issues and file formats.  But I think it's cool, and perhaps
some discussion of how it could be evolved will get us closer to a
resolution everybody can at least live with.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0002-POC-of-backup-manifest-with-file-names-sizes-timesta.patch (28K)
0001-Refactor-some-pg_basebackup-code.patch (47K)

Re: backup manifests

David Steele
On 9/20/19 2:55 PM, Robert Haas wrote:
> On Fri, Sep 20, 2019 at 11:09 AM David Steele <[hidden email]> wrote:
>>
>> It sucks to make that a prereq for this project but the longer we kick
>> that can down the road...
>
> There are no doubt many patches that would benefit from having more
> backend infrastructure exposed in frontend contexts, and I think we're
> slowly moving in that direction, but I generally do not believe in
> burdening feature patches with major infrastructure improvements.

The hardest part about technical debt is knowing when to incur it.  It
is never a cut-and-dried choice.

>> This talk was good fun.  The largest number of tables we've seen is a
>> few hundred thousand, but that still adds up to more than a million
>> files to backup.
>
> A quick survey of some of my colleagues turned up a few examples of
> people with 2-4 million files to backup, so similar kind of ballpark.
> Probably not big enough for the manifest to hit the 1GB mark, but
> getting close.

I have so many doubts about clusters with this many tables, but we do
support it, so...

>>> I hear you saying that this is going to end up being just as complex
>>> in the end, but I don't think I believe it.  It sounds to me like the
>>> difference between spending a couple of hours figuring this out and
>>> spending a couple of months trying to figure it out and maybe not
>>> actually getting anywhere.
>>
>> Maybe the initial implementation will be easier but I am confident we'll
>> pay for it down the road.  Also, don't we want users to be able to read
>> this file?  Do we really want them to need to cook up a custom parser in
>> Perl, Go, Python, etc.?
>
> Well, I haven't heard anybody complain that they can't read a
> backup_label file because it's too hard to cook up a parser.  And I
> think the reason is pretty clear: such files are not hard to parse.
> Similarly for a pg_hba.conf file.  This case is a little more
> complicated than those, but AFAICS, not enormously so. Actually, it
> seems like a combination of those two cases: it has some fixed
> metadata fields that can be represented with one line per field, like
> a backup_label, and then a bunch of entries for files that are
> somewhat like entries in a pg_hba.conf file, in that they can be
> represented by a line per record with a certain number of fields on
> each line.

Yeah, they are not hard to parse, but *everyone* has to cook up code for
it.  A bit of a bummer, that.
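
Just to make the discussion concrete, here's roughly what everyone would
be cooking up, sketched in Python (the header handling and the
assumption that file names contain no spaces are mine, not anything
decided):

def parse_manifest(text):
    header, files = {}, []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("File "):
            # File <name> <size> <mtime> <checksum>; the mtime itself has
            # spaces ("2019-09-20 14:55:07 GMT"), so peel the checksum off
            # the right and the name and size off the left.
            rest, checksum = line[5:].rsplit(" ", 1)
            name, size, mtime = rest.split(" ", 2)
            files.append((name, int(size), mtime, checksum))
        else:
            # Header-style lines, including the trailing Manifest-Checksum.
            key, _, value = line.partition(" ")
            header[key] = value
    return header, files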

> I attach here a couple of patches.  The first one does some
> refactoring of relevant code in pg_basebackup, and the second one adds
> checksum manifests using a format that I pulled out of my ear. It
> probably needs some adjustment but I don't think it's crazy.  Each
> file gets a line that looks like this:
>
> File $FILENAME $FILESIZE $FILEMTIME $FILECHECKSUM

We also include page checksum validation failures in the file record.
Not critical for the first pass, perhaps, but something to keep in mind.

> Right now, the file checksums are computed using SHA-256 but it could
> be changed to anything else for which we've got code. On my system,
> shasum -a256 $FILE produces the same answer that shows up here.  At
> the bottom of the manifest there's a checksum of the manifest itself,
> which looks like this:
>
> Manifest-Checksum 385fe156a8c6306db40937d59f46027cc079350ecf5221027d71367675c5f781
>
> That's a SHA-256 checksum of the file contents excluding the final
> line. It can be verified by feeding all the file contents except the
> last line to shasum -a256. I can't help but observe that if the file
> were defined to be a JSONB blob, it's not very clear how you would
> include a checksum of the blob contents in the blob itself, but with a
> format based on a bunch of lines of data, it's super-easy to generate
> and super-easy to write tools that verify it.

You can do this in JSON pretty easily by handling the terminating
brace/bracket:

{
<some json contents>*,
"checksum":<sha256>
}
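
A rough sketch of that in Python, just to show the shape of it (the
member name and the exact framing are invented for illustration):

import hashlib, json, re

def dump_with_checksum(doc, path):
    # Serialize everything except the checksum member, then append it as
    # the last member so the digest covers all the bytes written before it.
    # Assumes a non-empty top-level object.
    body = json.dumps(doc, indent=1)
    prefix = body[:-1].rstrip() + ",\n"          # drop the closing brace
    digest = hashlib.sha256(prefix.encode()).hexdigest()
    with open(path, "w") as f:
        f.write(prefix + ' "checksum": "%s"\n}\n' % digest)

def verify_with_checksum(path):
    with open(path) as f:
        text = f.read()
    m = re.search(r'^\s*"checksum": "([0-9a-f]{64})"\n}\n?$', text, re.M)
    if not m:
        return False
    expected = hashlib.sha256(text[:m.start()].encode()).hexdigest()
    return expected == m.group(1)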

But of course a linefeed-delimited file is even easier.

> This is just a prototype so I haven't written a verification tool, and
> there's a bunch of testing and documentation and so forth that would
> need to be done aside from whatever we've got to hammer out in terms
> of design issues and file formats.  But I think it's cool, and perhaps
> some discussion of how it could be evolved will get us closer to a
> resolution everybody can at least live with.

I had a quick look and it seems pretty reasonable.  I'll need to
generate a manifest to see if I can spot any obvious gotchas.

--
-David
[hidden email]



Re: backup manifests

vignesh C
In reply to this post by Robert Haas
On Sat, Sep 21, 2019 at 12:25 AM Robert Haas <[hidden email]> wrote:
>
Some comments:

The manifest file will be in plain text format even if compression is
specified; should we compress it too?
Maybe this is intended; I just raised the point to make sure that it is.
+static void
+ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
+{
+	WriteManifestState *state = callback_data;
+
+	if (fwrite(copybuf, r, 1, state->file) != 1)
+	{
+		pg_log_error("could not write to file \"%s\": %m", state->filename);
+		exit(1);
+	}
+}

The WAL file's .done file gets added, but the WAL file information is not
included in the manifest file; should we include the WAL files as well?
@@ -599,16 +618,20 @@ perform_base_backup(basebackup_options *opt)
  (errcode_for_file_access(),
  errmsg("could not stat file \"%s\": %m", pathbuf)));

- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid, manifest,
+ NULL);

  /* unconditionally mark file as archived */
  StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
+ sendFileWithContent(pathbuf, "", manifest);

Should we add an option to make manifest generation configurable to
reduce overhead during backup?

The manifest file does not include directory information; should we include it?

There is one warning:
In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
                 from pg_basebackup.c:34:
pg_basebackup.c: In function ‘ReceiveTarFile’:
../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
comparison will always evaluate as ‘false’ for the address of ‘buf’
will never be NULL [-Waddress]
  ((str) == NULL || (str)->maxlen == 0)
         ^
pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
   if (PQExpBufferBroken(&buf))

pg_gmtime can fail in case of malloc failure:
+	/*
+	 * Convert time to a string. Since it's not clear what time zone to use
+	 * and since time zone definitions can change, possibly causing confusion,
+	 * use GMT always.
+	 */
+	pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
+				pg_gmtime(&mtime));

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: backup manifests

Jeevan Chalke

Entries for directories are not added to the manifest, so it might be
difficult for the client to learn which directories exist. Would it be
good to add an entry for each directory too? Maybe something like:
Dir    <dirname> <mtime>

Also, the patches do not apply on the latest HEAD.

On Wed, Sep 25, 2019 at 6:17 PM vignesh C <[hidden email]> wrote:
> There is one warning:
> In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
>                  from pg_basebackup.c:34:
> pg_basebackup.c: In function ‘ReceiveTarFile’:
> ../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
> comparison will always evaluate as ‘false’ for the address of ‘buf’
> will never be NULL [-Waddress]
>   ((str) == NULL || (str)->maxlen == 0)
>          ^
> pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
>    if (PQExpBufferBroken(&buf))

Yes, I too observed this warning.
 




--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: backup manifests

Rushabh Lathia
In reply to this post by vignesh C


On Wed, Sep 25, 2019 at 6:17 PM vignesh C <[hidden email]> wrote:
> There is one warning:
> In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
>                  from pg_basebackup.c:34:
> pg_basebackup.c: In function ‘ReceiveTarFile’:
> ../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
> comparison will always evaluate as ‘false’ for the address of ‘buf’
> will never be NULL [-Waddress]
>   ((str) == NULL || (str)->maxlen == 0)
>          ^
> pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
>    if (PQExpBufferBroken(&buf))


I also observed this warning.  PFA a patch to fix the same.

> pg_gmtime can fail in case of malloc failure:
> +	/*
> +	 * Convert time to a string. Since it's not clear what time zone to use
> +	 * and since time zone definitions can change, possibly causing confusion,
> +	 * use GMT always.
> +	 */
> +	pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
> +				pg_gmtime(&mtime));


Fixed that in the attached patch.




Regards.
Rushabh Lathia

0003_warning_review.patch (1K)

Re: backup manifests

Robert Haas
In reply to this post by Jeevan Chalke
On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
<[hidden email]> wrote:
> Entries for directories are not added to the manifest, so it might be
> difficult for the client to learn which directories exist. Would it be
> good to add an entry for each directory too? Maybe something like:
> Dir    <dirname> <mtime>

Well, what kind of corruption would this allow us to detect that we
can't detect as things stand? I think the only case is an empty
directory. If it's not empty, we'd have some entries for the files in
that directory, and those files won't be able to exist unless the
directory does. But, how would we end up backing up an empty
directory, anyway?

I don't really *mind* adding directories into the manifest, but I'm
not sure how much it helps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

