tar-related code in PostgreSQL

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

tar-related code in PostgreSQL

Robert Haas
Hi,

It has come to my attention that PostgreSQL has a bunch of code to
read and write 'tar' archives and it's kind of a mess. Attached are
two patches. The second one was written first, and does some modest
cleanups, most notably replacing the use of the constants 512 and the
formula (x + 511) & ~511 - x in lotsa places with a new constant
TAR_BLOCK_SIZE and a new static line function
tarPaddingBytesRequired(). In developing that patch, I found a bug, so
the first patch (which was written second) fixes it. The problem is
here:

        if (state.is_recovery_guc_supported)
        {
            tarCreateHeader(header, "standby.signal", NULL,
                            0,  /* zero-length file */
                            pg_file_create_mode, 04000, 02000,
                            time(NULL));

            writeTarData(&state, header, sizeof(header));
            writeTarData(&state, zerobuf, 511);
        }

We have similar code in many places -- because evidently nobody
thought it would be a good idea to have all the logic for reading and
writing tarfiles in a centralized location rather than having many
copies of it -- and typically it's written to pad the block out to a
multiple of 512 bytes. But here, the file is 0 bytes long, and then we
add 511 zero bytes. This results in a tarfile whose length is not a
multiple of the TAR block size:

[rhaas ~]$ pg_basebackup -D pgbackup -Ft && ls -l pgbackup
total 80288
-rw-------  1 rhaas  staff    135255 Apr 24 11:04 backup_manifest
-rw-------  1 rhaas  staff  24183296 Apr 24 11:04 base.tar
-rw-------  1 rhaas  staff  16778752 Apr 24 11:04 pg_wal.tar
[rhaas ~]$ rm -rf pgbackup
[rhaas ~]$ pg_basebackup -D pgbackup -Ft -R && ls -l pgbackup
total 80288
-rw-------  1 rhaas  staff    135255 Apr 24 11:04 backup_manifest
-rw-------  1 rhaas  staff  24184319 Apr 24 11:04 base.tar
-rw-------  1 rhaas  staff  16778752 Apr 24 11:04 pg_wal.tar
[rhaas ~]$ perl -e 'print $_ % 512, "\n" for qw(24183296 24184319 16778752);'
0
511
0

That seems bad. At first, I thought maybe the problem was the fact
that we were adding 511 zero bytes here instead of 512, but then I
realized the real problem is that we shouldn't be adding any zero
bytes at all. A zero-byte file is already padded out to a multiple of
512, because 512 * 0 = 0. The problem happens not to have any adverse
consequences in this case because this is the last thing that gets put
into the tar file, and the end-of-tar-file marker is 1024 zero bytes,
so the extra 511 zero bytes we're adding here get interpreted as the
beginning of the end-of-file marker, and the first 513 bytes of what
we intended as the actual end of file marker get interpreted as the
rest of it. Then there are 511 bytes of garbage zeros at the end but
my version of tar, at least, does not care.

However, it's possible to make it blow up pretty good with a slightly
different test case, because there's one case in master where we
insert one extra file -- backup_manifest -- into the tar file AFTER we
insert standby.signal. That case is when we're writing the output to
stdout:

[rhaas ~]$ pg_basebackup -D - -Ft -Xnone -R > everything.tar
NOTICE:  WAL archiving is not enabled; you must ensure that all
required WAL segments are copied through other means to complete the
backup
[rhaas ~]$ tar tf everything.tar | grep manifest
tar: Damaged tar archive
tar: Retrying...
tar: Damaged tar archive
tar: Retrying...
(it repeats this ~100 times and then exits)

If I change the offending writeTarData(&state, zerobuf, 511) to
writeTarData(&state, zerobuf, 512), then the complaint about a damaged
archive goes away, but the backup_manifest file doesn't appear to be
included in the archive, because apparently one spurious 512-byte
block of zeroes is enough to make my version of tar think it's hit the
end of the archive:

[rhaas ~]$ tar tf everything.tar | grep manifest
[rhaas ~]$

If I remove the offending writeTarData(&state, zerobuf, 511) line
entirely - which I believe to the correct fix - then it works as
expected:

[rhaas ~]$ tar tf everything.tar | grep manifest
backup_manifest

This problem appears to have been introduced by commit
2dedf4d9a899b36d1a8ed29be5efbd1b31a8fe85, "Integrate recovery.conf
into postgresql.conf". The code has been substantially modified twice
since then, but those modifications seem to have just preserved the
bug first introduced in that commit.

I tentatively propose to do the following:

1. Commit 0001, which removes the 511 bytes of bogus padding and thus
fixes the bug, to master in the near future.

2. Possibly back-patch 0001 to v12, where the bug first appeared. I'm
not sure this is strictly necessary, because in v12, standby.signal is
always the very last thing in the tarfile, so there shouldn't be an
issue unless somebody has a version of tar that cares about the 511
bytes of trailing garbage. I will do this if people think it's a good
idea; otherwise I'll skip it.

3. Commit 0002, the aforementioned cleanup patch, to master, either
immediately if people are OK with that, or else after we branch. I am
inclined to regard the widespread use of the arbitrary constants 512
and 511 as something of a hazard that ought to be corrected sooner
rather than later, but someone could not-unreasonably take the view
that it's unnecessary tinkering post-feature freeze.

Long term, it might be wiser to either switch to using a real
archiving library rather than a bunch of hand-rolled code, or at the
very least centralize things better so that it's not so easy to make
mistakes of this type. However, I don't see that as a reasonable
argument against either of these patches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0002-Assorted-cleanup-of-tar-related-code.patch (22K) Download Attachment
0001-Fix-bogus-tar-file-padding-logic-for-standby.signal.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: tar-related code in PostgreSQL

Tom Lane-2
Robert Haas <[hidden email]> writes:
> We have similar code in many places -- because evidently nobody
> thought it would be a good idea to have all the logic for reading and
> writing tarfiles in a centralized location rather than having many
> copies of it -- and typically it's written to pad the block out to a
> multiple of 512 bytes. But here, the file is 0 bytes long, and then we
> add 511 zero bytes. This results in a tarfile whose length is not a
> multiple of the TAR block size:

Bleah.  Whether or not the nearest copy of tar happens to spit up on
that, it's a clear violation of the POSIX standard for tar files.
I'd vote for back-patching your 0001.

I'd lean mildly to holding 0002 until after we branch.  It probably
won't break anything, but it probably won't fix anything either.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: tar-related code in PostgreSQL

Robert Haas
On Fri, Apr 24, 2020 at 12:27 PM Tom Lane <[hidden email]> wrote:
> Bleah.  Whether or not the nearest copy of tar happens to spit up on
> that, it's a clear violation of the POSIX standard for tar files.
> I'd vote for back-patching your 0001.

Done.

> I'd lean mildly to holding 0002 until after we branch.  It probably
> won't break anything, but it probably won't fix anything either.

True.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: tar-related code in PostgreSQL

Robert Haas
On Mon, Apr 27, 2020 at 2:07 PM Robert Haas <[hidden email]> wrote:
> > I'd lean mildly to holding 0002 until after we branch.  It probably
> > won't break anything, but it probably won't fix anything either.
>
> True.

Committed now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: tar-related code in PostgreSQL

Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

The patch works perfectly.

The new status of this patch is: Ready for Committer
Reply | Threaded
Open this post in threaded view
|

Re: tar-related code in PostgreSQL

Robert Haas
On Sun, Jun 28, 2020 at 11:24 AM Hamid Akhtar <[hidden email]> wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           not tested
> Documentation:            not tested
>
> The patch works perfectly.
>
> The new status of this patch is: Ready for Committer

Thanks, but this was committed on June 15th, as per my previous email.
Perhaps I forgot to update the CommitFest application....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: tar-related code in PostgreSQL

Daniel Gustafsson
> On 29 Jun 2020, at 13:52, Robert Haas <[hidden email]> wrote:
> On Sun, Jun 28, 2020 at 11:24 AM Hamid Akhtar <[hidden email]> wrote:

>> The new status of this patch is: Ready for Committer
>
> Thanks, but this was committed on June 15th, as per my previous email.
> Perhaps I forgot to update the CommitFest application....

Done now, marked as committed.

cheers ./daniel