Segfault when restoring -Fd dump on current HEAD

classic Classic list List threaded Threaded
31 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Segfault when restoring -Fd dump on current HEAD

hubert depesz lubaczewski-2
Hi,
I did upgrade of my test pg. Part of this is pg_dump -Fd of each
database, then upgrade binaries, then initdb, and pg_restore.

But - I can't restore any database that has any data - I get segfaults.

For example, with gdb:

=$ gdb --args pg_restore -vvvvv -C -Fd backup-20190225074600.10361-db-depesz.dump
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.                              
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from pg_restore...done.            
(gdb) run                                                                                                                                          
Starting program: /home/pgdba/work/bin/pg_restore -vvvvv -C -Fd backup-20190225074600.10361-db-depesz.dump
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".                                                                              
--                                                                                                                                                                  
-- PostgreSQL database dump                                                              
--                                                                              
     
-- Dumped from database version 12devel
-- Dumped by pg_dump version 12devel

-- Started on 2019-02-25 07:46:01 CET

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;

pg_restore: creating DATABASE "depesz"
--
-- TOC entry 2148 (class 1262 OID 16631)
-- Name: depesz; Type: DATABASE; Schema: -; Owner: depesz
--

CREATE DATABASE depesz WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';


ALTER DATABASE depesz OWNER TO depesz;

pg_restore: connecting to new database "depesz"
\connect depesz

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;

pg_restore: creating TABLE "public.test"
SET default_tablespace = '';

SET default_with_oids = false;

--
-- TOC entry 196 (class 1259 OID 16632)
-- Name: test; Type: TABLE; Schema: public; Owner: depesz
--

CREATE TABLE public.test (
    i integer
);


ALTER TABLE public.test OWNER TO depesz;

--
-- TOC entry 2142 (class 0 OID 16632)
-- Dependencies: 196
-- Data for Name: test; Type: TABLE DATA; Schema: public; Owner: depesz
-- File: 2142.dat
--


Program received signal SIGSEGV, Segmentation fault.
0x000055555555d99c in _printTocEntry (AH=AH@entry=0x55555577e350, te=te@entry=0x5555557844a0, isData=isData@entry=true) at pg_backup_archiver.c:3636
3636    pg_backup_archiver.c: No such file or directory.
(gdb) bt
#0  0x000055555555d99c in _printTocEntry (AH=AH@entry=0x55555577e350, te=te@entry=0x5555557844a0, isData=isData@entry=true) at pg_backup_archiver.c:3636
#1  0x000055555555e41d in restore_toc_entry (AH=AH@entry=0x55555577e350, te=te@entry=0x5555557844a0, is_parallel=is_parallel@entry=false) at pg_backup_archiver.c:852
#2  0x000055555555eebb in RestoreArchive (AHX=0x55555577e350) at pg_backup_archiver.c:675
#3  0x000055555555aaab in main (argc=5, argv=<optimized out>) at pg_restore.c:432
(gdb)

if you'd need the dump to investigate - it's available here:
https://www.depesz.com/wp-content/uploads/2019/02/bad.dump.tar.gz

Unfortunately I don't have previous binaries, but I refreshed previously
around a week ago.

Best regards,

depesz


Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Dmitry Dolgov
> On Mon, Feb 25, 2019 at 8:45 AM hubert depesz lubaczewski <[hidden email]> wrote:
>
> I did upgrade of my test pg. Part of this is pg_dump -Fd of each
> database, then upgrade binaries, then initdb, and pg_restore.
>
> But - I can't restore any database that has any data - I get segfaults.

Thank for reporting. Unfortunately, I can't reproduce this issue on the master
(for me it's currently bc09d5e4cc) with the dump you've provided - do I need to
do something more than just pg_restore to trigger it?

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Justin Pryzby
On Mon, Feb 25, 2019 at 11:01:05AM +0100, Dmitry Dolgov wrote:

> > On Mon, Feb 25, 2019 at 8:45 AM hubert depesz lubaczewski <[hidden email]> wrote:
> >
> > I did upgrade of my test pg. Part of this is pg_dump -Fd of each
> > database, then upgrade binaries, then initdb, and pg_restore.
> >
> > But - I can't restore any database that has any data - I get segfaults.
>
> Thank for reporting. Unfortunately, I can't reproduce this issue on the master
> (for me it's currently bc09d5e4cc) with the dump you've provided - do I need to
> do something more than just pg_restore to trigger it?

What's crashing for me is restoring the (12dev) dumpfile using v11 psql:

[pryzbyj@database tmp]$ pg_restore backup-20190225074600.10361-db-depesz.dump >/dev/null
Segmentation fault (core dumped)
[pryzbyj@database tmp]$ pg_restore -V
pg_restore (PostgreSQL) 11.2

I would restore dump into v12dev and re-dump and compare dump output, except it
seems like pg_restore -d no longer restores into a database ??

[pryzbyj@database tmp]$ PGHOST=/tmp PGPORT=5678 PATH=~/src/postgresql.bin/bin pg_restore backup-20190225074600.10361-db-depesz.dump -d postgres |wc -l
44

Justin

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
> What's crashing for me is restoring the (12dev) dumpfile using v11 psql:

Yeah, I can reproduce that here, using either -Fc or -Fd format dumps.
The immediate problem in your example is that the "defn" field of a
TABLE DATA entry is now null where it used to be an empty string.
Poking at related examples suggests that other fields have suffered
the same fate.

It appears to me that f831d4acc required a good deal more adult
supervision than it actually got.  That was alleged to be a small
notational refactoring, not a redefinition of what gets put into
dump files.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

hubert depesz lubaczewski-2
In reply to this post by hubert depesz lubaczewski-2
On Mon, Feb 25, 2019 at 08:45:39AM +0100, hubert depesz lubaczewski wrote:
> Hi,
> I did upgrade of my test pg. Part of this is pg_dump -Fd of each
> database, then upgrade binaries, then initdb, and pg_restore.

Sorry, please disregard this problem.

Error was sitting on a chair.

Best regards,

depesz


Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Michael Paquier-2
In reply to this post by Tom Lane-2
On Mon, Feb 25, 2019 at 11:20:14AM -0500, Tom Lane wrote:
> It appears to me that f831d4acc required a good deal more adult
> supervision than it actually got.  That was alleged to be a small
> notational refactoring, not a redefinition of what gets put into
> dump files.

How much consistent do we need to be for custom dump archives
regarding backward and upward-compatibility?  For dumps, we give no
guarantee that a dump taken with pg_dump on version N will be
compatible with a backend at version (N+1), so there is a effort for
backward compatibility, not really upward compatibility.

It seems to me that f831d4acc should have bumped at least
MAKE_ARCHIVE_VERSION as it changes the dump contents, still it seems
like a lot of for some refactoring?  FWIW, I have gone through the
commit's thread and I actually agree that instead of a mix of empty
strings and NULL, using only NULL is cleaner.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Tom Lane-2
Michael Paquier <[hidden email]> writes:
> On Mon, Feb 25, 2019 at 11:20:14AM -0500, Tom Lane wrote:
>> It appears to me that f831d4acc required a good deal more adult
>> supervision than it actually got.  That was alleged to be a small
>> notational refactoring, not a redefinition of what gets put into
>> dump files.

> How much consistent do we need to be for custom dump archives
> regarding backward and upward-compatibility?

Well, if we didn't want to fix this, a reasonable way to go about
it would be to bump the archive version number in pg_dump output,
so that old versions would issue a useful complaint instead of crashing.
However, I repeat that this patch was sold as a notational improvement,
not something that was going to break format compatibility.  I think if
anyone had mentioned the latter, there would have been push-back against
its being committed at all.  I am providing such push-back right now,
because I don't think we should break file compatibility for this.

Also, I'm now realizing that 4dbe19690 was probably not fixing an
aboriginal bug, but something that this patch introduced, because
we'd likely have noticed it before if the owner field could have
been a null pointer all along.  How much do you want to bet on
whether there are other such bugs now, that we haven't yet tripped
over?

I think this patch needs to be worked over so that what it writes
is exactly what was written before.  If the author is unwilling
to do that PDQ, it should be reverted.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Michael Paquier-2
On Tue, Feb 26, 2019 at 12:16:35AM -0500, Tom Lane wrote:
> Well, if we didn't want to fix this, a reasonable way to go about
> it would be to bump the archive version number in pg_dump output,
> so that old versions would issue a useful complaint instead of crashing.
> However, I repeat that this patch was sold as a notational improvement,
> not something that was going to break format compatibility.  I think if
> anyone had mentioned the latter, there would have been push-back against
> its being committed at all.  I am providing such push-back right now,
> because I don't think we should break file compatibility for this.

While I agree that the patch makes handling of the different fields in
archive entries cleaner, I agree as well that this is not enough to
justify a dump version bump.

> I think this patch needs to be worked over so that what it writes
> is exactly what was written before.  If the author is unwilling
> to do that PDQ, it should be reverted.

Works for me.  With a quick read of the code, it seems to me that it
is possible to keep compatibility while keeping the simplifications
around ArchiveEntry()'s refactoring.  Alvaro?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Dmitry Dolgov
> On Tue, Feb 26, 2019 at 6:38 AM Michael Paquier <[hidden email]> wrote:
>
> Works for me.  With a quick read of the code, it seems to me that it
> is possible to keep compatibility while keeping the simplifications
> around ArchiveEntry()'s refactoring.

Yes, it should be rather simple, we can e.g. return to the old less consistent
NULL handling approach something (like in the attached patch), or replace a NULL
value with an empty string in WriteToc. Give me a moment, I'll check it out. At
the same time I would suggest to keep replace_line_endings -> sanitize_line,
since it doesn't break compatibility.

archive_entry_null_handling.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Dmitry Dolgov
> On Tue, Feb 26, 2019 at 9:13 AM Dmitry Dolgov <[hidden email]> wrote:
>
> > On Tue, Feb 26, 2019 at 6:38 AM Michael Paquier <[hidden email]> wrote:
> >
> > Works for me.  With a quick read of the code, it seems to me that it
> > is possible to keep compatibility while keeping the simplifications
> > around ArchiveEntry()'s refactoring.
>
> Yes, it should be rather simple, we can e.g. return to the old less consistent
> NULL handling approach something (like in the attached patch), or replace a NULL
> value with an empty string in WriteToc. Give me a moment, I'll check it out. At
> the same time I would suggest to keep replace_line_endings -> sanitize_line,
> since it doesn't break compatibility.

Ok, unfortunately, I don't see any other ways, so the patch from the previous
email is probably the best option (we could also check NULLs in WriteToc, but
it means the same kind of inconsistency, and in this case I guess it's better
to keep NULL handling as it was before).

But I hope it still makes sense to consider more consisten approach here, maybe
with the next dump version bump, if it's going to happen in the foreseeable
future.

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Alvaro Herrera-9
In reply to this post by Michael Paquier-2
On 2019-Feb-26, Michael Paquier wrote:

> On Tue, Feb 26, 2019 at 12:16:35AM -0500, Tom Lane wrote:
> > Well, if we didn't want to fix this, a reasonable way to go about
> > it would be to bump the archive version number in pg_dump output,
> > so that old versions would issue a useful complaint instead of crashing.
> > However, I repeat that this patch was sold as a notational improvement,
> > not something that was going to break format compatibility.  I think if
> > anyone had mentioned the latter, there would have been push-back against
> > its being committed at all.  I am providing such push-back right now,
> > because I don't think we should break file compatibility for this.
>
> While I agree that the patch makes handling of the different fields in
> archive entries cleaner, I agree as well that this is not enough to
> justify a dump version bump.

Yeah, it was a nice thing to have, but I didn't keep in mind that we
ought to be able to provide such upwards compatibility.  (Is this
upwards or downwards or backwards or forwards compatibility, now,
actually?  I can't quite figure it out which direction it goes.)


> > I think this patch needs to be worked over so that what it writes
> > is exactly what was written before.  If the author is unwilling
> > to do that PDQ, it should be reverted.
>
> Works for me.  With a quick read of the code, it seems to me that it
> is possible to keep compatibility while keeping the simplifications
> around ArchiveEntry()'s refactoring.  Alvaro?

Yeah, let me review the patch Dmitry just sent.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Alvaro Herrera-9
In reply to this post by Dmitry Dolgov
On 2019-Feb-26, Dmitry Dolgov wrote:

> Yes, it should be rather simple, we can e.g. return to the old less consistent
> NULL handling approach something (like in the attached patch), or replace a NULL
> value with an empty string in WriteToc. Give me a moment, I'll check it out. At
> the same time I would suggest to keep replace_line_endings -> sanitize_line,
> since it doesn't break compatibility.

Hmm, shouldn't we modify sanitize_line so that it returns strdup(hyphen)
when input is empty and want_hyphen, too?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> Hmm, shouldn't we modify sanitize_line so that it returns strdup(hyphen)
> when input is empty and want_hyphen, too?

If this patch is touching the behavior of functions like that, then
it's going in the wrong direction; the need for any such change suggests
strongly that you've failed to restore the old behavior as to which TOC
fields can be null or not.

There might be reason to make such cleanups/improvements separately,
but let's *not* fuzz things up by doing them in the corrective patch.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Alvaro Herrera-9
In reply to this post by Dmitry Dolgov
On 2019-Feb-26, Dmitry Dolgov wrote:

> > On Tue, Feb 26, 2019 at 6:38 AM Michael Paquier <[hidden email]> wrote:
> >
> > Works for me.  With a quick read of the code, it seems to me that it
> > is possible to keep compatibility while keeping the simplifications
> > around ArchiveEntry()'s refactoring.
>
> Yes, it should be rather simple, we can e.g. return to the old less consistent
> NULL handling approach something (like in the attached patch), or replace a NULL
> value with an empty string in WriteToc. Give me a moment, I'll check it out. At
> the same time I would suggest to keep replace_line_endings -> sanitize_line,
> since it doesn't break compatibility.

I think it would be better to just put back the .defn = "" (etc) to the
ArchiveEntry calls.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> I think it would be better to just put back the .defn = "" (etc) to the
> ArchiveEntry calls.

Yeah, that was what I was imagining --- just make the ArchiveEntry calls
act exactly like they did before in terms of what gets filled into the
TOC fields.  This episode is a fine reminder of why premature optimization
is the root of all evil...

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Dmitry Dolgov
> On Tue, Feb 26, 2019 at 11:53 PM Alvaro Herrera <[hidden email]> wrote:
>
> On 2019-Feb-26, Dmitry Dolgov wrote:
>
> > Yes, it should be rather simple, we can e.g. return to the old less consistent
> > NULL handling approach something (like in the attached patch), or replace a NULL
> > value with an empty string in WriteToc. Give me a moment, I'll check it out. At
> > the same time I would suggest to keep replace_line_endings -> sanitize_line,
> > since it doesn't break compatibility.
>
> Hmm, shouldn't we modify sanitize_line so that it returns strdup(hyphen)
> when input is empty and want_hyphen, too?

Yes, you're right.

> I think it would be better to just put back the .defn = "" (etc) to the
> ArchiveEntry calls.

Then we should do this not only for defn, but for owner and dropStmt too. I can
update the fix patch I've sent before, if it's preferrable approach in this
particular situation. But I hope there are no objections if I'll then submit
the original changes with more consistent null handling separately to make
decision about them more consciously.

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Alvaro Herrera-9
On 2019-Feb-27, Dmitry Dolgov wrote:

> > On Tue, Feb 26, 2019 at 11:53 PM Alvaro Herrera <[hidden email]> wrote:
> >
> > I think it would be better to just put back the .defn = "" (etc) to the
> > ArchiveEntry calls.
>
> Then we should do this not only for defn, but for owner and dropStmt too.

Yeah, absolutely.

> I can
> update the fix patch I've sent before, if it's preferrable approach in this
> particular situation.

I'm not sure we need those changes, since we're forced to update all
callsites anyway.

> But I hope there are no objections if I'll then submit the original
> changes with more consistent null handling separately to make decision
> about them more consciously.

I think we should save such a patch for whenever we next update the
archive version number, which could take a couple of years given past
history.  I'm inclined to add a comment near K_VERS_SELF to remind
whoever next patches it.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> On 2019-Feb-27, Dmitry Dolgov wrote:
>> But I hope there are no objections if I'll then submit the original
>> changes with more consistent null handling separately to make decision
>> about them more consciously.

> I think we should save such a patch for whenever we next update the
> archive version number, which could take a couple of years given past
> history.  I'm inclined to add a comment near K_VERS_SELF to remind
> whoever next patches it.

+1.  This isn't an unreasonable cleanup idea, but being only a cleanup
idea, it doesn't seem worth creating compatibility issues for.  Let's
wait till there is some more-pressing reason to change the archive format,
and then fix this in the same release cycle.

I'd also note that given what we've seen so far, there are going to be
some slow-to-flush-out null pointer dereferencing bugs from this.
I'm not really eager to introduce that towards the tail end of a devel
cycle.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Dmitry Dolgov
> On Wed, Feb 27, 2019 at 1:32 PM Alvaro Herrera <[hidden email]> wrote:
>
> > > I think it would be better to just put back the .defn = "" (etc) to the
> > > ArchiveEntry calls.
> >
> > Then we should do this not only for defn, but for owner and dropStmt too.
>
> Yeah, absolutely.

Done, please find the attached patch.

> > I can
> > update the fix patch I've sent before, if it's preferrable approach in this
> > particular situation.
>
> I'm not sure we need those changes, since we're forced to update all
> callsites anyway.

I guess we can keep the part about removing null checks before using strlen,
since it's going to be useless.

> On Wed, Feb 27, 2019 at 10:36 AM Dmitry Dolgov <[hidden email]> wrote:
>
> > On Tue, Feb 26, 2019 at 11:53 PM Alvaro Herrera <[hidden email]> wrote:
> >
> > On 2019-Feb-26, Dmitry Dolgov wrote:
> >
> > > Yes, it should be rather simple, we can e.g. return to the old less consistent
> > > NULL handling approach something (like in the attached patch), or replace a NULL
> > > value with an empty string in WriteToc. Give me a moment, I'll check it out. At
> > > the same time I would suggest to keep replace_line_endings -> sanitize_line,
> > > since it doesn't break compatibility.
> >
> > Hmm, shouldn't we modify sanitize_line so that it returns strdup(hyphen)
> > when input is empty and want_hyphen, too?
>
> Yes, you're right.
I've looked closer, and looks like I was mistaken. In the only place where it
matters we anyway pass NULL after verifying noOwner:

    sanitized_owner = sanitize_line(ropt->noOwner ? NULL : te->owner, true);

So I haven't change sanitize_line yet, but can update it if there is a strong
opinion about this function.

0001-ArchiveEntry-null-handling.patch (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Segfault when restoring -Fd dump on current HEAD

Michael Paquier-2
In reply to this post by Tom Lane-2
On Wed, Feb 27, 2019 at 12:02:43PM -0500, Tom Lane wrote:

> Alvaro Herrera <[hidden email]> writes:
>> I think we should save such a patch for whenever we next update the
>> archive version number, which could take a couple of years given past
>> history.  I'm inclined to add a comment near K_VERS_SELF to remind
>> whoever next patches it.
>
> +1.  This isn't an unreasonable cleanup idea, but being only a cleanup
> idea, it doesn't seem worth creating compatibility issues for.  Let's
> wait till there is some more-pressing reason to change the archive format,
> and then fix this in the same release cycle.
+1.  Having a comment as reminder would be really nice.
--
Michael

signature.asc (849 bytes) Download Attachment
12