subscriptionCheck failures on nightjar

subscriptionCheck failures on nightjar

Tom Lane-2
nightjar just did this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2019-02-11%2004%3A33%3A07

The critical bit seems to be that the publisher side of the
010_truncate.pl test failed like so:

2019-02-10 23:55:58.765 EST [40771] sub3 LOG:  statement: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
2019-02-10 23:55:58.765 EST [40771] sub3 LOG:  received replication command: CREATE_REPLICATION_SLOT "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2019-02-10 23:55:58.798 EST [40728] sub1 PANIC:  could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory
2019-02-10 23:55:58.800 EST [40771] sub3 LOG:  logical decoding found consistent point at 0/160B578
2019-02-10 23:55:58.800 EST [40771] sub3 DETAIL:  There are no running transactions.

I'm not sure what to make of that, but I notice that nightjar has
failed subscriptionCheck seven times since mid-December, and every one
of those shows this same PANIC.  Meanwhile, no other buildfarm member
has produced such a failure.  It smells like a race condition with
a rather tight window, but that's just a guess.

So: (1) what's causing the failure?  (2) could we respond with
something less than take-down-the-whole-database when a failure
happens in this area?

                        regards, tom lane


Re: subscriptionCheck failures on nightjar

Thomas Munro-3
On Mon, Feb 11, 2019 at 7:31 PM Tom Lane <[hidden email]> wrote:
> 2019-02-10 23:55:58.798 EST [40728] sub1 PANIC:  could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory

<pokes at totally unfamiliar code>

The snapshot files get atomically renamed into place, which seems kosher even if
snapshots for the same LSN are created concurrently by different
backends (and tracing syscalls confirms that that does occasionally
happen).  It's hard to believe that nightjar's rename() ceased to be
atomic a couple of months ago.  It looks like the only way for files
to get unlinked after that is by CheckPointSnapBuild() deciding they
are too old.

Hmm.  Could this be relevant, and cause a well timed checkpoint to
unlink files too soon?

2019-02-12 21:52:58.304 EST [22922] WARNING:  out of logical
replication worker slots

--
Thomas Munro
http://www.enterprisedb.com


Re: subscriptionCheck failures on nightjar

Tom Lane-2
Thomas Munro <[hidden email]> writes:
> On Mon, Feb 11, 2019 at 7:31 PM Tom Lane <[hidden email]> wrote:
>> 2019-02-10 23:55:58.798 EST [40728] sub1 PANIC:  could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory

> <pokes at totally unfamiliar code>

> They get atomically renamed into place, which seems kosher even if
> snapshots for the same LSN are created concurrently by different
> backends (and tracing syscalls confirms that that does occasionally
> happen).  It's hard to believe that nightjar's rename() ceased to be
> atomic a couple of months ago.  It looks like the only way for files
> to get unlinked after that is by CheckPointSnapBuild() deciding they
> are too old.

> Hmm.  Could this be relevant, and cause a well timed checkpoint to
> unlink files too soon?
> 2019-02-12 21:52:58.304 EST [22922] WARNING:  out of logical
> replication worker slots

I've managed to reproduce this locally, and obtained this PANIC:

log/010_truncate_publisher.log:2019-02-13 11:29:04.759 EST [9973] sub3 PANIC:  could not open file "pg_logical/snapshots/0-16067B0.snap": No such file or directory

with this stack trace:

#0  0x0000000801ab610c in kill () from /lib/libc.so.7
#1  0x0000000801ab4d3b in abort () from /lib/libc.so.7
#2  0x000000000089202e in errfinish (dummy=Variable "dummy" is not available.
) at elog.c:552
#3  0x000000000075d339 in fsync_fname_ext (
    fname=0x7fffffffba20 "pg_logical/snapshots/0-16067B0.snap", isdir=Variable "isdir" is not available.
)
    at fd.c:3372
#4  0x0000000000730c75 in SnapBuildSerialize (builder=0x80243c118,
    lsn=23095216) at snapbuild.c:1669
#5  0x0000000000731297 in SnapBuildProcessRunningXacts (builder=0x80243c118,
    lsn=23095216, running=0x8024424f0) at snapbuild.c:1110
#6  0x0000000000722eac in LogicalDecodingProcessRecord (ctx=0x802414118,
    record=0x8024143d8) at decode.c:318
#7  0x0000000000736f30 in XLogSendLogical () at walsender.c:2826
#8  0x00000000007389c7 in WalSndLoop (send_data=0x736ed0 <XLogSendLogical>)
    at walsender.c:2184
#9  0x0000000000738c3d in StartLogicalReplication (cmd=Variable "cmd" is not available.
) at walsender.c:1118
#10 0x0000000000739895 in exec_replication_command (
    cmd_string=0x80240e118 "START_REPLICATION SLOT \"sub3\" LOGICAL 0/0 (proto_version '1', publication_names '\"pub3\"')") at walsender.c:1536
#11 0x000000000078b272 in PostgresMain (argc=-14464, argv=Variable "argv" is not available.
) at postgres.c:4252
#12 0x00000000007113fa in PostmasterMain (argc=-14256, argv=0x7fffffffcc60)
    at postmaster.c:4382

So the problem seems to boil down to "somebody removed the snapshot
file between the time we renamed it into place and the time we tried
to fsync it".

I do find messages like the one you mention, but they are on the
subscriber not the publisher side, so I'm not sure if this matches
the scenario you have in mind?

log/010_truncate_subscriber.log:2019-02-13 11:29:02.343 EST [9970] WARNING:  out of logical replication worker slots
log/010_truncate_subscriber.log:2019-02-13 11:29:02.344 EST [9970] WARNING:  out of logical replication worker slots
log/010_truncate_subscriber.log:2019-02-13 11:29:03.401 EST [9970] WARNING:  out of logical replication worker slots

Anyway, I think we might be able to fix this along the lines of

    CloseTransientFile(fd);

+   /* ensure snapshot file is down to stable storage */
+   fsync_fname(tmppath, false);
    fsync_fname("pg_logical/snapshots", true);

    /*
     * We may overwrite the work from some other backend, but that's ok, our
     * snapshot is valid as well, we'll just have done some superfluous work.
     */
    if (rename(tmppath, path) != 0)
    {
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not rename file \"%s\" to \"%s\": %m",
                        tmppath, path)));
    }

-   /* make sure we persist */
-   fsync_fname(path, false);
+   /* ensure we persist the file rename */
    fsync_fname("pg_logical/snapshots", true);

The existing code here seems simply wacky/unsafe to me regardless
of this race condition: couldn't it potentially result in a corrupt
snapshot file appearing with a valid name, if the system crashes
after persisting the rename but before it's pushed the data out?

I also wonder why bother with the directory sync just before the
rename.

                        regards, tom lane


Re: subscriptionCheck failures on nightjar

Andres Freund
Hi,

On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> I've managed to reproduce this locally, and obtained this PANIC:

Cool. How exactly?

Nice catch.

> Anyway, I think we might be able to fix this along the lines of
>
>     CloseTransientFile(fd);
>
> +   /* ensure snapshot file is down to stable storage */
> +   fsync_fname(tmppath, false);
>     fsync_fname("pg_logical/snapshots", true);
>
>     /*
>      * We may overwrite the work from some other backend, but that's ok, our
>      * snapshot is valid as well, we'll just have done some superfluous work.
>      */
>     if (rename(tmppath, path) != 0)
>     {
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not rename file \"%s\" to \"%s\": %m",
>                         tmppath, path)));
>     }
>
> -   /* make sure we persist */
> -   fsync_fname(path, false);
> +   /* ensure we persist the file rename */
>     fsync_fname("pg_logical/snapshots", true);

Hm, but that's not the same? On some filesystems one needs the directory
fsync, on some the file fsync, and I think both in some cases. ISTM we
should just open the file before the rename, and then use fsync() on the
filehandle rather than the filename.  Then a concurrent rename ought not
to hurt us?
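
Roughly along these lines (an untested sketch of the idea, not a proposed
patch; it assumes we simply skip the CloseTransientFile() before the rename
so that fd is still open):

/*
 * Untested sketch: keep the temp file's fd open across the rename and
 * fsync it afterwards, instead of reopening the file by name.
 */
if (rename(tmppath, path) != 0)
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not rename file \"%s\" to \"%s\": %m",
                    tmppath, path)));

/*
 * fd still refers to the same file even if a checkpoint has meanwhile
 * unlinked the name (or another backend has renamed its own copy over
 * it), so this cannot fail with ENOENT.
 */
if (pg_fsync(fd) != 0)
    ereport(data_sync_elevel(ERROR),
            (errcode_for_file_access(),
             errmsg("could not fsync file \"%s\": %m", path)));

CloseTransientFile(fd);
fsync_fname("pg_logical/snapshots", true);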


> The existing code here seems simply wacky/unsafe to me regardless
> of this race condition: couldn't it potentially result in a corrupt
> snapshot file appearing with a valid name, if the system crashes
> after persisting the rename but before it's pushed the data out?

What precisely do you mean by "before it's pushed the data out"?


> I also wonder why bother with the directory sync just before the
> rename.

Because on some FS/OS combinations the size of the renamed-into-place
file isn't guaranteed to be durable unless the directory was
fsynced. Otherwise we could end up with incomplete data if we crash
after renaming, but before fsyncing.

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
>> I've managed to reproduce this locally, and obtained this PANIC:

> Cool. How exactly?

Andrew told me that nightjar is actually running in a qemu VM,
so I set up freebsd 9.0 in a qemu VM, and boom.  It took a bit
of fiddling with qemu parameters, but for such a timing-sensitive
problem, that's not surprising.

>> Anyway, I think we might be able to fix this along the lines of
>> [ fsync the data before renaming not after ]

> Hm, but that's not the same? On some filesystems one needs the directory
> fsync, on some the file fsync, and I think both in some cases.

Now that I look at it, there's a pg_fsync() just above this, so
I wonder why we need a second fsync on the file at all.  fsync'ing
the directory is needed to ensure the directory entry is on disk;
but the file data should be out already, or else the kernel is
simply failing to honor fsync.
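
For reference, condensing the code quoted upthread, the current order of
operations is roughly (error handling elided):

/* condensed: the tail end of SnapBuildSerialize() as it stands */
pg_fsync(fd);                                /* flush the temp file's contents */
CloseTransientFile(fd);
fsync_fname("pg_logical/snapshots", true);   /* directory, before the rename */
rename(tmppath, path);                       /* error check elided */
fsync_fname(path, false);                    /* the by-name fsync that now PANICs */
fsync_fname("pg_logical/snapshots", true);   /* directory, after the rename */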

>> The existing code here seems simply wacky/unsafe to me regardless
>> of this race condition: couldn't it potentially result in a corrupt
>> snapshot file appearing with a valid name, if the system crashes
>> after persisting the rename but before it's pushed the data out?

> What precisely do you mean by "before it's pushed the data out"?

Given the previous pg_fsync, this isn't an issue.

>> I also wonder why bother with the directory sync just before the
>> rename.

> Because on some FS/OS combinations the size of the renamed-into-place
> file isn't guaranteed to be durable unless the directory was
> fsynced.

Bleah.  But in any case, the rename should not create a situation
in which we need to fsync the file data again.

                        regards, tom lane


Re: subscriptionCheck failures on nightjar

Andres Freund
Hi,

On 2019-02-13 12:37:35 -0500, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> >> I've managed to reproduce this locally, and obtained this PANIC:
>
> > Cool. How exactly?
>
> Andrew told me that nightjar is actually running in a qemu VM,
> so I set up freebsd 9.0 in a qemu VM, and boom.  It took a bit
> of fiddling with qemu parameters, but for such a timing-sensitive
> problem, that's not surprising.

Ah.


> >> I also wonder why bother with the directory sync just before the
> >> rename.
>
> > Because on some FS/OS combinations the size of the renamed-into-place
> > file isn't guaranteed to be durable unless the directory was
> > fsynced.
>
> Bleah.  But in any case, the rename should not create a situation
> in which we need to fsync the file data again.

Well, it's not super well defined which of either you need to make the
rename durable, and it appears to differ between OSs. Any argument
against fixing it up like I suggested, by using an fd from before the
rename?

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
>> Bleah.  But in any case, the rename should not create a situation
>> in which we need to fsync the file data again.

> Well, it's not super well defined which of either you need to make the
> rename durable, and it appears to differ between OSs. Any argument
> against fixing it up like I suggested, by using an fd from before the
> rename?

I'm unimpressed.  You're speculating about the filesystem having random
deviations from POSIX behavior, and using that weak argument to justify a
totally untested technique having its own obvious portability hazards.
Who's to say that an fsync on a file opened before a rename is going to do
anything good after the rename?  (On, eg, NFS there are obvious reasons
why it might not.)

Also, I wondered why this is coming out as a PANIC.  I thought originally
that somebody must be causing this code to run in a critical section,
but it looks like the real issue is just that fsync_fname() uses
data_sync_elevel, which is

int
data_sync_elevel(int elevel)
{
        return data_sync_retry ? elevel : PANIC;
}

I really really don't want us doing questionably-necessary fsyncs with a
PANIC as the result.  Perhaps more to the point, the way this was coded,
the PANIC applies to open() failures in fsync_fname_ext() not just fsync()
failures; that's painting with too broad a brush isn't it?

                        regards, tom lane


Re: subscriptionCheck failures on nightjar

Andres Freund
Hi,

On 2019-02-13 12:59:19 -0500, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
> >> Bleah.  But in any case, the rename should not create a situation
> >> in which we need to fsync the file data again.
>
> > Well, it's not super well defined which of either you need to make the
> > rename durable, and it appears to differ between OSs. Any argument
> > against fixing it up like I suggested, by using an fd from before the
> > rename?
>
> I'm unimpressed.  You're speculating about the filesystem having random
> deviations from POSIX behavior, and using that weak argument to justify a
> totally untested technique having its own obvious portability hazards.

Uhm, we've reproduced failures due to the lack of such fsyncs at some
point. And not some fringe OS, but ext4 (albeit with data=writeback).

I don't think POSIX has yet figured out what they actually think is
required:
http://austingroupbugs.net/view.php?id=672

I guess we could just ignore ENOENT in this case, that ought to be just
as safe as using the old fd.
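
Something like this, say (a hypothetical sketch only; fsync_fname_missing_ok
is a made-up name, not existing fd.c API):

/*
 * Sketch of a missing_ok-style variant of the by-name fsync: tolerate the
 * file having been removed between the rename and the fsync.
 */
static void
fsync_fname_missing_ok(const char *fname, int elevel)
{
    int         fd = OpenTransientFile(fname, O_RDWR | PG_BINARY);

    if (fd < 0)
    {
        if (errno == ENOENT)
            return;             /* concurrently removed; nothing left to fsync */
        ereport(elevel,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", fname)));
        return;
    }

    if (pg_fsync(fd) != 0)
        ereport(data_sync_elevel(elevel),
                (errcode_for_file_access(),
                 errmsg("could not fsync file \"%s\": %m", fname)));

    CloseTransientFile(fd);
}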


> Also, I wondered why this is coming out as a PANIC.  I thought originally
> that somebody must be causing this code to run in a critical section,
> but it looks like the real issue is just that fsync_fname() uses
> data_sync_elevel, which is
>
> int
> data_sync_elevel(int elevel)
> {
> return data_sync_retry ? elevel : PANIC;
> }
>
> I really really don't want us doing questionably-necessary fsyncs with a
> PANIC as the result.

Well, given the 'failed fsync throws dirty data away' issue, I don't
quite see what we can do otherwise. But:


> Perhaps more to the point, the way this was coded, the PANIC applies
> to open() failures in fsync_fname_ext() not just fsync() failures;
> that's painting with too broad a brush isn't it?

That indeed seems wrong. Thomas?  I'm not quite sure how best to fix
this though - I guess we could rename fsync_fname_ext's elevel parameter
to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
too bad?

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
>> Perhaps more to the point, the way this was coded, the PANIC applies
>> to open() failures in fsync_fname_ext() not just fsync() failures;
>> that's painting with too broad a brush isn't it?

> That indeed seems wrong. Thomas?  I'm not quite sure how best to fix
> this though - I guess we could rename fsync_fname_ext's elevel parameter
> to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
> too bad?

Perhaps fsync_fname() should pass the elevel through as-is, and
then fsync_fname_ext() apply the data_sync_elevel() promotion only
to the fsync() call not the open()?  I'm not sure.
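
In code terms, something like this in the file case of fsync_fname_ext()
(a simplified sketch; the real function's directory and permission-error
handling is left out):

fd = OpenTransientFile(fname, O_RDWR | PG_BINARY);
if (fd < 0)
{
    /* report open() failure at the severity the caller asked for */
    ereport(elevel,
            (errcode_for_file_access(),
             errmsg("could not open file \"%s\": %m", fname)));
    return -1;
}

if (pg_fsync(fd) != 0)
{
    /* only a genuine fsync() failure gets the data_sync_elevel() promotion */
    ereport(data_sync_elevel(elevel),
            (errcode_for_file_access(),
             errmsg("could not fsync file \"%s\": %m", fname)));
    CloseTransientFile(fd);
    return -1;
}

CloseTransientFile(fd);
return 0;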

The bigger picture here is that there are probably some call sites where
PANIC on open() failure is appropriate too.  So having fsync_fname[_ext]
deciding what to do on its own is likely to be inadequate.

If we fix this by allowing ENOENT to not be an error in this particular
call case, that's going to involve an fsync_fname_ext API change anyway...

                        regards, tom lane


Re: subscriptionCheck failures on nightjar

Andres Freund
Hi,

On 2019-02-13 13:24:03 -0500, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
> >> Perhaps more to the point, the way this was coded, the PANIC applies
> >> to open() failures in fsync_fname_ext() not just fsync() failures;
> >> that's painting with too broad a brush isn't it?
>
> > That indeed seems wrong. Thomas?  I'm not quite sure how best to fix
> > this though - I guess we could rename fsync_fname_ext's elevel parameter
> > to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
> > too bad?
>
> Perhaps fsync_fname() should pass the elevel through as-is, and
> then fsync_fname_ext() apply the data_sync_elevel() promotion only
> to the fsync() call not the open()?  I'm not sure.

Yea, that's probably not a bad plan. It'd address your:

> The bigger picture here is that there are probably some call sites where
> PANIC on open() failure is appropriate too.  So having fsync_fname[_ext]
> deciding what to do on its own is likely to be inadequate.

Seems to me we ought to do this regardless of the bug discussed
here. But we'd need to be careful to take the "more severe" error
between the passed-in elevel and data_sync_elevel(). Otherwise we'd end
up downgrading errors...


> If we fix this by allowing ENOENT to not be an error in this particular
> call case, that's going to involve an fsync_fname_ext API change anyway...

I was kinda pondering just open coding it.  I am not yet convinced that
my idea of just using an open FD isn't the least bad approach for the
issue at hand.  What precisely is the NFS issue you're concerned about?

Right now fsync_fname_ext isn't exposed outside fd.c...

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

Tom Lane-2
Andres Freund <[hidden email]> writes:
> I was kinda pondering just open coding it.  I am not yet convinced that
> my idea of just using an open FD isn't the least bad approach for the
> issue at hand.  What precisely is the NFS issue you're concerned about?

I'm not sure that fsync-on-FD after the rename will work, considering that
the issue here is that somebody might've unlinked the file altogether
before we get to doing the fsync.  I don't have a hard time believing that
that might result in a failure report on NFS or similar.  Yeah, it's
hypothetical, but the argument that we need a repeat fsync at all seems
equally hypothetical.

> Right now fsync_fname_ext isn't exposed outside fd.c...

Mmm.  That makes it easier to consider changing its API.

                        regards, tom lane


Re: subscriptionCheck failures on nightjar

Thomas Munro-3
On Thu, Feb 14, 2019 at 8:11 AM Tom Lane <[hidden email]> wrote:

> Andres Freund <[hidden email]> writes:
> > I was kinda pondering just open coding it.  I am not yet convinced that
> > my idea of just using an open FD isn't the least bad approach for the
> > issue at hand.  What precisely is the NFS issue you're concerned about?
>
> I'm not sure that fsync-on-FD after the rename will work, considering that
> the issue here is that somebody might've unlinked the file altogether
> before we get to doing the fsync.  I don't have a hard time believing that
> that might result in a failure report on NFS or similar.  Yeah, it's
> hypothetical, but the argument that we need a repeat fsync at all seems
> equally hypothetical.
>
> > Right now fsync_fname_ext isn't exposed outside fd.c...
>
> Mmm.  That makes it easier to consider changing its API.

Just to make sure I understand: it's OK for the file not to be there
when we try to fsync it by name, because a concurrent checkpoint can
remove it, having determined that we don't need it anymore?  In other
words, we really needed either missing_ok=true semantics, or to use
the fd we already had instead of the name?

I found 3 examples of this failing with an ERROR (though not turning
the BF red, so nobody noticed) before the PANIC patch went in:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-09-10%2020%3A54%3A21&stg=subscription-check
2018-09-10 17:20:09.247 EDT [23287] sub1 ERROR:  could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory
2018-09-10 17:20:09.247 EDT [23285] ERROR:  could not receive data
from WAL stream: ERROR:  could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-31%2023%3A25%3A41&stg=subscription-check
2018-08-31 19:52:06.634 EDT [52724] sub1 ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-31 19:52:06.634 EDT [52721] ERROR:  could not receive data
from WAL stream: ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-22%2021%3A49%3A18&stg=subscription-check
2018-08-22 18:10:29.422 EDT [44208] sub1 ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-22 18:10:29.422 EDT [44206] ERROR:  could not receive data
from WAL stream: ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory

--
Thomas Munro
http://www.enterprisedb.com


Re: subscriptionCheck failures on nightjar

Tom Lane-2
Thomas Munro <[hidden email]> writes:
> I found 3 examples of this failing with an ERROR (though not turning
> the BF red, so nobody noticed) before the PANIC patch went in:

Yeah, I suspected that had happened before with less-obvious consequences.
Now that we know where the problem is, you could probably make it highly
reproducible by inserting a sleep of a few msec between the rename and the
second fsync.
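
For instance (purely to widen the race window while testing, obviously not
for committing):

    if (rename(tmppath, path) != 0)
    {
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not rename file \"%s\" to \"%s\": %m",
                        tmppath, path)));
    }

    pg_usleep(5000);            /* ~5 ms: let CheckPointSnapBuild() win the race */

    /* make sure we persist */
    fsync_fname(path, false);
    fsync_fname("pg_logical/snapshots", true);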

                        regards, tom lane


Re: subscriptionCheck failures on nightjar

Andres Freund
In reply to this post by Thomas Munro-3
Hi,

On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:

> On Thu, Feb 14, 2019 at 8:11 AM Tom Lane <[hidden email]> wrote:
> > Andres Freund <[hidden email]> writes:
> > > I was kinda pondering just open coding it.  I am not yet convinced that
> > > my idea of just using an open FD isn't the least bad approach for the
> > > issue at hand.  What precisely is the NFS issue you're concerned about?
> >
> > I'm not sure that fsync-on-FD after the rename will work, considering that
> > the issue here is that somebody might've unlinked the file altogether
> > before we get to doing the fsync.  I don't have a hard time believing that
> > that might result in a failure report on NFS or similar.  Yeah, it's
> > hypothetical, but the argument that we need a repeat fsync at all seems
> > equally hypothetical.
> >
> > > Right now fsync_fname_ext isn't exposed outside fd.c...
> >
> > Mmm.  That makes it easier to consider changing its API.
>
> Just to make sure I understand: it's OK for the file not to be there
> when we try to fsync it by name, because a concurrent checkpoint can
> remove it, having determined that we don't need it anymore?  In other
> words, we really needed either missing_ok=true semantics, or to use
> the fd we already had instead of the name?

I'm not yet sure that that's actually something that's supposed to
happen; I need to spend some time analysing how this actually
happens. Normally the contents of the slot should actually prevent it
from being removed (as they're newer than
ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
a bug in the drop logic in newer releases.

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

Andrew Dunstan-8
In reply to this post by Andres Freund

On 2/13/19 1:12 PM, Andres Freund wrote:

> Hi,
>
> On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
>> Andres Freund <[hidden email]> writes:
>>> On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
>>>> Bleah.  But in any case, the rename should not create a situation
>>>> in which we need to fsync the file data again.
>>> Well, it's not super well defined which of either you need to make the
>>> rename durable, and it appears to differ between OSs. Any argument
>>> against fixing it up like I suggested, by using an fd from before the
>>> rename?
>> I'm unimpressed.  You're speculating about the filesystem having random
>> deviations from POSIX behavior, and using that weak argument to justify a
>> totally untested technique having its own obvious portability hazards.
> Uhm, we've reproduced failures due to the lack of such fsyncs at some
> point. And not some fringe OS, but ext4 (albeit with data=writeback).
>
> I don't think POSIX has yet figured out what they actually think is
> required:
> http://austingroupbugs.net/view.php?id=672
>
> I guess we could just ignore ENOENT in this case, that ought to be just
> as safe as using the old fd.
>
>
>> Also, I wondered why this is coming out as a PANIC.  I thought originally
>> that somebody must be causing this code to run in a critical section,
>> but it looks like the real issue is just that fsync_fname() uses
>> data_sync_elevel, which is
>>
>> int
>> data_sync_elevel(int elevel)
>> {
>> return data_sync_retry ? elevel : PANIC;
>> }
>>
>> I really really don't want us doing questionably-necessary fsyncs with a
>> PANIC as the result.
> Well, given the 'failed fsync throws dirty data away' issue, I don't
> quite see what we can do otherwise. But:
>
>
>> Perhaps more to the point, the way this was coded, the PANIC applies
>> to open() failures in fsync_fname_ext() not just fsync() failures;
>> that's painting with too broad a brush isn't it?
> That indeed seems wrong. Thomas?  I'm not quite sure how best to fix
> this though - I guess we could rename fsync_fname_ext's elevel parameter
> to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
> too bad?
>

Thread seems to have gone quiet ...


cheers


andrew


--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: subscriptionCheck failures on nightjar

Tom Lane-2
In reply to this post by Andres Freund
Andres Freund <[hidden email]> writes:
> On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
>> Just to make sure I understand: it's OK for the file not to be there
>> when we try to fsync it by name, because a concurrent checkpoint can
>> remove it, having determined that we don't need it anymore?  In other
>> words, we really needed either missing_ok=true semantics, or to use
>> the fd we already had instead of the name?

> I'm not yet sure that that's actually something that's supposed to
> happen; I need to spend some time analysing how this actually
> happens. Normally the contents of the slot should actually prevent it
> from being removed (as they're newer than
> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
> a bug in the drop logic in newer releases.

My animal dromedary just reproduced this failure, which we've previously
only seen on nightjar.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-26%2023%3A57%3A45

                        regards, tom lane



Re: subscriptionCheck failures on nightjar

Tom Lane-2
I wrote:
> My animal dromedary just reproduced this failure, which we've previously
> only seen on nightjar.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-26%2023%3A57%3A45

Twice:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-28%2006%3A50%3A41

Since this is a live (if old) system, not some weird qemu emulation,
we can no longer pretend that it might not be reachable in the field.
I've added an open item.

                        regards, tom lane