Use durable_unlink for .ready and .done files for WAL segment removal

Use durable_unlink for .ready and .done files for WAL segment removal

Michael Paquier
Hi all,

While reviewing the archiving code, I have bumped into the fact that
XLogArchiveCleanup() thinks that it is safe to do only a plain unlink()
for .ready and .done files when removing a past segment.  I don't think
that it is a smart move, as on a subsequent crash we may still see
those, but the related segment would have gone away.  This is not really
a problem for .done files, but it could confuse the archiver to see some
.ready files about things that have already gone away.
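For readers unfamiliar with the distinction, here is a minimal sketch of what "durable" removal means. It assumes POSIX and Linux-like semantics: unlink() the file, then fsync() the parent directory so that the removed directory entry is on disk before returning. The name sketch_durable_unlink() is hypothetical. PostgreSQL's own durable_unlink() in fd.c takes an elevel argument and reports errors through ereport(), and this sketch does not try to reproduce either behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_MAXPATH 1024     /* arbitrary buffer size for this sketch */

/*
 * Hypothetical helper: remove "path", then fsync() its parent directory so
 * the removal of the directory entry reaches disk before we return.
 * Without the directory fsync, the file may reappear after a crash.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
sketch_durable_unlink(const char *path)
{
    char        dirbuf[SKETCH_MAXPATH];
    int         dir_fd;

    /* Check the length up front so we never fail after unlinking. */
    if (strlen(path) >= sizeof(dirbuf))
    {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(dirbuf, path);       /* dirname() may modify its argument */

    if (unlink(path) != 0)
        return -1;

    /*
     * Opening a directory read-only and fsync()ing it works on Linux; some
     * platforms reject this, which a real implementation would tolerate.
     */
    dir_fd = open(dirname(dirbuf), O_RDONLY);
    if (dir_fd < 0)
        return -1;
    if (fsync(dir_fd) != 0)
    {
        int         save_errno = errno;

        close(dir_fd);
        errno = save_errno;
        return -1;
    }
    return close(dir_fd);
}
```

The extra fsync() on the directory is the cost Andres alludes to later in the thread. A plain unlink() does not pay it.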

Attached is a patch.  Thoughts?
--
Michael

archive-clean-durable.patch (756 bytes)

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Andres Freund
Hi,

On 2018-09-28 12:28:27 +0900, Michael Paquier wrote:
> While reviewing the archiving code, I have bumped into the fact that
> XLogArchiveCleanup() thinks that it is safe to do only a plain unlink()
> for .ready and .done files when removing a past segment.  I don't think
> that it is a smart move, as on a subsequent crash we may still see
> those, but the related segment would have gone away.  This is not really
> a problem for .done files, but it could confuse the archiver to see some
> .ready files about things that have already gone away.

Isn't that window fundamentally there anyway?

- Andres

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Michael Paquier
On Thu, Sep 27, 2018 at 08:40:26PM -0700, Andres Freund wrote:

> On 2018-09-28 12:28:27 +0900, Michael Paquier wrote:
>> While reviewing the archiving code, I have bumped into the fact that
>> XLogArchiveCleanup() thinks that it is safe to do only a plain unlink()
>> for .ready and .done files when removing a past segment.  I don't think
>> that it is a smart move, as on a subsequent crash we may still see
>> those, but the related segment would have gone away.  This is not really
>> a problem for .done files, but it could confuse the archiver to see some
>> .ready files about things that have already gone away.
>
> Isn't that window fundamentally there anyway?
Sure.  However the point I would like to make is that if we have the
possibility to reduce this window, then we should.
--
Michael

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Andres Freund


On September 27, 2018 10:23:31 PM PDT, Michael Paquier <[hidden email]> wrote:

>On Thu, Sep 27, 2018 at 08:40:26PM -0700, Andres Freund wrote:
>> On 2018-09-28 12:28:27 +0900, Michael Paquier wrote:
>>> While reviewing the archiving code, I have bumped into the fact that
>>> XLogArchiveCleanup() thinks that it is safe to do only a plain unlink()
>>> for .ready and .done files when removing a past segment.  I don't think
>>> that it is a smart move, as on a subsequent crash we may still see
>>> those, but the related segment would have gone away.  This is not really
>>> a problem for .done files, but it could confuse the archiver to see some
>>> .ready files about things that have already gone away.
>>
>> Isn't that window fundamentally there anyway?
>
>Sure.  However the point I would like to make is that if we have the
>possibility to reduce this window, then we should.

It's not free though. I don't think this is as clear cut as you make it sound.

Andres

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Stephen Frost
Greetings,

* Michael Paquier ([hidden email]) wrote:
> While reviewing the archiving code, I have bumped into the fact that
> XLogArchiveCleanup() thinks that it is safe to do only a plain unlink()
> for .ready and .done files when removing a past segment.  I don't think
> that it is a smart move, as on a subsequent crash we may still see
> those, but the related segment would have gone away.  This is not really
> a problem for .done files, but it could confuse the archiver to see some
> .ready files about things that have already gone away.

Is there an issue with making the archiver able to understand that
situation instead of being confused by it..?  Seems like that'd probably
be a good thing to do regardless of this, but that would then remove the
need for this kind of change..

Thanks!

Stephen

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Michael Paquier
On Fri, Sep 28, 2018 at 02:36:19PM -0400, Stephen Frost wrote:
> Is there an issue with making the archiver able to understand that
> situation instead of being confused by it..?  Seems like that'd probably
> be a good thing to do regardless of this, but that would then remove the
> need for this kind of change..

I thought about that a bit, and a lot can also be done within the
archive_command itself regarding that, so I am not sure that there is
an argument for making pgarch.c more complicated than it needs to be.
Now it is true that for most users having a .ready file but no segment
would most likely lead to a failure.  I suspect that a large user base
is still just using plain cp in archive_command, which would cause the
archiver to be stuck.  So we could actually just tweak pgarch_readyXlog
to check if the fetched segment actually exists (see the bottom of that
function).  If it doesn't, then the archiver removes the .ready
file and retries fetching a new segment.
--
Michael

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Stephen Frost
Greetings,

* Michael Paquier ([hidden email]) wrote:

> On Fri, Sep 28, 2018 at 02:36:19PM -0400, Stephen Frost wrote:
> > Is there an issue with making the archiver able to understand that
> > situation instead of being confused by it..?  Seems like that'd probably
> > be a good thing to do regardless of this, but that would then remove the
> > need for this kind of change..
>
> I thought about that a bit, and a lot can also be done within the
> archive_command itself regarding that, so I am not sure that there is
> an argument for making pgarch.c more complicated than it needs to be.
> Now it is true that for most users having a .ready file but no segment
> would most likely lead to a failure.  I suspect that a large user base
> is still just using plain cp in archive_command, which would cause the
> archiver to be stuck.  So we could actually just tweak pgarch_readyXlog
> to check if the fetched segment actually exists (see the bottom of that
> function).  If it doesn't, then the archiver removes the .ready
> file and retries fetching a new segment.

Yes, checking if the WAL file exists before calling archive_command on
it is what I was thinking we'd do here, and if it doesn't, then just
remove the .ready file.

An alternative would be to go through the .ready files on crash-recovery
and remove any .ready files that don't have corresponding WAL files, or
if we felt that it was necessary, we could do that on every restart but
do we really think we'd need to do that..?
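Stephen's crash-recovery alternative could look roughly like the following sketch. It assumes status files are named "<segment>.ready" and live in their own status directory. remove_orphan_ready_files() and its parameters are hypothetical, and the sketch uses a plain unlink(). It is not the code from the patch posted later in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical startup-time cleanup: for every "<segment>.ready" file in
 * status_dir, check whether "<segment>" still exists in wal_dir and remove
 * the .ready file if stat() reports ENOENT.  Returns the number of orphan
 * status files removed, or -1 if status_dir cannot be opened.
 */
static int
remove_orphan_ready_files(const char *wal_dir, const char *status_dir)
{
    const char *suffix = ".ready";
    size_t      suffix_len = strlen(suffix);
    DIR        *dir = opendir(status_dir);
    struct dirent *de;
    int         removed = 0;

    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        size_t      len = strlen(de->d_name);
        char        segpath[1024];
        char        readypath[1024];
        struct stat st;

        /* Only consider names ending in ".ready". */
        if (len <= suffix_len ||
            strcmp(de->d_name + len - suffix_len, suffix) != 0)
            continue;

        /* The segment name is the status file name minus its suffix. */
        snprintf(segpath, sizeof(segpath), "%s/%.*s",
                 wal_dir, (int) (len - suffix_len), de->d_name);
        if (stat(segpath, &st) == 0 || errno != ENOENT)
            continue;           /* segment exists, or stat() failed otherwise */

        snprintf(readypath, sizeof(readypath), "%s/%s", status_dir, de->d_name);
        if (unlink(readypath) == 0)
            removed++;
    }
    closedir(dir);
    return removed;
}
```

The cost is one stat() per .ready file at startup, which is small unless archive_status holds a very large number of entries.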

Thanks!

Stephen

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Michael Paquier
On Fri, Sep 28, 2018 at 07:16:25PM -0400, Stephen Frost wrote:
> An alternative would be to go through the .ready files on crash-recovery
> and remove any .ready files that don't have corresponding WAL files, or
> if we felt that it was necessary, we could do that on every restart but
> do we really think we'd need to do that..?

Actually, what you are proposing here sounds much better to me.  That's
in the area of what has been done recently with RemoveTempXlogFiles() in
5fc1008e.  Any objections to doing something like that?
--
Michael

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Michael Paquier
On Sat, Sep 29, 2018 at 04:58:57PM +0900, Michael Paquier wrote:
> Actually, what you are proposing here sounds much better to me.  That's
> in the area of what has been done recently with RemoveTempXlogFiles() in
> 5fc1008e.  Any objections to doing something like that?

Okay.  I have hacked together a patch based on Stephen's idea,
attached.  Any opinions?
--
Michael

archive-missing-v1.patch (4K)

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Bossart, Nathan
One argument for instead checking WAL file existence before calling
archive_command might be to avoid the increased startup time.
Granted, any added delay from this patch is unlikely to be noticeable
unless your archiver is way behind and archive_status has a huge
number of files.  However, I have seen cases where startup is stuck on
other tasks like SyncDataDirectory() and RemovePgTempFiles() for a
very long time, so perhaps it is worth considering.

Nathan

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Kyotaro HORIGUCHI
At Fri, 02 Nov 2018 14:47:08 +0000, Nathan Bossart <[hidden email]> wrote in <[hidden email]>
> One argument for instead checking WAL file existence before calling
> archive_command might be to avoid the increased startup time.
> Granted, any added delay from this patch is unlikely to be noticeable
> unless your archiver is way behind and archive_status has a huge
> number of files.  However, I have seen cases where startup is stuck on
> other tasks like SyncDataDirectory() and RemovePgTempFiles() for a
> very long time, so perhaps it is worth considering.

While archive_mode is turned on, .ready files are created for all
existing WAL files that don't already have a status file. Thus the
archiver may wait for the earliest segment to have a .ready file. As a
result, pgarch_readyXLog could be modified to loop over WAL files, not
status files.  This prevents the confusion that comes from .ready
files for non-existent segment files.
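A rough sketch of this proposal, driving the archiver from WAL file names instead of status files, follows. It assumes regular WAL segment names are 24 uppercase hex digits, treats the lexicographically smallest name as the earliest, and relies on a hypothetical has_ready() callback to say whether a segment has a .ready marker. It ignores timeline history files and the other special cases pgarch.c handles, and it works on an in-memory list of names so it can be tested without a data directory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed shape of a regular WAL segment name: 24 uppercase hex digits. */
static bool
looks_like_wal_segment(const char *name)
{
    return strlen(name) == 24 && strspn(name, "0123456789ABCDEF") == 24;
}

/*
 * Hypothetical: walk the WAL file names (not the status files) and return
 * the earliest segment for which has_ready() is true, or NULL if none.
 * An orphan .ready file whose segment is gone is never consulted, so it
 * cannot confuse the archiver.
 */
static const char *
earliest_ready_segment(const char *const *wal_files, size_t nwal,
                       bool (*has_ready) (const char *segment))
{
    const char *best = NULL;

    for (size_t i = 0; i < nwal; i++)
    {
        const char *name = wal_files[i];

        if (!looks_like_wal_segment(name) || !has_ready(name))
            continue;
        if (best == NULL || strcmp(name, best) < 0)
            best = name;
    }
    return best;
}
```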

RemoveXlogFile as is doesn't get confused by .done files for
nonexistent segments.

We may leave useless .done/.ready files behind, but since we would no
longer scan over them, it does not matter how many are in the directory.

The remaining issue is removal of the files. Even if we blew away
the directory altogether, status files would be cleanly recreated,
and already-archived WAL segments would be archived again. However,
a redundant copy won't happen with our recommended configuration :p

# Yeah, I see almost all sites use a simple 'cp' or 'scp' for that..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Michael Paquier
On Thu, Nov 15, 2018 at 07:39:27PM +0900, Kyotaro HORIGUCHI wrote:
> At Fri, 02 Nov 2018 14:47:08 +0000, Nathan Bossart
> <[hidden email]> wrote in
> <[hidden email]>:
>> One argument for instead checking WAL file existence before calling
>> archive_command might be to avoid the increased startup time.

I guess that you mean the startup of the archive command itself here.
Yes that can be an issue with a high WAL output depending on the
interpreter of the archive command :(

>> Granted, any added delay from this patch is unlikely to be noticeable
>> unless your archiver is way behind and archive_status has a huge
>> number of files.  However, I have seen cases where startup is stuck on
>> other tasks like SyncDataDirectory() and RemovePgTempFiles() for a
>> very long time, so perhaps it is worth considering.

What's the scale of the pg_wal partition and the amount of time things
were stuck?  I would imagine that the sync phase hurts the most, and a
fast startup time for crash recovery is always important.

> While archive_mode is turned on, .ready files are created for all
> existing WAL files that don't already have a status file. Thus the
> archiver may wait for the earliest segment to have a .ready file.

Yes, RemoveOldXlogFiles() does that via XLogArchiveCheckDone().

> As a result,
> pgarch_readyXLog could be modified to loop over WAL files, not
> status files.  This prevents the confusion that comes from .ready
> files for non-existent segment files.

No, pgarch_readyXLog() should still look at .ready files as those are
here for this purpose, but we could have an additional check to see if
the segment linked with it actually exists and can be archived.  This
check could happen in pgarch.c before the archive command gets called
(just before pgarch_ArchiverCopyLoop and after XLogArchiveCommandSet
feels about right, and it should be cheap enough to call stat()).
--
Michael

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Michael Paquier
On Thu, Nov 22, 2018 at 01:16:09PM +0900, Michael Paquier wrote:
> No, pgarch_readyXLog() should still look at .ready files as those are
> here for this purpose, but we could have an additional check to see if
> the segment linked with it actually exists and can be archived.  This
> check could happen in pgarch.c before the archive command gets called
> (just before pgarch_ArchiverCopyLoop and after XLogArchiveCommandSet
> feels about right, and it should be cheap enough to call stat()).

s/pgarch_ArchiverCopyLoop/pgarch_archiveXlog/.

Attached is a patch shaped around the idea from upthread.
Thoughts?
--
Michael

archive-missing-v2.patch (1K)

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Bossart, Nathan
On 11/21/18, 10:16 PM, "Michael Paquier" <[hidden email]> wrote:

>> At Fri, 02 Nov 2018 14:47:08 +0000, Nathan Bossart
>> <[hidden email]> wrote in
>> <[hidden email]>:
>>> Granted, any added delay from this patch is unlikely to be noticeable
>>> unless your archiver is way behind and archive_status has a huge
>>> number of files.  However, I have seen cases where startup is stuck on
>>> other tasks like SyncDataDirectory() and RemovePgTempFiles() for a
>>> very long time, so perhaps it is worth considering.
>
> What's the scale of the pg_wal partition and the amount of time things
> were stuck?  I would imagine that the sync phase hurts the most, and a
> fast startup time for crash recovery is always important.

I don't have exact figures to share, but yes, a huge number of calls
to sync_file_range() and fsync() can use up a lot of time.  Presumably
Postgres processes files individually instead of using sync() because
sync() may return before writing is done.  Also, sync() would affect
non-Postgres files.  However, it looks like Linux actually does wait
for writing to complete before returning from sync() [0].

For RemovePgTempFiles(), the documentation above the function
indicates that skipping temp file cleanup during startup would
actually be okay because collisions with existing temp file names
should be handled by OpenTemporaryFile().  I assume this cleanup is
done during startup because there isn't a great alternative besides
offloading the work to a new background worker or something.

On 11/27/18, 6:35 AM, "Michael Paquier" <[hidden email]> wrote:
> Attached is a patch shaped around the idea from upthread.
> Thoughts?

I took a look at this patch.

+                       /*
+                        * In the event of a system crash, archive status files may be
+                        * left behind as their removals are not durable, cleaning up
+                        * orphan entries here is the cheapest method.  So check that
+                        * the segment trying to be archived still exists.
+                        */
+                       snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+                       if (stat(pathname, &stat_buf) != 0)
+                       {

Don't we also need to check that errno is ENOENT here?

+                               StatusFilePath(xlogready, xlog, ".ready");
+                               if (durable_unlink(xlogready, WARNING) == 0)
+                                       ereport(WARNING,
+                                                       (errmsg("removed orphan archive status file %s",
+                                                                       xlogready)));
+                               return;

IIUC any time that the file does not exist, we will attempt to unlink
it.  Regardless of whether unlinking fails or succeeds, we then
proceed to give up archiving for now, but it's not clear why.  Perhaps
we should retry unlinking a number of times (like we do for
pgarch_archiveXlog()) when durable_unlink() fails and simply "break"
to move on to the next .ready file if durable_unlink() succeeds.
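Nathan's errno point can be illustrated with a small, hypothetical classification helper. Only ENOENT proves the segment is gone. Any other stat() failure, such as a permission or I/O error, says nothing about whether it exists, so the .ready file should be kept. check_segment() and SegmentState are invented for this sketch and are not pgarch.c code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical classification of a segment before archiving it. */
typedef enum
{
    SEGMENT_PRESENT,            /* archive it as usual */
    SEGMENT_MISSING,            /* ENOENT: the .ready file is an orphan */
    SEGMENT_STAT_ERROR          /* anything else: keep .ready, retry later */
} SegmentState;

static SegmentState
check_segment(const char *wal_dir, const char *segname)
{
    char        path[1024];
    struct stat st;

    snprintf(path, sizeof(path), "%s/%s", wal_dir, segname);
    if (stat(path, &st) == 0)
        return SEGMENT_PRESENT;

    /*
     * Only ENOENT means the segment is gone.  Treating other errors as
     * "missing" could throw away a pending archive request.
     */
    if (errno == ENOENT)
        return SEGMENT_MISSING;
    return SEGMENT_STAT_ERROR;
}
```

With this split, only SEGMENT_MISSING leads to removing the .ready file and moving on to the next one. SEGMENT_STAT_ERROR leaves everything in place.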

Nathan

[0] http://man7.org/linux/man-pages/man2/sync.2.html

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Andres Freund
Hi,

On 2018-11-27 20:43:06 +0000, Bossart, Nathan wrote:
> I don't have exact figures to share, but yes, a huge number of calls
> to sync_file_range() and fsync() can use up a lot of time.  Presumably
> Postgres processes files individually instead of using sync() because
> sync() may return before writing is done.  Also, sync() would affect
> non-Postgres files.  However, it looks like Linux actually does wait
> for writing to complete before returning from sync() [0].

sync() has absolutely no way to report errors. So, we're never going to
be able to use it.  Besides, even postgres' temp files would be a good
reason to not use it.

Greetings,

Andres Freund

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Bossart, Nathan
On 11/27/18, 2:46 PM, "Andres Freund" <[hidden email]> wrote:

> On 2018-11-27 20:43:06 +0000, Bossart, Nathan wrote:
>> I don't have exact figures to share, but yes, a huge number of calls
>> to sync_file_range() and fsync() can use up a lot of time.  Presumably
>> Postgres processes files individually instead of using sync() because
>> sync() may return before writing is done.  Also, sync() would affect
>> non-Postgres files.  However, it looks like Linux actually does wait
>> for writing to complete before returning from sync() [0].
>
> sync() has absolutely no way to report errors. So, we're never going to
> be able to use it.  Besides, even postgres' temp files would be a good
> reason to not use it.

Ah, I see.  Thanks for clarifying.

Nathan

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Michael Paquier
On Tue, Nov 27, 2018 at 08:43:06PM +0000, Bossart, Nathan wrote:
> Don't we also need to check that errno is ENOENT here?

Yep.

> IIUC any time that the file does not exist, we will attempt to unlink
> it.  Regardless of whether unlinking fails or succeeds, we then
> proceed to give up archiving for now, but it's not clear why.  Perhaps
> we should retry unlinking a number of times (like we do for
> pgarch_archiveXlog()) when durable_unlink() fails and simply "break"
> to move on to the next .ready file if durable_unlink() succeeds.

Both suggestions sound reasonable to me.  (durable_unlink is not called
on HEAD in pgarch_archiveXlog.)  How about 3 retries with an in-between
wait of 1s?  That's consistent with what pgarch_ArchiverCopyLoop does,
though I am not completely sure if we actually want to be consistent for
the purpose of removing orphaned ready files.
--
Michael

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Bossart, Nathan
On 11/27/18, 3:20 PM, "Michael Paquier" <[hidden email]> wrote:

> On Tue, Nov 27, 2018 at 08:43:06PM +0000, Bossart, Nathan wrote:
>> IIUC any time that the file does not exist, we will attempt to unlink
>> it.  Regardless of whether unlinking fails or succeeds, we then
>> proceed to give up archiving for now, but it's not clear why.  Perhaps
>> we should retry unlinking a number of times (like we do for
>> pgarch_archiveXlog()) when durable_unlink() fails and simply "break"
>> to move on to the next .ready file if durable_unlink() succeeds.
>
> Both suggestions sound reasonable to me.  (durable_unlink is not called
> on HEAD in pgarch_archiveXlog.)  How about 3 retries with an in-between
> wait of 1s?  That's consistent with what pgarch_ArchiverCopyLoop does,
> though I am not completely sure if we actually want to be consistent for
> the purpose of removing orphaned ready files.

That sounds good to me.  I was actually thinking of using the same
retry counter that we use for pgarch_archiveXlog(), but on second
thought, it is probably better to have two independent retry counters
for these two unrelated operations.

Nathan

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Michael Paquier
On Tue, Nov 27, 2018 at 09:49:29PM +0000, Bossart, Nathan wrote:
> That sounds good to me.  I was actually thinking of using the same
> retry counter that we use for pgarch_archiveXlog(), but on second
> thought, it is probably better to have two independent retry counters
> for these two unrelated operations.

In case what I wrote was unclear, what I had in mind was two different
variables, possibly with the same value, as archiving failures and
failures in orphan file removal are two different concepts.
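Here is a hypothetical sketch of the retry scheme agreed on above. It makes up to 3 attempts with a wait between them, using a counter kept separate from the archiver's own archive-retry counter in pgarch.c. SKETCH_NUM_ORPHAN_CLEANUP_RETRIES and remove_orphan_ready() are invented names. The wait is a parameter only so the sketch can be exercised without sleeping; the thread suggests 1 second. For brevity a plain unlink() is used, whereas the patch under discussion calls durable_unlink().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical retry limit for orphan .ready removal.  Deliberately a
 * separate constant from the archiver's archive-retry limit, even though
 * it starts with the same value.
 */
#define SKETCH_NUM_ORPHAN_CLEANUP_RETRIES 3

/*
 * Try to remove an orphan .ready file, waiting wait_secs between attempts.
 * Returns true once the file is gone (including when it was already
 * removed), false if every attempt failed.
 */
static bool
remove_orphan_ready(const char *ready_path, unsigned int wait_secs)
{
    for (int attempt = 0; attempt < SKETCH_NUM_ORPHAN_CLEANUP_RETRIES; attempt++)
    {
        if (unlink(ready_path) == 0 || errno == ENOENT)
            return true;
        /* No point sleeping after the final attempt. */
        if (attempt + 1 < SKETCH_NUM_ORPHAN_CLEANUP_RETRIES)
            sleep(wait_secs);
    }
    return false;
}
```

On success the caller can move on to the next .ready file. On failure it can give up for this cycle without touching the archiving retry state.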
--
Michael

Re: Use durable_unlink for .ready and .done files for WAL segment removal

Bossart, Nathan
On 11/27/18, 3:53 PM, "Michael Paquier" <[hidden email]> wrote:
> On Tue, Nov 27, 2018 at 09:49:29PM +0000, Bossart, Nathan wrote:
>> That sounds good to me.  I was actually thinking of using the same
>> retry counter that we use for pgarch_archiveXlog(), but on second
>> thought, it is probably better to have two independent retry counters
>> for these two unrelated operations.
>
> In case what I wrote was unclear, what I had in mind was two different
> variables, possibly with the same value, as archiving failures and
> failures in orphan file removal are two different concepts.

+1

Nathan
