Wait event that should be reported while waiting for WAL archiving to finish

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Wait event that should be reported while waiting for WAL archiving to finish

Fujii Masao-4
Hi,

When I saw pg_stat_activity.wait_event while pg_basebackup -X none
is waiting for WAL archiving to finish, it was either NULL or
CheckpointDone. I think this is confusing. What about introducing
new wait_event like WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
(BackupWaitWalArchive) and reporting it during that period?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters


Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Michael Paquier-2
On Thu, Feb 13, 2020 at 02:29:20AM +0900, Fujii Masao wrote:
> When I saw pg_stat_activity.wait_event while pg_basebackup -X none
> is waiting for WAL archiving to finish, it was either NULL or
> CheckpointDone. I think this is confusing. What about introducing
> new wait_event like WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
> (BackupWaitWalArchive) and reporting it during that period?

Sounds like a good idea to me.  You need to be careful that this does
not overwrite more low-level wait event registration though, so that
could be more tricky than it looks at first sight.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Fujii Masao-4


On 2020/02/13 12:28, Michael Paquier wrote:

> On Thu, Feb 13, 2020 at 02:29:20AM +0900, Fujii Masao wrote:
>> When I saw pg_stat_activity.wait_event while pg_basebackup -X none
>> is waiting for WAL archiving to finish, it was either NULL or
>> CheckpointDone. I think this is confusing. What about introducing
>> new wait_event like WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
>> (BackupWaitWalArchive) and reporting it during that period?
>
> Sounds like a good idea to me.  You need to be careful that this does
> not overwrite more low-level wait event registration though, so that
> could be more tricky than it looks at first sight.
Thanks for the advise! Patch attached.

I found that the wait events "LogicalRewriteTruncate" and
"GSSOpenServer" are not documented. I'm thinking to add
them into doc separately if ok.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

wait_event_wal_archive_v1.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Michael Paquier-2
On Thu, Feb 13, 2020 at 03:35:50PM +0900, Fujii Masao wrote:
> I found that the wait events "LogicalRewriteTruncate" and
> "GSSOpenServer" are not documented. I'm thinking to add
> them into doc separately if ok.

Nice catch.  The ordering of the entries is not respected either for
GSSOpenServer in pgstat.h.  The portion for the code and the docs can
be fixed in back-branches, but not the enum list in WaitEventClient or
we would have an ABI breakage.  But this can be fixed on HEAD.  Can
you take care of it?  If you need help, please feel free to poke me. I
think that this should be fixed first, before adding the new event.

>           <entry><literal>SyncRep</literal></entry>
>           <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
>          </row>
> +        <row>
> +         <entry><literal>BackupWaitWalArchive</literal></entry>
> +         <entry>Waiting for WAL files required for the backup to be successfully archived.</entry>
> +        </row>

The category IPC is adapted.  You forgot to update the markup morerows
from "36" to "37", causing the table of the wait events to have a
weird format (the bottom should be incorrect).

> + pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
>   while (XLogArchiveIsBusy(lastxlogfilename) ||
>     XLogArchiveIsBusy(histfilename))
>   {
> @@ -11120,6 +11121,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
>   "but the database backup will not be usable without all the WAL segments.")));
>   }
>   }
> + pgstat_report_wait_end();

Okay, that position is right.

> @@ -3848,6 +3848,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
>   case WAIT_EVENT_SYNC_REP:
>   event_name = "SyncRep";
>   break;
> + case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
> + event_name = "BackupWaitWalArchive";
> + break;
>   /* no default case, so that compiler will warn */
> [...]
> @@ -853,7 +853,8 @@ typedef enum
>   WAIT_EVENT_REPLICATION_ORIGIN_DROP,
>   WAIT_EVENT_REPLICATION_SLOT_DROP,
>   WAIT_EVENT_SAFE_SNAPSHOT,
> - WAIT_EVENT_SYNC_REP
> + WAIT_EVENT_SYNC_REP,
> + WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
>  } WaitEventIPC;
It would be good to keep entries in alphabetical order in the header,
the code and in the docs (see the effort from 5ef037c), and your patch
is missing that concept for all three places where it matters for this
new event.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Fujii Masao-4


On 2020/02/13 16:30, Michael Paquier wrote:

> On Thu, Feb 13, 2020 at 03:35:50PM +0900, Fujii Masao wrote:
>> I found that the wait events "LogicalRewriteTruncate" and
>> "GSSOpenServer" are not documented. I'm thinking to add
>> them into doc separately if ok.
>
> Nice catch.  The ordering of the entries is not respected either for
> GSSOpenServer in pgstat.h.  The portion for the code and the docs can
> be fixed in back-branches, but not the enum list in WaitEventClient or
> we would have an ABI breakage.  But this can be fixed on HEAD.  Can
> you take care of it?
Yes. Patch attached.

logical_rewrite_truncate_v1.patch adds the description of
LogicalRewriteTruncate into the doc. This needs to be
back-patched to v10 where commit 249cf070e3 introduced
LogicalRewriteTruncate event.

gss_open_server_v1.patch adds the description of GSSOpenServer
into the doc and update the code in pgstat_get_wait_client().
This needs to be applied in v12 where commit b0b39f72b9 introduced
GSSOpenServer event.

gss_open_server_for_master_v1.patch does not only what the above
patch does but also update wait event enum into alphabetical order.
This needs to be applied in the master.

>
>>            <entry><literal>SyncRep</literal></entry>
>>            <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
>>           </row>
>> +        <row>
>> +         <entry><literal>BackupWaitWalArchive</literal></entry>
>> +         <entry>Waiting for WAL files required for the backup to be successfully archived.</entry>
>> +        </row>
>
> The category IPC is adapted.  You forgot to update the markup morerows
> from "36" to "37", causing the table of the wait events to have a
> weird format (the bottom should be incorrect).
Fixed. Thanks for the review!

>
>> + pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
>>   while (XLogArchiveIsBusy(lastxlogfilename) ||
>>     XLogArchiveIsBusy(histfilename))
>>   {
>> @@ -11120,6 +11121,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
>>   "but the database backup will not be usable without all the WAL segments.")));
>>   }
>>   }
>> + pgstat_report_wait_end();
>
> Okay, that position is right.
>
>> @@ -3848,6 +3848,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
>>   case WAIT_EVENT_SYNC_REP:
>>   event_name = "SyncRep";
>>   break;
>> + case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
>> + event_name = "BackupWaitWalArchive";
>> + break;
>>   /* no default case, so that compiler will warn */
>> [...]
>> @@ -853,7 +853,8 @@ typedef enum
>>   WAIT_EVENT_REPLICATION_ORIGIN_DROP,
>>   WAIT_EVENT_REPLICATION_SLOT_DROP,
>>   WAIT_EVENT_SAFE_SNAPSHOT,
>> - WAIT_EVENT_SYNC_REP
>> + WAIT_EVENT_SYNC_REP,
>> + WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
>>   } WaitEventIPC;
>
> It would be good to keep entries in alphabetical order in the header,
> the code and in the docs (see the effort from 5ef037c), and your patch
> is missing that concept for all three places where it matters for this
> new event.
Fixed. Patch attached.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

gss_open_server_for_master_v1.patch (2K) Download Attachment
gss_open_server_v1.patch (2K) Download Attachment
wait_event_wal_archive_v2.patch (2K) Download Attachment
logical_rewrite_truncate_v1.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Michael Paquier-2
On Fri, Feb 14, 2020 at 12:47:19PM +0900, Fujii Masao wrote:
> logical_rewrite_truncate_v1.patch adds the description of
> LogicalRewriteTruncate into the doc. This needs to be
> back-patched to v10 where commit 249cf070e3 introduced
> LogicalRewriteTruncate event.

Indeed.  You just be careful about the number of fields for morerows,
as that's not the same across branches.

> gss_open_server_v1.patch adds the description of GSSOpenServer
> into the doc and update the code in pgstat_get_wait_client().
> This needs to be applied in v12 where commit b0b39f72b9 introduced
> GSSOpenServer event.
>
> gss_open_server_for_master_v1.patch does not only what the above
> patch does but also update wait event enum into alphabetical order.
> This needs to be applied in the master.

Thanks for splitting things.  All that looks correct to me.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Robert Haas
In reply to this post by Fujii Masao-4
On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
<[hidden email]> wrote:
> Fixed. Thanks for the review!

I think it would be safer to just report the wait event during
pg_usleep(1000000L) rather than putting those calls around the whole
loop. It does not seem impossible that ereport() or
CHECK_FOR_INTERRUPTS() could do something that reports a wait event
internally.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Fujii Masao-4
In reply to this post by Michael Paquier-2


On 2020/02/14 15:45, Michael Paquier wrote:

> On Fri, Feb 14, 2020 at 12:47:19PM +0900, Fujii Masao wrote:
>> logical_rewrite_truncate_v1.patch adds the description of
>> LogicalRewriteTruncate into the doc. This needs to be
>> back-patched to v10 where commit 249cf070e3 introduced
>> LogicalRewriteTruncate event.
>
> Indeed.  You just be careful about the number of fields for morerows,
> as that's not the same across branches.
>
>> gss_open_server_v1.patch adds the description of GSSOpenServer
>> into the doc and update the code in pgstat_get_wait_client().
>> This needs to be applied in v12 where commit b0b39f72b9 introduced
>> GSSOpenServer event.
>>
>> gss_open_server_for_master_v1.patch does not only what the above
>> patch does but also update wait event enum into alphabetical order.
>> This needs to be applied in the master.
>
> Thanks for splitting things.  All that looks correct to me.

Thanks for the review! Pushed the patches for
LogicalRewriteTruncate and GSSOpenServer.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters


Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Fujii Masao-4
In reply to this post by Robert Haas


On 2020/02/14 23:43, Robert Haas wrote:
> On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
> <[hidden email]> wrote:
>> Fixed. Thanks for the review!
>
> I think it would be safer to just report the wait event during
> pg_usleep(1000000L) rather than putting those calls around the whole
> loop. It does not seem impossible that ereport() or
> CHECK_FOR_INTERRUPTS() could do something that reports a wait event
> internally.

OK, so I attached the updated version of the patch.
Thanks for the review!

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

wait_event_wal_archive_v3.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Michael Paquier-2
On Mon, Feb 17, 2020 at 04:30:00PM +0900, Fujii Masao wrote:

> On 2020/02/14 23:43, Robert Haas wrote:
>> On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
>> <[hidden email]> wrote:
>>> Fixed. Thanks for the review!
>>
>> I think it would be safer to just report the wait event during
>> pg_usleep(1000000L) rather than putting those calls around the whole
>> loop. It does not seem impossible that ereport() or
>> CHECK_FOR_INTERRUPTS() could do something that reports a wait event
>> internally.
CHECK_FOR_INTERRUPTS() would reset the event wait state.  Hm..  You
may be right about the WARNING and it would be better to not rely on
that.  Do you remember the states which may be triggered?

> OK, so I attached the updated version of the patch.
> Thanks for the review!

Actually, I have some questions:
1) Should a new wait event be added in recoveryPausesHere()?  That
would be IMO useful.
2) Perhaps those two points should be replaced with WaitLatch(), where
we would use the new wait events introduced?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Fujii Masao-4


On 2020/02/17 18:48, Michael Paquier wrote:

> On Mon, Feb 17, 2020 at 04:30:00PM +0900, Fujii Masao wrote:
>> On 2020/02/14 23:43, Robert Haas wrote:
>>> On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
>>> <[hidden email]> wrote:
>>>> Fixed. Thanks for the review!
>>>
>>> I think it would be safer to just report the wait event during
>>> pg_usleep(1000000L) rather than putting those calls around the whole
>>> loop. It does not seem impossible that ereport() or
>>> CHECK_FOR_INTERRUPTS() could do something that reports a wait event
>>> internally.
>
> CHECK_FOR_INTERRUPTS() would reset the event wait state.  Hm..  You
> may be right about the WARNING and it would be better to not rely on
> that.  Do you remember the states which may be triggered?
>
>> OK, so I attached the updated version of the patch.
>> Thanks for the review!
>
> Actually, I have some questions:
> 1) Should a new wait event be added in recoveryPausesHere()?  That
> would be IMO useful.

Yes, it's useful, I think. But it's better to implement that
as a separate patch.

> 2) Perhaps those two points should be replaced with WaitLatch(), where
> we would use the new wait events introduced?

For what? Maybe it should, but I'm not sure it's worth the trouble.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters


Reply | Threaded
Open this post in threaded view
|

Re: Wait event that should be reported while waiting for WAL archiving to finish

Michael Paquier-2
On Mon, Feb 17, 2020 at 10:21:23PM +0900, Fujii Masao wrote:
> On 2020/02/17 18:48, Michael Paquier wrote:
>> Actually, I have some questions:
>> 1) Should a new wait event be added in recoveryPausesHere()?  That
>> would be IMO useful.
>
> Yes, it's useful, I think. But it's better to implement that
> as a separate patch.

No problem for me.

>> 2) Perhaps those two points should be replaced with WaitLatch(), where
>> we would use the new wait events introduced?
>
> For what? Maybe it should, but I'm not sure it's worth the trouble.

I don't have more to offer than signal handling consistency for both
without relying on pg_usleep()'s behavior depending on the platform,
and power consumption.  For the recovery pause, the second argument
may not be worth carrying, but we never had this argument for the
archiving wait, did we?  For both, on top of it you don't need to
worry about concurrent issues with the wait events attached around.
--
Michael

signature.asc (849 bytes) Download Attachment