About to add WAL write/fsync statistics to pg_stat_wal view

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
67 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiko Sawada
On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda <[hidden email]> wrote:
>
> Hi, thanks for the reviews.
>
> I updated the attached patch.

Thank you for updating the patch!

> The summary of the changes is following.
>
> 1. fix document
>
> I followed another view's comments.
>
>
> 2. refactor issue_xlog_fsync()
>
> I removed "sync_called" variables, narrowed the "duration" scope and
> change the switch statement to if statement.

Looking at the code again, I think if we check if an fsync was really
called when calculating the I/O time, it's better to check that before
starting the measurement.

    bool issue_fsync = false;

    if (enableFsync &&
        (sync_method == SYNC_METHOD_FSYNC ||
         sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
         sync_method == SYNC_METHOD_FDATASYNC))
    {
        if (track_wal_io_timing)
            INSTR_TIME_SET_CURRENT(start);
        issue_fsync = true;
    }
    (snip)
    if (issue_fsync)
    {
        if (track_wal_io_timing)
        {
            instr_time  duration;

            INSTR_TIME_SET_CURRENT(duration);
            INSTR_TIME_SUBTRACT(duration, start);
            WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);
        }
        WalStats.m_wal_sync++;
    }

So I prefer either the above which is a modified version of the
original approach or my idea that doesn’t introduce a new local
variable I proposed before. But I'm not going to insist on that.

>
>
> 3. make wal-receiver report WAL statistics
>
> I add the code to collect the statistics for a written operation
> in XLogWalRcvWrite() and to report stats in WalReceiverMain().
>
> Since WalReceiverMain() can loop fast, to avoid loading stats collector,
> I add "force" argument to the pgstat_send_wal function. If "force" is
> false, it can skip reporting until at least 500 msec since it last
> reported. WalReceiverMain() almost calls pgstat_send_wal() with "force"
> as false.

 void
-pgstat_send_wal(void)
+pgstat_send_wal(bool force)
 {
    /* We assume this initializes to zeroes */
    static const PgStat_MsgWal all_zeroes;
+   static TimestampTz last_report = 0;

+   TimestampTz now;
    WalUsage    walusage;

+   /*
+    * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+    * msec since we last sent one or specified "force".
+    */
+   now = GetCurrentTimestamp();
+   if (!force &&
+       !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
+       return;
+
+   last_report = now;

Hmm, I don’t think it's good to use PGSTAT_STAT_INTERVAL for this
purpose since it is used as a minimum time for stats file updates. If
we want an interval, I think we should define another one Also, with
the patch, pgstat_send_wal() calls GetCurrentTimestamp() every time
even if track_wal_io_timing is off, which is not good. On the other
hand, I agree that your concern that the wal receiver should not send
the stats for whenever receiving wal records. So an idea could be to
send the wal stats when finishing the current WAL segment file and
when timeout in the main loop. That way we can guarantee that the wal
stats on a replica is updated at least every time finishing a WAL
segment file when actively receiving WAL records and every
NAPTIME_PER_CYCLE in other cases.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
On 2021-01-26 00:03, Masahiko Sawada wrote:

> On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda
> <[hidden email]> wrote:
>>
>> Hi, thanks for the reviews.
>>
>> I updated the attached patch.
>
> Thank you for updating the patch!
>
>> The summary of the changes is following.
>>
>> 1. fix document
>>
>> I followed another view's comments.
>>
>>
>> 2. refactor issue_xlog_fsync()
>>
>> I removed "sync_called" variables, narrowed the "duration" scope and
>> change the switch statement to if statement.
>
> Looking at the code again, I think if we check if an fsync was really
> called when calculating the I/O time, it's better to check that before
> starting the measurement.
>
>     bool issue_fsync = false;
>
>     if (enableFsync &&
>         (sync_method == SYNC_METHOD_FSYNC ||
>          sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
>          sync_method == SYNC_METHOD_FDATASYNC))
>     {
>         if (track_wal_io_timing)
>             INSTR_TIME_SET_CURRENT(start);
>         issue_fsync = true;
>     }
>     (snip)
>     if (issue_fsync)
>     {
>         if (track_wal_io_timing)
>         {
>             instr_time  duration;
>
>             INSTR_TIME_SET_CURRENT(duration);
>             INSTR_TIME_SUBTRACT(duration, start);
>             WalStats.m_wal_sync_time =
> INSTR_TIME_GET_MICROSEC(duration);
>         }
>         WalStats.m_wal_sync++;
>     }
>
> So I prefer either the above which is a modified version of the
> original approach or my idea that doesn’t introduce a new local
> variable I proposed before. But I'm not going to insist on that.
Thanks for the comments.
I change the code to the above.

>>
>>
>> 3. make wal-receiver report WAL statistics
>>
>> I add the code to collect the statistics for a written operation
>> in XLogWalRcvWrite() and to report stats in WalReceiverMain().
>>
>> Since WalReceiverMain() can loop fast, to avoid loading stats
>> collector,
>> I add "force" argument to the pgstat_send_wal function. If "force" is
>> false, it can skip reporting until at least 500 msec since it last
>> reported. WalReceiverMain() almost calls pgstat_send_wal() with
>> "force"
>> as false.
>
>  void
> -pgstat_send_wal(void)
> +pgstat_send_wal(bool force)
>  {
>     /* We assume this initializes to zeroes */
>     static const PgStat_MsgWal all_zeroes;
> +   static TimestampTz last_report = 0;
>
> +   TimestampTz now;
>     WalUsage    walusage;
>
> +   /*
> +    * Don't send a message unless it's been at least
> PGSTAT_STAT_INTERVAL
> +    * msec since we last sent one or specified "force".
> +    */
> +   now = GetCurrentTimestamp();
> +   if (!force &&
> +       !TimestampDifferenceExceeds(last_report, now,
> PGSTAT_STAT_INTERVAL))
> +       return;
> +
> +   last_report = now;
>
> Hmm, I don’t think it's good to use PGSTAT_STAT_INTERVAL for this
> purpose since it is used as a minimum time for stats file updates. If
> we want an interval, I think we should define another one Also, with
> the patch, pgstat_send_wal() calls GetCurrentTimestamp() every time
> even if track_wal_io_timing is off, which is not good. On the other
> hand, I agree that your concern that the wal receiver should not send
> the stats for whenever receiving wal records. So an idea could be to
> send the wal stats when finishing the current WAL segment file and
> when timeout in the main loop. That way we can guarantee that the wal
> stats on a replica is updated at least every time finishing a WAL
> segment file when actively receiving WAL records and every
> NAPTIME_PER_CYCLE in other cases.
I agree with your comments. I think it should report when
reaching the end of WAL too. I add the code to report the stats
when finishing the current WAL segment file when timeout in the
main loop and when reaching the end of WAL.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

v6_v7.diff (16K) Download Attachment
v7-0001-Add-statistics-related-to-write-sync-wal-records.patch (25K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

David G Johnston
In reply to this post by Masahiko Sawada
On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada <[hidden email]> wrote:
On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda <[hidden email]> wrote:
>
> Hi, thanks for the reviews.
>
> I updated the attached patch.

Thank you for updating the patch!
 
Your original email with "total number of times" is more correct, removing the "of times" and just writing "total number of WAL" is not good wording.

Specifically, this change is strictly worse than the original.

-       Number of times WAL data was written to disk because WAL buffers became full
+       Total number of WAL data written to disk because WAL buffers became full

Both have the flaw that they leave implied exactly what it means to "write WAL to disk".  It is also unclear whether a counter, bytes, or both, would be more useful here. I've incorporated this into my documentation suggestions below:

(wal_buffers_full)
-- Revert - the original was better, though maybe add more detail similar to the below.  I didn't research exactly how this works.

(wal_write)
The number of times WAL buffers were written out to disk via XLogWrite

-- Seems like this should have a bytes version too

(wal_write_time)
The amount of time spent writing WAL buffers to disk, excluding sync time unless the wal_sync_method is either open_datasync or open_sync.
Units are in milliseconds with microsecond resolution.  This is zero when track_wal_io_timing is disabled.

(wal_sync)
The number of times WAL files were synced to disk while wal_sync_method was set to one of the "sync at commit" options (i.e., fdatasync, fsync, or fsync_writethrough).

-- it is not going to be zero just because those settings are presently disabled as they could have been enabled at some point since the last time these statistics were reset.

(wal_sync_time)
The amount of time spent syncing WAL files to disk, in milliseconds with microsecond resolution.  This requires setting wal_sync_method to one of the "sync at commit" options (i.e., fdatasync, fsync, or fsync_writethrough).


Also,

I would suggest extracting the changes to postmaster/pgstat.c and replication/walreceiver.c to a separate patch as you've fundamentally changed how it behaves with regards to that function and how it interacts with the WAL receiver.  That seems an entirely separate topic warranting its own patch and discussion.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

David G Johnston
In reply to this post by Masahiro Ikeda
On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda <[hidden email]> wrote:

I agree with your comments. I think it should report when
reaching the end of WAL too. I add the code to report the stats
when finishing the current WAL segment file when timeout in the
main loop and when reaching the end of WAL.


The following is not an improvement:

- /* Send WAL statistics to the stats collector. */
+ /* Send WAL statistics to stats collector */

The word "the" there makes it proper English.  Your copy-pasting should have kept the existing good wording in the other locations rather than replace the existing location with the newly added incorrect wording.

This doesn't make sense:

* current WAL segment file to avoid loading stats collector.

Maybe "overloading" or "overwhelming"?

I see you removed the pgstat_send_wal(force) change.  The rest of my comments on the v6 patch still stand I believe.

David J.


Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
In reply to this post by David G Johnston
Hi, David.

Thanks for your comments.

On 2021-01-26 08:48, David G. Johnston wrote:

> On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada
> <[hidden email]> wrote:
>
>> On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda
>> <[hidden email]> wrote:
>>>
>>> Hi, thanks for the reviews.
>>>
>>> I updated the attached patch.
>>
>> Thank you for updating the patch!
>
> Your original email with "total number of times" is more correct,
> removing the "of times" and just writing "total number of WAL" is not
> good wording.
>
> Specifically, this change is strictly worse than the original.
>
> -       Number of times WAL data was written to disk because WAL
> buffers became full
> +       Total number of WAL data written to disk because WAL buffers
> became full
>
> Both have the flaw that they leave implied exactly what it means to
> "write WAL to disk".  It is also unclear whether a counter, bytes, or
> both, would be more useful here. I've incorporated this into my
> documentation suggestions below:
> (wal_buffers_full)
>
> -- Revert - the original was better, though maybe add more detail
> similar to the below.  I didn't research exactly how this works.

OK, I understood.
I reverted since this is a counter statistics.


> (wal_write)
> The number of times WAL buffers were written out to disk via XLogWrite
>

Thanks.

I thought it's better to omit "The" and "XLogWrite" because other views'
description
omits "The" and there is no description of "XlogWrite" in the documents.
What do you think?

> -- Seems like this should have a bytes version too

Do you mean that we need to separate statistics for wal write?


> (wal_write_time)
> The amount of time spent writing WAL buffers to disk, excluding sync
> time unless the wal_sync_method is either open_datasync or open_sync.
> Units are in milliseconds with microsecond resolution.  This is zero
> when track_wal_io_timing is disabled.

Thanks, I'll fix it.


> (wal_sync)
> The number of times WAL files were synced to disk while
> wal_sync_method was set to one of the "sync at commit" options (i.e.,
> fdatasync, fsync, or fsync_writethrough).

Thanks, I'll fix it.


> -- it is not going to be zero just because those settings are
> presently disabled as they could have been enabled at some point since
> the last time these statistics were reset.

Right, your description is correct.
The "track_wal_io_timing" has the same limitation, doesn't it?


> (wal_sync_time)
> The amount of time spent syncing WAL files to disk, in milliseconds
> with microsecond resolution.  This requires setting wal_sync_method to
> one of the "sync at commit" options (i.e., fdatasync, fsync, or
> fsync_writethrough).

Thanks, I'll fix it.
I will add the comments related to "track_wal_io_timing".


> Also,
>
> I would suggest extracting the changes to postmaster/pgstat.c and
> replication/walreceiver.c to a separate patch as you've fundamentally
> changed how it behaves with regards to that function and how it
> interacts with the WAL receiver.  That seems an entirely separate
> topic warranting its own patch and discussion.

OK, I will separate two patches.


On 2021-01-26 08:52, David G. Johnston wrote:

> On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda
> <[hidden email]> wrote:
>
>> I agree with your comments. I think it should report when
>> reaching the end of WAL too. I add the code to report the stats
>> when finishing the current WAL segment file when timeout in the
>> main loop and when reaching the end of WAL.
>
> The following is not an improvement:
>
> - /* Send WAL statistics to the stats collector. */+ /* Send WAL
> statistics to stats collector */
>
> The word "the" there makes it proper English.  Your copy-pasting
> should have kept the existing good wording in the other locations
> rather than replace the existing location with the newly added
> incorrect wording.

Thanks, I'll fix it.


> This doesn't make sense:
>
> * current WAL segment file to avoid loading stats collector.
>
> Maybe "overloading" or "overwhelming"?
>
> I see you removed the pgstat_send_wal(force) change.  The rest of my
> comments on the v6 patch still stand I believe.

Yes, "overloading" is right. Thanks.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

David G Johnston
On Mon, Jan 25, 2021 at 11:56 PM Masahiro Ikeda <[hidden email]> wrote:

> (wal_write)
> The number of times WAL buffers were written out to disk via XLogWrite
>

Thanks.

I thought it's better to omit "The" and "XLogWrite" because other views'
description
omits "The" and there is no description of "XlogWrite" in the documents.
What do you think?


The documentation for WAL does get into the public API level of detail and doing so here makes what this measures crystal clear.  The potential absence of sufficient detail elsewhere should be corrected instead of making this description more vague.  Specifically, probably XLogWrite should be added to the WAL overview as part of this update and probably even have the descriptive section of the documentation note that the number of times that said function is executed is exposed as a counter in the wal statistics table - thus closing the loop.

David J.
Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
On 2021-01-27 00:14, David G. Johnston wrote:

> On Mon, Jan 25, 2021 at 11:56 PM Masahiro Ikeda
> <[hidden email]> wrote:
>
>>> (wal_write)
>>> The number of times WAL buffers were written out to disk via
>> XLogWrite
>>>
>>
>> Thanks.
>>
>> I thought it's better to omit "The" and "XLogWrite" because other
>> views'
>> description
>> omits "The" and there is no description of "XlogWrite" in the
>> documents.
>> What do you think?
>
> The documentation for WAL does get into the public API level of detail
> and doing so here makes what this measures crystal clear.  The
> potential absence of sufficient detail elsewhere should be corrected
> instead of making this description more vague.  Specifically, probably
> XLogWrite should be added to the WAL overview as part of this update
> and probably even have the descriptive section of the documentation
> note that the number of times that said function is executed is
> exposed as a counter in the wal statistics table - thus closing the
> loop.
Thanks for your comments.

I added the descriptions in documents and separated the patch
into attached two patches. First is to add wal i/o activity statistics.
Second is to make the wal receiver report the wal statistics.

Please let me know if you have any comments.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

v8-0001-Add-statistics-related-to-write-sync-wal-records.patch (25K) Download Attachment
v8-0002-Makes-the-wal-receiver-report-WAL-statistics.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
I pgindented the patches.

Regards
--
Masahiro Ikeda
NTT DATA CORPORATION

v9-0001-Add-statistics-related-to-write-sync-wal-records.patch (25K) Download Attachment
v9-0002-Makes-the-wal-receiver-report-WAL-statistics.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Fujii Masao-4


On 2021/02/05 8:45, Masahiro Ikeda wrote:
> I pgindented the patches.

Thanks for updating the patches!

+       <function>XLogWrite</function>, which nomally called by an
+       <function>issue_xlog_fsync</function>, which nomally called by an

Typo: "nomally" should be "normally"?

+       <function>XLogFlush</function> request(see <xref linkend="wal-configuration"/>)
+       <function>XLogFlush</function> request(see <xref linkend="wal-configuration"/>),

Isn't it better to add a space character just after "request"?

+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ WalStats.m_wal_write_time = INSTR_TIME_GET_MICROSEC(duration);

If several cycles happen in the do-while loop, m_wal_write_time should be
updated with the sum of "duration" in those cycles instead of "duration"
in the last cycle? If yes, "+=" should be used instead of "=" when updating
m_wal_write_time?

+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);

Also "=" should be "+=" in the above?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Fujii Masao-4


On 2021/02/08 13:01, Fujii Masao wrote:

>
>
> On 2021/02/05 8:45, Masahiro Ikeda wrote:
>> I pgindented the patches.
>
> Thanks for updating the patches!
>
> +       <function>XLogWrite</function>, which nomally called by an
> +       <function>issue_xlog_fsync</function>, which nomally called by an
>
> Typo: "nomally" should be "normally"?
>
> +       <function>XLogFlush</function> request(see <xref linkend="wal-configuration"/>)
> +       <function>XLogFlush</function> request(see <xref linkend="wal-configuration"/>),
>
> Isn't it better to add a space character just after "request"?
>
> +                    INSTR_TIME_SET_CURRENT(duration);
> +                    INSTR_TIME_SUBTRACT(duration, start);
> +                    WalStats.m_wal_write_time = INSTR_TIME_GET_MICROSEC(duration);
>
> If several cycles happen in the do-while loop, m_wal_write_time should be
> updated with the sum of "duration" in those cycles instead of "duration"
> in the last cycle? If yes, "+=" should be used instead of "=" when updating
> m_wal_write_time?
>
> +            INSTR_TIME_SET_CURRENT(duration);
> +            INSTR_TIME_SUBTRACT(duration, start);
> +            WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);
>
> Also "=" should be "+=" in the above?

+ /* Send WAL statistics */
+ pgstat_send_wal();

This may cause overhead in WAL-writing by walwriter because it's called
every cycles even when walwriter needs to write more WAL next cycle
(don't need to sleep on WaitLatch)? If this is right, pgstat_send_wal()
should be called only when WaitLatch() returns with WL_TIMEOUT?

-       <function>XLogFlush</function> request(see <xref linkend="wal-configuration"/>)
+       <function>XLogFlush</function> request(see <xref linkend="wal-configuration"/>),
+       or WAL data written out to disk by WAL receiver.

So regarding walreceiver, only wal_write, wal_write_time, wal_sync, and
wal_sync_time are updated even while the other values are not. Isn't this
confusing to users? If so, what about reporting those walreceiver stats in
pg_stat_wal_receiver?

  if (endofwal)
+ {
+ /* Send WAL statistics to the stats collector */
+ pgstat_send_wal();
  break;

You added pgstat_send_wal() so that it's called in some cases where
walreceiver exits. But ISTM that there are other walreceiver-exit cases.
For example, in the case where SIGTERM is received. Instead,
pgstat_send_wal() should be called in WalRcvDie() for those all cases?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

David G Johnston
In reply to this post by Masahiro Ikeda
On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda <[hidden email]> wrote:
I pgindented the patches.


... <function>XLogWrite</function>, which is invoked during an <function>XLogFlush</function> request (see ...).  This is also incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original)

You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste)

is ether -> is either

"This parameter is off by default as it will repeatedly query the operating system..."
", because" -> "as"

wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication."

"The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in....) This is undesirable because ..."

I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics.  This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not.  It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL).  Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead.  In either case, this half-and-half implementation seems undesirable.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
In reply to this post by Fujii Masao-4
On 2021-02-08 13:01, Fujii Masao wrote:
> On 2021/02/05 8:45, Masahiro Ikeda wrote:
>> I pgindented the patches.
>
> Thanks for updating the patches!

Thanks for checking the patches.

> +       <function>XLogWrite</function>, which nomally called by an
> +       <function>issue_xlog_fsync</function>, which nomally called by
> an
>
> Typo: "nomally" should be "normally"?

Yes, I'll fix it.

> +       <function>XLogFlush</function> request(see <xref
> linkend="wal-configuration"/>)
> +       <function>XLogFlush</function> request(see <xref
> linkend="wal-configuration"/>),
>
> Isn't it better to add a space character just after "request"?

Thanks, I'll fix it.

> + INSTR_TIME_SET_CURRENT(duration);
> + INSTR_TIME_SUBTRACT(duration, start);
> + WalStats.m_wal_write_time = INSTR_TIME_GET_MICROSEC(duration);
>
> If several cycles happen in the do-while loop, m_wal_write_time should
> be
> updated with the sum of "duration" in those cycles instead of
> "duration"
> in the last cycle? If yes, "+=" should be used instead of "=" when
> updating
> m_wal_write_time?
> + INSTR_TIME_SET_CURRENT(duration);
> + INSTR_TIME_SUBTRACT(duration, start);
> + WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);
>
> Also "=" should be "+=" in the above?

Yes, they are my mistake when changing the unit from milliseconds to
microseconds.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
In reply to this post by Fujii Masao-4
On 2021-02-08 14:26, Fujii Masao wrote:

> On 2021/02/08 13:01, Fujii Masao wrote:
>>
>>
>> On 2021/02/05 8:45, Masahiro Ikeda wrote:
>>> I pgindented the patches.
>>
>> Thanks for updating the patches!
>>
>> +       <function>XLogWrite</function>, which nomally called by an
>> +       <function>issue_xlog_fsync</function>, which nomally called by
>> an
>>
>> Typo: "nomally" should be "normally"?
>>
>> +       <function>XLogFlush</function> request(see <xref
>> linkend="wal-configuration"/>)
>> +       <function>XLogFlush</function> request(see <xref
>> linkend="wal-configuration"/>),
>>
>> Isn't it better to add a space character just after "request"?
>>
>> +                    INSTR_TIME_SET_CURRENT(duration);
>> +                    INSTR_TIME_SUBTRACT(duration, start);
>> +                    WalStats.m_wal_write_time =
>> INSTR_TIME_GET_MICROSEC(duration);
>>
>> If several cycles happen in the do-while loop, m_wal_write_time should
>> be
>> updated with the sum of "duration" in those cycles instead of
>> "duration"
>> in the last cycle? If yes, "+=" should be used instead of "=" when
>> updating
>> m_wal_write_time?
>>
>> +            INSTR_TIME_SET_CURRENT(duration);
>> +            INSTR_TIME_SUBTRACT(duration, start);
>> +            WalStats.m_wal_sync_time =
>> INSTR_TIME_GET_MICROSEC(duration);
>>
>> Also "=" should be "+=" in the above?
>
> + /* Send WAL statistics */
> + pgstat_send_wal();
>
> This may cause overhead in WAL-writing by walwriter because it's called
> every cycles even when walwriter needs to write more WAL next cycle
> (don't need to sleep on WaitLatch)? If this is right, pgstat_send_wal()
> should be called only when WaitLatch() returns with WL_TIMEOUT?

Thanks, I didn't notice that.
I'll fix it.

> -       <function>XLogFlush</function> request(see <xref
> linkend="wal-configuration"/>)
> +       <function>XLogFlush</function> request(see <xref
> linkend="wal-configuration"/>),
> +       or WAL data written out to disk by WAL receiver.
>
> So regarding walreceiver, only wal_write, wal_write_time, wal_sync, and
> wal_sync_time are updated even while the other values are not. Isn't
> this
> confusing to users? If so, what about reporting those walreceiver stats
> in
> pg_stat_wal_receiver?

OK, I'll add new infrastructure code to interect with wal receiver
and stats collector and show the stats in pg_stat_wal_receiver.

>   if (endofwal)
> + {
> + /* Send WAL statistics to the stats collector */
> + pgstat_send_wal();
>   break;
>
> You added pgstat_send_wal() so that it's called in some cases where
> walreceiver exits. But ISTM that there are other walreceiver-exit
> cases.
> For example, in the case where SIGTERM is received. Instead,
> pgstat_send_wal() should be called in WalRcvDie() for those all cases?

Thanks, I forgot the case.
I'll fix it.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
In reply to this post by David G Johnston
On 2021-02-10 00:51, David G. Johnston wrote:

> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
> <[hidden email]> wrote:
>
>> I pgindented the patches.
>
> ... <function>XLogWrite</function>, which is invoked during an
> <function>XLogFlush</function> request (see ...).  This is also
> incremented by the WAL receiver during replication.
>
> ("which normally called" should be "which is normally called" or
> "which normally is called" if you want to keep true to the original)
> You missed the adding the space before an opening parenthesis here and
> elsewhere (probably copy-paste)
>
> is ether -> is either
> "This parameter is off by default as it will repeatedly query the
> operating system..."
> ", because" -> "as"
Thanks, I fixed them.

> wal_write_time and the sync items also need the note: "This is also
> incremented by the WAL receiver during replication."

I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.

> "The number of times it happened..." -> " (the tally of this event is
> reported in wal_buffers_full in....) This is undesirable because ..."

Thanks, I fixed it.

> I notice that the patch for WAL receiver doesn't require explicitly
> computing the sync statistics but does require computing the write
> statistics.  This is because of the presence of issue_xlog_fsync but
> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
> receiver path does not.  It seems technically straight-forward to
> refactor here to avoid the almost-duplicated logic in the two places,
> though I suspect there may be a trade-off for not adding another
> function call to the stack given the importance of WAL processing
> (though that seems marginalized compared to the cost of actually
> writing the WAL).  Or, as Fujii noted, go the other way and don't have
> any shared code between the two but instead implement the WAL receiver
> one to use pg_stat_wal_receiver instead.  In either case, this
> half-and-half implementation seems undesirable.
OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)

I added the infrastructure code to communicate the WAL receiver stats
messages between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

v10-0001-Add-statistics-related-to-write-sync-wal-records.patch (25K) Download Attachment
v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch (36K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Fujii Masao-4


On 2021/02/15 11:59, Masahiro Ikeda wrote:

> On 2021-02-10 00:51, David G. Johnston wrote:
>> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
>> <[hidden email]> wrote:
>>
>>> I pgindented the patches.
>>
>> ... <function>XLogWrite</function>, which is invoked during an
>> <function>XLogFlush</function> request (see ...).  This is also
>> incremented by the WAL receiver during replication.
>>
>> ("which normally called" should be "which is normally called" or
>> "which normally is called" if you want to keep true to the original)
>> You missed the adding the space before an opening parenthesis here and
>> elsewhere (probably copy-paste)
>>
>> is ether -> is either
>> "This parameter is off by default as it will repeatedly query the
>> operating system..."
>> ", because" -> "as"
>
> Thanks, I fixed them.
>
>> wal_write_time and the sync items also need the note: "This is also
>> incremented by the WAL receiver during replication."
>
> I skipped changing it since I separated the stats for the WAL receiver
> in pg_stat_wal_receiver.
>
>> "The number of times it happened..." -> " (the tally of this event is
>> reported in wal_buffers_full in....) This is undesirable because ..."
>
> Thanks, I fixed it.
>
>> I notice that the patch for WAL receiver doesn't require explicitly
>> computing the sync statistics but does require computing the write
>> statistics.  This is because of the presence of issue_xlog_fsync but
>> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
>> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
>> receiver path does not.  It seems technically straight-forward to
>> refactor here to avoid the almost-duplicated logic in the two places,
>> though I suspect there may be a trade-off for not adding another
>> function call to the stack given the importance of WAL processing
>> (though that seems marginalized compared to the cost of actually
>> writing the WAL).  Or, as Fujii noted, go the other way and don't have
>> any shared code between the two but instead implement the WAL receiver
>> one to use pg_stat_wal_receiver instead.  In either case, this
>> half-and-half implementation seems undesirable.
>
> OK, as Fujii-san mentioned, I separated the WAL receiver stats.
> (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)

Thanks for updating the patches!


> I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and
> the stats for WAL receiver is counted in pg_stat_wal_receiver.
> What do you think?

On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
On 2021-02-24 16:14, Fujii Masao wrote:

> On 2021/02/15 11:59, Masahiro Ikeda wrote:
>> On 2021-02-10 00:51, David G. Johnston wrote:
>>> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
>>> <[hidden email]> wrote:
>>>
>>>> I pgindented the patches.
>>>
>>> ... <function>XLogWrite</function>, which is invoked during an
>>> <function>XLogFlush</function> request (see ...).  This is also
>>> incremented by the WAL receiver during replication.
>>>
>>> ("which normally called" should be "which is normally called" or
>>> "which normally is called" if you want to keep true to the original)
>>> You missed the adding the space before an opening parenthesis here
>>> and
>>> elsewhere (probably copy-paste)
>>>
>>> is ether -> is either
>>> "This parameter is off by default as it will repeatedly query the
>>> operating system..."
>>> ", because" -> "as"
>>
>> Thanks, I fixed them.
>>
>>> wal_write_time and the sync items also need the note: "This is also
>>> incremented by the WAL receiver during replication."
>>
>> I skipped changing it since I separated the stats for the WAL receiver
>> in pg_stat_wal_receiver.
>>
>>> "The number of times it happened..." -> " (the tally of this event is
>>> reported in wal_buffers_full in....) This is undesirable because ..."
>>
>> Thanks, I fixed it.
>>
>>> I notice that the patch for WAL receiver doesn't require explicitly
>>> computing the sync statistics but does require computing the write
>>> statistics.  This is because of the presence of issue_xlog_fsync but
>>> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe
>>> that
>>> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
>>> receiver path does not.  It seems technically straight-forward to
>>> refactor here to avoid the almost-duplicated logic in the two places,
>>> though I suspect there may be a trade-off for not adding another
>>> function call to the stack given the importance of WAL processing
>>> (though that seems marginalized compared to the cost of actually
>>> writing the WAL).  Or, as Fujii noted, go the other way and don't
>>> have
>>> any shared code between the two but instead implement the WAL
>>> receiver
>>> one to use pg_stat_wal_receiver instead.  In either case, this
>>> half-and-half implementation seems undesirable.
>>
>> OK, as Fujii-san mentioned, I separated the WAL receiver stats.
>> (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
>
> Thanks for updating the patches!
>
>
>> I added the infrastructure code to communicate the WAL receiver stats
>> messages between the WAL receiver and the stats collector, and
>> the stats for WAL receiver is counted in pg_stat_wal_receiver.
>> What do you think?
>
> On second thought, this idea seems not good. Because those stats are
> collected between multiple walreceivers, but other values in
> pg_stat_wal_receiver is only related to the walreceiver process running
> at that moment. IOW, it seems strange that some values show dynamic
> stats and the others show collected stats, even though they are in
> the same view pg_stat_wal_receiver. Thought?
OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view
in v11 patch.

> I notice that the patch for WAL receiver doesn't require explicitly
> computing the sync statistics but does require computing the write
> statistics.  This is because of the presence of issue_xlog_fsync but
> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
> receiver path does not.  It seems technically straight-forward to
> refactor here to avoid the almost-duplicated logic in the two places,
> though I suspect there may be a trade-off for not adding another
> function call to the stack given the importance of WAL processing
> (though that seems marginalized compared to the cost of actually
> writing the WAL).  Or, as Fujii noted, go the other way and don't have
> any shared code between the two but instead implement the WAL receiver
> one to use pg_stat_wal_receiver instead.  In either case, this
> half-and-half implementation seems undesirable.
I refactored the logic to write xlog file to unify collecting the write
stats.
As David said, although pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE)
is not called in the WAL receiver's path,
I agreed that the cost to write the WAL is much bigger.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

v11-0001-Add-statistics-related-to-write-sync-wal-records.patch (26K) Download Attachment
v11-0002-Makes-the-wal-receiver-report-WAL-statistics.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Fujii Masao-4


On 2021/03/03 14:33, Masahiro Ikeda wrote:

> On 2021-02-24 16:14, Fujii Masao wrote:
>> On 2021/02/15 11:59, Masahiro Ikeda wrote:
>>> On 2021-02-10 00:51, David G. Johnston wrote:
>>>> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
>>>> <[hidden email]> wrote:
>>>>
>>>>> I pgindented the patches.
>>>>
>>>> ... <function>XLogWrite</function>, which is invoked during an
>>>> <function>XLogFlush</function> request (see ...).  This is also
>>>> incremented by the WAL receiver during replication.
>>>>
>>>> ("which normally called" should be "which is normally called" or
>>>> "which normally is called" if you want to keep true to the original)
>>>> You missed the adding the space before an opening parenthesis here and
>>>> elsewhere (probably copy-paste)
>>>>
>>>> is ether -> is either
>>>> "This parameter is off by default as it will repeatedly query the
>>>> operating system..."
>>>> ", because" -> "as"
>>>
>>> Thanks, I fixed them.
>>>
>>>> wal_write_time and the sync items also need the note: "This is also
>>>> incremented by the WAL receiver during replication."
>>>
>>> I skipped changing it since I separated the stats for the WAL receiver
>>> in pg_stat_wal_receiver.
>>>
>>>> "The number of times it happened..." -> " (the tally of this event is
>>>> reported in wal_buffers_full in....) This is undesirable because ..."
>>>
>>> Thanks, I fixed it.
>>>
>>>> I notice that the patch for WAL receiver doesn't require explicitly
>>>> computing the sync statistics but does require computing the write
>>>> statistics.  This is because of the presence of issue_xlog_fsync but
>>>> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
>>>> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
>>>> receiver path does not.  It seems technically straight-forward to
>>>> refactor here to avoid the almost-duplicated logic in the two places,
>>>> though I suspect there may be a trade-off for not adding another
>>>> function call to the stack given the importance of WAL processing
>>>> (though that seems marginalized compared to the cost of actually
>>>> writing the WAL).  Or, as Fujii noted, go the other way and don't have
>>>> any shared code between the two but instead implement the WAL receiver
>>>> one to use pg_stat_wal_receiver instead.  In either case, this
>>>> half-and-half implementation seems undesirable.
>>>
>>> OK, as Fujii-san mentioned, I separated the WAL receiver stats.
>>> (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
>>
>> Thanks for updating the patches!
>>
>>
>>> I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and
>>> the stats for WAL receiver is counted in pg_stat_wal_receiver.
>>> What do you think?
>>
>> On second thought, this idea seems not good. Because those stats are
>> collected between multiple walreceivers, but other values in
>> pg_stat_wal_receiver is only related to the walreceiver process running
>> at that moment. IOW, it seems strange that some values show dynamic
>> stats and the others show collected stats, even though they are in
>> the same view pg_stat_wal_receiver. Thought?
>
> OK, I fixed it.
> The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch.

Thanks for updating the patches! I'm now reading 001 patch.

+ /* Check whether the WAL file was synced to disk right now */
+ if (enableFsync &&
+ (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+ {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


+ /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+ if (rc & WL_TIMEOUT)
+ pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
On 2021-03-03 16:30, Fujii Masao wrote:

> On 2021/03/03 14:33, Masahiro Ikeda wrote:
>> On 2021-02-24 16:14, Fujii Masao wrote:
>>> On 2021/02/15 11:59, Masahiro Ikeda wrote:
>>>> On 2021-02-10 00:51, David G. Johnston wrote:
>>>>> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
>>>>> <[hidden email]> wrote:
>>>>>
>>>>>> I pgindented the patches.
>>>>>
>>>>> ... <function>XLogWrite</function>, which is invoked during an
>>>>> <function>XLogFlush</function> request (see ...).  This is also
>>>>> incremented by the WAL receiver during replication.
>>>>>
>>>>> ("which normally called" should be "which is normally called" or
>>>>> "which normally is called" if you want to keep true to the
>>>>> original)
>>>>> You missed the adding the space before an opening parenthesis here
>>>>> and
>>>>> elsewhere (probably copy-paste)
>>>>>
>>>>> is ether -> is either
>>>>> "This parameter is off by default as it will repeatedly query the
>>>>> operating system..."
>>>>> ", because" -> "as"
>>>>
>>>> Thanks, I fixed them.
>>>>
>>>>> wal_write_time and the sync items also need the note: "This is also
>>>>> incremented by the WAL receiver during replication."
>>>>
>>>> I skipped changing it since I separated the stats for the WAL
>>>> receiver
>>>> in pg_stat_wal_receiver.
>>>>
>>>>> "The number of times it happened..." -> " (the tally of this event
>>>>> is
>>>>> reported in wal_buffers_full in....) This is undesirable because
>>>>> ..."
>>>>
>>>> Thanks, I fixed it.
>>>>
>>>>> I notice that the patch for WAL receiver doesn't require explicitly
>>>>> computing the sync statistics but does require computing the write
>>>>> statistics.  This is because of the presence of issue_xlog_fsync
>>>>> but
>>>>> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe
>>>>> that
>>>>> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
>>>>> receiver path does not.  It seems technically straight-forward to
>>>>> refactor here to avoid the almost-duplicated logic in the two
>>>>> places,
>>>>> though I suspect there may be a trade-off for not adding another
>>>>> function call to the stack given the importance of WAL processing
>>>>> (though that seems marginalized compared to the cost of actually
>>>>> writing the WAL).  Or, as Fujii noted, go the other way and don't
>>>>> have
>>>>> any shared code between the two but instead implement the WAL
>>>>> receiver
>>>>> one to use pg_stat_wal_receiver instead.  In either case, this
>>>>> half-and-half implementation seems undesirable.
>>>>
>>>> OK, as Fujii-san mentioned, I separated the WAL receiver stats.
>>>> (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
>>>
>>> Thanks for updating the patches!
>>>
>>>
>>>> I added the infrastructure code to communicate the WAL receiver
>>>> stats messages between the WAL receiver and the stats collector, and
>>>> the stats for WAL receiver is counted in pg_stat_wal_receiver.
>>>> What do you think?
>>>
>>> On second thought, this idea seems not good. Because those stats are
>>> collected between multiple walreceivers, but other values in
>>> pg_stat_wal_receiver is only related to the walreceiver process
>>> running
>>> at that moment. IOW, it seems strange that some values show dynamic
>>> stats and the others show collected stats, even though they are in
>>> the same view pg_stat_wal_receiver. Thought?
>>
>> OK, I fixed it.
>> The stats collected in the WAL receiver is exposed in pg_stat_wal view
>> in v11 patch.
>
> Thanks for updating the patches! I'm now reading 001 patch.
>
> + /* Check whether the WAL file was synced to disk right now */
> + if (enableFsync &&
> + (sync_method == SYNC_METHOD_FSYNC ||
> + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
> + sync_method == SYNC_METHOD_FDATASYNC))
> + {
>
> Isn't it better to make issue_xlog_fsync() return immediately
> if enableFsync is off, sync_method is open_sync or open_data_sync,
> to simplify the code more?
Thanks for the comments.
I added the above code in v12 patch.

>
> + /*
> + * Send WAL statistics only if WalWriterDelay has elapsed to
> minimize
> + * the overhead in WAL-writing.
> + */
> + if (rc & WL_TIMEOUT)
> + pgstat_send_wal();
>
> On second thought, this change means that it always takes
> wal_writer_delay
> before walwriter's WAL stats is sent after XLogBackgroundFlush() is
> called.
> For example, if wal_writer_delay is set to several seconds, some values
> in
> pg_stat_wal would be not up-to-date meaninglessly for those seconds.
> So I'm thinking to withdraw my previous comment and it's ok to send
> the stats every after XLogBackgroundFlush() is called. Thought?
Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.

Why don't to make another way to check the timestamp?

+               /*
+                * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+                * msec since we last sent one
+                */
+               now = GetCurrentTimestamp();
+               if (TimestampDifferenceExceeds(last_report, now,
PGSTAT_STAT_INTERVAL))
+               {
+                       pgstat_send_wal();
+                       last_report = now;
+               }
+

Although I worried that it's better to add the check code in
pgstat_send_wal(),
I didn't do so because to avoid to double check PGSTAT_STAT_INTERVAL.
pgstat_send_wal() is invoked pg_report_stat() and it already checks the
PGSTAT_STAT_INTERVAL.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

v12-0001-Add-statistics-related-to-write-sync-wal-records.patch (27K) Download Attachment
v11_v12_0001.diff (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda
On 2021-03-03 20:27, Masahiro Ikeda wrote:

> On 2021-03-03 16:30, Fujii Masao wrote:
>> On 2021/03/03 14:33, Masahiro Ikeda wrote:
>>> On 2021-02-24 16:14, Fujii Masao wrote:
>>>> On 2021/02/15 11:59, Masahiro Ikeda wrote:
>>>>> On 2021-02-10 00:51, David G. Johnston wrote:
>>>>>> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
>>>>>> <[hidden email]> wrote:
>>>>>>
>>>>>>> I pgindented the patches.
>>>>>>
>>>>>> ... <function>XLogWrite</function>, which is invoked during an
>>>>>> <function>XLogFlush</function> request (see ...).  This is also
>>>>>> incremented by the WAL receiver during replication.
>>>>>>
>>>>>> ("which normally called" should be "which is normally called" or
>>>>>> "which normally is called" if you want to keep true to the
>>>>>> original)
>>>>>> You missed the adding the space before an opening parenthesis here
>>>>>> and
>>>>>> elsewhere (probably copy-paste)
>>>>>>
>>>>>> is ether -> is either
>>>>>> "This parameter is off by default as it will repeatedly query the
>>>>>> operating system..."
>>>>>> ", because" -> "as"
>>>>>
>>>>> Thanks, I fixed them.
>>>>>
>>>>>> wal_write_time and the sync items also need the note: "This is
>>>>>> also
>>>>>> incremented by the WAL receiver during replication."
>>>>>
>>>>> I skipped changing it since I separated the stats for the WAL
>>>>> receiver
>>>>> in pg_stat_wal_receiver.
>>>>>
>>>>>> "The number of times it happened..." -> " (the tally of this event
>>>>>> is
>>>>>> reported in wal_buffers_full in....) This is undesirable because
>>>>>> ..."
>>>>>
>>>>> Thanks, I fixed it.
>>>>>
>>>>>> I notice that the patch for WAL receiver doesn't require
>>>>>> explicitly
>>>>>> computing the sync statistics but does require computing the write
>>>>>> statistics.  This is because of the presence of issue_xlog_fsync
>>>>>> but
>>>>>> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe
>>>>>> that
>>>>>> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
>>>>>> receiver path does not.  It seems technically straight-forward to
>>>>>> refactor here to avoid the almost-duplicated logic in the two
>>>>>> places,
>>>>>> though I suspect there may be a trade-off for not adding another
>>>>>> function call to the stack given the importance of WAL processing
>>>>>> (though that seems marginalized compared to the cost of actually
>>>>>> writing the WAL).  Or, as Fujii noted, go the other way and don't
>>>>>> have
>>>>>> any shared code between the two but instead implement the WAL
>>>>>> receiver
>>>>>> one to use pg_stat_wal_receiver instead.  In either case, this
>>>>>> half-and-half implementation seems undesirable.
>>>>>
>>>>> OK, as Fujii-san mentioned, I separated the WAL receiver stats.
>>>>> (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
>>>>
>>>> Thanks for updating the patches!
>>>>
>>>>
>>>>> I added the infrastructure code to communicate the WAL receiver
>>>>> stats messages between the WAL receiver and the stats collector,
>>>>> and
>>>>> the stats for WAL receiver is counted in pg_stat_wal_receiver.
>>>>> What do you think?
>>>>
>>>> On second thought, this idea seems not good. Because those stats are
>>>> collected between multiple walreceivers, but other values in
>>>> pg_stat_wal_receiver is only related to the walreceiver process
>>>> running
>>>> at that moment. IOW, it seems strange that some values show dynamic
>>>> stats and the others show collected stats, even though they are in
>>>> the same view pg_stat_wal_receiver. Thought?
>>>
>>> OK, I fixed it.
>>> The stats collected in the WAL receiver is exposed in pg_stat_wal
>>> view in v11 patch.
>>
>> Thanks for updating the patches! I'm now reading 001 patch.
>>
>> + /* Check whether the WAL file was synced to disk right now */
>> + if (enableFsync &&
>> + (sync_method == SYNC_METHOD_FSYNC ||
>> + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
>> + sync_method == SYNC_METHOD_FDATASYNC))
>> + {
>>
>> Isn't it better to make issue_xlog_fsync() return immediately
>> if enableFsync is off, sync_method is open_sync or open_data_sync,
>> to simplify the code more?
>
> Thanks for the comments.
> I added the above code in v12 patch.
>
>>
>> + /*
>> + * Send WAL statistics only if WalWriterDelay has elapsed to
>> minimize
>> + * the overhead in WAL-writing.
>> + */
>> + if (rc & WL_TIMEOUT)
>> + pgstat_send_wal();
>>
>> On second thought, this change means that it always takes
>> wal_writer_delay
>> before walwriter's WAL stats is sent after XLogBackgroundFlush() is
>> called.
>> For example, if wal_writer_delay is set to several seconds, some
>> values in
>> pg_stat_wal would be not up-to-date meaninglessly for those seconds.
>> So I'm thinking to withdraw my previous comment and it's ok to send
>> the stats every after XLogBackgroundFlush() is called. Thought?
>
> Thanks, I didn't notice that.
>
> Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
> default value is 200msec and it may be set shorter time.
>
> Why don't to make another way to check the timestamp?
>
> +               /*
> +                * Don't send a message unless it's been at least
> PGSTAT_STAT_INTERVAL
> +                * msec since we last sent one
> +                */
> +               now = GetCurrentTimestamp();
> +               if (TimestampDifferenceExceeds(last_report, now,
> PGSTAT_STAT_INTERVAL))
> +               {
> +                       pgstat_send_wal();
> +                       last_report = now;
> +               }
> +
>
> Although I worried that it's better to add the check code in
> pgstat_send_wal(),
> I didn't do so because to avoid to double check PGSTAT_STAT_INTERVAL.
> pgstat_send_wal() is invoked pg_report_stat() and it already checks the
> PGSTAT_STAT_INTERVAL.
I forgot to remove an unused variable.
The attached v13 patch is fixed.

Regards
--
Masahiro Ikeda
NTT DATA CORPORATION

v13-0001-Add-statistics-related-to-write-sync-wal-records.patch (27K) Download Attachment
v12_v13.diff (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Ibrar Ahmed-4


On Thu, Mar 4, 2021 at 12:14 PM Masahiro Ikeda <[hidden email]> wrote:
On 2021-03-03 20:27, Masahiro Ikeda wrote:
> On 2021-03-03 16:30, Fujii Masao wrote:
>> On 2021/03/03 14:33, Masahiro Ikeda wrote:
>>> On 2021-02-24 16:14, Fujii Masao wrote:
>>>> On 2021/02/15 11:59, Masahiro Ikeda wrote:
>>>>> On 2021-02-10 00:51, David G. Johnston wrote:
>>>>>> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
>>>>>> <[hidden email]> wrote:
>>>>>>
>>>>>>> I pgindented the patches.
>>>>>>
>>>>>> ... <function>XLogWrite</function>, which is invoked during an
>>>>>> <function>XLogFlush</function> request (see ...).  This is also
>>>>>> incremented by the WAL receiver during replication.
>>>>>>
>>>>>> ("which normally called" should be "which is normally called" or
>>>>>> "which normally is called" if you want to keep true to the
>>>>>> original)
>>>>>> You missed the adding the space before an opening parenthesis here
>>>>>> and
>>>>>> elsewhere (probably copy-paste)
>>>>>>
>>>>>> is ether -> is either
>>>>>> "This parameter is off by default as it will repeatedly query the
>>>>>> operating system..."
>>>>>> ", because" -> "as"
>>>>>
>>>>> Thanks, I fixed them.
>>>>>
>>>>>> wal_write_time and the sync items also need the note: "This is
>>>>>> also
>>>>>> incremented by the WAL receiver during replication."
>>>>>
>>>>> I skipped changing it since I separated the stats for the WAL
>>>>> receiver
>>>>> in pg_stat_wal_receiver.
>>>>>
>>>>>> "The number of times it happened..." -> " (the tally of this event
>>>>>> is
>>>>>> reported in wal_buffers_full in....) This is undesirable because
>>>>>> ..."
>>>>>
>>>>> Thanks, I fixed it.
>>>>>
>>>>>> I notice that the patch for WAL receiver doesn't require
>>>>>> explicitly
>>>>>> computing the sync statistics but does require computing the write
>>>>>> statistics.  This is because of the presence of issue_xlog_fsync
>>>>>> but
>>>>>> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe
>>>>>> that
>>>>>> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
>>>>>> receiver path does not.  It seems technically straight-forward to
>>>>>> refactor here to avoid the almost-duplicated logic in the two
>>>>>> places,
>>>>>> though I suspect there may be a trade-off for not adding another
>>>>>> function call to the stack given the importance of WAL processing
>>>>>> (though that seems marginalized compared to the cost of actually
>>>>>> writing the WAL).  Or, as Fujii noted, go the other way and don't
>>>>>> have
>>>>>> any shared code between the two but instead implement the WAL
>>>>>> receiver
>>>>>> one to use pg_stat_wal_receiver instead.  In either case, this
>>>>>> half-and-half implementation seems undesirable.
>>>>>
>>>>> OK, as Fujii-san mentioned, I separated the WAL receiver stats.
>>>>> (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
>>>>
>>>> Thanks for updating the patches!
>>>>
>>>>
>>>>> I added the infrastructure code to communicate the WAL receiver
>>>>> stats messages between the WAL receiver and the stats collector,
>>>>> and
>>>>> the stats for WAL receiver is counted in pg_stat_wal_receiver.
>>>>> What do you think?
>>>>
>>>> On second thought, this idea seems not good. Because those stats are
>>>> collected between multiple walreceivers, but other values in
>>>> pg_stat_wal_receiver is only related to the walreceiver process
>>>> running
>>>> at that moment. IOW, it seems strange that some values show dynamic
>>>> stats and the others show collected stats, even though they are in
>>>> the same view pg_stat_wal_receiver. Thought?
>>>
>>> OK, I fixed it.
>>> The stats collected in the WAL receiver is exposed in pg_stat_wal
>>> view in v11 patch.
>>
>> Thanks for updating the patches! I'm now reading 001 patch.
>>
>> +    /* Check whether the WAL file was synced to disk right now */
>> +    if (enableFsync &&
>> +            (sync_method == SYNC_METHOD_FSYNC ||
>> +             sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
>> +             sync_method == SYNC_METHOD_FDATASYNC))
>> +    {
>>
>> Isn't it better to make issue_xlog_fsync() return immediately
>> if enableFsync is off, sync_method is open_sync or open_data_sync,
>> to simplify the code more?
>
> Thanks for the comments.
> I added the above code in v12 patch.
>
>>
>> +            /*
>> +             * Send WAL statistics only if WalWriterDelay has elapsed to
>> minimize
>> +             * the overhead in WAL-writing.
>> +             */
>> +            if (rc & WL_TIMEOUT)
>> +                    pgstat_send_wal();
>>
>> On second thought, this change means that it always takes
>> wal_writer_delay
>> before walwriter's WAL stats is sent after XLogBackgroundFlush() is
>> called.
>> For example, if wal_writer_delay is set to several seconds, some
>> values in
>> pg_stat_wal would be not up-to-date meaninglessly for those seconds.
>> So I'm thinking to withdraw my previous comment and it's ok to send
>> the stats every after XLogBackgroundFlush() is called. Thought?
>
> Thanks, I didn't notice that.
>
> Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
> default value is 200msec and it may be set shorter time.
>
> Why don't to make another way to check the timestamp?
>
> +               /*
> +                * Don't send a message unless it's been at least
> PGSTAT_STAT_INTERVAL
> +                * msec since we last sent one
> +                */
> +               now = GetCurrentTimestamp();
> +               if (TimestampDifferenceExceeds(last_report, now,
> PGSTAT_STAT_INTERVAL))
> +               {
> +                       pgstat_send_wal();
> +                       last_report = now;
> +               }
> +
>
> Although I worried that it's better to add the check code in
> pgstat_send_wal(),
> I didn't do so because to avoid to double check PGSTAT_STAT_INTERVAL.
> pgstat_send_wal() is invoked pg_report_stat() and it already checks the
> PGSTAT_STAT_INTERVAL.

I forgot to remove an unused variable.
The attached v13 patch is fixed.

Regards
--
Masahiro Ikeda
NTT DATA CORPORATION

This patch set no longer applies
http://cfbot.cputube.org/patch_32_2859.log

Can we get a rebase? 

I am marking the patch "Waiting on Author"




--
Ibrar Ahmed
1234