Resetting spilled txn statistics in pg_stat_replication

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
48 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
Hi all,

Tracking of spilled transactions has been introduced to PG13. These
new statistics values, spill_txns, spill_count, and spill_bytes, are
cumulative total values unlike other statistics values in
pg_stat_replication. How can we reset these values? We can reset
statistics values in other statistics views using by
pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
the only option to reset spilled transactions is to restart logical
replication but it's surely high cost.

It might have been discussed during development but it's worth having
a SQL function to reset these statistics?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Kyotaro Horiguchi-4
At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <[hidden email]> wrote in

> Hi all,
>
> Tracking of spilled transactions has been introduced to PG13. These
> new statistics values, spill_txns, spill_count, and spill_bytes, are
> cumulative total values unlike other statistics values in
> pg_stat_replication. How can we reset these values? We can reset
> statistics values in other statistics views using by
> pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
> the only option to reset spilled transactions is to restart logical
> replication but it's surely high cost.
>
> It might have been discussed during development but it's worth having
> a SQL function to reset these statistics?

Actually, I don't see pg_stat_reset() useful so much except for our
regression test (or might be rather harmful for monitoring aids).  So
I doubt the usefulness of the feature, but having it makes things more
consistent.

Anyway I think the most significant point of implementing the feature
would be user interface.  Adding pg_stat_replication_reset(pid int)
doesn't seem to be a good thing to do..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Fujii Masao-4


On 2020/06/02 16:00, Kyotaro Horiguchi wrote:
> At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <[hidden email]> wrote in
>> Hi all,
>>
>> Tracking of spilled transactions has been introduced to PG13. These
>> new statistics values, spill_txns, spill_count, and spill_bytes, are
>> cumulative total values unlike other statistics values in
>> pg_stat_replication.

Basically I don't think it's good design to mix dynamic and collected
stats in one view. It might be better to separate them into different
new stats view. It's too late to add new stats view for v13, though....

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

akapila
In reply to this post by Masahiko Sawada-2
On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada
<[hidden email]> wrote:

>
> Hi all,
>
> Tracking of spilled transactions has been introduced to PG13. These
> new statistics values, spill_txns, spill_count, and spill_bytes, are
> cumulative total values unlike other statistics values in
> pg_stat_replication. How can we reset these values? We can reset
> statistics values in other statistics views using by
> pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
> the only option to reset spilled transactions is to restart logical
> replication but it's surely high cost.
>

I see your point but I don't see a pressing need for such a function
for PG13.  Basically, these counters will be populated when we have
large transactions in the system so not sure how much is the use case
for such a function. Note that we need to add additional column
stats_reset in pg_stat_replication view as well similar to what we
have in pg_stat_archiver and pg_stat_bgwriter.  OTOH, I don't see any
big reason for not having such a function for PG14.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
In reply to this post by Fujii Masao-4
On Tue, 2 Jun 2020 at 17:22, Fujii Masao <[hidden email]> wrote:

>
>
>
> On 2020/06/02 16:00, Kyotaro Horiguchi wrote:
> > At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <[hidden email]> wrote in
> >> Hi all,
> >>
> >> Tracking of spilled transactions has been introduced to PG13. These
> >> new statistics values, spill_txns, spill_count, and spill_bytes, are
> >> cumulative total values unlike other statistics values in
> >> pg_stat_replication.
>
> Basically I don't think it's good design to mix dynamic and collected
> stats in one view. It might be better to separate them into different
> new stats view.  It's too late to add new stats view for v13, though....

Yeah, actually I had the same impression when studying this feature.
Another benefit of having a separate view for such statistics would be
that it can also support logical decoding invoked by SQL interface.
Currently, reorder buffer always tracks statistics of spilled
transactions but we can see it only in using logical replication.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

akapila
In reply to this post by Fujii Masao-4
On Tue, Jun 2, 2020 at 1:52 PM Fujii Masao <[hidden email]> wrote:

>
> On 2020/06/02 16:00, Kyotaro Horiguchi wrote:
> > At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <[hidden email]> wrote in
> >> Hi all,
> >>
> >> Tracking of spilled transactions has been introduced to PG13. These
> >> new statistics values, spill_txns, spill_count, and spill_bytes, are
> >> cumulative total values unlike other statistics values in
> >> pg_stat_replication.
>
> Basically I don't think it's good design to mix dynamic and collected
> stats in one view. It might be better to separate them into different
> new stats view.
>

I think this is worth considering but note that we already have a
similar mix in other views like pg_stat_archiver (archived_count and
failed_count are dynamic whereas other columns are static).  On the
one hand, it is good if we have a separate view for such dynamic
information but OTOH users need to consult more views for replication
information.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
In reply to this post by Kyotaro Horiguchi-4
On Tue, 2 Jun 2020 at 16:00, Kyotaro Horiguchi <[hidden email]> wrote:

>
> At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <[hidden email]> wrote in
> > Hi all,
> >
> > Tracking of spilled transactions has been introduced to PG13. These
> > new statistics values, spill_txns, spill_count, and spill_bytes, are
> > cumulative total values unlike other statistics values in
> > pg_stat_replication. How can we reset these values? We can reset
> > statistics values in other statistics views using by
> > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
> > the only option to reset spilled transactions is to restart logical
> > replication but it's surely high cost.
> >
> > It might have been discussed during development but it's worth having
> > a SQL function to reset these statistics?
>
> Actually, I don't see pg_stat_reset() useful so much except for our
> regression test (or might be rather harmful for monitoring aids).  So
> I doubt the usefulness of the feature, but having it makes things more
> consistent.

IMO these reset functions are useful for verifications. I often use
them before starting performance evaluations.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
In reply to this post by akapila
On Tue, 2 Jun 2020 at 18:34, Amit Kapila <[hidden email]> wrote:

>
> On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada
> <[hidden email]> wrote:
> >
> > Hi all,
> >
> > Tracking of spilled transactions has been introduced to PG13. These
> > new statistics values, spill_txns, spill_count, and spill_bytes, are
> > cumulative total values unlike other statistics values in
> > pg_stat_replication. How can we reset these values? We can reset
> > statistics values in other statistics views using by
> > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
> > the only option to reset spilled transactions is to restart logical
> > replication but it's surely high cost.
> >
>
> I see your point but I don't see a pressing need for such a function
> for PG13.  Basically, these counters will be populated when we have
> large transactions in the system so not sure how much is the use case
> for such a function. Note that we need to add additional column
> stats_reset in pg_stat_replication view as well similar to what we
> have in pg_stat_archiver and pg_stat_bgwriter.  OTOH, I don't see any
> big reason for not having such a function for PG14.

Ok. I think the reset function is mostly for evaluations or rare
cases. In either case, since it's not an urgent case we can postpone
it to PG14 if necessary.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Magnus Hagander-2


On Tue, Jun 9, 2020 at 9:12 AM Masahiko Sawada <[hidden email]> wrote:
On Tue, 2 Jun 2020 at 18:34, Amit Kapila <[hidden email]> wrote:
>
> On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada
> <[hidden email]> wrote:
> >
> > Hi all,
> >
> > Tracking of spilled transactions has been introduced to PG13. These
> > new statistics values, spill_txns, spill_count, and spill_bytes, are
> > cumulative total values unlike other statistics values in
> > pg_stat_replication. How can we reset these values? We can reset
> > statistics values in other statistics views using by
> > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
> > the only option to reset spilled transactions is to restart logical
> > replication but it's surely high cost.

You just have to "bounce" the worker though, right? You don't have to actually restart logical replication, just disconnect and reconnect?


> I see your point but I don't see a pressing need for such a function
> for PG13.  Basically, these counters will be populated when we have
> large transactions in the system so not sure how much is the use case
> for such a function. Note that we need to add additional column
> stats_reset in pg_stat_replication view as well similar to what we
> have in pg_stat_archiver and pg_stat_bgwriter.  OTOH, I don't see any
> big reason for not having such a function for PG14.

Ok. I think the reset function is mostly for evaluations or rare
cases. In either case, since it's not an urgent case we can postpone
it to PG14 if necessary.

Reading through this thread, I agree that it's kind of weird to keep cumulative stats mixed with non-cumulative stats. (it always irks me, for example, that we have numbackends in pg_stat_database which behaves different from every other column in it)

However, I don't see how they *are* cumulative. They are only cumulative while the client is connected -- as soon as it disconnects they go away. In that regard, they're more like the pg_stat_progress_xyz views for example.

Which makes it mostly useless for long-term tracking anyway. Because no matter which way you snapshot it, you will lose data.

ISTM the logical places to keep cumulative stats would be pg_replication_slots? (Or go create a pg_stat_replication_slots?) That is, that the logical grouping of these statistics for long-term is the replication slot, not the walsender?

--
Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

akapila
On Tue, Jun 9, 2020 at 1:54 PM Magnus Hagander <[hidden email]> wrote:

>
> On Tue, Jun 9, 2020 at 9:12 AM Masahiko Sawada <[hidden email]> wrote:
>>
>> On Tue, 2 Jun 2020 at 18:34, Amit Kapila <[hidden email]> wrote:
>> >
>> > I see your point but I don't see a pressing need for such a function
>> > for PG13.  Basically, these counters will be populated when we have
>> > large transactions in the system so not sure how much is the use case
>> > for such a function. Note that we need to add additional column
>> > stats_reset in pg_stat_replication view as well similar to what we
>> > have in pg_stat_archiver and pg_stat_bgwriter.  OTOH, I don't see any
>> > big reason for not having such a function for PG14.
>>
>> Ok. I think the reset function is mostly for evaluations or rare
>> cases. In either case, since it's not an urgent case we can postpone
>> it to PG14 if necessary.
>
>
> Reading through this thread, I agree that it's kind of weird to keep cumulative stats mixed with non-cumulative stats. (it always irks me, for example, that we have numbackends in pg_stat_database which behaves different from every other column in it)
>
> However, I don't see how they *are* cumulative. They are only cumulative while the client is connected -- as soon as it disconnects they go away. In that regard, they're more like the pg_stat_progress_xyz views for example.
>
> Which makes it mostly useless for long-term tracking anyway. Because no matter which way you snapshot it, you will lose data.
>
> ISTM the logical places to keep cumulative stats would be pg_replication_slots? (Or go create a pg_stat_replication_slots?) That is, that the logical grouping of these statistics for long-term is the replication slot, not the walsender?
>

I think I see one advantage of displaying these stats at slot level.
Currently, we won't be able to see these stats when we use SQL
Interface APIs (like pg_logical_get_slot_changes) to decode the WAL
but if we display at slot level, then we should be able to see it.

I would prefer to display it in pg_replication_slots just to avoid
creating more views but OTOH, if a new view like
pg_stat_replication_slots sounds better place for these stats then I
am fine with it too.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
In reply to this post by Magnus Hagander-2
On Tue, 9 Jun 2020 at 17:24, Magnus Hagander <[hidden email]> wrote:

>
>
>
> On Tue, Jun 9, 2020 at 9:12 AM Masahiko Sawada <[hidden email]> wrote:
>>
>> On Tue, 2 Jun 2020 at 18:34, Amit Kapila <[hidden email]> wrote:
>> >
>> > On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada
>> > <[hidden email]> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > Tracking of spilled transactions has been introduced to PG13. These
>> > > new statistics values, spill_txns, spill_count, and spill_bytes, are
>> > > cumulative total values unlike other statistics values in
>> > > pg_stat_replication. How can we reset these values? We can reset
>> > > statistics values in other statistics views using by
>> > > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
>> > > the only option to reset spilled transactions is to restart logical
>> > > replication but it's surely high cost.
>
>
> You just have to "bounce" the worker though, right? You don't have to actually restart logical replication, just disconnect and reconnect?

Right.

>
>
>> > I see your point but I don't see a pressing need for such a function
>> > for PG13.  Basically, these counters will be populated when we have
>> > large transactions in the system so not sure how much is the use case
>> > for such a function. Note that we need to add additional column
>> > stats_reset in pg_stat_replication view as well similar to what we
>> > have in pg_stat_archiver and pg_stat_bgwriter.  OTOH, I don't see any
>> > big reason for not having such a function for PG14.
>>
>> Ok. I think the reset function is mostly for evaluations or rare
>> cases. In either case, since it's not an urgent case we can postpone
>> it to PG14 if necessary.
>
>
> Reading through this thread, I agree that it's kind of weird to keep cumulative stats mixed with non-cumulative stats. (it always irks me, for example, that we have numbackends in pg_stat_database which behaves different from every other column in it)
>
> However, I don't see how they *are* cumulative. They are only cumulative while the client is connected -- as soon as it disconnects they go away. In that regard, they're more like the pg_stat_progress_xyz views for example.
>
> Which makes it mostly useless for long-term tracking anyway. Because no matter which way you snapshot it, you will lose data.
>
> ISTM the logical places to keep cumulative stats would be pg_replication_slots? (Or go create a pg_stat_replication_slots?) That is, that the logical grouping of these statistics for long-term is the replication slot, not the walsender?

I personally prefer to display these values in pg_replication_slots.
If we create a new stats view, it's only for logical replication
slots? Or displaying both types of slots as physical replication slots
might have statistics in the future?

If we move these values to replication slots, I think we can change
the code so that these statistics are managed by replication slots
(e.g. ReplicationSlot struct). Once ReplicationSlot has these values,
we can keep them beyond reconnections or multiple calls of SQL
interface functions. Of course, these values don’t need to be
persisted.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Magnus Hagander-2
On Wed, Jun 10, 2020 at 9:01 AM Masahiko Sawada <[hidden email]> wrote:
On Tue, 9 Jun 2020 at 17:24, Magnus Hagander <[hidden email]> wrote:
>
>
>
> On Tue, Jun 9, 2020 at 9:12 AM Masahiko Sawada <[hidden email]> wrote:
>>
>> On Tue, 2 Jun 2020 at 18:34, Amit Kapila <[hidden email]> wrote:
>> >
>> > On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada
>> > <[hidden email]> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > Tracking of spilled transactions has been introduced to PG13. These
>> > > new statistics values, spill_txns, spill_count, and spill_bytes, are
>> > > cumulative total values unlike other statistics values in
>> > > pg_stat_replication. How can we reset these values? We can reset
>> > > statistics values in other statistics views using by
>> > > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
>> > > the only option to reset spilled transactions is to restart logical
>> > > replication but it's surely high cost.
>
>
> You just have to "bounce" the worker though, right? You don't have to actually restart logical replication, just disconnect and reconnect?

Right.

>
>
>> > I see your point but I don't see a pressing need for such a function
>> > for PG13.  Basically, these counters will be populated when we have
>> > large transactions in the system so not sure how much is the use case
>> > for such a function. Note that we need to add additional column
>> > stats_reset in pg_stat_replication view as well similar to what we
>> > have in pg_stat_archiver and pg_stat_bgwriter.  OTOH, I don't see any
>> > big reason for not having such a function for PG14.
>>
>> Ok. I think the reset function is mostly for evaluations or rare
>> cases. In either case, since it's not an urgent case we can postpone
>> it to PG14 if necessary.
>
>
> Reading through this thread, I agree that it's kind of weird to keep cumulative stats mixed with non-cumulative stats. (it always irks me, for example, that we have numbackends in pg_stat_database which behaves different from every other column in it)
>
> However, I don't see how they *are* cumulative. They are only cumulative while the client is connected -- as soon as it disconnects they go away. In that regard, they're more like the pg_stat_progress_xyz views for example.
>
> Which makes it mostly useless for long-term tracking anyway. Because no matter which way you snapshot it, you will lose data.
>
> ISTM the logical places to keep cumulative stats would be pg_replication_slots? (Or go create a pg_stat_replication_slots?) That is, that the logical grouping of these statistics for long-term is the replication slot, not the walsender?

I personally prefer to display these values in pg_replication_slots.
If we create a new stats view, it's only for logical replication
slots? Or displaying both types of slots as physical replication slots
might have statistics in the future?

Yeah, I think it's kind of a weird situation. There's already some things in pg_replication_slots that should probably be in a stat_ view,  so if we were to create one we would have to move those, and probably needlessly break things for people. 

i guess we could have separate views for logical and pysical slots since there are things that only one of them will have. But there is that already -- the database for example, and xmins. Splitting that apart now should be a bigger thing, but I don't think it's worth it.


If we move these values to replication slots, I think we can change
the code so that these statistics are managed by replication slots
(e.g. ReplicationSlot struct). Once ReplicationSlot has these values,
we can keep them beyond reconnections or multiple calls of SQL
interface functions. Of course, these values don’t need to be
persisted.

Eh, why should they not be persisted? The comparison would be temp_files and temp_bytes in pg_stat_database, and those *are* persisted. 

--
Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

akapila
On Thu, Jun 11, 2020 at 12:35 AM Magnus Hagander <[hidden email]> wrote:

>
> On Wed, Jun 10, 2020 at 9:01 AM Masahiko Sawada <[hidden email]> wrote:
>>
>
>> If we move these values to replication slots, I think we can change
>> the code so that these statistics are managed by replication slots
>> (e.g. ReplicationSlot struct). Once ReplicationSlot has these values,
>> we can keep them beyond reconnections or multiple calls of SQL
>> interface functions. Of course, these values don’t need to be
>> persisted.
>
>
> Eh, why should they not be persisted?
>

Because these stats are corresponding to a ReoderBuffer (logical
decoding) which will be allocated fresh after restart and have no
relation with what has happened before restart.

Now, thinking about this again, I am not sure if these stats are
directly related to slots. These are stats for logical decoding which
can be performed either via WALSender or decoding plugin (via APIs).
So, why not have them displayed in a new view like pg_stat_logical (or
pg_stat_logical_decoding/pg_stat_logical_replication)?   In future, we
will need to add similar stats for streaming of in-progress
transactions as well (see patch 0007-Track-statistics-for-streaming at
[1]), so having a separate view for these doesn't sound illogical.

> The comparison would be temp_files and temp_bytes in pg_stat_database, and those *are* persisted.

I am not able to see a one-on-one mapping of those stats with what is
being discussed here.

[1] - https://www.postgresql.org/message-id/CAFiTN-vXQx_161WC-a9HvNaF25nwO%3DJJRpRdTtyfGQHbM3Bd1Q%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
On Thu, 11 Jun 2020 at 12:30, Amit Kapila <[hidden email]> wrote:

>
> On Thu, Jun 11, 2020 at 12:35 AM Magnus Hagander <[hidden email]> wrote:
> >
> > On Wed, Jun 10, 2020 at 9:01 AM Masahiko Sawada <[hidden email]> wrote:
> >>
> >
> >> If we move these values to replication slots, I think we can change
> >> the code so that these statistics are managed by replication slots
> >> (e.g. ReplicationSlot struct). Once ReplicationSlot has these values,
> >> we can keep them beyond reconnections or multiple calls of SQL
> >> interface functions. Of course, these values don’t need to be
> >> persisted.
> >
> >
> > Eh, why should they not be persisted?
> >
>
> Because these stats are corresponding to a ReoderBuffer (logical
> decoding) which will be allocated fresh after restart and have no
> relation with what has happened before restart.

I thought the same. But I now think there is no difference between
reconnecting replication and server restart in terms of the logical
decoding context. Even if we persist these values, we might want to
reset these values after crash recovery like stats collector.

>
> Now, thinking about this again, I am not sure if these stats are
> directly related to slots. These are stats for logical decoding which
> can be performed either via WALSender or decoding plugin (via APIs).
> So, why not have them displayed in a new view like pg_stat_logical (or
> pg_stat_logical_decoding/pg_stat_logical_replication)?   In future, we
> will need to add similar stats for streaming of in-progress
> transactions as well (see patch 0007-Track-statistics-for-streaming at
> [1]), so having a separate view for these doesn't sound illogical.
>

I think we need to decide how long we want to remain these statistics
values. That is, if we were to have such pg_stat_logical view, these
values would remain until logical decoding finished since I think the
view would display only running logical decoding. OTOH, if we were to
correspond these stats to slots, these values would remain beyond
multiple logical decoding SQL API calls.

I think one of the main use-case of these statistics is the tuning of
logical_decoding_work_mem. This seems similar to a combination of
pg_stat_database.temp_files/temp_bytes and work_mem. From this
perspective, I guess it’s useful for users if these values remain
until or the slots are removed or server crashes. Given that the kinds
of logical decoding statistics might grow, having a separate view
dedicated to replication slots makes sense to me.

For updating these statistics, if we correspond these statistics to
logical decoding or replication slot, we can change the strategy of
updating statistics so that it doesn’t depend on logical decoding
plugin implementation. If updating statistics doesn’t affect
performance much, it’s better to track the statistics regardless of
plugins.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

akapila
On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada
<[hidden email]> wrote:

>
> On Thu, 11 Jun 2020 at 12:30, Amit Kapila <[hidden email]> wrote:
> >
> >
> > Now, thinking about this again, I am not sure if these stats are
> > directly related to slots. These are stats for logical decoding which
> > can be performed either via WALSender or decoding plugin (via APIs).
> > So, why not have them displayed in a new view like pg_stat_logical (or
> > pg_stat_logical_decoding/pg_stat_logical_replication)?   In future, we
> > will need to add similar stats for streaming of in-progress
> > transactions as well (see patch 0007-Track-statistics-for-streaming at
> > [1]), so having a separate view for these doesn't sound illogical.
> >
>
> I think we need to decide how long we want to remain these statistics
> values. That is, if we were to have such pg_stat_logical view, these
> values would remain until logical decoding finished since I think the
> view would display only running logical decoding. OTOH, if we were to
> correspond these stats to slots, these values would remain beyond
> multiple logical decoding SQL API calls.
>

I thought of having these till the process that performs these
operations exist.  So for WALSender, the stats will be valid till it
is not restarted due to some reason or when performed via backend, the
stats will be valid till the corresponding backend exits.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
On Thu, 11 Jun 2020 at 18:11, Amit Kapila <[hidden email]> wrote:

>
> On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada
> <[hidden email]> wrote:
> >
> > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <[hidden email]> wrote:
> > >
> > >
> > > Now, thinking about this again, I am not sure if these stats are
> > > directly related to slots. These are stats for logical decoding which
> > > can be performed either via WALSender or decoding plugin (via APIs).
> > > So, why not have them displayed in a new view like pg_stat_logical (or
> > > pg_stat_logical_decoding/pg_stat_logical_replication)?   In future, we
> > > will need to add similar stats for streaming of in-progress
> > > transactions as well (see patch 0007-Track-statistics-for-streaming at
> > > [1]), so having a separate view for these doesn't sound illogical.
> > >
> >
> > I think we need to decide how long we want to remain these statistics
> > values. That is, if we were to have such pg_stat_logical view, these
> > values would remain until logical decoding finished since I think the
> > view would display only running logical decoding. OTOH, if we were to
> > correspond these stats to slots, these values would remain beyond
> > multiple logical decoding SQL API calls.
> >
>
> I thought of having these till the process that performs these
> operations exist.  So for WALSender, the stats will be valid till it
> is not restarted due to some reason or when performed via backend, the
> stats will be valid till the corresponding backend exits.
>

The number of rows of that view could be up to (max_backends +
max_wal_senders). Is that right? What if different backends used the
same replication slot one after the other?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

akapila
On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada
<[hidden email]> wrote:

>
> On Thu, 11 Jun 2020 at 18:11, Amit Kapila <[hidden email]> wrote:
> >
> > On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada
> > <[hidden email]> wrote:
> > >
> > > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <[hidden email]> wrote:
> > > >
> > > >
> > > > Now, thinking about this again, I am not sure if these stats are
> > > > directly related to slots. These are stats for logical decoding which
> > > > can be performed either via WALSender or decoding plugin (via APIs).
> > > > So, why not have them displayed in a new view like pg_stat_logical (or
> > > > pg_stat_logical_decoding/pg_stat_logical_replication)?   In future, we
> > > > will need to add similar stats for streaming of in-progress
> > > > transactions as well (see patch 0007-Track-statistics-for-streaming at
> > > > [1]), so having a separate view for these doesn't sound illogical.
> > > >
> > >
> > > I think we need to decide how long we want to remain these statistics
> > > values. That is, if we were to have such pg_stat_logical view, these
> > > values would remain until logical decoding finished since I think the
> > > view would display only running logical decoding. OTOH, if we were to
> > > correspond these stats to slots, these values would remain beyond
> > > multiple logical decoding SQL API calls.
> > >
> >
> > I thought of having these till the process that performs these
> > operations exist.  So for WALSender, the stats will be valid till it
> > is not restarted due to some reason or when performed via backend, the
> > stats will be valid till the corresponding backend exits.
> >
>
> The number of rows of that view could be up to (max_backends +
> max_wal_senders). Is that right? What if different backends used the
> same replication slot one after the other?
>

Yeah, it would be tricky if multiple slots are used by the same
backend.  We could probably track the number of times decoding has
happened by the session that will probably help us in averaging the
spill amount.  If we think that the aim is to help users to tune
logical_decoding_work_mem to avoid frequent spilling or streaming then
how would maintaining at slot level will help?  As you said previously
we could track it only for running logical decoding context but if we
do that then data will be temporary and the user needs to constantly
monitor the same to make sense of it but maybe that is fine.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
On Thu, 11 Jun 2020 at 20:02, Amit Kapila <[hidden email]> wrote:

>
> On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada
> <[hidden email]> wrote:
> >
> > On Thu, 11 Jun 2020 at 18:11, Amit Kapila <[hidden email]> wrote:
> > >
> > > On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada
> > > <[hidden email]> wrote:
> > > >
> > > > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <[hidden email]> wrote:
> > > > >
> > > > >
> > > > > Now, thinking about this again, I am not sure if these stats are
> > > > > directly related to slots. These are stats for logical decoding which
> > > > > can be performed either via WALSender or decoding plugin (via APIs).
> > > > > So, why not have them displayed in a new view like pg_stat_logical (or
> > > > > pg_stat_logical_decoding/pg_stat_logical_replication)?   In future, we
> > > > > will need to add similar stats for streaming of in-progress
> > > > > transactions as well (see patch 0007-Track-statistics-for-streaming at
> > > > > [1]), so having a separate view for these doesn't sound illogical.
> > > > >
> > > >
> > > > I think we need to decide how long we want to remain these statistics
> > > > values. That is, if we were to have such pg_stat_logical view, these
> > > > values would remain until logical decoding finished since I think the
> > > > view would display only running logical decoding. OTOH, if we were to
> > > > correspond these stats to slots, these values would remain beyond
> > > > multiple logical decoding SQL API calls.
> > > >
> > >
> > > I thought of having these till the process that performs these
> > > operations exist.  So for WALSender, the stats will be valid till it
> > > is not restarted due to some reason or when performed via backend, the
> > > stats will be valid till the corresponding backend exits.
> > >
> >
> > The number of rows of that view could be up to (max_backends +
> > max_wal_senders). Is that right? What if different backends used the
> > same replication slot one after the other?
> >
>
> Yeah, it would be tricky if multiple slots are used by the same
> backend.  We could probably track the number of times decoding has
> happened by the session that will probably help us in averaging the
> spill amount.  If we think that the aim is to help users to tune
> logical_decoding_work_mem to avoid frequent spilling or streaming then
> how would maintaining at slot level will help?

Since the logical decoding intermediate files are written at per slots
directory, I thought that corresponding these statistics to
replication slots is also understandable for users. I was thinking
something like pg_stat_logical_replication_slot view which shows
slot_name and statistics of only logical replication slots. The view
always shows rows as many as existing replication slots regardless of
logical decoding being running. I think there is no big difference in
how users use these statistics values between maintaining at slot
level and at logical decoding level.

In logical replication case, since we generally don’t support setting
different logical_decoding_work_mem per wal senders, every wal sender
will decode the same WAL stream with the same setting, meaning they
will similarly spill intermediate files. Maybe the same is true
statistics of streaming. So having these statistics per logical
replication might not help as of now.

> As you said previously
> we could track it only for running logical decoding context but if we
> do that then data will be temporary and the user needs to constantly
> monitor the same to make sense of it but maybe that is fine.

Agreed, in general, it's better not to frequently reset cumulative
values. I personally would like these statistics to be valid even
after the process executing logical decoding exits (or even after
server restart), and to do reset them as needed.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

akapila
On Thu, Jun 11, 2020 at 7:39 PM Masahiko Sawada
<[hidden email]> wrote:

>
> On Thu, 11 Jun 2020 at 20:02, Amit Kapila <[hidden email]> wrote:
> >
> > On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada
> > <[hidden email]> wrote:
> > >
> > > On Thu, 11 Jun 2020 at 18:11, Amit Kapila <[hidden email]> wrote:
> > > >
> > > > On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada
> > > > <[hidden email]> wrote:
> > > > >
> > > > > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <[hidden email]> wrote:
> > > > > >
> > > > > >
> > > > > > Now, thinking about this again, I am not sure if these stats are
> > > > > > directly related to slots. These are stats for logical decoding which
> > > > > > can be performed either via WALSender or decoding plugin (via APIs).
> > > > > > So, why not have them displayed in a new view like pg_stat_logical (or
> > > > > > pg_stat_logical_decoding/pg_stat_logical_replication)?   In future, we
> > > > > > will need to add similar stats for streaming of in-progress
> > > > > > transactions as well (see patch 0007-Track-statistics-for-streaming at
> > > > > > [1]), so having a separate view for these doesn't sound illogical.
> > > > > >
> > > > >
> > > > > I think we need to decide how long we want to remain these statistics
> > > > > values. That is, if we were to have such pg_stat_logical view, these
> > > > > values would remain until logical decoding finished since I think the
> > > > > view would display only running logical decoding. OTOH, if we were to
> > > > > correspond these stats to slots, these values would remain beyond
> > > > > multiple logical decoding SQL API calls.
> > > > >
> > > >
> > > > I thought of having these till the process that performs these
> > > > operations exist.  So for WALSender, the stats will be valid till it
> > > > is not restarted due to some reason or when performed via backend, the
> > > > stats will be valid till the corresponding backend exits.
> > > >
> > >
> > > The number of rows of that view could be up to (max_backends +
> > > max_wal_senders). Is that right? What if different backends used the
> > > same replication slot one after the other?
> > >
> >
> > Yeah, it would be tricky if multiple slots are used by the same
> > backend.  We could probably track the number of times decoding has
> > happened by the session that will probably help us in averaging the
> > spill amount.  If we think that the aim is to help users to tune
> > logical_decoding_work_mem to avoid frequent spilling or streaming then
> > how would maintaining at slot level will help?
>
> Since the logical decoding intermediate files are written at per slots
> directory, I thought that corresponding these statistics to
> replication slots is also understandable for users.
>

What I wanted to know is how will it help users to tune
logical_decoding_work_mem?  Different backends can process from the
same slot, so it is not clear how user will be able to make any
meaning out of those stats.  OTOH, it is easier to see how to make
meaning of these stats if we display them w.r.t process.  Basically,
we have spill_count and spill_size which can be used to tune
logical_decoding_work_mem and also the activity of spilling happens at
process level, so it sounds like one-to-one mapping.  I am not telling
to rule out maintaining a slot level but trying to see if we can come
up with a clear definition.

> I was thinking
> something like pg_stat_logical_replication_slot view which shows
> slot_name and statistics of only logical replication slots. The view
> always shows rows as many as existing replication slots regardless of
> logical decoding being running. I think there is no big difference in
> how users use these statistics values between maintaining at slot
> level and at logical decoding level.
>
> In logical replication case, since we generally don’t support setting
> different logical_decoding_work_mem per wal senders, every wal sender
> will decode the same WAL stream with the same setting, meaning they
> will similarly spill intermediate files.
>

I am not sure this will be true in every case.  We do have a
slot_advance functionality, so some plugin might use that and that
will lead to different files getting spilled.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Resetting spilled txn statistics in pg_stat_replication

Fujii Masao-4


On 2020/06/12 12:21, Amit Kapila wrote:

> On Thu, Jun 11, 2020 at 7:39 PM Masahiko Sawada
> <[hidden email]> wrote:
>>
>> On Thu, 11 Jun 2020 at 20:02, Amit Kapila <[hidden email]> wrote:
>>>
>>> On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada
>>> <[hidden email]> wrote:
>>>>
>>>> On Thu, 11 Jun 2020 at 18:11, Amit Kapila <[hidden email]> wrote:
>>>>>
>>>>> On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada
>>>>> <[hidden email]> wrote:
>>>>>>
>>>>>> On Thu, 11 Jun 2020 at 12:30, Amit Kapila <[hidden email]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Now, thinking about this again, I am not sure if these stats are
>>>>>>> directly related to slots. These are stats for logical decoding which
>>>>>>> can be performed either via WALSender or decoding plugin (via APIs).
>>>>>>> So, why not have them displayed in a new view like pg_stat_logical (or
>>>>>>> pg_stat_logical_decoding/pg_stat_logical_replication)?   In future, we
>>>>>>> will need to add similar stats for streaming of in-progress
>>>>>>> transactions as well (see patch 0007-Track-statistics-for-streaming at
>>>>>>> [1]), so having a separate view for these doesn't sound illogical.
>>>>>>>
>>>>>>
>>>>>> I think we need to decide how long we want to remain these statistics
>>>>>> values. That is, if we were to have such pg_stat_logical view, these
>>>>>> values would remain until logical decoding finished since I think the
>>>>>> view would display only running logical decoding. OTOH, if we were to
>>>>>> correspond these stats to slots, these values would remain beyond
>>>>>> multiple logical decoding SQL API calls.
>>>>>>
>>>>>
>>>>> I thought of having these till the process that performs these
>>>>> operations exist.  So for WALSender, the stats will be valid till it
>>>>> is not restarted due to some reason or when performed via backend, the
>>>>> stats will be valid till the corresponding backend exits.
>>>>>
>>>>
>>>> The number of rows of that view could be up to (max_backends +
>>>> max_wal_senders). Is that right? What if different backends used the
>>>> same replication slot one after the other?
>>>>
>>>
>>> Yeah, it would be tricky if multiple slots are used by the same
>>> backend.  We could probably track the number of times decoding has
>>> happened by the session that will probably help us in averaging the
>>> spill amount.  If we think that the aim is to help users to tune
>>> logical_decoding_work_mem to avoid frequent spilling or streaming then
>>> how would maintaining at slot level will help?
>>
>> Since the logical decoding intermediate files are written at per slots
>> directory, I thought that corresponding these statistics to
>> replication slots is also understandable for users.
>>
>
> What I wanted to know is how will it help users to tune
> logical_decoding_work_mem?  Different backends can process from the
> same slot, so it is not clear how user will be able to make any
> meaning out of those stats.  OTOH, it is easier to see how to make
> meaning of these stats if we display them w.r.t process.  Basically,
> we have spill_count and spill_size which can be used to tune
> logical_decoding_work_mem and also the activity of spilling happens at
> process level, so it sounds like one-to-one mapping.  I am not telling
> to rule out maintaining a slot level but trying to see if we can come
> up with a clear definition.
>
>> I was thinking
>> something like pg_stat_logical_replication_slot view which shows
>> slot_name and statistics of only logical replication slots. The view
>> always shows rows as many as existing replication slots regardless of
>> logical decoding being running. I think there is no big difference in
>> how users use these statistics values between maintaining at slot
>> level and at logical decoding level.
>>
>> In logical replication case, since we generally don’t support setting
>> different logical_decoding_work_mem per wal senders, every wal sender
>> will decode the same WAL stream with the same setting, meaning they
>> will similarly spill intermediate files.

I was thinking we support that. We can create multiple replication users
with different logical_decoding_work_mem settings. Also each walsender
can use logical_decoding_work_mem configured in its user. No?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


123