Resetting spilled txn statistics in pg_stat_replication


Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
On Thu, 25 Jun 2020 at 19:35, Amit Kapila <[hidden email]> wrote:

>
> On Tue, Jun 23, 2020 at 6:39 PM Amit Kapila <[hidden email]> wrote:
> >
> > On Tue, Jun 23, 2020 at 3:48 PM Tomas Vondra
> > <[hidden email]> wrote:
> > >
> > > On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote:
> > > >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada
> > > ><[hidden email]> wrote:
> > > >>
> > > >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <[hidden email]> wrote:
> > > >> >
> > > >> > >
> > > >> > >What if the decoding has been performed by multiple backends using the
> > > >> > >same slot?  In that case, it will be difficult to judge the right
> > > >> > >value of logical_decoding_work_mem based on the stats.  It would
> > > >> > >make sense if we provided a way to set logical_decoding_work_mem for a
> > > >> > >slot, but I'm not sure that is better than what we have now.
> > > >> > >
> > > >> > >
> > > >>
> > > >> I thought that the stats relate to what the
> > > >> logical_decoding_work_mem value was, not to who performed the logical
> > > >> decoding. So even if multiple backends perform logical decoding using
> > > >> the same slot, the user can use the stats directly as long as the
> > > >> logical_decoding_work_mem value doesn’t change.
> > > >>
>
> Today, I thought about it again, and if we consider the point that the
> logical_decoding_work_mem value doesn’t change much, then having the
> stats at slot level would also allow computing
> logical_decoding_work_mem based on the stats.  Do you think it is a
> reasonable assumption that users won't change
> logical_decoding_work_mem for different processes (WALSender, etc.)?

FWIW, if we use logical_decoding_work_mem as a threshold for starting
to send changes to a subscriber, I think there might be use cases
where the user wants to set different logical_decoding_work_mem values
for different wal senders. For example, setting a lower value minimizes
the latency of synchronous logical replication to a nearby site,
whereas setting a larger value minimizes the amount of data sent to a
distant site.
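
A minimal sketch of how that could be arranged today, assuming each
subscriber connects as its own role (the role names here are
hypothetical) so that a per-role setting applies to its walsender:

-- lower threshold: start spilling/sending sooner, keeping the latency
-- of synchronous replication to the nearby site low
CREATE ROLE near_site_repl WITH REPLICATION LOGIN;
ALTER ROLE near_site_repl SET logical_decoding_work_mem = '1MB';

-- higher threshold: keep more changes in memory before anything goes out
CREATE ROLE far_site_repl WITH REPLICATION LOGIN;
ALTER ROLE far_site_repl SET logical_decoding_work_mem = '512MB';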

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Resetting spilled txn statistics in pg_stat_replication

akapila
On Fri, Jun 26, 2020 at 11:31 AM Masahiko Sawada
<[hidden email]> wrote:

>
> FWIW, if we use logical_decoding_work_mem as a threshold for starting
> to send changes to a subscriber, I think there might be use cases
> where the user wants to set different logical_decoding_work_mem values
> for different wal senders. For example, setting a lower value minimizes
> the latency of synchronous logical replication to a nearby site,
> whereas setting a larger value minimizes the amount of data sent to a
> distant site.
>

How can setting a large value minimize the amount of data sent?
One possibility is if there are a lot of transaction aborts and the
transactions are not large enough to cross the
logical_decoding_work_mem threshold, but such cases shouldn't be common.
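
A contrived illustration of that abort scenario (t1 and the settings
here are hypothetical), under the proposed behavior of sending changes
once logical_decoding_work_mem is crossed:

-- with logical_decoding_work_mem = '64kB' for this walsender:
BEGIN;
INSERT INTO t1 SELECT generate_series(1, 100000); -- crosses the threshold,
                                                  -- so changes start going
                                                  -- out mid-transaction
ROLLBACK;  -- ...and the subscriber must throw them away

-- with logical_decoding_work_mem = '1GB', the same transaction stays in
-- memory until the abort is decoded, and nothing is sent at all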

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Resetting spilled txn statistics in pg_stat_replication

Magnus Hagander-2
In reply to this post by Tomas Vondra-4


On Tue, Jun 23, 2020 at 12:18 PM Tomas Vondra <[hidden email]> wrote:
On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote:
>
>I think if you maintain these stats at the slot level, you probably
>need to use spinlock or atomic ops in order to update those as slots
>can be used from multiple backends whereas currently, we don't need
>that.

IMHO storing the stats in the slot itself is a bad idea. We have the
statistics collector for exactly this purpose, and it's receiving data
over UDP without any extra locking etc.

Yeah, that seems much more appropriate. Of course, where they are exposed is a different question.


>> > >What problems do we see in displaying these for each process?  I think
>> > >users might want to see the stats for the exited processes or after
>> > >server restart but I think both of those are not even possible today.
>> > >I think the stats are available till the corresponding WALSender
>> > >process is active.
>>
>> I might want to see the stats for the exited processes or after server
>> restart. But I'm inclined to agree with displaying the stats per
>> process if the stats are displayed on a separate view (e.g.
>> pg_stat_replication_slots).
>>
>
>Yeah, as told previously, this makes more sense to me.
>
>Do you think we should try to write a POC patch using a per-process
>entry approach and see what difficulties we are facing and does it
>give the stats in a way we are imagining but OTOH, we can wait for
>some more to see if there is clear winner approach here?
>

I may be missing something obvious, but I still see no point in tracking
per-process stats. We don't have that for other stats, and I'm not sure
how common the scenario is where a given slot is decoded by many
backends. I'd say the vast majority of cases are simply running decoding
from a walsender, which may occasionally restart, but I doubt the users
are interested in per-PID data - they probably want aggregated data.

Well, technically we do -- we have the pg_stat_xact_* views. However, those are only viewable from *inside* the session itself (which can sometimes be quite annoying).

This does somewhat apply here, in that normal transactions send their stats batches at transaction end. If this is data we'd be interested in viewing from inside the transaction, a more direct exposure would be needed -- such as the way we do it with LSNs in pg_stat_replication.

For long-term monitoring, people definitely want aggregate data, I'd say. The "realtime data", if we call it that, is in my experience mostly interesting for defining alerts: "replication standby is too far behind" is alertable through that, whereas something like "total amount of replication traffic over the past hour" is more trend-alertable, and that is typically handled by a separate system pulling the aggregate stats.


Can someone explain a plausible scenario for which tracking per-process
stats would be needed, and simply computing deltas would not work? How
will you know which old PID is which, what will you do when a PID is
reused, and so on?

I fail to see one as well, in a real-world scenario. Maybe if you want to do a one-off point tuning of one tiny piece of a system? But you will then also need the long-term statistics to follow up on whether what you did was correct anyway...
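
A minimal sketch of the delta approach, assuming the aggregated per-slot
view pg_stat_replication_slots proposed elsewhere in this thread, with
hypothetical slot_name and spill_* columns:

-- take a baseline snapshot of the cumulative per-slot counters
CREATE TABLE spill_baseline AS
    SELECT now() AS taken_at, slot_name, spill_txns, spill_count, spill_bytes
      FROM pg_stat_replication_slots;

-- some time later, compute per-slot deltas; no PIDs involved
SELECT c.slot_name,
       c.spill_txns  - b.spill_txns  AS spill_txns_delta,
       c.spill_count - b.spill_count AS spill_count_delta,
       c.spill_bytes - b.spill_bytes AS spill_bytes_delta
  FROM pg_stat_replication_slots c
  JOIN spill_baseline b USING (slot_name);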


Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
In reply to this post by akapila
On Fri, 26 Jun 2020 at 17:53, Amit Kapila <[hidden email]> wrote:

>
> How can setting a large value minimize the amount of data sent?
> One possibility is if there are a lot of transaction aborts and the
> transactions are not large enough to cross the
> logical_decoding_work_mem threshold, but such cases shouldn't be common.

Yeah, this is what I meant.

I agree that it would not be a common case for the user to set
different values for different processes. Based on that assumption, I
also think having the stats at slot level is a good idea. But I might
want to have the reset function.
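
For illustration, such a reset function might look like this from the
user's side (the function name and signature are hypothetical, not from
any posted patch):

SELECT pg_stat_reset_replication_slot('sub1');  -- reset one slot's counters
SELECT pg_stat_reset_replication_slot(NULL);    -- or reset all slots at once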

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Resetting spilled txn statistics in pg_stat_replication

akapila
On Mon, Jun 29, 2020 at 10:26 AM Masahiko Sawada
<[hidden email]> wrote:

>
> I agree that it would not be a common case for the user to set
> different values for different processes. Based on that assumption, I
> also think having the stats at slot level is a good idea.
>

Okay.

> But I might
> want to have the reset function.
>

I don't mind, but let's first see how the patch for the basic feature
looks and what is required to implement it. Are you interested in
writing the patch for this work?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
On Mon, 29 Jun 2020 at 20:37, Amit Kapila <[hidden email]> wrote:

>
> I don't mind, but let's first see how the patch for the basic feature
> looks and what is required to implement it. Are you interested in
> writing the patch for this work?

Yes, I'll write the draft patch.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Resetting spilled txn statistics in pg_stat_replication

akapila
On Tue, Jun 30, 2020 at 6:38 AM Masahiko Sawada
<[hidden email]> wrote:

>
> Yes, I'll write the draft patch.
>

Great, thanks.  One thing we can consider: instead of storing the stats
directly in the slot, we can send them to the stats collector, as
suggested by Tomas.  Basically, that can avoid contention around slots
(see the discussion in email [1]).  I have not evaluated either approach
in detail, so you can let us know the advantage of one over the other.
You might already be considering this, but I thought it better to share
what I have in mind now rather than bring it up later once you have the
draft patch ready.

[1] - https://www.postgresql.org/message-id/20200623101831.it6lzwbm37xwquco%40development

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Resetting spilled txn statistics in pg_stat_replication

Masahiko Sawada-2
On Tue, 30 Jun 2020 at 12:58, Amit Kapila <[hidden email]> wrote:

>
> Great, thanks.  One thing we can consider: instead of storing the stats
> directly in the slot, we can send them to the stats collector, as
> suggested by Tomas.  Basically, that can avoid contention around slots
> (see the discussion in email [1]).
>
Thanks! Yes, I'm working on this patch with the intention of sending the
stats to the stats collector.

I've attached a PoC patch that implements a simple approach. I'd like to
discuss how we collect the replication slot statistics in the stats
collector before I bring the patch to completion.

In this PoC patch, we have an array of PgStat_ReplSlotStats structs
with max_replication_slots entries. The backend or wal sender sends the
slot statistics to the stats collector when decoding a commit WAL
record.

typedef struct PgStat_ReplSlotStats
{
    char            slotname[NAMEDATALEN];  /* slot name, used as the identifier */
    PgStat_Counter  spill_txns;             /* # of transactions spilled to disk */
    PgStat_Counter  spill_count;            /* # of times transactions were spilled */
    PgStat_Counter  spill_bytes;            /* total bytes of spilled changes */
} PgStat_ReplSlotStats;

There are a few things I'd like to discuss:

First, since the unique identifier of a replication slot is its name, the
process sends the slot statistics along with the slot name to the stats
collector. I'm concerned about the amount of data sent to the stats
collector and the cost of searching for a slot's statistics within the
statistics array (it's O(N), where N is max_replication_slots). Since the
maximum length of a slot name is NAMEDATALEN (64 bytes by default) and
max_replication_slots is unlikely to be a large number, I may be worrying
too much, but it seems better to avoid that cost if we easily can. An
idea I came up with is to use the index of the slot (i.e., the index into
ReplicationSlotCtl->replication_slots[]) as the index of the slot's
statistics in the stats collector. But since the index of a slot can
change after a restart, we would need to somehow keep the slot indexes
synchronized between the two arrays. So I thought we could determine the
index of a slot's statistics at ReplicationSlotAcquire() or
ReplicationSlotCreate(), but that would in turn require reading the stats
file while holding ReplicationSlotControlLock, to prevent the same
statistics index from being used by a concurrent process creating a slot.
I might be missing something, though.

Second, as long as the unique identifier is the slot name, there is no
convenient way to distinguish between old and new replication slots that
share the same name, so the backend or wal sender process sends a message
to the stats collector to drop the replication slot's statistics at
ReplicationSlotDropPtr(). This strategy differs from what we do for
table, index, and function statistics. It might not be a problem, but I'm
still thinking about a better way.

The new view name is also an open question. I prefer
pg_stat_replication_slots, and I would rather add stats for physical
replication slots to the same view in the future than create a separate
view.
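
For illustration, such a view could be a thin wrapper over the PoC's
pg_stat_get_replication_slots() function (a sketch only; the column list
follows the PoC struct):

CREATE VIEW pg_stat_replication_slots AS
    SELECT s.name, s.spill_txns, s.spill_count, s.spill_bytes
      FROM pg_stat_get_replication_slots()
           AS s(name, spill_txns, spill_count, spill_bytes);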

Aside from the above, this patch reworks most of the changes introduced
by commit 9290ad198b1 and introduces a fair amount of new code. I'm
concerned about whether such changes are acceptable at the time of beta 2.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: spill_stats_poc.patch (37K)

Re: Resetting spilled txn statistics in pg_stat_replication

Ajin Cherian


On Thu, Jul 2, 2020 at 1:31 PM Masahiko Sawada <[hidden email]> wrote:


Thanks! Yes, I'm working on this patch with the intention of sending the
stats to the stats collector.

I've attached a PoC patch that implements a simple approach. I'd like to
discuss how we collect the replication slot statistics in the stats
collector before I bring the patch to completion.


I understand the patch is only at an initial stage, but I just tried testing it. Using the patch, I enabled logical replication and created two pub/sub pairs (sub1, sub2) for two separate tables (t1, t2). I inserted enough data into the second table (t2) that it spilled to disk.
Then, when I checked the stats using the new function pg_stat_get_replication_slots(), I saw that the same stats were updated for both slots, when ideally they should have been reflected in the second slot alone.

postgres=# SELECT s.name, s.spill_txns, s.spill_count, s.spill_bytes
             FROM pg_stat_get_replication_slots()
                  AS s(name, spill_txns, spill_count, spill_bytes);
 name | spill_txns | spill_count | spill_bytes
------+------------+-------------+-------------
 sub1 |          1 |          20 |  1320000000
 sub2 |          1 |          20 |  1320000000
(2 rows)

I haven't debugged the issue yet; I can if you wish, but I thought I'd let you know what I found.

thanks,
Ajin Cherian
Fujitsu Australia
