Quantcast

Interval for launching the table sync worker

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
65 messages Options
1234
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Interval for launching the table sync worker

Sawada Masahiko
Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Kyotaro HORIGUCHI-2
I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <[hidden email]> wrote in <CAD21AoDCnyRJDUY=[hidden email]>

> Hi all,
>
> While testing table sync worker for logical replication I noticed that
> if the table sync worker of logical replication failed to insert the
> data for whatever reason, the table sync worker process exits with
> error. And then the main apply worker launches the table sync worker
> again soon without interval. This routine is executed at very high
> frequency without interval.
>
> Should we do put a interval (wal_retrieve_interval or make a new GUC
> parameter?) for launching the table sync worker?

After introducing encoding conversion, untranslatable characters
in a published table causes this situation. Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Sawada Masahiko
On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<[hidden email]> wrote:

> I was thinking the same.
>
> At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <[hidden email]> wrote in <CAD21AoDCnyRJDUY=[hidden email]>
>> Hi all,
>>
>> While testing table sync worker for logical replication I noticed that
>> if the table sync worker of logical replication failed to insert the
>> data for whatever reason, the table sync worker process exits with
>> error. And then the main apply worker launches the table sync worker
>> again soon without interval. This routine is executed at very high
>> frequency without interval.
>>
>> Should we do put a interval (wal_retrieve_interval or make a new GUC
>> parameter?) for launching the table sync worker?
>
> After introducing encoding conversion, untranslatable characters
> in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

> Interval can reduce
> the frequence of reconnecting, but I think that walreciever
> should refrain from reconnecting on unrecoverable(or repeating)
> error in walsender.
>

Yeah, that's also considerable issue.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Kyotaro HORIGUCHI-2
At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>

> On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
> <[hidden email]> wrote:
> > I was thinking the same.
> >
> > At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <[hidden email]> wrote in <CAD21AoDCnyRJDUY=[hidden email]>
> >> Hi all,
> >>
> >> While testing table sync worker for logical replication I noticed that
> >> if the table sync worker of logical replication failed to insert the
> >> data for whatever reason, the table sync worker process exits with
> >> error. And then the main apply worker launches the table sync worker
> >> again soon without interval. This routine is executed at very high
> >> frequency without interval.
> >>
> >> Should we do put a interval (wal_retrieve_interval or make a new GUC
> >> parameter?) for launching the table sync worker?
> >
> > After introducing encoding conversion, untranslatable characters
> > in a published table causes this situation.
>
> I think it's better to make a new GUC parameter for the table sync
> worker. Having multiple behaviors in wal_retrieve_retry_interval is
> not good idea. Thought?

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
       PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

> > Interval can reduce
> > the frequence of reconnecting, but I think that walreciever
> > should refrain from reconnecting on unrecoverable(or repeating)
> > error in walsender.
> >
>
> Yeah, that's also considerable issue.

But not to do now.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Kyotaro HORIGUCHI-2
At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <[hidden email]> wrote in <[hidden email]>

> At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>
> > On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
> > <[hidden email]> wrote:
> > > I was thinking the same.
> > >
> > > At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <[hidden email]> wrote in <CAD21AoDCnyRJDUY=[hidden email]>
> > >> Hi all,
> > >>
> > >> While testing table sync worker for logical replication I noticed that
> > >> if the table sync worker of logical replication failed to insert the
> > >> data for whatever reason, the table sync worker process exits with
> > >> error. And then the main apply worker launches the table sync worker
> > >> again soon without interval. This routine is executed at very high
> > >> frequency without interval.
> > >>
> > >> Should we do put a interval (wal_retrieve_interval or make a new GUC
> > >> parameter?) for launching the table sync worker?

Hmm. I was playing with something wrong. Now I see the invervals
5 seconds. No problem.

> > > After introducing encoding conversion, untranslatable characters
> > > in a published table causes this situation.
> >
> > I think it's better to make a new GUC parameter for the table sync
> > worker. Having multiple behaviors in wal_retrieve_retry_interval is
> > not good idea. Thought?

So, this is working, sorry.

> I prefer subscription option than GUC. Something like following.
>
> CREATE SUBSCRIPTION s1 CONNECTION 'blah'
>        PUBLICATION p1 WITH (noreconnect = true);
>
> Stored in pg_subscription?
>
> > > Interval can reduce
> > > the frequence of reconnecting, but I think that walreciever
> > > should refrain from reconnecting on unrecoverable(or repeating)
> > > error in walsender.
> > >
> >
> > Yeah, that's also considerable issue.
>
> But not to do now.

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Sawada Masahiko
On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
<[hidden email]> wrote:

> At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <[hidden email]> wrote in <[hidden email]>
>> At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>
>> > On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
>> > <[hidden email]> wrote:
>> > > I was thinking the same.
>> > >
>> > > At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <[hidden email]> wrote in <CAD21AoDCnyRJDUY=[hidden email]>
>> > >> Hi all,
>> > >>
>> > >> While testing table sync worker for logical replication I noticed that
>> > >> if the table sync worker of logical replication failed to insert the
>> > >> data for whatever reason, the table sync worker process exits with
>> > >> error. And then the main apply worker launches the table sync worker
>> > >> again soon without interval. This routine is executed at very high
>> > >> frequency without interval.
>> > >>
>> > >> Should we do put a interval (wal_retrieve_interval or make a new GUC
>> > >> parameter?) for launching the table sync worker?
>
> Hmm. I was playing with something wrong. Now I see the invervals
> 5 seconds. No problem.

Yeah, this issue happens only in the table sync worker.

>
>> > > After introducing encoding conversion, untranslatable characters
>> > > in a published table causes this situation.
>> >
>> > I think it's better to make a new GUC parameter for the table sync
>> > worker. Having multiple behaviors in wal_retrieve_retry_interval is
>> > not good idea. Thought?
>
> So, this is working, sorry.
>
>> I prefer subscription option than GUC. Something like following.
>>
>> CREATE SUBSCRIPTION s1 CONNECTION 'blah'
>>        PUBLICATION p1 WITH (noreconnect = true);
>>
>> Stored in pg_subscription?
>>
>> > > Interval can reduce
>> > > the frequence of reconnecting, but I think that walreciever
>> > > should refrain from reconnecting on unrecoverable(or repeating)
>> > > error in walsender.
>> > >
>> >
>> > Yeah, that's also considerable issue.
>>
>> But not to do now.
>

I've added this as an open item, and sent a patch for this.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Petr Jelinek-4
On 06/04/17 16:44, Masahiko Sawada wrote:

> On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
> <[hidden email]> wrote:
>> At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <[hidden email]> wrote in <[hidden email]>
>>> At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>
>>>> On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
>>>> <[hidden email]> wrote:
>>>>> I was thinking the same.
>>>>>
>>>>> At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <[hidden email]> wrote in <CAD21AoDCnyRJDUY=[hidden email]>
>>>>>> Hi all,
>>>>>>
>>>>>> While testing table sync worker for logical replication I noticed that
>>>>>> if the table sync worker of logical replication failed to insert the
>>>>>> data for whatever reason, the table sync worker process exits with
>>>>>> error. And then the main apply worker launches the table sync worker
>>>>>> again soon without interval. This routine is executed at very high
>>>>>> frequency without interval.
>>>>>>
>>>>>> Should we do put a interval (wal_retrieve_interval or make a new GUC
>>>>>> parameter?) for launching the table sync worker?
>>
>> Hmm. I was playing with something wrong. Now I see the invervals
>> 5 seconds. No problem.
>
> Yeah, this issue happens only in the table sync worker.
>
>>
>>>>> After introducing encoding conversion, untranslatable characters
>>>>> in a published table causes this situation.
>>>>
>>>> I think it's better to make a new GUC parameter for the table sync
>>>> worker. Having multiple behaviors in wal_retrieve_retry_interval is
>>>> not good idea. Thought?
>>
>> So, this is working, sorry.
>>
>>> I prefer subscription option than GUC. Something like following.
>>>
>>> CREATE SUBSCRIPTION s1 CONNECTION 'blah'
>>>        PUBLICATION p1 WITH (noreconnect = true);
>>>
>>> Stored in pg_subscription?

I don't think that's a very good solution, you'd lose replication on
every network glitch, upstream server restart, etc.

>>>
>>>>> Interval can reduce
>>>>> the frequence of reconnecting, but I think that walreciever
>>>>> should refrain from reconnecting on unrecoverable(or repeating)
>>>>> error in walsender.
>>>>>
>>>>
>>>> Yeah, that's also considerable issue.
>>>
>>> But not to do now.
>>
>
> I've added this as an open item, and sent a patch for this.
>

I am not exactly sure what's the open item from this thread. To use the
wal_retrieve_interval to limit table sync restarts?

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Kyotaro HORIGUCHI-2
At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek <[hidden email]> wrote in <[hidden email]>

> On 06/04/17 16:44, Masahiko Sawada wrote:
> > On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
> > <[hidden email]> wrote:
> >>> I prefer subscription option than GUC. Something like following.
> >>>
> >>> CREATE SUBSCRIPTION s1 CONNECTION 'blah'
> >>>        PUBLICATION p1 WITH (noreconnect = true);
> >>>
> >>> Stored in pg_subscription?
>
> I don't think that's a very good solution, you'd lose replication on
> every network glitch, upstream server restart, etc.

Yes, you're right. This would work if apply worker distinguishes
permanent error. But it is overkill so far.

> > I've added this as an open item, and sent a patch for this.
> >
>
> I am not exactly sure what's the open item from this thread. To use the
> wal_retrieve_interval to limit table sync restarts?

It's not me. I also don't think this critical.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Sawada Masahiko
On Fri, Apr 7, 2017 at 1:23 PM, Kyotaro HORIGUCHI
<[hidden email]> wrote:

> At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek <[hidden email]> wrote in <[hidden email]>
>> On 06/04/17 16:44, Masahiko Sawada wrote:
>> > On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
>> > <[hidden email]> wrote:
>> >>> I prefer subscription option than GUC. Something like following.
>> >>>
>> >>> CREATE SUBSCRIPTION s1 CONNECTION 'blah'
>> >>>        PUBLICATION p1 WITH (noreconnect = true);
>> >>>
>> >>> Stored in pg_subscription?
>>
>> I don't think that's a very good solution, you'd lose replication on
>> every network glitch, upstream server restart, etc.
>
> Yes, you're right. This would work if apply worker distinguishes
> permanent error. But it is overkill so far.
>
>> > I've added this as an open item, and sent a patch for this.
>> >
>>
>> I am not exactly sure what's the open item from this thread. To use the
>> wal_retrieve_interval to limit table sync restarts?
>
> It's not me. I also don't think this critical.
>

Thank you for the comment.

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Peter Eisentraut-6
On 4/7/17 01:10, Masahiko Sawada wrote:
> It's not critical but it could be problem. So I thought we should fix
> it before the PostgreSQL 10 release. If it's not appropriate as an
> open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Sawada Masahiko
On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
<[hidden email]> wrote:
> On 4/7/17 01:10, Masahiko Sawada wrote:
>> It's not critical but it could be problem. So I thought we should fix
>> it before the PostgreSQL 10 release. If it's not appropriate as an
>> open item I'll remove it.
>
> You wrote that you "sent" a patch, but I don't see a patch anywhere.
>
> I think a nonintrusive patch for this could be considered.

Oops, I made a mistake. I'll send a patch tomorrow.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Sawada Masahiko
On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada <[hidden email]> wrote:

> On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
> <[hidden email]> wrote:
>> On 4/7/17 01:10, Masahiko Sawada wrote:
>>> It's not critical but it could be problem. So I thought we should fix
>>> it before the PostgreSQL 10 release. If it's not appropriate as an
>>> open item I'll remove it.
>>
>> You wrote that you "sent" a patch, but I don't see a patch anywhere.
>>
>> I think a nonintrusive patch for this could be considered.
>
> Oops, I made a mistake. I'll send a patch tomorrow.
>
I've attached the patch. This patch introduces GUC parameter
table_sync_retry_interval which controls the interval of launching the
table sync worker process.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

table_sync_retry_interval.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Petr Jelinek-4
On 10/04/17 11:02, Masahiko Sawada wrote:

> On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada <[hidden email]> wrote:
>> On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
>> <[hidden email]> wrote:
>>> On 4/7/17 01:10, Masahiko Sawada wrote:
>>>> It's not critical but it could be problem. So I thought we should fix
>>>> it before the PostgreSQL 10 release. If it's not appropriate as an
>>>> open item I'll remove it.
>>>
>>> You wrote that you "sent" a patch, but I don't see a patch anywhere.
>>>
>>> I think a nonintrusive patch for this could be considered.
>>
>> Oops, I made a mistake. I'll send a patch tomorrow.
>>
>
> I've attached the patch. This patch introduces GUC parameter
> table_sync_retry_interval which controls the interval of launching the
> table sync worker process.
>

I don't think solution is quite this simple. This will cause all table
sync workers to be delayed which means concurrency will suffer and the
initial sync of all tables will take much longer especially if there is
little data. We need a way to either detect if we are launching same
worker that was already launched before, or alternatively if we are
launching crashed worker and only then apply the delay.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Peter Eisentraut-6
On 4/10/17 08:10, Petr Jelinek wrote:
> I don't think solution is quite this simple. This will cause all table
> sync workers to be delayed which means concurrency will suffer and the
> initial sync of all tables will take much longer especially if there is
> little data. We need a way to either detect if we are launching same
> worker that was already launched before, or alternatively if we are
> launching crashed worker and only then apply the delay.

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Sawada Masahiko
On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
<[hidden email]> wrote:

> On 4/10/17 08:10, Petr Jelinek wrote:
>> I don't think solution is quite this simple. This will cause all table
>> sync workers to be delayed which means concurrency will suffer and the
>> initial sync of all tables will take much longer especially if there is
>> little data. We need a way to either detect if we are launching same
>> worker that was already launched before, or alternatively if we are
>> launching crashed worker and only then apply the delay.
>
> Perhaps instead of a global last_start_time, we store a per relation
> last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Peter Eisentraut-6
On 4/12/17 00:48, Masahiko Sawada wrote:
> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>> Perhaps instead of a global last_start_time, we store a per relation
>> last_start_time in SubscriptionRelState?
>
> I was thinking the same. But a problem is that the list of
> SubscriptionRelState is refreshed whenever the syncing table state
> becomes invalid (table_state_valid = false). I guess we need to
> improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync.  So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window.  That
doesn't seem a terrible issue to me.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Sawada Masahiko
On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
<[hidden email]> wrote:

> On 4/12/17 00:48, Masahiko Sawada wrote:
>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>>> Perhaps instead of a global last_start_time, we store a per relation
>>> last_start_time in SubscriptionRelState?
>>
>> I was thinking the same. But a problem is that the list of
>> SubscriptionRelState is refreshed whenever the syncing table state
>> becomes invalid (table_state_valid = false). I guess we need to
>> improve these logic including GetSubscriptionNotReadyRelations().
>
> The table states are invalidated on a syscache callback from
> pg_subscription_rel, which happens roughly speaking when a table
> finishes the initial sync.  So if we're worried about failing tablesync
> workers relaunching to quickly, this would only be a problem if a
> tablesync of another table finishes right in that restart window.  That
> doesn't seem a terrible issue to me.
>

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRealtions so that last_start_time in
SubscriptionRelState is taken over to new list.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Sawada Masahiko
On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <[hidden email]> wrote:

> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
> <[hidden email]> wrote:
>> On 4/12/17 00:48, Masahiko Sawada wrote:
>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>>>> Perhaps instead of a global last_start_time, we store a per relation
>>>> last_start_time in SubscriptionRelState?
>>>
>>> I was thinking the same. But a problem is that the list of
>>> SubscriptionRelState is refreshed whenever the syncing table state
>>> becomes invalid (table_state_valid = false). I guess we need to
>>> improve these logic including GetSubscriptionNotReadyRelations().
>>
>> The table states are invalidated on a syscache callback from
>> pg_subscription_rel, which happens roughly speaking when a table
>> finishes the initial sync.  So if we're worried about failing tablesync
>> workers relaunching to quickly, this would only be a problem if a
>> tablesync of another table finishes right in that restart window.  That
>> doesn't seem a terrible issue to me.
>>
>
> I think the table states are invalidated whenever the table sync
> worker starts, because the table sync worker updates its status of
> pg_subscription_rel and commits it before starting actual copy. So we
> cannot rely on that. I thought we can store last_start_time into
> pg_subscription_rel but it might be overkill. I'm now thinking to
> change GetSubscriptionNotReadyRealtions so that last_start_time in
> SubscriptionRelState is taken over to new list.
>
Attached the latest patch. It didn't actually necessary to change
GetSubscriptionNotReadyRelations. I just changed the logic refreshing
the sync table state list.
Please give me feedback.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

table_sync_retry_interval_v2.patch (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Kyotaro HORIGUCHI-2
In reply to this post by Sawada Masahiko
I confused sync and apply workers.
sync worker failure at start causes immediate retries.

At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>

> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
> <[hidden email]> wrote:
> > On 4/12/17 00:48, Masahiko Sawada wrote:
> >> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
> >>> Perhaps instead of a global last_start_time, we store a per relation
> >>> last_start_time in SubscriptionRelState?
> >>
> >> I was thinking the same. But a problem is that the list of
> >> SubscriptionRelState is refreshed whenever the syncing table state
> >> becomes invalid (table_state_valid = false). I guess we need to
> >> improve these logic including GetSubscriptionNotReadyRelations().
> >
> > The table states are invalidated on a syscache callback from
> > pg_subscription_rel, which happens roughly speaking when a table
> > finishes the initial sync.  So if we're worried about failing tablesync
> > workers relaunching to quickly, this would only be a problem if a
> > tablesync of another table finishes right in that restart window.  That
> > doesn't seem a terrible issue to me.
> >
>
> I think the table states are invalidated whenever the table sync
> worker starts, because the table sync worker updates its status of
> pg_subscription_rel and commits it before starting actual copy. So we
> cannot rely on that. I thought we can store last_start_time into
> pg_subscription_rel but it might be overkill. I'm now thinking to
> change GetSubscriptionNotReadyRealtions so that last_start_time in
> SubscriptionRelState is taken over to new list.

This resolves the problem but, if I understand correctly, the
many pallocs in process_syncing_tables_for_apply() is working on
ApplyContext and the context is reset before the next visit here
(in LogicalRepApplyLoop).

Although this is not a problem of this patch, this is a problem
generally.


regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Interval for launching the table sync worker

Kyotaro HORIGUCHI-2
Ouch! I replied to wrong mail.

At Thu, 13 Apr 2017 19:55:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <[hidden email]> wrote in <[hidden email]>

> I confused sync and apply workers.
> sync worker failure at start causes immediate retries.
>
> At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>
> > On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
> > <[hidden email]> wrote:
> > > On 4/12/17 00:48, Masahiko Sawada wrote:
> > >> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
> > >>> Perhaps instead of a global last_start_time, we store a per relation
> > >>> last_start_time in SubscriptionRelState?
> > >>
> > >> I was thinking the same. But a problem is that the list of
> > >> SubscriptionRelState is refreshed whenever the syncing table state
> > >> becomes invalid (table_state_valid = false). I guess we need to
> > >> improve these logic including GetSubscriptionNotReadyRelations().
> > >
> > > The table states are invalidated on a syscache callback from
> > > pg_subscription_rel, which happens roughly speaking when a table
> > > finishes the initial sync.  So if we're worried about failing tablesync
> > > workers relaunching to quickly, this would only be a problem if a
> > > tablesync of another table finishes right in that restart window.  That
> > > doesn't seem a terrible issue to me.
> > >
> >
> > I think the table states are invalidated whenever the table sync
> > worker starts, because the table sync worker updates its status of
> > pg_subscription_rel and commits it before starting actual copy. So we
> > cannot rely on that. I thought we can store last_start_time into
> > pg_subscription_rel but it might be overkill. I'm now thinking to
> > change GetSubscriptionNotReadyRealtions so that last_start_time in
> > SubscriptionRelState is taken over to new list.

The right target of "This" below is found at the following URL.

https://www.postgresql.org/message-id/CAD21AoBt_XUdppddFak661_LBM2t3CfK52aLKHG%2Bekd7SkzLmg%40mail.gmail.com

> This resolves the problem but, if I understand correctly, the
> many pallocs in process_syncing_tables_for_apply() is working on
> ApplyContext and the context is reset before the next visit here
> (in LogicalRepApplyLoop).
>
> Although this is not a problem of this patch, this is a problem
> generally.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
1234
Previous Thread Next Thread
Loading...