Quantcast

logical replication and PANIC during shutdown checkpoint in publisher

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
33 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

logical replication and PANIC during shutdown checkpoint in publisher

Fujii Masao-2
Hi,

When I shut down the publisher while I repeated creating and dropping
the subscription in the subscriber, the publisher emitted the following
PANIC error during shutdown checkpoint.

PANIC:  concurrent transaction log activity while database system is
shutting down

The cause of this problem is that walsender for logical replication can
generate WAL records even during shutdown checkpoint.

Firstly walsender keeps running until shutdown checkpoint finishes
so that all the WAL including shutdown checkpoint record can be
replicated to the standby. This was safe because previously walsender
could not generate WAL records. However this assumption became
invalid because of logical replication. That is, currenty walsender for
logical replication can generate WAL records, for example, by executing
CREATE_REPLICATION_SLOT command. This is an oversight in
logical replication patch, I think.

To fix this issue, we should terminate walsender for logical replication
before shutdown checkpoint starts. Of course walsender for physical
replication still needs to keep running until shutdown checkpoint ends,
though.

Regards,

--
Fujii Masao


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Peter Eisentraut-6
On 4/12/17 09:55, Fujii Masao wrote:
> To fix this issue, we should terminate walsender for logical replication
> before shutdown checkpoint starts. Of course walsender for physical
> replication still needs to keep running until shutdown checkpoint ends,
> though.

Can we turn it into a kind of read-only or no-new-commands mode instead,
so it can keep streaming but not accept any new actions?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Fujii Masao-2
On Thu, Apr 13, 2017 at 5:25 AM, Peter Eisentraut
<[hidden email]> wrote:
> On 4/12/17 09:55, Fujii Masao wrote:
>> To fix this issue, we should terminate walsender for logical replication
>> before shutdown checkpoint starts. Of course walsender for physical
>> replication still needs to keep running until shutdown checkpoint ends,
>> though.
>
> Can we turn it into a kind of read-only or no-new-commands mode instead,
> so it can keep streaming but not accept any new actions?

So we make walsenders switch to that mode and wait for all the already-ongoing
their "write" commands to finish, and then we start a shutdown checkpoint?
This is an idea, but seems a bit complicated. ISTM that it's simpler to
terminate only walsenders for logical rep before shutdown checkpoint.

Regards,

--
Fujii Masao


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Michael Paquier
On Thu, Apr 13, 2017 at 12:28 PM, Fujii Masao <[hidden email]> wrote:

> On Thu, Apr 13, 2017 at 5:25 AM, Peter Eisentraut
> <[hidden email]> wrote:
>> On 4/12/17 09:55, Fujii Masao wrote:
>>> To fix this issue, we should terminate walsender for logical replication
>>> before shutdown checkpoint starts. Of course walsender for physical
>>> replication still needs to keep running until shutdown checkpoint ends,
>>> though.
>>
>> Can we turn it into a kind of read-only or no-new-commands mode instead,
>> so it can keep streaming but not accept any new actions?
>
> So we make walsenders switch to that mode and wait for all the already-ongoing
> their "write" commands to finish, and then we start a shutdown checkpoint?
> This is an idea, but seems a bit complicated. ISTM that it's simpler to
> terminate only walsenders for logical rep before shutdown checkpoint.

Perhaps my memory is failing me here... But in clean shutdowns we do
shut down WAL senders after the checkpoint has completed so as we are
sure that they have flushed the LSN corresponding to the checkpoint
ending, right? Why introducing an inconsistency for logical workers?
It seems to me that logical workers should fail under the same rules.
--
Michael


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Fujii Masao-2
On Thu, Apr 13, 2017 at 12:36 PM, Michael Paquier
<[hidden email]> wrote:

> On Thu, Apr 13, 2017 at 12:28 PM, Fujii Masao <[hidden email]> wrote:
>> On Thu, Apr 13, 2017 at 5:25 AM, Peter Eisentraut
>> <[hidden email]> wrote:
>>> On 4/12/17 09:55, Fujii Masao wrote:
>>>> To fix this issue, we should terminate walsender for logical replication
>>>> before shutdown checkpoint starts. Of course walsender for physical
>>>> replication still needs to keep running until shutdown checkpoint ends,
>>>> though.
>>>
>>> Can we turn it into a kind of read-only or no-new-commands mode instead,
>>> so it can keep streaming but not accept any new actions?
>>
>> So we make walsenders switch to that mode and wait for all the already-ongoing
>> their "write" commands to finish, and then we start a shutdown checkpoint?
>> This is an idea, but seems a bit complicated. ISTM that it's simpler to
>> terminate only walsenders for logical rep before shutdown checkpoint.
>
> Perhaps my memory is failing me here... But in clean shutdowns we do
> shut down WAL senders after the checkpoint has completed so as we are
> sure that they have flushed the LSN corresponding to the checkpoint
> ending, right?

Yes.

> Why introducing an inconsistency for logical workers?
> It seems to me that logical workers should fail under the same rules.

Could you tell me why? You think that even walsender for logical rep
needs to stream the shutdown checkpoint WAL record to the subscriber?
I was not thinking that's true.

Regards,

--
Fujii Masao


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Michael Paquier
On Fri, Apr 14, 2017 at 3:03 AM, Fujii Masao <[hidden email]> wrote:

> On Thu, Apr 13, 2017 at 12:36 PM, Michael Paquier
> <[hidden email]> wrote:
>> On Thu, Apr 13, 2017 at 12:28 PM, Fujii Masao <[hidden email]> wrote:
>>> On Thu, Apr 13, 2017 at 5:25 AM, Peter Eisentraut
>>> <[hidden email]> wrote:
>>>> On 4/12/17 09:55, Fujii Masao wrote:
>>>>> To fix this issue, we should terminate walsender for logical replication
>>>>> before shutdown checkpoint starts. Of course walsender for physical
>>>>> replication still needs to keep running until shutdown checkpoint ends,
>>>>> though.
>>>>
>>>> Can we turn it into a kind of read-only or no-new-commands mode instead,
>>>> so it can keep streaming but not accept any new actions?
>>>
>>> So we make walsenders switch to that mode and wait for all the already-ongoing
>>> their "write" commands to finish, and then we start a shutdown checkpoint?
>>> This is an idea, but seems a bit complicated. ISTM that it's simpler to
>>> terminate only walsenders for logical rep before shutdown checkpoint.
>>
>> Perhaps my memory is failing me here... But in clean shutdowns we do
>> shut down WAL senders after the checkpoint has completed so as we are
>> sure that they have flushed the LSN corresponding to the checkpoint
>> ending, right?
>
> Yes.
>
>> Why introducing an inconsistency for logical workers?
>> It seems to me that logical workers should fail under the same rules.
>
> Could you tell me why? You think that even walsender for logical rep
> needs to stream the shutdown checkpoint WAL record to the subscriber?
> I was not thinking that's true.

For physical replication, the property to wait that standbys have
flushed the LSN of the shutdown checkpoint can be important for
switchovers. For example, with a primary and a standby, it is possible
to stop cleanly the master, promote the standby, and then connect back
to the cluster the old primary as a standby to the now-new primary
with the guarantee that both are in a consistent state. It seems to me
that having similar guarantees for logical replication is important.

Now, I have not reviewed the code of logirep in details at the level
of Peter, Petr or yourself...
--
Michael


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Petr Jelinek-4
In reply to this post by Fujii Masao-2
On 12/04/17 15:55, Fujii Masao wrote:

> Hi,
>
> When I shut down the publisher while I repeated creating and dropping
> the subscription in the subscriber, the publisher emitted the following
> PANIC error during shutdown checkpoint.
>
> PANIC:  concurrent transaction log activity while database system is
> shutting down
>
> The cause of this problem is that walsender for logical replication can
> generate WAL records even during shutdown checkpoint.
>
> Firstly walsender keeps running until shutdown checkpoint finishes
> so that all the WAL including shutdown checkpoint record can be
> replicated to the standby. This was safe because previously walsender
> could not generate WAL records. However this assumption became
> invalid because of logical replication. That is, currenty walsender for
> logical replication can generate WAL records, for example, by executing
> CREATE_REPLICATION_SLOT command. This is an oversight in
> logical replication patch, I think.

Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
that the issue with walsender still exist (since we now allow normal SQL
to run there) but I think it's important to identify what exactly causes
the WAL activity in your case - if it's one of the replication commands
and not SQL then we'll need to backpatch any solution we come up with to
9.4, if it's not replication command, we only need to fix 10.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Fujii Masao-2
On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
<[hidden email]> wrote:

> On 12/04/17 15:55, Fujii Masao wrote:
>> Hi,
>>
>> When I shut down the publisher while I repeated creating and dropping
>> the subscription in the subscriber, the publisher emitted the following
>> PANIC error during shutdown checkpoint.
>>
>> PANIC:  concurrent transaction log activity while database system is
>> shutting down
>>
>> The cause of this problem is that walsender for logical replication can
>> generate WAL records even during shutdown checkpoint.
>>
>> Firstly walsender keeps running until shutdown checkpoint finishes
>> so that all the WAL including shutdown checkpoint record can be
>> replicated to the standby. This was safe because previously walsender
>> could not generate WAL records. However this assumption became
>> invalid because of logical replication. That is, currenty walsender for
>> logical replication can generate WAL records, for example, by executing
>> CREATE_REPLICATION_SLOT command. This is an oversight in
>> logical replication patch, I think.
>
> Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
> that the issue with walsender still exist (since we now allow normal SQL
> to run there) but I think it's important to identify what exactly causes
> the WAL activity in your case

At least in my case, the following CREATE_REPLICATION_SLOT command
generated WAL record.

    BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
    CREATE_REPLICATION_SLOT testslot TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;

Here is the pg_waldump output of the WAL record that CREATE_REPLICATION_SLOT
generated.

    rmgr: Standby     len (rec/tot):     24/    50, tx:          0,
lsn: 0/01601438, prev 0/01601400, desc: RUNNING_XACTS nextXid 692
latestCompletedXid 691 oldestRunningXid 692

So I guess that CREATE_REPLICATION_SLOT code calls LogStandbySnapshot()
and which generates WAL record about snapshot of running transactions.

Regards,

--
Fujii Masao


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Petr Jelinek-4
On 14/04/17 19:33, Fujii Masao wrote:

> On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
> <[hidden email]> wrote:
>> On 12/04/17 15:55, Fujii Masao wrote:
>>> Hi,
>>>
>>> When I shut down the publisher while I repeated creating and dropping
>>> the subscription in the subscriber, the publisher emitted the following
>>> PANIC error during shutdown checkpoint.
>>>
>>> PANIC:  concurrent transaction log activity while database system is
>>> shutting down
>>>
>>> The cause of this problem is that walsender for logical replication can
>>> generate WAL records even during shutdown checkpoint.
>>>
>>> Firstly walsender keeps running until shutdown checkpoint finishes
>>> so that all the WAL including shutdown checkpoint record can be
>>> replicated to the standby. This was safe because previously walsender
>>> could not generate WAL records. However this assumption became
>>> invalid because of logical replication. That is, currenty walsender for
>>> logical replication can generate WAL records, for example, by executing
>>> CREATE_REPLICATION_SLOT command. This is an oversight in
>>> logical replication patch, I think.
>>
>> Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
>> that the issue with walsender still exist (since we now allow normal SQL
>> to run there) but I think it's important to identify what exactly causes
>> the WAL activity in your case
>
> At least in my case, the following CREATE_REPLICATION_SLOT command
> generated WAL record.
>
>     BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
>     CREATE_REPLICATION_SLOT testslot TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
>
> Here is the pg_waldump output of the WAL record that CREATE_REPLICATION_SLOT
> generated.
>
>     rmgr: Standby     len (rec/tot):     24/    50, tx:          0,
> lsn: 0/01601438, prev 0/01601400, desc: RUNNING_XACTS nextXid 692
> latestCompletedXid 691 oldestRunningXid 692
>
> So I guess that CREATE_REPLICATION_SLOT code calls LogStandbySnapshot()
> and which generates WAL record about snapshot of running transactions.
>

Ah yes looking at the code, it does exactly that (on master only). Means
that backport will be necessary.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Peter Eisentraut-6
On 4/14/17 14:23, Petr Jelinek wrote:
> Ah yes looking at the code, it does exactly that (on master only). Means
> that backport will be necessary.

I think these two sentences are contradicting each other.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Petr Jelinek-4
On 14/04/17 21:05, Peter Eisentraut wrote:
> On 4/14/17 14:23, Petr Jelinek wrote:
>> Ah yes looking at the code, it does exactly that (on master only). Means
>> that backport will be necessary.
>
> I think these two sentences are contradicting each other.
>

Hehe, didn't realize master will be taken as master branch, I meant
master as in not standby :)

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Noah Misch-2
In reply to this post by Fujii Masao-2
On Wed, Apr 12, 2017 at 10:55:08PM +0900, Fujii Masao wrote:

> When I shut down the publisher while I repeated creating and dropping
> the subscription in the subscriber, the publisher emitted the following
> PANIC error during shutdown checkpoint.
>
> PANIC:  concurrent transaction log activity while database system is
> shutting down
>
> The cause of this problem is that walsender for logical replication can
> generate WAL records even during shutdown checkpoint.
>
> Firstly walsender keeps running until shutdown checkpoint finishes
> so that all the WAL including shutdown checkpoint record can be
> replicated to the standby. This was safe because previously walsender
> could not generate WAL records. However this assumption became
> invalid because of logical replication. That is, currenty walsender for
> logical replication can generate WAL records, for example, by executing
> CREATE_REPLICATION_SLOT command. This is an oversight in
> logical replication patch, I think.
>
> To fix this issue, we should terminate walsender for logical replication
> before shutdown checkpoint starts. Of course walsender for physical
> replication still needs to keep running until shutdown checkpoint ends,
> though.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Petr Jelinek-4
On 16/04/17 08:12, Noah Misch wrote:

> On Wed, Apr 12, 2017 at 10:55:08PM +0900, Fujii Masao wrote:
>> When I shut down the publisher while I repeated creating and dropping
>> the subscription in the subscriber, the publisher emitted the following
>> PANIC error during shutdown checkpoint.
>>
>> PANIC:  concurrent transaction log activity while database system is
>> shutting down
>>
>> The cause of this problem is that walsender for logical replication can
>> generate WAL records even during shutdown checkpoint.
>>
>> Firstly walsender keeps running until shutdown checkpoint finishes
>> so that all the WAL including shutdown checkpoint record can be
>> replicated to the standby. This was safe because previously walsender
>> could not generate WAL records. However this assumption became
>> invalid because of logical replication. That is, currenty walsender for
>> logical replication can generate WAL records, for example, by executing
>> CREATE_REPLICATION_SLOT command. This is an oversight in
>> logical replication patch, I think.
>>
>> To fix this issue, we should terminate walsender for logical replication
>> before shutdown checkpoint starts. Of course walsender for physical
>> replication still needs to keep running until shutdown checkpoint ends,
>> though.
>
> [Action required within three days.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
>

Just FYI this is not new in 10, the issue exists since the 9.4
introduction of logical replication slots.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Andres Freund
In reply to this post by Fujii Masao-2
On 2017-04-15 02:33:59 +0900, Fujii Masao wrote:

> On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
> <[hidden email]> wrote:
> > On 12/04/17 15:55, Fujii Masao wrote:
> >> Hi,
> >>
> >> When I shut down the publisher while I repeated creating and dropping
> >> the subscription in the subscriber, the publisher emitted the following
> >> PANIC error during shutdown checkpoint.
> >>
> >> PANIC:  concurrent transaction log activity while database system is
> >> shutting down
> >>
> >> The cause of this problem is that walsender for logical replication can
> >> generate WAL records even during shutdown checkpoint.
> >>
> >> Firstly walsender keeps running until shutdown checkpoint finishes
> >> so that all the WAL including shutdown checkpoint record can be
> >> replicated to the standby. This was safe because previously walsender
> >> could not generate WAL records. However this assumption became
> >> invalid because of logical replication. That is, currenty walsender for
> >> logical replication can generate WAL records, for example, by executing
> >> CREATE_REPLICATION_SLOT command. This is an oversight in
> >> logical replication patch, I think.
> >
> > Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
> > that the issue with walsender still exist (since we now allow normal SQL
> > to run there) but I think it's important to identify what exactly causes
> > the WAL activity in your case
>
> At least in my case, the following CREATE_REPLICATION_SLOT command
> generated WAL record.
>
>     BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
>     CREATE_REPLICATION_SLOT testslot TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
>
> Here is the pg_waldump output of the WAL record that CREATE_REPLICATION_SLOT
> generated.
>
>     rmgr: Standby     len (rec/tot):     24/    50, tx:          0,
> lsn: 0/01601438, prev 0/01601400, desc: RUNNING_XACTS nextXid 692
> latestCompletedXid 691 oldestRunningXid 692
>
> So I guess that CREATE_REPLICATION_SLOT code calls LogStandbySnapshot()
> and which generates WAL record about snapshot of running transactions.

Erroring out in these cases sounds easy enough.  Wonder if there's not a
bigger problem with WAL records generated e.g. by HOT pruning or such,
during decoding.  Not super likely, but would probably hit exactly the
same, no?

- Andres


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Petr Jelinek-4
On 17/04/17 18:02, Andres Freund wrote:

> On 2017-04-15 02:33:59 +0900, Fujii Masao wrote:
>> On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
>> <[hidden email]> wrote:
>>> On 12/04/17 15:55, Fujii Masao wrote:
>>>> Hi,
>>>>
>>>> When I shut down the publisher while I repeated creating and dropping
>>>> the subscription in the subscriber, the publisher emitted the following
>>>> PANIC error during shutdown checkpoint.
>>>>
>>>> PANIC:  concurrent transaction log activity while database system is
>>>> shutting down
>>>>
>>>> The cause of this problem is that walsender for logical replication can
>>>> generate WAL records even during shutdown checkpoint.
>>>>
>>>> Firstly walsender keeps running until shutdown checkpoint finishes
>>>> so that all the WAL including shutdown checkpoint record can be
>>>> replicated to the standby. This was safe because previously walsender
>>>> could not generate WAL records. However this assumption became
>>>> invalid because of logical replication. That is, currenty walsender for
>>>> logical replication can generate WAL records, for example, by executing
>>>> CREATE_REPLICATION_SLOT command. This is an oversight in
>>>> logical replication patch, I think.
>>>
>>> Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
>>> that the issue with walsender still exist (since we now allow normal SQL
>>> to run there) but I think it's important to identify what exactly causes
>>> the WAL activity in your case
>>
>> At least in my case, the following CREATE_REPLICATION_SLOT command
>> generated WAL record.
>>
>>     BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
>>     CREATE_REPLICATION_SLOT testslot TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
>>
>> Here is the pg_waldump output of the WAL record that CREATE_REPLICATION_SLOT
>> generated.
>>
>>     rmgr: Standby     len (rec/tot):     24/    50, tx:          0,
>> lsn: 0/01601438, prev 0/01601400, desc: RUNNING_XACTS nextXid 692
>> latestCompletedXid 691 oldestRunningXid 692
>>
>> So I guess that CREATE_REPLICATION_SLOT code calls LogStandbySnapshot()
>> and which generates WAL record about snapshot of running transactions.
>
> Erroring out in these cases sounds easy enough.  Wonder if there's not a
> bigger problem with WAL records generated e.g. by HOT pruning or such,
> during decoding.  Not super likely, but would probably hit exactly the
> same, no?
>

Sounds possible, yes. Sounds like that's going to be nontrivial to fix
though.

Another problem is that queries can run on walsender now. But that
should be possible to detect and shutdown just like backend.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Andres Freund
On 2017-04-17 18:28:16 +0200, Petr Jelinek wrote:

> On 17/04/17 18:02, Andres Freund wrote:
> > On 2017-04-15 02:33:59 +0900, Fujii Masao wrote:
> >> On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
> >> <[hidden email]> wrote:
> >>> On 12/04/17 15:55, Fujii Masao wrote:
> >>>> Hi,
> >>>>
> >>>> When I shut down the publisher while I repeated creating and dropping
> >>>> the subscription in the subscriber, the publisher emitted the following
> >>>> PANIC error during shutdown checkpoint.
> >>>>
> >>>> PANIC:  concurrent transaction log activity while database system is
> >>>> shutting down
> >>>>
> >>>> The cause of this problem is that walsender for logical replication can
> >>>> generate WAL records even during shutdown checkpoint.
> >>>>
> >>>> Firstly walsender keeps running until shutdown checkpoint finishes
> >>>> so that all the WAL including shutdown checkpoint record can be
> >>>> replicated to the standby. This was safe because previously walsender
> >>>> could not generate WAL records. However this assumption became
> >>>> invalid because of logical replication. That is, currenty walsender for
> >>>> logical replication can generate WAL records, for example, by executing
> >>>> CREATE_REPLICATION_SLOT command. This is an oversight in
> >>>> logical replication patch, I think.
> >>>
> >>> Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
> >>> that the issue with walsender still exist (since we now allow normal SQL
> >>> to run there) but I think it's important to identify what exactly causes
> >>> the WAL activity in your case
> >>
> >> At least in my case, the following CREATE_REPLICATION_SLOT command
> >> generated WAL record.
> >>
> >>     BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
> >>     CREATE_REPLICATION_SLOT testslot TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
> >>
> >> Here is the pg_waldump output of the WAL record that CREATE_REPLICATION_SLOT
> >> generated.
> >>
> >>     rmgr: Standby     len (rec/tot):     24/    50, tx:          0,
> >> lsn: 0/01601438, prev 0/01601400, desc: RUNNING_XACTS nextXid 692
> >> latestCompletedXid 691 oldestRunningXid 692
> >>
> >> So I guess that CREATE_REPLICATION_SLOT code calls LogStandbySnapshot()
> >> and which generates WAL record about snapshot of running transactions.
> >
> > Erroring out in these cases sounds easy enough.  Wonder if there's not a
> > bigger problem with WAL records generated e.g. by HOT pruning or such,
> > during decoding.  Not super likely, but would probably hit exactly the
> > same, no?
> >
>
> Sounds possible, yes. Sounds like that's going to be nontrivial to fix
> though.
>
> Another problem is that queries can run on walsender now. But that
> should be possible to detect and shutdown just like backend.

This sounds like a case for s/PANIC/ERROR|FATAL/ to me...


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Peter Eisentraut-6
On 4/17/17 12:30, Andres Freund wrote:

>>>> So I guess that CREATE_REPLICATION_SLOT code calls LogStandbySnapshot()
>>>> and which generates WAL record about snapshot of running transactions.
>>>
>>> Erroring out in these cases sounds easy enough.  Wonder if there's not a
>>> bigger problem with WAL records generated e.g. by HOT pruning or such,
>>> during decoding.  Not super likely, but would probably hit exactly the
>>> same, no?
>>
>> Sounds possible, yes. Sounds like that's going to be nontrivial to fix
>> though.
>>
>> Another problem is that queries can run on walsender now. But that
>> should be possible to detect and shutdown just like backend.
>
> This sounds like a case for s/PANIC/ERROR|FATAL/ to me...

I'd imagine the postmaster would tell the walsender that it has started
shutdown, and then the walsender would reject $certain_things.  But I
don't see an existing way for the walsender to know that shutdown has
been initiated.  SIGINT is still free ...

The alternative of shutting down logical walsenders earlier also doesn't
look straightforward, since the postmaster doesn't know directly what
kind of walsender a certain process is.  So you'd also need additional
signal types or something there.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Michael Paquier
On Tue, Apr 18, 2017 at 3:27 AM, Peter Eisentraut
<[hidden email]> wrote:
> I'd imagine the postmaster would tell the walsender that it has started
> shutdown, and then the walsender would reject $certain_things.  But I
> don't see an existing way for the walsender to know that shutdown has
> been initiated.  SIGINT is still free ...

The WAL sender receives SIGUSR2 from the postmaster when shutdown is
initiated, so why not just rely on that and issue an ERROR when a
client attempts to create or drop a new slot, setting up
walsender_ready_to_stop unconditionally? It seems to me that the issue
here is the delay between the moment SIGTERM is acknowledged by the
WAL sender and the moment CREATE_SLOT is treater. An idea with the
attached...

> The alternative of shutting down logical walsenders earlier also doesn't
> look straightforward, since the postmaster doesn't know directly what
> kind of walsender a certain process is.  So you'd also need additional
> signal types or something there.

Yup, but is a switchover between a publisher and a subscriber
something that can happen?
--
Michael


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

walsender-shutdown-fix.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Peter Eisentraut-6
On 4/19/17 01:45, Michael Paquier wrote:

> On Tue, Apr 18, 2017 at 3:27 AM, Peter Eisentraut
> <[hidden email]> wrote:
>> I'd imagine the postmaster would tell the walsender that it has started
>> shutdown, and then the walsender would reject $certain_things.  But I
>> don't see an existing way for the walsender to know that shutdown has
>> been initiated.  SIGINT is still free ...
>
> The WAL sender receives SIGUSR2 from the postmaster when shutdown is
> initiated, so why not just rely on that and issue an ERROR when a
> client attempts to create or drop a new slot, setting up
> walsender_ready_to_stop unconditionally? It seems to me that the issue
> here is the delay between the moment SIGTERM is acknowledged by the
> WAL sender and the moment CREATE_SLOT is treater. An idea with the
> attached...

I think the problem with a signal-based solution is that there is no
feedback.  Ideally, you would wait for all walsenders to acknowledge the
receipt of SIGUSR2 (or similar) and only then proceed with the shutdown
checkpoint.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: logical replication and PANIC during shutdown checkpoint in publisher

Noah Misch-2
In reply to this post by Noah Misch-2
On Sun, Apr 16, 2017 at 06:12:58AM +0000, Noah Misch wrote:

> On Wed, Apr 12, 2017 at 10:55:08PM +0900, Fujii Masao wrote:
> > When I shut down the publisher while I repeated creating and dropping
> > the subscription in the subscriber, the publisher emitted the following
> > PANIC error during shutdown checkpoint.
> >
> > PANIC:  concurrent transaction log activity while database system is
> > shutting down
> >
> > The cause of this problem is that walsender for logical replication can
> > generate WAL records even during shutdown checkpoint.
> >
> > Firstly walsender keeps running until shutdown checkpoint finishes
> > so that all the WAL including shutdown checkpoint record can be
> > replicated to the standby. This was safe because previously walsender
> > could not generate WAL records. However this assumption became
> > invalid because of logical replication. That is, currenty walsender for
> > logical replication can generate WAL records, for example, by executing
> > CREATE_REPLICATION_SLOT command. This is an oversight in
> > logical replication patch, I think.
> >
> > To fix this issue, we should terminate walsender for logical replication
> > before shutdown checkpoint starts. Of course walsender for physical
> > replication still needs to keep running until shutdown checkpoint ends,
> > though.
>
> [Action required within three days.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
>
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
12
Previous Thread Next Thread
Loading...