BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15888
Logged by:          Marko Tiikkaja
Email address:      [hidden email]
PostgreSQL version: 9.6.14
Operating system:   OS X 10.12.6
Description:        

Run:

  pg_recvlogical -S foo -f /dev/null -d dbname --create-slot --start

and pg_stat_activity will show something like this until that session
disconnects:

-[ RECORD 1
]----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
datid            | 9672230
datname          | dbname
pid              | 81996
usesysid         | 10
usename          | marko
application_name | pg_recvlogical
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2019-07-03 10:02:00.459737+02
xact_start       |
query_start      |
state_change     | 2019-07-03 10:02:00.726645+02
wait_event_type  |
wait_event       |
state            | idle in transaction
backend_xid      |
backend_xmin     |
query            |

which is quite obviously bogus.

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Michael Paquier-2
On Wed, Jul 03, 2019 at 08:03:24AM +0000, PG Bug reporting form wrote:
> Run:
>
>   pg_recvlogical -S foo -f /dev/null -d dbname --create-slot --start
>
> and pg_stat_activity will show something like this until that session
> disconnects:
>
> which is quite obviously bogus.

I think that's the transaction which is started when exporting the
snapshot at logical slot creation.  Or in short:
#2  0x000055b5f8e5720f in StartTransactionCommand () at xact.c:2699
#3  0x000055b5f90fab4e in SnapBuildExportSnapshot
(builder=0x55b5fa3427b0) at snapbuild.c:575
#4  0x000055b5f90fe5df in CreateReplicationSlot (cmd=0x55b5fa2914e8)
at walsender.c:843

SnapBuildExportSnapshot() needs to keep a transaction context opened
to keep the wanted xmin around.  In PG >= 10, this problem gets
"solved" by using NOEXPORT_SNAPSHOT in the CREATE_REPLICATION_SLOT
command.  One idea would be to enforce a commit in this case from
CreateReplicationSlot/streamutil.c?  We know that pg_recvlogical is
not going to use this snapshot anyway...
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Andres Freund
Hi,

On 2019-07-04 13:35:12 +0900, Michael Paquier wrote:

> On Wed, Jul 03, 2019 at 08:03:24AM +0000, PG Bug reporting form wrote:
> > Run:
> >
> >   pg_recvlogical -S foo -f /dev/null -d dbname --create-slot --start
> >
> > and pg_stat_activity will show something like this until that session
> > disconnects:
> >
> > which is quite obviously bogus.
>
> I think that's the transaction which is started when exporting the
> snapshot at logical slot creation.  Or in short:
> #2  0x000055b5f8e5720f in StartTransactionCommand () at xact.c:2699
> #3  0x000055b5f90fab4e in SnapBuildExportSnapshot
> (builder=0x55b5fa3427b0) at snapbuild.c:575
> #4  0x000055b5f90fe5df in CreateReplicationSlot (cmd=0x55b5fa2914e8)
> at walsender.c:843
>
> SnapBuildExportSnapshot() needs to keep a transaction context opened
> to keep the wanted xmin around.  In PG >= 10, this problem gets
> "solved" by using NOEXPORT_SNAPSHOT in the CREATE_REPLICATION_SLOT
> command.  One idea would be to enforce a commit in this case from
> CreateReplicationSlot/streamutil.c?  We know that pg_recvlogical is
> not going to use this snapshot anyway...

The transaction ought to be finished by exec_replication_command()
calling SnapBuildClearExportedSnapshot(), which then does
AbortCurrentTransaction(). So something's wrong if that's not working
anymore.

I just tested this on a slightly older v12 checkout, and the general
mechanism works:

S1(replication):
postgres[13464][1]=# CREATE_REPLICATION_SLOT foo LOGICAL test_decoding;

S2(plain):
postgres[13317][1]=# SELECT state, wait_event_type, wait_event, backend_xid, backend_xmin FROM pg_stat_activity WHERE pid = 13464;
┌─────────────────────┬─────────────────┬────────────┬─────────────┬──────────────┐
│        state        │ wait_event_type │ wait_event │ backend_xid │ backend_xmin │
├─────────────────────┼─────────────────┼────────────┼─────────────┼──────────────┤
│ idle in transaction │ Client          │ ClientRead │      (null) │   1494339463 │
└─────────────────────┴─────────────────┴────────────┴─────────────┴──────────────┘
(1 row)

S1(replication):
postgres[13464][1]=# IDENTIFY_SYSTEM;
┌─────────────────────┬──────────┬──────────────┬──────────┐
│      systemid       │ timeline │   xlogpos    │  dbname  │
├─────────────────────┼──────────┼──────────────┼──────────┤
│ 6704433050907079638 │        1 │ 3AC/1A25B700 │ postgres │
└─────────────────────┴──────────┴──────────────┴──────────┘

S2(plain):
postgres[13317][1]=# SELECT state, wait_event_type, wait_event, backend_xid, backend_xmin FROM pg_stat_activity WHERE pid = 13464;
┌───────┬─────────────────┬────────────┬─────────────┬──────────────┐
│ state │ wait_event_type │ wait_event │ backend_xid │ backend_xmin │
├───────┼─────────────────┼────────────┼─────────────┼──────────────┤
│ idle  │ Client          │ ClientRead │      (null) │       (null) │
└───────┴─────────────────┴────────────┴─────────────┴──────────────┘
(1 row)


I *can* reproduce with 9.6.  It's worthwhile to note that the
transaction *is* actually aborted, via the path above.  As far as I can
tell the problem is "solely" that we never reach a set_ps_display() that
resets the 'idle in transaction' bit, because START_REPLICATION "takes
over" the connection once started, and we don't go through
PostgresMain() again.

The problem doesn't exist in later versions, because there
exec_replication_command() does an explicit
        /* Report to pgstat that this process is now idle */
        pgstat_report_activity(STATE_IDLE, NULL);
That's due to

commit d02974e32e028fc078d8f5eca1d6a4516efb0aa6
Author: Magnus Hagander <[hidden email]>
Date:   2017-12-29 16:19:51 +0100

    Properly set base backup backends to active in pg_stat_activity

which is post 9.6.  Not sure I really feel a huge appetite for whacking
this around in the back branches.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Alvaro Herrera-9
On 2019-Jul-09, Andres Freund wrote:

> The problem doesn't exist in later versions, because there
> exec_replication_command() does an explicit
> /* Report to pgstat that this process is now idle */
> pgstat_report_activity(STATE_IDLE, NULL);
> That's due to
>
> commit d02974e32e028fc078d8f5eca1d6a4516efb0aa6
> Author: Magnus Hagander <[hidden email]>
> Date:   2017-12-29 16:19:51 +0100
>
>     Properly set base backup backends to active in pg_stat_activity
>
> which is post 9.6.  Not sure I really feel a huge appetite for whacking
> this around in the back branches.

Hmm ... so what that commit did is precisely to fix this bug.
Magnus thought at the time he was fixing a pg10 bug,
https://postgr.es/m/CABUevEwX4g8y=gmgfPzxFKS7gqjSYNR949Xc96OQm=YXJmh_Og@...
but apparently now we see that the bug was older than that.  Maybe it's
okay to backpatch further?

(Booh to missing "Discussion:" tags in commit messages)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Michael Paquier-2
On Tue, Jul 09, 2019 at 05:46:30PM -0400, Alvaro Herrera wrote:
> Hmm ... so what that commit did is precisely to fix this bug.
> Magnus thought at the time he was fixing a pg10 bug,
> https://postgr.es/m/CABUevEwX4g8y=gmgfPzxFKS7gqjSYNR949Xc96OQm=YXJmh_Og@...
> but apparently now we see that the bug was older than that.  Maybe it's
> okay to backpatch further?

Yes, I would not be against a back-patch in this case.  There is a
minor conflict because pre-9.6 WAL senders cannot run SQL queries but
that looks simple enough to solve.

> (Booh to missing "Discussion:" tags in commit messages)

Booh++
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Andres Freund
Hi,

On 2019-07-10 13:51:47 +0900, Michael Paquier wrote:

> On Tue, Jul 09, 2019 at 05:46:30PM -0400, Alvaro Herrera wrote:
> > Hmm ... so what that commit did is precisely to fix this bug.
> > Magnus thought at the time he was fixing a pg10 bug,
> > https://postgr.es/m/CABUevEwX4g8y=gmgfPzxFKS7gqjSYNR949Xc96OQm=YXJmh_Og@...
> > but apparently now we see that the bug was older than that.  Maybe it's
> > okay to backpatch further?
>
> Yes, I would not be against a back-patch in this case.  There is a
> minor conflict because pre-9.6 WAL senders cannot run SQL queries but
> that looks simple enough to solve.

I'm not worried about backpatching that in isolation - but I'm worried
that just backpatching that indidividual commit isn't going to yield
particularly satisfactory behaviour. The code around this isn't super
robust, and pg10+ are noticably different due to the changes needed to
allow to execute normal queries over a replication connection.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Magnus Hagander-2
In reply to this post by Alvaro Herrera-9
On Tue, Jul 9, 2019 at 11:46 PM Alvaro Herrera <[hidden email]> wrote:
On 2019-Jul-09, Andres Freund wrote:

> The problem doesn't exist in later versions, because there
> exec_replication_command() does an explicit
>       /* Report to pgstat that this process is now idle */
>       pgstat_report_activity(STATE_IDLE, NULL);
> That's due to
>
> commit d02974e32e028fc078d8f5eca1d6a4516efb0aa6
> Author: Magnus Hagander <[hidden email]>
> Date:   2017-12-29 16:19:51 +0100
>
>     Properly set base backup backends to active in pg_stat_activity
>
> which is post 9.6.  Not sure I really feel a huge appetite for whacking
> this around in the back branches.

Hmm ... so what that commit did is precisely to fix this bug.
Magnus thought at the time he was fixing a pg10 bug,
https://postgr.es/m/CABUevEwX4g8y=gmgfPzxFKS7gqjSYNR949Xc96OQm=YXJmh_Og@...
but apparently now we see that the bug was older than that.  Maybe it's
okay to backpatch further?

I don't recall any particular reason around the backpatching other than as you say, probably not realizing it. At the time we didn't have logical replication yet (it landed later), so the actual uses of the logical decoding were nowhere near as frequent...



(Booh to missing "Discussion:" tags in commit messages)


If only we could agree on a template/standard for those :P

//Magnus

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Michael Paquier-2
On Wed, Jul 10, 2019 at 05:24:44PM +0200, Magnus Hagander wrote:
> On Tue, Jul 9, 2019 at 11:46 PM Alvaro Herrera <[hidden email]>
> wrote:
> (Booh to missing "Discussion:" tags in commit messages)
>>
>>
> If only we could agree on a template/standard for those :P

This would make an excellent topic for a developper meeting with most
of the main committers around the table.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Magnus Hagander-2


On Thu, Jul 11, 2019 at 1:58 AM Michael Paquier <[hidden email]> wrote:
On Wed, Jul 10, 2019 at 05:24:44PM +0200, Magnus Hagander wrote:
> On Tue, Jul 9, 2019 at 11:46 PM Alvaro Herrera <[hidden email]>
> wrote:
> (Booh to missing "Discussion:" tags in commit messages)
>>
>>
> If only we could agree on a template/standard for those :P

This would make an excellent topic for a developper meeting with most
of the main committers around the table.

We've tried that once, and I think even twice, and failed miserably at any form of agreement. But we could certainly try again :) 

--
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Michael Paquier-2
On Thu, Jul 11, 2019 at 01:21:39PM +0200, Magnus Hagander wrote:
> We've tried that once, and I think even twice, and failed miserably at any
> form of agreement. But we could certainly try again :)

Or we could just drop a thread about that..
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Tom Lane-2
In reply to this post by Magnus Hagander-2
Magnus Hagander <[hidden email]> writes:
> We've tried that once, and I think even twice, and failed miserably at any
> form of agreement. But we could certainly try again :)

I thought we *did* have an agreement, to wit using

Discussion: https://postgr.es/m/<message-id>

to link to relevant mail thread(s).  Some people use more tags
but that seems inessential to me.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot

Michael Paquier-2
On Thu, Jul 11, 2019 at 09:44:07AM -0400, Tom Lane wrote:
> I thought we *did* have an agreement, to wit using
>
> Discussion: https://postgr.es/m/<message-id>
>
> to link to relevant mail thread(s).  Some people use more tags
> but that seems inessential to me.

Hehe.  I actually was thinking about advocating for having more of
them in the commit logs.  I'll just start a new thread about what I
had in mind.  Perhaps that will lead us nowhere, but let's see.
--
Michael

signature.asc (849 bytes) Download Attachment