Logical replication wal sender timestamp bug

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Logical replication wal sender timestamp bug

Jeff Janes
While monitoring pg_stat_subscription, I noticed that last_msg_send_time was usually NULL, which doesn't make sense.  Why would we have a message, but not know when it was sent?

Looking in src/backend/replication/walsender.c, there is this:

    /* output previously gathered data in a CopyData packet */
    pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);

    /*
     * Fill the send timestamp last, so that it is taken as late as possible.
     * This is somewhat ugly, but the protocol is set as it's already used for
     * several releases by streaming physical replication.
     */
    resetStringInfo(&tmpbuf);
    now = GetCurrentTimestamp();
    pq_sendint64(&tmpbuf, now);
    memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
           tmpbuf.data, sizeof(int64));

Filling out the timestamp after the message has already been sent is taking "as late as possible" a little too far.  It results in every message having a zero timestamp, other than keep-alives which go through a different path.

Re-ordering the code blocks as in the attached seems to fix it.  But I have to wonder, if this has been broken from the start and no one ever noticed, is this even valuable information to relay in the first place?  We could just take the column out of the view, and not bother calling  GetCurrentTimestamp() for each message.

Cheers,

Jeff

walsender_timestamp_fix.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Logical replication wal sender timestamp bug

Michael Paquier-2
On Sat, Nov 02, 2019 at 09:54:54PM -0400, Jeff Janes wrote:
> While monitoring pg_stat_subscription, I noticed that last_msg_send_time
> was usually NULL, which doesn't make sense.  Why would we have a message,
> but not know when it was sent?

So...  The timestamp is received and stored in LogicalRepApplyLoop()
with send_time when a 'w' message is received in the subscription
cluster.  And it gets computed with a two-phase process:
- WalSndPrepareWrite() reserves the space in the message for the
timestamp.
- WalSndWriteData() computes the timestamp in the reserved space once
the write message is computed and ready to go.

> Filling out the timestamp after the message has already been sent is taking
> "as late as possible" a little too far.  It results in every message having
> a zero timestamp, other than keep-alives which go through a different path.

It seems to me that you are right: the timestamp is computed too
late.

> Re-ordering the code blocks as in the attached seems to fix it.  But I have
> to wonder, if this has been broken from the start and no one ever noticed,
> is this even valuable information to relay in the first place?  We could
> just take the column out of the view, and not bother calling
> GetCurrentTimestamp() for each message.

I think that there are use cases for such monitoring capabilities,
see for example 7fee252.  So I would rather keep it.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Logical replication wal sender timestamp bug

Michael Paquier-2
On Tue, Nov 05, 2019 at 01:19:37PM +0900, Michael Paquier wrote:
> On Sat, Nov 02, 2019 at 09:54:54PM -0400, Jeff Janes wrote:
>> Filling out the timestamp after the message has already been sent is taking
>> "as late as possible" a little too far.  It results in every message having
>> a zero timestamp, other than keep-alives which go through a different path.
>
> It seems to me that you are right: the timestamp is computed too
> late.

It is easy enough to reproduce the problem by setting for example
logical replication between two nodes and pgbench to produce some
load and then monitor pg_stat_subscription periodically.  However, it
is a problem since logical decoding has been introduced (5a991ef) so
committed your fix down to 9.4.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Logical replication wal sender timestamp bug

Jeff Janes


On Wed, Nov 6, 2019 at 2:15 AM Michael Paquier <[hidden email]> wrote:
On Tue, Nov 05, 2019 at 01:19:37PM +0900, Michael Paquier wrote:
> On Sat, Nov 02, 2019 at 09:54:54PM -0400, Jeff Janes wrote:
>> Filling out the timestamp after the message has already been sent is taking
>> "as late as possible" a little too far.  It results in every message having
>> a zero timestamp, other than keep-alives which go through a different path.
>
> It seems to me that you are right: the timestamp is computed too
> late.

It is easy enough to reproduce the problem by setting for example
logical replication between two nodes and pgbench to produce some
load and then monitor pg_stat_subscription periodically.  However, it
is a problem since logical decoding has been introduced (5a991ef) so
committed your fix down to 9.4.

Thanks.  This column looks much more reasonable now.

Cheers,

Jeff