Non-null values of recovery functions after promote or crash of primary

Non-null values of recovery functions after promote or crash of primary

Martín Marqués-2
Hi,

Yesterday we (that's me and my colleague Ricardo Gomez) were working on
an issue where a monitoring script was returning increasing lag
information on a primary instead of a NULL value.

The query involved the following functions (it has since been amended
to work around the issue I'm reporting here):

pg_last_wal_receive_lsn()
pg_last_wal_replay_lsn()
pg_last_xact_replay_timestamp()

Under normal circumstances we would expect to receive NULLs from all
three functions on a primary node, and code comments back up my thoughts.

The problem is, what if the node is a standby that was promoted without
restarting, or a primary that had to perform crash recovery?

While the node is recovering, the values in `XLogCtl` are updated with
recovery information. Once recovery finishes, either because crash
recovery reached a consistent state or because a standby was promoted,
those values are not reset to their startup defaults.

That's when you start seeing non-null values returned by
`pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.
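
The behaviour can be pictured with a toy model. This is a sketch in
Python, not PostgreSQL's C code: the class `RecoveryState`, its field
names, and the `reset` flag are all hypothetical and only mimic the idea
of recovery values kept in shared state that nothing clears afterwards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryState:
    """Toy stand-in for recovery fields kept in shared state (hypothetical names)."""
    in_recovery: bool = False
    last_replayed_lsn: Optional[int] = None
    last_xact_replay_ts: Optional[float] = None


def replay_record(state: RecoveryState, lsn: int, commit_ts: float) -> None:
    """Replaying a record updates the recovery values."""
    state.in_recovery = True
    state.last_replayed_lsn = lsn
    state.last_xact_replay_ts = commit_ts


def finish_recovery(state: RecoveryState, reset: bool = False) -> None:
    """End of crash recovery or promotion. With reset=False the old values stay behind."""
    state.in_recovery = False
    if reset:
        # The "clear on exit from recovery" option discussed in this thread.
        state.last_replayed_lsn = None
        state.last_xact_replay_ts = None
```

With `reset=False` the model keeps reporting the last replayed values
after recovery has ended, which matches what the monitoring script saw;
with `reset=True` a reader gets `None`, the analogue of NULL.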

Now, I don't know whether we should call this a bug or an undocumented
anomaly. We could fix it by resetting the values in `XLogCtl` once
recovery finishes, or we could document that these functions may return
non-NULL values in certain cases.
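
Whichever way that goes, a monitoring script can protect itself by
checking `pg_is_in_recovery()` before trusting the replay values. The
sketch below is one possible guard, not necessarily how our script was
amended; the helper `replay_lag` and the constant `LAG_QUERY` are
illustrative names, and the caller is assumed to have fetched the row
with whatever database driver it already uses.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Query a monitoring script could run; the first column tells us whether
# the other values are meaningful at all.
LAG_QUERY = """
SELECT pg_is_in_recovery(),
       pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn(),
       pg_last_xact_replay_timestamp()
"""


def replay_lag(
    in_recovery: bool,
    last_replay_ts: Optional[datetime],
    now: Optional[datetime] = None,
) -> Optional[timedelta]:
    """Return replay lag, or None when the node is not in recovery.

    A stale timestamp left over from crash recovery or promotion is
    ignored because in_recovery is False on such a node.
    """
    if not in_recovery or last_replay_ts is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_replay_ts
```

On a promoted standby or a primary that went through crash recovery,
`replay_lag(False, stale_ts)` returns `None` instead of an ever-growing
interval.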

Regards,

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: Non-null values of recovery functions after promote or crash of primary

Stephen Frost
Greetings,

* Martín Marqués ([hidden email]) wrote:
> pg_last_wal_receive_lsn()
> pg_last_wal_replay_lsn()
> pg_last_xact_replay_timestamp()
>
> Under normal circumstances we would expect to receive NULLs from all
> three functions on a primary node, and code comments back up my thoughts.

Agreed.

> The problem is, what if the node is a standby which was promoted without
> restarting, or that had to perform crash recovery?
>
> So during the time it's recovering the values in `XLogCtl` are updated
> with recovery information, and once the recovery finishes, due to crash
> recovery reaching a consistent state, or a promotion of a standby
> happening, those values are not reset to startup defaults.
>
> That's when you start seeing non-null values returned by
> `pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.
>
> Now, I don't know if we should call this a bug, or an undocumented
> anomaly. We could fix the bug by resetting the values from `XLogCtl`
> after finishing recovery, or document that we might see non-NULL values
> in certain cases.

IMV, and not unlike other similar cases I've talked about on another
thread, these should be cleared when the system is promoted as they're
otherwise confusing and nonsensical.

Thanks,

Stephen

Re: Non-null values of recovery functions after promote or crash of primary

Martín Marqués-2
Hi,

> IMV, and not unlike other similar cases I've talked about on another
> thread, these should be cleared when the system is promoted as they're
> otherwise confusing and nonsensical.

Keep in mind that this also happens when the server crashes and has to
perform crash recovery. In that case the server was always a primary.
