Force update_process_title=on in crash recovery?


Force update_process_title=on in crash recovery?

Thomas Munro
Hi,

Based on a couple of independent reports from users with no idea how
to judge the progress of a system recovering from a crash, Christoph
and I wondered if we should override update_process_title for the
"recovering ..." message, at least until connections are allowed.  We
already do that to set the initial titles.

Crash recovery is a rare case where important information is reported
through the process title that isn't readily available anywhere else,
since you can't log in.  If you want to gauge progress on a system
that happened to crash with update_process_title set to off, your best
hope is probably to trace the process or spy on the files it has open,
to see which WAL segment it's accessing, but that's not very nice.
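
For anyone scripting this, a minimal sketch of pulling the segment name out of a "recovering ..." title. The sample title string is an assumption about what ps might show, and wal_segment_from_title is a hypothetical helper, not part of PostgreSQL; the only fact relied on is that WAL segment file names are 24 hexadecimal characters.

```python
import re
from typing import Optional

# WAL segment file names are 24 hexadecimal characters
# (timeline, log number, segment number: 8 digits each).
_RECOVERING = re.compile(r"\brecovering ([0-9A-Fa-f]{24})\b")


def wal_segment_from_title(title: str) -> Optional[str]:
    """Return the WAL segment named in a 'recovering ...' title, if any."""
    match = _RECOVERING.search(title)
    return match.group(1).upper() if match else None


# Hypothetical title, as it might appear in `ps` output.
sample = "postgres: startup recovering 000000010000000000000012"
print(wal_segment_from_title(sample))
```

Sampling this twice, some seconds apart, would give a rough idea of how quickly replay is moving through segments.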



Re: Force update_process_title=on in crash recovery?

Tom Lane
Thomas Munro <[hidden email]> writes:
> Based on a couple of independent reports from users with no idea how
> to judge the progress of a system recovering from a crash, Christoph
> and I wondered if we should override update_process_title for the
> "recovering ..." message, at least until connections are allowed.  We
> already do that to set the initial titles.

> Crash recovery is a rare case where important information is reported
> through the process title that isn't readily available anywhere else,
> since you can't log in.  If you want to gauge  progress on a system
> that happened to crash with update_process_title set to off, your best
> hope is probably to trace the process or spy on the files it has open,
> to see which WAL segment it's accessing, but that's not very nice.

Seems like a good argument, but you'd have to be careful about the
final state when you stop overriding update_process_title --- it can't
be left looking like it's still-in-progress on some random WAL file.
(Compare my nearby gripes about walsenders being sloppy about their
pg_stat_activity and process title presentations.)

                        regards, tom lane
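
The concern about the final state can be illustrated with a toy model. This is not PostgreSQL code: TitleModel, set_title, its force flag and replay are all hypothetical names, and resetting the title to an empty string at the end is an assumption, since the thread does not say what the final title should be.

```python
from typing import List


class TitleModel:
    """Toy model of a process title governed by update_process_title."""

    def __init__(self, update_process_title: bool) -> None:
        self.update_process_title = update_process_title
        self.title = ""
        self.history: List[str] = []

    def set_title(self, text: str, force: bool = False) -> None:
        # An ordinary update is skipped when the setting is off;
        # a forced update (hypothetical) is applied regardless.
        if force or self.update_process_title:
            self.title = text
            self.history.append(text)


def replay(model: TitleModel, segments: List[str]) -> None:
    """Force one title update per segment, then clear the forced title."""
    for segment in segments:
        model.set_title(f"recovering {segment}", force=True)
    # Without this last forced update, the title would keep naming the
    # final segment once ordinary (suppressed) updates take over again.
    model.set_title("", force=True)


model = TitleModel(update_process_title=False)
replay(model, ["000000010000000000000001", "000000010000000000000002"])
print(repr(model.title), len(model.history))
```

The point of the sketch is only that whoever forces the title on must also force the last update that takes it off.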



Re: Force update_process_title=on in crash recovery?

Justin Pryzby
On Tue, Sep 15, 2020 at 10:01:18AM -0400, Tom Lane wrote:

> Thomas Munro <[hidden email]> writes:
> > Based on a couple of independent reports from users with no idea how
> > to judge the progress of a system recovering from a crash, Christoph
> > and I wondered if we should override update_process_title for the
> > "recovering ..." message, at least until connections are allowed.  We
> > already do that to set the initial titles.
>
> > Crash recovery is a rare case where important information is reported
> > through the process title that isn't readily available anywhere else,
> > since you can't log in.  If you want to gauge  progress on a system
> > that happened to crash with update_process_title set to off, your best
> > hope is probably to trace the process or spy on the files it has open,
> > to see which WAL segment it's accessing, but that's not very nice.
>
> Seems like a good argument, but you'd have to be careful about the
> final state when you stop overriding update_process_title --- it can't
> be left looking like it's still-in-progress on some random WAL file.
> (Compare my nearby gripes about walsenders being sloppy about their
> pg_stat_activity and process title presentations.)

Related:
https://commitfest.postgresql.org/29/2688/

I'm not sure I understood Michael's recent message, but I think it may refer to
promotion of a standby.

--
Justin



Re: Force update_process_title=on in crash recovery?

Michael Paquier
In reply to this post by Tom Lane
On Tue, Sep 15, 2020 at 10:01:18AM -0400, Tom Lane wrote:
> Seems like a good argument, but you'd have to be careful about the
> final state when you stop overriding update_process_title --- it can't
> be left looking like it's still-in-progress on some random WAL file.
> (Compare my nearby gripes about walsenders being sloppy about their
> pg_stat_activity and process title presentations.)

Another thing to be careful about here is WIN32; see commit 0921554.
And slowing down recovery is never a good idea.
--
Michael


Re: Force update_process_title=on in crash recovery?

Thomas Munro
On Wed, Sep 16, 2020 at 2:30 PM Michael Paquier <[hidden email]> wrote:
> On Tue, Sep 15, 2020 at 10:01:18AM -0400, Tom Lane wrote:
> > Seems like a good argument, but you'd have to be careful about the
> > final state when you stop overriding update_process_title --- it can't
> > be left looking like it's still-in-progress on some random WAL file.
> > (Compare my nearby gripes about walsenders being sloppy about their
> > pg_stat_activity and process title presentations.)
>
> Another thing to be careful here is WIN32, see 0921554.  And slowing
> down recovery is never a good idea.

Right, that commit makes a lot of sense because it suppresses many
system calls that happen for each query.  The same problem existed on
older FreeBSD versions and I saw that costing ~10% of TPS on read-only
pgbench.  In other commits I've been removing system calls that happen
for every WAL record.  But in this thread I'm talking about an update
per 16MB WAL file, which seems like an acceptable ratio to me.
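
To put that ratio in numbers, here is a rough sketch comparing per-segment title updates with per-record events. The 1 GiB WAL volume and the 128-byte average record size are made-up illustration values, and both function names are hypothetical; only the 16MB segment size comes from the message above.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # the 16MB segment size mentioned above


def title_updates(wal_bytes: int) -> int:
    """One title update per WAL segment touched (rounded up)."""
    return -(-wal_bytes // WAL_SEGMENT_BYTES)


def record_count(wal_bytes: int, avg_record_bytes: int) -> int:
    """Approximate number of WAL records for an assumed average size."""
    return wal_bytes // avg_record_bytes


wal_bytes = 1 << 30       # hypothetical: 1 GiB of WAL to replay
avg_record_bytes = 128    # hypothetical average WAL record size

print("title updates:", title_updates(wal_bytes))
print("WAL records:  ", record_count(wal_bytes, avg_record_bytes))
```

Under these assumptions the title changes tens of times while millions of records are replayed, which is the ratio being argued for.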



Re: Force update_process_title=on in crash recovery?

Tom Lane
Thomas Munro <[hidden email]> writes:
> On Wed, Sep 16, 2020 at 2:30 PM Michael Paquier <[hidden email]> wrote:
>> Another thing to be careful here is WIN32, see 0921554.  And slowing
>> down recovery is never a good idea.

> Right, that commit makes a lot of sense because it suppresses many
> system calls that happen for each query.  The same problem existed on
> older FreeBSD versions and I saw that costing ~10% of TPS on read-only
> pgbench.  In other commits I've been removing system calls that happen
> for every WAL record.  But in this thread I'm talking about an update
> per 16MB WAL file, which seems like an acceptable ratio to me.

Hmm ... the thread leading up to 0921554 indicates that the performance
penalty of update_process_title=on is just ridiculously large on Windows.
Maybe those numbers are not relevant to crash recovery WAL-application,
but it might be smart to actually measure that not just assume it.

In any case, I'd recommend setting up any patch you create for this
to be easily "ifndef WIN32"'d in case we change our minds on the
point later.

                        regards, tom lane



Re: Force update_process_title=on in crash recovery?

David Rowley
On Wed, 16 Sep 2020 at 17:43, Tom Lane <[hidden email]> wrote:
>
> Hmm ... the thread leading up to 0921554 indicates that the performance
> penalty of update_process_title=on is just ridiculously large on Windows.
> Maybe those numbers are not relevant to crash recovery WAL-application,
> but it might be smart to actually measure that not just assume it.

I had a go at measuring this on Windows and couldn't really detect any
slowdown from running with update_process_title on vs off. The average
over 3 runs with update_process_title = off was 94.38 s; with it switched
on, the average was 93.81 s. (Some noise there.)

Adding a bit of logging shows that the process title was set 225
times: once to set it to an empty string, then once for each of the
224 segments replayed.

Also, from a pgbench -s test with update_process_title on and again
with off I see 9343 tps vs 11969 tps. The process title is changed
twice for each query, once to set it to the query and once to set it
to "idle". Doing a bit of maths there, it seems that setting the
process title takes about 15 microseconds per call. So it would have
taken about 3.38 milliseconds to set the process title 225 times for
recovery, or if you prefer,  0.003609% additional overhead.

I don't think we'll notice.

David
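
The recovery-side arithmetic above can be reproduced in a few lines. All three inputs are taken from the message (about 15 microseconds per call, 225 title updates, 93.81 s average recovery time with the setting on); the variable names are mine, and the result is only as good as the 15 microsecond estimate.

```python
PER_CALL_SECONDS = 15e-6   # ~15 microseconds per title update (from the post)
TITLE_UPDATES = 225        # 1 empty title + 224 replayed segments
RECOVERY_SECONDS = 93.81   # average recovery time with the setting on

total_seconds = PER_CALL_SECONDS * TITLE_UPDATES
overhead_percent = 100.0 * total_seconds / RECOVERY_SECONDS

print(f"time spent setting titles: {total_seconds * 1000:.3f} ms")
print(f"share of recovery time:    {overhead_percent:.4f} %")
```

That works out to a little under 3.4 ms in total, or roughly 0.0036% of the recovery run, in line with the figures quoted.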

Attachment: details.txt (4K)