Function to track shmem reinit time

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Function to track shmem reinit time

Anastasia Lubennikova

Attached patch introduces a new function pg_shmem_init_time(),
which returns the time shared memory was last (re)initialized.
It is created for use by monitoring tools to track backend crashes.

Currently, if the 'restart_after_crash' option is on, postgres will just restart.
And the only way to know that it happened is to regularly parse logfile
or monitor it, catching restart messages. This approach is really inconvenient for
users, who have gigabytes of logs.

This new function can be periodiacally called by a monitoring agent, and,
if shmem_init_time doesn't match pg_postmaster_start_time,
we know that server crashed-restarted, and also know the exact time, when.

Also, working on this patch, I noticed a bit of dead code
and some discordant comments in postmaster.c.
I see no reason to leave it as is.
So there is a small remove_dead_shmem_reinit_code_v0.patch.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

remove_dead_shmem_reinit_code_v0.patch (1K) Download Attachment
shmem_init_time_v0.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Grigory Smolkin

It can be used to accurately calculate server uptime, since you can`t rely on pg_postmaster_start_time() in this.


On 02/28/2018 03:11 PM, Anastasia Lubennikova wrote:

Attached patch introduces a new function pg_shmem_init_time(),
which returns the time shared memory was last (re)initialized.
It is created for use by monitoring tools to track backend crashes.

Currently, if the 'restart_after_crash' option is on, postgres will just restart.
And the only way to know that it happened is to regularly parse logfile
or monitor it, catching restart messages. This approach is really inconvenient for
users, who have gigabytes of logs.

This new function can be periodiacally called by a monitoring agent, and,
if shmem_init_time doesn't match pg_postmaster_start_time,
we know that server crashed-restarted, and also know the exact time, when.

Also, working on this patch, I noticed a bit of dead code
and some discordant comments in postmaster.c.
I see no reason to leave it as is.
So there is a small remove_dead_shmem_reinit_code_v0.patch.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Peter Eisentraut-6
In reply to this post by Anastasia Lubennikova
On 2/28/18 07:11, Anastasia Lubennikova wrote:
> Currently, if the 'restart_after_crash' option is on, postgres will just
> restart.
> And the only way to know that it happened is to regularly parse logfile
> or monitor it, catching restart messages. This approach is really
> inconvenient for
> users, who have gigabytes of logs.

I find this premise a bit dubious.  Why have a log file if it's too big
to find anything in it?  Server crashes aren't the only thing people are
interested in.  So we'll need a function for "last $anything".

> This new function can be periodiacally called by a monitoring agent, and,
> if /shmem_init_time/ doesn't match /pg_postmaster_start_time,/
> we know that server crashed-restarted, and also know the exact time, when.

If we want to do something like this, I'd rather have a name that does
not expose implementation details.  If we restructure shared memory
management in the future, the name won't apply anymore.  Call it "last
crash" or "last reinit" or something like that.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Justin Pryzby
On Sat, Mar 03, 2018 at 01:00:52PM -0500, Peter Eisentraut wrote:

> On 2/28/18 07:11, Anastasia Lubennikova wrote:
> > Currently, if the 'restart_after_crash' option is on, postgres will just
> > restart.
> > And the only way to know that it happened is to regularly parse logfile
> > or monitor it, catching restart messages. This approach is really
> > inconvenient for
> > users, who have gigabytes of logs.
>
> I find this premise a bit dubious.  Why have a log file if it's too big
> to find anything in it?  Server crashes aren't the only thing people are
> interested in.  So we'll need a function for "last $anything".

I think one can tell if it's crashed recently by comparing start time of parent
postmaster and its main children (I'm going to go put this in place for myself
now).

ts=# SELECT backend_type, backend_start, pg_postmaster_start_time(), backend_start-pg_postmaster_start_time() FROM pg_stat_activity ORDER BY backend_start LIMIT 1;
    backend_type     |         backend_start         |   pg_postmaster_start_time    |    ?column?    
---------------------+-------------------------------+-------------------------------+-----------------
 autovacuum launcher | 2018-03-02 00:21:11.604468-03 | 2018-03-02 00:12:46.757642-03 | 00:08:24.846826

pryzbyj@pryzbyj:~$ ps -O lstart -upostgres # on a separate server with 9.4 which doen't have backend_type, add in v10
  PID                  STARTED S TTY          TIME COMMAND
12982 Mon Feb 19 06:03:13 2018 S ?        00:00:33 /usr/lib/postgresql/9.4/bin/postgres -D /var/lib/postgresql/9.4/main -c config_file=/etc/postgresql/9.4/main/postgresql.conf
12997 Mon Feb 19 06:03:14 2018 S ?        00:00:00 postgres: checkpointer process                                                                                              
12998 Mon Feb 19 06:03:14 2018 S ?        00:00:08 postgres: writer process                                                                                                    
12999 Mon Feb 19 06:03:14 2018 S ?        00:00:08 postgres: wal writer process                                                                                                
13000 Mon Feb 19 06:03:14 2018 S ?        00:00:19 postgres: autovacuum launcher process                                                                                      
13001 Mon Feb 19 06:03:14 2018 S ?        00:00:45 postgres: stats collector process                                                                                          

pryzbyj@pryzbyj:~$ sudo kill -SEGV 12997
pryzbyj@pryzbyj:~$ ps -O lstart -upostgres
  PID                  STARTED S TTY          TIME COMMAND
 8644 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: checkpointer process                                                                                              
 8645 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: writer process                                                                                                    
 8646 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: wal writer process                                                                                                
 8647 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: autovacuum launcher process                                                                                      
 8648 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: stats collector process                                                                                          
12982 Mon Feb 19 06:03:13 2018 S ?        00:00:33 /usr/lib/postgresql/9.4/bin/postgres -D /var/lib/postgresql/9.4/main -c config_file=/etc/postgresql/9.4/main/postgresql.conf

Justin

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Tomas Vondra-4
In reply to this post by Grigory Smolkin

On 02/28/2018 02:39 PM, Grigory Smolkin wrote:
> It can be used to accurately calculate server uptime, since you can`t
> rely on pg_postmaster_start_time() in this.
>

Can you please explain why pg_postmaster_start_time can't be used for
this purpose? It seems like a pretty good match, considering it's meant
to show server start time.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Tomas Vondra-4
In reply to this post by Anastasia Lubennikova
On 02/28/2018 01:11 PM, Anastasia Lubennikova wrote:

> Attached patch introduces a new function pg_shmem_init_time(),
> which returns the time shared memory was last (re)initialized.
> It is created for use by monitoring tools to track backend crashes.
>
> Currently, if the 'restart_after_crash' option is on, postgres will
> just restart. And the only way to know that it happened is to
> regularly parse logfile or monitor it, catching restart messages.
> This approach is really inconvenient for users, who have gigabytes of
> logs.
>
> This new function can be periodiacally called by a monitoring agent,
> and, if /shmem_init_time/ doesn't match /pg_postmaster_start_time,/
> we know that server crashed-restarted, and also know the exact time,
> when.
>

I don't think it really solves the problem, though. For example if the
whole VM reboots (which can be a matter of seconds), this check will say
"shmem_init_time == pg_postmaster_start_time" and you've not detected
anything.

IMHO pg_postmaster_start_time is the right way to monitor uptime, and
the right way to detect spurious restarts is to remember the last value
you've seen and compare it to the current one.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Tom Lane-2
In reply to this post by Tomas Vondra-4
Tomas Vondra <[hidden email]> writes:
> On 02/28/2018 02:39 PM, Grigory Smolkin wrote:
>> It can be used to accurately calculate server uptime, since you can`t
>> rely on pg_postmaster_start_time() in this.

> Can you please explain why pg_postmaster_start_time can't be used for
> this purpose? It seems like a pretty good match, considering it's meant
> to show server start time.

It evidently depends on how you want to define "server uptime".  If you
get backend crashes often enough, you might feel a need to define it
as "time since last crash".  Although I would think that if that's
happening regularly in a production environment, you have a problem
you need to fix, not just measure.

My own thought about that is that if you are trying to measure
backend crashes, just knowing the time of the latest one is little help.
You want to know how often they're happening.  So this gets back to the
question of why the postmaster log isn't a useful source of that
information.  I think that if we're to do anything in this area,
improving the usefulness of the log would be more important than
providing the proposed function.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Tomas Vondra-4
In reply to this post by Anastasia Lubennikova
On 02/28/2018 01:11 PM, Anastasia Lubennikova wrote:
>
> This new function can be periodiacally called by a monitoring agent,
> and, if /shmem_init_time/ doesn't match /pg_postmaster_start_time,/
> we know that server crashed-restarted, and also know the exact time,
> when.
>

Actually, after looking at the code a bit, I think that test would not
really work anyway, because those two timestamps come from two separate
GetCurrentTimestamp calls, so you get this:

    test=# select pg_shmem_init_time(), pg_postmaster_start_time();
          pg_shmem_init_time       |   pg_postmaster_start_time
    -------------------------------+-------------------------------
     2018-03-04 17:10:59.017058+01 | 2018-03-04 17:10:59.022071+01
    (1 row)

    test=# select pg_shmem_init_time() = pg_postmaster_start_time();
     ?column?
    ----------
     f
    (1 row)

So there would have to be some sort of "fuzz" parameter when comparing
the values, etc.

Furthermore, the patch is yet another victim of fd1a421fe - fixing the
pg_proc entries is trivial, but a new version is needed.

I'd also like to see an example/explanation how this improves this
situation compared to only having pg_postmaster_start_time.

So I'm setting this as waiting on author for now.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Grigory Smolkin

On 03/03/2018 09:00 PM, Peter Eisentraut wrote:
> I find this premise a bit dubious.  Why have a log file if it's too big
> to find anything in it?  Server crashes aren't the only thing people are
> interested in.  So we'll need a function for "last $anything".

Thank you for your interest in this topic.
Well, on heavy loaded machine log file will be big, because you usually
want to log every query for later analysis, and, because postgres is
dumping everything in a single file, searching for some specific error
will be slow. Log file is certainly needed for digging the details of a
crash, but pg_shmem_init_time is more about online monitoring.

On 03/03/2018 09:43 PM, Justin Pryzby wrote:

> I think one can tell if it's crashed recently by comparing start time of parent
> postmaster and its main children (I'm going to go put this in place for myself
> now).
>
> ts=# SELECT backend_type, backend_start, pg_postmaster_start_time(), backend_start-pg_postmaster_start_time() FROM pg_stat_activity ORDER BY backend_start LIMIT 1;
>      backend_type     |         backend_start         |   pg_postmaster_start_time    |    ?column?
> ---------------------+-------------------------------+-------------------------------+-----------------
>   autovacuum launcher | 2018-03-02 00:21:11.604468-03 | 2018-03-02 00:12:46.757642-03 | 00:08:24.846826

Thank you, though we must take into account the fact that autovacuum may
not be enabled so it`s better to use checkpointer or bgwriter. It`s
working solution, yes, but it looks more like workaround and requires
from user an understanding of postgres internals.


On 03/04/2018 06:56 PM, Tomas Vondra wrote:

> Can you please explain why pg_postmaster_start_time can't be used for
> this purpose? It seems like a pretty good match, considering it's meant
> to show server start time.

Because a backend crash do not reset pg_postmaster_start_time.

On 03/04/2018 07:02 PM, Tomas Vondra wrote:

> On 02/28/2018 01:11 PM, Anastasia Lubennikova wrote:
>> Attached patch introduces a new function pg_shmem_init_time(),
>> which returns the time shared memory was last (re)initialized.
>> It is created for use by monitoring tools to track backend crashes.
>>
>> Currently, if the 'restart_after_crash' option is on, postgres will
>> just restart. And the only way to know that it happened is to
>> regularly parse logfile or monitor it, catching restart messages.
>> This approach is really inconvenient for users, who have gigabytes of
>> logs.
>>
>> This new function can be periodiacally called by a monitoring agent,
>> and, if /shmem_init_time/ doesn't match /pg_postmaster_start_time,/
>> we know that server crashed-restarted, and also know the exact time,
>> when.
>>
> I don't think it really solves the problem, though. For example if the
> whole VM reboots (which can be a matter of seconds), this check will say
> "shmem_init_time == pg_postmaster_start_time" and you've not detected
> anything.
>
> IMHO pg_postmaster_start_time is the right way to monitor uptime, and
> the right way to detect spurious restarts is to remember the last value
> you've seen and compare it to the current one.

Yes, for the described case with VM restart pg_postmaster_start_time
works fine.
pg_postmaster_start_time is essential for almost every case and
pg_shmem_init_time only expand his usefulness, not diminish.

On 03/04/2018 07:09 PM, Tom Lane wrote:
> It evidently depends on how you want to define "server uptime".  If you
> get backend crashes often enough, you might feel a need to define it
> as "time since last crash".  Although I would think that if that's
> happening regularly in a production environment, you have a problem
> you need to fix, not just measure.

Absolutely. And for that you need to know ASAP that backend crash
happened in the first place.

On 03/04/2018 07:09 PM, Tom Lane wrote:
> My own thought about that is that if you are trying to measure
> backend crashes, just knowing the time of the latest one is little help.
> You want to know how often they're happening.  So this gets back to the
> question of why the postmaster log isn't a useful source of that
> information.

For that purpose pg_shmem_init_time also can be used.
For example we can build a chart based on values from "select (now() -
pg_shmem_init_time());" taken, for example, every 10 seconds.
Values around 0 =< x =<10 will signal about memory reinitialization
which is usually a byproduct of a backend crash.

On 03/04/2018 07:09 PM, Tom Lane wrote:
>   I think that if we're to do anything in this area,
> improving the usefulness of the log would be more important than
> providing the proposed function.

The separation of maintenance messages and query messaged into different
log files is sorely needed.
This way server errors can be identified fast and in convenient way.
But as I mentioned earlier, pg_shmem_init_time() is about online monitoring.


On 03/04/2018 07:02 PM, Tomas Vondra wrote:

> Actually, after looking at the code a bit, I think that test would not
> really work anyway, because those two timestamps come from two separate
> GetCurrentTimestamp calls, so you get this:
>
>     test=# select pg_shmem_init_time(), pg_postmaster_start_time();
>           pg_shmem_init_time       |   pg_postmaster_start_time
> -------------------------------+-------------------------------
>      2018-03-04 17:10:59.017058+01 | 2018-03-04 17:10:59.022071+01
>     (1 row)
>
>     test=# select pg_shmem_init_time() = pg_postmaster_start_time();
>      ?column?
>     ----------
>      f
>     (1 row)
>
> So there would have to be some sort of "fuzz" parameter when comparing
> the values, etc.

The difference measured in milliseconds is expected. After all, shared
memory is initialized after postmaster start.




--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

David Steele
In reply to this post by Tomas Vondra-4
On 3/4/18 11:17 AM, Tomas Vondra wrote:
>
> Furthermore, the patch is yet another victim of fd1a421fe - fixing the
> pg_proc entries is trivial, but a new version is needed.
>
> I'd also like to see an example/explanation how this improves this
> situation compared to only having pg_postmaster_start_time.
>
> So I'm setting this as waiting on author for now.

I'm not sure why this was set back to Needs Review since it neither
applies cleanly nor builds.

I'm setting this entry to Waiting on Author, but based on the discussion
I think it should be Returned with Feedback.

Regards,
--
-David
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Tomas Vondra-4


On 03/28/2018 08:55 PM, David Steele wrote:

> On 3/4/18 11:17 AM, Tomas Vondra wrote:
>>
>> Furthermore, the patch is yet another victim of fd1a421fe - fixing the
>> pg_proc entries is trivial, but a new version is needed.
>>
>> I'd also like to see an example/explanation how this improves this
>> situation compared to only having pg_postmaster_start_time.
>>
>> So I'm setting this as waiting on author for now.
>
> I'm not sure why this was set back to Needs Review since it neither
> applies cleanly nor builds.
>

It likely applied/built fine when it was set to NR, but got broken by
some recent commit.

> I'm setting this entry to Waiting on Author, but based on the discussion
> I think it should be Returned with Feedback.
>

Fine with me.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

David Steele
On 3/29/18 9:40 AM, Tomas Vondra wrote:
> On 03/28/2018 08:55 PM, David Steele wrote:
>
>> I'm setting this entry to Waiting on Author, but based on the discussion
>> I think it should be Returned with Feedback.
>>
>
> Fine with me.

This entry has been marked Returned with Feedback.

Regards,
--
-David
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Robert Haas
On Tue, Apr 10, 2018 at 9:07 AM, David Steele <[hidden email]> wrote:
> On 3/29/18 9:40 AM, Tomas Vondra wrote:
>> On 03/28/2018 08:55 PM, David Steele wrote:
>>> I'm setting this entry to Waiting on Author, but based on the discussion
>>> I think it should be Returned with Feedback.
>>
>> Fine with me.
>
> This entry has been marked Returned with Feedback.

I guess I'm in the minority here, but I don't see why it isn't useful
to have something like this. It's a lot easier for a monitoring
process to poll for this kind of thing than it is to monitor the logs.
Not that log monitoring isn't a good idea, but this is pretty
lightweight.

OTOH, it also seems like this could live as an out-of-core extension,
so I guess if nobody else likes the idea Anastasia could always do it
that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Alexander Korotkov
On Tue, Apr 10, 2018 at 8:58 PM Robert Haas <[hidden email]> wrote:

> On Tue, Apr 10, 2018 at 9:07 AM, David Steele <[hidden email]> wrote:
> > On 3/29/18 9:40 AM, Tomas Vondra wrote:
> >> On 03/28/2018 08:55 PM, David Steele wrote:
> >>> I'm setting this entry to Waiting on Author, but based on the discussion
> >>> I think it should be Returned with Feedback.
> >>
> >> Fine with me.
> >
> > This entry has been marked Returned with Feedback.
>
> I guess I'm in the minority here, but I don't see why it isn't useful
> to have something like this. It's a lot easier for a monitoring
> process to poll for this kind of thing than it is to monitor the logs.
> Not that log monitoring isn't a good idea, but this is pretty
> lightweight.

+1,

I think the issue this patch is addressing is that PostgreSQL now
doesn't have true uptime function.  We now recommend using
pg_postmaster_start_time() for uptime calculation, but that isn't it.
When backed crashes, shmen is reinitialized and PostgreSQL is
restarted then uptime should be reset to zero, but
pg_postmaster_start_time() is not about that.

The major argument against this patch I got from the thread is that we
shouldn't introduce new functions exposing information, which could be
already fetched from logs.  However, I think following this logic we
should remove pg_postmaster_start_time() too.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Alexander Korotkov
On Sun, Dec 23, 2018 at 11:14 PM Alexander Korotkov
<[hidden email]> wrote:

> On Tue, Apr 10, 2018 at 8:58 PM Robert Haas <[hidden email]> wrote:
> > On Tue, Apr 10, 2018 at 9:07 AM, David Steele <[hidden email]> wrote:
> > > On 3/29/18 9:40 AM, Tomas Vondra wrote:
> > >> On 03/28/2018 08:55 PM, David Steele wrote:
> > >>> I'm setting this entry to Waiting on Author, but based on the discussion
> > >>> I think it should be Returned with Feedback.
> > >>
> > >> Fine with me.
> > >
> > > This entry has been marked Returned with Feedback.
> >
> > I guess I'm in the minority here, but I don't see why it isn't useful
> > to have something like this. It's a lot easier for a monitoring
> > process to poll for this kind of thing than it is to monitor the logs.
> > Not that log monitoring isn't a good idea, but this is pretty
> > lightweight.
>
> +1,
>
> I think the issue this patch is addressing is that PostgreSQL now
> doesn't have true uptime function.  We now recommend using
> pg_postmaster_start_time() for uptime calculation, but that isn't it.
> When backed crashes, shmen is reinitialized and PostgreSQL is
> restarted then uptime should be reset to zero, but
> pg_postmaster_start_time() is not about that.
>
> The major argument against this patch I got from the thread is that we
> shouldn't introduce new functions exposing information, which could be
> already fetched from logs.  However, I think following this logic we
> should remove pg_postmaster_start_time() too.
I've revised the patchset.

0001-remove_dead_shmem_reinit_code-v1.patch
 * -n option is removed from getopt() argument, initdb help and documentation
 * some comments are slightly revised

0002-shmem_init_time_v1.patch
 * detailed function description is added to the documentation

From my point of view criticism of this patch was addressed by
argument, that pg_shmem_init_time() allows to calculate the server
uptime [1].  This is very basic information, which is reasonable to
get without log files parsing.  It's more than year since [1] is
unanswered.  So, I'm going to push this if no objections.

Links
1. https://www.postgresql.org/message-id/CAPpHfds3_CgNgK4nfcOBo2BfPMFS7yrDXDCPKJSqdGrpqoLwoQ%40mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-remove_dead_shmem_reinit_code-v1.patch (5K) Download Attachment
0002-shmem_init_time_v1.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Tom Lane-2
Alexander Korotkov <[hidden email]> writes:
> From my point of view criticism of this patch was addressed by
> argument, that pg_shmem_init_time() allows to calculate the server
> uptime [1].  This is very basic information, which is reasonable to
> get without log files parsing.  It's more than year since [1] is
> unanswered.  So, I'm going to push this if no objections.

I'm still going to object to it, on the grounds that

(1) it's exposing an implementation detail that clients should not be
concerned with, and that we might change in future.  The name isn't
even well chosen --- if the argument is that it's useful to monitor
server uptime, why isn't the name "pg_server_uptime"?

(2) if your server is crashing often enough that postmaster start
time isn't an adequate substitute, you have way worse problems than
whether you can measure it.  I'd rather see us put effort into
fixing whatever the underlying problem is.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Alexander Korotkov
On Sat, Feb 22, 2020 at 8:01 PM Tom Lane <[hidden email]> wrote:
> Alexander Korotkov <[hidden email]> writes:
> > From my point of view criticism of this patch was addressed by
> > argument, that pg_shmem_init_time() allows to calculate the server
> > uptime [1].  This is very basic information, which is reasonable to
> > get without log files parsing.  It's more than year since [1] is
> > unanswered.  So, I'm going to push this if no objections.
>
> I'm still going to object to it, on the grounds that

OK!

> (1) it's exposing an implementation detail that clients should not be
> concerned with, and that we might change in future.  The name isn't
> even well chosen --- if the argument is that it's useful to monitor
> server uptime, why isn't the name "pg_server_uptime"?

Choosing a more user-friendly name sounds like good idea for me.
pg_server_uptime() sounds like it should return an interval.  I'd like
this function to calculate a diff between timestamps.  I would rather
delegate it to user side.  What about pg_server_up_since()?

> (2) if your server is crashing often enough that postmaster start
> time isn't an adequate substitute, you have way worse problems than
> whether you can measure it.

Well, it's enough server to crash once per postmaster run to make this
measure absolutely inadequate.  We have different reasons for server
crash, not all of them are exact bugs.  OOM killer can cause a crash,
and it doesn't seem feasible we can exclude this reason completely.

> I'd rather see us put effort into
> fixing whatever the underlying problem is.

This is monitoring problem vs fixing problem tradeoff.  We had similar
for lwlock wait monitoring.  Ideally we should make our code never
stuck on lwlock, but that's not feasible.  So, we got lwlock wait
monitoring for problem diagnosis.  I think now we're discussing
similar issue.  Ideally, postgres should never crash.  But that's too
hard to achieve.  But we can easily get better monitoring on server
crashes and that's useful.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Robert Haas
In reply to this post by Tom Lane-2
On Sat, Feb 22, 2020 at 10:31 PM Tom Lane <[hidden email]> wrote:

> I'm still going to object to it, on the grounds that
>
> (1) it's exposing an implementation detail that clients should not be
> concerned with, and that we might change in future.  The name isn't
> even well chosen --- if the argument is that it's useful to monitor
> server uptime, why isn't the name "pg_server_uptime"?
>
> (2) if your server is crashing often enough that postmaster start
> time isn't an adequate substitute, you have way worse problems than
> whether you can measure it.  I'd rather see us put effort into
> fixing whatever the underlying problem is.

I don't accept the first argument, because I think the chances that
we'll change our basic approach to this problem are indistinguishable
from zero.  A few years back, you criticized some idea of mine as
tantamount to rm -rf src/backend/optimizer, which you were not ready
to endorse. That proposal was surely vastly less ambitious than some
proposal which would remove the idea of shared memory reinitialization
after an unsafe backend exit.

I don't accept the second argument either, because it's a false
dichotomy. Adding this function won't reduce the amount of energy that
gets spent fixing crashes. There are always more crashes.

I realize that several other people voted against this proposal so I
don't think it's OK to commit a patch in this area unless some more
positive votes turn up, but I'm still +1 on the idea. Granted, we
don't want an infinite amount of clutter in the system catalogs, but I
have a hard time believing that this proposed function would get any
less use than the hyperbolic trig functions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

David Steele
On 2/24/20 10:57 PM, Robert Haas wrote:

> On Sat, Feb 22, 2020 at 10:31 PM Tom Lane <[hidden email]> wrote:
>> I'm still going to object to it, on the grounds that
>>
>> (1) it's exposing an implementation detail that clients should not be
>> concerned with, and that we might change in future.  The name isn't
>> even well chosen --- if the argument is that it's useful to monitor
>> server uptime, why isn't the name "pg_server_uptime"?
>>
>> (2) if your server is crashing often enough that postmaster start
>> time isn't an adequate substitute, you have way worse problems than
>> whether you can measure it.  I'd rather see us put effort into
>> fixing whatever the underlying problem is.
>
> I don't accept the first argument, because I think the chances that
> we'll change our basic approach to this problem are indistinguishable
> from zero.  A few years back, you criticized some idea of mine as
> tantamount to rm -rf src/backend/optimizer, which you were not ready
> to endorse. That proposal was surely vastly less ambitious than some
> proposal which would remove the idea of shared memory reinitialization
> after an unsafe backend exit.
>
> I don't accept the second argument either, because it's a false
> dichotomy. Adding this function won't reduce the amount of energy that
> gets spent fixing crashes. There are always more crashes.
>
> I realize that several other people voted against this proposal so I
> don't think it's OK to commit a patch in this area unless some more
> positive votes turn up, but I'm still +1 on the idea. Granted, we
> don't want an infinite amount of clutter in the system catalogs, but I
> have a hard time believing that this proposed function would get any
> less use than the hyperbolic trig functions.

Marking this Returned with Feedback again since we are still at an impasse.

Regards,
--
-David
[hidden email]


Reply | Threaded
Open this post in threaded view
|

Re: Function to track shmem reinit time

Kyotaro Horiguchi-4
FWIW, I don't object for adding the feature like this (in other words
+1), since I see the advantages Alex mentioned is valid.  (Even though
most of our customers notices server restart by client
disconnections..)

At Wed, 8 Apr 2020 11:49:11 -0400, David Steele <[hidden email]> wrote in

> On 2/24/20 10:57 PM, Robert Haas wrote:
> > On Sat, Feb 22, 2020 at 10:31 PM Tom Lane <[hidden email]> wrote:
> >> I'm still going to object to it, on the grounds that
> >>
> >> (1) it's exposing an implementation detail that clients should not be
> >> concerned with, and that we might change in future.  The name isn't
> >> even well chosen --- if the argument is that it's useful to monitor
> >> server uptime, why isn't the name "pg_server_uptime"?
> >>
> >> (2) if your server is crashing often enough that postmaster start
> >> time isn't an adequate substitute, you have way worse problems than
> >> whether you can measure it.  I'd rather see us put effort into
> >> fixing whatever the underlying problem is.
> > I don't accept the first argument, because I think the chances that
> > we'll change our basic approach to this problem are indistinguishable
> > from zero.  A few years back, you criticized some idea of mine as
> > tantamount to rm -rf src/backend/optimizer, which you were not ready
> > to endorse. That proposal was surely vastly less ambitious than some
> > proposal which would remove the idea of shared memory reinitialization
> > after an unsafe backend exit.

I'm not sure the name is good, mainly for the reason of
user-friendliness. Alexander's proposal "pg_server_up_since()"
returning absolute time makes sense to me.

> > I don't accept the second argument either, because it's a false
> > dichotomy. Adding this function won't reduce the amount of energy that
> > gets spent fixing crashes. There are always more crashes.
> > I realize that several other people voted against this proposal so I
> > don't think it's OK to commit a patch in this area unless some more
> > positive votes turn up, but I'm still +1 on the idea. Granted, we
> > don't want an infinite amount of clutter in the system catalogs, but I
> > have a hard time believing that this proposed function would get any
> > less use than the hyperbolic trig functions.

I'm on Robert and Alexander side about this.  I don't think
introducing this necessarily means we are obliged to accept all
requests for "last $anything" for other events, like introducing the
hyperbolic functions doesn't mean to accept all new functions alike,
maybe.

> Marking this Returned with Feedback again since we are still at an
> impasse.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center