Printing backtrace of postgres processes

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
58 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Robert Haas
On Sat, Jan 16, 2021 at 3:21 PM Tom Lane <[hidden email]> wrote:
> I'd argue that backtraces for those processes aren't really essential,
> and indeed that trying to make the syslogger report its own backtrace
> is damn dangerous.

I agree. Ideally I'd like to be able to use the same mechanism
everywhere and include those processes too, but surely regular
backends and parallel workers are going to be the things that come up
most often.

> (Personally, I think this whole patch fails the safety-vs-usefulness
> tradeoff, but I expect I'll get shouted down.)

You and I are frequently on opposite sides of these kinds of
questions, but I think this is a closer call than many cases. I'm
convinced that it's useful, but I'm not sure whether it's safe. On the
usefulness side, backtraces are often the only way to troubleshoot
problems that occur on production systems. I wish we had better
logging and tracing tools instead of having to ask for this sort of
thing, but we don't. EDB support today frequently asks customers to
attach gdb and take a backtrace that way, and that has risks which
this implementation does not: for example, suppose you were unlucky
enough to attach during a spinlock protected critical section, and
suppose you didn't continue the stopped process before the 60 second
timeout expired and some other process caused a PANIC. Even if this
implementation were to end up emitting a backtrace with a spinlock
held, it would remove the risk of leaving the process stopped while
holding a critical lock, and would in that sense be safer. However, as
soon as you make something like this accessible via an SQL callable
function, some people are going to start spamming it. And, as soon as
they do that, any risks inherent in the implementation are multiplied.
If it carries an 0.01% chance of crashing the system, we'll have
people taking production systems down with this all the time. At that
point I wouldn't want the feature, even if the gdb approach had the
same risk (which I don't think it does).

What do you see as the main safety risks here?

--
Robert Haas
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Sat, Jan 16, 2021 at 3:21 PM Tom Lane <[hidden email]> wrote:
>> (Personally, I think this whole patch fails the safety-vs-usefulness
>> tradeoff, but I expect I'll get shouted down.)

> What do you see as the main safety risks here?

The thing that is scaring me the most is the "broadcast" aspect.
For starters, I think that that is going to be useless in the
field because of the likelihood that different backends' stack
traces will get interleaved in whatever log file the traces are
going to.  Also, having hundreds of processes spitting dozens of
lines to the same place at the same time is going to expose any
little weaknesses in your logging arrangements, such as rsyslog's
tendency to drop messages under load.

I think it's got security hazards as well.  If we restricted the
feature to cause a trace of only one process at a time, and required
that process to be logged in as the same database user as the
requestor (or at least someone with the privs of that user, such
as a superuser), that'd be less of an issue.

Beyond that, well, maybe it's all right.  In theory anyplace that
there's a CHECK_FOR_INTERRUPTS should be okay to call elog from;
but it's completely untested whether we can do that and then
continue, as opposed to aborting the transaction or whole session.
I share your estimate that there'll be small-fraction-of-a-percent
hazards that could still add up to dangerous instability if people
go wild with this.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Robert Haas
On Tue, Jan 19, 2021 at 12:50 PM Tom Lane <[hidden email]> wrote:
> The thing that is scaring me the most is the "broadcast" aspect.
> For starters, I think that that is going to be useless in the
> field because of the likelihood that different backends' stack
> traces will get interleaved in whatever log file the traces are
> going to.  Also, having hundreds of processes spitting dozens of
> lines to the same place at the same time is going to expose any
> little weaknesses in your logging arrangements, such as rsyslog's
> tendency to drop messages under load.

+1. I don't think broadcast is a good idea.

> I think it's got security hazards as well.  If we restricted the
> feature to cause a trace of only one process at a time, and required
> that process to be logged in as the same database user as the
> requestor (or at least someone with the privs of that user, such
> as a superuser), that'd be less of an issue.

I am not sure I see a security hazard here. I think the checks for
this should have the same structure as pg_terminate_backend() and
pg_cancel_backend(); whatever is required there should be required
here, too, but not more, unless we have a real clear reason for such
an inconsistency.

> Beyond that, well, maybe it's all right.  In theory anyplace that
> there's a CHECK_FOR_INTERRUPTS should be okay to call elog from;
> but it's completely untested whether we can do that and then
> continue, as opposed to aborting the transaction or whole session.

I guess that's a theoretical risk but it doesn't seem very likely.
And, if we do have such a problem, I think that'd probably be a case
of bad code that we would want to fix either way.

> I share your estimate that there'll be small-fraction-of-a-percent
> hazards that could still add up to dangerous instability if people
> go wild with this.

Right. I was more concerned about whether we could, for example, crash
while inside the function that generates the backtrace, on some
platforms or in some scenarios. That would be super-sad. I assume that
the people who wrote the code tried to make sure that didn't happen
but I don't really know what's reasonable to expect.

--
Robert Haas
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Tue, Jan 19, 2021 at 12:50 PM Tom Lane <[hidden email]> wrote:
>> I think it's got security hazards as well.  If we restricted the
>> feature to cause a trace of only one process at a time, and required
>> that process to be logged in as the same database user as the
>> requestor (or at least someone with the privs of that user, such
>> as a superuser), that'd be less of an issue.

> I am not sure I see a security hazard here. I think the checks for
> this should have the same structure as pg_terminate_backend() and
> pg_cancel_backend(); whatever is required there should be required
> here, too, but not more, unless we have a real clear reason for such
> an inconsistency.

Yeah, agreed.  The "broadcast" option seems inconsistent with doing
things that way, but I don't have a problem with being able to send
a trace signal to the same processes you could terminate.

>> I share your estimate that there'll be small-fraction-of-a-percent
>> hazards that could still add up to dangerous instability if people
>> go wild with this.

> Right. I was more concerned about whether we could, for example, crash
> while inside the function that generates the backtrace, on some
> platforms or in some scenarios. That would be super-sad. I assume that
> the people who wrote the code tried to make sure that didn't happen
> but I don't really know what's reasonable to expect.

Recursion is scary, but it should (I think) not be possible if this
is driven off CHECK_FOR_INTERRUPTS.  There will certainly be none
of those in libbacktrace.

One point here is that it might be a good idea to suppress elog.c's
calls to functions in the error context stack.  As we saw in another
connection recently, allowing that to happen makes for a *very*
large increase in the footprint of code that you are expecting to
work at any random CHECK_FOR_INTERRUPTS call site.

BTW, it also looks like the patch is doing nothing to prevent the
backtrace from being sent to the connected client.  I'm not sure
what I think about whether it'd be okay from a security standpoint
to do that on the connection that requested the trace, but I sure
as heck don't want it to happen on connections that didn't.  Also,
whatever you think about security concerns, it seems like there'd be
pretty substantial reentrancy hazards if the interrupt occurs
anywhere in code dealing with normal client communication.

Maybe, given all of these things, we should forget using elog at
all and just emit the trace with fprintf(stderr).  That seems like
it would decrease the odds of trouble by about an order of magnitude.
It would make it more painful to collect the trace in some logging
setups, of course.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

vignesh C
On Wed, Jan 20, 2021 at 2:52 AM Tom Lane <[hidden email]> wrote:

>
> Robert Haas <[hidden email]> writes:
> > On Tue, Jan 19, 2021 at 12:50 PM Tom Lane <[hidden email]> wrote:
> >> I think it's got security hazards as well.  If we restricted the
> >> feature to cause a trace of only one process at a time, and required
> >> that process to be logged in as the same database user as the
> >> requestor (or at least someone with the privs of that user, such
> >> as a superuser), that'd be less of an issue.
>
> > I am not sure I see a security hazard here. I think the checks for
> > this should have the same structure as pg_terminate_backend() and
> > pg_cancel_backend(); whatever is required there should be required
> > here, too, but not more, unless we have a real clear reason for such
> > an inconsistency.
>
> Yeah, agreed.  The "broadcast" option seems inconsistent with doing
> things that way, but I don't have a problem with being able to send
> a trace signal to the same processes you could terminate.
>

The current patch supports both getting the trace for all the
processes of that instance and getting the trace for a particular
process, I'm understanding the concern here with broadcasting to all
the connected backends, I will remove the functionality to get trace
for all the processes. And I will also change the trace of a single
process in such a way that the user can get the trace of only the
processes which that user could terminate.

> >> I share your estimate that there'll be small-fraction-of-a-percent
> >> hazards that could still add up to dangerous instability if people
> >> go wild with this.
>
> > Right. I was more concerned about whether we could, for example, crash
> > while inside the function that generates the backtrace, on some
> > platforms or in some scenarios. That would be super-sad. I assume that
> > the people who wrote the code tried to make sure that didn't happen
> > but I don't really know what's reasonable to expect.
>
> Recursion is scary, but it should (I think) not be possible if this
> is driven off CHECK_FOR_INTERRUPTS.  There will certainly be none
> of those in libbacktrace.
>
> One point here is that it might be a good idea to suppress elog.c's
> calls to functions in the error context stack.  As we saw in another
> connection recently, allowing that to happen makes for a *very*
> large increase in the footprint of code that you are expecting to
> work at any random CHECK_FOR_INTERRUPTS call site.
>
> BTW, it also looks like the patch is doing nothing to prevent the
> backtrace from being sent to the connected client.  I'm not sure
> what I think about whether it'd be okay from a security standpoint
> to do that on the connection that requested the trace, but I sure
> as heck don't want it to happen on connections that didn't.  Also,
> whatever you think about security concerns, it seems like there'd be
> pretty substantial reentrancy hazards if the interrupt occurs
> anywhere in code dealing with normal client communication.
>
> Maybe, given all of these things, we should forget using elog at
> all and just emit the trace with fprintf(stderr).  That seems like
> it would decrease the odds of trouble by about an order of magnitude.
> It would make it more painful to collect the trace in some logging
> setups, of course.

I would prefer if the trace appears in the log file rather on the
console, as you rightly pointed out it will be difficult to collect
the trace of fprint(stderr). Let me know if I misunderstood. I think
you are concerned about the problem where elog logs the trace to the
client also. Can we use LOG_SERVER_ONLY log level that would prevent
it from logging to the client. That way we can keep the elog
implementation itself.

Thoughts?


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Craig Ringer-5
In reply to this post by Robert Haas
On Wed, 20 Jan 2021 at 01:31, Robert Haas <[hidden email]> wrote:
On Sat, Jan 16, 2021 at 3:21 PM Tom Lane <[hidden email]> wrote:
> I'd argue that backtraces for those processes aren't really essential,
> and indeed that trying to make the syslogger report its own backtrace
> is damn dangerous.

I agree. Ideally I'd like to be able to use the same mechanism
everywhere and include those processes too, but surely regular
backends and parallel workers are going to be the things that come up
most often.

> (Personally, I think this whole patch fails the safety-vs-usefulness
> tradeoff, but I expect I'll get shouted down.)

You and I are frequently on opposite sides of these kinds of
questions, but I think this is a closer call than many cases. I'm
convinced that it's useful, but I'm not sure whether it's safe. On the
usefulness side, backtraces are often the only way to troubleshoot
problems that occur on production systems. I wish we had better
logging and tracing tools instead of having to ask for this sort of
thing, but we don't.

Agreed.

In theory we should be able to do this sort of thing using external trace and diagnostic tools like perf, systemtap, etc. In practice, these tools tend to be quite version-sensitive and hard to get right without multiple rounds of back-and-forth to deal with specifics of the site's setup, installed debuginfo or lack thereof, specific tool versions, etc.

It's quite common to have to fall back on attaching gdb with a breakpoint on a function in the export symbol table (so it works w/o debuginfo), saving a core, and then analysing the core on a separate system on which debuginfo is available for all the loaded modules. It's a major pain.

The ability to get a basic bt from within Pg is strongly desirable. IIRC gdb's basic unwinder works without external debuginfo, if not especially well. libunwind produces much better results, but that didn't pass the extra-dependency bar when backtracing support was introduced to core postgres.

On a side note, to help get better diagnostics I've also been meaning to look into building --enable-debug with -ggdb3 so we can embed macro info, and using dwz to deduplicate+compress the debuginfo so we can encourage people to install it by default on production. I also want to start exporting pointers to all the important data symbols for diagnostic use, even if we do so in a separate ELF section just for debug use.
Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Craig Ringer-5
In reply to this post by Tom Lane-2
On Wed, 20 Jan 2021 at 05:23, Tom Lane <[hidden email]> wrote:

Recursion is scary, but it should (I think) not be possible if this
is driven off CHECK_FOR_INTERRUPTS.  There will certainly be none
of those in libbacktrace.

We can also hold interrupts for the call, and it might be wise to do so.

One point here is that it might be a good idea to suppress elog.c's
calls to functions in the error context stack.  As we saw in another
connection recently, allowing that to happen makes for a *very*
large increase in the footprint of code that you are expecting to
work at any random CHECK_FOR_INTERRUPTS call site.

I strongly agree. Treat it as errhidecontext().

BTW, it also looks like the patch is doing nothing to prevent the
backtrace from being sent to the connected client.  I'm not sure
what I think about whether it'd be okay from a security standpoint
to do that on the connection that requested the trace, but I sure
as heck don't want it to happen on connections that didn't.

I don't see a good reason to send a bt to a client. Even though these backtraces won't be analysing debuginfo and populating args, locals, etc, it should still just go to the server log.
 
Maybe, given all of these things, we should forget using elog at
all and just emit the trace with fprintf(stderr).

That causes quite a lot of pain with MemoryContextStats() already as it's frequently difficult to actually locate the output given the variations that exist in customer logging configurations. Sometimes stderr goes to a separate file or to journald. It's also much harder to locate the desired output since there's no log_line_prefix. I have a WIP patch floating around somewhere that tries to teach MemoryContextStats to write to the ereport channel when not called during an actual out-of-memory situation for that reason; an early version is somewhere in the archives.

This is one of those "ok in development, painful in production" situations.

So I'm not a big fan of pushing it to stderr, though I'd rather have that than not have the ability at all.
Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Tom Lane-2
Craig Ringer <[hidden email]> writes:
> On Wed, 20 Jan 2021 at 05:23, Tom Lane <[hidden email]> wrote:
>> BTW, it also looks like the patch is doing nothing to prevent the
>> backtrace from being sent to the connected client.

> I don't see a good reason to send a bt to a client. Even though these
> backtraces won't be analysing debuginfo and populating args, locals, etc,
> it should still just go to the server log.

Yeah.  That's easier than I was thinking, we just need to
s/LOG/LOG_SERVER_ONLY/.

>> Maybe, given all of these things, we should forget using elog at
>> all and just emit the trace with fprintf(stderr).

> That causes quite a lot of pain with MemoryContextStats() already

True.  Given the changes discussed in the last couple messages, I don't
see any really killer reasons why we can't ship the trace through elog.
We can always try that first, and back off to fprintf if we do find
reasons why it's too unstable.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Craig Ringer-5
On Thu, 21 Jan 2021 at 09:56, Tom Lane <[hidden email]> wrote:
Craig Ringer <[hidden email]> writes:
> On Wed, 20 Jan 2021 at 05:23, Tom Lane <[hidden email]> wrote:
>> BTW, it also looks like the patch is doing nothing to prevent the
>> backtrace from being sent to the connected client.

> I don't see a good reason to send a bt to a client. Even though these
> backtraces won't be analysing debuginfo and populating args, locals, etc,
> it should still just go to the server log.

Yeah.  That's easier than I was thinking, we just need to
s/LOG/LOG_SERVER_ONLY/.

>> Maybe, given all of these things, we should forget using elog at
>> all and just emit the trace with fprintf(stderr).

> That causes quite a lot of pain with MemoryContextStats() already

True.  Given the changes discussed in the last couple messages, I don't
see any really killer reasons why we can't ship the trace through elog.
We can always try that first, and back off to fprintf if we do find
reasons why it's too unstable.

Yep, works for me.

Thanks for being open to considering this.

I know lots of this stuff can seem like a pointless sidetrack because the utility of it is not obvious on dev systems or when you're doing your own hands-on expert support on systems you own and operate yourself. These sorts of things really only start to make sense when you're touching many different postgres systems "in the wild" - such as commercial support services, helping people on -general, -bugs or stackoverflow, etc.

I really appreciate your help with it.
Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Robert Haas
On Wed, Jan 20, 2021 at 9:24 PM Craig Ringer
<[hidden email]> wrote:
> I know lots of this stuff can seem like a pointless sidetrack because the utility of it is not obvious on dev systems or when you're doing your own hands-on expert support on systems you own and operate yourself. These sorts of things really only start to make sense when you're touching many different postgres systems "in the wild" - such as commercial support services, helping people on -general, -bugs or stackoverflow, etc.
>
> I really appreciate your help with it.

Big +1 for all that from me, too.

--
Robert Haas
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

vignesh C
In reply to this post by Tom Lane-2
On Thu, Jan 21, 2021 at 7:26 AM Tom Lane <[hidden email]> wrote:

>
> Craig Ringer <[hidden email]> writes:
> > On Wed, 20 Jan 2021 at 05:23, Tom Lane <[hidden email]> wrote:
> >> BTW, it also looks like the patch is doing nothing to prevent the
> >> backtrace from being sent to the connected client.
>
> > I don't see a good reason to send a bt to a client. Even though these
> > backtraces won't be analysing debuginfo and populating args, locals, etc,
> > it should still just go to the server log.
>
> Yeah.  That's easier than I was thinking, we just need to
> s/LOG/LOG_SERVER_ONLY/.
>
> >> Maybe, given all of these things, we should forget using elog at
> >> all and just emit the trace with fprintf(stderr).
>
> > That causes quite a lot of pain with MemoryContextStats() already
>
> True.  Given the changes discussed in the last couple messages, I don't
> see any really killer reasons why we can't ship the trace through elog.
> We can always try that first, and back off to fprintf if we do find
> reasons why it's too unstable.
>
Thanks all of them for the suggestions. Attached v3 patch which has
fixes based on the suggestions. It includes the following fixes: 1)
Removal of support to get callstack of all postgres process, user can
get only one process callstack. 2) Update the documentation. 3) Added
necessary checks for pg_print_callstack similar to
pg_terminate_backend. 4) Changed LOG to LOG_SERVER_ONLY.
Thoughts?

Regards,
Vignesh

v3-0001-Print-backtrace-of-postgres-process-that-are-part.patch (24K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Andres Freund
Hi,

On 2021-01-27 19:05:16 +0530, vignesh C wrote:

>  /*
> + * LogBackTrace
> + *
> + * Get the backtrace and log the backtrace to log file.
> + */
> +void
> +LogBackTrace(void)
> +{
> + int save_errno = errno;
> +
> + void   *buf[100];
> + int nframes;
> + char  **strfrms;
> + StringInfoData errtrace;
> +
> + /* OK to process messages.  Reset the flag saying there are more to do. */
> + PrintBacktracePending = false;

ISTM that it'd be better to do this in the caller, allowing this
function to be used outside of signal triggered backtraces.

>  
> +extern void LogBackTrace(void); /* Called from EmitProcSignalPrintCallStack */

I don't think this comment is correct anymore?

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

vignesh C
On Wed, Jan 27, 2021 at 10:40 PM Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> On 2021-01-27 19:05:16 +0530, vignesh C wrote:
>
> >  /*
> > + * LogBackTrace
> > + *
> > + * Get the backtrace and log the backtrace to log file.
> > + */
> > +void
> > +LogBackTrace(void)
> > +{
> > +     int                     save_errno = errno;
> > +
> > +     void       *buf[100];
> > +     int                     nframes;
> > +     char      **strfrms;
> > +     StringInfoData errtrace;
> > +
> > +     /* OK to process messages.  Reset the flag saying there are more to do. */
> > +     PrintBacktracePending = false;
>
> ISTM that it'd be better to do this in the caller, allowing this
> function to be used outside of signal triggered backtraces.
>
> >
> > +extern void LogBackTrace(void); /* Called from EmitProcSignalPrintCallStack */
>
> I don't think this comment is correct anymore?
Thanks for the comments, I have fixed and attached an updated patch
with the fixes for the same.
Thoughts?

Regards,
Vignesh

v4-0001-Print-backtrace-of-postgres-process-that-are-part.patch (24K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Bharath Rupireddy
On Thu, Jan 28, 2021 at 5:22 PM vignesh C <[hidden email]> wrote:
> Thanks for the comments, I have fixed and attached an updated patch
> with the fixes for the same.
> Thoughts?

Thanks for the patch. Here are few comments:

1) I think it's return SIGNAL_BACKEND_SUCCESS; instead of return 0; in
check_valid_pid?

2) How about following in pg_signal_backend for more readability
+    if (ret != SIGNAL_BACKEND_SUCCESS)
+        return ret;
instead of
+    if (ret)
+        return ret;

3) How about validate_backend_pid or some better name instead of
check_valid_pid?

4) How about following
+                     errmsg("must be a superuser to print backtrace
of backend process")));
instead of
+                     errmsg("must be a superuser to print backtrace
of superuser query process")));

5) How about following
                     errmsg("must be a member of the role whose backed
process's backtrace is being printed or member of
pg_signal_backend")));
instead of
+                     errmsg("must be a member of the role whose
backtrace is being logged or member of pg_signal_backend")));

6) I'm not sure whether "backtrace" or "call stack" is a generic term
from the user/developer perspective. In the patch, the function name
and documentation says callstack(I think it is "call stack" actually),
but the error/warning messages says backtrace. IMHO, having
"backtrace" everywhere in the patch, even the function name changed to
pg_print_backtrace, looks better and consistent. Thoughts?

7) How about following in pg_print_callstack?
{
    int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
    bool        result = false;

        if (r == SIGNAL_BACKEND_SUCCESS)
        {
            if (EmitProcSignalPrintCallStack(bt_pid))
                result = true;
            else
                ereport(WARNING,
                        (errmsg("failed to send signal to postmaster: %m")));
        }

        PG_RETURN_BOOL(result);
}

8) How about following
+                (errmsg("backtrace generation is not supported by
this PostgresSQL installation")));
instead of
+                (errmsg("backtrace generation is not supported by
this installation")));

9) Typo - it's "example" +2) Using "addr2line -e postgres address", For exmple:

10) How about
+ * Handle print backtrace signal
instead of
+ * Handle receipt of an print backtrace.

11) Isn't below in documentation specific to Linux platform. What
happens if GDB is not there on the platform?
+<programlisting>
+1)  "info line *address" from gdb on postgres executable. For example:
+gdb ./postgres
+GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7

12) +The callstack will be logged in the log file. What happens if the
server is started without a log file , ./pg_ctl -D data start? Where
will the backtrace go?

13) Not sure, if it's an overkill, but how about pg_print_callstack
returning a warning/notice along with true, which just says, "See
<<<full log file name along with log directory>>>". Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

vignesh C
Thanks Bharath for your review comments. Please find my comments inline below.

On Thu, Jan 28, 2021 at 7:40 PM Bharath Rupireddy
<[hidden email]> wrote:

>
> On Thu, Jan 28, 2021 at 5:22 PM vignesh C <[hidden email]> wrote:
> > Thanks for the comments, I have fixed and attached an updated patch
> > with the fixes for the same.
> > Thoughts?
>
> Thanks for the patch. Here are few comments:
>
> 1) I think it's return SIGNAL_BACKEND_SUCCESS; instead of return 0; in
> check_valid_pid?
>
I did not want to use SIGNAL_BACKEND_SUCCESS as we have not yet
signalled the backend process at this time. I have added
BACKEND_VALIDATION_SUCCESS macro and used it here for better
readability.

> 2) How about following in pg_signal_backend for more readability
> +    if (ret != SIGNAL_BACKEND_SUCCESS)
> +        return ret;
> instead of
> +    if (ret)
> +        return ret;
>

Modified it to ret != BACKEND_VALIDATION_SUCCESS

> 3) How about validate_backend_pid or some better name instead of
> check_valid_pid?
>

Modified it to validate_backend_pid

> 4) How about following
> +                     errmsg("must be a superuser to print backtrace
> of backend process")));
> instead of
> +                     errmsg("must be a superuser to print backtrace
> of superuser query process")));
>

Here the message should include superuser, we cannot remove it. Non
super user can log non super user provided if user has permissions for
it.

> 5) How about following
>                      errmsg("must be a member of the role whose backed
> process's backtrace is being printed or member of
> pg_signal_backend")));
> instead of
> +                     errmsg("must be a member of the role whose
> backtrace is being logged or member of pg_signal_backend")));
>

Modified it.

> 6) I'm not sure whether "backtrace" or "call stack" is a generic term
> from the user/developer perspective. In the patch, the function name
> and documentation says callstack(I think it is "call stack" actually),
> but the error/warning messages says backtrace. IMHO, having
> "backtrace" everywhere in the patch, even the function name changed to
> pg_print_backtrace, looks better and consistent. Thoughts?
>

Modified it to pg_print_backtrace.

> 7) How about following in pg_print_callstack?
> {
>     int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
>     bool        result = false;
>
>         if (r == SIGNAL_BACKEND_SUCCESS)
>         {
>             if (EmitProcSignalPrintCallStack(bt_pid))
>                 result = true;
>             else
>                 ereport(WARNING,
>                         (errmsg("failed to send signal to postmaster: %m")));
>         }
>
>         PG_RETURN_BOOL(result);
> }
>
Modified similarly with slight change.

> 8) How about following
> +                (errmsg("backtrace generation is not supported by
> this PostgresSQL installation")));
> instead of
> +                (errmsg("backtrace generation is not supported by
> this installation")));
>

I used the existing message to maintain consistency with
set_backtrace. I feel we can keep it the same.

> 9) Typo - it's "example" +2) Using "addr2line -e postgres address", For exmple:
>

Modified it.

> 10) How about
> + * Handle print backtrace signal
> instead of
> + * Handle receipt of an print backtrace.
>

I used the existing message to maintain consistency similar to
HandleProcSignalBarrierInterrupt. I feel we can keep it the same.

> 11) Isn't below in documentation specific to Linux platform. What
> happens if GDB is not there on the platform?
> +<programlisting>
> +1)  "info line *address" from gdb on postgres executable. For example:
> +gdb ./postgres
> +GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>

I have made changes "You can get the file name and line number by
using gdb/addr2line in linux platforms, as a prerequisite users must
ensure gdb/addr2line is already installed".

User will get an error like this in windows:
select pg_print_backtrace(pg_backend_pid());
WARNING:  backtrace generation is not supported by this installation
 pg_print_callstack
--------------------
 f
(1 row)

The backtrace will not be logged in case of windows, it will throw a
warning "backtrace generation is not supported by this installation"
Thoughts?

> 12) +The callstack will be logged in the log file. What happens if the
> server is started without a log file , ./pg_ctl -D data start? Where
> will the backtrace go?
>

Updated to: The backtrace will be logged to the log file if logging is
enabled, if logging is disabled backtrace will be logged to the
console where the postmaster was started.

> 13) Not sure, if it's an overkill, but how about pg_print_callstack
> returning a warning/notice along with true, which just says, "See
> <<<full log file name along with log directory>>>". Thoughts?

As you rightly pointed out it will be an overkill, I feel the existing
is easily understandable.

Attached v5 patch has the fixes for the same.
Thoughts?

Regards,
Vignesh

v5-0001-Print-backtrace-of-postgres-process-that-are-part.patch (24K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Bharath Rupireddy
On Fri, Jan 29, 2021 at 7:10 PM vignesh C <[hidden email]> wrote:

> > 4) How about following
> > +                     errmsg("must be a superuser to print backtrace
> > of backend process")));
> > instead of
> > +                     errmsg("must be a superuser to print backtrace
> > of superuser query process")));
> >
>
> Here the message should include superuser, we cannot remove it. Non
> super user can log non super user provided if user has permissions for
> it.
>
> > 5) How about following
> >                      errmsg("must be a member of the role whose backed
> > process's backtrace is being printed or member of
> > pg_signal_backend")));
> > instead of
> > +                     errmsg("must be a member of the role whose
> > backtrace is being logged or member of pg_signal_backend")));
> >
>
> Modified it.

Maybe I'm confused here to understand the difference between
SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION macros and
corresponding error messages. Some clarification/use case to know in
which scenarios we hit those error messages might help me. Did we try
to add test cases that show up these error messages for
pg_print_backtrace? If not, can we add?

> Attached v5 patch has the fixes for the same.
> Thoughts?

Thanks. Here are some comments on v5 patch:

1) typo - it's "process" not "porcess" +    a backend porcess. For example:

2) select * from pg_print_backtrace(NULL);
postgres=# select proname, proisstrict from pg_proc where proname =
'pg_print_backtrace';
      proname       | proisstrict
--------------------+-------------
 pg_print_backtrace | t

 See the documentation:
 "proisstrict bool

Function returns null if any call argument is null. In that case the
function won't actually be called at all. Functions that are not
“strict” must be prepared to handle null inputs."
So below PG_ARGISNUL check is not needed, you can remove that, because
pg_print_backtrace will not get called with null input.
int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);

3) Can we just set results = true instead of PG_RETURN_BOOL(true); so
that it will be returned from PG_RETURN_BOOL(result); just for
consistency?
            if (!SendProcSignal(bt_pid, PROCSIG_BACKTRACE_PRINT,
InvalidBackendId))
                PG_RETURN_BOOL(true);
            else
                ereport(WARNING,
                        (errmsg("failed to send signal to postmaster: %m")));
        }

        PG_RETURN_BOOL(result);

4) Below is what happens if I request for a backtrace of the
postmaster process? 1388210 is pid of postmaster.
postgres=# select * from pg_print_backtrace(1388210);
WARNING:  PID 1388210 is not a PostgreSQL server process
 pg_print_backtrace
--------------------
 f

Does it make sense to have a postmaster's backtrace? If yes, can we
have something like below in sigusr1_handler()?
    if (CheckPostmasterSignal(PMSIGNAL_BACKTRACE_EMIT))
    {
        LogBackTrace();
    }

5) Can we have PROCSIG_PRINT_BACKTRACE instead of
PROCSIG_BACKTRACE_PRINT, just for readability and consistency with
function name?

6) I think it's not the postmaster that prints backtrace of the
requested backend and we don't send SIGUSR1 with
PMSIGNAL_BACKTRACE_EMIT to postmaster, I think it's the backends, upon
receiving SIGUSR1 with PMSIGNAL_BACKTRACE_EMIT signal, log their own
backtrace. Am I missing anything here? If I'm correct, we need to
change the below description and other places wherever we refer to
this description.

The idea here is to implement & expose pg_print_backtrace function, internally
what this function does is, the connected backend will send SIGUSR1 signal by
setting PMSIGNAL_BACKTRACE_EMIT to the postmaster process. Once the process
receives this signal it will log the backtrace of the process.

7) Can we get the parallel worker's backtrace? IIUC it's possible.

8) What happens if a user runs pg_print_backtrace() on Windows or
MacOS or some other platform? Do you want to say something about other
platforms where gdb/addr2line doesn't exist?
+</programlisting>
+    You can get the file name and line number by using gdb/addr2line in
+    linux platforms, as a prerequisite users must ensure gdb/addr2line is
+    already installed:
+<programlisting>

9) Can't we reuse set_backtrace with just adding a flag to
set_backtrace(ErrorData *edata, int num_skip, bool
is_print_backtrace_function), making it a non-static function and call
set_backtrace(NULL, 0, true)?

void
set_backtrace(ErrorData *edata, int num_skip, bool is_print_backtrace_function)
{
    StringInfoData errtrace;
-------
-------
    if (is_print_backtrace_function)
        elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
    else
        edata->backtrace = errtrace.data;
}

I think it will be good if we do this, because we can avoid duplicate
code in set_backtrace and LogBackTrace.

10) I think it's "pg_signal_backend" instead of "pg_print_backtrace"?
+        backtrace is being printed or the calling role has been granted
+        <literal>pg_print_backtrace</literal>, however only superusers can

11) In the documentation added, isn't it good if we talk a bit about
in which scenarios users can use this function? For instance,
something like "Use pg_print_backtrace to  know exactly where it's
currently waiting when a backend process hangs."?


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Bharath Rupireddy
On Mon, Feb 1, 2021 at 6:14 AM Bharath Rupireddy
<[hidden email]> wrote:

> On Fri, Jan 29, 2021 at 7:10 PM vignesh C <[hidden email]> wrote:
> > > 4) How about following
> > > +                     errmsg("must be a superuser to print backtrace
> > > of backend process")));
> > > instead of
> > > +                     errmsg("must be a superuser to print backtrace
> > > of superuser query process")));
> > >
> >
> > Here the message should include superuser, we cannot remove it. Non
> > super user can log non super user provided if user has permissions for
> > it.
> >
> > > 5) How about following
> > >                      errmsg("must be a member of the role whose backed
> > > process's backtrace is being printed or member of
> > > pg_signal_backend")));
> > > instead of
> > > +                     errmsg("must be a member of the role whose
> > > backtrace is being logged or member of pg_signal_backend")));
> > >
> >
> > Modified it.
>
> Maybe I'm confused here to understand the difference between
> SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION macros and
> corresponding error messages. Some clarification/use case to know in
> which scenarios we hit those error messages might help me. Did we try
> to add test cases that show up these error messages for
> pg_print_backtrace? If not, can we add?

Are these superuser and permission checks enough from a security
standpoint that we don't expose some sensitive information to the
user? Although I'm not sure, say from the backtrace printed and
attached to GDB, can users see the passwords or other sensitive
information from the system that they aren't supposed to see?

I'm sure this point would have been discussed upthread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Kyotaro Horiguchi-4
In reply to this post by vignesh C
At Fri, 29 Jan 2021 19:10:24 +0530, vignesh C <[hidden email]> wrote in
> Attached v5 patch has the fixes for the same.

PMSIGNAL_BACKTRACE_EMIT is not used anywhere?
(the commit message seems stale.)

+++ b/src/bin/pg_ctl/t/005_backtrace.pl

pg_ctl doesn't (or no longer?) seem relevant to this patch.


+ eval {
+ $current_logfiles = slurp_file($node->data_dir . '/current_logfiles');
+ };
+ last unless $@;

Is there any reason not just to do "while (! -e <filenmae)) { usleep(); }" ?


+logging_collector = on

I don't see a reason for turning on logging collector here.


+gdb ./postgres
+GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
+Copyright (C) 2013 Free Software Foundation, Inc.
+License GPLv3+: GNU GPL version 3 or later <literal>&lt;</literal>http://gnu.org/licenses/gpl.html<literal>&gt;</literal>
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
+and "show warranty" for details.
+This GDB was configured as "x86_64-redhat-linux-gnu".
+For bug reporting instructions, please see:
+<literal>&lt;</literal>http://www.gnu.org/software/gdb/bugs/<literal>&gt;</literal>...
+Reading symbols from /home/postgresdba/inst/bin/postgres...done.

Almost all of the banner lines seem to be useless here.


 #define SIGNAL_BACKEND_SUCCESS 0
+#define BACKEND_VALIDATION_SUCCESS 0
 #define SIGNAL_BACKEND_ERROR 1
 #define SIGNAL_BACKEND_NOPERMISSION 2
 #define SIGNAL_BACKEND_NOSUPERUSER 3

Even though I can share the feeling that SIGNAL_BACKEND_SUCCESS is a
bit odd to represent "sending signal is allowed", I feel that it's
better using the existing symbol than using the new symbol.


+validate_backend_pid(int pid)

The function needs a comment. The name is somewhat
confusing. check_privilege_to_send_singal()?

Maybe the return value of the function should be changed to an enum,
and its callers should use switch-case to handle the value.


+ if (!SendProcSignal(bt_pid, PROCSIG_BACKTRACE_PRINT, InvalidBackendId))
+ PG_RETURN_BOOL(true);
+ else
+ ereport(WARNING,
+ (errmsg("failed to send signal to postmaster: %m")));
+ }
+
+ PG_RETURN_BOOL(result);

The variable "result" seems useless.


+ elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
+
+ errno = save_errno;
+}

You need to release the resouces held by the errtrace. And the
errtrace is a bit pointless. Why isn't it "backtrace"?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

Dilip Kumar-2
In reply to this post by vignesh C
On Fri, Jan 29, 2021 at 7:10 PM vignesh C <[hidden email]> wrote:

>
> Thanks Bharath for your review comments. Please find my comments inline below.
>
> On Thu, Jan 28, 2021 at 7:40 PM Bharath Rupireddy
> <[hidden email]> wrote:
> >
> > On Thu, Jan 28, 2021 at 5:22 PM vignesh C <[hidden email]> wrote:
> > > Thanks for the comments, I have fixed and attached an updated patch
> > > with the fixes for the same.
> > > Thoughts?
> >
> > Thanks for the patch. Here are few comments:
> >
> > 1) I think it's return SIGNAL_BACKEND_SUCCESS; instead of return 0; in
> > check_valid_pid?
> >
>
> I did not want to use SIGNAL_BACKEND_SUCCESS as we have not yet
> signalled the backend process at this time. I have added
> BACKEND_VALIDATION_SUCCESS macro and used it here for better
> readability.
>
> > 2) How about following in pg_signal_backend for more readability
> > +    if (ret != SIGNAL_BACKEND_SUCCESS)
> > +        return ret;
> > instead of
> > +    if (ret)
> > +        return ret;
> >
>
> Modified it to ret != BACKEND_VALIDATION_SUCCESS
>
> > 3) How about validate_backend_pid or some better name instead of
> > check_valid_pid?
> >
>
> Modified it to validate_backend_pid
>
> > 4) How about following
> > +                     errmsg("must be a superuser to print backtrace
> > of backend process")));
> > instead of
> > +                     errmsg("must be a superuser to print backtrace
> > of superuser query process")));
> >
>
> Here the message should include superuser, we cannot remove it. Non
> super user can log non super user provided if user has permissions for
> it.
>
> > 5) How about following
> >                      errmsg("must be a member of the role whose backed
> > process's backtrace is being printed or member of
> > pg_signal_backend")));
> > instead of
> > +                     errmsg("must be a member of the role whose
> > backtrace is being logged or member of pg_signal_backend")));
> >
>
> Modified it.
>
> > 6) I'm not sure whether "backtrace" or "call stack" is a generic term
> > from the user/developer perspective. In the patch, the function name
> > and documentation says callstack(I think it is "call stack" actually),
> > but the error/warning messages says backtrace. IMHO, having
> > "backtrace" everywhere in the patch, even the function name changed to
> > pg_print_backtrace, looks better and consistent. Thoughts?
> >
>
> Modified it to pg_print_backtrace.
>
> > 7) How about following in pg_print_callstack?
> > {
> >     int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
> >     bool        result = false;
> >
> >         if (r == SIGNAL_BACKEND_SUCCESS)
> >         {
> >             if (EmitProcSignalPrintCallStack(bt_pid))
> >                 result = true;
> >             else
> >                 ereport(WARNING,
> >                         (errmsg("failed to send signal to postmaster: %m")));
> >         }
> >
> >         PG_RETURN_BOOL(result);
> > }
> >
>
> Modified similarly with slight change.
>
> > 8) How about following
> > +                (errmsg("backtrace generation is not supported by
> > this PostgresSQL installation")));
> > instead of
> > +                (errmsg("backtrace generation is not supported by
> > this installation")));
> >
>
> I used the existing message to maintain consistency with
> set_backtrace. I feel we can keep it the same.
>
> > 9) Typo - it's "example" +2) Using "addr2line -e postgres address", For exmple:
> >
>
> Modified it.
>
> > 10) How about
> > + * Handle print backtrace signal
> > instead of
> > + * Handle receipt of an print backtrace.
> >
>
> I used the existing message to maintain consistency similar to
> HandleProcSignalBarrierInterrupt. I feel we can keep it the same.
>
> > 11) Isn't below in documentation specific to Linux platform. What
> > happens if GDB is not there on the platform?
> > +<programlisting>
> > +1)  "info line *address" from gdb on postgres executable. For example:
> > +gdb ./postgres
> > +GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
> >
>
> I have made changes "You can get the file name and line number by
> using gdb/addr2line in linux platforms, as a prerequisite users must
> ensure gdb/addr2line is already installed".
>
> User will get an error like this in windows:
> select pg_print_backtrace(pg_backend_pid());
> WARNING:  backtrace generation is not supported by this installation
>  pg_print_callstack
> --------------------
>  f
> (1 row)
>
> The backtrace will not be logged in case of windows, it will throw a
> warning "backtrace generation is not supported by this installation"
> Thoughts?
>
> > 12) +The callstack will be logged in the log file. What happens if the
> > server is started without a log file , ./pg_ctl -D data start? Where
> > will the backtrace go?
> >
>
> Updated to: The backtrace will be logged to the log file if logging is
> enabled, if logging is disabled backtrace will be logged to the
> console where the postmaster was started.
>
> > 13) Not sure, if it's an overkill, but how about pg_print_callstack
> > returning a warning/notice along with true, which just says, "See
> > <<<full log file name along with log directory>>>". Thoughts?
>
> As you rightly pointed out it will be an overkill, I feel the existing
> is easily understandable.
>
> Attached v5 patch has the fixes for the same.
> Thoughts?

1.
+ elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
+
+ errno = save_errno;

Can you add some comments that why we have chosen LOG_SERVER_ONLY?

2.
+pg_print_backtrace(PG_FUNCTION_ARGS)
+{
+ int bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
+

The variable name bt_pid is a bit odd, can we just use pid?

3.
+Datum
+pg_print_backtrace(PG_FUNCTION_ARGS)
+{
+ int bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
+
+#ifdef HAVE_BACKTRACE_SYMBOLS
+ {
+ bool result = false;
...
+
+ PG_RETURN_BOOL(result);
+ }
+#else
+ {
+ ereport(WARNING,
+ (errmsg("backtrace generation is not supported by this installation")));
+ PG_RETURN_BOOL(false);
+ }
+#endif

The result is just initialized to false and it is never updated?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Printing backtrace of postgres processes

vignesh C
In reply to this post by Bharath Rupireddy
Thanks Bharath for your comments.

On Mon, Feb 1, 2021 at 6:14 AM Bharath Rupireddy
<[hidden email]> wrote:

>
> On Fri, Jan 29, 2021 at 7:10 PM vignesh C <[hidden email]> wrote:
> > > 4) How about following
> > > +                     errmsg("must be a superuser to print backtrace
> > > of backend process")));
> > > instead of
> > > +                     errmsg("must be a superuser to print backtrace
> > > of superuser query process")));
> > >
> >
> > Here the message should include superuser, we cannot remove it. Non
> > super user can log non super user provided if user has permissions for
> > it.
> >
> > > 5) How about following
> > >                      errmsg("must be a member of the role whose backed
> > > process's backtrace is being printed or member of
> > > pg_signal_backend")));
> > > instead of
> > > +                     errmsg("must be a member of the role whose
> > > backtrace is being logged or member of pg_signal_backend")));
> > >
> >
> > Modified it.
>
> Maybe I'm confused here to understand the difference between
> SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION macros and
> corresponding error messages. Some clarification/use case to know in
> which scenarios we hit those error messages might help me. Did we try
> to add test cases that show up these error messages for
> pg_print_backtrace? If not, can we add?
I have tested this manually:

I have tested it manually, Here is the test I did:
Create 2 users:
create user test password 'test@123';
create user test1 password 'test@123';

Test1: Test print backtrace of a different user's session:
./psql -d postgres -U test
psql (14devel)
Type "help" for help.
postgres=> select pg_backend_pid();
 pg_backend_pid
----------------
          30142
(1 row)
------------------------------------------
./psql -d postgres -U test1
psql (14devel)
Type "help" for help.
postgres=> select pg_print_backtrace(30142);
ERROR:  must be a member of the role whose backtrace is being logged
or member of pg_signal_backend

The above will be successful after:
grant pg_signal_backend to test1;

Test1: Test for non super user trying to print backtrace of a super
user's session:
./psql -d postgres
psql (14devel)
Type "help" for help.
postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
          30211
(1 row)

./psql -d postgres -U test1
psql (14devel)
Type "help" for help.
postgres=> select pg_print_backtrace(30211);
ERROR:  must be a superuser to print backtrace of superuser process
I have not added any tests for this as we required 2 active sessions
and I did not see any existing framework for this.
This test should help in relating the behavior.

> > Attached v5 patch has the fixes for the same.
> > Thoughts?
>
> Thanks. Here are some comments on v5 patch:
>
> 1) typo - it's "process" not "porcess" +    a backend porcess. For example:
>

Modified.

> 2) select * from pg_print_backtrace(NULL);
> postgres=# select proname, proisstrict from pg_proc where proname =
> 'pg_print_backtrace';
>       proname       | proisstrict
> --------------------+-------------
>  pg_print_backtrace | t
>
>  See the documentation:
>  "proisstrict bool
>
> Function returns null if any call argument is null. In that case the
> function won't actually be called at all. Functions that are not
> “strict” must be prepared to handle null inputs."
> So below PG_ARGISNUL check is not needed, you can remove that, because
> pg_print_backtrace will not get called with null input.
> int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
>
Modified.

> 3) Can we just set results = true instead of PG_RETURN_BOOL(true); so
> that it will be returned from PG_RETURN_BOOL(result); just for
> consistency?
>             if (!SendProcSignal(bt_pid, PROCSIG_BACKTRACE_PRINT,
> InvalidBackendId))
>                 PG_RETURN_BOOL(true);
>             else
>                 ereport(WARNING,
>                         (errmsg("failed to send signal to postmaster: %m")));
>         }
>
>         PG_RETURN_BOOL(result);
>
I felt existing is better as it will reduce one instruction of setting
first and then returning. There are only 2 places from where we
return. I felt we could directly return true or false.

> 4) Below is what happens if I request for a backtrace of the
> postmaster process? 1388210 is pid of postmaster.
> postgres=# select * from pg_print_backtrace(1388210);
> WARNING:  PID 1388210 is not a PostgreSQL server process
>  pg_print_backtrace
> --------------------
>  f
>
> Does it make sense to have a postmaster's backtrace? If yes, can we
> have something like below in sigusr1_handler()?
>     if (CheckPostmasterSignal(PMSIGNAL_BACKTRACE_EMIT))
>     {
>         LogBackTrace();
>     }
>
We had a discussion about this in [1]  earlier and felt including this
is not very useful.

> 5) Can we have PROCSIG_PRINT_BACKTRACE instead of
> PROCSIG_BACKTRACE_PRINT, just for readability and consistency with
> function name?
>

PROCSIG_PRINT_BACKTRACE is better, I have modified it.

> 6) I think it's not the postmaster that prints backtrace of the
> requested backend and we don't send SIGUSR1 with
> PMSIGNAL_BACKTRACE_EMIT to postmaster, I think it's the backends, upon
> receiving SIGUSR1 with PMSIGNAL_BACKTRACE_EMIT signal, log their own
> backtrace. Am I missing anything here? If I'm correct, we need to
> change the below description and other places wherever we refer to
> this description.
>
> The idea here is to implement & expose pg_print_backtrace function, internally
> what this function does is, the connected backend will send SIGUSR1 signal by
> setting PMSIGNAL_BACKTRACE_EMIT to the postmaster process. Once the process
> receives this signal it will log the backtrace of the process.
>
Modified.

> 7) Can we get the parallel worker's backtrace? IIUC it's possible.
>

Yes we can get the parallel worker's backtrace. As this design is
driven based on CHECK_FOR_INTERRUPTS, it will be printed when
CHECK_FOR_INTERRUPTS is called.

> 8) What happens if a user runs pg_print_backtrace() on Windows or
> MacOS or some other platform? Do you want to say something about other
> platforms where gdb/addr2line doesn't exist?
> +</programlisting>
> +    You can get the file name and line number by using gdb/addr2line in
> +    linux platforms, as a prerequisite users must ensure gdb/addr2line is
> +    already installed:
> +<programlisting>
>

I have added the following:
This feature will be available, if the installer was generated on a
platform which had backtrace capturing capability. If not available it
will return false by throwing the following warning "WARNING:
backtrace generation is not supported by this installation"

> 9) Can't we reuse set_backtrace with just adding a flag to
> set_backtrace(ErrorData *edata, int num_skip, bool
> is_print_backtrace_function), making it a non-static function and call
> set_backtrace(NULL, 0, true)?
>
> void
> set_backtrace(ErrorData *edata, int num_skip, bool is_print_backtrace_function)
> {
>     StringInfoData errtrace;
> -------
> -------
>     if (is_print_backtrace_function)
>         elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
>     else
>         edata->backtrace = errtrace.data;
> }
>
> I think it will be good if we do this, because we can avoid duplicate
> code in set_backtrace and LogBackTrace.
>
Yes it is better, I have modified it to use the same function.

> 10) I think it's "pg_signal_backend" instead of "pg_print_backtrace"?
> +        backtrace is being printed or the calling role has been granted
> +        <literal>pg_print_backtrace</literal>, however only superusers can
>

Modified.

> 11) In the documentation added, isn't it good if we talk a bit about
> in which scenarios users can use this function? For instance,
> something like "Use pg_print_backtrace to  know exactly where it's
> currently waiting when a backend process hangs."?
>

Modified to include more details.

[1] - https:://www.postgresql.org/message-id/1778088.1606026308%40sss.pgh.pa.us
Attached v6 patch with the fixes.
Thoughts?

Regards,
Vignesh

v6-0001-Print-backtrace-of-specified-postgres-process.patch (29K) Download Attachment
123