Re: Query on pg_stat_activity table got stuck

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Query on pg_stat_activity table got stuck

Tom Lane-2
neeraj kumar <[hidden email]> writes:
> We are using PG 10.6. We have one cron job that queries pg_stat_activity
> table to find out how many queries are running longer than X minutes and
> generate metrics.

> Query look like this :
> SELECT * FROM pg_stat_activity WHERE state='active'

> After some days, this query get stuck and doesn't finish. We tried to run
> this query manually and same result.

> We looked into pg_locks table and there this query is not blocked on any
> lock :
> https://justpaste.it/48rpe

Interesting.  Can you get a stack trace to show where in the code it's
stuck?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Query on pg_stat_activity table got stuck

Tom Lane-2
neeraj kumar <[hidden email]> writes:
> Took some time to get stack trace as we didn't had root permission.
> Attaching stack trace of two process (out of many) stuck for same query
> below[1][2]

Hmm, the line numbers in your stack traces don't agree with either v10
or HEAD branches for me.  But assuming that you've correctly identified
where it's stuck:

> Seems like call is unable to come out of this loop :
> https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3361-L3400

the only really obvious theory is that some process left its
st_changecount odd, which would more or less have to imply that
something threw an error between pgstat_increment_changecount_before
and pgstat_increment_changecount_after.  There's only one place
where that seems very plausible, namely pgstat_bestart, which is
doing a rather scary amount of stuff in between.  Are you using
either SSL or GSS?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Query on pg_stat_activity table got stuck

Tom Lane-2
neeraj kumar <[hidden email]> writes:
> Yes we use SSL to connect to DB.

Hm.  I'm suspicious that one of the functions that fetch data for
an SSL connection threw an error.  In particular, it doesn't look
to be hard at all to make X509_NAME_to_cstring fall over --- an
encoding conversion failure would do it, even without any stretchy
assumptions about OOM this early in backend start.  Have you got
any SSL certificates floating around with non-ASCII subject name
or issuer name?

> From comment seems like each backend should have its own copy
> of PgBackendStatus, it means st_changecount should be different for each
> process. If st_changecount was corrupted for 1/2 process, how can it impact
> newly created process. So could you please help to understand then how come
> if we run new query via new console (means new process) that also is
> getting stuck.

No, that field is in shared memory.  The easiest way to get it reset
would be to restart the postmaster.  It's possible you could do it
without that, by starting enough sessions so that one of them takes
up the broken pg_stat_activity slot.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Query on pg_stat_activity table got stuck

Tom Lane-2
Jeremy Schneider <[hidden email]> writes:
> Seems to me that at a minimum, this loop shouldn't go on forever. Even
> having an arbitrary, crazy high, hard-coded number of attempts before
> failure (like a million) would be better than spinning on the CPU
> forever - which is what we are seeing.

I don't think it's the readers' fault.  The problem is that the
writer is violating the protocol.  If we put an upper limit on
the number of spin cycles on the reader side, we'll just be creating
a new failure mode when a writer gets swapped out at the wrong moment.

IMO we need to (a) get the failure-prone code out of the critical
section, and then (b) fix the pgstat_increment_changecount macros
so that the critical sections around these shmem changes really are
critical sections (ie bump CritSectionCount).  That way, if somebody
makes the same mistake again, at least there'll be a pretty obvious
failure rather than a lot of stuck readers.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Query on pg_stat_activity table got stuck

Tom Lane-2
neeraj kumar <[hidden email]> writes:
> We got more information about this issue. There is one backend process
> still present into beentry which has changecount as odd value. However this
> process is long gone/terminated. It means when this process was
> killed/terminated its entry was not cleaned from beentry list. There seems
> to be some shutdown hook which cleans beentry if process is
> killed/terminated that somehow was not kicked off ?

No, that's the point: there isn't any such cleanup code.  The design
idea here was that we'd only have short straight-line code segments
with no possible way to throw an error while changecount is odd.
Then somebody abused the heck out of that ...

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Query on pg_stat_activity table got stuck

Tom Lane-2
neeraj kumar <[hidden email]> writes:
> Tom, may be I didn't make my point clear.
> There are two issues :
> 1) Why this value was left as odd

Because a function called by pgstat_bestart threw an error, is what
I'm guessing.

> 2) Why backend entry is still pending in beentry for backend process even
> after it was killed/terminated.
> I am talking about 2nd issue. My understanding is query on pg_stat_activity
> goes via all backend entries via beentry and it finds this wrong/corrupted
> entry. When a process terminates, ideally this backend entry into beentery
> should have also been cleaned. But why this still there? Whose
> responsibility it is to remove entry from beentry when process terminates ?

If you're imagining that something takes an electronic hacksaw to shared
memory and physically removes that array entry, you're mistaken.  It's
still going to be there.  Now, there *is* cleanup code --- this bit of
pgstat_beshutdown_hook is supposed to mark the process's entry as
not-in-use:

        /*
         * Clear my status entry, following the protocol of bumping st_changecount
         * before and after.  We use a volatile pointer here to ensure the
         * compiler doesn't try to get cute.
         */
        pgstat_increment_changecount_before(beentry);

        beentry->st_procpid = 0; /* mark invalid */

        pgstat_increment_changecount_after(beentry);

However, if something had left st_changecount odd before we got to this,
it'd still be odd afterwards.  Sure, st_procpid is now zero, but that
doesn't help readers because they're not supposed to believe st_procpid
is valid unless the changecount is even.

You could maybe argue that this cleanup code should be trying to ensure
that the changecount is left even, but I don't think that's an appropriate
response.  If it's not even when we get here, we already screwed up very
badly, because we were breaking the protocol for (potentially) as long as
the process has been in existence.  Moreover, if you're worried about
corner cases where we did mess that up, what of corner cases where process
exit fails before getting here?  I think the right answer is to bring the
hammer down as soon as we mess up, which is what the critical-section
mechanism is for.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Query on pg_stat_activity table got stuck

Tom Lane-2
neeraj kumar <[hidden email]> writes:
> Also curious why query on pg_stat_activity is considering terminated
> process ?

The short answer to that is that this bug leaves shared memory in
a corrupt state.  It's not really useful to worry about whether
readers should react differently to that --- you could spend endless
amounts of time hypothesizing different kinds of corruption and
endless amounts of code/cycles trying to work around it.  But the
right answer is to fix it on the writing side.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Query on pg_stat_activity table got stuck

Tom Lane-2
neeraj kumar <[hidden email]> writes:
> --> But the right answer is to fix it on the writing side.

> Yes I agree with this part. Even though there is very low probability, a
> process can still be killed in middle when writing. So what is your
> suggestion on how to recover from this automatically?

Here's a draft patch.  This makes more cosmetic changes than are strictly
necessary to fix the bug, but I'm concerned about preventing future
mistakes of the same sort, not just dealing with the immediate bug.

Anyway the idea here is (1) reduce the size of the critical section in
pgstat_bestart where the changecount is odd, so that nothing except
data copying happens there, and (2) adjust the macros for writing
st_changecount so that those sections actually are critical sections.
That means that if something throws an error inside those sections,
it'll turn into a PANIC and database restart.  That shouldn't ever happen,
of course, but if it does then a PANIC is better than a frozen system.
I renamed the macros for writing st_changecount to make it more apparent
that those are now critical-section boundaries.

The out-of-line string fields make the pgstat_bestart code more ticklish
than one could wish, since they have to be treated differently from
in-line fields.  But I'm not sure there's much we can do to avoid that.
It doesn't seem like a good idea to change the layout of the shared-
memory structure, at least not in released branches.

I found two other embarrassing bugs while I was at it.  The stanza
initializing st_backendType was clearly inserted with the aid of a
dartboard, because it was actually before the initial st_changecount bump.
That probably has little if any real-world impact, but it's still an
indication of sloppy patching.  And pgstat_read_current_status had not
been taught to copy the st_gssstatus out-of-line structure to local
storage, so that those values might fail to hold still while a
transaction examines the pg_stat_activity data.  That *is* a live bug.

This patch is against HEAD --- I've not looked at how much adjustment
it'll need for the back branches, but I'm sure there's some.

                        regards, tom lane


diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 64986b1..25a1bd9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2849,66 +2849,79 @@ pgstat_initialize(void)
  * Apart from auxiliary processes, MyBackendId, MyDatabaseId,
  * session userid, and application_name must be set for a
  * backend (hence, this cannot be combined with pgstat_initialize).
+ * Note also that we must be inside a transaction.
  * ----------
  */
 void
 pgstat_bestart(void)
 {
- SockAddr clientaddr;
- volatile PgBackendStatus *beentry;
+ volatile PgBackendStatus *vbeentry = MyBEEntry;
+ PgBackendStatus lbeentry;
+#ifdef USE_SSL
+ PgBackendSSLStatus lsslstatus;
+#endif
+#ifdef ENABLE_GSS
+ PgBackendGSSStatus lgssstatus;
+#endif
 
- /*
- * To minimize the time spent modifying the PgBackendStatus entry, fetch
- * all the needed data first.
- */
+ /* pgstats state must be initialized from pgstat_initialize() */
+ Assert(vbeentry != NULL);
 
  /*
- * We may not have a MyProcPort (eg, if this is the autovacuum process).
- * If so, use all-zeroes client address, which is dealt with specially in
- * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
+ * To minimize the time spent modifying the PgBackendStatus entry, and
+ * avoid risk of errors inside the critical section, we first copy the
+ * shared-memory struct to a local variable, then modify the data in the
+ * local variable, then copy the local variable back to shared memory.
+ * Only the last step has to be inside the critical section.
+ *
+ * Most of the data we copy here is just going to be overwritten, but the
+ * struct's not so large that it's worth the maintenance hassle to copy
+ * only the needful fields.
  */
- if (MyProcPort)
- memcpy(&clientaddr, &MyProcPort->raddr, sizeof(clientaddr));
- else
- MemSet(&clientaddr, 0, sizeof(clientaddr));
+ memcpy(&lbeentry,
+   unvolatize(PgBackendStatus *, vbeentry),
+   sizeof(PgBackendStatus));
+
+ /* These structs can just start from zeroes each time, though */
+#ifdef USE_SSL
+ memset(&lsslstatus, 0, sizeof(lsslstatus));
+#endif
+#ifdef ENABLE_GSS
+ memset(&lgssstatus, 0, sizeof(lgssstatus));
+#endif
 
  /*
- * Initialize my status entry, following the protocol of bumping
- * st_changecount before and after; and make sure it's even afterwards. We
- * use a volatile pointer here to ensure the compiler doesn't try to get
- * cute.
+ * Now fill in all the fields of lbeentry, except strings that are
+ * out-of-line data.
  */
- beentry = MyBEEntry;
-
- /* pgstats state must be initialized from pgstat_initialize() */
- Assert(beentry != NULL);
+ lbeentry.st_procpid = MyProcPid;
 
  if (MyBackendId != InvalidBackendId)
  {
  if (IsAutoVacuumLauncherProcess())
  {
  /* Autovacuum Launcher */
- beentry->st_backendType = B_AUTOVAC_LAUNCHER;
+ lbeentry.st_backendType = B_AUTOVAC_LAUNCHER;
  }
  else if (IsAutoVacuumWorkerProcess())
  {
  /* Autovacuum Worker */
- beentry->st_backendType = B_AUTOVAC_WORKER;
+ lbeentry.st_backendType = B_AUTOVAC_WORKER;
  }
  else if (am_walsender)
  {
  /* Wal sender */
- beentry->st_backendType = B_WAL_SENDER;
+ lbeentry.st_backendType = B_WAL_SENDER;
  }
  else if (IsBackgroundWorker)
  {
  /* bgworker */
- beentry->st_backendType = B_BG_WORKER;
+ lbeentry.st_backendType = B_BG_WORKER;
  }
  else
  {
  /* client-backend */
- beentry->st_backendType = B_BACKEND;
+ lbeentry.st_backendType = B_BACKEND;
  }
  }
  else
@@ -2918,99 +2931,92 @@ pgstat_bestart(void)
  switch (MyAuxProcType)
  {
  case StartupProcess:
- beentry->st_backendType = B_STARTUP;
+ lbeentry.st_backendType = B_STARTUP;
  break;
  case BgWriterProcess:
- beentry->st_backendType = B_BG_WRITER;
+ lbeentry.st_backendType = B_BG_WRITER;
  break;
  case CheckpointerProcess:
- beentry->st_backendType = B_CHECKPOINTER;
+ lbeentry.st_backendType = B_CHECKPOINTER;
  break;
  case WalWriterProcess:
- beentry->st_backendType = B_WAL_WRITER;
+ lbeentry.st_backendType = B_WAL_WRITER;
  break;
  case WalReceiverProcess:
- beentry->st_backendType = B_WAL_RECEIVER;
+ lbeentry.st_backendType = B_WAL_RECEIVER;
  break;
  default:
  elog(FATAL, "unrecognized process type: %d",
  (int) MyAuxProcType);
- proc_exit(1);
  }
  }
 
- do
- {
- pgstat_increment_changecount_before(beentry);
- } while ((beentry->st_changecount & 1) == 0);
-
- beentry->st_procpid = MyProcPid;
- beentry->st_proc_start_timestamp = MyStartTimestamp;
- beentry->st_activity_start_timestamp = 0;
- beentry->st_state_start_timestamp = 0;
- beentry->st_xact_start_timestamp = 0;
- beentry->st_databaseid = MyDatabaseId;
+ lbeentry.st_proc_start_timestamp = MyStartTimestamp;
+ lbeentry.st_activity_start_timestamp = 0;
+ lbeentry.st_state_start_timestamp = 0;
+ lbeentry.st_xact_start_timestamp = 0;
+ lbeentry.st_databaseid = MyDatabaseId;
 
  /* We have userid for client-backends, wal-sender and bgworker processes */
- if (beentry->st_backendType == B_BACKEND
- || beentry->st_backendType == B_WAL_SENDER
- || beentry->st_backendType == B_BG_WORKER)
- beentry->st_userid = GetSessionUserId();
+ if (lbeentry.st_backendType == B_BACKEND
+ || lbeentry.st_backendType == B_WAL_SENDER
+ || lbeentry.st_backendType == B_BG_WORKER)
+ lbeentry.st_userid = GetSessionUserId();
  else
- beentry->st_userid = InvalidOid;
+ lbeentry.st_userid = InvalidOid;
 
- beentry->st_clientaddr = clientaddr;
- if (MyProcPort && MyProcPort->remote_hostname)
- strlcpy(beentry->st_clienthostname, MyProcPort->remote_hostname,
- NAMEDATALEN);
+ /*
+ * We may not have a MyProcPort (eg, if this is the autovacuum process).
+ * If so, use all-zeroes client address, which is dealt with specially in
+ * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
+ */
+ if (MyProcPort)
+ memcpy(&lbeentry.st_clientaddr, &MyProcPort->raddr,
+   sizeof(lbeentry.st_clientaddr));
  else
- beentry->st_clienthostname[0] = '\0';
+ MemSet(&lbeentry.st_clientaddr, 0, sizeof(lbeentry.st_clientaddr));
+
 #ifdef USE_SSL
  if (MyProcPort && MyProcPort->ssl != NULL)
  {
- beentry->st_ssl = true;
- beentry->st_sslstatus->ssl_bits = be_tls_get_cipher_bits(MyProcPort);
- beentry->st_sslstatus->ssl_compression = be_tls_get_compression(MyProcPort);
- strlcpy(beentry->st_sslstatus->ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
- strlcpy(beentry->st_sslstatus->ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
- be_tls_get_peer_subject_name(MyProcPort, beentry->st_sslstatus->ssl_client_dn, NAMEDATALEN);
- be_tls_get_peer_serial(MyProcPort, beentry->st_sslstatus->ssl_client_serial, NAMEDATALEN);
- be_tls_get_peer_issuer_name(MyProcPort, beentry->st_sslstatus->ssl_issuer_dn, NAMEDATALEN);
+ lbeentry.st_ssl = true;
+ lsslstatus.ssl_bits = be_tls_get_cipher_bits(MyProcPort);
+ lsslstatus.ssl_compression = be_tls_get_compression(MyProcPort);
+ strlcpy(lsslstatus.ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
+ strlcpy(lsslstatus.ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
+ be_tls_get_peer_subject_name(MyProcPort, lsslstatus.ssl_client_dn, NAMEDATALEN);
+ be_tls_get_peer_serial(MyProcPort, lsslstatus.ssl_client_serial, NAMEDATALEN);
+ be_tls_get_peer_issuer_name(MyProcPort, lsslstatus.ssl_issuer_dn, NAMEDATALEN);
  }
  else
  {
- beentry->st_ssl = false;
+ lbeentry.st_ssl = false;
  }
 #else
- beentry->st_ssl = false;
+ lbeentry.st_ssl = false;
 #endif
 
 #ifdef ENABLE_GSS
  if (MyProcPort && MyProcPort->gss != NULL)
  {
- beentry->st_gss = true;
- beentry->st_gssstatus->gss_auth = be_gssapi_get_auth(MyProcPort);
- beentry->st_gssstatus->gss_enc = be_gssapi_get_enc(MyProcPort);
+ lbeentry.st_gss = true;
+ lgssstatus.gss_auth = be_gssapi_get_auth(MyProcPort);
+ lgssstatus.gss_enc = be_gssapi_get_enc(MyProcPort);
 
- if (beentry->st_gssstatus->gss_auth)
- strlcpy(beentry->st_gssstatus->gss_princ, be_gssapi_get_princ(MyProcPort), NAMEDATALEN);
+ if (lgssstatus.gss_auth)
+ strlcpy(lgssstatus.gss_princ, be_gssapi_get_princ(MyProcPort), NAMEDATALEN);
  }
  else
  {
- beentry->st_gss = false;
+ lbeentry.st_gss = false;
  }
 #else
- beentry->st_gss = false;
+ lbeentry.st_gss = false;
 #endif
- beentry->st_state = STATE_UNDEFINED;
- beentry->st_appname[0] = '\0';
- beentry->st_activity_raw[0] = '\0';
- /* Also make sure the last byte in each string area is always 0 */
- beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
- beentry->st_appname[NAMEDATALEN - 1] = '\0';
- beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
- beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
- beentry->st_progress_command_target = InvalidOid;
+
+ lbeentry.st_state = STATE_UNDEFINED;
+ lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
+ lbeentry.st_progress_command_target = InvalidOid;
 
  /*
  * we don't zero st_progress_param here to save cycles; nobody should
@@ -3018,7 +3024,45 @@ pgstat_bestart(void)
  * than PROGRESS_COMMAND_INVALID
  */
 
- pgstat_increment_changecount_after(beentry);
+ /*
+ * Finally, we can enter the critical section that fills the shared-memory
+ * status entry.  We follow the protocol of bumping st_changecount before
+ * and after; and make sure it's even afterwards.  We use a volatile
+ * pointer here to ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(vbeentry);
+
+ /* make sure we'll memcpy the same st_changecount back */
+ lbeentry.st_changecount = vbeentry->st_changecount;
+
+ memcpy(unvolatize(PgBackendStatus *, vbeentry),
+   &lbeentry,
+   sizeof(PgBackendStatus));
+
+ /*
+ * We can write the out-of-line strings and structs using the pointers
+ * that are in lbeentry; this saves some de-volatilizing messiness.
+ */
+ lbeentry.st_appname[0] = '\0';
+ if (MyProcPort && MyProcPort->remote_hostname)
+ strlcpy(lbeentry.st_clienthostname, MyProcPort->remote_hostname,
+ NAMEDATALEN);
+ else
+ lbeentry.st_clienthostname[0] = '\0';
+ lbeentry.st_activity_raw[0] = '\0';
+ /* Also make sure the last byte in each string area is always 0 */
+ lbeentry.st_appname[NAMEDATALEN - 1] = '\0';
+ lbeentry.st_clienthostname[NAMEDATALEN - 1] = '\0';
+ lbeentry.st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
+
+#ifdef USE_SSL
+ memcpy(lbeentry.st_sslstatus, &lsslstatus, sizeof(PgBackendSSLStatus));
+#endif
+#ifdef ENABLE_GSS
+ memcpy(lbeentry.st_gssstatus, &lgssstatus, sizeof(PgBackendGSSStatus));
+#endif
+
+ PGSTAT_END_WRITE_ACTIVITY(vbeentry);
 
  /* Update app name to current GUC setting */
  if (application_name)
@@ -3053,11 +3097,11 @@ pgstat_beshutdown_hook(int code, Datum arg)
  * before and after.  We use a volatile pointer here to ensure the
  * compiler doesn't try to get cute.
  */
- pgstat_increment_changecount_before(beentry);
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 
  beentry->st_procpid = 0; /* mark invalid */
 
- pgstat_increment_changecount_after(beentry);
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
 
@@ -3096,7 +3140,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
  * non-disabled state.  As our final update, change the state and
  * clear fields we will not be updating anymore.
  */
- pgstat_increment_changecount_before(beentry);
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
  beentry->st_state = STATE_DISABLED;
  beentry->st_state_start_timestamp = 0;
  beentry->st_activity_raw[0] = '\0';
@@ -3104,14 +3148,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
  /* st_xact_start_timestamp and wait_event_info are also disabled */
  beentry->st_xact_start_timestamp = 0;
  proc->wait_event_info = 0;
- pgstat_increment_changecount_after(beentry);
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
  }
  return;
  }
 
  /*
- * To minimize the time spent modifying the entry, fetch all the needed
- * data first.
+ * To minimize the time spent modifying the entry, and avoid risk of
+ * errors inside the critical section, fetch all the needed data first.
  */
  start_timestamp = GetCurrentStatementStartTimestamp();
  if (cmd_str != NULL)
@@ -3128,7 +3172,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
  /*
  * Now update the status entry
  */
- pgstat_increment_changecount_before(beentry);
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 
  beentry->st_state = state;
  beentry->st_state_start_timestamp = current_timestamp;
@@ -3140,7 +3184,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
  beentry->st_activity_start_timestamp = start_timestamp;
  }
 
- pgstat_increment_changecount_after(beentry);
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
 /*-----------
@@ -3158,11 +3202,11 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
  if (!beentry || !pgstat_track_activities)
  return;
 
- pgstat_increment_changecount_before(beentry);
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
  beentry->st_progress_command = cmdtype;
  beentry->st_progress_command_target = relid;
  MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
- pgstat_increment_changecount_after(beentry);
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
 /*-----------
@@ -3181,9 +3225,9 @@ pgstat_progress_update_param(int index, int64 val)
  if (!beentry || !pgstat_track_activities)
  return;
 
- pgstat_increment_changecount_before(beentry);
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
  beentry->st_progress_param[index] = val;
- pgstat_increment_changecount_after(beentry);
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
 /*-----------
@@ -3203,7 +3247,7 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
  if (!beentry || !pgstat_track_activities || nparam == 0)
  return;
 
- pgstat_increment_changecount_before(beentry);
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 
  for (i = 0; i < nparam; ++i)
  {
@@ -3212,7 +3256,7 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
  beentry->st_progress_param[index[i]] = val[i];
  }
 
- pgstat_increment_changecount_after(beentry);
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
 /*-----------
@@ -3233,10 +3277,10 @@ pgstat_progress_end_command(void)
  && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
  return;
 
- pgstat_increment_changecount_before(beentry);
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
  beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
  beentry->st_progress_command_target = InvalidOid;
- pgstat_increment_changecount_after(beentry);
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
 /* ----------
@@ -3262,12 +3306,12 @@ pgstat_report_appname(const char *appname)
  * st_changecount before and after.  We use a volatile pointer here to
  * ensure the compiler doesn't try to get cute.
  */
- pgstat_increment_changecount_before(beentry);
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 
  memcpy((char *) beentry->st_appname, appname, len);
  beentry->st_appname[len] = '\0';
 
- pgstat_increment_changecount_after(beentry);
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
 /*
@@ -3287,9 +3331,11 @@ pgstat_report_xact_timestamp(TimestampTz tstamp)
  * st_changecount before and after.  We use a volatile pointer here to
  * ensure the compiler doesn't try to get cute.
  */
- pgstat_increment_changecount_before(beentry);
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+
  beentry->st_xact_start_timestamp = tstamp;
- pgstat_increment_changecount_after(beentry);
+
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
 /* ----------
@@ -3311,6 +3357,9 @@ pgstat_read_current_status(void)
 #ifdef USE_SSL
  PgBackendSSLStatus *localsslstatus;
 #endif
+#ifdef ENABLE_GSS
+ PgBackendGSSStatus *localgssstatus;
+#endif
  int i;
 
  Assert(!pgStatRunningInCollector);
@@ -3344,6 +3393,11 @@ pgstat_read_current_status(void)
  MemoryContextAlloc(pgStatLocalContext,
    sizeof(PgBackendSSLStatus) * NumBackendStatSlots);
 #endif
+#ifdef ENABLE_GSS
+ localgssstatus = (PgBackendGSSStatus *)
+ MemoryContextAlloc(pgStatLocalContext,
+   sizeof(PgBackendGSSStatus) * NumBackendStatSlots);
+#endif
 
  localNumBackends = 0;
 
@@ -3363,14 +3417,19 @@ pgstat_read_current_status(void)
  int before_changecount;
  int after_changecount;
 
- pgstat_save_changecount_before(beentry, before_changecount);
+ pgstat_begin_read_activity(beentry, before_changecount);
 
  localentry->backendStatus.st_procpid = beentry->st_procpid;
+ /* Skip all the data-copying work if entry is not in use */
  if (localentry->backendStatus.st_procpid > 0)
  {
  memcpy(&localentry->backendStatus, unvolatize(PgBackendStatus *, beentry), sizeof(PgBackendStatus));
 
  /*
+ * For each PgBackendStatus field that is a pointer, copy the
+ * pointed-to data, then adjust the local copy of the pointer
+ * field to point at the local copy of the data.
+ *
  * strcpy is safe even if the string is modified concurrently,
  * because there's always a \0 at the end of the buffer.
  */
@@ -3380,7 +3439,6 @@ pgstat_read_current_status(void)
  localentry->backendStatus.st_clienthostname = localclienthostname;
  strcpy(localactivity, (char *) beentry->st_activity_raw);
  localentry->backendStatus.st_activity_raw = localactivity;
- localentry->backendStatus.st_ssl = beentry->st_ssl;
 #ifdef USE_SSL
  if (beentry->st_ssl)
  {
@@ -3388,11 +3446,19 @@ pgstat_read_current_status(void)
  localentry->backendStatus.st_sslstatus = localsslstatus;
  }
 #endif
+#ifdef ENABLE_GSS
+ if (beentry->st_gss)
+ {
+ memcpy(localgssstatus, beentry->st_gssstatus, sizeof(PgBackendGSSStatus));
+ localentry->backendStatus.st_gssstatus = localgssstatus;
+ }
+#endif
  }
 
- pgstat_save_changecount_after(beentry, after_changecount);
- if (before_changecount == after_changecount &&
- (before_changecount & 1) == 0)
+ pgstat_end_read_activity(beentry, after_changecount);
+
+ if (pgstat_read_activity_complete(before_changecount,
+  after_changecount))
  break;
 
  /* Make sure we can break out of loop if stuck... */
@@ -3414,6 +3480,9 @@ pgstat_read_current_status(void)
 #ifdef USE_SSL
  localsslstatus++;
 #endif
+#ifdef ENABLE_GSS
+ localgssstatus++;
+#endif
  localNumBackends++;
  }
  }
@@ -4090,14 +4159,14 @@ pgstat_get_backend_current_activity(int pid, bool checkUser)
  int before_changecount;
  int after_changecount;
 
- pgstat_save_changecount_before(vbeentry, before_changecount);
+ pgstat_begin_read_activity(vbeentry, before_changecount);
 
  found = (vbeentry->st_procpid == pid);
 
- pgstat_save_changecount_after(vbeentry, after_changecount);
+ pgstat_end_read_activity(vbeentry, after_changecount);
 
- if (before_changecount == after_changecount &&
- (before_changecount & 1) == 0)
+ if (pgstat_read_activity_complete(before_changecount,
+  after_changecount))
  break;
 
  /* Make sure we can break out of loop if stuck... */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9fbc492..9299871 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1035,11 +1035,12 @@ typedef struct PgBackendStatus
  * the copy is valid; otherwise start over.  This makes updates cheap
  * while reads are potentially expensive, but that's the tradeoff we want.
  *
- * The above protocol needs the memory barriers to ensure that the
- * apparent order of execution is as it desires. Otherwise, for example,
- * the CPU might rearrange the code so that st_changecount is incremented
- * twice before the modification on a machine with weak memory ordering.
- * This surprising result can lead to bugs.
+ * The above protocol needs memory barriers to ensure that the apparent
+ * order of execution is as it desires.  Otherwise, for example, the CPU
+ * might rearrange the code so that st_changecount is incremented twice
+ * before the modification on a machine with weak memory ordering.  Hence,
+ * use the macros defined below for manipulating st_changecount, rather
+ * than touching it directly.
  */
  int st_changecount;
 
@@ -1099,42 +1100,66 @@ typedef struct PgBackendStatus
 } PgBackendStatus;
 
 /*
- * Macros to load and store st_changecount with the memory barriers.
+ * Macros to load and store st_changecount with appropriate memory barriers.
  *
- * pgstat_increment_changecount_before() and
- * pgstat_increment_changecount_after() need to be called before and after
- * PgBackendStatus entries are modified, respectively. This makes sure that
- * st_changecount is incremented around the modification.
+ * Use PGSTAT_BEGIN_WRITE_ACTIVITY() before, and PGSTAT_END_WRITE_ACTIVITY()
+ * after, modifying the current process's PgBackendStatus data.  Note that,
+ * since there is no mechanism for cleaning up st_changecount after an error,
+ * THESE MACROS FORM A CRITICAL SECTION.  Any error between them will be
+ * promoted to PANIC, causing a database restart to clean up shared memory!
+ * Hence, keep the critical section as short and straight-line as possible.
+ * Aside from being safer, that minimizes the window in which readers will
+ * have to loop.
  *
- * Also pgstat_save_changecount_before() and pgstat_save_changecount_after()
- * need to be called before and after PgBackendStatus entries are copied into
- * private memory, respectively.
+ * Reader logic should follow this sketch:
+ *
+ * for (;;)
+ * {
+ * int before_ct, after_ct;
+ *
+ * pgstat_begin_read_activity(beentry, before_ct);
+ * ... copy beentry data to local memory ...
+ * pgstat_end_read_activity(beentry, after_ct);
+ * if (pgstat_read_activity_complete(before_ct, after_ct))
+ * break;
+ * CHECK_FOR_INTERRUPTS();
+ * }
+ *
+ * For extra safety, we generally use volatile beentry pointers, although
+ * the memory barriers should theoretically be sufficient.
  */
-#define pgstat_increment_changecount_before(beentry) \
- do { \
- beentry->st_changecount++; \
+#define PGSTAT_BEGIN_WRITE_ACTIVITY(beentry) \
+ do { \
+ START_CRIT_SECTION(); \
+ (beentry)->st_changecount++; \
  pg_write_barrier(); \
  } while (0)
 
-#define pgstat_increment_changecount_after(beentry) \
- do { \
+#define PGSTAT_END_WRITE_ACTIVITY(beentry) \
+ do { \
  pg_write_barrier(); \
- beentry->st_changecount++; \
- Assert((beentry->st_changecount & 1) == 0); \
+ (beentry)->st_changecount++; \
+ Assert(((beentry)->st_changecount & 1) == 0); \
+ END_CRIT_SECTION(); \
  } while (0)
 
-#define pgstat_save_changecount_before(beentry, save_changecount) \
- do { \
- save_changecount = beentry->st_changecount; \
- pg_read_barrier(); \
+#define pgstat_begin_read_activity(beentry, before_changecount) \
+ do { \
+ (before_changecount) = (beentry)->st_changecount; \
+ pg_read_barrier(); \
  } while (0)
 
-#define pgstat_save_changecount_after(beentry, save_changecount) \
- do { \
- pg_read_barrier(); \
- save_changecount = beentry->st_changecount; \
+#define pgstat_end_read_activity(beentry, after_changecount) \
+ do { \
+ pg_read_barrier(); \
+ (after_changecount) = (beentry)->st_changecount; \
  } while (0)
 
+#define pgstat_read_activity_complete(before_changecount, after_changecount) \
+ ((before_changecount) == (after_changecount) && \
+ ((before_changecount) & 1) == 0)
+
+
 /* ----------
  * LocalPgBackendStatus
  *
Reply | Threaded
Open this post in threaded view
|

Re: Query on pg_stat_activity table got stuck

Tom Lane-2
I wrote:
> This patch is against HEAD --- I've not looked at how much adjustment
> it'll need for the back branches, but I'm sure there's some.

I've back-patched this now.  If you want to test the patch, the v10-branch
version is at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c3d113136bbaa86fbf1edde7aaf70ba06a6166b7

                        regards, tom lane