shared-memory based stats collector


Re: shared-memory based stats collector

Tomas Vondra-4
On 11/27/18 9:59 AM, Kyotaro HORIGUCHI wrote:

> ...
>
> v10-0001-sequential-scan-for-dshash.patch
> v10-0002-Add-conditional-lock-feature-to-dshash.patch
>   fixed.
> v10-0003-Make-archiver-process-an-auxiliary-process.patch
>   fixed.
> v10-0004-Shared-memory-based-stats-collector.patch
>   updated not to touch guc.
> v10-0005-Remove-the-GUC-stats_temp_directory.patch
>   collected all guc-related changes.
>   updated not to break other programs.
> v10-0006-Split-out-backend-status-monitor-part-from-pgstat.patch
>   basebackup.c requires both bestats.h and pgstat.h
> v10-0007-Documentation-update.patch
>   small change related to 0005.
>

I need to do a more thorough review of part 0006, but these patches
seem quite fine to me. I'd however merge 0007 into the other relevant
parts (it seems like a mix of docs changes for 0004, 0005 and 0006).

Thinking about it a bit more, I'm wondering if we need to keep 0004 and
0005 separate. My understanding is that the stats_temp_directory is used
only from the stats collector, so it probably does not make much sense
to keep it after 0004. We may also keep it separate and then commit both
0004 and 0005 together, of course. What do you think?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: shared-memory based stats collector

Alvaro Herrera-9
On 2018-Nov-28, Tomas Vondra wrote:

> > v10-0004-Shared-memory-based-stats-collector.patch
> >   updated not to touch guc.
> > v10-0005-Remove-the-GUC-stats_temp_directory.patch
> >   collected all guc-related changes.
> >   updated not to break other programs.
> > v10-0006-Split-out-backend-status-monitor-part-from-pgstat.patch
> >   basebackup.c requires both bestats.h and pgstat.h
> > v10-0007-Documentation-update.patch
> >   small change related to 0005.
>
> I need to do a more thorough review of part 0006, but these patches
> seems quite fine to me. I'd however merge 0007 into the other relevant
> parts (it seems like a mix of docs changes for 0004, 0005 and 0006).

Looking at 0001 - 0003 it seems OK to keep each as separate commits, but
I suggest to have 0004+0006 be a single commit, mostly because
introducing a bunch of "new" code in 0004 and then moving it over to
bestatus.c in 0006 makes "git blame" doubly painful.  And I think
committing 0005 and not 0007 makes the documentation temporarily buggy,
so I see no reason to think of this as two commits, one being 0004+0006
and the other 0005+0007.  And even those could conceivably be pushed
together instead of as a single patch.  (But be sure to push very early
in your work day, to have plenty of time to deal with any resulting
buildfarm problems.)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: shared-memory based stats collector

Tomas Vondra-4
On 11/29/18 1:18 PM, Alvaro Herrera wrote:

> On 2018-Nov-28, Tomas Vondra wrote:
>
>>> v10-0004-Shared-memory-based-stats-collector.patch
>>>   updated not to touch guc.
>>> v10-0005-Remove-the-GUC-stats_temp_directory.patch
>>>   collected all guc-related changes.
>>>   updated not to break other programs.
>>> v10-0006-Split-out-backend-status-monitor-part-from-pgstat.patch
>>>   basebackup.c requires both bestats.h and pgstat.h
>>> v10-0007-Documentation-update.patch
>>>   small change related to 0005.
>>
>> I need to do a more thorough review of part 0006, but these patches
>> seems quite fine to me. I'd however merge 0007 into the other relevant
>> parts (it seems like a mix of docs changes for 0004, 0005 and 0006).
>
> Looking at 0001 - 0003 it seems OK to keep each as separate commits, but
> I suggest to have 0004+0006 be a single commit, mostly because
> introducing a bunch of "new" code in 0004 and then moving it over to
> bestatus.c in 0006 makes "git blame" doubly painful.  And I think
> committing 0005 and not 0007 makes the documentation temporarily buggy,
> so I see no reason to think of this as two commits, one being 0004+0006
> and the other 0005+0007.  And even those could conceivably be pushed
> together instead of as a single patch.  (But be sure to push very early
> in your work day, to have plenty of time to deal with any resulting
> buildfarm problems.)
>
Kyotaro-san, do you agree with committing the patch the way Alvaro
proposed? That is, 0001-0003 as separate commits, and 0004+0006 and
0005+0007 together. The plan seems reasonable to me.

FWIW I see cputube reports some build failures on Windows:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.26736#L3135

If I understand it correctly, it complains about this line in postmaster.c:

extern pgsocket pgStatSock;

which seems to only affect EXEC_BACKEND (including Win32). ISTM we
should get rid of all pgStatSock references, per the attached fix.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment: pgstatsock-fix.patch (1K)

Re: shared-memory based stats collector

Andres Freund
Hi,

On 2019-01-01 18:39:12 +0100, Tomas Vondra wrote:

> On 11/29/18 1:18 PM, Alvaro Herrera wrote:
> > On 2018-Nov-28, Tomas Vondra wrote:
> >
> >>> v10-0004-Shared-memory-based-stats-collector.patch
> >>>   updated not to touch guc.
> >>> v10-0005-Remove-the-GUC-stats_temp_directory.patch
> >>>   collected all guc-related changes.
> >>>   updated not to break other programs.
> >>> v10-0006-Split-out-backend-status-monitor-part-from-pgstat.patch
> >>>   basebackup.c requires both bestats.h and pgstat.h
> >>> v10-0007-Documentation-update.patch
> >>>   small change related to 0005.
> >>
> >> I need to do a more thorough review of part 0006, but these patches
> >> seems quite fine to me. I'd however merge 0007 into the other relevant
> >> parts (it seems like a mix of docs changes for 0004, 0005 and 0006).
> >
> > Looking at 0001 - 0003 it seems OK to keep each as separate commits, but
> > I suggest to have 0004+0006 be a single commit, mostly because
> > introducing a bunch of "new" code in 0004 and then moving it over to
> > bestatus.c in 0006 makes "git blame" doubly painful.  And I think
> > committing 0005 and not 0007 makes the documentation temporarily buggy,
> > so I see no reason to think of this as two commits, one being 0004+0006
> > and the other 0005+0007.  And even those could conceivably be pushed
> > together instead of as a single patch.  (But be sure to push very early
> > in your work day, to have plenty of time to deal with any resulting
> > buildfarm problems.)
> >
>
> Kyotaro-san, do you agree with committing the patch the way Alvaro
> proposed? That is, 0001-0003 as separate commits, and 0004+0006 and
> 0005+0007 together. The plan seems reasonable to me.

Do you guys think these patches are ready already? I'm a bit doubtful, and
failures here could have quite wide-ranging symptoms.

Greetings,

Andres Freund


Re: shared-memory based stats collector

Tomas Vondra-4


On 1/1/19 7:03 PM, Andres Freund wrote:

> Hi,
>
> On 2019-01-01 18:39:12 +0100, Tomas Vondra wrote:
>> On 11/29/18 1:18 PM, Alvaro Herrera wrote:
>>> On 2018-Nov-28, Tomas Vondra wrote:
>>>
>>>>> v10-0004-Shared-memory-based-stats-collector.patch
>>>>>   updated not to touch guc.
>>>>> v10-0005-Remove-the-GUC-stats_temp_directory.patch
>>>>>   collected all guc-related changes.
>>>>>   updated not to break other programs.
>>>>> v10-0006-Split-out-backend-status-monitor-part-from-pgstat.patch
>>>>>   basebackup.c requires both bestats.h and pgstat.h
>>>>> v10-0007-Documentation-update.patch
>>>>>   small change related to 0005.
>>>>
>>>> I need to do a more thorough review of part 0006, but these patches
>>>> seems quite fine to me. I'd however merge 0007 into the other relevant
>>>> parts (it seems like a mix of docs changes for 0004, 0005 and 0006).
>>>
>>> Looking at 0001 - 0003 it seems OK to keep each as separate commits, but
>>> I suggest to have 0004+0006 be a single commit, mostly because
>>> introducing a bunch of "new" code in 0004 and then moving it over to
>>> bestatus.c in 0006 makes "git blame" doubly painful.  And I think
>>> committing 0005 and not 0007 makes the documentation temporarily buggy,
>>> so I see no reason to think of this as two commits, one being 0004+0006
>>> and the other 0005+0007.  And even those could conceivably be pushed
>>> together instead of as a single patch.  (But be sure to push very early
>>> in your work day, to have plenty of time to deal with any resulting
>>> buildfarm problems.)
>>>
>>
>> Kyotaro-san, do you agree with committing the patch the way Alvaro
>> proposed? That is, 0001-0003 as separate commits, and 0004+0006 and
>> 0005+0007 together. The plan seems reasonable to me.
>
> Do you guys think these patches are ready already? I'm a bit doubtful, and
> failures here could have quite wide-ranging symptoms.
>

I agree it's a sensitive part of the code, so additional reviews would
be welcome of course. I've done as much review and testing as possible,
and overall it seems in a fairly good shape. Do you have any particular
concerns / ideas what to look for?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: shared-memory based stats collector

Alvaro Herrera-9
On 2019-Jan-01, Tomas Vondra wrote:

> I agree it's a sensitive part of the code, so additional reviews would
> be welcome of course. I've done as much review and testing as possible,
> and overall it seems in a fairly good shape. Do you have any particular
> concerns / ideas what to look for?

I haven't reviewed this patch thoroughly.

Shall we do a triage run over the complete commitfest to determine the
highest priority items that we should put extra effort into reviewing?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: shared-memory based stats collector

Tomas Vondra-4
In reply to this post by Tomas Vondra-4

Hi,

The patch needs rebasing, as it got broken by 285d8e1205, and there's
some other minor bitrot.

On 11/27/18 4:40 PM, Tomas Vondra wrote:

> On 11/27/18 9:59 AM, Kyotaro HORIGUCHI wrote:
>>>
>>> ...
>>> For the main workload there's pretty much no difference, but for
>>> selects from the stats catalogs there's ~20% drop in throughput.
>>> In absolute numbers this means drop from ~670tps to ~550tps. I
>>> haven't investigated this, but I suppose this is due to dshash
>>> seqscan being more expensive than reading the data from file.
>>
>> Thanks for finding that. The three seqscan loops in
>> pgstat_vacuum_stat cannot take such a long time, I think. I'll
>> investigate it.
>>
>
> OK. I'm not sure this is related to pgstat_vacuum_stat - the
> slowdown happens while querying the catalogs, so why would that
> trigger vacuum of the stats? I may be missing something, of course.
>
> FWIW, the "query statistics" test simply does this:
>
>   SELECT * FROM pg_stat_all_tables;
>   SELECT * FROM pg_stat_all_indexes;
>   SELECT * FROM pg_stat_user_indexes;
>   SELECT * FROM pg_stat_user_tables;
>   SELECT * FROM pg_stat_sys_tables;
>   SELECT * FROM pg_stat_sys_indexes;
>
> and the slowdown happened even it was running on it's own (nothing
> else running on the instance). Which mostly rules out concurrency
> issues with the hash table locking etc.
>

Did you have time to investigate the slowdown?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: shared-memory based stats collector

Kyotaro HORIGUCHI-2
Thank you very much for reviewing this, and sorry for the absence.

At Sun, 20 Jan 2019 18:13:04 +0100, Tomas Vondra <[hidden email]> wrote in <[hidden email]>
>
> Hi,
>
> The patch needs rebasing, as it got broken by 285d8e1205, and there's
> some other minor bitrot.


The most affected part was 0006 because of the file split, but
actually only the following four (actually three) commits
affected it.

42e2a58071 Fix typos in documentation and for one wait event
97c39498e5 Update copyright for 2019
578b229718 Remove WITH OIDS support, change oid catalog column visibility.
(125f551c8b Leave SIGTTIN/SIGTTOU signal handling alone in postmaster child processes.)

The last one is not relevant because the stats collector is no longer
a separate process.

This version contains a fix for the EXEC_BACKEND-related bug pointed out in

https://www.postgresql.org/message-id/854d6d91-f2f3-e391-f0fc-064db51b391e@...

> On 11/27/18 4:40 PM, Tomas Vondra wrote:
> > On 11/27/18 9:59 AM, Kyotaro HORIGUCHI wrote:
> >>>
> >>> ...
> >>> For the main workload there's pretty much no difference, but for
> >>> selects from the stats catalogs there's ~20% drop in throughput.
> >>> In absolute numbers this means drop from ~670tps to ~550tps. I
> >>> haven't investigated this, but I suppose this is due to dshash
> >>> seqscan being more expensive than reading the data from file.
> >>
> >> Thanks for finding that. The three seqscan loops in
> >> pgstat_vacuum_stat cannot take such a long time, I think. I'll
> >> investigate it.
> >>
> >
> > OK. I'm not sure this is related to pgstat_vacuum_stat - the
> > slowdown happens while querying the catalogs, so why would that
> > trigger vacuum of the stats? I may be missing something, of course.
> >
> > FWIW, the "query statistics" test simply does this:
> >
> >   SELECT * FROM pg_stat_all_tables;
> >   SELECT * FROM pg_stat_all_indexes;
> >   SELECT * FROM pg_stat_user_indexes;
> >   SELECT * FROM pg_stat_user_tables;
> >   SELECT * FROM pg_stat_sys_tables;
> >   SELECT * FROM pg_stat_sys_indexes;
> >
> > and the slowdown happened even it was running on it's own (nothing
> > else running on the instance). Which mostly rules out concurrency
> > issues with the hash table locking etc.
> >
>
> Did you have time to investigate the slowdown?
It seems to me that the slowdown comes from the local caching in
snapshot_statentry, in several ways.

It searches the local hash (HTAB) first, then the shared hash (dshash)
on a miss, and copies the found entry into the local hash (action A).
*If* a second reference comes in the same transaction, the HTAB
returns the result directly (action B). But frequent short
transactions mostly take action A. The cost can be reduced depending
on the update interval of the shared stats, but that interval gets
shorter when many backends run.

Another bottleneck was found in pgstat_fetch_stat_tabentry. It calls
pgstat_fetch_stat_dbentry() too often, which can be largely reduced.

A quick (and dirty) fix for the above reduced the slowdown roughly
by half. (59tps (master) -> 48tps (current) -> 54tps (with the fix))

I'll reconsider the referring side of the stats.

I didn't merge the two suggested pairs of commits. I'll do that
after addressing the slowdown issue.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From 7149e93d7b41af0c7ce1cddc847a9bb7bc31b1e7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Fri, 29 Jun 2018 16:41:04 +0900
Subject: [PATCH 1/7] sequential scan for dshash

Add sequential scan feature to dshash.
---
 src/backend/lib/dshash.c | 188 ++++++++++++++++++++++++++++++++++++++++++++++-
 src/include/lib/dshash.h |  23 +++++-
 2 files changed, 206 insertions(+), 5 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index f095196fb6..d1908a6137 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -112,6 +112,7 @@ struct dshash_table
  size_t size_log2; /* log2(number of buckets) */
  bool find_locked; /* Is any partition lock held by 'find'? */
  bool find_exclusively_locked; /* ... exclusively? */
+ bool seqscan_running;/* now under sequential scan */
 };
 
 /* Given a pointer to an item, find the entry (user data) it holds. */
@@ -127,6 +128,10 @@ struct dshash_table
 #define NUM_SPLITS(size_log2) \
  (size_log2 - DSHASH_NUM_PARTITIONS_LOG2)
 
+/* How many buckets are there in a given size? */
+#define NUM_BUCKETS(size_log2) \
+ (((size_t) 1) << (size_log2))
+
 /* How many buckets are there in each partition at a given size? */
 #define BUCKETS_PER_PARTITION(size_log2) \
  (((size_t) 1) << NUM_SPLITS(size_log2))
@@ -153,6 +158,10 @@ struct dshash_table
 #define BUCKET_INDEX_FOR_PARTITION(partition, size_log2) \
  ((partition) << NUM_SPLITS(size_log2))
 
+/* Choose partition based on bucket index. */
+#define PARTITION_FOR_BUCKET_INDEX(bucket_idx, size_log2) \
+ ((bucket_idx) >> NUM_SPLITS(size_log2))
+
 /* The head of the active bucket for a given hash value (lvalue). */
 #define BUCKET_FOR_HASH(hash_table, hash) \
  (hash_table->buckets[ \
@@ -228,6 +237,7 @@ dshash_create(dsa_area *area, const dshash_parameters *params, void *arg)
 
  hash_table->find_locked = false;
  hash_table->find_exclusively_locked = false;
+ hash_table->seqscan_running = false;
 
  /*
  * Set up the initial array of buckets.  Our initial size is the same as
@@ -279,6 +289,7 @@ dshash_attach(dsa_area *area, const dshash_parameters *params,
  hash_table->control = dsa_get_address(area, control);
  hash_table->find_locked = false;
  hash_table->find_exclusively_locked = false;
+ hash_table->seqscan_running = false;
  Assert(hash_table->control->magic == DSHASH_MAGIC);
 
  /*
@@ -324,7 +335,7 @@ dshash_destroy(dshash_table *hash_table)
  ensure_valid_bucket_pointers(hash_table);
 
  /* Free all the entries. */
- size = ((size_t) 1) << hash_table->size_log2;
+ size = NUM_BUCKETS(hash_table->size_log2);
  for (i = 0; i < size; ++i)
  {
  dsa_pointer item_pointer = hash_table->buckets[i];
@@ -549,9 +560,14 @@ dshash_delete_entry(dshash_table *hash_table, void *entry)
  LW_EXCLUSIVE));
 
  delete_item(hash_table, item);
- hash_table->find_locked = false;
- hash_table->find_exclusively_locked = false;
- LWLockRelease(PARTITION_LOCK(hash_table, partition));
+
+ /* We need to keep partition lock while sequential scan */
+ if (!hash_table->seqscan_running)
+ {
+ hash_table->find_locked = false;
+ hash_table->find_exclusively_locked = false;
+ LWLockRelease(PARTITION_LOCK(hash_table, partition));
+ }
 }
 
 /*
@@ -568,6 +584,8 @@ dshash_release_lock(dshash_table *hash_table, void *entry)
  Assert(LWLockHeldByMeInMode(PARTITION_LOCK(hash_table, partition_index),
  hash_table->find_exclusively_locked
  ? LW_EXCLUSIVE : LW_SHARED));
+ /* lock is under control of sequential scan */
+ Assert(!hash_table->seqscan_running);
 
  hash_table->find_locked = false;
  hash_table->find_exclusively_locked = false;
@@ -592,6 +610,168 @@ dshash_memhash(const void *v, size_t size, void *arg)
  return tag_hash(v, size);
 }
 
+/*
+ * dshash_seq_init/_next/_term
+ *           Sequentially scan through a dshash table and return all the
+ *           elements one by one, return NULL when no more.
+ *
+ * dshash_seq_term should be called if and only if the scan is abandoned
+ * before completion; if dshash_seq_next returns NULL then it has already done
+ * the end-of-scan cleanup.
+ *
+ * On returning element, it is locked as is the case with dshash_find.
+ * However, the caller must not release the lock. The lock is released as
+ * necessary in continued scan.
+ *
+ * As opposed to the equivalent for dynahash, the caller is not supposed to
+ * delete the returned element before continuing the scan.
+ *
+ * If consistent is set for dshash_seq_init, the whole hash table is
+ * non-exclusively locked. Otherwise a part of the hash table is locked in the
+ * same mode (partition lock).
+ */
+void
+dshash_seq_init(dshash_seq_status *status, dshash_table *hash_table,
+ bool consistent, bool exclusive)
+{
+ /* allowed at most one scan at once */
+ Assert(!hash_table->seqscan_running);
+
+ status->hash_table = hash_table;
+ status->curbucket = 0;
+ status->nbuckets = 0;
+ status->curitem = NULL;
+ status->pnextitem = InvalidDsaPointer;
+ status->curpartition = -1;
+ status->consistent = consistent;
+ status->exclusive = exclusive;
+ hash_table->seqscan_running = true;
+
+ /*
+ * Protect all partitions from modification if the caller wants a
+ * consistent result.
+ */
+ if (consistent)
+ {
+ int i;
+
+ for (i = 0; i < DSHASH_NUM_PARTITIONS; ++i)
+ {
+ Assert(!LWLockHeldByMe(PARTITION_LOCK(hash_table, i)));
+
+ LWLockAcquire(PARTITION_LOCK(hash_table, i),
+  exclusive ? LW_EXCLUSIVE : LW_SHARED);
+ }
+ ensure_valid_bucket_pointers(hash_table);
+ }
+}
+
+void *
+dshash_seq_next(dshash_seq_status *status)
+{
+ dsa_pointer next_item_pointer;
+
+ Assert(status->hash_table->seqscan_running);
+ if (status->curitem == NULL)
+ {
+ int partition;
+
+ Assert (status->curbucket == 0);
+ Assert(!status->hash_table->find_locked);
+
+ /* first shot. grab the first item. */
+ if (!status->consistent)
+ {
+ partition =
+ PARTITION_FOR_BUCKET_INDEX(status->curbucket,
+   status->hash_table->size_log2);
+ LWLockAcquire(PARTITION_LOCK(status->hash_table, partition),
+  status->exclusive ? LW_EXCLUSIVE : LW_SHARED);
+ status->curpartition = partition;
+
+ /* resize doesn't happen from now until seq scan ends */
+ status->nbuckets =
+ NUM_BUCKETS(status->hash_table->control->size_log2);
+ ensure_valid_bucket_pointers(status->hash_table);
+ }
+
+ next_item_pointer = status->hash_table->buckets[status->curbucket];
+ }
+ else
+ next_item_pointer = status->pnextitem;
+
+ /* Move to the next bucket if we finished the current bucket */
+ while (!DsaPointerIsValid(next_item_pointer))
+ {
+ if (++status->curbucket >= status->nbuckets)
+ {
+ /* all buckets have been scanned. finish. */
+ dshash_seq_term(status);
+ return NULL;
+ }
+
+ /* Also move partition lock if needed */
+ if (!status->consistent)
+ {
+ int next_partition =
+ PARTITION_FOR_BUCKET_INDEX(status->curbucket,
+   status->hash_table->size_log2);
+
+ /* Move lock along with partition for the bucket */
+ if (status->curpartition != next_partition)
+ {
+ /*
+ * Take lock on the next partition then release the current,
+ * not in the reverse order. This is required to avoid
+ * resizing from happening during a sequential scan. Locks are
+ * taken in partition order so no deadlock happens with other
+ * seq scans or resizing.
+ */
+ LWLockAcquire(PARTITION_LOCK(status->hash_table,
+ next_partition),
+  status->exclusive ? LW_EXCLUSIVE : LW_SHARED);
+ LWLockRelease(PARTITION_LOCK(status->hash_table,
+ status->curpartition));
+ status->curpartition = next_partition;
+ }
+ }
+
+ next_item_pointer = status->hash_table->buckets[status->curbucket];
+ }
+
+ status->curitem =
+ dsa_get_address(status->hash_table->area, next_item_pointer);
+ status->hash_table->find_locked = true;
+ status->hash_table->find_exclusively_locked = status->exclusive;
+
+ /*
+ * This item can be deleted by the caller. Store the next item for the
+ * next iteration for the occasion.
+ */
+ status->pnextitem = status->curitem->next;
+
+ return ENTRY_FROM_ITEM(status->curitem);
+}
+
+void
+dshash_seq_term(dshash_seq_status *status)
+{
+ Assert(status->hash_table->seqscan_running);
+ status->hash_table->find_locked = false;
+ status->hash_table->find_exclusively_locked = false;
+ status->hash_table->seqscan_running = false;
+
+ if (status->consistent)
+ {
+ int i;
+
+ for (i = 0; i < DSHASH_NUM_PARTITIONS; ++i)
+ LWLockRelease(PARTITION_LOCK(status->hash_table, i));
+ }
+ else if (status->curpartition >= 0)
+ LWLockRelease(PARTITION_LOCK(status->hash_table, status->curpartition));
+}
+
 /*
  * Print debugging information about the internal state of the hash table to
  * stderr.  The caller must hold no partition locks.
diff --git a/src/include/lib/dshash.h b/src/include/lib/dshash.h
index e5dfd57f0a..b80f3af995 100644
--- a/src/include/lib/dshash.h
+++ b/src/include/lib/dshash.h
@@ -59,6 +59,23 @@ typedef struct dshash_parameters
 struct dshash_table_item;
 typedef struct dshash_table_item dshash_table_item;
 
+/*
+ * Sequential scan state of dshash. The detail is exposed since the storage
+ * size should be known to users but it should be considered as an opaque
+ * type by callers.
+ */
+typedef struct dshash_seq_status
+{
+ dshash_table   *hash_table;
+ int curbucket;
+ int nbuckets;
+ dshash_table_item  *curitem;
+ dsa_pointer pnextitem;
+ int curpartition;
+ bool consistent;
+ bool exclusive;
+} dshash_seq_status;
+
 /* Creating, sharing and destroying from hash tables. */
 extern dshash_table *dshash_create(dsa_area *area,
   const dshash_parameters *params,
@@ -70,7 +87,6 @@ extern dshash_table *dshash_attach(dsa_area *area,
 extern void dshash_detach(dshash_table *hash_table);
 extern dshash_table_handle dshash_get_hash_table_handle(dshash_table *hash_table);
 extern void dshash_destroy(dshash_table *hash_table);
-
 /* Finding, creating, deleting entries. */
 extern void *dshash_find(dshash_table *hash_table,
  const void *key, bool exclusive);
@@ -80,6 +96,11 @@ extern bool dshash_delete_key(dshash_table *hash_table, const void *key);
 extern void dshash_delete_entry(dshash_table *hash_table, void *entry);
 extern void dshash_release_lock(dshash_table *hash_table, void *entry);
 
+/* seq scan support */
+extern void dshash_seq_init(dshash_seq_status *status, dshash_table *hash_table,
+ bool consistent, bool exclusive);
+extern void *dshash_seq_next(dshash_seq_status *status);
+extern void dshash_seq_term(dshash_seq_status *status);
 /* Convenience hash and compare functions wrapping memcmp and tag_hash. */
 extern int dshash_memcmp(const void *a, const void *b, size_t size, void *arg);
 extern dshash_hash dshash_memhash(const void *v, size_t size, void *arg);
--
2.16.3


From 8dafcc8293b856f42bc3a68fa792ea139fd8d0cf Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Thu, 27 Sep 2018 11:15:19 +0900
Subject: [PATCH 2/7] Add conditional lock feature to dshash

Dshash currently waits for locks unconditionally. This commit adds new
interfaces for dshash_find and dshash_find_or_insert. The new
interfaces have an extra parameter "nowait" that tells them not to
wait for the lock.
---
 src/backend/lib/dshash.c | 69 +++++++++++++++++++++++++++++++++++++++++++-----
 src/include/lib/dshash.h |  6 ++++-
 2 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index d1908a6137..db8d6899af 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -394,19 +394,48 @@ dshash_get_hash_table_handle(dshash_table *hash_table)
  */
 void *
 dshash_find(dshash_table *hash_table, const void *key, bool exclusive)
+{
+ return dshash_find_extended(hash_table, key, exclusive, false, NULL);
+}
+
+/*
+ * Same as dshash_find, but if nowait is true it returns immediately when
+ * the lock cannot be acquired. Lock status is set to *lock_acquired if any.
+ */
+void *
+dshash_find_extended(dshash_table *hash_table, const void *key,
+ bool exclusive, bool nowait, bool *lock_acquired)
 {
  dshash_hash hash;
  size_t partition;
  dshash_table_item *item;
 
+ /* allowing !nowait returning the result is just not sensible */
+ Assert(nowait || !lock_acquired);
+
  hash = hash_key(hash_table, key);
  partition = PARTITION_FOR_HASH(hash);
 
  Assert(hash_table->control->magic == DSHASH_MAGIC);
  Assert(!hash_table->find_locked);
 
- LWLockAcquire(PARTITION_LOCK(hash_table, partition),
-  exclusive ? LW_EXCLUSIVE : LW_SHARED);
+ if (nowait)
+ {
+ if (!LWLockConditionalAcquire(PARTITION_LOCK(hash_table, partition),
+  exclusive ? LW_EXCLUSIVE : LW_SHARED))
+ {
+ if (lock_acquired)
+ *lock_acquired = false;
+ return NULL;
+ }
+ }
+ else
+ LWLockAcquire(PARTITION_LOCK(hash_table, partition),
+  exclusive ? LW_EXCLUSIVE : LW_SHARED);
+
+ if (lock_acquired)
+ *lock_acquired = true;
+
  ensure_valid_bucket_pointers(hash_table);
 
  /* Search the active bucket. */
@@ -441,6 +470,22 @@ void *
 dshash_find_or_insert(dshash_table *hash_table,
   const void *key,
   bool *found)
+{
+ return dshash_find_or_insert_extended(hash_table, key, found, false);
+}
+
+/*
+ * Addition to dshash_find_or_insert, returns NULL if nowait is true and lock
+ * was not acquired.
+ *
+ * Notes above dshash_find_extended() regarding locking and error handling
+ * equally apply here.
+ */
+void *
+dshash_find_or_insert_extended(dshash_table *hash_table,
+   const void *key,
+   bool *found,
+   bool nowait)
 {
  dshash_hash hash;
  size_t partition_index;
@@ -455,8 +500,16 @@ dshash_find_or_insert(dshash_table *hash_table,
  Assert(!hash_table->find_locked);
 
 restart:
- LWLockAcquire(PARTITION_LOCK(hash_table, partition_index),
-  LW_EXCLUSIVE);
+ if (nowait)
+ {
+ if (!LWLockConditionalAcquire(
+ PARTITION_LOCK(hash_table, partition_index),
+ LW_EXCLUSIVE))
+ return NULL;
+ }
+ else
+ LWLockAcquire(PARTITION_LOCK(hash_table, partition_index),
+  LW_EXCLUSIVE);
  ensure_valid_bucket_pointers(hash_table);
 
  /* Search the active bucket. */
@@ -626,9 +679,11 @@ dshash_memhash(const void *v, size_t size, void *arg)
 * As opposed to the equivalent for dynahash, the caller is not supposed to
  * delete the returned element before continuing the scan.
  *
- * If consistent is set for dshash_seq_init, the whole hash table is
- * non-exclusively locked. Otherwise a part of the hash table is locked in the
- * same mode (partition lock).
+ * If consistent is set for dshash_seq_init, all the hash table
+ * partitions are locked in the requested mode (as determined by the
+ * exclusive flag), and the locks are held until the end of the scan.
+ * Otherwise the partition locks are acquired and released as needed
+ * during the scan (up to two partitions may be locked at the same time).
  */
 void
 dshash_seq_init(dshash_seq_status *status, dshash_table *hash_table,
diff --git a/src/include/lib/dshash.h b/src/include/lib/dshash.h
index b80f3af995..fe1d4d75c5 100644
--- a/src/include/lib/dshash.h
+++ b/src/include/lib/dshash.h
@@ -90,8 +90,12 @@ extern void dshash_destroy(dshash_table *hash_table);
 /* Finding, creating, deleting entries. */
 extern void *dshash_find(dshash_table *hash_table,
  const void *key, bool exclusive);
+extern void *dshash_find_extended(dshash_table *hash_table, const void *key,
+ bool exclusive, bool nowait, bool *lock_acquired);
 extern void *dshash_find_or_insert(dshash_table *hash_table,
-  const void *key, bool *found);
+ const void *key, bool *found);
+extern void *dshash_find_or_insert_extended(dshash_table *hash_table,
+ const void *key, bool *found, bool nowait);
 extern bool dshash_delete_key(dshash_table *hash_table, const void *key);
 extern void dshash_delete_entry(dshash_table *hash_table, void *entry);
 extern void dshash_release_lock(dshash_table *hash_table, void *entry);
--
2.16.3


From 90522c1de96ac84ba2ad7cc1ada47c7bb9f95e10 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Wed, 7 Nov 2018 16:53:49 +0900
Subject: [PATCH 3/7] Make archiver process an auxiliary process

This is a preliminary patch for the shared-memory based stats collector.
The archiver process must be an auxiliary process since it uses shared
memory after the stats data was moved into shared memory. Make the
process an auxiliary process in order to make it work.
---
 src/backend/bootstrap/bootstrap.c   |  8 +++
 src/backend/postmaster/pgarch.c     | 98 +++++++++----------------------------
 src/backend/postmaster/pgstat.c     |  6 +++
 src/backend/postmaster/postmaster.c | 35 +++++++++----
 src/include/miscadmin.h             |  2 +
 src/include/pgstat.h                |  1 +
 src/include/postmaster/pgarch.h     |  4 +-
 7 files changed, 67 insertions(+), 87 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 63bb134949..df926d8dea 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -329,6 +329,9 @@ AuxiliaryProcessMain(int argc, char *argv[])
  case BgWriterProcess:
  statmsg = pgstat_get_backend_desc(B_BG_WRITER);
  break;
+ case ArchiverProcess:
+ statmsg = pgstat_get_backend_desc(B_ARCHIVER);
+ break;
  case CheckpointerProcess:
  statmsg = pgstat_get_backend_desc(B_CHECKPOINTER);
  break;
@@ -456,6 +459,11 @@ AuxiliaryProcessMain(int argc, char *argv[])
  BackgroundWriterMain();
  proc_exit(1); /* should never return */
 
+ case ArchiverProcess:
+ /* don't set signals, archiver has its own agenda */
+ PgArchiverMain();
+ proc_exit(1); /* should never return */
+
  case CheckpointerProcess:
  /* don't set signals, checkpointer has its own agenda */
  CheckpointerMain();
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index f84f882c4c..4342ebdab4 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -77,7 +77,6 @@
  * Local data
  * ----------
  */
-static time_t last_pgarch_start_time;
 static time_t last_sigterm_time = 0;
 
 /*
@@ -96,7 +95,6 @@ static volatile sig_atomic_t ready_to_stop = false;
 static pid_t pgarch_forkexec(void);
 #endif
 
-NON_EXEC_STATIC void PgArchiverMain(int argc, char *argv[]) pg_attribute_noreturn();
 static void pgarch_exit(SIGNAL_ARGS);
 static void ArchSigHupHandler(SIGNAL_ARGS);
 static void ArchSigTermHandler(SIGNAL_ARGS);
@@ -114,75 +112,6 @@ static void pgarch_archiveDone(char *xlog);
  * ------------------------------------------------------------
  */
 
-/*
- * pgarch_start
- *
- * Called from postmaster at startup or after an existing archiver
- * died.  Attempt to fire up a fresh archiver process.
- *
- * Returns PID of child process, or 0 if fail.
- *
- * Note: if fail, we will be called again from the postmaster main loop.
- */
-int
-pgarch_start(void)
-{
- time_t curtime;
- pid_t pgArchPid;
-
- /*
- * Do nothing if no archiver needed
- */
- if (!XLogArchivingActive())
- return 0;
-
- /*
- * Do nothing if too soon since last archiver start.  This is a safety
- * valve to protect against continuous respawn attempts if the archiver is
- * dying immediately at launch. Note that since we will be re-called from
- * the postmaster main loop, we will get another chance later.
- */
- curtime = time(NULL);
- if ((unsigned int) (curtime - last_pgarch_start_time) <
- (unsigned int) PGARCH_RESTART_INTERVAL)
- return 0;
- last_pgarch_start_time = curtime;
-
-#ifdef EXEC_BACKEND
- switch ((pgArchPid = pgarch_forkexec()))
-#else
- switch ((pgArchPid = fork_process()))
-#endif
- {
- case -1:
- ereport(LOG,
- (errmsg("could not fork archiver: %m")));
- return 0;
-
-#ifndef EXEC_BACKEND
- case 0:
- /* in postmaster child ... */
- InitPostmasterChild();
-
- /* Close the postmaster's sockets */
- ClosePostmasterPorts(false);
-
- /* Drop our connection to postmaster's shared memory, as well */
- dsm_detach_all();
- PGSharedMemoryDetach();
-
- PgArchiverMain(0, NULL);
- break;
-#endif
-
- default:
- return (int) pgArchPid;
- }
-
- /* shouldn't get here */
- return 0;
-}
-
 /* ------------------------------------------------------------
  * Local functions called by archiver follow
  * ------------------------------------------------------------
@@ -222,8 +151,8 @@ pgarch_forkexec(void)
  * The argc/argv parameters are valid only in EXEC_BACKEND case.  However,
  * since we don't use 'em, it hardly matters...
  */
-NON_EXEC_STATIC void
-PgArchiverMain(int argc, char *argv[])
+void
+PgArchiverMain(void)
 {
  /*
  * Ignore all signals usually bound to some action in the postmaster,
@@ -255,8 +184,27 @@ PgArchiverMain(int argc, char *argv[])
 static void
 pgarch_exit(SIGNAL_ARGS)
 {
- /* SIGQUIT means curl up and die ... */
- exit(1);
+ PG_SETMASK(&BlockSig);
+
+ /*
+ * We DO NOT want to run proc_exit() callbacks -- we're here because
+ * shared memory may be corrupted, so we don't want to try to clean up our
+ * transaction.  Just nail the windows shut and get out of town.  Now that
+ * there's an atexit callback to prevent third-party code from breaking
+ * things by calling exit() directly, we have to reset the callbacks
+ * explicitly to make this work as intended.
+ */
+ on_exit_reset();
+
+ /*
+ * Note we do exit(2) not exit(0).  This is to force the postmaster into a
+ * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
+ * backend.  This is necessary precisely because we don't clean up our
+ * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
+ * should ensure the postmaster sees this as a crash, too, but no harm in
+ * being doubly sure.)
+ */
+ exit(2);
 }
 
 /* SIGHUP signal handler for archiver process */
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 13da412c59..d1fe052abf 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2857,6 +2857,9 @@ pgstat_bestart(void)
  case BgWriterProcess:
  beentry->st_backendType = B_BG_WRITER;
  break;
+ case ArchiverProcess:
+ beentry->st_backendType = B_ARCHIVER;
+ break;
  case CheckpointerProcess:
  beentry->st_backendType = B_CHECKPOINTER;
  break;
@@ -4119,6 +4122,9 @@ pgstat_get_backend_desc(BackendType backendType)
  case B_BG_WRITER:
  backendDesc = "background writer";
  break;
+ case B_ARCHIVER:
+ backendDesc = "archiver";
+ break;
  case B_CHECKPOINTER:
  backendDesc = "checkpointer";
  break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3052bbbc21..65eab02b3e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -146,7 +146,8 @@
 #define BACKEND_TYPE_AUTOVAC 0x0002 /* autovacuum worker process */
 #define BACKEND_TYPE_WALSND 0x0004 /* walsender process */
 #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */
-#define BACKEND_TYPE_ALL 0x000F /* OR of all the above */
+#define BACKEND_TYPE_ARCHIVER 0x0010 /* archiver process */
+#define BACKEND_TYPE_ALL 0x001F /* OR of all the above */
 
 #define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
 
@@ -539,6 +540,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
 
 #define StartupDataBase() StartChildProcess(StartupProcess)
 #define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
+#define StartArchiver() StartChildProcess(ArchiverProcess)
 #define StartCheckpointer() StartChildProcess(CheckpointerProcess)
 #define StartWalWriter() StartChildProcess(WalWriterProcess)
 #define StartWalReceiver() StartChildProcess(WalReceiverProcess)
@@ -1757,7 +1759,7 @@ ServerLoop(void)
 
  /* If we have lost the archiver, try to start a new one. */
  if (PgArchPID == 0 && PgArchStartupAllowed())
- PgArchPID = pgarch_start();
+ PgArchPID = StartArchiver();
 
  /* If we need to signal the autovacuum launcher, do so now */
  if (avlauncher_needs_signal)
@@ -2920,7 +2922,7 @@ reaper(SIGNAL_ARGS)
  if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
  AutoVacPID = StartAutoVacLauncher();
  if (PgArchStartupAllowed() && PgArchPID == 0)
- PgArchPID = pgarch_start();
+ PgArchPID = StartArchiver();
  if (PgStatPID == 0)
  PgStatPID = pgstat_start();
 
@@ -3065,10 +3067,8 @@ reaper(SIGNAL_ARGS)
  {
  PgArchPID = 0;
  if (!EXIT_STATUS_0(exitstatus))
- LogChildExit(LOG, _("archiver process"),
- pid, exitstatus);
- if (PgArchStartupAllowed())
- PgArchPID = pgarch_start();
+ HandleChildCrash(pid, exitstatus,
+ _("archiver process"));
  continue;
  }
 
@@ -3314,7 +3314,7 @@ CleanupBackend(int pid,
 
 /*
  * HandleChildCrash -- cleanup after failed backend, bgwriter, checkpointer,
- * walwriter, autovacuum, or background worker.
+ * walwriter, autovacuum, archiver or background worker.
  *
  * The objectives here are to clean up our local state about the child
  * process, and to signal all other remaining children to quickdie.
@@ -3519,6 +3519,18 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
  signal_child(AutoVacPID, (SendStop ? SIGSTOP : SIGQUIT));
  }
 
+ /* Take care of the archiver too */
+ if (pid == PgArchPID)
+ PgArchPID = 0;
+ else if (PgArchPID != 0 && take_action)
+ {
+ ereport(DEBUG2,
+ (errmsg_internal("sending %s to process %d",
+ (SendStop ? "SIGSTOP" : "SIGQUIT"),
+ (int) PgArchPID)));
+ signal_child(PgArchPID, (SendStop ? SIGSTOP : SIGQUIT));
+ }
+
  /*
  * Force a power-cycle of the pgarch process too.  (This isn't absolutely
  * necessary, but it seems like a good idea for robustness, and it
@@ -3795,6 +3807,7 @@ PostmasterStateMachine(void)
  Assert(CheckpointerPID == 0);
  Assert(WalWriterPID == 0);
  Assert(AutoVacPID == 0);
+ Assert(PgArchPID == 0);
  /* syslogger is not considered here */
  pmState = PM_NO_CHILDREN;
  }
@@ -5064,7 +5077,7 @@ sigusr1_handler(SIGNAL_ARGS)
  */
  Assert(PgArchPID == 0);
  if (XLogArchivingAlways())
- PgArchPID = pgarch_start();
+ PgArchPID = StartArchiver();
 
  /*
  * If we aren't planning to enter hot standby mode later, treat
@@ -5342,6 +5355,10 @@ StartChildProcess(AuxProcType type)
  ereport(LOG,
  (errmsg("could not fork background writer process: %m")));
  break;
+ case ArchiverProcess:
+ ereport(LOG,
+ (errmsg("could not fork archiver process: %m")));
+ break;
  case CheckpointerProcess:
  ereport(LOG,
  (errmsg("could not fork checkpointer process: %m")));
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index c9e35003a5..63a7653457 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -399,6 +399,7 @@ typedef enum
  BootstrapProcess,
  StartupProcess,
  BgWriterProcess,
+ ArchiverProcess,
  CheckpointerProcess,
  WalWriterProcess,
  WalReceiverProcess,
@@ -411,6 +412,7 @@ extern AuxProcType MyAuxProcType;
 #define AmBootstrapProcess() (MyAuxProcType == BootstrapProcess)
 #define AmStartupProcess() (MyAuxProcType == StartupProcess)
 #define AmBackgroundWriterProcess() (MyAuxProcType == BgWriterProcess)
+#define AmArchiverProcess() (MyAuxProcType == ArchiverProcess)
 #define AmCheckpointerProcess() (MyAuxProcType == CheckpointerProcess)
 #define AmWalWriterProcess() (MyAuxProcType == WalWriterProcess)
 #define AmWalReceiverProcess() (MyAuxProcType == WalReceiverProcess)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 313ca5f3c3..f299d1d601 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -706,6 +706,7 @@ typedef enum BackendType
  B_BACKEND,
  B_BG_WORKER,
  B_BG_WRITER,
+ B_ARCHIVER,
  B_CHECKPOINTER,
  B_STARTUP,
  B_WAL_RECEIVER,
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 2474eac26a..88f16863d4 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,8 +32,6 @@
  */
 extern int pgarch_start(void);
 
-#ifdef EXEC_BACKEND
-extern void PgArchiverMain(int argc, char *argv[]) pg_attribute_noreturn();
-#endif
+extern void PgArchiverMain(void) pg_attribute_noreturn();
 
 #endif /* _PGARCH_H */
--
2.16.3


From c9a252e86ef04b9b59ebdb19c7c3dbabf3422e97 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Mon, 12 Nov 2018 17:26:33 +0900
Subject: [PATCH 4/7] Shared-memory based stats collector

This moves the medium for sharing server statistics from files to
dynamic shared memory. Every backend reads and writes the stats tables
directly, and the stats collector process is removed. Shared stats are
updated at intervals no shorter than 500ms and no longer than 1s. If
the shared data is busy and a backend cannot obtain the lock
immediately, the differences are usually stashed as "pending stats" in
local memory and merged in at the next opportunity.
---
 src/backend/access/transam/xlog.c            |    4 +-
 src/backend/postmaster/autovacuum.c          |   59 +-
 src/backend/postmaster/bgwriter.c            |    4 +-
 src/backend/postmaster/checkpointer.c        |   24 +-
 src/backend/postmaster/pgarch.c              |    4 +-
 src/backend/postmaster/pgstat.c              | 4201 +++++++++++---------------
 src/backend/postmaster/postmaster.c          |   85 +-
 src/backend/replication/logical/tablesync.c  |    9 +-
 src/backend/replication/logical/worker.c     |    4 +-
 src/backend/storage/buffer/bufmgr.c          |    8 +-
 src/backend/storage/ipc/dsm.c                |   24 +-
 src/backend/storage/ipc/ipci.c               |    6 +
 src/backend/storage/lmgr/lwlock.c            |    3 +
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/tcop/postgres.c                  |   27 +-
 src/backend/utils/adt/pgstatfuncs.c          |   50 +-
 src/backend/utils/init/globals.c             |    1 +
 src/backend/utils/init/postinit.c            |   11 +
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |    2 +-
 src/include/miscadmin.h                      |    2 +-
 src/include/pgstat.h                         |  437 +--
 src/include/storage/dsm.h                    |    3 +
 src/include/storage/lwlock.h                 |    3 +
 src/include/utils/timeout.h                  |    1 +
 src/test/modules/worker_spi/worker_spi.c     |    2 +-
 25 files changed, 1932 insertions(+), 3043 deletions(-)
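The commit message above describes updates at intervals between 500 ms and 1 s. One way to read that throttling rule: below the minimum interval a flush is skipped, between the two bounds it is attempted without waiting, and past the maximum it waits for the lock. A minimal sketch of that decision (hypothetical helper, not code from the patch; only the two interval constants come from it):

```c
#include <stdbool.h>

#define PGSTAT_STAT_MIN_INTERVAL  500   /* ms */
#define PGSTAT_STAT_MAX_INTERVAL 1000   /* ms */

typedef enum { FLUSH_SKIP, FLUSH_NOWAIT, FLUSH_FORCE } flush_action;

/*
 * Decide how to flush pending stats given milliseconds elapsed since
 * the last successful flush: under the minimum interval do nothing,
 * between the bounds try without blocking, past the maximum block
 * until the shared entry can be locked.
 */
static flush_action
decide_flush(long elapsed_ms)
{
    if (elapsed_ms < PGSTAT_STAT_MIN_INTERVAL)
        return FLUSH_SKIP;
    if (elapsed_ms < PGSTAT_STAT_MAX_INTERVAL)
        return FLUSH_NOWAIT;
    return FLUSH_FORCE;
}
```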

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ab7d804f0..9e45581d89 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8416,9 +8416,9 @@ LogCheckpointEnd(bool restartpoint)
  &sync_secs, &sync_usecs);
 
  /* Accumulate checkpoint timing summary data, in milliseconds. */
- BgWriterStats.m_checkpoint_write_time +=
+ BgWriterStats.checkpoint_write_time +=
  write_secs * 1000 + write_usecs / 1000;
- BgWriterStats.m_checkpoint_sync_time +=
+ BgWriterStats.checkpoint_sync_time +=
  sync_secs * 1000 + sync_usecs / 1000;
 
  /*
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 4cf67873b1..a69ea230fb 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -969,7 +969,7 @@ rebuild_database_list(Oid newdb)
  PgStat_StatDBEntry *entry;
 
  /* only consider this database if it has a pgstat entry */
- entry = pgstat_fetch_stat_dbentry(newdb);
+ entry = pgstat_fetch_stat_dbentry(newdb, true);
  if (entry != NULL)
  {
  /* we assume it isn't found because the hash was just created */
@@ -978,6 +978,7 @@ rebuild_database_list(Oid newdb)
  /* hash_search already filled in the key */
  db->adl_score = score++;
  /* next_worker is filled in later */
+ pfree(entry);
  }
  }
 
@@ -993,7 +994,7 @@ rebuild_database_list(Oid newdb)
  * skip databases with no stat entries -- in particular, this gets rid
  * of dropped databases
  */
- entry = pgstat_fetch_stat_dbentry(avdb->adl_datid);
+ entry = pgstat_fetch_stat_dbentry(avdb->adl_datid, true);
  if (entry == NULL)
  continue;
 
@@ -1005,6 +1006,7 @@ rebuild_database_list(Oid newdb)
  db->adl_score = score++;
  /* next_worker is filled in later */
  }
+ pfree(entry);
  }
 
  /* finally, insert all qualifying databases not previously inserted */
@@ -1017,7 +1019,7 @@ rebuild_database_list(Oid newdb)
  PgStat_StatDBEntry *entry;
 
  /* only consider databases with a pgstat entry */
- entry = pgstat_fetch_stat_dbentry(avdb->adw_datid);
+ entry = pgstat_fetch_stat_dbentry(avdb->adw_datid, true);
  if (entry == NULL)
  continue;
 
@@ -1029,6 +1031,7 @@ rebuild_database_list(Oid newdb)
  db->adl_score = score++;
  /* next_worker is filled in later */
  }
+ pfree(entry);
  }
  nelems = score;
 
@@ -1227,7 +1230,7 @@ do_start_worker(void)
  continue; /* ignore not-at-risk DBs */
 
  /* Find pgstat entry if any */
- tmp->adw_entry = pgstat_fetch_stat_dbentry(tmp->adw_datid);
+ tmp->adw_entry = pgstat_fetch_stat_dbentry(tmp->adw_datid, true);
 
  /*
  * Skip a database with no pgstat entry; it means it hasn't seen any
@@ -1265,16 +1268,22 @@ do_start_worker(void)
  break;
  }
  }
- if (skipit)
- continue;
+ if (!skipit)
+ {
+ /* Remember the db with oldest autovac time. */
+ if (avdb == NULL ||
+ tmp->adw_entry->last_autovac_time <
+ avdb->adw_entry->last_autovac_time)
+ {
+ if (avdb)
+ pfree(avdb->adw_entry);
+ avdb = tmp;
+ }
+ }
 
- /*
- * Remember the db with oldest autovac time.  (If we are here, both
- * tmp->entry and db->entry must be non-null.)
- */
- if (avdb == NULL ||
- tmp->adw_entry->last_autovac_time < avdb->adw_entry->last_autovac_time)
- avdb = tmp;
+ /* Immediately free it if not used */
+ if (avdb != tmp)
+ pfree(tmp->adw_entry);
  }
 
  /* Found a database -- process it */
@@ -1963,7 +1972,7 @@ do_autovacuum(void)
  * may be NULL if we couldn't find an entry (only happens if we are
  * forcing a vacuum for anti-wrap purposes).
  */
- dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+ dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId, true);
 
  /* Start a transaction so our commands have one to play into. */
  StartTransactionCommand();
@@ -2013,7 +2022,7 @@ do_autovacuum(void)
  MemoryContextSwitchTo(AutovacMemCxt);
 
  /* The database hash where pgstat keeps shared relations */
- shared = pgstat_fetch_stat_dbentry(InvalidOid);
+ shared = pgstat_fetch_stat_dbentry(InvalidOid, true);
 
  classRel = heap_open(RelationRelationId, AccessShareLock);
 
@@ -2099,6 +2108,8 @@ do_autovacuum(void)
  relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
   effective_multixact_freeze_max_age,
   &dovacuum, &doanalyze, &wraparound);
+ if (tabentry)
+ pfree(tabentry);
 
  /* Relations that need work are added to table_oids */
  if (dovacuum || doanalyze)
@@ -2178,10 +2189,11 @@ do_autovacuum(void)
  /* Fetch the pgstat entry for this table */
  tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
  shared, dbentry);
-
  relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
   effective_multixact_freeze_max_age,
   &dovacuum, &doanalyze, &wraparound);
+ if (tabentry)
+ pfree(tabentry);
 
  /* ignore analyze for toast tables */
  if (dovacuum)
@@ -2750,12 +2762,10 @@ get_pgstat_tabentry_relid(Oid relid, bool isshared, PgStat_StatDBEntry *shared,
  if (isshared)
  {
  if (PointerIsValid(shared))
- tabentry = hash_search(shared->tables, &relid,
-   HASH_FIND, NULL);
+ tabentry = backend_get_tab_entry(shared, relid, true);
  }
  else if (PointerIsValid(dbentry))
- tabentry = hash_search(dbentry->tables, &relid,
-   HASH_FIND, NULL);
+ tabentry = backend_get_tab_entry(dbentry, relid, true);
 
  return tabentry;
 }
@@ -2787,8 +2797,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
  /* use fresh stats */
  autovac_refresh_stats();
 
- shared = pgstat_fetch_stat_dbentry(InvalidOid);
- dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+ shared = pgstat_fetch_stat_dbentry(InvalidOid, true);
+ dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId, true);
 
  /* fetch the relation's relcache entry */
  classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
@@ -2819,6 +2829,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
  relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
   effective_multixact_freeze_max_age,
   &dovacuum, &doanalyze, &wraparound);
+ if (tabentry)
+ pfree(tabentry);
 
  /* ignore ANALYZE for toast tables */
  if (classForm->relkind == RELKIND_TOASTVALUE)
@@ -2909,7 +2921,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
  }
 
  heap_freetuple(classTup);
-
+ pfree(shared);
+ pfree(dbentry);
  return tab;
 }
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index e6b6c549de..3fb6badea8 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -267,9 +267,9 @@ BackgroundWriterMain(void)
  can_hibernate = BgBufferSync(&wb_context);
 
  /*
- * Send off activity statistics to the stats collector
+ * Update activity statistics.
  */
- pgstat_send_bgwriter();
+ pgstat_update_bgwriter();
 
  if (FirstCallSinceLastCheckpoint())
  {
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fe96c41359..d58193774e 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -371,7 +371,7 @@ CheckpointerMain(void)
  {
  checkpoint_requested = false;
  do_checkpoint = true;
- BgWriterStats.m_requested_checkpoints++;
+ BgWriterStats.requested_checkpoints++;
  }
  if (shutdown_requested)
  {
@@ -397,7 +397,7 @@ CheckpointerMain(void)
  if (elapsed_secs >= CheckPointTimeout)
  {
  if (!do_checkpoint)
- BgWriterStats.m_timed_checkpoints++;
+ BgWriterStats.timed_checkpoints++;
  do_checkpoint = true;
  flags |= CHECKPOINT_CAUSE_TIME;
  }
@@ -515,13 +515,13 @@ CheckpointerMain(void)
  CheckArchiveTimeout();
 
  /*
- * Send off activity statistics to the stats collector.  (The reason
- * why we re-use bgwriter-related code for this is that the bgwriter
- * and checkpointer used to be just one process.  It's probably not
- * worth the trouble to split the stats support into two independent
- * stats message types.)
+ * Update activity statistics.  (The reason why we re-use
+ * bgwriter-related code for this is that the bgwriter and
+ * checkpointer used to be just one process.  It's probably not worth
+ * the trouble to split the stats support into two independent
+ * functions.)
  */
- pgstat_send_bgwriter();
+ pgstat_update_bgwriter();
 
  /*
  * Sleep until we are signaled or it's time for another checkpoint or
@@ -682,9 +682,9 @@ CheckpointWriteDelay(int flags, double progress)
  CheckArchiveTimeout();
 
  /*
- * Report interim activity statistics to the stats collector.
+ * Register interim activity statistics.
  */
- pgstat_send_bgwriter();
+ pgstat_update_bgwriter();
 
  /*
  * This sleep used to be connected to bgwriter_delay, typically 200ms.
@@ -1284,8 +1284,8 @@ AbsorbFsyncRequests(void)
  LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
 
  /* Transfer stats counts into pending pgstats message */
- BgWriterStats.m_buf_written_backend += CheckpointerShmem->num_backend_writes;
- BgWriterStats.m_buf_fsync_backend += CheckpointerShmem->num_backend_fsync;
+ BgWriterStats.buf_written_backend += CheckpointerShmem->num_backend_writes;
+ BgWriterStats.buf_fsync_backend += CheckpointerShmem->num_backend_fsync;
 
  CheckpointerShmem->num_backend_writes = 0;
  CheckpointerShmem->num_backend_fsync = 0;
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 4342ebdab4..18bd8296b8 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -468,7 +468,7 @@ pgarch_ArchiverCopyLoop(void)
  * Tell the collector about the WAL file that we successfully
  * archived
  */
- pgstat_send_archiver(xlog, false);
+ pgstat_update_archiver(xlog, false);
 
  break; /* out of inner retry loop */
  }
@@ -478,7 +478,7 @@ pgarch_ArchiverCopyLoop(void)
  * Tell the collector about the WAL file that we failed to
  * archive
  */
- pgstat_send_archiver(xlog, true);
+ pgstat_update_archiver(xlog, true);
 
  if (++failures >= NUM_ARCHIVE_RETRIES)
  {
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d1fe052abf..a97fbae7a8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1,15 +1,10 @@
 /* ----------
  * pgstat.c
  *
- * All the statistics collector stuff hacked up in one big, ugly file.
+ * Statistics collector facility.
  *
- * TODO: - Separate collector, postmaster and backend stuff
- *  into different files.
- *
- * - Add some automatic call for pgstat vacuuming.
- *
- * - Add a pgstat config column to pg_database, so this
- *  entire thing can be enabled/disabled on a per db basis.
+ * Statistics data is stored in dynamic shared memory.  Every backend
+ * updates and reads it individually.
  *
  * Copyright (c) 2001-2019, PostgreSQL Global Development Group
  *
@@ -19,92 +14,59 @@
 #include "postgres.h"
 
 #include <unistd.h>
-#include <fcntl.h>
-#include <sys/param.h>
-#include <sys/time.h>
-#include <sys/socket.h>
-#include <netdb.h>
-#include <netinet/in.h>
-#include <arpa/inet.h>
-#include <signal.h>
-#include <time.h>
-#ifdef HAVE_SYS_SELECT_H
-#include <sys/select.h>
-#endif
 
 #include "pgstat.h"
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
-#include "access/transam.h"
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_proc.h"
-#include "common/ip.h"
 #include "libpq/libpq.h"
-#include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "postmaster/autovacuum.h"
-#include "postmaster/fork_process.h"
-#include "postmaster/postmaster.h"
 #include "replication/walsender.h"
-#include "storage/backendid.h"
-#include "storage/dsm.h"
-#include "storage/fd.h"
 #include "storage/ipc.h"
-#include "storage/latch.h"
 #include "storage/lmgr.h"
-#include "storage/pg_shmem.h"
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "utils/ascii.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
-#include "utils/ps_status.h"
-#include "utils/rel.h"
 #include "utils/snapmgr.h"
-#include "utils/timestamp.h"
-#include "utils/tqual.h"
-
 
 /* ----------
  * Timer definitions.
  * ----------
  */
-#define PGSTAT_STAT_INTERVAL 500 /* Minimum time between stats file
- * updates; in milliseconds. */
-
-#define PGSTAT_RETRY_DELAY 10 /* How long to wait between checks for a
- * new file; in milliseconds. */
-
-#define PGSTAT_MAX_WAIT_TIME 10000 /* Maximum time to wait for a stats
- * file update; in milliseconds. */
-
-#define PGSTAT_INQ_INTERVAL 640 /* How often to ping the collector for a
- * new file; in milliseconds. */
-
-#define PGSTAT_RESTART_INTERVAL 60 /* How often to attempt to restart a
- * failed statistics collector; in
- * seconds. */
-
-#define PGSTAT_POLL_LOOP_COUNT (PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY)
-#define PGSTAT_INQ_LOOP_COUNT (PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY)
-
-/* Minimum receive buffer size for the collector's socket. */
-#define PGSTAT_MIN_RCVBUF (100 * 1024)
+#define PGSTAT_STAT_MIN_INTERVAL 500 /* Minimum time between stats data
+ * updates; in milliseconds. */
 
+#define PGSTAT_STAT_MAX_INTERVAL   1000 /* Maximum time between stats data
+ * updates; in milliseconds. */
 
 /* ----------
  * The initial size hints for the hash tables used in the collector.
  * ----------
  */
-#define PGSTAT_DB_HASH_SIZE 16
 #define PGSTAT_TAB_HASH_SIZE 512
 #define PGSTAT_FUNCTION_HASH_SIZE 512
 
+/*
+ * Operation mode of pgstat_get_db_entry.
+ */
+#define PGSTAT_FETCH_SHARED 0
+#define PGSTAT_FETCH_EXCLUSIVE 1
+#define PGSTAT_FETCH_NOWAIT 2
+
+typedef enum
+{
+ PGSTAT_ENTRY_NOT_FOUND,
+ PGSTAT_ENTRY_FOUND,
+ PGSTAT_ENTRY_LOCK_FAILED
+} pg_stat_table_result_status;
 
 /* ----------
  * Total number of backends including auxiliary
@@ -132,27 +94,69 @@ int pgstat_track_activity_query_size = 1024;
  * ----------
  */
 char   *pgstat_stat_directory = NULL;
+
+/* No longer used; will be removed along with the GUC */
 char   *pgstat_stat_filename = NULL;
 char   *pgstat_stat_tmpname = NULL;
 
+/* Shared stats bootstrap information */
+typedef struct StatsShmemStruct {
+ dsa_handle stats_dsa_handle;
+ dshash_table_handle db_stats_handle;
+ dsa_pointer global_stats;
+ dsa_pointer archiver_stats;
+} StatsShmemStruct;
+
+
 /*
  * BgWriter global statistics counters (unused in other processes).
  * Stored directly in a stats message structure so it can be sent
  * without needing to copy things around.  We assume this inits to zeroes.
  */
-PgStat_MsgBgWriter BgWriterStats;
+PgStat_BgWriter BgWriterStats;
 
 /* ----------
  * Local data
  * ----------
  */
-NON_EXEC_STATIC pgsocket pgStatSock = PGINVALID_SOCKET;
+static StatsShmemStruct * StatsShmem = NULL;
+static dsa_area *area = NULL;
+static dshash_table *db_stats;
+static HTAB *snapshot_db_stats;
+static MemoryContext stats_cxt;
 
-static struct sockaddr_storage pgStatAddr;
+/*
+ *  Report withholding facility.
+ *
+ *  Some report items are withheld if the required lock cannot be acquired
+ *  immediately.
+ */
+static bool pgstat_pending_recoveryconflict = false;
+static bool pgstat_pending_deadlock = false;
+static bool pgstat_pending_tempfile = false;
 
-static time_t last_pgstat_start_time;
-
-static bool pgStatRunningInCollector = false;
+/* dshash parameter for each type of table */
+static const dshash_parameters dsh_dbparams = {
+ sizeof(Oid),
+ sizeof(PgStat_StatDBEntry),
+ dshash_memcmp,
+ dshash_memhash,
+ LWTRANCHE_STATS_DB
+};
+static const dshash_parameters dsh_tblparams = {
+ sizeof(Oid),
+ sizeof(PgStat_StatTabEntry),
+ dshash_memcmp,
+ dshash_memhash,
+ LWTRANCHE_STATS_FUNC_TABLE
+};
+static const dshash_parameters dsh_funcparams = {
+ sizeof(Oid),
+ sizeof(PgStat_StatFuncEntry),
+ dshash_memcmp,
+ dshash_memhash,
+ LWTRANCHE_STATS_FUNC_TABLE
+};
 
 /*
  * Structures in which backends store per-table info that's waiting to be
@@ -189,18 +193,14 @@ typedef struct TabStatHashEntry
  * Hash table for O(1) t_id -> tsa_entry lookup
  */
 static HTAB *pgStatTabHash = NULL;
+static HTAB *pgStatPendingTabHash = NULL;
 
 /*
  * Backends store per-function info that's waiting to be sent to the collector
  * in this hash table (indexed by function OID).
  */
 static HTAB *pgStatFunctions = NULL;
-
-/*
- * Indicates if backend has some function stats that it hasn't yet
- * sent to the collector.
- */
-static bool have_function_stats = false;
+static HTAB *pgStatPendingFunctions = NULL;
 
 /*
  * Tuple insertion/deletion counts for an open transaction can't be propagated
@@ -237,6 +237,12 @@ typedef struct TwoPhasePgStatRecord
  bool t_truncated; /* was the relation truncated? */
 } TwoPhasePgStatRecord;
 
+typedef struct
+{
+ dshash_table *tabhash;
+ PgStat_StatDBEntry *dbentry;
+} pgstat_apply_tabstat_context;
+
 /*
  * Info about current "snapshot" of stats file
  */
@@ -250,23 +256,15 @@ static LocalPgBackendStatus *localBackendStatusTable = NULL;
 static int localNumBackends = 0;
 
 /*
- * Cluster wide statistics, kept in the stats collector.
- * Contains statistics that are not collected per database
- * or per table.
+ * Cluster wide statistics.
+ * Contains statistics that are not collected per database or per table.
+ * shared_* are the statistics maintained by pgstat and snapshot_* are the
+ * snapshots taken by backends.
  */
-static PgStat_ArchiverStats archiverStats;
-static PgStat_GlobalStats globalStats;
-
-/*
- * List of OIDs of databases we need to write out.  If an entry is InvalidOid,
- * it means to write only the shared-catalog stats ("DB 0"); otherwise, we
- * will write both that DB's data and the shared stats.
- */
-static List *pending_write_requests = NIL;
-
-/* Signal handler flags */
-static volatile bool need_exit = false;
-static volatile bool got_SIGHUP = false;
+static PgStat_ArchiverStats *shared_archiverStats;
+static PgStat_ArchiverStats *snapshot_archiverStats;
+static PgStat_GlobalStats *shared_globalStats;
+static PgStat_GlobalStats *snapshot_globalStats;
 
 /*
  * Total time charged to functions so far in the current backend.
@@ -280,32 +278,41 @@ static instr_time total_func_time;
  * Local function forward declarations
  * ----------
  */
-#ifdef EXEC_BACKEND
-static pid_t pgstat_forkexec(void);
-#endif
-
-NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
-static void pgstat_exit(SIGNAL_ARGS);
+/* functions used in backends */
 static void pgstat_beshutdown_hook(int code, Datum arg);
-static void pgstat_sighup_handler(SIGNAL_ARGS);
 
-static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
-static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
- Oid tableoid, bool create);
-static void pgstat_write_statsfiles(bool permanent, bool allDbs);
-static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
-static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
-static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
-static void backend_read_statsfile(void);
+static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, int op,
+ pg_stat_table_result_status *status);
+static PgStat_StatTabEntry *pgstat_get_tab_entry(dshash_table *table, Oid tableoid, bool create);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry);
+static void pgstat_read_db_statsfile(Oid databaseid, dshash_table *tabhash, dshash_table *funchash);
+
+/* functions used in backends */
+static bool backend_snapshot_global_stats(void);
+static PgStat_StatFuncEntry *backend_get_func_etnry(PgStat_StatDBEntry *dbent, Oid funcid, bool oneshot);
 static void pgstat_read_current_status(void);
 
-static bool pgstat_write_statsfile_needed(void);
-static bool pgstat_db_requested(Oid databaseid);
+static void pgstat_postmaster_shutdown(int code, Datum arg);
+static void pgstat_apply_pending_tabstats(bool shared, bool force,
+   pgstat_apply_tabstat_context *cxt);
+static bool pgstat_apply_tabstat(pgstat_apply_tabstat_context *cxt,
+ PgStat_TableStatus *entry, bool nowait);
+static void pgstat_merge_tabentry(PgStat_TableStatus *deststat,
+  PgStat_TableStatus *srcstat,
+  bool init);
+static void pgstat_update_funcstats(bool force, PgStat_StatDBEntry *dbentry);
+static void pgstat_reset_all_counters(void);
+static void pgstat_cleanup_recovery_conflict(PgStat_StatDBEntry *dbentry);
+static void pgstat_cleanup_deadlock(PgStat_StatDBEntry *dbentry);
+static void pgstat_cleanup_tempfile(PgStat_StatDBEntry *dbentry);
+
+static inline void pgstat_merge_backendstats_to_funcentry(
+ PgStat_StatFuncEntry *dest, PgStat_BackendFunctionEntry *src, bool init);
+static inline void pgstat_merge_funcentry(
+ PgStat_StatFuncEntry *dest, PgStat_StatFuncEntry *src, bool init);
 
-static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
-static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid);
-
+static void reset_dbentry_counters(PgStat_StatDBEntry *dbentry);
 static PgStat_TableStatus *get_tabstat_entry(Oid rel_id, bool isshared);
 
 static void pgstat_setup_memcxt(void);
@@ -316,320 +323,16 @@ static const char *pgstat_get_wait_ipc(WaitEventIPC w);
 static const char *pgstat_get_wait_timeout(WaitEventTimeout w);
 static const char *pgstat_get_wait_io(WaitEventIO w);
 
-static void pgstat_setheader(PgStat_MsgHdr *hdr, StatMsgType mtype);
-static void pgstat_send(void *msg, int len);
-
-static void pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len);
-static void pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len);
-static void pgstat_recv_tabpurge(PgStat_MsgTabpurge *msg, int len);
-static void pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len);
-static void pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len);
-static void pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len);
-static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len);
-static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
-static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
-static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
-static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
-static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
-static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
-static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
-static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
-static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
-static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
+static bool pgstat_update_tabentry(dshash_table *tabhash,
+   PgStat_TableStatus *stat, bool nowait);
+static void pgstat_update_dbentry(PgStat_StatDBEntry *dbentry,
+  PgStat_TableStatus *stat);
 
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
  * ------------------------------------------------------------
  */
 
-/* ----------
- * pgstat_init() -
- *
- * Called from postmaster at startup. Create the resources required
- * by the statistics collector process.  If unable to do so, do not
- * fail --- better to let the postmaster start with stats collection
- * disabled.
- * ----------
- */
-void
-pgstat_init(void)
-{
- ACCEPT_TYPE_ARG3 alen;
- struct addrinfo *addrs = NULL,
-   *addr,
- hints;
- int ret;
- fd_set rset;
- struct timeval tv;
- char test_byte;
- int sel_res;
- int tries = 0;
-
-#define TESTBYTEVAL ((char) 199)
-
- /*
- * This static assertion verifies that we didn't mess up the calculations
- * involved in selecting maximum payload sizes for our UDP messages.
- * Because the only consequence of overrunning PGSTAT_MAX_MSG_SIZE would
- * be silent performance loss from fragmentation, it seems worth having a
- * compile-time cross-check that we didn't.
- */
- StaticAssertStmt(sizeof(PgStat_Msg) <= PGSTAT_MAX_MSG_SIZE,
- "maximum stats message size exceeds PGSTAT_MAX_MSG_SIZE");
-
- /*
- * Create the UDP socket for sending and receiving statistic messages
- */
- hints.ai_flags = AI_PASSIVE;
- hints.ai_family = AF_UNSPEC;
- hints.ai_socktype = SOCK_DGRAM;
- hints.ai_protocol = 0;
- hints.ai_addrlen = 0;
- hints.ai_addr = NULL;
- hints.ai_canonname = NULL;
- hints.ai_next = NULL;
- ret = pg_getaddrinfo_all("localhost", NULL, &hints, &addrs);
- if (ret || !addrs)
- {
- ereport(LOG,
- (errmsg("could not resolve \"localhost\": %s",
- gai_strerror(ret))));
- goto startup_failed;
- }
-
- /*
- * On some platforms, pg_getaddrinfo_all() may return multiple addresses
- * only one of which will actually work (eg, both IPv6 and IPv4 addresses
- * when kernel will reject IPv6).  Worse, the failure may occur at the
- * bind() or perhaps even connect() stage.  So we must loop through the
- * results till we find a working combination. We will generate LOG
- * messages, but no error, for bogus combinations.
- */
- for (addr = addrs; addr; addr = addr->ai_next)
- {
-#ifdef HAVE_UNIX_SOCKETS
- /* Ignore AF_UNIX sockets, if any are returned. */
- if (addr->ai_family == AF_UNIX)
- continue;
-#endif
-
- if (++tries > 1)
- ereport(LOG,
- (errmsg("trying another address for the statistics collector")));
-
- /*
- * Create the socket.
- */
- if ((pgStatSock = socket(addr->ai_family, SOCK_DGRAM, 0)) == PGINVALID_SOCKET)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not create socket for statistics collector: %m")));
- continue;
- }
-
- /*
- * Bind it to a kernel assigned port on localhost and get the assigned
- * port via getsockname().
- */
- if (bind(pgStatSock, addr->ai_addr, addr->ai_addrlen) < 0)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not bind socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- alen = sizeof(pgStatAddr);
- if (getsockname(pgStatSock, (struct sockaddr *) &pgStatAddr, &alen) < 0)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not get address of socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- /*
- * Connect the socket to its own address.  This saves a few cycles by
- * not having to respecify the target address on every send. This also
- * provides a kernel-level check that only packets from this same
- * address will be received.
- */
- if (connect(pgStatSock, (struct sockaddr *) &pgStatAddr, alen) < 0)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not connect socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- /*
- * Try to send and receive a one-byte test message on the socket. This
- * is to catch situations where the socket can be created but will not
- * actually pass data (for instance, because kernel packet filtering
- * rules prevent it).
- */
- test_byte = TESTBYTEVAL;
-
-retry1:
- if (send(pgStatSock, &test_byte, 1, 0) != 1)
- {
- if (errno == EINTR)
- goto retry1; /* if interrupted, just retry */
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not send test message on socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- /*
- * There could possibly be a little delay before the message can be
- * received.  We arbitrarily allow up to half a second before deciding
- * it's broken.
- */
- for (;;) /* need a loop to handle EINTR */
- {
- FD_ZERO(&rset);
- FD_SET(pgStatSock, &rset);
-
- tv.tv_sec = 0;
- tv.tv_usec = 500000;
- sel_res = select(pgStatSock + 1, &rset, NULL, NULL, &tv);
- if (sel_res >= 0 || errno != EINTR)
- break;
- }
- if (sel_res < 0)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("select() failed in statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
- if (sel_res == 0 || !FD_ISSET(pgStatSock, &rset))
- {
- /*
- * This is the case we actually think is likely, so take pains to
- * give a specific message for it.
- *
- * errno will not be set meaningfully here, so don't use it.
- */
- ereport(LOG,
- (errcode(ERRCODE_CONNECTION_FAILURE),
- errmsg("test message did not get through on socket for statistics collector")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- test_byte++; /* just make sure variable is changed */
-
-retry2:
- if (recv(pgStatSock, &test_byte, 1, 0) != 1)
- {
- if (errno == EINTR)
- goto retry2; /* if interrupted, just retry */
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not receive test message on socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- if (test_byte != TESTBYTEVAL) /* strictly paranoia ... */
- {
- ereport(LOG,
- (errcode(ERRCODE_INTERNAL_ERROR),
- errmsg("incorrect test message transmission on socket for statistics collector")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- /* If we get here, we have a working socket */
- break;
- }
-
- /* Did we find a working address? */
- if (!addr || pgStatSock == PGINVALID_SOCKET)
- goto startup_failed;
-
- /*
- * Set the socket to non-blocking IO.  This ensures that if the collector
- * falls behind, statistics messages will be discarded; backends won't
- * block waiting to send messages to the collector.
- */
- if (!pg_set_noblock(pgStatSock))
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not set statistics collector socket to nonblocking mode: %m")));
- goto startup_failed;
- }
-
- /*
- * Try to ensure that the socket's receive buffer is at least
- * PGSTAT_MIN_RCVBUF bytes, so that it won't easily overflow and lose
- * data.  Use of UDP protocol means that we are willing to lose data under
- * heavy load, but we don't want it to happen just because of ridiculously
- * small default buffer sizes (such as 8KB on older Windows versions).
- */
- {
- int old_rcvbuf;
- int new_rcvbuf;
- ACCEPT_TYPE_ARG3 rcvbufsize = sizeof(old_rcvbuf);
-
- if (getsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
-   (char *) &old_rcvbuf, &rcvbufsize) < 0)
- {
- elog(LOG, "getsockopt(SO_RCVBUF) failed: %m");
- /* if we can't get existing size, always try to set it */
- old_rcvbuf = 0;
- }
-
- new_rcvbuf = PGSTAT_MIN_RCVBUF;
- if (old_rcvbuf < new_rcvbuf)
- {
- if (setsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
-   (char *) &new_rcvbuf, sizeof(new_rcvbuf)) < 0)
- elog(LOG, "setsockopt(SO_RCVBUF) failed: %m");
- }
- }
-
- pg_freeaddrinfo_all(hints.ai_family, addrs);
-
- return;
-
-startup_failed:
- ereport(LOG,
- (errmsg("disabling statistics collector for lack of working socket")));
-
- if (addrs)
- pg_freeaddrinfo_all(hints.ai_family, addrs);
-
- if (pgStatSock != PGINVALID_SOCKET)
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
-
- /*
- * Adjust GUC variables to suppress useless activity, and for debugging
- * purposes (seeing track_counts off is a clue that we failed here). We
- * use PGC_S_OVERRIDE because there is no point in trying to turn it back
- * on from postgresql.conf without a restart.
- */
- SetConfigOption("track_counts", "off", PGC_INTERNAL, PGC_S_OVERRIDE);
-}
-
 /*
  * subroutine for pgstat_reset_all
  */
@@ -678,119 +381,54 @@ pgstat_reset_remove_files(const char *directory)
 /*
  * pgstat_reset_all() -
  *
- * Remove the stats files.  This is currently used only if WAL
- * recovery is needed after a crash.
+ * Remove the stats files and in-memory counters.  This is currently used only
+ * if WAL recovery is needed after a crash.
  */
 void
 pgstat_reset_all(void)
 {
- pgstat_reset_remove_files(pgstat_stat_directory);
  pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY);
+ pgstat_reset_all_counters();
 }
 
-#ifdef EXEC_BACKEND
-
-/*
- * pgstat_forkexec() -
+/* ----------
+ * pgstat_create_shared_stats() -
  *
- * Format up the arglist for, then fork and exec, statistics collector process
+ * Create the shared statistics memory area.
+ * ----------
  */
-static pid_t
-pgstat_forkexec(void)
+static void
+pgstat_create_shared_stats(void)
 {
- char   *av[10];
- int ac = 0;
+ MemoryContext oldcontext;
 
- av[ac++] = "postgres";
- av[ac++] = "--forkcol";
- av[ac++] = NULL; /* filled in by postmaster_forkexec */
+ Assert(StatsShmem->stats_dsa_handle == DSM_HANDLE_INVALID);
 
- av[ac] = NULL;
- Assert(ac < lengthof(av));
+ /* lives for the lifetime of the process */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create(LWTRANCHE_STATS_DSA);
+ dsa_pin_mapping(area);
 
- return postmaster_forkexec(ac, av);
-}
-#endif /* EXEC_BACKEND */
+ db_stats = dshash_create(area, &dsh_dbparams, 0);
 
+ /* create shared area and write bootstrap information */
+ StatsShmem->stats_dsa_handle = dsa_get_handle(area);
+ StatsShmem->global_stats =
+ dsa_allocate0(area, sizeof(PgStat_GlobalStats));
+ StatsShmem->archiver_stats =
+ dsa_allocate0(area, sizeof(PgStat_ArchiverStats));
+ StatsShmem->db_stats_handle =
+ dshash_get_hash_table_handle(db_stats);
 
-/*
- * pgstat_start() -
- *
- * Called from postmaster at startup or after an existing collector
- * died.  Attempt to fire up a fresh statistics collector.
- *
- * Returns PID of child process, or 0 if fail.
- *
- * Note: if fail, we will be called again from the postmaster main loop.
- */
-int
-pgstat_start(void)
-{
- time_t curtime;
- pid_t pgStatPid;
-
- /*
- * Check that the socket is there, else pgstat_init failed and we can do
- * nothing useful.
- */
- if (pgStatSock == PGINVALID_SOCKET)
- return 0;
-
- /*
- * Do nothing if too soon since last collector start.  This is a safety
- * valve to protect against continuous respawn attempts if the collector
- * is dying immediately at launch.  Note that since we will be re-called
- * from the postmaster main loop, we will get another chance later.
- */
- curtime = time(NULL);
- if ((unsigned int) (curtime - last_pgstat_start_time) <
- (unsigned int) PGSTAT_RESTART_INTERVAL)
- return 0;
- last_pgstat_start_time = curtime;
-
- /*
- * Okay, fork off the collector.
- */
-#ifdef EXEC_BACKEND
- switch ((pgStatPid = pgstat_forkexec()))
-#else
- switch ((pgStatPid = fork_process()))
-#endif
- {
- case -1:
- ereport(LOG,
- (errmsg("could not fork statistics collector: %m")));
- return 0;
-
-#ifndef EXEC_BACKEND
- case 0:
- /* in postmaster child ... */
- InitPostmasterChild();
-
- /* Close the postmaster's sockets */
- ClosePostmasterPorts(false);
-
- /* Drop our connection to postmaster's shared memory, as well */
- dsm_detach_all();
- PGSharedMemoryDetach();
-
- PgstatCollectorMain(0, NULL);
- break;
-#endif
-
- default:
- return (int) pgStatPid;
- }
-
- /* shouldn't get here */
- return 0;
+ /* connect to the memory */
+ snapshot_db_stats = NULL;
+ shared_globalStats = (PgStat_GlobalStats *)
+ dsa_get_address(area, StatsShmem->global_stats);
+ shared_archiverStats = (PgStat_ArchiverStats *)
+ dsa_get_address(area, StatsShmem->archiver_stats);
+ MemoryContextSwitchTo(oldcontext);
 }
 
-void
-allow_immediate_pgstat_restart(void)
-{
- last_pgstat_start_time = 0;
-}
 
 /* ------------------------------------------------------------
  * Public functions used by backends follow
@@ -802,41 +440,107 @@ allow_immediate_pgstat_restart(void)
  * pgstat_report_stat() -
  *
  * Must be called by processes that performs DML: tcop/postgres.c, logical
- * receiver processes, SPI worker, etc. to send the so far collected
- * per-table and function usage statistics to the collector.  Note that this
- * is called only when not within a transaction, so it is fair to use
- * transaction stop time as an approximation of current time.
- * ----------
+ * receiver processes, SPI worker, etc. to apply the so far collected
+ * per-table and function usage statistics to the shared statistics hashes.
+ *
+ *  This requires taking locks on the shared statistics hashes, so some
+ *  updates may be withheld on lock failure. Pending updates are retried
+ *  in later calls of this function and are finally cleaned up by calling
+ *  this function with force = true, or once PGSTAT_STAT_MAX_INTERVAL
+ *  milliseconds have elapsed since the last cleanup. On the other hand,
+ *  updates by regular backends happen at intervals no shorter than
+ *  PGSTAT_STAT_MIN_INTERVAL when force = false.
+ *
+ *  Returns the time in milliseconds until the next expected update.
+ *
+ * Note that this is called only when not within a transaction, so it is fair
+ * to use transaction stop time as an approximation of current time.
+ * ----------
  */
-void
-pgstat_report_stat(bool force)
+long
+pgstat_update_stat(bool force)
 {
  /* we assume this inits to all zeroes: */
- static const PgStat_TableCounts all_zeroes;
  static TimestampTz last_report = 0;
-
+ static TimestampTz oldest_pending = 0;
  TimestampTz now;
- PgStat_MsgTabstat regular_msg;
- PgStat_MsgTabstat shared_msg;
  TabStatusArray *tsa;
- int i;
+ pgstat_apply_tabstat_context cxt;
+ bool other_pending_stats = false;
+ long elapsed;
+ long secs;
+ int usecs;
+
+ if (pgstat_pending_recoveryconflict ||
+ pgstat_pending_deadlock ||
+ pgstat_pending_tempfile ||
+ pgStatPendingFunctions)
+ other_pending_stats = true;
 
  /* Don't expend a clock check if nothing to do */
- if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
- pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
- !have_function_stats)
- return;
+ if (!other_pending_stats && !pgStatPendingTabHash &&
+ (pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
+ pgStatXactCommit == 0 && pgStatXactRollback == 0)
+ return 0;
 
- /*
- * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
- * msec since we last sent one, or the caller wants to force stats out.
- */
  now = GetCurrentTransactionStopTimestamp();
- if (!force &&
- !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
- return;
+
+ if (!force)
+ {
+ /*
+ * Don't update shared stats unless it's been at least
+ * PGSTAT_STAT_MIN_INTERVAL msec since we last updated one.
+ * Returns time to wait in the case.
+ */
+ TimestampDifference(last_report, now, &secs, &usecs);
+ elapsed = secs * 1000 + usecs / 1000;
+
+ if (elapsed < PGSTAT_STAT_MIN_INTERVAL)
+ {
+ /* we know we have some statistics */
+ if (oldest_pending == 0)
+ oldest_pending = now;
+
+ return PGSTAT_STAT_MIN_INTERVAL - elapsed;
+ }
+
+
+ /*
+ * Don't keep pending stats for longer than PGSTAT_STAT_MAX_INTERVAL.
+ */
+ if (oldest_pending > 0)
+ {
+ TimestampDifference(oldest_pending, now, &secs, &usecs);
+ elapsed = secs * 1000 + usecs / 1000;
+
+ if (elapsed > PGSTAT_STAT_MAX_INTERVAL)
+ force = true;
+ }
+ }
+
  last_report = now;
 
+ /* set up the stats update context */
+ cxt.dbentry = NULL;
+ cxt.tabhash = NULL;
+
+ /* Forcibly update other stats if any. */
+ if (other_pending_stats)
+ {
+ cxt.dbentry =
+ pgstat_get_db_entry(MyDatabaseId, PGSTAT_FETCH_EXCLUSIVE, NULL);
+
+ /* clean up pending statistics if any */
+ if (pgStatPendingFunctions)
+ pgstat_update_funcstats(true, cxt.dbentry);
+ if (pgstat_pending_recoveryconflict)
+ pgstat_cleanup_recovery_conflict(cxt.dbentry);
+ if (pgstat_pending_deadlock)
+ pgstat_cleanup_deadlock(cxt.dbentry);
+ if (pgstat_pending_tempfile)
+ pgstat_cleanup_tempfile(cxt.dbentry);
+ }
+
  /*
  * Destroy pgStatTabHash before we start invalidating PgStat_TableEntry
  * entries it points to.  (Should we fail partway through the loop below,
@@ -849,23 +553,55 @@ pgstat_report_stat(bool force)
  pgStatTabHash = NULL;
 
  /*
- * Scan through the TabStatusArray struct(s) to find tables that actually
- * have counts, and build messages to send.  We have to separate shared
- * relations from regular ones because the databaseid field in the message
- * header has to depend on that.
+ * XXX: We cannot lock two dshash entries at once. Since we must hold the
+ * lock while table stats are being updated, we have no choice but to
+ * separate the work for shared table stats from that for regular tables.
+ * Looping over the array twice is apparently inefficient; a more
+ * efficient way would be desirable.
  */
- regular_msg.m_databaseid = MyDatabaseId;
- shared_msg.m_databaseid = InvalidOid;
- regular_msg.m_nentries = 0;
- shared_msg.m_nentries = 0;
+
+ /* The first of the following calls uses the dbentry obtained above, if any. */
+ pgstat_apply_pending_tabstats(false, force, &cxt);
+ pgstat_apply_pending_tabstats(true, force, &cxt);
+
+ /* zero out TableStatus structs after use */
+ for (tsa = pgStatTabList; tsa != NULL; tsa = tsa->tsa_next)
+ {
+ MemSet(tsa->tsa_entries, 0,
+   tsa->tsa_used * sizeof(PgStat_TableStatus));
+ tsa->tsa_used = 0;
+ }
+
+ /* record oldest pending update time */
+ if (pgStatPendingTabHash == NULL)
+ oldest_pending = 0;
+ else if (oldest_pending == 0)
+ oldest_pending = now;
+
+ return 0;
+}
+
+/*
+ * Subroutine for pgstat_update_stat.
+ *
+ * Applies table stats in the table status array, merging with pending stats
+ * if any. If force is true, waits until the required locks are acquired.
+ * Otherwise unapplied stats are kept pending and processed at the next chance.
+ */
+static void
+pgstat_apply_pending_tabstats(bool shared, bool force,
+  pgstat_apply_tabstat_context *cxt)
+{
+ static const PgStat_TableCounts all_zeroes;
+ TabStatusArray *tsa;
+ int i;
 
  for (tsa = pgStatTabList; tsa != NULL; tsa = tsa->tsa_next)
  {
  for (i = 0; i < tsa->tsa_used; i++)
  {
  PgStat_TableStatus *entry = &tsa->tsa_entries[i];
- PgStat_MsgTabstat *this_msg;
- PgStat_TableEntry *this_ent;
+ PgStat_TableStatus *pentry = NULL;
 
  /* Shouldn't have any pending transaction-dependent counts */
  Assert(entry->trans == NULL);
@@ -878,178 +614,440 @@ pgstat_report_stat(bool force)
    sizeof(PgStat_TableCounts)) == 0)
  continue;
 
- /*
- * OK, insert data into the appropriate message, and send if full.
- */
- this_msg = entry->t_shared ? &shared_msg : &regular_msg;
- this_ent = &this_msg->m_entry[this_msg->m_nentries];
- this_ent->t_id = entry->t_id;
- memcpy(&this_ent->t_counts, &entry->t_counts,
-   sizeof(PgStat_TableCounts));
- if (++this_msg->m_nentries >= PGSTAT_NUM_TABENTRIES)
+ /* Skip if this entry does not match the request */
+ if (entry->t_shared != shared)
+ continue;
+
+ /* if a pending update exists, apply it together with this entry */
+ if (pgStatPendingTabHash != NULL)
  {
- pgstat_send_tabstat(this_msg);
- this_msg->m_nentries = 0;
+ pentry = hash_search(pgStatPendingTabHash,
+ (void *) entry, HASH_FIND, NULL);
+
+ if (pentry)
+ {
+ /* merge new update into pending updates */
+ pgstat_merge_tabentry(pentry, entry, false);
+ entry = pentry;
+ }
+ }
+
+ /* try to apply the merged stats */
+ if (pgstat_apply_tabstat(cxt, entry, !force))
+ {
+ /* succeeded; remove the entry if it was a pending one */
+ if (pentry && entry != pentry)
+ hash_search(pgStatPendingTabHash,
+ (void *) pentry, HASH_REMOVE, NULL);
+ }
+ else if (!pentry)
+ {
+ /* failed and there was no pending entry; create a new one */
+ bool found;
+
+ if (pgStatPendingTabHash == NULL)
+ {
+ HASHCTL ctl;
+
+ memset(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(PgStat_TableStatus);
+ pgStatPendingTabHash =
+ hash_create("pgstat pending table stats hash",
+ TABSTAT_QUANTUM,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS);
+ }
+
+ pentry = hash_search(pgStatPendingTabHash,
+ (void *) entry, HASH_ENTER, &found);
+ Assert(!found);
+
+ *pentry = *entry;
  }
  }
- /* zero out TableStatus structs after use */
- MemSet(tsa->tsa_entries, 0,
-   tsa->tsa_used * sizeof(PgStat_TableStatus));
- tsa->tsa_used = 0;
+ }
+
+ /* if any pending stats exist, try to clean them up */
+ if (pgStatPendingTabHash != NULL)
+ {
+ HASH_SEQ_STATUS pstat;
+ PgStat_TableStatus *pentry;
+
+ hash_seq_init(&pstat, pgStatPendingTabHash);
+ while((pentry = (PgStat_TableStatus *) hash_seq_search(&pstat)) != NULL)
+ {
+ /* Skip if this entry does not match the request */
+ if (pentry->t_shared != shared)
+ continue;
+
+ /* apply pending entry and remove on success */
+ if (pgstat_apply_tabstat(cxt, pentry, !force))
+ hash_search(pgStatPendingTabHash,
+ (void *) pentry, HASH_REMOVE, NULL);
+ }
+
+ /* destroy the hash if no entry is left */
+ if (hash_get_num_entries(pgStatPendingTabHash) == 0)
+ {
+ hash_destroy(pgStatPendingTabHash);
+ pgStatPendingTabHash = NULL;
+ }
+ }
+
+ if (cxt->tabhash)
+ dshash_detach(cxt->tabhash);
+ if (cxt->dbentry)
+ dshash_release_lock(db_stats, cxt->dbentry);
+ cxt->tabhash = NULL;
+ cxt->dbentry = NULL;
+}
+
+
+/*
+ * pgstat_apply_tabstat: update shared stats entry using given entry
+ *
+ * If nowait is true, just returns false on lock failure.  Dshashes for table
+ * and function stats are kept attached and stored in cxt. The caller must
+ * detach them after use.
+ */
+static bool
+pgstat_apply_tabstat(pgstat_apply_tabstat_context *cxt,
+ PgStat_TableStatus *entry, bool nowait)
+{
+ Oid dboid = entry->t_shared ? InvalidOid : MyDatabaseId;
+ int table_mode = PGSTAT_FETCH_EXCLUSIVE;
+ bool updated = false;
+
+ if (nowait)
+ table_mode |= PGSTAT_FETCH_NOWAIT;
+
+ /*
+ * We need to keep the lock on dbentries for regular tables to avoid a race
+ * condition with DROP DATABASE, so we hold it in the context variable. We
+ * don't need that for shared tables.
+ */
+ if (!cxt->dbentry)
+ cxt->dbentry = pgstat_get_db_entry(dboid, table_mode, NULL);
+
+ /* we could not acquire the lock; just return */
+ if (!cxt->dbentry)
+ return false;
+
+ /* attach shared stats table if not yet */
+ if (!cxt->tabhash)
+ {
+ /* apply database stats  */
+ if (!entry->t_shared)
+ {
+ /* Update database-wide stats  */
+ cxt->dbentry->n_xact_commit += pgStatXactCommit;
+ cxt->dbentry->n_xact_rollback += pgStatXactRollback;
+ cxt->dbentry->n_block_read_time += pgStatBlockReadTime;
+ cxt->dbentry->n_block_write_time += pgStatBlockWriteTime;
+ pgStatXactCommit = 0;
+ pgStatXactRollback = 0;
+ pgStatBlockReadTime = 0;
+ pgStatBlockWriteTime = 0;
+ }
+
+ cxt->tabhash =
+ dshash_attach(area, &dsh_tblparams, cxt->dbentry->tables, 0);
  }
 
  /*
- * Send partial messages.  Make sure that any pending xact commit/abort
- * gets counted, even if there are no table stats to send.
+ * If we have access to the required data, try updating table stats first.
+ * Update database stats only if the first step succeeded.
  */
- if (regular_msg.m_nentries > 0 ||
- pgStatXactCommit > 0 || pgStatXactRollback > 0)
- pgstat_send_tabstat(&regular_msg);
- if (shared_msg.m_nentries > 0)
- pgstat_send_tabstat(&shared_msg);
+ if (pgstat_update_tabentry(cxt->tabhash, entry, nowait))
+ {
+ pgstat_update_dbentry(cxt->dbentry, entry);
+ updated = true;
+ }
 
- /* Now, send function statistics */
- pgstat_send_funcstats();
+ return updated;
 }
 
 /*
- * Subroutine for pgstat_report_stat: finish and send a tabstat message
+ * pgstat_merge_tabentry: subroutine for pgstat_update_stat
+ *
+ * Merge srcstat into deststat. The existing value in deststat is cleared
+ * if init is true.
  */
 static void
-pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg)
+pgstat_merge_tabentry(PgStat_TableStatus *deststat,
+  PgStat_TableStatus *srcstat,
+  bool init)
 {
- int n;
- int len;
+ Assert(deststat != srcstat);
 
- /* It's unlikely we'd get here with no socket, but maybe not impossible */
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- /*
- * Report and reset accumulated xact commit/rollback and I/O timings
- * whenever we send a normal tabstat message
- */
- if (OidIsValid(tsmsg->m_databaseid))
- {
- tsmsg->m_xact_commit = pgStatXactCommit;
- tsmsg->m_xact_rollback = pgStatXactRollback;
- tsmsg->m_block_read_time = pgStatBlockReadTime;
- tsmsg->m_block_write_time = pgStatBlockWriteTime;
- pgStatXactCommit = 0;
- pgStatXactRollback = 0;
- pgStatBlockReadTime = 0;
- pgStatBlockWriteTime = 0;
- }
+ if (init)
+ deststat->t_counts = srcstat->t_counts;
  else
  {
- tsmsg->m_xact_commit = 0;
- tsmsg->m_xact_rollback = 0;
- tsmsg->m_block_read_time = 0;
- tsmsg->m_block_write_time = 0;
+ PgStat_TableCounts *dest = &deststat->t_counts;
+ PgStat_TableCounts *src = &srcstat->t_counts;
+
+ dest->t_numscans += src->t_numscans;
+ dest->t_tuples_returned += src->t_tuples_returned;
+ dest->t_tuples_fetched += src->t_tuples_fetched;
+ dest->t_tuples_inserted += src->t_tuples_inserted;
+ dest->t_tuples_updated += src->t_tuples_updated;
+ dest->t_tuples_deleted += src->t_tuples_deleted;
+ dest->t_tuples_hot_updated += src->t_tuples_hot_updated;
+ dest->t_truncated |= src->t_truncated;
+
+ /* If table was truncated, first reset the live/dead counters */
+ if (src->t_truncated)
+ {
+ dest->t_delta_live_tuples = 0;
+ dest->t_delta_dead_tuples = 0;
+ }
+ dest->t_delta_live_tuples += src->t_delta_live_tuples;
+ dest->t_delta_dead_tuples += src->t_delta_dead_tuples;
+ dest->t_changed_tuples += src->t_changed_tuples;
+ dest->t_blocks_fetched += src->t_blocks_fetched;
+ dest->t_blocks_hit += src->t_blocks_hit;
  }
-
- n = tsmsg->m_nentries;
- len = offsetof(PgStat_MsgTabstat, m_entry[0]) +
- n * sizeof(PgStat_TableEntry);
-
- pgstat_setheader(&tsmsg->m_hdr, PGSTAT_MTYPE_TABSTAT);
- pgstat_send(tsmsg, len);
 }
-
+
 /*
- * Subroutine for pgstat_report_stat: populate and send a function stat message
+ * pgstat_update_funcstats: subroutine for pgstat_update_stat
+ *
+ *  Updates function statistics.
  */
 static void
-pgstat_send_funcstats(void)
+pgstat_update_funcstats(bool force, PgStat_StatDBEntry *dbentry)
 {
  /* we assume this inits to all zeroes: */
  static const PgStat_FunctionCounts all_zeroes;
+ pg_stat_table_result_status status = 0;
+ dshash_table *funchash;
+ bool  nowait = !force;
+ bool  release_db = false;
+ int  table_op = PGSTAT_FETCH_EXCLUSIVE;
 
- PgStat_MsgFuncstat msg;
- PgStat_BackendFunctionEntry *entry;
- HASH_SEQ_STATUS fstat;
-
- if (pgStatFunctions == NULL)
+ if (pgStatFunctions == NULL && pgStatPendingFunctions == NULL)
  return;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_FUNCSTAT);
- msg.m_databaseid = MyDatabaseId;
- msg.m_nentries = 0;
+ if (nowait)
+ table_op |= PGSTAT_FETCH_NOWAIT;
 
- hash_seq_init(&fstat, pgStatFunctions);
- while ((entry = (PgStat_BackendFunctionEntry *) hash_seq_search(&fstat)) != NULL)
+ /* find the shared function stats table */
+ if (!dbentry)
  {
- PgStat_FunctionEntry *m_ent;
-
- /* Skip it if no counts accumulated since last time */
- if (memcmp(&entry->f_counts, &all_zeroes,
-   sizeof(PgStat_FunctionCounts)) == 0)
- continue;
-
- /* need to convert format of time accumulators */
- m_ent = &msg.m_entry[msg.m_nentries];
- m_ent->f_id = entry->f_id;
- m_ent->f_numcalls = entry->f_counts.f_numcalls;
- m_ent->f_total_time = INSTR_TIME_GET_MICROSEC(entry->f_counts.f_total_time);
- m_ent->f_self_time = INSTR_TIME_GET_MICROSEC(entry->f_counts.f_self_time);
-
- if (++msg.m_nentries >= PGSTAT_NUM_FUNCENTRIES)
- {
- pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
- msg.m_nentries * sizeof(PgStat_FunctionEntry));
- msg.m_nentries = 0;
- }
-
- /* reset the entry's counts */
- MemSet(&entry->f_counts, 0, sizeof(PgStat_FunctionCounts));
+ dbentry = pgstat_get_db_entry(MyDatabaseId, table_op, &status);
+ release_db = true;
  }
 
- if (msg.m_nentries > 0)
- pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
- msg.m_nentries * sizeof(PgStat_FunctionEntry));
+ /* lock failure, return. */
+ if (status == PGSTAT_ENTRY_LOCK_FAILED)
+ return;
 
- have_function_stats = false;
+ /* create hash if not yet */
+ if (dbentry->functions == DSM_HANDLE_INVALID)
+ {
+ funchash = dshash_create(area, &dsh_funcparams, 0);
+ dbentry->functions = dshash_get_hash_table_handle(funchash);
+ }
+ else
+ funchash = dshash_attach(area, &dsh_funcparams, dbentry->functions, 0);
+
+ /*
+ * First, flush the transaction-local stats.  If a pending entry already
+ * exists for a function, fold the numbers into it; otherwise try to
+ * update the shared stats directly, creating a new pending entry on
+ * lock failure.
+ */
+ if (pgStatFunctions)
+ {
+ HASH_SEQ_STATUS fstat;
+ PgStat_BackendFunctionEntry *bestat;
+
+ hash_seq_init(&fstat, pgStatFunctions);
+ while ((bestat = (PgStat_BackendFunctionEntry *) hash_seq_search(&fstat)) != NULL)
+ {
+ bool found;
+ bool init = false;
+ PgStat_StatFuncEntry *funcent = NULL;
+
+ /* Skip it if no counts accumulated since last time */
+ if (memcmp(&bestat->f_counts, &all_zeroes,
+   sizeof(PgStat_FunctionCounts)) == 0)
+ continue;
+
+ /* find pending entry */
+ if (pgStatPendingFunctions)
+ funcent = (PgStat_StatFuncEntry *)
+ hash_search(pgStatPendingFunctions,
+ (void *) &(bestat->f_id), HASH_FIND, NULL);
+
+ if (!funcent)
+ {
+ /* pending entry not found, find shared stats entry */
+ funcent = (PgStat_StatFuncEntry *)
+ dshash_find_or_insert_extended(funchash,
+   (void *) &(bestat->f_id),
+   &found, nowait);
+ if (funcent)
+ init = !found;
+ else
+ {
+ /* no shared stats entry. create a new pending one */
+ funcent = (PgStat_StatFuncEntry *)
+ hash_search(pgStatPendingFunctions,
+ (void *) &(bestat->f_id), HASH_ENTER, NULL);
+ init = true;
+ }
+ }
+ Assert(funcent != NULL);
+
+ pgstat_merge_backendstats_to_funcentry(funcent, bestat, init);
+
+ /* reset used counts */
+ MemSet(&bestat->f_counts, 0, sizeof(PgStat_FunctionCounts));
+ }
+ }
+
+ /* Second, apply pending stats numbers to shared table */
+ if (pgStatPendingFunctions)
+ {
+ HASH_SEQ_STATUS fstat;
+ PgStat_StatFuncEntry *pendent;
+
+ hash_seq_init(&fstat, pgStatPendingFunctions);
+ while ((pendent = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+ {
+ PgStat_StatFuncEntry *funcent;
+ bool found;
+
+ funcent = (PgStat_StatFuncEntry *)
+ dshash_find_or_insert_extended(funchash,
+   (void *) &(pendent->functionid),
+   &found, nowait);
+ if (funcent)
+ {
+ pgstat_merge_funcentry(pendent, funcent, !found);
+ hash_search(pgStatPendingFunctions,
+ (void *) &(pendent->functionid), HASH_REMOVE, NULL);
+ }
+ }
+
+ /* destroy the hash if no entries remain */
+ if (hash_get_num_entries(pgStatPendingFunctions) == 0)
+ {
+ hash_destroy(pgStatPendingFunctions);
+ pgStatPendingFunctions = NULL;
+ }
+ }
+
+ if (release_db)
+ dshash_release_lock(db_stats, dbentry);
 }
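The update path above follows a "try without blocking, park on failure" pattern: deltas that cannot be applied because the shared entry is busy are staged in a backend-local pending store and folded in on a later call. A minimal standalone sketch of that pattern, using a pthread mutex in place of the dshash partition lock (the names `shared_calls`, `pending_calls`, and `flush_function_calls` are illustrative, not symbols from the patch):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for the shared dshash entry and the backend-local pending
 * counters; all names are hypothetical. */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_calls = 0;   /* plays the role of the shared stats entry */
static long pending_calls = 0;  /* backend-local fallback storage */

/* Accumulate 'delta' calls; nowait mimics the !force case above. */
static bool
flush_function_calls(long delta, bool nowait)
{
    pending_calls += delta;     /* stage everything locally first */

    if (nowait)
    {
        if (pthread_mutex_trylock(&shared_lock) != 0)
            return false;       /* lock busy: leave the counts pending */
    }
    else
        pthread_mutex_lock(&shared_lock);

    shared_calls += pending_calls;  /* apply staged counts to shared state */
    pending_calls = 0;
    pthread_mutex_unlock(&shared_lock);
    return true;
}
```

The key property, mirrored in the patch, is that no counts are ever lost: a nowait failure only defers them until the next forced (or uncontended) flush.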
 
+/*
+ * pgstat_merge_backendstats_to_funcentry: subroutine for
+ * pgstat_update_funcstats
+ *
+ * Merges BackendFunctionEntry into StatFuncEntry
+ */
+static inline void
+pgstat_merge_backendstats_to_funcentry(PgStat_StatFuncEntry *dest,
+   PgStat_BackendFunctionEntry *src,
+   bool init)
+{
+ if (init)
+ {
+ /*
+ * If it's a new function entry, initialize counters to the values
+ * we just got.
+ */
+ dest->f_numcalls = src->f_counts.f_numcalls;
+ dest->f_total_time =
+ INSTR_TIME_GET_MICROSEC(src->f_counts.f_total_time);
+ dest->f_self_time =
+ INSTR_TIME_GET_MICROSEC(src->f_counts.f_self_time);
+ }
+ else
+ {
+ /*
+ * Otherwise add the values to the existing entry.
+ */
+ dest->f_numcalls += src->f_counts.f_numcalls;
+ dest->f_total_time +=
+ INSTR_TIME_GET_MICROSEC(src->f_counts.f_total_time);
+ dest->f_self_time +=
+ INSTR_TIME_GET_MICROSEC(src->f_counts.f_self_time);
+ }
+}
+
+/*
+ * pgstat_merge_funcentry: subroutine for pgstat_update_funcstats
+ *
+ * Merges two PgStat_StatFuncEntry structs
+ */
+static inline void
+pgstat_merge_funcentry(PgStat_StatFuncEntry *dest, PgStat_StatFuncEntry *src,
+   bool init)
+{
+ if (init)
+ {
+ /*
+ * If it's a new function entry, initialize counters to the values
+ * we just got.
+ */
+ dest->f_numcalls = src->f_numcalls;
+ dest->f_total_time = src->f_total_time;
+ dest->f_self_time = src->f_self_time;
+ }
+ else
+ {
+ /*
+ * Otherwise add the values to the existing entry.
+ */
+ dest->f_numcalls += src->f_numcalls;
+ dest->f_total_time += src->f_total_time;
+ dest->f_self_time += src->f_self_time;
+ }
+}
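Both merge helpers above share one shape: the same routine either seeds a freshly created entry or accumulates into an existing one, selected by the `init` flag the caller derives from `dshash_find_or_insert_extended`'s `found` output. A condensed sketch of that init-vs-accumulate shape (`FuncStats` and `merge_funcstats` are stand-ins, not patch symbols):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PgStat_StatFuncEntry. */
typedef struct FuncStats
{
    int64_t numcalls;
    int64_t total_time;     /* microseconds */
} FuncStats;

static void
merge_funcstats(FuncStats *dest, const FuncStats *src, bool init)
{
    if (init)
    {
        /* new entry: counters start from the incoming values */
        dest->numcalls = src->numcalls;
        dest->total_time = src->total_time;
    }
    else
    {
        /* existing entry: accumulate */
        dest->numcalls += src->numcalls;
        dest->total_time += src->total_time;
    }
}
```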
+
+
 
 /* ----------
  * pgstat_vacuum_stat() -
  *
- * Will tell the collector about objects he can get rid of.
+ * Remove stats entries for objects that no longer exist.
  * ----------
  */
 void
 pgstat_vacuum_stat(void)
 {
- HTAB   *htab;
- PgStat_MsgTabpurge msg;
- PgStat_MsgFuncpurge f_msg;
- HASH_SEQ_STATUS hstat;
+ HTAB   *oidtab;
+ dshash_table *dshtable;
+ dshash_seq_status dshstat;
  PgStat_StatDBEntry *dbentry;
  PgStat_StatTabEntry *tabentry;
  PgStat_StatFuncEntry *funcentry;
- int len;
 
- if (pgStatSock == PGINVALID_SOCKET)
+ /* we don't collect statistics in standalone mode */
+ if (!IsUnderPostmaster)
  return;
 
- /*
- * If not done for this transaction, read the statistics collector stats
- * file into some hash tables.
- */
- backend_read_statsfile();
+ /* If not done for this transaction, take a snapshot of stats */
+ if (!backend_snapshot_global_stats())
+ return;
 
  /*
  * Read pg_database and make a list of OIDs of all existing databases
  */
- htab = pgstat_collect_oids(DatabaseRelationId, Anum_pg_database_oid);
+ oidtab = pgstat_collect_oids(DatabaseRelationId, Anum_pg_database_oid);
 
  /*
- * Search the database hash table for dead databases and tell the
- * collector to drop them.
+ * Search the database hash table for dead databases and drop them
+ * from the hash.
  */
- hash_seq_init(&hstat, pgStatDBHash);
- while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
+
+ dshash_seq_init(&dshstat, db_stats, false, true);
+ while ((dbentry = (PgStat_StatDBEntry *) dshash_seq_next(&dshstat)) != NULL)
  {
  Oid dbid = dbentry->databaseid;
 
@@ -1057,148 +1055,86 @@ pgstat_vacuum_stat(void)
 
  /* the DB entry for shared tables (with InvalidOid) is never dropped */
  if (OidIsValid(dbid) &&
- hash_search(htab, (void *) &dbid, HASH_FIND, NULL) == NULL)
+ hash_search(oidtab, (void *) &dbid, HASH_FIND, NULL) == NULL)
  pgstat_drop_database(dbid);
  }
 
  /* Clean up */
- hash_destroy(htab);
+ hash_destroy(oidtab);
 
  /*
  * Lookup our own database entry; if not found, nothing more to do.
  */
- dbentry = (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
- (void *) &MyDatabaseId,
- HASH_FIND, NULL);
- if (dbentry == NULL || dbentry->tables == NULL)
+ dbentry = pgstat_get_db_entry(MyDatabaseId, PGSTAT_FETCH_EXCLUSIVE, NULL);
+ if (!dbentry)
  return;
-
+
  /*
  * Similarly to above, make a list of all known relations in this DB.
  */
- htab = pgstat_collect_oids(RelationRelationId, Anum_pg_class_oid);
-
- /*
- * Initialize our messages table counter to zero
- */
- msg.m_nentries = 0;
+ oidtab = pgstat_collect_oids(RelationRelationId, Anum_pg_class_oid);
 
  /*
  * Check for all tables listed in stats hashtable if they still exist.
+ * The stats snapshot is of no use here, so search the shared hash directly.
  */
- hash_seq_init(&hstat, dbentry->tables);
- while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&hstat)) != NULL)
+ dshtable = dshash_attach(area, &dsh_tblparams, dbentry->tables, 0);
+ dshash_seq_init(&dshstat, dshtable, false, true);
+ while ((tabentry = (PgStat_StatTabEntry *) dshash_seq_next(&dshstat)) != NULL)
  {
  Oid tabid = tabentry->tableid;
 
  CHECK_FOR_INTERRUPTS();
 
- if (hash_search(htab, (void *) &tabid, HASH_FIND, NULL) != NULL)
+ if (hash_search(oidtab, (void *) &tabid, HASH_FIND, NULL) != NULL)
  continue;
 
- /*
- * Not there, so add this table's Oid to the message
- */
- msg.m_tableid[msg.m_nentries++] = tabid;
-
- /*
- * If the message is full, send it out and reinitialize to empty
- */
- if (msg.m_nentries >= PGSTAT_NUM_TABPURGE)
- {
- len = offsetof(PgStat_MsgTabpurge, m_tableid[0])
- + msg.m_nentries * sizeof(Oid);
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TABPURGE);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, len);
-
- msg.m_nentries = 0;
- }
- }
-
- /*
- * Send the rest
- */
- if (msg.m_nentries > 0)
- {
- len = offsetof(PgStat_MsgTabpurge, m_tableid[0])
- + msg.m_nentries * sizeof(Oid);
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TABPURGE);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, len);
+ /* Not there, so purge this table */
+ dshash_delete_entry(dshtable, tabentry);
  }
+ dshash_detach(dshtable);
 
  /* Clean up */
- hash_destroy(htab);
+ hash_destroy(oidtab);
 
  /*
  * Now repeat the above steps for functions.  However, we needn't bother
  * in the common case where no function stats are being collected.
  */
- if (dbentry->functions != NULL &&
- hash_get_num_entries(dbentry->functions) > 0)
+ if (dbentry->functions != DSM_HANDLE_INVALID)
  {
- htab = pgstat_collect_oids(ProcedureRelationId, Anum_pg_proc_oid);
+ dshtable = dshash_attach(area, &dsh_funcparams, dbentry->functions, 0);
+ oidtab = pgstat_collect_oids(ProcedureRelationId, Anum_pg_proc_oid);
 
- pgstat_setheader(&f_msg.m_hdr, PGSTAT_MTYPE_FUNCPURGE);
- f_msg.m_databaseid = MyDatabaseId;
- f_msg.m_nentries = 0;
-
- hash_seq_init(&hstat, dbentry->functions);
- while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&hstat)) != NULL)
+ dshash_seq_init(&dshstat, dshtable, false, true);
+ while ((funcentry = (PgStat_StatFuncEntry *) dshash_seq_next(&dshstat)) != NULL)
  {
  Oid funcid = funcentry->functionid;
 
  CHECK_FOR_INTERRUPTS();
 
- if (hash_search(htab, (void *) &funcid, HASH_FIND, NULL) != NULL)
+ if (hash_search(oidtab, (void *) &funcid, HASH_FIND, NULL) != NULL)
  continue;
 
- /*
- * Not there, so add this function's Oid to the message
- */
- f_msg.m_functionid[f_msg.m_nentries++] = funcid;
-
- /*
- * If the message is full, send it out and reinitialize to empty
- */
- if (f_msg.m_nentries >= PGSTAT_NUM_FUNCPURGE)
- {
- len = offsetof(PgStat_MsgFuncpurge, m_functionid[0])
- + f_msg.m_nentries * sizeof(Oid);
-
- pgstat_send(&f_msg, len);
-
- f_msg.m_nentries = 0;
- }
+ /* Not there, so remove this function */
+ dshash_delete_entry(dshtable, funcentry);
  }
 
- /*
- * Send the rest
- */
- if (f_msg.m_nentries > 0)
- {
- len = offsetof(PgStat_MsgFuncpurge, m_functionid[0])
- + f_msg.m_nentries * sizeof(Oid);
+ hash_destroy(oidtab);
 
- pgstat_send(&f_msg, len);
- }
-
- hash_destroy(htab);
+ dshash_detach(dshtable);
  }
+ dshash_release_lock(db_stats, dbentry);
 }
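pgstat_vacuum_stat() above is essentially a mark-and-sweep over the stats hashes: collect the OIDs that still exist in the catalogs into a lookup set, then walk the stats table and delete every entry whose key is absent from that set. A toy version with plain arrays standing in for the hash tables (all names hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Linear-scan membership test; the real code uses a temporary HTAB. */
static bool
oid_in_set(unsigned key, const unsigned *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == key)
            return true;
    return false;
}

/* Removes dead keys in place; returns the remaining entry count. */
static size_t
sweep_dead_entries(unsigned *stats_keys, size_t nstats,
                   const unsigned *live, size_t nlive)
{
    size_t kept = 0;

    for (size_t i = 0; i < nstats; i++)
        if (oid_in_set(stats_keys[i], live, nlive))
            stats_keys[kept++] = stats_keys[i];   /* still exists: keep */
    return kept;                                  /* dead entries dropped */
}
```

In the patch the "delete" step happens entry-by-entry via `dshash_delete_entry()` while scanning, rather than by compaction, but the keep/drop decision is the same.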
 
 
-/* ----------
+/*
  * pgstat_collect_oids() -
  *
  * Collect the OIDs of all objects listed in the specified system catalog
- * into a temporary hash table.  Caller should hash_destroy the result
- * when done with it.  (However, we make the table in CurrentMemoryContext
- * so that it will be freed properly in event of an error.)
- * ----------
+ * into a temporary hash table.  Caller should hash_destroy the result after
+ * use.  (However, we make the table in CurrentMemoryContext so that it will
+ * be freed properly in event of an error.)
  */
 static HTAB *
 pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid)
@@ -1245,62 +1181,54 @@ pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid)
 /* ----------
  * pgstat_drop_database() -
  *
- * Tell the collector that we just dropped a database.
- * (If the message gets lost, we will still clean the dead DB eventually
- * via future invocations of pgstat_vacuum_stat().)
+ * Remove entry for the database that we just dropped.
+ *
+ * If some stats update happens after this, the entry will be re-created,
+ * but we will still clean the dead DB eventually via future invocations
+ * of pgstat_vacuum_stat().
  * ----------
  */
 void
 pgstat_drop_database(Oid databaseid)
 {
- PgStat_MsgDropdb msg;
+ PgStat_StatDBEntry *dbentry;
 
- if (pgStatSock == PGINVALID_SOCKET)
- return;
+ Assert(OidIsValid(databaseid));
+ Assert(db_stats);
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_DROPDB);
- msg.m_databaseid = databaseid;
- pgstat_send(&msg, sizeof(msg));
+ /*
+ * Lookup the database in the hashtable with exclusive lock.
+ */
+ dbentry = pgstat_get_db_entry(databaseid, PGSTAT_FETCH_EXCLUSIVE, NULL);
+
+ /*
+ * If found, remove it (along with the db statfile).
+ */
+ if (dbentry)
+ {
+ if (dbentry->tables != DSM_HANDLE_INVALID)
+ {
+ dshash_table *tbl =
+ dshash_attach(area, &dsh_tblparams, dbentry->tables, 0);
+ dshash_destroy(tbl);
+ }
+ if (dbentry->functions != DSM_HANDLE_INVALID)
+ {
+ dshash_table *tbl =
+ dshash_attach(area, &dsh_funcparams, dbentry->functions, 0);
+ dshash_destroy(tbl);
+ }
+
+ dshash_delete_entry(db_stats, (void *)dbentry);
+ }
 }
 
 
-/* ----------
- * pgstat_drop_relation() -
- *
- * Tell the collector that we just dropped a relation.
- * (If the message gets lost, we will still clean the dead entry eventually
- * via future invocations of pgstat_vacuum_stat().)
- *
- * Currently not used for lack of any good place to call it; we rely
- * entirely on pgstat_vacuum_stat() to clean out stats for dead rels.
- * ----------
- */
-#ifdef NOT_USED
-void
-pgstat_drop_relation(Oid relid)
-{
- PgStat_MsgTabpurge msg;
- int len;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- msg.m_tableid[0] = relid;
- msg.m_nentries = 1;
-
- len = offsetof(PgStat_MsgTabpurge, m_tableid[0]) + sizeof(Oid);
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TABPURGE);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, len);
-}
-#endif /* NOT_USED */
-
-
 /* ----------
  * pgstat_reset_counters() -
  *
- * Tell the statistics collector to reset counters for our database.
+ * Reset counters for our database.
  *
  * Permission checking for this function is managed through the normal
  * GRANT system.
@@ -1309,20 +1237,51 @@ pgstat_drop_relation(Oid relid)
 void
 pgstat_reset_counters(void)
 {
- PgStat_MsgResetcounter msg;
+ PgStat_StatDBEntry   *dbentry;
+ pg_stat_table_result_status status;
 
- if (pgStatSock == PGINVALID_SOCKET)
+ Assert(db_stats);
+
+ /*
+ * Lookup the database in the hashtable.  Nothing to do if not there.
+ */
+ dbentry = pgstat_get_db_entry(MyDatabaseId, PGSTAT_FETCH_EXCLUSIVE, &status);
+
+ if (!dbentry)
  return;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETCOUNTER);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, sizeof(msg));
+ /*
+ * We simply throw away all the database's table entries by recreating a
+ * new hash table for them.
+ */
+ if (dbentry->tables != DSM_HANDLE_INVALID)
+ {
+ dshash_table *t =
+ dshash_attach(area, &dsh_tblparams, dbentry->tables, 0);
+ dshash_destroy(t);
+ dbentry->tables = DSM_HANDLE_INVALID;
+ }
+ if (dbentry->functions != DSM_HANDLE_INVALID)
+ {
+ dshash_table *t =
+ dshash_attach(area, &dsh_funcparams, dbentry->functions, 0);
+ dshash_destroy(t);
+ dbentry->functions = DSM_HANDLE_INVALID;
+ }
+
+ /*
+ * Reset database-level stats, too.  This creates empty hash tables for
+ * tables and functions.
+ */
+ reset_dbentry_counters(dbentry);
+
+ dshash_release_lock(db_stats, dbentry);
 }
 
 /* ----------
  * pgstat_reset_shared_counters() -
  *
- * Tell the statistics collector to reset cluster-wide shared counters.
+ * Reset cluster-wide shared counters.
  *
  * Permission checking for this function is managed through the normal
  * GRANT system.
@@ -1331,29 +1290,37 @@ pgstat_reset_counters(void)
 void
 pgstat_reset_shared_counters(const char *target)
 {
- PgStat_MsgResetsharedcounter msg;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
+ Assert(db_stats);
 
+ /* Reset the archiver statistics for the cluster. */
  if (strcmp(target, "archiver") == 0)
- msg.m_resettarget = RESET_ARCHIVER;
+ {
+ LWLockAcquire(StatsLock, LW_EXCLUSIVE);
+
+ memset(shared_archiverStats, 0, sizeof(*shared_archiverStats));
+ shared_archiverStats->stat_reset_timestamp = GetCurrentTimestamp();
+ }
  else if (strcmp(target, "bgwriter") == 0)
- msg.m_resettarget = RESET_BGWRITER;
+ {
+ LWLockAcquire(StatsLock, LW_EXCLUSIVE);
+
+ /* Reset the global background writer statistics for the cluster. */
+ memset(shared_globalStats, 0, sizeof(*shared_globalStats));
+ shared_globalStats->stat_reset_timestamp = GetCurrentTimestamp();
+ }
  else
  ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("unrecognized reset target: \"%s\"", target),
  errhint("Target must be \"archiver\" or \"bgwriter\".")));
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
- pgstat_send(&msg, sizeof(msg));
+
+ LWLockRelease(StatsLock);
 }
 
 /* ----------
  * pgstat_reset_single_counter() -
  *
- * Tell the statistics collector to reset a single counter.
+ * Reset a single counter.
  *
  * Permission checking for this function is managed through the normal
  * GRANT system.
@@ -1362,17 +1329,90 @@ pgstat_reset_shared_counters(const char *target)
 void
 pgstat_reset_single_counter(Oid objoid, PgStat_Single_Reset_Type type)
 {
- PgStat_MsgResetsinglecounter msg;
+ PgStat_StatDBEntry *dbentry;
+
 
- if (pgStatSock == PGINVALID_SOCKET)
+ Assert(db_stats);
+
+ dbentry = pgstat_get_db_entry(MyDatabaseId, PGSTAT_FETCH_EXCLUSIVE, NULL);
+
+ if (!dbentry)
  return;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSINGLECOUNTER);
- msg.m_databaseid = MyDatabaseId;
- msg.m_resettype = type;
- msg.m_objectid = objoid;
+ /* Set the reset timestamp for the whole database */
+ dbentry->stat_reset_timestamp = GetCurrentTimestamp();
 
- pgstat_send(&msg, sizeof(msg));
+ /* Remove object if it exists, ignore it if not */
+ if (type == RESET_TABLE)
+ {
+ dshash_table *t =
+ dshash_attach(area, &dsh_tblparams, dbentry->tables, 0);
+ dshash_delete_key(t, (void *) &objoid);
+ }
+
+ if (type == RESET_FUNCTION && dbentry->functions != DSM_HANDLE_INVALID)
+ {
+ dshash_table *t =
+ dshash_attach(area, &dsh_funcparams, dbentry->functions, 0);
+ dshash_delete_key(t, (void *) &objoid);
+ }
+
+ dshash_release_lock(db_stats, dbentry);
+}
+
+/*
+ * pgstat_reset_all_counters: subroutine for pgstat_reset_all
+ *
+ * Clear all counters in shared memory.
+ */
+static void
+pgstat_reset_all_counters(void)
+{
+ dshash_seq_status dshstat;
+ PgStat_StatDBEntry   *dbentry;
+
+ Assert(db_stats);
+
+ LWLockAcquire(StatsLock, LW_EXCLUSIVE);
+ dshash_seq_init(&dshstat, db_stats, false, true);
+ while ((dbentry = (PgStat_StatDBEntry *) dshash_seq_next(&dshstat)) != NULL)
+ {
+ /*
+ * We simply throw away all the database's table hashes
+ */
+ if (dbentry->tables != DSM_HANDLE_INVALID)
+ {
+ dshash_table *t =
+ dshash_attach(area, &dsh_tblparams, dbentry->tables, 0);
+ dshash_destroy(t);
+ dbentry->tables = DSM_HANDLE_INVALID;
+ }
+ if (dbentry->functions != DSM_HANDLE_INVALID)
+ {
+ dshash_table *t =
+ dshash_attach(area, &dsh_funcparams, dbentry->functions, 0);
+ dshash_destroy(t);
+ dbentry->functions = DSM_HANDLE_INVALID;
+ }
+
+ /*
+ * Reset database-level stats, too.  This creates empty hash tables
+ * for tables and functions.
+ */
+ reset_dbentry_counters(dbentry);
+ dshash_release_lock(db_stats, dbentry);
+ }
+
+ /*
+ * Reset global counters
+ */
+ memset(shared_globalStats, 0, sizeof(*shared_globalStats));
+ memset(shared_archiverStats, 0, sizeof(*shared_archiverStats));
+ shared_globalStats->stat_reset_timestamp =
+ shared_archiverStats->stat_reset_timestamp = GetCurrentTimestamp();
+
+ LWLockRelease(StatsLock);
 }
 
 /* ----------
@@ -1386,48 +1426,75 @@ pgstat_reset_single_counter(Oid objoid, PgStat_Single_Reset_Type type)
 void
 pgstat_report_autovac(Oid dboid)
 {
- PgStat_MsgAutovacStart msg;
+ PgStat_StatDBEntry *dbentry;
 
- if (pgStatSock == PGINVALID_SOCKET)
+ Assert(db_stats);
+
+ if (!pgstat_track_counts || !IsUnderPostmaster)
  return;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_AUTOVAC_START);
- msg.m_databaseid = dboid;
- msg.m_start_time = GetCurrentTimestamp();
+ /*
+ * Store the last autovacuum time in the database's hashtable entry.
+ */
+ dbentry = pgstat_get_db_entry(dboid, PGSTAT_FETCH_EXCLUSIVE, NULL);
 
- pgstat_send(&msg, sizeof(msg));
+ dbentry->last_autovac_time = GetCurrentTimestamp();
+
+ dshash_release_lock(db_stats, dbentry);
 }
 
 
 /* ---------
  * pgstat_report_vacuum() -
  *
- * Tell the collector about the table we just vacuumed.
+ * Report the table we just vacuumed.
  * ---------
  */
 void
 pgstat_report_vacuum(Oid tableoid, bool shared,
  PgStat_Counter livetuples, PgStat_Counter deadtuples)
 {
- PgStat_MsgVacuum msg;
+ Oid dboid;
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+ dshash_table *table;
 
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ Assert(db_stats);
+
+ if (!pgstat_track_counts || !IsUnderPostmaster)
  return;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_VACUUM);
- msg.m_databaseid = shared ? InvalidOid : MyDatabaseId;
- msg.m_tableoid = tableoid;
- msg.m_autovacuum = IsAutoVacuumWorkerProcess();
- msg.m_vacuumtime = GetCurrentTimestamp();
- msg.m_live_tuples = livetuples;
- msg.m_dead_tuples = deadtuples;
- pgstat_send(&msg, sizeof(msg));
+ dboid = shared ? InvalidOid : MyDatabaseId;
+
+ /*
+ * Store the data in the table's hashtable entry.
+ */
+ dbentry = pgstat_get_db_entry(dboid, PGSTAT_FETCH_EXCLUSIVE, NULL);
+ table = dshash_attach(area, &dsh_tblparams, dbentry->tables, 0);
+ tabentry = pgstat_get_tab_entry(table, tableoid, true);
+
+ tabentry->n_live_tuples = livetuples;
+ tabentry->n_dead_tuples = deadtuples;
+
+ if (IsAutoVacuumWorkerProcess())
+ {
+ tabentry->autovac_vacuum_timestamp = GetCurrentTimestamp();
+ tabentry->autovac_vacuum_count++;
+ }
+ else
+ {
+ tabentry->vacuum_timestamp = GetCurrentTimestamp();
+ tabentry->vacuum_count++;
+ }
+ dshash_release_lock(table, tabentry);
+ dshash_detach(table);
+ dshash_release_lock(db_stats, dbentry);
 }
 
 /* --------
  * pgstat_report_analyze() -
  *
- * Tell the collector about the table we just analyzed.
+ * Report about the table we just analyzed.
  *
  * Caller must provide new live- and dead-tuples estimates, as well as a
  * flag indicating whether to reset the changes_since_analyze counter.
@@ -1438,9 +1505,14 @@ pgstat_report_analyze(Relation rel,
   PgStat_Counter livetuples, PgStat_Counter deadtuples,
   bool resetcounter)
 {
- PgStat_MsgAnalyze msg;
+ Oid dboid;
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+ dshash_table *table;
 
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ Assert(db_stats);
+
+ if (!pgstat_track_counts || !IsUnderPostmaster)
  return;
 
  /*
@@ -1469,114 +1541,228 @@ pgstat_report_analyze(Relation rel,
  deadtuples = Max(deadtuples, 0);
  }
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_ANALYZE);
- msg.m_databaseid = rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId;
- msg.m_tableoid = RelationGetRelid(rel);
- msg.m_autovacuum = IsAutoVacuumWorkerProcess();
- msg.m_resetcounter = resetcounter;
- msg.m_analyzetime = GetCurrentTimestamp();
- msg.m_live_tuples = livetuples;
- msg.m_dead_tuples = deadtuples;
- pgstat_send(&msg, sizeof(msg));
+ dboid = rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId;
+
+ /*
+ * Store the data in the table's hashtable entry.
+ */
+ dbentry = pgstat_get_db_entry(dboid, PGSTAT_FETCH_EXCLUSIVE, NULL);
+
+ table = dshash_attach(area, &dsh_tblparams, dbentry->tables, 0);
+ tabentry = pgstat_get_tab_entry(table, RelationGetRelid(rel), true);
+
+ tabentry->n_live_tuples = livetuples;
+ tabentry->n_dead_tuples = deadtuples;
+
+ /*
+ * If commanded, reset changes_since_analyze to zero.  This forgets any
+ * changes that were committed while the ANALYZE was in progress, but we
+ * have no good way to estimate how many of those there were.
+ */
+ if (resetcounter)
+ tabentry->changes_since_analyze = 0;
+
+ if (IsAutoVacuumWorkerProcess())
+ {
+ tabentry->autovac_analyze_timestamp = GetCurrentTimestamp();
+ tabentry->autovac_analyze_count++;
+ }
+ else
+ {
+ tabentry->analyze_timestamp = GetCurrentTimestamp();
+ tabentry->analyze_count++;
+ }
+ dshash_release_lock(table, tabentry);
+ dshash_detach(table);
+ dshash_release_lock(db_stats, dbentry);
 }
 
 /* --------
  * pgstat_report_recovery_conflict() -
  *
- * Tell the collector about a Hot Standby recovery conflict.
+ * Report a Hot Standby recovery conflict.
  * --------
  */
+static int pending_conflict_tablespace = 0;
+static int pending_conflict_lock = 0;
+static int pending_conflict_snapshot = 0;
+static int pending_conflict_bufferpin = 0;
+static int pending_conflict_startup_deadlock = 0;
+
 void
 pgstat_report_recovery_conflict(int reason)
 {
- PgStat_MsgRecoveryConflict msg;
+ PgStat_StatDBEntry *dbentry;
+ pg_stat_table_result_status status;
 
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ Assert(db_stats);
+
+ if (!pgstat_track_counts || !IsUnderPostmaster)
  return;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RECOVERYCONFLICT);
- msg.m_databaseid = MyDatabaseId;
- msg.m_reason = reason;
- pgstat_send(&msg, sizeof(msg));
+ pgstat_pending_recoveryconflict = true;
+
+ switch (reason)
+ {
+ case PROCSIG_RECOVERY_CONFLICT_DATABASE:
+
+ /*
+ * Since we drop the information about the database as soon as it
+ * replicates, there is no point in counting these conflicts.
+ */
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
+ pending_conflict_tablespace++;
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_LOCK:
+ pending_conflict_lock++;
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+ pending_conflict_snapshot++;
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
+ pending_conflict_bufferpin++;
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
+ pending_conflict_startup_deadlock++;
+ break;
+ }
+
+ dbentry = pgstat_get_db_entry(MyDatabaseId,
+  PGSTAT_FETCH_EXCLUSIVE | PGSTAT_FETCH_NOWAIT,
+  &status);
+
+ if (status == PGSTAT_ENTRY_LOCK_FAILED)
+ return;
+
+ pgstat_cleanup_recovery_conflict(dbentry);
+
+ dshash_release_lock(db_stats, dbentry);
+}
+
+/*
+ * clean up function for pending recovery conflicts
+ */
+static void
+pgstat_cleanup_recovery_conflict(PgStat_StatDBEntry *dbentry)
+{
+ dbentry->n_conflict_tablespace += pending_conflict_tablespace;
+ dbentry->n_conflict_lock += pending_conflict_lock;
+ dbentry->n_conflict_snapshot += pending_conflict_snapshot;
+ dbentry->n_conflict_bufferpin += pending_conflict_bufferpin;
+ dbentry->n_conflict_startup_deadlock += pending_conflict_startup_deadlock;
+
+ pending_conflict_tablespace = 0;
+ pending_conflict_lock = 0;
+ pending_conflict_snapshot = 0;
+ pending_conflict_bufferpin = 0;
+ pending_conflict_startup_deadlock = 0;
+
+ pgstat_pending_recoveryconflict = false;
 }
 
 /* --------
  * pgstat_report_deadlock() -
  *
- * Tell the collector about a deadlock detected.
+ * Report a deadlock detected.
  * --------
  */
+static int pending_deadlocks = 0;
+
 void
 pgstat_report_deadlock(void)
 {
- PgStat_MsgDeadlock msg;
+ PgStat_StatDBEntry *dbentry;
+ pg_stat_table_result_status status;
 
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ Assert(db_stats);
+
+ if (!pgstat_track_counts || !IsUnderPostmaster)
  return;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_DEADLOCK);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, sizeof(msg));
+ pending_deadlocks++;
+ pgstat_pending_deadlock = true;
+
+ dbentry = pgstat_get_db_entry(MyDatabaseId,
+  PGSTAT_FETCH_EXCLUSIVE | PGSTAT_FETCH_NOWAIT,
+  &status);
+
+ if (status == PGSTAT_ENTRY_LOCK_FAILED)
+ return;
+
+ pgstat_cleanup_deadlock(dbentry);
+
+ dshash_release_lock(db_stats, dbentry);
+}
+
+/*
+ * Flush the pending deadlock count into the database entry
+ */
+static void
+pgstat_cleanup_deadlock(PgStat_StatDBEntry *dbentry)
+{
+ dbentry->n_deadlocks += pending_deadlocks;
+ pending_deadlocks = 0;
+ pgstat_pending_deadlock = false;
 }
 
 /* --------
  * pgstat_report_tempfile() -
  *
- * Tell the collector about a temporary file.
+ * Report a temporary file.
  * --------
  */
+static size_t pending_filesize = 0;
+static size_t pending_files = 0;
+
 void
 pgstat_report_tempfile(size_t filesize)
 {
- PgStat_MsgTempFile msg;
+ PgStat_StatDBEntry *dbentry;
+ pg_stat_table_result_status status;
 
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ Assert(db_stats);
+
+ if (!pgstat_track_counts || !IsUnderPostmaster)
  return;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TEMPFILE);
- msg.m_databaseid = MyDatabaseId;
- msg.m_filesize = filesize;
- pgstat_send(&msg, sizeof(msg));
-}
+ if (filesize > 0) /* Isn't there a case where filesize is really 0? */
+ {
+ pgstat_pending_tempfile = true;
+ pending_filesize += filesize; /* needs overflow check */
+ pending_files++;
+ }
 
-
-/* ----------
- * pgstat_ping() -
- *
- * Send some junk data to the collector to increase traffic.
- * ----------
- */
-void
-pgstat_ping(void)
-{
- PgStat_MsgDummy msg;
-
- if (pgStatSock == PGINVALID_SOCKET)
+ if (!pgstat_pending_tempfile)
  return;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_DUMMY);
- pgstat_send(&msg, sizeof(msg));
+ dbentry = pgstat_get_db_entry(MyDatabaseId,
+  PGSTAT_FETCH_EXCLUSIVE | PGSTAT_FETCH_NOWAIT,
+  &status);
+
+ if (status == PGSTAT_ENTRY_LOCK_FAILED)
+ return;
+
+ pgstat_cleanup_tempfile(dbentry);
+
+ dshash_release_lock(db_stats, dbentry);
 }
 
-/* ----------
- * pgstat_send_inquiry() -
- *
- * Notify collector that we need fresh data.
- * ----------
+/*
+ * Flush pending temporary-file counts into the database entry
  */
 static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
+pgstat_cleanup_tempfile(PgStat_StatDBEntry *dbentry)
 {
- PgStat_MsgInquiry msg;
 
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
- msg.clock_time = clock_time;
- msg.cutoff_time = cutoff_time;
- msg.databaseid = databaseid;
- pgstat_send(&msg, sizeof(msg));
+ dbentry->n_temp_bytes += pending_filesize;
+ dbentry->n_temp_files += pending_files;
+ pending_filesize = 0;
+ pending_files = 0;
+ pgstat_pending_tempfile = false;
 }
 
-
 /*
  * Initialize function call usage data.
  * Called by the executor before invoking a function.
@@ -1692,9 +1878,6 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, bool finalize)
  fs->f_numcalls++;
  fs->f_total_time = f_total;
  INSTR_TIME_ADD(fs->f_self_time, f_self);
-
- /* indicate that we have something to send */
- have_function_stats = true;
 }
 
 
@@ -1716,6 +1899,15 @@ pgstat_initstats(Relation rel)
  Oid rel_id = rel->rd_id;
  char relkind = rel->rd_rel->relkind;
 
+ Assert(db_stats);
+
+ if (!pgstat_track_counts || !IsUnderPostmaster)
+ {
+ /* We're not counting at all */
+ rel->pgstat_info = NULL;
+ return;
+ }
+
  /* We only count stats for things that have storage */
  if (!(relkind == RELKIND_RELATION ||
   relkind == RELKIND_MATVIEW ||
@@ -1727,13 +1919,6 @@ pgstat_initstats(Relation rel)
  return;
  }
 
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
- {
- /* We're not counting at all */
- rel->pgstat_info = NULL;
- return;
- }
-
  /*
  * If we already set up this relation in the current transaction, nothing
  * to do.
@@ -2377,34 +2562,6 @@ pgstat_twophase_postabort(TransactionId xid, uint16 info,
  rec->tuples_inserted + rec->tuples_updated;
 }
 
-
-/* ----------
- * pgstat_fetch_stat_dbentry() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * the collected statistics for one database or NULL. NULL doesn't mean
- * that the database doesn't exist, it is just not yet known by the
- * collector, so the caller is better off to report ZERO instead.
- * ----------
- */
-PgStat_StatDBEntry *
-pgstat_fetch_stat_dbentry(Oid dbid)
-{
- /*
- * If not done for this transaction, read the statistics collector stats
- * file into some hash tables.
- */
- backend_read_statsfile();
-
- /*
- * Lookup the requested database; return NULL if not found
- */
- return (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
-  (void *) &dbid,
-  HASH_FIND, NULL);
-}
-
-
 /* ----------
  * pgstat_fetch_stat_tabentry() -
  *
@@ -2417,47 +2574,28 @@ pgstat_fetch_stat_dbentry(Oid dbid)
 PgStat_StatTabEntry *
 pgstat_fetch_stat_tabentry(Oid relid)
 {
- Oid dbid;
  PgStat_StatDBEntry *dbentry;
  PgStat_StatTabEntry *tabentry;
 
- /*
- * If not done for this transaction, read the statistics collector stats
- * file into some hash tables.
- */
- backend_read_statsfile();
+ /* Lookup our database, then look in its table hash table. */
+ dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId, false);
+ if (dbentry == NULL)
+ return NULL;
 
- /*
- * Lookup our database, then look in its table hash table.
- */
- dbid = MyDatabaseId;
- dbentry = (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
- (void *) &dbid,
- HASH_FIND, NULL);
- if (dbentry != NULL && dbentry->tables != NULL)
- {
- tabentry = (PgStat_StatTabEntry *) hash_search(dbentry->tables,
-   (void *) &relid,
-   HASH_FIND, NULL);
- if (tabentry)
- return tabentry;
- }
+ tabentry = backend_get_tab_entry(dbentry, relid, false);
+ if (tabentry != NULL)
+ return tabentry;
 
  /*
  * If we didn't find it, maybe it's a shared table.
  */
- dbid = InvalidOid;
- dbentry = (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
- (void *) &dbid,
- HASH_FIND, NULL);
- if (dbentry != NULL && dbentry->tables != NULL)
- {
- tabentry = (PgStat_StatTabEntry *) hash_search(dbentry->tables,
-   (void *) &relid,
-   HASH_FIND, NULL);
- if (tabentry)
- return tabentry;
- }
+ dbentry = pgstat_fetch_stat_dbentry(InvalidOid, false);
+ if (dbentry == NULL)
+ return NULL;
+
+ tabentry = backend_get_tab_entry(dbentry, relid, false);
+ if (tabentry != NULL)
+ return tabentry;
 
  return NULL;
 }
@@ -2476,18 +2614,14 @@ pgstat_fetch_stat_funcentry(Oid func_id)
  PgStat_StatDBEntry *dbentry;
  PgStat_StatFuncEntry *funcentry = NULL;
 
- /* load the stats file if needed */
- backend_read_statsfile();
+ /* Lookup our database, then find the requested function */
+ dbentry = pgstat_get_db_entry(MyDatabaseId, PGSTAT_FETCH_SHARED, NULL);
+ if (dbentry == NULL)
+ return NULL;
 
- /* Lookup our database, then find the requested function.  */
- dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
- if (dbentry != NULL && dbentry->functions != NULL)
- {
- funcentry = (PgStat_StatFuncEntry *) hash_search(dbentry->functions,
- (void *) &func_id,
- HASH_FIND, NULL);
- }
+ funcentry = backend_get_func_entry(dbentry, func_id, false);
 
+ dshash_release_lock(db_stats, dbentry);
  return funcentry;
 }
 
@@ -2562,9 +2696,11 @@ pgstat_fetch_stat_numbackends(void)
 PgStat_ArchiverStats *
 pgstat_fetch_stat_archiver(void)
 {
- backend_read_statsfile();
+ /* If not done for this transaction, take a stats snapshot */
+ if (!backend_snapshot_global_stats())
+ return NULL;
 
- return &archiverStats;
+ return snapshot_archiverStats;
 }
 
 
@@ -2579,9 +2715,11 @@ pgstat_fetch_stat_archiver(void)
 PgStat_GlobalStats *
 pgstat_fetch_global(void)
 {
- backend_read_statsfile();
+ /* If not done for this transaction, take a stats snapshot */
+ if (!backend_snapshot_global_stats())
+ return NULL;
 
- return &globalStats;
+ return snapshot_globalStats;
 }
 
 
@@ -2771,7 +2909,7 @@ pgstat_initialize(void)
  }
 
  /* Set up a process-exit hook to clean up */
- on_shmem_exit(pgstat_beshutdown_hook, 0);
+ before_shmem_exit(pgstat_beshutdown_hook, 0);
 }
 
 /* ----------
@@ -2963,7 +3101,7 @@ pgstat_beshutdown_hook(int code, Datum arg)
  * during failed backend starts might never get counted.)
  */
  if (OidIsValid(MyDatabaseId))
- pgstat_report_stat(true);
+ pgstat_update_stat(true);
 
  /*
  * Clear my status entry, following the protocol of bumping st_changecount
@@ -3230,7 +3368,8 @@ pgstat_read_current_status(void)
 #endif
  int i;
 
- Assert(!pgStatRunningInCollector);
+ Assert(IsUnderPostmaster);
+
  if (localBackendStatusTable)
  return; /* already done */
 
@@ -4150,96 +4289,68 @@ pgstat_get_backend_desc(BackendType backendType)
  * ------------------------------------------------------------
  */
 
-
 /* ----------
- * pgstat_setheader() -
+ * pgstat_update_archiver() -
  *
- * Set common header fields in a statistics message
+ * Update the archiver statistics for the WAL file that we successfully
+ * archived or failed to archive.
  * ----------
  */
-static void
-pgstat_setheader(PgStat_MsgHdr *hdr, StatMsgType mtype)
+void
+pgstat_update_archiver(const char *xlog, bool failed)
 {
- hdr->m_type = mtype;
-}
-
-
-/* ----------
- * pgstat_send() -
- *
- * Send out one statistics message to the collector
- * ----------
- */
-static void
-pgstat_send(void *msg, int len)
-{
- int rc;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- ((PgStat_MsgHdr *) msg)->m_size = len;
-
- /* We'll retry after EINTR, but ignore all other failures */
- do
+ if (failed)
  {
- rc = send(pgStatSock, msg, len, 0);
- } while (rc < 0 && errno == EINTR);
-
-#ifdef USE_ASSERT_CHECKING
- /* In debug builds, log send failures ... */
- if (rc < 0)
- elog(LOG, "could not send to statistics collector: %m");
-#endif
+ /* Failed archival attempt */
+ ++shared_archiverStats->failed_count;
+ memcpy(shared_archiverStats->last_failed_wal, xlog,
+   sizeof(shared_archiverStats->last_failed_wal));
+ shared_archiverStats->last_failed_timestamp = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Successful archival operation */
+ ++shared_archiverStats->archived_count;
+ memcpy(shared_archiverStats->last_archived_wal, xlog,
+   sizeof(shared_archiverStats->last_archived_wal));
+ shared_archiverStats->last_archived_timestamp = GetCurrentTimestamp();
+ }
 }
 
 /* ----------
- * pgstat_send_archiver() -
+ * pgstat_update_bgwriter() -
  *
- * Tell the collector about the WAL file that we successfully
- * archived or failed to archive.
+ * Update bgwriter statistics
  * ----------
  */
 void
-pgstat_send_archiver(const char *xlog, bool failed)
-{
- PgStat_MsgArchiver msg;
-
- /*
- * Prepare and send the message
- */
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_ARCHIVER);
- msg.m_failed = failed;
- StrNCpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));
- msg.m_timestamp = GetCurrentTimestamp();
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* ----------
- * pgstat_send_bgwriter() -
- *
- * Send bgwriter statistics to the collector
- * ----------
- */
-void
-pgstat_send_bgwriter(void)
+pgstat_update_bgwriter(void)
 {
  /* We assume this initializes to zeroes */
- static const PgStat_MsgBgWriter all_zeroes;
+ static const PgStat_BgWriter all_zeroes;
+
+ PgStat_BgWriter *s = &BgWriterStats;
 
  /*
  * This function can be called even if nothing at all has happened. In
  * this case, avoid sending a completely empty message to the stats
  * collector.
  */
- if (memcmp(&BgWriterStats, &all_zeroes, sizeof(PgStat_MsgBgWriter)) == 0)
+ if (memcmp(&BgWriterStats, &all_zeroes, sizeof(PgStat_BgWriter)) == 0)
  return;
 
- /*
- * Prepare and send the message
- */
- pgstat_setheader(&BgWriterStats.m_hdr, PGSTAT_MTYPE_BGWRITER);
- pgstat_send(&BgWriterStats, sizeof(BgWriterStats));
+ LWLockAcquire(StatsLock, LW_EXCLUSIVE);
+ shared_globalStats->timed_checkpoints += s->timed_checkpoints;
+ shared_globalStats->requested_checkpoints += s->requested_checkpoints;
+ shared_globalStats->checkpoint_write_time += s->checkpoint_write_time;
+ shared_globalStats->checkpoint_sync_time += s->checkpoint_sync_time;
+ shared_globalStats->buf_written_checkpoints += s->buf_written_checkpoints;
+ shared_globalStats->buf_written_clean += s->buf_written_clean;
+ shared_globalStats->maxwritten_clean += s->maxwritten_clean;
+ shared_globalStats->buf_written_backend += s->buf_written_backend;
+ shared_globalStats->buf_fsync_backend += s->buf_fsync_backend;
+ shared_globalStats->buf_alloc += s->buf_alloc;
+ LWLockRelease(StatsLock);
 
  /*
  * Clear out the statistics buffer, so it can be re-used.
@@ -4247,299 +4358,15 @@ pgstat_send_bgwriter(void)
  MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
-
-/* ----------
- * PgstatCollectorMain() -
- *
- * Start up the statistics collector process.  This is the body of the
- * postmaster child process.
- *
- * The argc/argv parameters are valid only in EXEC_BACKEND case.
- * ----------
- */
-NON_EXEC_STATIC void
-PgstatCollectorMain(int argc, char *argv[])
-{
- int len;
- PgStat_Msg msg;
- int wr;
-
- /*
- * Ignore all signals usually bound to some action in the postmaster,
- * except SIGHUP and SIGQUIT.  Note we don't need a SIGUSR1 handler to
- * support latch operations, because we only use a local latch.
- */
- pqsignal(SIGHUP, pgstat_sighup_handler);
- pqsignal(SIGINT, SIG_IGN);
- pqsignal(SIGTERM, SIG_IGN);
- pqsignal(SIGQUIT, pgstat_exit);
- pqsignal(SIGALRM, SIG_IGN);
- pqsignal(SIGPIPE, SIG_IGN);
- pqsignal(SIGUSR1, SIG_IGN);
- pqsignal(SIGUSR2, SIG_IGN);
- /* Reset some signals that are accepted by postmaster but not here */
- pqsignal(SIGCHLD, SIG_DFL);
- PG_SETMASK(&UnBlockSig);
-
- /*
- * Identify myself via ps
- */
- init_ps_display("stats collector", "", "", "");
-
- /*
- * Read in existing stats files or initialize the stats to zero.
- */
- pgStatRunningInCollector = true;
- pgStatDBHash = pgstat_read_statsfiles(InvalidOid, true, true);
-
- /*
- * Loop to process messages until we get SIGQUIT or detect ungraceful
- * death of our parent postmaster.
- *
- * For performance reasons, we don't want to do ResetLatch/WaitLatch after
- * every message; instead, do that only after a recv() fails to obtain a
- * message.  (This effectively means that if backends are sending us stuff
- * like mad, we won't notice postmaster death until things slack off a
- * bit; which seems fine.) To do that, we have an inner loop that
- * iterates as long as recv() succeeds.  We do recognize got_SIGHUP inside
- * the inner loop, which means that such interrupts will get serviced but
- * the latch won't get cleared until next time there is a break in the
- * action.
- */
- for (;;)
- {
- /* Clear any already-pending wakeups */
- ResetLatch(MyLatch);
-
- /*
- * Quit if we get SIGQUIT from the postmaster.
- */
- if (need_exit)
- break;
-
- /*
- * Inner loop iterates as long as we keep getting messages, or until
- * need_exit becomes set.
- */
- while (!need_exit)
- {
- /*
- * Reload configuration if we got SIGHUP from the postmaster.
- */
- if (got_SIGHUP)
- {
- got_SIGHUP = false;
- ProcessConfigFile(PGC_SIGHUP);
- }
-
- /*
- * Write the stats file(s) if a new request has arrived that is
- * not satisfied by existing file(s).
- */
- if (pgstat_write_statsfile_needed())
- pgstat_write_statsfiles(false, false);
-
- /*
- * Try to receive and process a message.  This will not block,
- * since the socket is set to non-blocking mode.
- *
- * XXX On Windows, we have to force pgwin32_recv to cooperate,
- * despite the previous use of pg_set_noblock() on the socket.
- * This is extremely broken and should be fixed someday.
- */
-#ifdef WIN32
- pgwin32_noblock = 1;
-#endif
-
- len = recv(pgStatSock, (char *) &msg,
-   sizeof(PgStat_Msg), 0);
-
-#ifdef WIN32
- pgwin32_noblock = 0;
-#endif
-
- if (len < 0)
- {
- if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
- break; /* out of inner loop */
- ereport(ERROR,
- (errcode_for_socket_access(),
- errmsg("could not read statistics message: %m")));
- }
-
- /*
- * We ignore messages that are smaller than our common header
- */
- if (len < sizeof(PgStat_MsgHdr))
- continue;
-
- /*
- * The received length must match the length in the header
- */
- if (msg.msg_hdr.m_size != len)
- continue;
-
- /*
- * O.K. - we accept this message.  Process it.
- */
- switch (msg.msg_hdr.m_type)
- {
- case PGSTAT_MTYPE_DUMMY:
- break;
-
- case PGSTAT_MTYPE_INQUIRY:
- pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_TABSTAT:
- pgstat_recv_tabstat((PgStat_MsgTabstat *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_TABPURGE:
- pgstat_recv_tabpurge((PgStat_MsgTabpurge *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_DROPDB:
- pgstat_recv_dropdb((PgStat_MsgDropdb *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_RESETCOUNTER:
- pgstat_recv_resetcounter((PgStat_MsgResetcounter *) &msg,
- len);
- break;
-
- case PGSTAT_MTYPE_RESETSHAREDCOUNTER:
- pgstat_recv_resetsharedcounter(
-   (PgStat_MsgResetsharedcounter *) &msg,
-   len);
- break;
-
- case PGSTAT_MTYPE_RESETSINGLECOUNTER:
- pgstat_recv_resetsinglecounter(
-   (PgStat_MsgResetsinglecounter *) &msg,
-   len);
- break;
-
- case PGSTAT_MTYPE_AUTOVAC_START:
- pgstat_recv_autovac((PgStat_MsgAutovacStart *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_VACUUM:
- pgstat_recv_vacuum((PgStat_MsgVacuum *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_ANALYZE:
- pgstat_recv_analyze((PgStat_MsgAnalyze *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_ARCHIVER:
- pgstat_recv_archiver((PgStat_MsgArchiver *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_BGWRITER:
- pgstat_recv_bgwriter((PgStat_MsgBgWriter *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_FUNCSTAT:
- pgstat_recv_funcstat((PgStat_MsgFuncstat *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_FUNCPURGE:
- pgstat_recv_funcpurge((PgStat_MsgFuncpurge *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_RECOVERYCONFLICT:
- pgstat_recv_recoveryconflict((PgStat_MsgRecoveryConflict *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_DEADLOCK:
- pgstat_recv_deadlock((PgStat_MsgDeadlock *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_TEMPFILE:
- pgstat_recv_tempfile((PgStat_MsgTempFile *) &msg, len);
- break;
-
- default:
- break;
- }
- } /* end of inner message-processing loop */
-
- /* Sleep until there's something to do */
-#ifndef WIN32
- wr = WaitLatchOrSocket(MyLatch,
-   WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_SOCKET_READABLE,
-   pgStatSock, -1L,
-   WAIT_EVENT_PGSTAT_MAIN);
-#else
-
- /*
- * Windows, at least in its Windows Server 2003 R2 incarnation,
- * sometimes loses FD_READ events.  Waking up and retrying the recv()
- * fixes that, so don't sleep indefinitely.  This is a crock of the
- * first water, but until somebody wants to debug exactly what's
- * happening there, this is the best we can do.  The two-second
- * timeout matches our pre-9.2 behavior, and needs to be short enough
- * to not provoke "using stale statistics" complaints from
- * backend_read_statsfile.
- */
- wr = WaitLatchOrSocket(MyLatch,
-   WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_SOCKET_READABLE | WL_TIMEOUT,
-   pgStatSock,
-   2 * 1000L /* msec */ ,
-   WAIT_EVENT_PGSTAT_MAIN);
-#endif
-
- /*
- * Emergency bailout if postmaster has died.  This is to avoid the
- * necessity for manual cleanup of all postmaster children.
- */
- if (wr & WL_POSTMASTER_DEATH)
- break;
- } /* end of outer loop */
-
- /*
- * Save the final stats to reuse at next startup.
- */
- pgstat_write_statsfiles(true, true);
-
- exit(0);
-}
-
-
-/* SIGQUIT signal handler for collector process */
-static void
-pgstat_exit(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- need_exit = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
-/* SIGHUP handler for collector process */
-static void
-pgstat_sighup_handler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
 /*
- * Subroutine to clear stats in a database entry
+ * Subroutine to reset stats in a shared database entry
  *
  * Tables and functions hashes are initialized to empty.
  */
 static void
 reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
 {
- HASHCTL hash_ctl;
+ dshash_table *tbl;
 
  dbentry->n_xact_commit = 0;
  dbentry->n_xact_rollback = 0;
@@ -4565,20 +4392,17 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
  dbentry->stat_reset_timestamp = GetCurrentTimestamp();
  dbentry->stats_timestamp = 0;
 
- memset(&hash_ctl, 0, sizeof(hash_ctl));
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
- dbentry->tables = hash_create("Per-database table",
-  PGSTAT_TAB_HASH_SIZE,
-  &hash_ctl,
-  HASH_ELEM | HASH_BLOBS);
 
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatFuncEntry);
- dbentry->functions = hash_create("Per-database function",
- PGSTAT_FUNCTION_HASH_SIZE,
- &hash_ctl,
- HASH_ELEM | HASH_BLOBS);
+ Assert(dbentry->tables == DSM_HANDLE_INVALID);
+ tbl = dshash_create(area, &dsh_tblparams, 0);
+ dbentry->tables = dshash_get_hash_table_handle(tbl);
+ dshash_detach(tbl);
+
+ Assert(dbentry->functions == DSM_HANDLE_INVALID);
+ /* the function hash is created on demand */
+
+ dbentry->snapshot_tables = NULL;
+ dbentry->snapshot_functions = NULL;
 }
 
 /*
@@ -4587,47 +4411,76 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
  * Else, return NULL.
  */
 static PgStat_StatDBEntry *
-pgstat_get_db_entry(Oid databaseid, bool create)
+pgstat_get_db_entry(Oid databaseid, int op, pg_stat_table_result_status *status)
 {
  PgStat_StatDBEntry *result;
- bool found;
- HASHACTION action = (create ? HASH_ENTER : HASH_FIND);
+ bool nowait = ((op & PGSTAT_FETCH_NOWAIT) != 0);
+ bool lock_acquired = true;
+ bool found = true;
 
- /* Lookup or create the hash table entry for this database */
- result = (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
- &databaseid,
- action, &found);
-
- if (!create && !found)
+ if (!IsUnderPostmaster)
  return NULL;
 
- /*
- * If not found, initialize the new one.  This creates empty hash tables
- * for tables and functions, too.
- */
- if (!found)
- reset_dbentry_counters(result);
+ /* Lookup or create the hash table entry for this database */
+ if (op & PGSTAT_FETCH_EXCLUSIVE)
+ {
+ result = (PgStat_StatDBEntry *)
+ dshash_find_or_insert_extended(db_stats, &databaseid,
+   &found, nowait);
+ if (result == NULL)
+ lock_acquired = false;
+ else if (!found)
+ {
+ /*
+ * If not found, initialize the new one.  This creates empty hash
+ * tables for tables and functions, too.
+ */
+ reset_dbentry_counters(result);
+ }
+ }
+ else
+ {
+ result = (PgStat_StatDBEntry *)
+ dshash_find_extended(db_stats, &databaseid, true, nowait,
+ &lock_acquired);
+ if (result == NULL)
+ found = false;
+ }
+
+ /* Set return status if requested */
+ if (status)
+ {
+ if (!lock_acquired)
+ {
+ Assert(nowait);
+ *status = PGSTAT_ENTRY_LOCK_FAILED;
+ }
+ else if (!found)
+ *status = PGSTAT_ENTRY_NOT_FOUND;
+ else
+ *status = PGSTAT_ENTRY_FOUND;
+ }
 
  return result;
 }
 
-
 /*
  * Lookup the hash table entry for the specified table. If no hash
  * table entry exists, initialize it, if the create parameter is true.
  * Else, return NULL.
  */
 static PgStat_StatTabEntry *
-pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
+pgstat_get_tab_entry(dshash_table *table, Oid tableoid, bool create)
 {
  PgStat_StatTabEntry *result;
  bool found;
- HASHACTION action = (create ? HASH_ENTER : HASH_FIND);
 
  /* Lookup or create the hash table entry for this table */
- result = (PgStat_StatTabEntry *) hash_search(dbentry->tables,
- &tableoid,
- action, &found);
+ if (create)
+ result = (PgStat_StatTabEntry *)
+ dshash_find_or_insert(table, &tableoid, &found);
+ else
+ result = (PgStat_StatTabEntry *) dshash_find(table, &tableoid, false);
 
  if (!create && !found)
  return NULL;
@@ -4663,29 +4516,23 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
 
 /* ----------
  * pgstat_write_statsfiles() -
- * Write the global statistics file, as well as requested DB files.
- *
- * 'permanent' specifies writing to the permanent files not temporary ones.
- * When true (happens only when the collector is shutting down), also remove
- * the temporary files so that backends starting up under a new postmaster
- * can't read old data before the new collector is ready.
- *
- * When 'allDbs' is false, only the requested databases (listed in
- * pending_write_requests) will be written; otherwise, all databases
- * will be written.
+ * Write the global statistics file, as well as DB files.
  * ----------
  */
-static void
-pgstat_write_statsfiles(bool permanent, bool allDbs)
+void
+pgstat_write_statsfiles(void)
 {
- HASH_SEQ_STATUS hstat;
+ dshash_seq_status hstat;
  PgStat_StatDBEntry *dbentry;
  FILE   *fpout;
  int32 format_id;
- const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
- const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+ const char *tmpfile = PGSTAT_STAT_PERMANENT_TMPFILE;
+ const char *statfile = PGSTAT_STAT_PERMANENT_FILENAME;
  int rc;
 
+ /* should be called only from the postmaster */
+ Assert(!IsUnderPostmaster);
+
  elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
  /*
@@ -4704,7 +4551,7 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
  /*
  * Set the timestamp of the stats file.
  */
- globalStats.stats_timestamp = GetCurrentTimestamp();
+ shared_globalStats->stats_timestamp = GetCurrentTimestamp();
 
  /*
  * Write the file header --- currently just a format ID.
@@ -4716,32 +4563,29 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
  /*
  * Write global stats struct
  */
- rc = fwrite(&globalStats, sizeof(globalStats), 1, fpout);
+ rc = fwrite(shared_globalStats, sizeof(*shared_globalStats), 1, fpout);
  (void) rc; /* we'll check for error with ferror */
 
  /*
  * Write archiver stats struct
  */
- rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
+ rc = fwrite(shared_archiverStats, sizeof(*shared_archiverStats), 1, fpout);
  (void) rc; /* we'll check for error with ferror */
 
  /*
  * Walk through the database table.
  */
- hash_seq_init(&hstat, pgStatDBHash);
- while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
+ dshash_seq_init(&hstat, db_stats, false, false);
+ while ((dbentry = (PgStat_StatDBEntry *) dshash_seq_next(&hstat)) != NULL)
  {
  /*
  * Write out the table and function stats for this DB into the
  * appropriate per-DB stat file, if required.
  */
- if (allDbs || pgstat_db_requested(dbentry->databaseid))
- {
- /* Make DB's timestamp consistent with the global stats */
- dbentry->stats_timestamp = globalStats.stats_timestamp;
+ /* Make DB's timestamp consistent with the global stats */
+ dbentry->stats_timestamp = shared_globalStats->stats_timestamp;
 
- pgstat_write_db_statsfile(dbentry, permanent);
- }
+ pgstat_write_db_statsfile(dbentry);
 
  /*
  * Write out the DB entry. We don't write the tables or functions
@@ -4784,16 +4628,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
  tmpfile, statfile)));
  unlink(tmpfile);
  }
-
- if (permanent)
- unlink(pgstat_stat_filename);
-
- /*
- * Now throw away the list of requests.  Note that requests sent after we
- * started the write are still waiting on the network socket.
- */
- list_free(pending_write_requests);
- pending_write_requests = NIL;
 }
 
 /*
@@ -4801,15 +4635,14 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
  * of length len.
  */
 static void
-get_dbstat_filename(bool permanent, bool tempname, Oid databaseid,
+get_dbstat_filename(bool tempname, Oid databaseid,
  char *filename, int len)
 {
  int printed;
 
  /* NB -- pgstat_reset_remove_files knows about the pattern this uses */
  printed = snprintf(filename, len, "%s/db_%u.%s",
-   permanent ? PGSTAT_STAT_PERMANENT_DIRECTORY :
-   pgstat_stat_directory,
+   PGSTAT_STAT_PERMANENT_DIRECTORY,
    databaseid,
    tempname ? "tmp" : "stat");
  if (printed >= len)
@@ -4827,10 +4660,10 @@ get_dbstat_filename(bool permanent, bool tempname, Oid databaseid,
  * ----------
  */
 static void
-pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent)
+pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry)
 {
- HASH_SEQ_STATUS tstat;
- HASH_SEQ_STATUS fstat;
+ dshash_seq_status tstat;
+ dshash_seq_status fstat;
  PgStat_StatTabEntry *tabentry;
  PgStat_StatFuncEntry *funcentry;
  FILE   *fpout;
@@ -4839,9 +4672,10 @@ pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent)
  int rc;
  char tmpfile[MAXPGPATH];
  char statfile[MAXPGPATH];
+ dshash_table *tbl;
 
- get_dbstat_filename(permanent, true, dbid, tmpfile, MAXPGPATH);
- get_dbstat_filename(permanent, false, dbid, statfile, MAXPGPATH);
+ get_dbstat_filename(true, dbid, tmpfile, MAXPGPATH);
+ get_dbstat_filename(false, dbid, statfile, MAXPGPATH);
 
  elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -4868,23 +4702,30 @@ pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent)
  /*
  * Walk through the database's access stats per table.
  */
- hash_seq_init(&tstat, dbentry->tables);
- while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+ tbl = dshash_attach(area, &dsh_tblparams, dbentry->tables, 0);
+ dshash_seq_init(&tstat, tbl, false, false);
+ while ((tabentry = (PgStat_StatTabEntry *) dshash_seq_next(&tstat)) != NULL)
  {
  fputc('T', fpout);
  rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
  (void) rc; /* we'll check for error with ferror */
  }
+ dshash_detach(tbl);
 
  /*
  * Walk through the database's function stats table.
  */
- hash_seq_init(&fstat, dbentry->functions);
- while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+ if (dbentry->functions != DSM_HANDLE_INVALID)
  {
- fputc('F', fpout);
- rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
- (void) rc; /* we'll check for error with ferror */
+ tbl = dshash_attach(area, &dsh_funcparams, dbentry->functions, 0);
+ dshash_seq_init(&fstat, tbl, false, false);
+ while ((funcentry = (PgStat_StatFuncEntry *) dshash_seq_next(&fstat)) != NULL)
+ {
+ fputc('F', fpout);
+ rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+ (void) rc; /* we'll check for error with ferror */
+ }
+ dshash_detach(tbl);
  }
 
  /*
@@ -4919,47 +4760,30 @@ pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent)
  tmpfile, statfile)));
  unlink(tmpfile);
  }
-
- if (permanent)
- {
- get_dbstat_filename(false, false, dbid, statfile, MAXPGPATH);
-
- elog(DEBUG2, "removing temporary stats file \"%s\"", statfile);
- unlink(statfile);
- }
 }
 
 /* ----------
  * pgstat_read_statsfiles() -
  *
- * Reads in some existing statistics collector files and returns the
- * databases hash table that is the top level of the data.
+ * Reads any existing statistics collector files into the shared stats
+ * hash.
  *
- * If 'onlydb' is not InvalidOid, it means we only want data for that DB
- * plus the shared catalogs ("DB 0").  We'll still populate the DB hash
- * table for all databases, but we don't bother even creating table/function
- * hash tables for other databases.
- *
- * 'permanent' specifies reading from the permanent files not temporary ones.
- * When true (happens only when the collector is starting up), remove the
- * files after reading; the in-memory status is now authoritative, and the
- * files would be out of date in case somebody else reads them.
- *
- * If a 'deep' read is requested, table/function stats are read, otherwise
- * the table/function hash tables remain empty.
  * ----------
  */
-static HTAB *
-pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
+void
+pgstat_read_statsfiles(void)
 {
  PgStat_StatDBEntry *dbentry;
  PgStat_StatDBEntry dbbuf;
- HASHCTL hash_ctl;
- HTAB   *dbhash;
  FILE   *fpin;
  int32 format_id;
  bool found;
- const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+ const char *statfile = PGSTAT_STAT_PERMANENT_FILENAME;
+ dshash_table *tblstats = NULL;
+ dshash_table *funcstats = NULL;
+
+ /* should be called only from the postmaster */
+ Assert(!IsUnderPostmaster);
 
  /*
  * The tables will live in pgStatLocalContext.
@@ -4967,28 +4791,18 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  pgstat_setup_memcxt();
 
  /*
- * Create the DB hashtable
+ * Create the DB hashtable and global stats area
  */
- memset(&hash_ctl, 0, sizeof(hash_ctl));
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatDBEntry);
- hash_ctl.hcxt = pgStatLocalContext;
- dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
- /*
- * Clear out global and archiver statistics so they start from zero in
- * case we can't load an existing statsfile.
- */
- memset(&globalStats, 0, sizeof(globalStats));
- memset(&archiverStats, 0, sizeof(archiverStats));
+ /* Hold lock so that no other process sees empty stats */
+ LWLockAcquire(StatsLock, LW_EXCLUSIVE);
+ pgstat_create_shared_stats();
 
  /*
  * Set the current timestamp (will be kept only in case we can't load an
  * existing statsfile).
  */
- globalStats.stat_reset_timestamp = GetCurrentTimestamp();
- archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
+ shared_globalStats->stat_reset_timestamp = GetCurrentTimestamp();
+ shared_archiverStats->stat_reset_timestamp = shared_globalStats->stat_reset_timestamp;
 
  /*
  * Try to open the stats file. If it doesn't exist, the backends simply
@@ -5002,11 +4816,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
  {
  if (errno != ENOENT)
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errcode_for_file_access(),
  errmsg("could not open statistics file \"%s\": %m",
  statfile)));
- return dbhash;
+ LWLockRelease(StatsLock);
+ return;
  }
 
  /*
@@ -5015,7 +4830,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
  format_id != PGSTAT_FILE_FORMAT_ID)
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"", statfile)));
  goto done;
  }
@@ -5023,11 +4838,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  /*
  * Read global stats struct
  */
- if (fread(&globalStats, 1, sizeof(globalStats), fpin) != sizeof(globalStats))
+ if (fread(shared_globalStats, 1, sizeof(*shared_globalStats), fpin) !=
+ sizeof(*shared_globalStats))
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"", statfile)));
- memset(&globalStats, 0, sizeof(globalStats));
+ memset(shared_globalStats, 0, sizeof(*shared_globalStats));
  goto done;
  }
 
@@ -5038,17 +4854,17 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  * file's timestamp is less than PGSTAT_STAT_INTERVAL ago, but that's not
  * an unusual scenario.
  */
- if (pgStatRunningInCollector)
- globalStats.stats_timestamp = 0;
+ shared_globalStats->stats_timestamp = 0;
 
  /*
  * Read archiver stats struct
  */
- if (fread(&archiverStats, 1, sizeof(archiverStats), fpin) != sizeof(archiverStats))
+ if (fread(shared_archiverStats, 1, sizeof(*shared_archiverStats), fpin) !=
+ sizeof(*shared_archiverStats))
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"", statfile)));
- memset(&archiverStats, 0, sizeof(archiverStats));
+ memset(shared_archiverStats, 0, sizeof(*shared_archiverStats));
  goto done;
  }
 
@@ -5068,7 +4884,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  if (fread(&dbbuf, 1, offsetof(PgStat_StatDBEntry, tables),
   fpin) != offsetof(PgStat_StatDBEntry, tables))
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"",
  statfile)));
  goto done;
@@ -5077,21 +4893,23 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  /*
  * Add to the DB hash
  */
- dbentry = (PgStat_StatDBEntry *) hash_search(dbhash,
- (void *) &dbbuf.databaseid,
- HASH_ENTER,
- &found);
+ dbentry = (PgStat_StatDBEntry *)
+ dshash_find_or_insert(db_stats, (void *) &dbbuf.databaseid,
+  &found);
  if (found)
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ dshash_release_lock(db_stats, dbentry);
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"",
  statfile)));
  goto done;
  }
 
  memcpy(dbentry, &dbbuf, sizeof(PgStat_StatDBEntry));
- dbentry->tables = NULL;
- dbentry->functions = NULL;
+ dbentry->tables = DSM_HANDLE_INVALID;
+ dbentry->functions = DSM_HANDLE_INVALID;
+ dbentry->snapshot_tables = NULL;
+ dbentry->snapshot_functions = NULL;
 
  /*
  * In the collector, disregard the timestamp we read from the
@@ -5099,54 +4917,26 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  * stats file immediately upon the first request from any
  * backend.
  */
- if (pgStatRunningInCollector)
- dbentry->stats_timestamp = 0;
-
- /*
- * Don't create tables/functions hashtables for uninteresting
- * databases.
- */
- if (onlydb != InvalidOid)
- {
- if (dbbuf.databaseid != onlydb &&
- dbbuf.databaseid != InvalidOid)
- break;
- }
-
- memset(&hash_ctl, 0, sizeof(hash_ctl));
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
- hash_ctl.hcxt = pgStatLocalContext;
- dbentry->tables = hash_create("Per-database table",
-  PGSTAT_TAB_HASH_SIZE,
-  &hash_ctl,
-  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatFuncEntry);
- hash_ctl.hcxt = pgStatLocalContext;
- dbentry->functions = hash_create("Per-database function",
- PGSTAT_FUNCTION_HASH_SIZE,
- &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ dbentry->stats_timestamp = 0;
 
  /*
  * If requested, read the data from the database-specific
  * file.  Otherwise we just leave the hashtables empty.
  */
- if (deep)
- pgstat_read_db_statsfile(dbentry->databaseid,
- dbentry->tables,
- dbentry->functions,
- permanent);
-
+ tblstats = dshash_create(area, &dsh_tblparams, 0);
+ dbentry->tables = dshash_get_hash_table_handle(tblstats);
+ /* we don't create the function hash at present */
+ dshash_release_lock(db_stats, dbentry);
+ pgstat_read_db_statsfile(dbentry->databaseid,
+ tblstats, funcstats);
+ dshash_detach(tblstats);
  break;
 
  case 'E':
  goto done;
 
  default:
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"",
  statfile)));
  goto done;
@@ -5154,36 +4944,62 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  }
 
 done:
+ LWLockRelease(StatsLock);
  FreeFile(fpin);
 
- /* If requested to read the permanent file, also get rid of it. */
- if (permanent)
- {
- elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
- unlink(statfile);
- }
+ elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
+ unlink(statfile);
 
- return dbhash;
+ return;
 }
 
 
+Size
+StatsShmemSize(void)
+{
+ return sizeof(StatsShmemStruct);
+}
+
+void
+StatsShmemInit(void)
+{
+ bool found;
+
+ StatsShmem = (StatsShmemStruct *)
+ ShmemInitStruct("Stats area", StatsShmemSize(),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ StatsShmem->stats_dsa_handle = DSM_HANDLE_INVALID;
+
+ /* Load saved data if any */
+ pgstat_read_statsfiles();
+
+ /* needs to be called before dsm shutdown */
+ before_shmem_exit(pgstat_postmaster_shutdown, (Datum) 0);
+ }
+}
+
+static void
+pgstat_postmaster_shutdown(int code, Datum arg)
+{
+ /* we trash the stats on crash */
+ if (code == 0)
+ pgstat_write_statsfiles();
+}
+
 /* ----------
  * pgstat_read_db_statsfile() -
  *
- * Reads in the existing statistics collector file for the given database,
- * filling the passed-in tables and functions hash tables.
- *
- * As in pgstat_read_statsfiles, if the permanent file is requested, it is
- * removed after reading.
- *
- * Note: this code has the ability to skip storing per-table or per-function
- * data, if NULL is passed for the corresponding hashtable.  That's not used
- * at the moment though.
+ * Reads in the permanent statistics collector file and creates the shared
+ * statistics tables. The file is removed after reading.
  * ----------
  */
 static void
-pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
- bool permanent)
+pgstat_read_db_statsfile(Oid databaseid,
+ dshash_table *tabhash, dshash_table *funchash)
 {
  PgStat_StatTabEntry *tabentry;
  PgStat_StatTabEntry tabbuf;
@@ -5194,7 +5010,10 @@ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
  bool found;
  char statfile[MAXPGPATH];
 
- get_dbstat_filename(permanent, false, databaseid, statfile, MAXPGPATH);
+ /* should be called only from the postmaster */
+ Assert(!IsUnderPostmaster);
+
+ get_dbstat_filename(false, databaseid, statfile, MAXPGPATH);
 
  /*
  * Try to open the stats file. If it doesn't exist, the backends simply
@@ -5208,7 +5027,7 @@ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
  if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
  {
  if (errno != ENOENT)
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errcode_for_file_access(),
  errmsg("could not open statistics file \"%s\": %m",
  statfile)));
@@ -5221,7 +5040,7 @@ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
  if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
  format_id != PGSTAT_FILE_FORMAT_ID)
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"", statfile)));
  goto done;
  }
@@ -5241,7 +5060,7 @@ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
  if (fread(&tabbuf, 1, sizeof(PgStat_StatTabEntry),
   fpin) != sizeof(PgStat_StatTabEntry))
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"",
  statfile)));
  goto done;
@@ -5253,19 +5072,21 @@ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
  if (tabhash == NULL)
  break;
 
- tabentry = (PgStat_StatTabEntry *) hash_search(tabhash,
-   (void *) &tabbuf.tableid,
-   HASH_ENTER, &found);
+ tabentry = (PgStat_StatTabEntry *)
+ dshash_find_or_insert(tabhash,
+  (void *) &tabbuf.tableid, &found);
 
  if (found)
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ dshash_release_lock(tabhash, tabentry);
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"",
  statfile)));
  goto done;
  }
 
  memcpy(tabentry, &tabbuf, sizeof(tabbuf));
+ dshash_release_lock(tabhash, tabentry);
  break;
 
  /*
@@ -5275,7 +5096,7 @@ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
  if (fread(&funcbuf, 1, sizeof(PgStat_StatFuncEntry),
   fpin) != sizeof(PgStat_StatFuncEntry))
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"",
  statfile)));
  goto done;
@@ -5287,19 +5108,20 @@ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
  if (funchash == NULL)
  break;
 
- funcentry = (PgStat_StatFuncEntry *) hash_search(funchash,
- (void *) &funcbuf.functionid,
- HASH_ENTER, &found);
+ funcentry = (PgStat_StatFuncEntry *)
+ dshash_find_or_insert(funchash,
+  (void *) &funcbuf.functionid, &found);
 
  if (found)
  {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ dshash_release_lock(funchash, funcentry);
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"",
  statfile)));
  goto done;
  }
 
  memcpy(funcentry, &funcbuf, sizeof(funcbuf));
+ dshash_release_lock(funchash, funcentry);
  break;
 
  /*
@@ -5309,7 +5131,7 @@ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
  goto done;
 
  default:
- ereport(pgStatRunningInCollector ? LOG : WARNING,
+ ereport(LOG,
  (errmsg("corrupted statistics file \"%s\"",
  statfile)));
  goto done;
@@ -5319,276 +5141,290 @@ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
 done:
  FreeFile(fpin);
 
- if (permanent)
- {
- elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
- unlink(statfile);
- }
+ elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
+ unlink(statfile);
 }
 
 /* ----------
- * pgstat_read_db_statsfile_timestamp() -
+ * backend_clean_snapshot_callback() -
  *
- * Attempt to determine the timestamp of the last db statfile write.
- * Returns true if successful; the timestamp is stored in *ts.
- *
- * This needs to be careful about handling databases for which no stats file
- * exists, such as databases without a stat entry or those not yet written:
- *
- * - if there's a database entry in the global file, return the corresponding
- * stats_timestamp value.
- *
- * - if there's no db stat entry (e.g. for a new or inactive database),
- * there's no stats_timestamp value, but also nothing to write so we return
- * the timestamp of the global statfile.
+ * This is usually called with arg = NULL when the memory context where the
+ * current snapshot was taken is reset.  Don't bother releasing memory in
+ * that case.
  * ----------
  */
-static bool
-pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
-   TimestampTz *ts)
+static void
+backend_clean_snapshot_callback(void *arg)
 {
- PgStat_StatDBEntry dbentry;
- PgStat_GlobalStats myGlobalStats;
- PgStat_ArchiverStats myArchiverStats;
- FILE   *fpin;
- int32 format_id;
- const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
-
- /*
- * Try to open the stats file.  As above, anything but ENOENT is worthy of
- * complaining about.
- */
- if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+ if (arg != NULL)
  {
- if (errno != ENOENT)
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errcode_for_file_access(),
- errmsg("could not open statistics file \"%s\": %m",
- statfile)));
- return false;
- }
+ /* explicitly called, so explicitly free resources */
+ if (snapshot_globalStats)
+ pfree(snapshot_globalStats);
 
- /*
- * Verify it's of the expected format.
- */
- if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
- format_id != PGSTAT_FILE_FORMAT_ID)
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- FreeFile(fpin);
- return false;
- }
+ if (snapshot_archiverStats)
+ pfree(snapshot_archiverStats);
 
- /*
- * Read global stats struct
- */
- if (fread(&myGlobalStats, 1, sizeof(myGlobalStats),
-  fpin) != sizeof(myGlobalStats))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- FreeFile(fpin);
- return false;
- }
-
- /*
- * Read archiver stats struct
- */
- if (fread(&myArchiverStats, 1, sizeof(myArchiverStats),
-  fpin) != sizeof(myArchiverStats))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- FreeFile(fpin);
- return false;
- }
-
- /* By default, we're going to return the timestamp of the global file. */
- *ts = myGlobalStats.stats_timestamp;
-
- /*
- * We found an existing collector stats file.  Read it and look for a
- * record for the requested database.  If found, use its timestamp.
- */
- for (;;)
- {
- switch (fgetc(fpin))
+ if (snapshot_db_stats)
  {
- /*
- * 'D' A PgStat_StatDBEntry struct describing a database
- * follows.
- */
- case 'D':
- if (fread(&dbentry, 1, offsetof(PgStat_StatDBEntry, tables),
-  fpin) != offsetof(PgStat_StatDBEntry, tables))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
+ HASH_SEQ_STATUS seq;
+ PgStat_StatDBEntry *dbent;
 
- /*
- * If this is the DB we're looking for, save its timestamp and
- * we're done.
- */
- if (dbentry.databaseid == databaseid)
- {
- *ts = dbentry.stats_timestamp;
- goto done;
- }
-
- break;
-
- case 'E':
- goto done;
-
- default:
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
+ hash_seq_init(&seq, snapshot_db_stats);
+ while ((dbent = hash_seq_search(&seq)) != NULL)
+ {
+ if (dbent->snapshot_tables)
+ hash_destroy(dbent->snapshot_tables);
+ if (dbent->snapshot_functions)
+ hash_destroy(dbent->snapshot_functions);
+ }
+ hash_destroy(snapshot_db_stats);
  }
  }
 
-done:
- FreeFile(fpin);
- return true;
+ /* mark the resources as not allocated */
+ snapshot_globalStats = NULL;
+ snapshot_archiverStats = NULL;
+ snapshot_db_stats = NULL;
 }
 
 /*
- * If not already done, read the statistics collector stats file into
- * some hash tables.  The results will be kept until pgstat_clear_snapshot()
- * is called (typically, at end of transaction).
+ * create_local_stats_hash() -
+ *
+ * Creates a dynahash used as a table/function stats cache.
  */
-static void
-backend_read_statsfile(void)
+static HTAB *
+create_local_stats_hash(const char *name, size_t keysize, size_t entrysize,
+ int nentries)
 {
- TimestampTz min_ts = 0;
- TimestampTz ref_ts = 0;
- Oid inquiry_db;
- int count;
+ HTAB *result;
+ HASHCTL ctl;
 
- /* already read it? */
- if (pgStatDBHash)
- return;
- Assert(!pgStatRunningInCollector);
-
- /*
- * In a normal backend, we check staleness of the data for our own DB, and
- * so we send MyDatabaseId in inquiry messages.  In the autovac launcher,
- * check staleness of the shared-catalog data, and send InvalidOid in
- * inquiry messages so as not to force writing unnecessary data.
- */
- if (IsAutoVacuumLauncherProcess())
- inquiry_db = InvalidOid;
- else
- inquiry_db = MyDatabaseId;
-
- /*
- * Loop until fresh enough stats file is available or we ran out of time.
- * The stats inquiry message is sent repeatedly in case collector drops
- * it; but not every single time, as that just swamps the collector.
- */
- for (count = 0; count < PGSTAT_POLL_LOOP_COUNT; count++)
- {
- bool ok;
- TimestampTz file_ts = 0;
- TimestampTz cur_ts;
-
- CHECK_FOR_INTERRUPTS();
-
- ok = pgstat_read_db_statsfile_timestamp(inquiry_db, false, &file_ts);
-
- cur_ts = GetCurrentTimestamp();
- /* Calculate min acceptable timestamp, if we didn't already */
- if (count == 0 || cur_ts < ref_ts)
- {
- /*
- * We set the minimum acceptable timestamp to PGSTAT_STAT_INTERVAL
- * msec before now.  This indirectly ensures that the collector
- * needn't write the file more often than PGSTAT_STAT_INTERVAL. In
- * an autovacuum worker, however, we want a lower delay to avoid
- * using stale data, so we use PGSTAT_RETRY_DELAY (since the
- * number of workers is low, this shouldn't be a problem).
- *
- * We don't recompute min_ts after sleeping, except in the
- * unlikely case that cur_ts went backwards.  So we might end up
- * accepting a file a bit older than PGSTAT_STAT_INTERVAL.  In
- * practice that shouldn't happen, though, as long as the sleep
- * time is less than PGSTAT_STAT_INTERVAL; and we don't want to
- * tell the collector that our cutoff time is less than what we'd
- * actually accept.
- */
- ref_ts = cur_ts;
- if (IsAutoVacuumWorkerProcess())
- min_ts = TimestampTzPlusMilliseconds(ref_ts,
- -PGSTAT_RETRY_DELAY);
- else
- min_ts = TimestampTzPlusMilliseconds(ref_ts,
- -PGSTAT_STAT_INTERVAL);
- }
-
- /*
- * If the file timestamp is actually newer than cur_ts, we must have
- * had a clock glitch (system time went backwards) or there is clock
- * skew between our processor and the stats collector's processor.
- * Accept the file, but send an inquiry message anyway to make
- * pgstat_recv_inquiry do a sanity check on the collector's time.
- */
- if (ok && file_ts > cur_ts)
- {
- /*
- * A small amount of clock skew between processors isn't terribly
- * surprising, but a large difference is worth logging.  We
- * arbitrarily define "large" as 1000 msec.
- */
- if (file_ts >= TimestampTzPlusMilliseconds(cur_ts, 1000))
- {
- char   *filetime;
- char   *mytime;
-
- /* Copy because timestamptz_to_str returns a static buffer */
- filetime = pstrdup(timestamptz_to_str(file_ts));
- mytime = pstrdup(timestamptz_to_str(cur_ts));
- elog(LOG, "stats collector's time %s is later than backend local time %s",
- filetime, mytime);
- pfree(filetime);
- pfree(mytime);
- }
-
- pgstat_send_inquiry(cur_ts, min_ts, inquiry_db);
- break;
- }
-
- /* Normal acceptance case: file is not older than cutoff time */
- if (ok && file_ts >= min_ts)
- break;
-
- /* Not there or too old, so kick the collector and wait a bit */
- if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
- pgstat_send_inquiry(cur_ts, min_ts, inquiry_db);
-
- pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
- }
-
- if (count >= PGSTAT_POLL_LOOP_COUNT)
- ereport(LOG,
- (errmsg("using stale statistics instead of current ones "
- "because stats collector is not responding")));
-
- /*
- * Autovacuum launcher wants stats about all databases, but a shallow read
- * is sufficient.  Regular backends want a deep read for just the tables
- * they can see (MyDatabaseId + shared catalogs).
- */
- if (IsAutoVacuumLauncherProcess())
- pgStatDBHash = pgstat_read_statsfiles(InvalidOid, false, false);
- else
- pgStatDBHash = pgstat_read_statsfiles(MyDatabaseId, false, true);
+ /* Create the hash in the stats context */
+ ctl.keysize = keysize;
+ ctl.entrysize = entrysize;
+ ctl.hcxt = stats_cxt;
+ result = hash_create(name, nentries, &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ return result;
 }
 
+/*
+ * snapshot_statentry() - Find an entry in the source dshash.
+ *
+ * Returns the entry for key or NULL if not found.  If dest is not NULL, uses
+ * *dest as a local cache, which is created in the same shape as the given
+ * dshash when *dest is NULL.  In that case the result is cached in the hash
+ * and the same entry is returned to subsequent calls for the same key.
+ *
+ * Otherwise the returned entry is a copy palloc'ed in the current memory
+ * context.  Its content may differ for every request.
+ *
+ * If dshash is NULL, temporarily attaches dsh_handle instead.
+ */
+static void *
+snapshot_statentry(HTAB **dest, const char *hashname,
+   dshash_table *dshash, dshash_table_handle dsh_handle,
+   const dshash_parameters *dsh_params, Oid key)
+{
+ void *lentry = NULL;
+ size_t keysize = dsh_params->key_size;
+ size_t entrysize = dsh_params->entry_size;
+
+ if (dest)
+ {
+ /* caches the result entry */
+ bool found;
+
+ /*
+ * Create new hash with arbitrary initial entries since we don't know
+ * how this hash will grow.
+ */
+ if (!*dest)
+ {
+ Assert(hashname);
+ *dest = create_local_stats_hash(hashname, keysize, entrysize, 32);
+ }
+
+ lentry = hash_search(*dest, &key, HASH_ENTER, &found);
+ if (!found)
+ {
+ dshash_table *t = dshash;
+ void *sentry;
+
+ if (!t)
+ t = dshash_attach(area, dsh_params, dsh_handle, NULL);
+
+ sentry = dshash_find(t, &key, false);
+
+ /*
+ * We expect that the stats for the specified database exist in most
+ * cases.
+ */
+ if (!sentry)
+ {
+ hash_search(*dest, &key, HASH_REMOVE, NULL);
+ if (!dshash)
+ dshash_detach(t);
+ return NULL;
+ }
+ memcpy(lentry, sentry, entrysize);
+ dshash_release_lock(t, sentry);
+
+ if (!dshash)
+ dshash_detach(t);
+ }
+ }
+ else
+ {
+ /*
+ * The caller doesn't want caching.  Just make a copy of the entry and
+ * return it.
+ */
+ dshash_table *t = dshash;
+ void *sentry;
+
+ if (!t)
+ t = dshash_attach(area, dsh_params, dsh_handle, NULL);
+
+ sentry = dshash_find(t, &key, false);
+ if (sentry)
+ {
+ lentry = palloc(entrysize);
+ memcpy(lentry, sentry, entrysize);
+ dshash_release_lock(t, sentry);
+ }
+
+ if (!dshash)
+ dshash_detach(t);
+ }
+
+ return lentry;
+}
+
+/*
+ * backend_snapshot_global_stats() -
+ *
+ * Makes a local copy of global stats if not already done.  They will be kept
+ * until pgstat_clear_snapshot() is called or the current memory context
+ * (typically TopTransactionContext) is reset.  Returns false if the shared
+ * stats area has not been created yet.
+ */
+static bool
+backend_snapshot_global_stats(void)
+{
+ MemoryContext oldcontext = CurrentMemoryContext;
+ MemoryContextCallback *mcxt_cb;
+
+ /* Nothing to do if already done */
+ if (snapshot_globalStats)
+ return true;
+
+ Assert(snapshot_archiverStats == NULL);
+
+ /*
+ * The snapshot lives within the current top transaction if any, or for the
+ * lifetime of the current memory context otherwise.
+ */
+ if (IsTransactionState())
+ oldcontext = MemoryContextSwitchTo(TopTransactionContext);
+
+ /* Remember for stats memory allocation later */
+ stats_cxt = CurrentMemoryContext;
+
+ /* global stats can simply be copied */
+ snapshot_globalStats = palloc(sizeof(PgStat_GlobalStats));
+ memcpy(snapshot_globalStats, shared_globalStats,
+   sizeof(PgStat_GlobalStats));
+
+ snapshot_archiverStats = palloc(sizeof(PgStat_ArchiverStats));
+ memcpy(snapshot_archiverStats, shared_archiverStats,
+   sizeof(PgStat_ArchiverStats));
+
+ /* set the timestamp of this snapshot */
+ snapshot_globalStats->stats_timestamp = GetCurrentTimestamp();
+
+ /* register callback to clear snapshot */
+ mcxt_cb = (MemoryContextCallback *)palloc(sizeof(MemoryContextCallback));
+ mcxt_cb->func = backend_clean_snapshot_callback;
+ mcxt_cb->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext, mcxt_cb);
+ MemoryContextSwitchTo(oldcontext);
+
+ return true;
+}
+
+/* ----------
+ * pgstat_fetch_stat_dbentry() -
+ *
+ * Finds the database stats entry on a backend.  The returned entries are
+ * cached until transaction end.  If oneshot is true, they are not cached but
+ * returned in palloc'ed memory.
+ */
+PgStat_StatDBEntry *
+pgstat_fetch_stat_dbentry(Oid dbid, bool oneshot)
+{
+ /* take a local snapshot if we don't have one */
+ char *hashname = "local database stats hash";
+ PgStat_StatDBEntry *dbentry;
+
+ /* should be called from backends  */
+ Assert(IsUnderPostmaster);
+
+ /* If not done for this transaction, take a snapshot of global stats */
+ if (!backend_snapshot_global_stats())
+ return NULL;
+
+ dbentry = snapshot_statentry(oneshot ? NULL : &snapshot_db_stats,
+ hashname, db_stats, 0, &dsh_dbparams,
+ dbid);
+
+ return dbentry;
+}
+
+/* ----------
+ * backend_get_tab_entry() -
+ *
+ * Finds the table stats entry on a backend.  The returned entries are
+ * cached until transaction end.  If oneshot is true, they are not cached but
+ * returned in palloc'ed memory.
+ */
+PgStat_StatTabEntry *
+backend_get_tab_entry(PgStat_StatDBEntry *dbent, Oid reloid, bool oneshot)
+{
+ /* take a local snapshot if we don't have one */
+ char *hashname = "local table stats hash";
+
+ /* should be called from backends  */
+ Assert(IsUnderPostmaster);
+
+ return snapshot_statentry(oneshot ? NULL : &dbent->snapshot_tables,
+  hashname, NULL, dbent->tables, &dsh_tblparams,
+  reloid);
+}
+
+/* ----------
+ * backend_get_func_entry() -
+ *
+ * Finds the function stats entry on a backend.  The returned entries are
+ * cached until transaction end.  If oneshot is true, they are not cached but
+ * returned in palloc'ed memory.
+ */
+static PgStat_StatFuncEntry *
+backend_get_func_entry(PgStat_StatDBEntry *dbent, Oid funcid, bool oneshot)
+{
+ char *hashname = "local function stats hash";
+
+ /* should be called from backends  */
+ Assert(IsUnderPostmaster);
+
+ if (dbent->functions == DSM_HANDLE_INVALID)
+ return NULL;
+
+ return snapshot_statentry(oneshot ? NULL : &dbent->snapshot_functions,
+  hashname, NULL, dbent->functions, &dsh_funcparams,
+  funcid);
+}
 
 /* ----------
  * pgstat_setup_memcxt() -
@@ -5619,6 +5455,8 @@ pgstat_setup_memcxt(void)
 void
 pgstat_clear_snapshot(void)
 {
+ int param = 0; /* only the address is significant */
+
  /* Release memory, if any was allocated */
  if (pgStatLocalContext)
  MemoryContextDelete(pgStatLocalContext);
@@ -5628,717 +5466,112 @@ pgstat_clear_snapshot(void)
  pgStatDBHash = NULL;
  localBackendStatusTable = NULL;
  localNumBackends = 0;
+
+ /*
+ * The parameter informs the function that it is not being called as a
+ * MemoryContextCallback.
+ */
+ backend_clean_snapshot_callback(&param);
 }
 
 
-/* ----------
- * pgstat_recv_inquiry() -
- *
- * Process stat inquiry requests.
- * ----------
- */
-static void
-pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
+static bool
+pgstat_update_tabentry(dshash_table *tabhash, PgStat_TableStatus *stat,
+   bool nowait)
 {
- PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+ bool found;
 
- elog(DEBUG2, "received inquiry for database %u", msg->databaseid);
+ if (tabhash == NULL)
+ return false;
 
- /*
- * If there's already a write request for this DB, there's nothing to do.
- *
- * Note that if a request is found, we return early and skip the below
- * check for clock skew.  This is okay, since the only way for a DB
- * request to be present in the list is that we have been here since the
- * last write round.  It seems sufficient to check for clock skew once per
- * write round.
- */
- if (list_member_oid(pending_write_requests, msg->databaseid))
- return;
+ tabentry = (PgStat_StatTabEntry *)
+ dshash_find_or_insert_extended(tabhash, (void *) &(stat->t_id),
+   &found, nowait);
 
- /*
- * Check to see if we last wrote this database at a time >= the requested
- * cutoff time.  If so, this is a stale request that was generated before
- * we updated the DB file, and we don't need to do so again.
- *
- * If the requestor's local clock time is older than stats_timestamp, we
- * should suspect a clock glitch, ie system time going backwards; though
- * the more likely explanation is just delayed message receipt.  It is
- * worth expending a GetCurrentTimestamp call to be sure, since a large
- * retreat in the system clock reading could otherwise cause us to neglect
- * to update the stats file for a long time.
- */
- dbentry = pgstat_get_db_entry(msg->databaseid, false);
- if (dbentry == NULL)
+ /* failed to acquire lock */
+ if (tabentry == NULL)
+ return false;
+
+ if (!found)
  {
  /*
- * We have no data for this DB.  Enter a write request anyway so that
- * the global stats will get updated.  This is needed to prevent
- * backend_read_statsfile from waiting for data that we cannot supply,
- * in the case of a new DB that nobody has yet reported any stats for.
- * See the behavior of pgstat_read_db_statsfile_timestamp.
+ * If it's a new table entry, initialize counters to the values we
+ * just got.
  */
+ tabentry->numscans = stat->t_counts.t_numscans;
+ tabentry->tuples_returned = stat->t_counts.t_tuples_returned;
+ tabentry->tuples_fetched = stat->t_counts.t_tuples_fetched;
+ tabentry->tuples_inserted = stat->t_counts.t_tuples_inserted;
+ tabentry->tuples_updated = stat->t_counts.t_tuples_updated;
+ tabentry->tuples_deleted = stat->t_counts.t_tuples_deleted;
+ tabentry->tuples_hot_updated = stat->t_counts.t_tuples_hot_updated;
+ tabentry->n_live_tuples = stat->t_counts.t_delta_live_tuples;
+ tabentry->n_dead_tuples = stat->t_counts.t_delta_dead_tuples;
+ tabentry->changes_since_analyze = stat->t_counts.t_changed_tuples;
+ tabentry->blocks_fetched = stat->t_counts.t_blocks_fetched;
+ tabentry->blocks_hit = stat->t_counts.t_blocks_hit;
+
+ tabentry->vacuum_timestamp = 0;
+ tabentry->vacuum_count = 0;
+ tabentry->autovac_vacuum_timestamp = 0;
+ tabentry->autovac_vacuum_count = 0;
+ tabentry->analyze_timestamp = 0;
+ tabentry->analyze_count = 0;
+ tabentry->autovac_analyze_timestamp = 0;
+ tabentry->autovac_analyze_count = 0;
  }
- else if (msg->clock_time < dbentry->stats_timestamp)
+ else
  {
- TimestampTz cur_ts = GetCurrentTimestamp();
-
- if (cur_ts < dbentry->stats_timestamp)
- {
- /*
- * Sure enough, time went backwards.  Force a new stats file write
- * to get back in sync; but first, log a complaint.
- */
- char   *writetime;
- char   *mytime;
-
- /* Copy because timestamptz_to_str returns a static buffer */
- writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
- mytime = pstrdup(timestamptz_to_str(cur_ts));
- elog(LOG,
- "stats_timestamp %s is later than collector's time %s for database %u",
- writetime, mytime, dbentry->databaseid);
- pfree(writetime);
- pfree(mytime);
- }
- else
- {
- /*
- * Nope, it's just an old request.  Assuming msg's clock_time is
- * >= its cutoff_time, it must be stale, so we can ignore it.
- */
- return;
- }
- }
- else if (msg->cutoff_time <= dbentry->stats_timestamp)
- {
- /* Stale request, ignore it */
- return;
- }
-
- /*
- * We need to write this DB, so create a request.
- */
- pending_write_requests = lappend_oid(pending_write_requests,
- msg->databaseid);
-}
-
-
-/* ----------
- * pgstat_recv_tabstat() -
- *
- * Count what the backend has done.
- * ----------
- */
-static void
-pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- PgStat_StatTabEntry *tabentry;
- int i;
- bool found;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- /*
- * Update database-wide stats.
- */
- dbentry->n_xact_commit += (PgStat_Counter) (msg->m_xact_commit);
- dbentry->n_xact_rollback += (PgStat_Counter) (msg->m_xact_rollback);
- dbentry->n_block_read_time += msg->m_block_read_time;
- dbentry->n_block_write_time += msg->m_block_write_time;
-
- /*
- * Process all table entries in the message.
- */
- for (i = 0; i < msg->m_nentries; i++)
- {
- PgStat_TableEntry *tabmsg = &(msg->m_entry[i]);
-
- tabentry = (PgStat_StatTabEntry *) hash_search(dbentry->tables,
-   (void *) &(tabmsg->t_id),
-   HASH_ENTER, &found);
-
- if (!found)
- {
- /*
- * If it's a new table entry, initialize counters to the values we
- * just got.
- */
- tabentry->numscans = tabmsg->t_counts.t_numscans;
- tabentry->tuples_returned = tabmsg->t_counts.t_tuples_returned;
- tabentry->tuples_fetched = tabmsg->t_counts.t_tuples_fetched;
- tabentry->tuples_inserted = tabmsg->t_counts.t_tuples_inserted;
- tabentry->tuples_updated = tabmsg->t_counts.t_tuples_updated;
- tabentry->tuples_deleted = tabmsg->t_counts.t_tuples_deleted;
- tabentry->tuples_hot_updated = tabmsg->t_counts.t_tuples_hot_updated;
- tabentry->n_live_tuples = tabmsg->t_counts.t_delta_live_tuples;
- tabentry->n_dead_tuples = tabmsg->t_counts.t_delta_dead_tuples;
- tabentry->changes_since_analyze = tabmsg->t_counts.t_changed_tuples;
- tabentry->blocks_fetched = tabmsg->t_counts.t_blocks_fetched;
- tabentry->blocks_hit = tabmsg->t_counts.t_blocks_hit;
-
- tabentry->vacuum_timestamp = 0;
- tabentry->vacuum_count = 0;
- tabentry->autovac_vacuum_timestamp = 0;
- tabentry->autovac_vacuum_count = 0;
- tabentry->analyze_timestamp = 0;
- tabentry->analyze_count = 0;
- tabentry->autovac_analyze_timestamp = 0;
- tabentry->autovac_analyze_count = 0;
- }
- else
- {
- /*
- * Otherwise add the values to the existing entry.
- */
- tabentry->numscans += tabmsg->t_counts.t_numscans;
- tabentry->tuples_returned += tabmsg->t_counts.t_tuples_returned;
- tabentry->tuples_fetched += tabmsg->t_counts.t_tuples_fetched;
- tabentry->tuples_inserted += tabmsg->t_counts.t_tuples_inserted;
- tabentry->tuples_updated += tabmsg->t_counts.t_tuples_updated;
- tabentry->tuples_deleted += tabmsg->t_counts.t_tuples_deleted;
- tabentry->tuples_hot_updated += tabmsg->t_counts.t_tuples_hot_updated;
- /* If table was truncated, first reset the live/dead counters */
- if (tabmsg->t_counts.t_truncated)
- {
- tabentry->n_live_tuples = 0;
- tabentry->n_dead_tuples = 0;
- }
- tabentry->n_live_tuples += tabmsg->t_counts.t_delta_live_tuples;
- tabentry->n_dead_tuples += tabmsg->t_counts.t_delta_dead_tuples;
- tabentry->changes_since_analyze += tabmsg->t_counts.t_changed_tuples;
- tabentry->blocks_fetched += tabmsg->t_counts.t_blocks_fetched;
- tabentry->blocks_hit += tabmsg->t_counts.t_blocks_hit;
- }
-
- /* Clamp n_live_tuples in case of negative delta_live_tuples */
- tabentry->n_live_tuples = Max(tabentry->n_live_tuples, 0);
- /* Likewise for n_dead_tuples */
- tabentry->n_dead_tuples = Max(tabentry->n_dead_tuples, 0);
-
  /*
- * Add per-table stats to the per-database entry, too.
+ * Otherwise add the values to the existing entry.
  */
- dbentry->n_tuples_returned += tabmsg->t_counts.t_tuples_returned;
- dbentry->n_tuples_fetched += tabmsg->t_counts.t_tuples_fetched;
- dbentry->n_tuples_inserted += tabmsg->t_counts.t_tuples_inserted;
- dbentry->n_tuples_updated += tabmsg->t_counts.t_tuples_updated;
- dbentry->n_tuples_deleted += tabmsg->t_counts.t_tuples_deleted;
- dbentry->n_blocks_fetched += tabmsg->t_counts.t_blocks_fetched;
- dbentry->n_blocks_hit += tabmsg->t_counts.t_blocks_hit;
- }
-}
-
-
-/* ----------
- * pgstat_recv_tabpurge() -
- *
- * Arrange for dead table removal.
- * ----------
- */
-static void
-pgstat_recv_tabpurge(PgStat_MsgTabpurge *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- int i;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
-
- /*
- * No need to purge if we don't even know the database.
- */
- if (!dbentry || !dbentry->tables)
- return;
-
- /*
- * Process all table entries in the message.
- */
- for (i = 0; i < msg->m_nentries; i++)
- {
- /* Remove from hashtable if present; we don't care if it's not. */
- (void) hash_search(dbentry->tables,
-   (void *) &(msg->m_tableid[i]),
-   HASH_REMOVE, NULL);
- }
-}
-
-
-/* ----------
- * pgstat_recv_dropdb() -
- *
- * Arrange for dead database removal
- * ----------
- */
-static void
-pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
-{
- Oid dbid = msg->m_databaseid;
- PgStat_StatDBEntry *dbentry;
-
- /*
- * Lookup the database in the hashtable.
- */
- dbentry = pgstat_get_db_entry(dbid, false);
-
- /*
- * If found, remove it (along with the db statfile).
- */
- if (dbentry)
- {
- char statfile[MAXPGPATH];
-
- get_dbstat_filename(false, false, dbid, statfile, MAXPGPATH);
-
- elog(DEBUG2, "removing stats file \"%s\"", statfile);
- unlink(statfile);
-
- if (dbentry->tables != NULL)
- hash_destroy(dbentry->tables);
- if (dbentry->functions != NULL)
- hash_destroy(dbentry->functions);
-
- if (hash_search(pgStatDBHash,
- (void *) &dbid,
- HASH_REMOVE, NULL) == NULL)
- ereport(ERROR,
- (errmsg("database hash table corrupted during cleanup --- abort")));
- }
-}
-
-
-/* ----------
- * pgstat_recv_resetcounter() -
- *
- * Reset the statistics for the specified database.
- * ----------
- */
-static void
-pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- /*
- * Lookup the database in the hashtable.  Nothing to do if not there.
- */
- dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
-
- if (!dbentry)
- return;
-
- /*
- * We simply throw away all the database's table entries by recreating a
- * new hash table for them.
- */
- if (dbentry->tables != NULL)
- hash_destroy(dbentry->tables);
- if (dbentry->functions != NULL)
- hash_destroy(dbentry->functions);
-
- dbentry->tables = NULL;
- dbentry->functions = NULL;
-
- /*
- * Reset database-level stats, too.  This creates empty hash tables for
- * tables and functions.
- */
- reset_dbentry_counters(dbentry);
-}
-
-/* ----------
- * pgstat_recv_resetshared() -
- *
- * Reset some shared statistics of the cluster.
- * ----------
- */
-static void
-pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
-{
- if (msg->m_resettarget == RESET_BGWRITER)
- {
- /* Reset the global background writer statistics for the cluster. */
- memset(&globalStats, 0, sizeof(globalStats));
- globalStats.stat_reset_timestamp = GetCurrentTimestamp();
- }
- else if (msg->m_resettarget == RESET_ARCHIVER)
- {
- /* Reset the archiver statistics for the cluster. */
- memset(&archiverStats, 0, sizeof(archiverStats));
- archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
- }
-
- /*
- * Presumably the sender of this message validated the target, don't
- * complain here if it's not valid
- */
-}
-
-/* ----------
- * pgstat_recv_resetsinglecounter() -
- *
- * Reset a statistics for a single object
- * ----------
- */
-static void
-pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
-
- if (!dbentry)
- return;
-
- /* Set the reset timestamp for the whole database */
- dbentry->stat_reset_timestamp = GetCurrentTimestamp();
-
- /* Remove object if it exists, ignore it if not */
- if (msg->m_resettype == RESET_TABLE)
- (void) hash_search(dbentry->tables, (void *) &(msg->m_objectid),
-   HASH_REMOVE, NULL);
- else if (msg->m_resettype == RESET_FUNCTION)
- (void) hash_search(dbentry->functions, (void *) &(msg->m_objectid),
-   HASH_REMOVE, NULL);
-}
-
-/* ----------
- * pgstat_recv_autovac() -
- *
- * Process an autovacuum signalling message.
- * ----------
- */
-static void
-pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- /*
- * Store the last autovacuum time in the database's hashtable entry.
- */
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- dbentry->last_autovac_time = msg->m_start_time;
-}
-
-/* ----------
- * pgstat_recv_vacuum() -
- *
- * Process a VACUUM message.
- * ----------
- */
-static void
-pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- PgStat_StatTabEntry *tabentry;
-
- /*
- * Store the data in the table's hashtable entry.
- */
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
-
- tabentry->n_live_tuples = msg->m_live_tuples;
- tabentry->n_dead_tuples = msg->m_dead_tuples;
-
- if (msg->m_autovacuum)
- {
- tabentry->autovac_vacuum_timestamp = msg->m_vacuumtime;
- tabentry->autovac_vacuum_count++;
- }
- else
- {
- tabentry->vacuum_timestamp = msg->m_vacuumtime;
- tabentry->vacuum_count++;
- }
-}
-
-/* ----------
- * pgstat_recv_analyze() -
- *
- * Process an ANALYZE message.
- * ----------
- */
-static void
-pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- PgStat_StatTabEntry *tabentry;
-
- /*
- * Store the data in the table's hashtable entry.
- */
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
-
- tabentry->n_live_tuples = msg->m_live_tuples;
- tabentry->n_dead_tuples = msg->m_dead_tuples;
-
- /*
- * If commanded, reset changes_since_analyze to zero.  This forgets any
- * changes that were committed while the ANALYZE was in progress, but we
- * have no good way to estimate how many of those there were.
- */
- if (msg->m_resetcounter)
- tabentry->changes_since_analyze = 0;
-
- if (msg->m_autovacuum)
- {
- tabentry->autovac_analyze_timestamp = msg->m_analyzetime;
- tabentry->autovac_analyze_count++;
- }
- else
- {
- tabentry->analyze_timestamp = msg->m_analyzetime;
- tabentry->analyze_count++;
- }
-}
-
-
-/* ----------
- * pgstat_recv_archiver() -
- *
- * Process a ARCHIVER message.
- * ----------
- */
-static void
-pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len)
-{
- if (msg->m_failed)
- {
- /* Failed archival attempt */
- ++archiverStats.failed_count;
- memcpy(archiverStats.last_failed_wal, msg->m_xlog,
-   sizeof(archiverStats.last_failed_wal));
- archiverStats.last_failed_timestamp = msg->m_timestamp;
- }
- else
- {
- /* Successful archival operation */
- ++archiverStats.archived_count;
- memcpy(archiverStats.last_archived_wal, msg->m_xlog,
-   sizeof(archiverStats.last_archived_wal));
- archiverStats.last_archived_timestamp = msg->m_timestamp;
- }
-}
-
-/* ----------
- * pgstat_recv_bgwriter() -
- *
- * Process a BGWRITER message.
- * ----------
- */
-static void
-pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
-{
- globalStats.timed_checkpoints += msg->m_timed_checkpoints;
- globalStats.requested_checkpoints += msg->m_requested_checkpoints;
- globalStats.checkpoint_write_time += msg->m_checkpoint_write_time;
- globalStats.checkpoint_sync_time += msg->m_checkpoint_sync_time;
- globalStats.buf_written_checkpoints += msg->m_buf_written_checkpoints;
- globalStats.buf_written_clean += msg->m_buf_written_clean;
- globalStats.maxwritten_clean += msg->m_maxwritten_clean;
- globalStats.buf_written_backend += msg->m_buf_written_backend;
- globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
- globalStats.buf_alloc += msg->m_buf_alloc;
-}
-
-/* ----------
- * pgstat_recv_recoveryconflict() -
- *
- * Process a RECOVERYCONFLICT message.
- * ----------
- */
-static void
-pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- switch (msg->m_reason)
- {
- case PROCSIG_RECOVERY_CONFLICT_DATABASE:
-
- /*
- * Since we drop the information about the database as soon as it
- * replicates, there is no point in counting these conflicts.
- */
- break;
- case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
- dbentry->n_conflict_tablespace++;
- break;
- case PROCSIG_RECOVERY_CONFLICT_LOCK:
- dbentry->n_conflict_lock++;
- break;
- case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
- dbentry->n_conflict_snapshot++;
- break;
- case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
- dbentry->n_conflict_bufferpin++;
- break;
- case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
- dbentry->n_conflict_startup_deadlock++;
- break;
- }
-}
-
-/* ----------
- * pgstat_recv_deadlock() -
- *
- * Process a DEADLOCK message.
- * ----------
- */
-static void
-pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- dbentry->n_deadlocks++;
-}
-
-/* ----------
- * pgstat_recv_tempfile() -
- *
- * Process a TEMPFILE message.
- * ----------
- */
-static void
-pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- dbentry->n_temp_bytes += msg->m_filesize;
- dbentry->n_temp_files += 1;
-}
-
-/* ----------
- * pgstat_recv_funcstat() -
- *
- * Count what the backend has done.
- * ----------
- */
-static void
-pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len)
-{
- PgStat_FunctionEntry *funcmsg = &(msg->m_entry[0]);
- PgStat_StatDBEntry *dbentry;
- PgStat_StatFuncEntry *funcentry;
- int i;
- bool found;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- /*
- * Process all function entries in the message.
- */
- for (i = 0; i < msg->m_nentries; i++, funcmsg++)
- {
- funcentry = (PgStat_StatFuncEntry *) hash_search(dbentry->functions,
- (void *) &(funcmsg->f_id),
- HASH_ENTER, &found);
-
- if (!found)
+ tabentry->numscans += stat->t_counts.t_numscans;
+ tabentry->tuples_returned += stat->t_counts.t_tuples_returned;
+ tabentry->tuples_fetched += stat->t_counts.t_tuples_fetched;
+ tabentry->tuples_inserted += stat->t_counts.t_tuples_inserted;
+ tabentry->tuples_updated += stat->t_counts.t_tuples_updated;
+ tabentry->tuples_deleted += stat->t_counts.t_tuples_deleted;
+ tabentry->tuples_hot_updated += stat->t_counts.t_tuples_hot_updated;
+ /* If table was truncated, first reset the live/dead counters */
+ if (stat->t_counts.t_truncated)
  {
- /*
- * If it's a new function entry, initialize counters to the values
- * we just got.
- */
- funcentry->f_numcalls = funcmsg->f_numcalls;
- funcentry->f_total_time = funcmsg->f_total_time;
- funcentry->f_self_time = funcmsg->f_self_time;
- }
- else
- {
- /*
- * Otherwise add the values to the existing entry.
- */
- funcentry->f_numcalls += funcmsg->f_numcalls;
- funcentry->f_total_time += funcmsg->f_total_time;
- funcentry->f_self_time += funcmsg->f_self_time;
+ tabentry->n_live_tuples = 0;
+ tabentry->n_dead_tuples = 0;
  }
+ tabentry->n_live_tuples += stat->t_counts.t_delta_live_tuples;
+ tabentry->n_dead_tuples += stat->t_counts.t_delta_dead_tuples;
+ tabentry->changes_since_analyze += stat->t_counts.t_changed_tuples;
+ tabentry->blocks_fetched += stat->t_counts.t_blocks_fetched;
+ tabentry->blocks_hit += stat->t_counts.t_blocks_hit;
  }
+
+ /* Clamp n_live_tuples in case of negative delta_live_tuples */
+ tabentry->n_live_tuples = Max(tabentry->n_live_tuples, 0);
+ /* Likewise for n_dead_tuples */
+ tabentry->n_dead_tuples = Max(tabentry->n_dead_tuples, 0);
+
+ dshash_release_lock(tabhash, tabentry);
+
+ return true;
 }
 
-/* ----------
- * pgstat_recv_funcpurge() -
- *
- * Arrange for dead function removal.
- * ----------
- */
 static void
-pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- int i;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
-
- /*
- * No need to purge if we don't even know the database.
- */
- if (!dbentry || !dbentry->functions)
- return;
-
- /*
- * Process all function entries in the message.
- */
- for (i = 0; i < msg->m_nentries; i++)
- {
- /* Remove from hashtable if present; we don't care if it's not. */
- (void) hash_search(dbentry->functions,
-   (void *) &(msg->m_functionid[i]),
-   HASH_REMOVE, NULL);
- }
-}
-
-/* ----------
- * pgstat_write_statsfile_needed() -
- *
- * Do we need to write out any stats files?
- * ----------
- */
-static bool
-pgstat_write_statsfile_needed(void)
-{
- if (pending_write_requests != NIL)
- return true;
-
- /* Everything was written recently */
- return false;
-}
-
-/* ----------
- * pgstat_db_requested() -
- *
- * Checks whether stats for a particular DB need to be written to a file.
- * ----------
- */
-static bool
-pgstat_db_requested(Oid databaseid)
+pgstat_update_dbentry(PgStat_StatDBEntry *dbentry, PgStat_TableStatus *stat)
 {
  /*
- * If any requests are outstanding at all, we should write the stats for
- * shared catalogs (the "database" with OID 0).  This ensures that
- * backends will see up-to-date stats for shared catalogs, even though
- * they send inquiry messages mentioning only their own DB.
+ * Add per-table stats to the per-database entry, too.
  */
- if (databaseid == InvalidOid && pending_write_requests != NIL)
- return true;
-
- /* Search to see if there's an open request to write this database. */
- if (list_member_oid(pending_write_requests, databaseid))
- return true;
-
- return false;
+ dbentry->n_tuples_returned += stat->t_counts.t_tuples_returned;
+ dbentry->n_tuples_fetched += stat->t_counts.t_tuples_fetched;
+ dbentry->n_tuples_inserted += stat->t_counts.t_tuples_inserted;
+ dbentry->n_tuples_updated += stat->t_counts.t_tuples_updated;
+ dbentry->n_tuples_deleted += stat->t_counts.t_tuples_deleted;
+ dbentry->n_blocks_fetched += stat->t_counts.t_blocks_fetched;
+ dbentry->n_blocks_hit += stat->t_counts.t_blocks_hit;
 }
 
+
 /*
  * Convert a potentially unsafely truncated activity string (see
  * PgBackendStatus.st_activity_raw's documentation) into a correctly truncated
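The pgstat.c hunks above move the delta-merging out of the message handlers and into pgstat_update_tabentry(), which folds a backend's counters straight into the dshash entry. A standalone sketch of the fold-and-clamp rule (the struct types here are simplified stand-ins, not the real PgStat_StatTabEntry/PgStat_TableCounts): a truncation zeroes the counters before the deltas are applied, and the totals are clamped at zero because deltas can be negative.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the relevant PgStat_* fields. */
typedef struct
{
	int64_t n_live_tuples;
	int64_t n_dead_tuples;
} TabEntrySketch;

typedef struct
{
	int     t_truncated;
	int64_t t_delta_live_tuples;
	int64_t t_delta_dead_tuples;
} TabCountsSketch;

/*
 * Fold one backend's deltas into the shared entry.  If the table was
 * truncated, reset the live/dead counters first; afterwards clamp at
 * zero, since a negative delta (e.g. a rolled-back bulk load) could
 * otherwise drive the totals below zero.
 */
void
fold_tab_counts(TabEntrySketch *e, const TabCountsSketch *c)
{
	if (c->t_truncated)
	{
		e->n_live_tuples = 0;
		e->n_dead_tuples = 0;
	}
	e->n_live_tuples += c->t_delta_live_tuples;
	e->n_dead_tuples += c->t_delta_dead_tuples;

	if (e->n_live_tuples < 0)
		e->n_live_tuples = 0;
	if (e->n_dead_tuples < 0)
		e->n_dead_tuples = 0;
}
```

Unlike the old collector, the caller holds the dshash partition lock while doing this, hence the dshash_release_lock() at the end of pgstat_update_tabentry().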
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 65eab02b3e..dd293a79f0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -255,7 +255,6 @@ static pid_t StartupPID = 0,
  WalReceiverPID = 0,
  AutoVacPID = 0,
  PgArchPID = 0,
- PgStatPID = 0,
  SysLoggerPID = 0;
 
 /* Startup process's status */
@@ -502,7 +501,6 @@ typedef struct
  PGPROC   *AuxiliaryProcs;
  PGPROC   *PreparedXactProcs;
  PMSignalData *PMSignalState;
- InheritableSocket pgStatSock;
  pid_t PostmasterPid;
  TimestampTz PgStartTime;
  TimestampTz PgReloadTime;
@@ -1298,12 +1296,6 @@ PostmasterMain(int argc, char *argv[])
 
  whereToSendOutput = DestNone;
 
- /*
- * Initialize stats collection subsystem (this does NOT start the
- * collector process!)
- */
- pgstat_init();
-
  /*
  * Initialize the autovacuum subsystem (again, no process start yet)
  */
@@ -1752,11 +1744,6 @@ ServerLoop(void)
  start_autovac_launcher = false; /* signal processed */
  }
 
- /* If we have lost the stats collector, try to start a new one */
- if (PgStatPID == 0 &&
- (pmState == PM_RUN || pmState == PM_HOT_STANDBY))
- PgStatPID = pgstat_start();
-
  /* If we have lost the archiver, try to start a new one. */
  if (PgArchPID == 0 && PgArchStartupAllowed())
  PgArchPID = StartArchiver();
@@ -2591,8 +2578,6 @@ SIGHUP_handler(SIGNAL_ARGS)
  signal_child(PgArchPID, SIGHUP);
  if (SysLoggerPID != 0)
  signal_child(SysLoggerPID, SIGHUP);
- if (PgStatPID != 0)
- signal_child(PgStatPID, SIGHUP);
 
  /* Reload authentication config files too */
  if (!load_hba())
@@ -2923,8 +2908,6 @@ reaper(SIGNAL_ARGS)
  AutoVacPID = StartAutoVacLauncher();
  if (PgArchStartupAllowed() && PgArchPID == 0)
  PgArchPID = StartArchiver();
- if (PgStatPID == 0)
- PgStatPID = pgstat_start();
 
  /* workers may be scheduled to start now */
  maybe_start_bgworkers();
@@ -2991,13 +2974,6 @@ reaper(SIGNAL_ARGS)
  SignalChildren(SIGUSR2);
 
  pmState = PM_SHUTDOWN_2;
-
- /*
- * We can also shut down the stats collector now; there's
- * nothing left for it to do.
- */
- if (PgStatPID != 0)
- signal_child(PgStatPID, SIGQUIT);
  }
  else
  {
@@ -3072,22 +3048,6 @@ reaper(SIGNAL_ARGS)
  continue;
  }
 
- /*
- * Was it the statistics collector?  If so, just try to start a new
- * one; no need to force reset of the rest of the system.  (If fail,
- * we'll try again in future cycles of the main loop.)
- */
- if (pid == PgStatPID)
- {
- PgStatPID = 0;
- if (!EXIT_STATUS_0(exitstatus))
- LogChildExit(LOG, _("statistics collector process"),
- pid, exitstatus);
- if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
- PgStatPID = pgstat_start();
- continue;
- }
-
  /* Was it the system logger?  If so, try to start a new one */
  if (pid == SysLoggerPID)
  {
@@ -3546,22 +3506,6 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
  signal_child(PgArchPID, SIGQUIT);
  }
 
- /*
- * Force a power-cycle of the pgstat process too.  (This isn't absolutely
- * necessary, but it seems like a good idea for robustness, and it
- * simplifies the state-machine logic in the case where a shutdown request
- * arrives during crash processing.)
- */
- if (PgStatPID != 0 && take_action)
- {
- ereport(DEBUG2,
- (errmsg_internal("sending %s to process %d",
- "SIGQUIT",
- (int) PgStatPID)));
- signal_child(PgStatPID, SIGQUIT);
- allow_immediate_pgstat_restart();
- }
-
  /* We do NOT restart the syslogger */
 
  if (Shutdown != ImmediateShutdown)
@@ -3757,8 +3701,6 @@ PostmasterStateMachine(void)
  SignalChildren(SIGQUIT);
  if (PgArchPID != 0)
  signal_child(PgArchPID, SIGQUIT);
- if (PgStatPID != 0)
- signal_child(PgStatPID, SIGQUIT);
  }
  }
  }
@@ -3797,8 +3739,7 @@ PostmasterStateMachine(void)
  * normal state transition leading up to PM_WAIT_DEAD_END, or during
  * FatalError processing.
  */
- if (dlist_is_empty(&BackendList) &&
- PgArchPID == 0 && PgStatPID == 0)
+ if (dlist_is_empty(&BackendList) && PgArchPID == 0)
  {
  /* These other guys should be dead already */
  Assert(StartupPID == 0);
@@ -3999,8 +3940,6 @@ TerminateChildren(int signal)
  signal_child(AutoVacPID, signal);
  if (PgArchPID != 0)
  signal_child(PgArchPID, signal);
- if (PgStatPID != 0)
- signal_child(PgStatPID, signal);
 }
 
 /*
@@ -4973,18 +4912,6 @@ SubPostmasterMain(int argc, char *argv[])
 
  StartBackgroundWorker();
  }
- if (strcmp(argv[1], "--forkarch") == 0)
- {
- /* Do not want to attach to shared memory */
-
- PgArchiverMain(argc, argv); /* does not return */
- }
- if (strcmp(argv[1], "--forkcol") == 0)
- {
- /* Do not want to attach to shared memory */
-
- PgstatCollectorMain(argc, argv); /* does not return */
- }
  if (strcmp(argv[1], "--forklog") == 0)
  {
  /* Do not want to attach to shared memory */
@@ -5097,12 +5024,6 @@ sigusr1_handler(SIGNAL_ARGS)
  if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
  pmState == PM_RECOVERY && Shutdown == NoShutdown)
  {
- /*
- * Likewise, start other special children as needed.
- */
- Assert(PgStatPID == 0);
- PgStatPID = pgstat_start();
-
  ereport(LOG,
  (errmsg("database system is ready to accept read only connections")));
 
@@ -5972,7 +5893,6 @@ extern slock_t *ShmemLock;
 extern slock_t *ProcStructLock;
 extern PGPROC *AuxiliaryProcs;
 extern PMSignalData *PMSignalState;
-extern pgsocket pgStatSock;
 extern pg_time_t first_syslogger_file_time;
 
 #ifndef WIN32
@@ -6025,8 +5945,6 @@ save_backend_variables(BackendParameters *param, Port *port,
  param->AuxiliaryProcs = AuxiliaryProcs;
  param->PreparedXactProcs = PreparedXactProcs;
  param->PMSignalState = PMSignalState;
- if (!write_inheritable_socket(&param->pgStatSock, pgStatSock, childPid))
- return false;
 
  param->PostmasterPid = PostmasterPid;
  param->PgStartTime = PgStartTime;
@@ -6258,7 +6176,6 @@ restore_backend_variables(BackendParameters *param, Port *port)
  AuxiliaryProcs = param->AuxiliaryProcs;
  PreparedXactProcs = param->PreparedXactProcs;
  PMSignalState = param->PMSignalState;
- read_inheritable_socket(&pgStatSock, &param->pgStatSock);
 
  PostmasterPid = param->PostmasterPid;
  PgStartTime = param->PgStartTime;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index d87cf8afe5..9408f87614 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -128,7 +128,7 @@ finish_sync_worker(void)
  if (IsTransactionState())
  {
  CommitTransactionCommand();
- pgstat_report_stat(false);
+ pgstat_update_stat(true);
  }
 
  /* And flush all writes. */
@@ -144,6 +144,9 @@ finish_sync_worker(void)
  /* Find the main apply worker and signal it. */
  logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
 
+ /* clean up retained statistics */
+ pgstat_update_stat(true);
+
  /* Stop gracefully */
  proc_exit(0);
 }
@@ -525,7 +528,7 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
  if (started_tx)
  {
  CommitTransactionCommand();
- pgstat_report_stat(false);
+ pgstat_update_stat(false);
  }
 }
 
@@ -863,7 +866,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
    MyLogicalRepWorker->relstate,
    MyLogicalRepWorker->relstate_lsn);
  CommitTransactionCommand();
- pgstat_report_stat(false);
+ pgstat_update_stat(false);
 
  /*
  * We want to do the table data sync in a single transaction.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index de23ced9af..e4e2ad7b39 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -493,7 +493,7 @@ apply_handle_commit(StringInfo s)
  replorigin_session_origin_timestamp = commit_data.committime;
 
  CommitTransactionCommand();
- pgstat_report_stat(false);
+ pgstat_update_stat(false);
 
  store_flush_position(commit_data.end_lsn);
  }
@@ -1327,6 +1327,8 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
  }
 
  send_feedback(last_received, requestReply, requestReply);
+
+ pgstat_update_stat(false);
  }
  }
 }
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 273e2f385f..d8d0ad2487 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1990,7 +1990,7 @@ BufferSync(int flags)
  if (SyncOneBuffer(buf_id, false, &wb_context) & BUF_WRITTEN)
  {
  TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
- BgWriterStats.m_buf_written_checkpoints++;
+ BgWriterStats.buf_written_checkpoints++;
  num_written++;
  }
  }
@@ -2098,7 +2098,7 @@ BgBufferSync(WritebackContext *wb_context)
  strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
 
  /* Report buffer alloc counts to pgstat */
- BgWriterStats.m_buf_alloc += recent_alloc;
+ BgWriterStats.buf_alloc += recent_alloc;
 
  /*
  * If we're not running the LRU scan, just stop after doing the stats
@@ -2288,7 +2288,7 @@ BgBufferSync(WritebackContext *wb_context)
  reusable_buffers++;
  if (++num_written >= bgwriter_lru_maxpages)
  {
- BgWriterStats.m_maxwritten_clean++;
+ BgWriterStats.maxwritten_clean++;
  break;
  }
  }
@@ -2296,7 +2296,7 @@ BgBufferSync(WritebackContext *wb_context)
  reusable_buffers++;
  }
 
- BgWriterStats.m_buf_written_clean += num_written;
+ BgWriterStats.buf_written_clean += num_written;
 
 #ifdef BGW_DEBUG
  elog(DEBUG1, "bgwriter: recent_alloc=%u smoothed=%.2f delta=%ld ahead=%d density=%.2f reusable_est=%d upcoming_est=%d scanned=%d wrote=%d reusable=%d",
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index cab7ae74ca..c7c248878a 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -197,6 +197,15 @@ dsm_postmaster_startup(PGShmemHeader *shim)
  dsm_control->maxitems = maxitems;
 }
 
+/*
+ * Clear dsm_init state on child start.
+ */
+void
+dsm_child_init(void)
+{
+ dsm_init_done = false;
+}
+
 /*
  * Determine whether the control segment from the previous postmaster
  * invocation still exists.  If so, remove the dynamic shared memory
@@ -423,6 +432,15 @@ dsm_set_control_handle(dsm_handle h)
 }
 #endif
 
+/*
+ * Return whether the dsm facility is available in this process.
+ */
+bool
+dsm_is_available(void)
+{
+ return dsm_control != NULL;
+}
+
 /*
  * Create a new dynamic shared memory segment.
  *
@@ -440,8 +458,7 @@ dsm_create(Size size, int flags)
  uint32 i;
  uint32 nitems;
 
- /* Unsafe in postmaster (and pointless in a stand-alone backend). */
- Assert(IsUnderPostmaster);
+ Assert(dsm_is_available());
 
  if (!dsm_init_done)
  dsm_backend_startup();
@@ -537,8 +554,7 @@ dsm_attach(dsm_handle h)
  uint32 i;
  uint32 nitems;
 
- /* Unsafe in postmaster (and pointless in a stand-alone backend). */
- Assert(IsUnderPostmaster);
+ Assert(dsm_is_available());
 
  if (!dsm_init_done)
  dsm_backend_startup();
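The two replaced Assert(IsUnderPostmaster) checks above are what allow the postmaster itself to create the stats DSM segment: the precondition becomes "the DSM control segment exists" rather than "we are not the postmaster", and StatsShmemInit() ensures the former holds in the postmaster too. A minimal sketch of that guard (dsm_control_sketch is a hypothetical stand-in for the real dsm_control pointer):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for dsm_control, set by dsm_postmaster_startup() in reality. */
void *dsm_control_sketch = NULL;

/* Mirrors the new dsm_is_available(): DSM usable iff control segment set up. */
int
dsm_is_available_sketch(void)
{
	return dsm_control_sketch != NULL;
}
```

dsm_create()/dsm_attach() then assert this instead of IsUnderPostmaster, which also keeps them safe in a stand-alone backend where no control segment exists.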
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2849e47d99..6417559cb0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -148,6 +148,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
  size = add_size(size, BTreeShmemSize());
  size = add_size(size, SyncScanShmemSize());
  size = add_size(size, AsyncShmemSize());
+ size = add_size(size, StatsShmemSize());
 #ifdef EXEC_BACKEND
  size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -279,8 +280,13 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 
  /* Initialize dynamic shared memory facilities. */
  if (!IsUnderPostmaster)
+ {
  dsm_postmaster_startup(shim);
 
+ /* Stats collector uses dynamic shared memory */
+ StatsShmemInit();
+ }
+
  /*
  * Now give loadable modules a chance to set up their shmem allocations
  */
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 81dac45ae5..979478e2e5 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -521,6 +521,9 @@ RegisterLWLockTranches(void)
  LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
  LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
  LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN, "parallel_hash_join");
+ LWLockRegisterTranche(LWTRANCHE_STATS_DSA, "stats table dsa");
+ LWLockRegisterTranche(LWTRANCHE_STATS_DB, "db stats");
+ LWLockRegisterTranche(LWTRANCHE_STATS_FUNC_TABLE, "table/func stats");
 
  /* Register named tranches. */
  for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..97eccb35d3 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,4 @@ MultiXactTruncationLock 41
 OldSnapshotTimeMapLock 42
 LogicalRepWorkerLock 43
 CLogTruncationLock 44
+StatsLock 45
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0c0891b33e..8b9142461a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3159,6 +3159,12 @@ ProcessInterrupts(void)
 
  if (ParallelMessagePending)
  HandleParallelMessages();
+
+ if (IdleStatsUpdateTimeoutPending)
+ {
+ IdleStatsUpdateTimeoutPending = false;
+ pgstat_update_stat(true);
+ }
 }
 
 
@@ -3733,6 +3739,7 @@ PostgresMain(int argc, char *argv[],
  sigjmp_buf local_sigjmp_buf;
  volatile bool send_ready_for_query = true;
  bool disable_idle_in_transaction_timeout = false;
+ bool disable_idle_stats_update_timeout = false;
 
  /* Initialize startup process environment if necessary. */
  if (!IsUnderPostmaster)
@@ -4173,9 +4180,17 @@ PostgresMain(int argc, char *argv[],
  }
  else
  {
- ProcessCompletedNotifies();
- pgstat_report_stat(false);
+ long stats_timeout;
 
+ ProcessCompletedNotifies();
+
+ stats_timeout = pgstat_update_stat(false);
+ if (stats_timeout > 0)
+ {
+ disable_idle_stats_update_timeout = true;
+ enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT,
+ stats_timeout);
+ }
  set_ps_display("idle", false);
  pgstat_report_activity(STATE_IDLE, NULL);
  }
@@ -4210,7 +4225,7 @@ PostgresMain(int argc, char *argv[],
  DoingCommandRead = false;
 
  /*
- * (5) turn off the idle-in-transaction timeout
+ * (5) turn off the idle-in-transaction timeout and stats update timeout
  */
  if (disable_idle_in_transaction_timeout)
  {
@@ -4218,6 +4233,12 @@ PostgresMain(int argc, char *argv[],
  disable_idle_in_transaction_timeout = false;
  }
 
+ if (disable_idle_stats_update_timeout)
+ {
+ disable_timeout(IDLE_STATS_UPDATE_TIMEOUT, false);
+ disable_idle_stats_update_timeout = false;
+ }
+
  /*
  * (6) check for any other interesting events that happened while we
  * slept.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 053bb73863..6eac39fb57 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -33,7 +33,7 @@
 #define UINT32_ACCESS_ONCE(var) ((uint32)(*((volatile uint32 *)&(var))))
 
 /* Global bgwriter statistics, from bgwriter.c */
-extern PgStat_MsgBgWriter bgwriterStats;
+extern PgStat_BgWriter bgwriterStats;
 
 Datum
 pg_stat_get_numscans(PG_FUNCTION_ARGS)
@@ -1176,7 +1176,7 @@ pg_stat_get_db_xact_commit(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_xact_commit);
@@ -1192,7 +1192,7 @@ pg_stat_get_db_xact_rollback(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_xact_rollback);
@@ -1208,7 +1208,7 @@ pg_stat_get_db_blocks_fetched(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_blocks_fetched);
@@ -1224,7 +1224,7 @@ pg_stat_get_db_blocks_hit(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_blocks_hit);
@@ -1240,7 +1240,7 @@ pg_stat_get_db_tuples_returned(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_tuples_returned);
@@ -1256,7 +1256,7 @@ pg_stat_get_db_tuples_fetched(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_tuples_fetched);
@@ -1272,7 +1272,7 @@ pg_stat_get_db_tuples_inserted(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_tuples_inserted);
@@ -1288,7 +1288,7 @@ pg_stat_get_db_tuples_updated(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_tuples_updated);
@@ -1304,7 +1304,7 @@ pg_stat_get_db_tuples_deleted(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_tuples_deleted);
@@ -1319,7 +1319,7 @@ pg_stat_get_db_stat_reset_time(PG_FUNCTION_ARGS)
  TimestampTz result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = dbentry->stat_reset_timestamp;
@@ -1337,7 +1337,7 @@ pg_stat_get_db_temp_files(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = dbentry->n_temp_files;
@@ -1353,7 +1353,7 @@ pg_stat_get_db_temp_bytes(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = dbentry->n_temp_bytes;
@@ -1368,7 +1368,7 @@ pg_stat_get_db_conflict_tablespace(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_conflict_tablespace);
@@ -1383,7 +1383,7 @@ pg_stat_get_db_conflict_lock(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_conflict_lock);
@@ -1398,7 +1398,7 @@ pg_stat_get_db_conflict_snapshot(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_conflict_snapshot);
@@ -1413,7 +1413,7 @@ pg_stat_get_db_conflict_bufferpin(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_conflict_bufferpin);
@@ -1428,7 +1428,7 @@ pg_stat_get_db_conflict_startup_deadlock(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_conflict_startup_deadlock);
@@ -1443,7 +1443,7 @@ pg_stat_get_db_conflict_all(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (
@@ -1463,7 +1463,7 @@ pg_stat_get_db_deadlocks(PG_FUNCTION_ARGS)
  int64 result;
  PgStat_StatDBEntry *dbentry;
 
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = (int64) (dbentry->n_deadlocks);
@@ -1479,7 +1479,7 @@ pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
  PgStat_StatDBEntry *dbentry;
 
  /* convert counter from microsec to millisec for display */
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = ((double) dbentry->n_block_read_time) / 1000.0;
@@ -1495,7 +1495,7 @@ pg_stat_get_db_blk_write_time(PG_FUNCTION_ARGS)
  PgStat_StatDBEntry *dbentry;
 
  /* convert counter from microsec to millisec for display */
- if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid, false)) == NULL)
  result = 0;
  else
  result = ((double) dbentry->n_block_write_time) / 1000.0;
@@ -1850,6 +1850,9 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
  /* Get statistics about the archiver process */
  archiver_stats = pgstat_fetch_stat_archiver();
 
+ if (archiver_stats == NULL)
+ PG_RETURN_NULL();
+
  /* Fill values and NULLs */
  values[0] = Int64GetDatum(archiver_stats->archived_count);
  if (*(archiver_stats->last_archived_wal) == '\0')
@@ -1879,6 +1882,5 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
  values[6] = TimestampTzGetDatum(archiver_stats->stat_reset_timestamp);
 
  /* Returns the record as Datum */
- PG_RETURN_DATUM(HeapTupleGetDatum(
-  heap_form_tuple(tupdesc, values, nulls)));
+ PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index fd51934aaf..994351ac2d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -32,6 +32,7 @@ volatile sig_atomic_t QueryCancelPending = false;
 volatile sig_atomic_t ProcDiePending = false;
 volatile sig_atomic_t ClientConnectionLost = false;
 volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
+volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
 volatile sig_atomic_t ConfigReloadPending = false;
 volatile uint32 InterruptHoldoffCount = 0;
 volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7415c4faab..626a4326a4 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -73,6 +73,7 @@ static void ShutdownPostgres(int code, Datum arg);
 static void StatementTimeoutHandler(void);
 static void LockTimeoutHandler(void);
 static void IdleInTransactionSessionTimeoutHandler(void);
+static void IdleStatsUpdateTimeoutHandler(void);
 static bool ThereIsAtLeastOneRole(void);
 static void process_startup_options(Port *port, bool am_superuser);
 static void process_settings(Oid databaseid, Oid roleid);
@@ -629,6 +630,8 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
  RegisterTimeout(LOCK_TIMEOUT, LockTimeoutHandler);
  RegisterTimeout(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
  IdleInTransactionSessionTimeoutHandler);
+ RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
+ IdleStatsUpdateTimeoutHandler);
  }
 
  /*
@@ -1240,6 +1243,14 @@ IdleInTransactionSessionTimeoutHandler(void)
  SetLatch(MyLatch);
 }
 
+static void
+IdleStatsUpdateTimeoutHandler(void)
+{
+ IdleStatsUpdateTimeoutPending = true;
+ InterruptPending = true;
+ SetLatch(MyLatch);
+}
+
 /*
  * Returns true if at least one role is defined in this database cluster.
  */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 3e1c3863c4..25b3b2a079 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -123,7 +123,7 @@ is_deeply(
 
 # Contents of these directories should not be copied.
 foreach my $dirname (
- qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_subtrans)
   )
 {
  is_deeply(
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 63a7653457..49131a6d5b 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -82,6 +82,7 @@ extern PGDLLIMPORT volatile sig_atomic_t InterruptPending;
 extern PGDLLIMPORT volatile sig_atomic_t QueryCancelPending;
 extern PGDLLIMPORT volatile sig_atomic_t ProcDiePending;
 extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending;
 
 extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
@@ -403,7 +404,6 @@ typedef enum
  CheckpointerProcess,
  WalWriterProcess,
  WalReceiverProcess,
-
  NUM_AUXPROCTYPES /* Must be last! */
 } AuxProcType;
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f299d1d601..1ad77fb20f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -13,6 +13,7 @@
 
 #include "datatype/timestamp.h"
 #include "fmgr.h"
+#include "lib/dshash.h"
 #include "libpq/pqcomm.h"
 #include "port/atomics.h"
 #include "portability/instr_time.h"
@@ -41,32 +42,6 @@ typedef enum TrackFunctionsLevel
  TRACK_FUNC_ALL
 } TrackFunctionsLevel;
 
-/* ----------
- * The types of backend -> collector messages
- * ----------
- */
-typedef enum StatMsgType
-{
- PGSTAT_MTYPE_DUMMY,
- PGSTAT_MTYPE_INQUIRY,
- PGSTAT_MTYPE_TABSTAT,
- PGSTAT_MTYPE_TABPURGE,
- PGSTAT_MTYPE_DROPDB,
- PGSTAT_MTYPE_RESETCOUNTER,
- PGSTAT_MTYPE_RESETSHAREDCOUNTER,
- PGSTAT_MTYPE_RESETSINGLECOUNTER,
- PGSTAT_MTYPE_AUTOVAC_START,
- PGSTAT_MTYPE_VACUUM,
- PGSTAT_MTYPE_ANALYZE,
- PGSTAT_MTYPE_ARCHIVER,
- PGSTAT_MTYPE_BGWRITER,
- PGSTAT_MTYPE_FUNCSTAT,
- PGSTAT_MTYPE_FUNCPURGE,
- PGSTAT_MTYPE_RECOVERYCONFLICT,
- PGSTAT_MTYPE_TEMPFILE,
- PGSTAT_MTYPE_DEADLOCK
-} StatMsgType;
-
 /* ----------
  * The data type used for counters.
  * ----------
@@ -115,13 +90,6 @@ typedef struct PgStat_TableCounts
  PgStat_Counter t_blocks_hit;
 } PgStat_TableCounts;
 
-/* Possible targets for resetting cluster-wide shared values */
-typedef enum PgStat_Shared_Reset_Target
-{
- RESET_ARCHIVER,
- RESET_BGWRITER
-} PgStat_Shared_Reset_Target;
-
 /* Possible object types for resetting single counters */
 typedef enum PgStat_Single_Reset_Type
 {
@@ -180,271 +148,23 @@ typedef struct PgStat_TableXactStatus
 } PgStat_TableXactStatus;
 
 
-/* ------------------------------------------------------------
- * Message formats follow
- * ------------------------------------------------------------
- */
-
-
 /* ----------
- * PgStat_MsgHdr The common message header
+ * PgStat_BgWriter bgwriter statistics
  * ----------
  */
-typedef struct PgStat_MsgHdr
+typedef struct PgStat_BgWriter
 {
- StatMsgType m_type;
- int m_size;
-} PgStat_MsgHdr;
-
-/* ----------
- * Space available in a message.  This will keep the UDP packets below 1K,
- * which should fit unfragmented into the MTU of the loopback interface.
- * (Larger values of PGSTAT_MAX_MSG_SIZE would work for that on most
- * platforms, but we're being conservative here.)
- * ----------
- */
-#define PGSTAT_MAX_MSG_SIZE 1000
-#define PGSTAT_MSG_PAYLOAD (PGSTAT_MAX_MSG_SIZE - sizeof(PgStat_MsgHdr))
-
-
-/* ----------
- * PgStat_MsgDummy A dummy message, ignored by the collector
- * ----------
- */
-typedef struct PgStat_MsgDummy
-{
- PgStat_MsgHdr m_hdr;
-} PgStat_MsgDummy;
-
-
-/* ----------
- * PgStat_MsgInquiry Sent by a backend to ask the collector
- * to write the stats file(s).
- *
- * Ordinarily, an inquiry message prompts writing of the global stats file,
- * the stats file for shared catalogs, and the stats file for the specified
- * database.  If databaseid is InvalidOid, only the first two are written.
- *
- * New file(s) will be written only if the existing file has a timestamp
- * older than the specified cutoff_time; this prevents duplicated effort
- * when multiple requests arrive at nearly the same time, assuming that
- * backends send requests with cutoff_times a little bit in the past.
- *
- * clock_time should be the requestor's current local time; the collector
- * uses this to check for the system clock going backward, but it has no
- * effect unless that occurs.  We assume clock_time >= cutoff_time, though.
- * ----------
- */
-
-typedef struct PgStat_MsgInquiry
-{
- PgStat_MsgHdr m_hdr;
- TimestampTz clock_time; /* observed local clock time */
- TimestampTz cutoff_time; /* minimum acceptable file timestamp */
- Oid databaseid; /* requested DB (InvalidOid => shared only) */
-} PgStat_MsgInquiry;
-
-
-/* ----------
- * PgStat_TableEntry Per-table info in a MsgTabstat
- * ----------
- */
-typedef struct PgStat_TableEntry
-{
- Oid t_id;
- PgStat_TableCounts t_counts;
-} PgStat_TableEntry;
-
-/* ----------
- * PgStat_MsgTabstat Sent by the backend to report table
- * and buffer access statistics.
- * ----------
- */
-#define PGSTAT_NUM_TABENTRIES  \
- ((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - 3 * sizeof(int) - 2 * sizeof(PgStat_Counter)) \
- / sizeof(PgStat_TableEntry))
-
-typedef struct PgStat_MsgTabstat
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
- int m_nentries;
- int m_xact_commit;
- int m_xact_rollback;
- PgStat_Counter m_block_read_time; /* times in microseconds */
- PgStat_Counter m_block_write_time;
- PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
-} PgStat_MsgTabstat;
-
-
-/* ----------
- * PgStat_MsgTabpurge Sent by the backend to tell the collector
- * about dead tables.
- * ----------
- */
-#define PGSTAT_NUM_TABPURGE  \
- ((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - sizeof(int))  \
- / sizeof(Oid))
-
-typedef struct PgStat_MsgTabpurge
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
- int m_nentries;
- Oid m_tableid[PGSTAT_NUM_TABPURGE];
-} PgStat_MsgTabpurge;
-
-
-/* ----------
- * PgStat_MsgDropdb Sent by the backend to tell the collector
- * about a dropped database
- * ----------
- */
-typedef struct PgStat_MsgDropdb
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
-} PgStat_MsgDropdb;
-
-
-/* ----------
- * PgStat_MsgResetcounter Sent by the backend to tell the collector
- * to reset counters
- * ----------
- */
-typedef struct PgStat_MsgResetcounter
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
-} PgStat_MsgResetcounter;
-
-/* ----------
- * PgStat_MsgResetsharedcounter Sent by the backend to tell the collector
- * to reset a shared counter
- * ----------
- */
-typedef struct PgStat_MsgResetsharedcounter
-{
- PgStat_MsgHdr m_hdr;
- PgStat_Shared_Reset_Target m_resettarget;
-} PgStat_MsgResetsharedcounter;
-
-/* ----------
- * PgStat_MsgResetsinglecounter Sent by the backend to tell the collector
- * to reset a single counter
- * ----------
- */
-typedef struct PgStat_MsgResetsinglecounter
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
- PgStat_Single_Reset_Type m_resettype;
- Oid m_objectid;
-} PgStat_MsgResetsinglecounter;
-
-/* ----------
- * PgStat_MsgAutovacStart Sent by the autovacuum daemon to signal
- * that a database is going to be processed
- * ----------
- */
-typedef struct PgStat_MsgAutovacStart
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
- TimestampTz m_start_time;
-} PgStat_MsgAutovacStart;
-
-
-/* ----------
- * PgStat_MsgVacuum Sent by the backend or autovacuum daemon
- * after VACUUM
- * ----------
- */
-typedef struct PgStat_MsgVacuum
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
- Oid m_tableoid;
- bool m_autovacuum;
- TimestampTz m_vacuumtime;
- PgStat_Counter m_live_tuples;
- PgStat_Counter m_dead_tuples;
-} PgStat_MsgVacuum;
-
-
-/* ----------
- * PgStat_MsgAnalyze Sent by the backend or autovacuum daemon
- * after ANALYZE
- * ----------
- */
-typedef struct PgStat_MsgAnalyze
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
- Oid m_tableoid;
- bool m_autovacuum;
- bool m_resetcounter;
- TimestampTz m_analyzetime;
- PgStat_Counter m_live_tuples;
- PgStat_Counter m_dead_tuples;
-} PgStat_MsgAnalyze;
-
-
-/* ----------
- * PgStat_MsgArchiver Sent by the archiver to update statistics.
- * ----------
- */
-typedef struct PgStat_MsgArchiver
-{
- PgStat_MsgHdr m_hdr;
- bool m_failed; /* Failed attempt */
- char m_xlog[MAX_XFN_CHARS + 1];
- TimestampTz m_timestamp;
-} PgStat_MsgArchiver;
-
-/* ----------
- * PgStat_MsgBgWriter Sent by the bgwriter to update statistics.
- * ----------
- */
-typedef struct PgStat_MsgBgWriter
-{
- PgStat_MsgHdr m_hdr;
-
- PgStat_Counter m_timed_checkpoints;
- PgStat_Counter m_requested_checkpoints;
- PgStat_Counter m_buf_written_checkpoints;
- PgStat_Counter m_buf_written_clean;
- PgStat_Counter m_maxwritten_clean;
- PgStat_Counter m_buf_written_backend;
- PgStat_Counter m_buf_fsync_backend;
- PgStat_Counter m_buf_alloc;
- PgStat_Counter m_checkpoint_write_time; /* times in milliseconds */
- PgStat_Counter m_checkpoint_sync_time;
-} PgStat_MsgBgWriter;
-
-/* ----------
- * PgStat_MsgRecoveryConflict Sent by the backend upon recovery conflict
- * ----------
- */
-typedef struct PgStat_MsgRecoveryConflict
-{
- PgStat_MsgHdr m_hdr;
-
- Oid m_databaseid;
- int m_reason;
-} PgStat_MsgRecoveryConflict;
-
-/* ----------
- * PgStat_MsgTempFile Sent by the backend upon creating a temp file
- * ----------
- */
-typedef struct PgStat_MsgTempFile
-{
- PgStat_MsgHdr m_hdr;
-
- Oid m_databaseid;
- size_t m_filesize;
-} PgStat_MsgTempFile;
+ PgStat_Counter timed_checkpoints;
+ PgStat_Counter requested_checkpoints;
+ PgStat_Counter buf_written_checkpoints;
+ PgStat_Counter buf_written_clean;
+ PgStat_Counter maxwritten_clean;
+ PgStat_Counter buf_written_backend;
+ PgStat_Counter buf_fsync_backend;
+ PgStat_Counter buf_alloc;
+ PgStat_Counter checkpoint_write_time; /* times in milliseconds */
+ PgStat_Counter checkpoint_sync_time;
+} PgStat_BgWriter;
 
 /* ----------
  * PgStat_FunctionCounts The actual per-function counts kept by a backend
@@ -485,79 +205,6 @@ typedef struct PgStat_FunctionEntry
  PgStat_Counter f_self_time;
 } PgStat_FunctionEntry;
 
-/* ----------
- * PgStat_MsgFuncstat Sent by the backend to report function
- * usage statistics.
- * ----------
- */
-#define PGSTAT_NUM_FUNCENTRIES \
- ((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - sizeof(int))  \
- / sizeof(PgStat_FunctionEntry))
-
-typedef struct PgStat_MsgFuncstat
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
- int m_nentries;
- PgStat_FunctionEntry m_entry[PGSTAT_NUM_FUNCENTRIES];
-} PgStat_MsgFuncstat;
-
-/* ----------
- * PgStat_MsgFuncpurge Sent by the backend to tell the collector
- * about dead functions.
- * ----------
- */
-#define PGSTAT_NUM_FUNCPURGE  \
- ((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - sizeof(int))  \
- / sizeof(Oid))
-
-typedef struct PgStat_MsgFuncpurge
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
- int m_nentries;
- Oid m_functionid[PGSTAT_NUM_FUNCPURGE];
-} PgStat_MsgFuncpurge;
-
-/* ----------
- * PgStat_MsgDeadlock Sent by the backend to tell the collector
- * about a deadlock that occurred.
- * ----------
- */
-typedef struct PgStat_MsgDeadlock
-{
- PgStat_MsgHdr m_hdr;
- Oid m_databaseid;
-} PgStat_MsgDeadlock;
-
-
-/* ----------
- * PgStat_Msg Union over all possible messages.
- * ----------
- */
-typedef union PgStat_Msg
-{
- PgStat_MsgHdr msg_hdr;
- PgStat_MsgDummy msg_dummy;
- PgStat_MsgInquiry msg_inquiry;
- PgStat_MsgTabstat msg_tabstat;
- PgStat_MsgTabpurge msg_tabpurge;
- PgStat_MsgDropdb msg_dropdb;
- PgStat_MsgResetcounter msg_resetcounter;
- PgStat_MsgResetsharedcounter msg_resetsharedcounter;
- PgStat_MsgResetsinglecounter msg_resetsinglecounter;
- PgStat_MsgAutovacStart msg_autovacuum;
- PgStat_MsgVacuum msg_vacuum;
- PgStat_MsgAnalyze msg_analyze;
- PgStat_MsgArchiver msg_archiver;
- PgStat_MsgBgWriter msg_bgwriter;
- PgStat_MsgFuncstat msg_funcstat;
- PgStat_MsgFuncpurge msg_funcpurge;
- PgStat_MsgRecoveryConflict msg_recoveryconflict;
- PgStat_MsgDeadlock msg_deadlock;
-} PgStat_Msg;
-
-
 /* ------------------------------------------------------------
  * Statistic collector data structures follow
  *
@@ -601,10 +248,13 @@ typedef struct PgStat_StatDBEntry
 
  /*
  * tables and functions must be last in the struct, because we don't write
- * the pointers out to the stats file.
+ * the handles and pointers out to the stats file.
  */
- HTAB   *tables;
- HTAB   *functions;
+ dshash_table_handle tables;
+ dshash_table_handle functions;
+ /* for snapshot tables */
+ HTAB *snapshot_tables;
+ HTAB *snapshot_functions;
 } PgStat_StatDBEntry;
 
 
@@ -1136,13 +786,15 @@ extern bool pgstat_track_counts;
 extern int pgstat_track_functions;
 extern PGDLLIMPORT int pgstat_track_activity_query_size;
 extern char *pgstat_stat_directory;
+
+/* No longer used, but will be removed with GUC */
 extern char *pgstat_stat_tmpname;
 extern char *pgstat_stat_filename;
 
 /*
  * BgWriter statistics counters are updated directly by bgwriter and bufmgr
  */
-extern PgStat_MsgBgWriter BgWriterStats;
+extern PgStat_BgWriter BgWriterStats;
 
 /*
  * Updated by pgstat_count_buffer_*_time macros
@@ -1154,34 +806,20 @@ extern PgStat_Counter pgStatBlockWriteTime;
  * Functions called from postmaster
  * ----------
  */
-extern Size BackendStatusShmemSize(void);
-extern void CreateSharedBackendStatus(void);
-
-extern void pgstat_init(void);
-extern int pgstat_start(void);
 extern void pgstat_reset_all(void);
-extern void allow_immediate_pgstat_restart(void);
-
-#ifdef EXEC_BACKEND
-extern void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
-#endif
-
 
 /* ----------
  * Functions called from backends
  * ----------
  */
-extern void pgstat_ping(void);
-
-extern void pgstat_report_stat(bool force);
+extern long pgstat_update_stat(bool force);
 extern void pgstat_vacuum_stat(void);
 extern void pgstat_drop_database(Oid databaseid);
 
-extern void pgstat_clear_snapshot(void);
 extern void pgstat_reset_counters(void);
-extern void pgstat_reset_shared_counters(const char *);
-extern void pgstat_reset_single_counter(Oid objectid, PgStat_Single_Reset_Type type);
-
+extern void pgstat_reset_shared_counters(const char *target);
+extern void pgstat_reset_single_counter(Oid objectid,
+ PgStat_Single_Reset_Type type);
 extern void pgstat_report_autovac(Oid dboid);
 extern void pgstat_report_vacuum(Oid tableoid, bool shared,
  PgStat_Counter livetuples, PgStat_Counter deadtuples);
@@ -1192,6 +830,8 @@ extern void pgstat_report_analyze(Relation rel,
 extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 
+extern void pgstat_clear_snapshot(void);
+
 extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
@@ -1219,6 +859,9 @@ extern PgStat_BackendFunctionEntry *find_funcstat_entry(Oid func_id);
 extern void pgstat_initstats(Relation rel);
 
 extern char *pgstat_clip_activity(const char *raw_activity);
+extern PgStat_StatDBEntry *backend_get_db_entry(Oid dbid, bool oneshot);
+extern HTAB *backend_snapshot_all_db_entries(void);
+extern PgStat_StatTabEntry *backend_get_tab_entry(PgStat_StatDBEntry *dbent, Oid relid, bool oneshot);
 
 /* ----------
  * pgstat_report_wait_start() -
@@ -1338,15 +981,15 @@ extern void pgstat_twophase_postcommit(TransactionId xid, uint16 info,
 extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
   void *recdata, uint32 len);
 
-extern void pgstat_send_archiver(const char *xlog, bool failed);
-extern void pgstat_send_bgwriter(void);
+extern void pgstat_update_archiver(const char *xlog, bool failed);
+extern void pgstat_update_bgwriter(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
  * generate the pgstat* views.
  * ----------
  */
-extern PgStat_StatDBEntry *pgstat_fetch_stat_dbentry(Oid dbid);
+extern PgStat_StatDBEntry *pgstat_fetch_stat_dbentry(Oid relid, bool oneshot);
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
 extern PgBackendStatus *pgstat_fetch_stat_beentry(int beid);
 extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
@@ -1355,4 +998,14 @@ extern int pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 
+/* File input/output functions  */
+extern void pgstat_read_statsfiles(void);
+extern void pgstat_write_statsfiles(void);
+
+/* For shared memory allocation/initialize */
+extern Size BackendStatusShmemSize(void);
+extern void CreateSharedBackendStatus(void);
+extern Size StatsShmemSize(void);
+extern void StatsShmemInit(void);
+
 #endif /* PGSTAT_H */
diff --git a/src/include/storage/dsm.h b/src/include/storage/dsm.h
index 7c44f4a6e7..c37ec33e9b 100644
--- a/src/include/storage/dsm.h
+++ b/src/include/storage/dsm.h
@@ -26,6 +26,7 @@ typedef struct dsm_segment dsm_segment;
 struct PGShmemHeader; /* avoid including pg_shmem.h */
 extern void dsm_cleanup_using_control_segment(dsm_handle old_control_handle);
 extern void dsm_postmaster_startup(struct PGShmemHeader *);
+extern void dsm_child_init(void);
 extern void dsm_backend_shutdown(void);
 extern void dsm_detach_all(void);
 
@@ -33,6 +34,8 @@ extern void dsm_detach_all(void);
 extern void dsm_set_control_handle(dsm_handle h);
 #endif
 
+extern bool dsm_is_available(void);
+
 /* Functions that create or remove mappings. */
 extern dsm_segment *dsm_create(Size size, int flags);
 extern dsm_segment *dsm_attach(dsm_handle h);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 96c7732006..daa269f816 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,9 @@ typedef enum BuiltinTrancheIds
  LWTRANCHE_SHARED_TUPLESTORE,
  LWTRANCHE_TBM,
  LWTRANCHE_PARALLEL_APPEND,
+ LWTRANCHE_STATS_DSA,
+ LWTRANCHE_STATS_DB,
+ LWTRANCHE_STATS_FUNC_TABLE,
  LWTRANCHE_FIRST_USER_DEFINED
 } BuiltinTrancheIds;
 
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 9244a2a7b7..a9b625211b 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -31,6 +31,7 @@ typedef enum TimeoutId
  STANDBY_TIMEOUT,
  STANDBY_LOCK_TIMEOUT,
  IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
+ IDLE_STATS_UPDATE_TIMEOUT,
  /* First user-definable timeout reason */
  USER_TIMEOUT,
  /* Maximum number of timeout reasons */
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index c1878dd694..7391e05f37 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -290,7 +290,7 @@ worker_spi_main(Datum main_arg)
  SPI_finish();
  PopActiveSnapshot();
  CommitTransactionCommand();
- pgstat_report_stat(false);
+ pgstat_update_stat(false);
  pgstat_report_activity(STATE_IDLE, NULL);
  }
 
--
2.16.3


From 60711e8bb25371b237835c0f78a64d139589c8c6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Tue, 27 Nov 2018 14:42:12 +0900
Subject: [PATCH 5/7] Remove the GUC stats_temp_directory

The GUC used to specify the directory in which temporary statistics
files are stored. It is no longer needed by the stats collector, but is
still used by programs in bin and contrib, and possibly by other
extensions. This patch therefore removes the GUC itself, but leaves some
backing variables and macro definitions in place for backward compatibility.
---
 src/backend/postmaster/pgstat.c               | 12 +++-----
 src/backend/replication/basebackup.c          | 13 ++-------
 src/backend/utils/misc/guc.c                  | 41 ---------------------------
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/include/pgstat.h                          |  5 +++-
 5 files changed, 11 insertions(+), 61 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a97fbae7a8..78f0bbb558 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -89,15 +89,11 @@ bool pgstat_track_counts = false;
 int pgstat_track_functions = TRACK_FUNC_OFF;
 int pgstat_track_activity_query_size = 1024;
 
-/* ----------
- * Built from GUC parameter
- * ----------
+/*
+ * This was a GUC parameter and is no longer used in this file, but it is
+ * left alone with its default value for backward compatibility with extensions.
  */
-char   *pgstat_stat_directory = NULL;
-
-/* No longer used, but will be removed with GUC */
-char   *pgstat_stat_filename = NULL;
-char   *pgstat_stat_tmpname = NULL;
+char   *pgstat_stat_directory = PG_STAT_TMP_DIR;
 
 /* Shared stats bootstrap infomation */
 typedef struct StatsShmemStruct {
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index def6c03dd0..58ba33e822 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -230,11 +230,8 @@ perform_base_backup(basebackup_options *opt)
  TimeLineID endtli;
  StringInfo labelfile;
  StringInfo tblspc_map_file = NULL;
- int datadirpathlen;
  List   *tablespaces = NIL;
 
- datadirpathlen = strlen(DataDir);
-
  backup_started_in_recovery = RecoveryInProgress();
 
  labelfile = makeStringInfo();
@@ -265,13 +262,9 @@ perform_base_backup(basebackup_options *opt)
  * Calculate the relative path of temporary statistics directory in
  * order to skip the files which are located in that directory later.
  */
- if (is_absolute_path(pgstat_stat_directory) &&
- strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
- else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory);
- else
- statrelpath = pgstat_stat_directory;
+
+ Assert(strchr(PG_STAT_TMP_DIR, '/') == NULL);
+ statrelpath = psprintf("./%s", PG_STAT_TMP_DIR);
 
  /* Add a node for the base directory at the end */
  ti = palloc0(sizeof(tablespaceinfo));
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c216ed0922..099afd0724 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -189,7 +189,6 @@ static bool check_autovacuum_max_workers(int *newval, void **extra, GucSource so
 static bool check_autovacuum_work_mem(int *newval, void **extra, GucSource source);
 static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source);
 static void assign_effective_io_concurrency(int newval, void *extra);
-static void assign_pgstat_temp_directory(const char *newval, void *extra);
 static bool check_application_name(char **newval, void **extra, GucSource source);
 static void assign_application_name(const char *newval, void *extra);
 static bool check_cluster_name(char **newval, void **extra, GucSource source);
@@ -3973,17 +3972,6 @@ static struct config_string ConfigureNamesString[] =
  NULL, NULL, NULL
  },
 
- {
- {"stats_temp_directory", PGC_SIGHUP, STATS_COLLECTOR,
- gettext_noop("Writes temporary statistics files to the specified directory."),
- NULL,
- GUC_SUPERUSER_ONLY
- },
- &pgstat_temp_directory,
- PG_STAT_TMP_DIR,
- check_canonical_path, assign_pgstat_temp_directory, NULL
- },
-
  {
  {"synchronous_standby_names", PGC_SIGHUP, REPLICATION_MASTER,
  gettext_noop("Number of synchronous standbys and list of names of potential synchronous ones."),
@@ -10966,35 +10954,6 @@ assign_effective_io_concurrency(int newval, void *extra)
 #endif /* USE_PREFETCH */
 }
 
-static void
-assign_pgstat_temp_directory(const char *newval, void *extra)
-{
- /* check_canonical_path already canonicalized newval for us */
- char   *dname;
- char   *tname;
- char   *fname;
-
- /* directory */
- dname = guc_malloc(ERROR, strlen(newval) + 1); /* runtime dir */
- sprintf(dname, "%s", newval);
-
- /* global stats */
- tname = guc_malloc(ERROR, strlen(newval) + 12); /* /global.tmp */
- sprintf(tname, "%s/global.tmp", newval);
- fname = guc_malloc(ERROR, strlen(newval) + 13); /* /global.stat */
- sprintf(fname, "%s/global.stat", newval);
-
- if (pgstat_stat_directory)
- free(pgstat_stat_directory);
- pgstat_stat_directory = dname;
- if (pgstat_stat_tmpname)
- free(pgstat_stat_tmpname);
- pgstat_stat_tmpname = tname;
- if (pgstat_stat_filename)
- free(pgstat_stat_filename);
- pgstat_stat_filename = fname;
-}
-
 static bool
 check_application_name(char **newval, void **extra, GucSource source)
 {
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a21865a77f..a65656a4d2 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -552,7 +552,6 @@
 #track_io_timing = off
 #track_functions = none # none, pl, all
 #track_activity_query_size = 1024 # (change requires restart)
-#stats_temp_directory = 'pg_stat_tmp'
 
 
 # - Monitoring -
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1ad77fb20f..d10ea5389b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -31,7 +31,10 @@
 #define PGSTAT_STAT_PERMANENT_FILENAME "pg_stat/global.stat"
 #define PGSTAT_STAT_PERMANENT_TMPFILE "pg_stat/global.tmp"
 
-/* Default directory to store temporary statistics data in */
+/*
+ * This used to be the directory to store temporary statistics data in but is
+ * no longer used. Defined here for backward compatibility.
+ */
 #define PG_STAT_TMP_DIR "pg_stat_tmp"
 
 /* Values for track_functions GUC variable --- order is significant! */
--
2.16.3


From 5be4706bc70dc7eeaa36674d01cf3c409172172a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Fri, 9 Nov 2018 15:48:49 +0900
Subject: [PATCH 6/7] Split out backend status monitor part from pgstat

The large file pgstat.c contained two major facilities: the backend
status monitor and the database usage monitor. Split the former out of
the file into a new module named "bestatus". The names of individual
functions are left alone except where they conflict.
---
 contrib/pg_prewarm/autoprewarm.c                   |    2 +-
 contrib/pg_stat_statements/pg_stat_statements.c    |    1 +
 contrib/postgres_fdw/connection.c                  |    2 +-
 src/backend/Makefile                               |    2 +-
 src/backend/access/heap/rewriteheap.c              |    4 +-
 src/backend/access/heap/vacuumlazy.c               |    1 +
 src/backend/access/nbtree/nbtree.c                 |    2 +-
 src/backend/access/nbtree/nbtsort.c                |    2 +-
 src/backend/access/transam/clog.c                  |    2 +-
 src/backend/access/transam/parallel.c              |    2 +-
 src/backend/access/transam/slru.c                  |    2 +-
 src/backend/access/transam/timeline.c              |    2 +-
 src/backend/access/transam/twophase.c              |    1 +
 src/backend/access/transam/xact.c                  |    1 +
 src/backend/access/transam/xlog.c                  |    1 +
 src/backend/access/transam/xlogfuncs.c             |    2 +-
 src/backend/access/transam/xlogutils.c             |    2 +-
 src/backend/bootstrap/bootstrap.c                  |    8 +-
 src/backend/executor/execParallel.c                |    2 +-
 src/backend/executor/nodeBitmapHeapscan.c          |    1 +
 src/backend/executor/nodeGather.c                  |    2 +-
 src/backend/executor/nodeHash.c                    |    2 +-
 src/backend/executor/nodeHashjoin.c                |    2 +-
 src/backend/libpq/be-secure-openssl.c              |    2 +-
 src/backend/libpq/be-secure.c                      |    2 +-
 src/backend/libpq/pqmq.c                           |    2 +-
 src/backend/postmaster/Makefile                    |    2 +-
 src/backend/postmaster/autovacuum.c                |    1 +
 src/backend/postmaster/bgworker.c                  |    2 +-
 src/backend/postmaster/bgwriter.c                  |    1 +
 src/backend/postmaster/checkpointer.c              |    1 +
 src/backend/postmaster/pgarch.c                    |    1 +
 src/backend/postmaster/postmaster.c                |    1 +
 src/backend/postmaster/syslogger.c                 |    2 +-
 src/backend/postmaster/walwriter.c                 |    2 +-
 src/backend/replication/basebackup.c               |    1 +
 .../libpqwalreceiver/libpqwalreceiver.c            |    2 +-
 src/backend/replication/logical/launcher.c         |    2 +-
 src/backend/replication/logical/origin.c           |    3 +-
 src/backend/replication/logical/reorderbuffer.c    |    2 +-
 src/backend/replication/logical/snapbuild.c        |    2 +-
 src/backend/replication/logical/tablesync.c        |    6 +-
 src/backend/replication/logical/worker.c           |    7 +-
 src/backend/replication/slot.c                     |    2 +-
 src/backend/replication/syncrep.c                  |    2 +-
 src/backend/replication/walreceiver.c              |    2 +-
 src/backend/replication/walsender.c                |    2 +-
 src/backend/statmon/Makefile                       |   17 +
 src/backend/statmon/bestatus.c                     | 1756 ++++++++++++++++++++
 src/backend/{postmaster => statmon}/pgstat.c       | 1727 +------------------
 src/backend/storage/buffer/bufmgr.c                |    1 +
 src/backend/storage/file/buffile.c                 |    2 +-
 src/backend/storage/file/copydir.c                 |    2 +-
 src/backend/storage/file/fd.c                      |    1 +
 src/backend/storage/ipc/dsm_impl.c                 |    2 +-
 src/backend/storage/ipc/latch.c                    |    2 +-
 src/backend/storage/ipc/procarray.c                |    2 +-
 src/backend/storage/ipc/shm_mq.c                   |    2 +-
 src/backend/storage/ipc/standby.c                  |    2 +-
 src/backend/storage/lmgr/deadlock.c                |    1 +
 src/backend/storage/lmgr/lwlock.c                  |    2 +-
 src/backend/storage/lmgr/predicate.c               |    2 +-
 src/backend/storage/lmgr/proc.c                    |    2 +-
 src/backend/storage/smgr/md.c                      |    2 +-
 src/backend/tcop/postgres.c                        |    1 +
 src/backend/utils/adt/misc.c                       |    2 +-
 src/backend/utils/adt/pgstatfuncs.c                |    1 +
 src/backend/utils/cache/relmapper.c                |    2 +-
 src/backend/utils/init/miscinit.c                  |    2 +-
 src/backend/utils/init/postinit.c                  |    4 +
 src/backend/utils/misc/guc.c                       |    1 +
 src/include/bestatus.h                             |  544 ++++++
 src/include/pgstat.h                               |  514 +-----
 73 files changed, 2441 insertions(+), 2255 deletions(-)
 create mode 100644 src/backend/statmon/Makefile
 create mode 100644 src/backend/statmon/bestatus.c
 rename src/backend/{postmaster => statmon}/pgstat.c (70%)
 create mode 100644 src/include/bestatus.h

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 45a5a26337..6296401b25 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -30,10 +30,10 @@
 
 #include "access/heapam.h"
 #include "access/xact.h"
+#include "bestatus.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "storage/buf_internals.h"
 #include "storage/dsm.h"
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f177ebaa2c..188d034387 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -62,6 +62,7 @@
 #include <unistd.h>
 
 #include "access/hash.h"
+#include "bestatus.h"
 #include "catalog/pg_authid.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 239d220c24..1ea71245df 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -15,11 +15,11 @@
 #include "postgres_fdw.h"
 
 #include "access/htup_details.h"
+#include "bestatus.h"
 #include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/latch.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 478a96db9b..cc511672c9 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -20,7 +20,7 @@ include $(top_builddir)/src/Makefile.global
 SUBDIRS = access bootstrap catalog parser commands executor foreign lib libpq \
  main nodes optimizer partitioning port postmaster \
  regex replication rewrite \
- statistics storage tcop tsearch utils $(top_builddir)/src/timezone \
+ statistics statmon storage tcop tsearch utils $(top_builddir)/src/timezone \
  jit
 
 include $(srcdir)/common.mk
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f6b0f1b093..ef40a2e7a2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -115,12 +115,12 @@
 #include "access/xact.h"
 #include "access/xloginsert.h"
 
+#include "bestatus.h"
+
 #include "catalog/catalog.h"
 
 #include "lib/ilist.h"
 
-#include "pgstat.h"
-
 #include "replication/logical.h"
 #include "replication/slot.h"
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c09eb6eff8..189db9b8fd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -44,6 +44,7 @@
 #include "access/transam.h"
 #include "access/visibilitymap.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/storage.h"
 #include "commands/dbcommands.h"
 #include "commands/progress.h"
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 98917de2ef..69cd211369 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -22,10 +22,10 @@
 #include "access/nbtxlog.h"
 #include "access/relscan.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "nodes/execnodes.h"
-#include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "storage/condition_variable.h"
 #include "storage/indexfsm.h"
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 5cc3cf57e2..a0173c19a8 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -64,9 +64,9 @@
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "bestatus.h"
 #include "catalog/index.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/smgr.h"
 #include "tcop/tcopprot.h" /* pgrminclude ignore */
 #include "utils/rel.h"
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index aa089d83fa..cf034ba333 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -38,8 +38,8 @@
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "pg_trace.h"
 #include "storage/proc.h"
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 9c55c20d6b..26d30b8853 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -19,6 +19,7 @@
 #include "access/session.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/pg_enum.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
@@ -29,7 +30,6 @@
 #include "libpq/pqmq.h"
 #include "miscadmin.h"
 #include "optimizer/planmain.h"
-#include "pgstat.h"
 #include "storage/ipc.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3623352b9c..a28fe474aa 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -54,7 +54,7 @@
 #include "access/slru.h"
 #include "access/transam.h"
 #include "access/xlog.h"
-#include "pgstat.h"
+#include "bestatus.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
 #include "miscadmin.h"
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index c96c8b60ba..bbe9c0eb5f 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -38,7 +38,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogdefs.h"
-#include "pgstat.h"
+#include "bestatus.h"
 #include "storage/fd.h"
 
 /*
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 9a8a6bb119..9bbb1952ac 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -87,6 +87,7 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "access/xlogreader.h"
+#include "bestatus.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage.h"
 #include "funcapi.h"
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 18467d96d2..c40ac790b0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -30,6 +30,7 @@
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
+#include "bestatus.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_enum.h"
 #include "catalog/storage.h"
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9e45581d89..4b4e3d07ac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -36,6 +36,7 @@
 #include "access/xloginsert.h"
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
+#include "bestatus.h"
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index b35043bf71..683c41575f 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -23,9 +23,9 @@
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 #include "catalog/pg_type.h"
+#include "bestatus.h"
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 10a663bae6..53fa4890e9 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -23,8 +23,8 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/smgr.h"
 #include "utils/guc.h"
 #include "utils/hsearch.h"
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index df926d8dea..fca62770ac 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -22,6 +22,7 @@
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "bootstrap/bootstrap.h"
 #include "catalog/index.h"
 #include "catalog/pg_collation.h"
@@ -329,9 +330,6 @@ AuxiliaryProcessMain(int argc, char *argv[])
  case BgWriterProcess:
  statmsg = pgstat_get_backend_desc(B_BG_WRITER);
  break;
- case ArchiverProcess:
- statmsg = pgstat_get_backend_desc(B_ARCHIVER);
- break;
  case CheckpointerProcess:
  statmsg = pgstat_get_backend_desc(B_CHECKPOINTER);
  break;
@@ -341,6 +339,9 @@ AuxiliaryProcessMain(int argc, char *argv[])
  case WalReceiverProcess:
  statmsg = pgstat_get_backend_desc(B_WAL_RECEIVER);
  break;
+ case ArchiverProcess:
+ statmsg = pgstat_get_backend_desc(B_ARCHIVER);
+ break;
  default:
  statmsg = "??? process";
  break;
@@ -417,6 +418,7 @@ AuxiliaryProcessMain(int argc, char *argv[])
  CreateAuxProcessResourceOwner();
 
  /* Initialize backend status information */
+ pgstat_bearray_initialize();
  pgstat_initialize();
  pgstat_bestart();
 
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index d6cfd28ddc..a8d29d2d33 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -48,7 +48,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
-#include "pgstat.h"
+#include "bestatus.h"
 
 /*
  * Magic numbers for parallel executor communication.  We use constants
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index cd20abc141..3ad7238b5a 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -41,6 +41,7 @@
 #include "access/relscan.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "bestatus.h"
 #include "executor/execdebug.h"
 #include "executor/nodeBitmapHeapscan.h"
 #include "miscadmin.h"
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 70a4e90a05..02d58c463c 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
 
 #include "access/relscan.h"
 #include "access/xact.h"
+#include "bestatus.h"
 #include "executor/execdebug.h"
 #include "executor/execParallel.h"
 #include "executor/nodeGather.h"
@@ -39,7 +40,6 @@
 #include "executor/tqueue.h"
 #include "miscadmin.h"
 #include "optimizer/planmain.h"
-#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 856daf6a7f..5a47eb4601 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -28,6 +28,7 @@
 
 #include "access/htup_details.h"
 #include "access/parallel.h"
+#include "bestatus.h"
 #include "catalog/pg_statistic.h"
 #include "commands/tablespace.h"
 #include "executor/execdebug.h"
@@ -35,7 +36,6 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "port/atomics.h"
 #include "utils/dynahash.h"
 #include "utils/memutils.h"
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2098708864..898a7916b0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -108,12 +108,12 @@
 
 #include "access/htup_details.h"
 #include "access/parallel.h"
+#include "bestatus.h"
 #include "executor/executor.h"
 #include "executor/hashjoin.h"
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/sharedtuplestore.h"
 
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 789a975409..de15e0907f 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -36,9 +36,9 @@
 #include <openssl/ec.h>
 #endif
 
+#include "bestatus.h"
 #include "libpq/libpq.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/latch.h"
 #include "tcop/tcopprot.h"
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index a7def3168d..fa1cf6cffa 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -29,9 +29,9 @@
 #include <arpa/inet.h>
 #endif
 
+#include "bestatus.h"
 #include "libpq/libpq.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "tcop/tcopprot.h"
 #include "utils/memutils.h"
 #include "storage/ipc.h"
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index a9bd47d937..f79a70d6fe 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -13,11 +13,11 @@
 
 #include "postgres.h"
 
+#include "bestatus.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqmq.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..311e63017d 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
- pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+ pgarch.o postmaster.o startup.o syslogger.o walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index a69ea230fb..b1c723bf1c 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -71,6 +71,7 @@
 #include "access/reloptions.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "bestatus.h"
 #include "catalog/dependency.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f5db5a8c4a..7d7d55ef1a 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -16,8 +16,8 @@
 
 #include "libpq/pqsignal.h"
 #include "access/parallel.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 3fb6badea8..c820d35fbc 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -40,6 +40,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index d58193774e..b592560dd2 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -43,6 +43,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 18bd8296b8..2a7c4fd1b1 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -35,6 +35,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index dd293a79f0..d3ec828657 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -95,6 +95,7 @@
 
 #include "access/transam.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "bootstrap/bootstrap.h"
 #include "catalog/pg_control.h"
 #include "common/file_perm.h"
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index d1ea46deb8..3accdf7bcf 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -31,11 +31,11 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "bestatus.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "nodes/pg_list.h"
-#include "pgstat.h"
 #include "pgtime.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/postmaster.h"
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index a6fdba3f41..0de04159d5 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -45,9 +45,9 @@
 #include <unistd.h>
 
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/walwriter.h"
 #include "storage/bufmgr.h"
 #include "storage/condition_variable.h"
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 58ba33e822..a567aacf73 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,6 +17,7 @@
 #include <time.h>
 
 #include "access/xlog_internal.h" /* for pg_start/stop_backup */
+#include "bestatus.h"
 #include "catalog/pg_type.h"
 #include "common/file_perm.h"
 #include "lib/stringinfo.h"
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 7027737e67..75a3208f74 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -22,11 +22,11 @@
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 2b0d889c3b..ab967d7d65 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -19,7 +19,7 @@
 
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "pgstat.h"
+#include "bestatus.h"
 
 #include "access/heapam.h"
 #include "access/htup.h"
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index ca51318dbb..a5685b8e7e 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -77,13 +77,12 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
-
+#include "bestatus.h"
 #include "catalog/indexing.h"
 #include "nodes/execnodes.h"
 
 #include "replication/origin.h"
 #include "replication/logical.h"
-#include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index b79ce5db95..b90768be86 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -61,10 +61,10 @@
 #include "access/tuptoaster.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "catalog/catalog.h"
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
 #include "replication/slot.h"
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 4053482420..a30f1e012e 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -126,7 +126,7 @@
 #include "access/transam.h"
 #include "access/xact.h"
 
-#include "pgstat.h"
+#include "bestatus.h"
 
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 9408f87614..fafef0578a 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -86,26 +86,28 @@
 #include "postgres.h"
 
 #include "miscadmin.h"
-#include "pgstat.h"
 
 #include "access/heapam.h"
 #include "access/xact.h"
 
+#include "bestatus.h"
+
 #include "catalog/pg_subscription_rel.h"
 #include "catalog/pg_type.h"
 
 #include "commands/copy.h"
 
 #include "parser/parse_relation.h"
+#include "pgstat.h"
 
 #include "replication/logicallauncher.h"
 #include "replication/logicalrelation.h"
 #include "replication/walreceiver.h"
 #include "replication/worker_internal.h"
 
-#include "utils/snapmgr.h"
 #include "storage/ipc.h"
 
+#include "utils/snapmgr.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e4e2ad7b39..d8d7b35058 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -31,6 +31,8 @@
 #include "access/xact.h"
 #include "access/xlog_internal.h"
 
+#include "bestatus.h"
+
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_subscription.h"
@@ -42,17 +44,20 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 
+#include "funcapi.h"
+
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
 
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
 
 #include "nodes/makefuncs.h"
 
 #include "optimizer/planner.h"
 
 #include "parser/parse_relation.h"
-
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/walwriter.h"
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 33b23b6b6d..c60e69302a 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -41,9 +41,9 @@
 
 #include "access/transam.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "common/string.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
 #include "storage/proc.h"
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 6c160c13c6..02ec91d98e 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -75,8 +75,8 @@
 #include <unistd.h>
 
 #include "access/xact.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/syncrep.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 2e90944ad5..bdca25499d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -50,6 +50,7 @@
 #include "access/timeline.h"
 #include "access/transam.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
 #include "common/ip.h"
@@ -57,7 +58,6 @@
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/ipc.h"
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2d2eb23eb7..3de752bd4c 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -56,6 +56,7 @@
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 
+#include "bestatus.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
@@ -65,7 +66,6 @@
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
 #include "nodes/replnodes.h"
-#include "pgstat.h"
 #include "replication/basebackup.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
diff --git a/src/backend/statmon/Makefile b/src/backend/statmon/Makefile
new file mode 100644
index 0000000000..64a04878e3
--- /dev/null
+++ b/src/backend/statmon/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/statmon
+#
+# IDENTIFICATION
+#    src/backend/statmon/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/statmon
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = pgstat.o bestatus.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/statmon/bestatus.c b/src/backend/statmon/bestatus.c
new file mode 100644
index 0000000000..882a21c89a
--- /dev/null
+++ b/src/backend/statmon/bestatus.c
@@ -0,0 +1,1756 @@
+/* ----------
+ * bestatus.c
+ *
+ * Backend status monitor
+ *
+ * Status data is stored in shared memory. Every backend updates and reads it
+ * individually.
+ *
+ * Copyright (c) 2001-2019, PostgreSQL Global Development Group
+ *
+ * src/backend/statmon/bestatus.c
+ * ----------
+ */
+#include "postgres.h"
+
+#include "bestatus.h"
+
+#include "access/xact.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "postmaster/autovacuum.h"
+#include "replication/walsender.h"
+#include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/sinvaladt.h"
+#include "utils/ascii.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/probes.h"
+
+
+/* Status for backends including auxiliary */
+static LocalPgBackendStatus *localBackendStatusTable = NULL;
+
+/* Total number of backends including auxiliary */
+static int localNumBackends = 0;
+
+/* ----------
+ * Total number of backends including auxiliary
+ *
+ * We reserve a slot for each possible BackendId, plus one for each
+ * possible auxiliary process type.  (This scheme assumes there is not
+ * more than one of any auxiliary process type at a time.) MaxBackends
+ * includes autovacuum workers and background workers as well.
+ * ----------
+ */
+#define NumBackendStatSlots (MaxBackends + NUM_AUXPROCTYPES)
+
+
+/* ----------
+ * GUC parameters
+ * ----------
+ */
+bool pgstat_track_activities = false;
+int pgstat_track_activity_query_size = 1024;
+
+static MemoryContext pgBeStatLocalContext = NULL;
+
+/* ------------------------------------------------------------
+ * Functions for management of the shared-memory PgBackendStatus array
+ * ------------------------------------------------------------
+ */
+
+static PgBackendStatus *BackendStatusArray = NULL;
+static PgBackendStatus *MyBEEntry = NULL;
+static char *BackendAppnameBuffer = NULL;
+static char *BackendClientHostnameBuffer = NULL;
+static char *BackendActivityBuffer = NULL;
+static Size BackendActivityBufferSize = 0;
+#ifdef USE_SSL
+static PgBackendSSLStatus *BackendSslStatusBuffer = NULL;
+#endif
+
+static const char *pgstat_get_wait_activity(WaitEventActivity w);
+static const char *pgstat_get_wait_client(WaitEventClient w);
+static const char *pgstat_get_wait_ipc(WaitEventIPC w);
+static const char *pgstat_get_wait_timeout(WaitEventTimeout w);
+static const char *pgstat_get_wait_io(WaitEventIO w);
+static void pgstat_setup_memcxt(void);
+static void pgstat_beshutdown_hook(int code, Datum arg);
+
+/*
+ * Report shared-memory space needed by CreateSharedBackendStatus.
+ */
+Size
+BackendStatusShmemSize(void)
+{
+ Size size;
+
+ /* BackendStatusArray: */
+ size = mul_size(sizeof(PgBackendStatus), NumBackendStatSlots);
+ /* BackendAppnameBuffer: */
+ size = add_size(size,
+ mul_size(NAMEDATALEN, NumBackendStatSlots));
+ /* BackendClientHostnameBuffer: */
+ size = add_size(size,
+ mul_size(NAMEDATALEN, NumBackendStatSlots));
+ /* BackendActivityBuffer: */
+ size = add_size(size,
+ mul_size(pgstat_track_activity_query_size, NumBackendStatSlots));
+#ifdef USE_SSL
+ /* BackendSslStatusBuffer: */
+ size = add_size(size,
+ mul_size(sizeof(PgBackendSSLStatus), NumBackendStatSlots));
+#endif
+ return size;
+}
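The per-slot arithmetic in `BackendStatusShmemSize()` can be sketched as a standalone calculation. This is a hypothetical model, not the patch's code: `max_backends`, `num_aux`, and the status struct size are passed in as plain parameters, `NAME_LEN` stands in for `NAMEDATALEN`, and the overflow-checked `add_size`/`mul_size` helpers are replaced by ordinary `size_t` arithmetic (the SSL buffer is omitted).

```c
#include <stddef.h>

#define NAME_LEN 64             /* stands in for NAMEDATALEN */

/* Model of BackendStatusShmemSize(): one status struct plus three
 * string areas per slot; slots cover regular backends and auxiliary
 * processes alike (NumBackendStatSlots = MaxBackends + NUM_AUXPROCTYPES). */
size_t
backend_status_shmem_size(size_t max_backends, size_t num_aux,
                          size_t status_struct_size, size_t query_size)
{
    size_t slots = max_backends + num_aux;  /* NumBackendStatSlots */

    return slots * (status_struct_size      /* BackendStatusArray */
                    + NAME_LEN              /* appname buffer */
                    + NAME_LEN              /* client hostname buffer */
                    + query_size);          /* activity buffer */
}
```

Every buffer scales with the same slot count, which is why the real function multiplies each per-slot size by `NumBackendStatSlots` before summing.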
+
+/*
+ * Initialize the shared status array and several string buffers
+ * during postmaster startup.
+ */
+void
+CreateSharedBackendStatus(void)
+{
+ Size size;
+ bool found;
+ int i;
+ char   *buffer;
+
+ /* Create or attach to the shared array */
+ size = mul_size(sizeof(PgBackendStatus), NumBackendStatSlots);
+ BackendStatusArray = (PgBackendStatus *)
+ ShmemInitStruct("Backend Status Array", size, &found);
+
+ if (!found)
+ {
+ /*
+ * We're the first - initialize.
+ */
+ MemSet(BackendStatusArray, 0, size);
+ }
+
+ /* Create or attach to the shared appname buffer */
+ size = mul_size(NAMEDATALEN, NumBackendStatSlots);
+ BackendAppnameBuffer = (char *)
+ ShmemInitStruct("Backend Application Name Buffer", size, &found);
+
+ if (!found)
+ {
+ MemSet(BackendAppnameBuffer, 0, size);
+
+ /* Initialize st_appname pointers. */
+ buffer = BackendAppnameBuffer;
+ for (i = 0; i < NumBackendStatSlots; i++)
+ {
+ BackendStatusArray[i].st_appname = buffer;
+ buffer += NAMEDATALEN;
+ }
+ }
+
+ /* Create or attach to the shared client hostname buffer */
+ size = mul_size(NAMEDATALEN, NumBackendStatSlots);
+ BackendClientHostnameBuffer = (char *)
+ ShmemInitStruct("Backend Client Host Name Buffer", size, &found);
+
+ if (!found)
+ {
+ MemSet(BackendClientHostnameBuffer, 0, size);
+
+ /* Initialize st_clienthostname pointers. */
+ buffer = BackendClientHostnameBuffer;
+ for (i = 0; i < NumBackendStatSlots; i++)
+ {
+ BackendStatusArray[i].st_clienthostname = buffer;
+ buffer += NAMEDATALEN;
+ }
+ }
+
+ /* Create or attach to the shared activity buffer */
+ BackendActivityBufferSize = mul_size(pgstat_track_activity_query_size,
+ NumBackendStatSlots);
+ BackendActivityBuffer = (char *)
+ ShmemInitStruct("Backend Activity Buffer",
+ BackendActivityBufferSize,
+ &found);
+
+ if (!found)
+ {
+ MemSet(BackendActivityBuffer, 0, BackendActivityBufferSize);
+
+ /* Initialize st_activity pointers. */
+ buffer = BackendActivityBuffer;
+ for (i = 0; i < NumBackendStatSlots; i++)
+ {
+ BackendStatusArray[i].st_activity_raw = buffer;
+ buffer += pgstat_track_activity_query_size;
+ }
+ }
+
+#ifdef USE_SSL
+ /* Create or attach to the shared SSL status buffer */
+ size = mul_size(sizeof(PgBackendSSLStatus), NumBackendStatSlots);
+ BackendSslStatusBuffer = (PgBackendSSLStatus *)
+ ShmemInitStruct("Backend SSL Status Buffer", size, &found);
+
+ if (!found)
+ {
+ PgBackendSSLStatus *ptr;
+
+ MemSet(BackendSslStatusBuffer, 0, size);
+
+ /* Initialize st_sslstatus pointers. */
+ ptr = BackendSslStatusBuffer;
+ for (i = 0; i < NumBackendStatSlots; i++)
+ {
+ BackendStatusArray[i].st_sslstatus = ptr;
+ ptr++;
+ }
+ }
+#endif
+}
+
+/* ----------
+ * pgstat_bearray_initialize() -
+ *
+ * Initialize pgstats state, and set up our on-proc-exit hook.
+ * Called from InitPostgres and AuxiliaryProcessMain.  For auxiliary
+ * processes, MyBackendId is invalid.  Otherwise, MyBackendId must be set,
+ * but we must not have started any transaction yet (since the
+ * exit hook must run after the last transaction exit).
+ * NOTE: MyDatabaseId isn't set yet; so the shutdown hook has to be careful.
+ * ----------
+ */
+void
+pgstat_bearray_initialize(void)
+{
+ /* Initialize MyBEEntry */
+ if (MyBackendId != InvalidBackendId)
+ {
+ Assert(MyBackendId >= 1 && MyBackendId <= MaxBackends);
+ MyBEEntry = &BackendStatusArray[MyBackendId - 1];
+ }
+ else
+ {
+ /* Must be an auxiliary process */
+ Assert(MyAuxProcType != NotAnAuxProcess);
+
+ /*
+ * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
+ * have a BackendId, the slot is statically allocated based on the
+ * auxiliary process type (MyAuxProcType).  Backends use slots indexed
+ * in the range from 1 to MaxBackends (inclusive), so we use
+ * MaxBackends + AuxBackendType + 1 as the index of the slot for an
+ * auxiliary process.
+ */
+ MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
+ }
+
+ /* Set up a process-exit hook to clean up */
+ before_shmem_exit(pgstat_beshutdown_hook, 0);
+}
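The slot assignment in `pgstat_bearray_initialize()` can be illustrated with a small model (the helper name is hypothetical, not part of the patch): regular backends occupy 1-based slots 1..MaxBackends, and each auxiliary process type gets the fixed slot MaxBackends + AuxProcType + 1, matching the comment above.

```c
/* 1-based slot number for a process, mirroring the logic in
 * pgstat_bearray_initialize().  backend_id <= 0 stands in for
 * InvalidBackendId; aux_proc_type is ignored for regular backends. */
int
backend_status_slot(int backend_id, int aux_proc_type, int max_backends)
{
    if (backend_id > 0)
        return backend_id;                  /* slots 1 .. max_backends */

    /* auxiliary processes follow the backend slots, one per type */
    return max_backends + aux_proc_type + 1;
}
```

The 0-based array index is the slot number minus one, so an auxiliary process lands at `BackendStatusArray[MaxBackends + aux_proc_type]`, exactly as the code above computes.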
+
+/*
+ * Shut down a single backend's status reporting at process exit.
+ *
+ * Clear out our entry in the PgBackendStatus array so the slot is no
+ * longer seen as an active backend.  (Flushing of any remaining
+ * statistics counts is handled separately by the pgstat machinery.)
+ */
+static void
+pgstat_beshutdown_hook(int code, Datum arg)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ /*
+ * Clear my status entry, following the protocol of bumping st_changecount
+ * before and after.  We use a volatile pointer here to ensure the
+ * compiler doesn't try to get cute.
+ */
+ pgstat_increment_changecount_before(beentry);
+
+ beentry->st_procpid = 0; /* mark invalid */
+
+ pgstat_increment_changecount_after(beentry);
+}
+
+
+/* ----------
+ * pgstat_bestart() -
+ *
+ * Initialize this backend's entry in the PgBackendStatus array.
+ * Called from InitPostgres.
+ *
+ * Apart from auxiliary processes, MyBackendId, MyDatabaseId,
+ * session userid, and application_name must be set for a
+ * backend (hence, this cannot be combined with pgstat_initialize).
+ * ----------
+ */
+void
+pgstat_bestart(void)
+{
+ SockAddr clientaddr;
+ volatile PgBackendStatus *beentry;
+
+ /*
+ * To minimize the time spent modifying the PgBackendStatus entry, fetch
+ * all the needed data first.
+ */
+
+ /*
+ * We may not have a MyProcPort (eg, if this is the autovacuum process).
+ * If so, use all-zeroes client address, which is dealt with specially in
+ * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
+ */
+ if (MyProcPort)
+ memcpy(&clientaddr, &MyProcPort->raddr, sizeof(clientaddr));
+ else
+ MemSet(&clientaddr, 0, sizeof(clientaddr));
+
+ /*
+ * Initialize my status entry, following the protocol of bumping
+ * st_changecount before and after; and make sure it's even afterwards. We
+ * use a volatile pointer here to ensure the compiler doesn't try to get
+ * cute.
+ */
+ beentry = MyBEEntry;
+
+ /* our status entry must be set up by pgstat_bearray_initialize() */
+ Assert(beentry != NULL);
+
+ if (MyBackendId != InvalidBackendId)
+ {
+ if (IsAutoVacuumLauncherProcess())
+ {
+ /* Autovacuum Launcher */
+ beentry->st_backendType = B_AUTOVAC_LAUNCHER;
+ }
+ else if (IsAutoVacuumWorkerProcess())
+ {
+ /* Autovacuum Worker */
+ beentry->st_backendType = B_AUTOVAC_WORKER;
+ }
+ else if (am_walsender)
+ {
+ /* Wal sender */
+ beentry->st_backendType = B_WAL_SENDER;
+ }
+ else if (IsBackgroundWorker)
+ {
+ /* bgworker */
+ beentry->st_backendType = B_BG_WORKER;
+ }
+ else
+ {
+ /* client-backend */
+ beentry->st_backendType = B_BACKEND;
+ }
+ }
+ else
+ {
+ /* Must be an auxiliary process */
+ Assert(MyAuxProcType != NotAnAuxProcess);
+ switch (MyAuxProcType)
+ {
+ case StartupProcess:
+ beentry->st_backendType = B_STARTUP;
+ break;
+ case BgWriterProcess:
+ beentry->st_backendType = B_BG_WRITER;
+ break;
+ case CheckpointerProcess:
+ beentry->st_backendType = B_CHECKPOINTER;
+ break;
+ case WalWriterProcess:
+ beentry->st_backendType = B_WAL_WRITER;
+ break;
+ case WalReceiverProcess:
+ beentry->st_backendType = B_WAL_RECEIVER;
+ break;
+ case ArchiverProcess:
+ beentry->st_backendType = B_ARCHIVER;
+ break;
+ default:
+ elog(FATAL, "unrecognized process type: %d",
+ (int) MyAuxProcType);
+ proc_exit(1);
+ }
+ }
+
+ do
+ {
+ pgstat_increment_changecount_before(beentry);
+ } while ((beentry->st_changecount & 1) == 0);
+
+ beentry->st_procpid = MyProcPid;
+ beentry->st_proc_start_timestamp = MyStartTimestamp;
+ beentry->st_activity_start_timestamp = 0;
+ beentry->st_state_start_timestamp = 0;
+ beentry->st_xact_start_timestamp = 0;
+ beentry->st_databaseid = MyDatabaseId;
+
+ /* We have userid for client-backends, wal-sender and bgworker processes */
+ if (beentry->st_backendType == B_BACKEND
+ || beentry->st_backendType == B_WAL_SENDER
+ || beentry->st_backendType == B_BG_WORKER)
+ beentry->st_userid = GetSessionUserId();
+ else
+ beentry->st_userid = InvalidOid;
+
+ beentry->st_clientaddr = clientaddr;
+ if (MyProcPort && MyProcPort->remote_hostname)
+ strlcpy(beentry->st_clienthostname, MyProcPort->remote_hostname,
+ NAMEDATALEN);
+ else
+ beentry->st_clienthostname[0] = '\0';
+#ifdef USE_SSL
+ if (MyProcPort && MyProcPort->ssl != NULL)
+ {
+ beentry->st_ssl = true;
+ beentry->st_sslstatus->ssl_bits = be_tls_get_cipher_bits(MyProcPort);
+ beentry->st_sslstatus->ssl_compression = be_tls_get_compression(MyProcPort);
+ strlcpy(beentry->st_sslstatus->ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
+ strlcpy(beentry->st_sslstatus->ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
+ be_tls_get_peerdn_name(MyProcPort, beentry->st_sslstatus->ssl_clientdn, NAMEDATALEN);
+ }
+ else
+ {
+ beentry->st_ssl = false;
+ }
+#else
+ beentry->st_ssl = false;
+#endif
+ beentry->st_state = STATE_UNDEFINED;
+ beentry->st_appname[0] = '\0';
+ beentry->st_activity_raw[0] = '\0';
+ /* Also make sure the last byte in each string area is always 0 */
+ beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
+ beentry->st_appname[NAMEDATALEN - 1] = '\0';
+ beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
+ beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
+ beentry->st_progress_command_target = InvalidOid;
+
+ /*
+ * we don't zero st_progress_param here to save cycles; nobody should
+ * examine it until st_progress_command has been set to something other
+ * than PROGRESS_COMMAND_INVALID
+ */
+
+ pgstat_increment_changecount_after(beentry);
+
+ /* Update app name to current GUC setting */
+ if (application_name)
+ pgstat_report_appname(application_name);
+}
+
+/* ----------
+ * pgstat_read_current_status() -
+ *
+ * Copy the current contents of the PgBackendStatus array to local memory,
+ * if not already done in this transaction.
+ * ----------
+ */
+static void
+pgstat_read_current_status(void)
+{
+ volatile PgBackendStatus *beentry;
+ LocalPgBackendStatus *localtable;
+ LocalPgBackendStatus *localentry;
+ char   *localappname,
+   *localclienthostname,
+   *localactivity;
+#ifdef USE_SSL
+ PgBackendSSLStatus *localsslstatus;
+#endif
+ int i;
+
+ Assert(IsUnderPostmaster);
+
+ if (localBackendStatusTable)
+ return; /* already done */
+
+ pgstat_setup_memcxt();
+
+ localtable = (LocalPgBackendStatus *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   sizeof(LocalPgBackendStatus) * NumBackendStatSlots);
+ localappname = (char *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   NAMEDATALEN * NumBackendStatSlots);
+ localclienthostname = (char *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   NAMEDATALEN * NumBackendStatSlots);
+ localactivity = (char *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   pgstat_track_activity_query_size * NumBackendStatSlots);
+#ifdef USE_SSL
+ localsslstatus = (PgBackendSSLStatus *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   sizeof(PgBackendSSLStatus) * NumBackendStatSlots);
+#endif
+
+ localNumBackends = 0;
+
+ beentry = BackendStatusArray;
+ localentry = localtable;
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * Follow the protocol of retrying if st_changecount changes while we
+ * copy the entry, or if it's odd.  (The check for odd is needed to
+ * cover the case where we are able to completely copy the entry while
+ * the source backend is between increment steps.) We use a volatile
+ * pointer here to ensure the compiler doesn't try to get cute.
+ */
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_save_changecount_before(beentry, before_changecount);
+
+ localentry->backendStatus.st_procpid = beentry->st_procpid;
+ if (localentry->backendStatus.st_procpid > 0)
+ {
+ memcpy(&localentry->backendStatus, (char *) beentry, sizeof(PgBackendStatus));
+
+ /*
+ * strcpy is safe even if the string is modified concurrently,
+ * because there's always a \0 at the end of the buffer.
+ */
+ strcpy(localappname, (char *) beentry->st_appname);
+ localentry->backendStatus.st_appname = localappname;
+ strcpy(localclienthostname, (char *) beentry->st_clienthostname);
+ localentry->backendStatus.st_clienthostname = localclienthostname;
+ strcpy(localactivity, (char *) beentry->st_activity_raw);
+ localentry->backendStatus.st_activity_raw = localactivity;
+ localentry->backendStatus.st_ssl = beentry->st_ssl;
+#ifdef USE_SSL
+ if (beentry->st_ssl)
+ {
+ memcpy(localsslstatus, beentry->st_sslstatus, sizeof(PgBackendSSLStatus));
+ localentry->backendStatus.st_sslstatus = localsslstatus;
+ }
+#endif
+ }
+
+ pgstat_save_changecount_after(beentry, after_changecount);
+ if (before_changecount == after_changecount &&
+ (before_changecount & 1) == 0)
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ beentry++;
+ /* Only valid entries are included in the local array */
+ if (localentry->backendStatus.st_procpid > 0)
+ {
+ BackendIdGetTransactionIds(i,
+   &localentry->backend_xid,
+   &localentry->backend_xmin);
+
+ localentry++;
+ localappname += NAMEDATALEN;
+ localclienthostname += NAMEDATALEN;
+ localactivity += pgstat_track_activity_query_size;
+#ifdef USE_SSL
+ localsslstatus++;
+#endif
+ localNumBackends++;
+ }
+ }
+
+ /* Set the pointer only after completion of a valid table */
+ localBackendStatusTable = localtable;
+}
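The st_changecount protocol used by `pgstat_bestart()` (writer side) and the retry loop above (reader side) is a seqlock-style scheme: the counter is odd while an update is in progress and even when the entry is consistent. Here is a minimal single-threaded sketch with hypothetical names; the real code additionally relies on volatile accesses and memory barriers via the `pgstat_*_changecount` macros, which this toy version does not model.

```c
/* A status entry protected by a change counter: odd while a writer is
 * mid-update, even (and unchanged across the read) when consistent. */
typedef struct
{
    int changecount;
    int value;
} StatusEntry;

void
entry_write(StatusEntry *e, int v)
{
    e->changecount++;           /* now odd: update in progress */
    e->value = v;
    e->changecount++;           /* even again: entry is consistent */
}

/* Returns 1 and stores a consistent snapshot in *out, or 0 meaning the
 * caller must retry (a writer was active during the copy). */
int
entry_read(const StatusEntry *e, int *out)
{
    int before = e->changecount;
    int v = e->value;
    int after = e->changecount;

    if (before == after && (before & 1) == 0)
    {
        *out = v;
        return 1;
    }
    return 0;
}
```

The "odd" check covers the case called out in the comment above: a reader that copies the entire entry while the writer sits between its two increments would otherwise see matching counts around a half-written entry.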
+
+
+/* ----------
+ * pgstat_fetch_stat_beentry() -
+ *
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * our local copy of the current-activity entry for one backend.
+ *
+ * NB: caller is responsible for a check if the user is permitted to see
+ * this info (especially the querystring).
+ * ----------
+ */
+PgBackendStatus *
+pgstat_fetch_stat_beentry(int beid)
+{
+ pgstat_read_current_status();
+
+ if (beid < 1 || beid > localNumBackends)
+ return NULL;
+
+ return &localBackendStatusTable[beid - 1].backendStatus;
+}
+
+
+/* ----------
+ * pgstat_fetch_stat_local_beentry() -
+ *
+ * Like pgstat_fetch_stat_beentry() but with locally computed additions (like
+ * xid and xmin values of the backend)
+ *
+ * NB: caller is responsible for a check if the user is permitted to see
+ * this info (especially the querystring).
+ * ----------
+ */
+LocalPgBackendStatus *
+pgstat_fetch_stat_local_beentry(int beid)
+{
+ pgstat_read_current_status();
+
+ if (beid < 1 || beid > localNumBackends)
+ return NULL;
+
+ return &localBackendStatusTable[beid - 1];
+}
+
+
+/* ----------
+ * pgstat_fetch_stat_numbackends() -
+ *
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * the number of backends with valid entries in the local status table.
+ * ----------
+ */
+int
+pgstat_fetch_stat_numbackends(void)
+{
+ pgstat_read_current_status();
+
+ return localNumBackends;
+}
+
+/* ----------
+ * pgstat_get_wait_event_type() -
+ *
+ * Return a string representing the type of wait event the backend is
+ * currently waiting on.
+ */
+const char *
+pgstat_get_wait_event_type(uint32 wait_event_info)
+{
+ uint32 classId;
+ const char *event_type;
+
+ /* report process as not waiting. */
+ if (wait_event_info == 0)
+ return NULL;
+
+ classId = wait_event_info & 0xFF000000;
+
+ switch (classId)
+ {
+ case PG_WAIT_LWLOCK:
+ event_type = "LWLock";
+ break;
+ case PG_WAIT_LOCK:
+ event_type = "Lock";
+ break;
+ case PG_WAIT_BUFFER_PIN:
+ event_type = "BufferPin";
+ break;
+ case PG_WAIT_ACTIVITY:
+ event_type = "Activity";
+ break;
+ case PG_WAIT_CLIENT:
+ event_type = "Client";
+ break;
+ case PG_WAIT_EXTENSION:
+ event_type = "Extension";
+ break;
+ case PG_WAIT_IPC:
+ event_type = "IPC";
+ break;
+ case PG_WAIT_TIMEOUT:
+ event_type = "Timeout";
+ break;
+ case PG_WAIT_IO:
+ event_type = "IO";
+ break;
+ default:
+ event_type = "???";
+ break;
+ }
+
+ return event_type;
+}
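Both lookup functions decode the packed `wait_event_info` word the same way: the wait class lives in the top 8 bits and the event id in the low 16 bits. A minimal decoding sketch follows (the helper names are hypothetical; the `PG_WAIT_*` class constants themselves are defined elsewhere in the tree):

```c
#include <stdint.h>

/* Wait class: top 8 bits of the packed word, kept in place so the
 * result can be compared directly against the PG_WAIT_* constants. */
static inline uint32_t
wait_event_class(uint32_t wait_event_info)
{
    return wait_event_info & 0xFF000000u;
}

/* Event id within the class: low 16 bits of the packed word. */
static inline uint16_t
wait_event_id(uint32_t wait_event_info)
{
    return (uint16_t) (wait_event_info & 0x0000FFFFu);
}
```

Keeping the class bits in place (rather than shifting them down) is what lets `pgstat_get_wait_event_type()` switch directly on the masked value.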
+
+/* ----------
+ * pgstat_get_wait_event() -
+ *
+ * Return a string representing the wait event the backend is currently
+ * waiting on.
+ */
+const char *
+pgstat_get_wait_event(uint32 wait_event_info)
+{
+ uint32 classId;
+ uint16 eventId;
+ const char *event_name;
+
+ /* report process as not waiting. */
+ if (wait_event_info == 0)
+ return NULL;
+
+ classId = wait_event_info & 0xFF000000;
+ eventId = wait_event_info & 0x0000FFFF;
+
+ switch (classId)
+ {
+ case PG_WAIT_LWLOCK:
+ event_name = GetLWLockIdentifier(classId, eventId);
+ break;
+ case PG_WAIT_LOCK:
+ event_name = GetLockNameFromTagType(eventId);
+ break;
+ case PG_WAIT_BUFFER_PIN:
+ event_name = "BufferPin";
+ break;
+ case PG_WAIT_ACTIVITY:
+ {
+ WaitEventActivity w = (WaitEventActivity) wait_event_info;
+
+ event_name = pgstat_get_wait_activity(w);
+ break;
+ }
+ case PG_WAIT_CLIENT:
+ {
+ WaitEventClient w = (WaitEventClient) wait_event_info;
+
+ event_name = pgstat_get_wait_client(w);
+ break;
+ }
+ case PG_WAIT_EXTENSION:
+ event_name = "Extension";
+ break;
+ case PG_WAIT_IPC:
+ {
+ WaitEventIPC w = (WaitEventIPC) wait_event_info;
+
+ event_name = pgstat_get_wait_ipc(w);
+ break;
+ }
+ case PG_WAIT_TIMEOUT:
+ {
+ WaitEventTimeout w = (WaitEventTimeout) wait_event_info;
+
+ event_name = pgstat_get_wait_timeout(w);
+ break;
+ }
+ case PG_WAIT_IO:
+ {
+ WaitEventIO w = (WaitEventIO) wait_event_info;
+
+ event_name = pgstat_get_wait_io(w);
+ break;
+ }
+ default:
+ event_name = "unknown wait event";
+ break;
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_activity() -
+ *
+ * Convert WaitEventActivity to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_activity(WaitEventActivity w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_ARCHIVER_MAIN:
+ event_name = "ArchiverMain";
+ break;
+ case WAIT_EVENT_AUTOVACUUM_MAIN:
+ event_name = "AutoVacuumMain";
+ break;
+ case WAIT_EVENT_BGWRITER_HIBERNATE:
+ event_name = "BgWriterHibernate";
+ break;
+ case WAIT_EVENT_BGWRITER_MAIN:
+ event_name = "BgWriterMain";
+ break;
+ case WAIT_EVENT_CHECKPOINTER_MAIN:
+ event_name = "CheckpointerMain";
+ break;
+ case WAIT_EVENT_LOGICAL_APPLY_MAIN:
+ event_name = "LogicalApplyMain";
+ break;
+ case WAIT_EVENT_LOGICAL_LAUNCHER_MAIN:
+ event_name = "LogicalLauncherMain";
+ break;
+ case WAIT_EVENT_RECOVERY_WAL_ALL:
+ event_name = "RecoveryWalAll";
+ break;
+ case WAIT_EVENT_RECOVERY_WAL_STREAM:
+ event_name = "RecoveryWalStream";
+ break;
+ case WAIT_EVENT_SYSLOGGER_MAIN:
+ event_name = "SysLoggerMain";
+ break;
+ case WAIT_EVENT_WAL_RECEIVER_MAIN:
+ event_name = "WalReceiverMain";
+ break;
+ case WAIT_EVENT_WAL_SENDER_MAIN:
+ event_name = "WalSenderMain";
+ break;
+ case WAIT_EVENT_WAL_WRITER_MAIN:
+ event_name = "WalWriterMain";
+ break;
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_client() -
+ *
+ * Convert WaitEventClient to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_client(WaitEventClient w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_CLIENT_READ:
+ event_name = "ClientRead";
+ break;
+ case WAIT_EVENT_CLIENT_WRITE:
+ event_name = "ClientWrite";
+ break;
+ case WAIT_EVENT_LIBPQWALRECEIVER_CONNECT:
+ event_name = "LibPQWalReceiverConnect";
+ break;
+ case WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE:
+ event_name = "LibPQWalReceiverReceive";
+ break;
+ case WAIT_EVENT_SSL_OPEN_SERVER:
+ event_name = "SSLOpenServer";
+ break;
+ case WAIT_EVENT_WAL_RECEIVER_WAIT_START:
+ event_name = "WalReceiverWaitStart";
+ break;
+ case WAIT_EVENT_WAL_SENDER_WAIT_WAL:
+ event_name = "WalSenderWaitForWAL";
+ break;
+ case WAIT_EVENT_WAL_SENDER_WRITE_DATA:
+ event_name = "WalSenderWriteData";
+ break;
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_ipc() -
+ *
+ * Convert WaitEventIPC to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_ipc(WaitEventIPC w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_BGWORKER_SHUTDOWN:
+ event_name = "BgWorkerShutdown";
+ break;
+ case WAIT_EVENT_BGWORKER_STARTUP:
+ event_name = "BgWorkerStartup";
+ break;
+ case WAIT_EVENT_BTREE_PAGE:
+ event_name = "BtreePage";
+ break;
+ case WAIT_EVENT_CLOG_GROUP_UPDATE:
+ event_name = "ClogGroupUpdate";
+ break;
+ case WAIT_EVENT_EXECUTE_GATHER:
+ event_name = "ExecuteGather";
+ break;
+ case WAIT_EVENT_HASH_BATCH_ALLOCATING:
+ event_name = "Hash/Batch/Allocating";
+ break;
+ case WAIT_EVENT_HASH_BATCH_ELECTING:
+ event_name = "Hash/Batch/Electing";
+ break;
+ case WAIT_EVENT_HASH_BATCH_LOADING:
+ event_name = "Hash/Batch/Loading";
+ break;
+ case WAIT_EVENT_HASH_BUILD_ALLOCATING:
+ event_name = "Hash/Build/Allocating";
+ break;
+ case WAIT_EVENT_HASH_BUILD_ELECTING:
+ event_name = "Hash/Build/Electing";
+ break;
+ case WAIT_EVENT_HASH_BUILD_HASHING_INNER:
+ event_name = "Hash/Build/HashingInner";
+ break;
+ case WAIT_EVENT_HASH_BUILD_HASHING_OUTER:
+ event_name = "Hash/Build/HashingOuter";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_ALLOCATING:
+ event_name = "Hash/GrowBatches/Allocating";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_DECIDING:
+ event_name = "Hash/GrowBatches/Deciding";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_ELECTING:
+ event_name = "Hash/GrowBatches/Electing";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_FINISHING:
+ event_name = "Hash/GrowBatches/Finishing";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_REPARTITIONING:
+ event_name = "Hash/GrowBatches/Repartitioning";
+ break;
+ case WAIT_EVENT_HASH_GROW_BUCKETS_ALLOCATING:
+ event_name = "Hash/GrowBuckets/Allocating";
+ break;
+ case WAIT_EVENT_HASH_GROW_BUCKETS_ELECTING:
+ event_name = "Hash/GrowBuckets/Electing";
+ break;
+ case WAIT_EVENT_HASH_GROW_BUCKETS_REINSERTING:
+ event_name = "Hash/GrowBuckets/Reinserting";
+ break;
+ case WAIT_EVENT_LOGICAL_SYNC_DATA:
+ event_name = "LogicalSyncData";
+ break;
+ case WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE:
+ event_name = "LogicalSyncStateChange";
+ break;
+ case WAIT_EVENT_MQ_INTERNAL:
+ event_name = "MessageQueueInternal";
+ break;
+ case WAIT_EVENT_MQ_PUT_MESSAGE:
+ event_name = "MessageQueuePutMessage";
+ break;
+ case WAIT_EVENT_MQ_RECEIVE:
+ event_name = "MessageQueueReceive";
+ break;
+ case WAIT_EVENT_MQ_SEND:
+ event_name = "MessageQueueSend";
+ break;
+ case WAIT_EVENT_PARALLEL_BITMAP_SCAN:
+ event_name = "ParallelBitmapScan";
+ break;
+ case WAIT_EVENT_PARALLEL_CREATE_INDEX_SCAN:
+ event_name = "ParallelCreateIndexScan";
+ break;
+ case WAIT_EVENT_PARALLEL_FINISH:
+ event_name = "ParallelFinish";
+ break;
+ case WAIT_EVENT_PROCARRAY_GROUP_UPDATE:
+ event_name = "ProcArrayGroupUpdate";
+ break;
+ case WAIT_EVENT_PROMOTE:
+ event_name = "Promote";
+ break;
+ case WAIT_EVENT_REPLICATION_ORIGIN_DROP:
+ event_name = "ReplicationOriginDrop";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_DROP:
+ event_name = "ReplicationSlotDrop";
+ break;
+ case WAIT_EVENT_SAFE_SNAPSHOT:
+ event_name = "SafeSnapshot";
+ break;
+ case WAIT_EVENT_SYNC_REP:
+ event_name = "SyncRep";
+ break;
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_timeout() -
+ *
+ * Convert WaitEventTimeout to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_timeout(WaitEventTimeout w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_BASE_BACKUP_THROTTLE:
+ event_name = "BaseBackupThrottle";
+ break;
+ case WAIT_EVENT_PG_SLEEP:
+ event_name = "PgSleep";
+ break;
+ case WAIT_EVENT_RECOVERY_APPLY_DELAY:
+ event_name = "RecoveryApplyDelay";
+ break;
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_io() -
+ *
+ * Convert WaitEventIO to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_io(WaitEventIO w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_BUFFILE_READ:
+ event_name = "BufFileRead";
+ break;
+ case WAIT_EVENT_BUFFILE_WRITE:
+ event_name = "BufFileWrite";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_READ:
+ event_name = "ControlFileRead";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_SYNC:
+ event_name = "ControlFileSync";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE:
+ event_name = "ControlFileSyncUpdate";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_WRITE:
+ event_name = "ControlFileWrite";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE:
+ event_name = "ControlFileWriteUpdate";
+ break;
+ case WAIT_EVENT_COPY_FILE_READ:
+ event_name = "CopyFileRead";
+ break;
+ case WAIT_EVENT_COPY_FILE_WRITE:
+ event_name = "CopyFileWrite";
+ break;
+ case WAIT_EVENT_DATA_FILE_EXTEND:
+ event_name = "DataFileExtend";
+ break;
+ case WAIT_EVENT_DATA_FILE_FLUSH:
+ event_name = "DataFileFlush";
+ break;
+ case WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC:
+ event_name = "DataFileImmediateSync";
+ break;
+ case WAIT_EVENT_DATA_FILE_PREFETCH:
+ event_name = "DataFilePrefetch";
+ break;
+ case WAIT_EVENT_DATA_FILE_READ:
+ event_name = "DataFileRead";
+ break;
+ case WAIT_EVENT_DATA_FILE_SYNC:
+ event_name = "DataFileSync";
+ break;
+ case WAIT_EVENT_DATA_FILE_TRUNCATE:
+ event_name = "DataFileTruncate";
+ break;
+ case WAIT_EVENT_DATA_FILE_WRITE:
+ event_name = "DataFileWrite";
+ break;
+ case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
+ event_name = "DSMFillZeroWrite";
+ break;
+ case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
+ event_name = "LockFileAddToDataDirRead";
+ break;
+ case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC:
+ event_name = "LockFileAddToDataDirSync";
+ break;
+ case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE:
+ event_name = "LockFileAddToDataDirWrite";
+ break;
+ case WAIT_EVENT_LOCK_FILE_CREATE_READ:
+ event_name = "LockFileCreateRead";
+ break;
+ case WAIT_EVENT_LOCK_FILE_CREATE_SYNC:
+ event_name = "LockFileCreateSync";
+ break;
+ case WAIT_EVENT_LOCK_FILE_CREATE_WRITE:
+ event_name = "LockFileCreateWrite";
+ break;
+ case WAIT_EVENT_LOCK_FILE_RECHECKDATADIR_READ:
+ event_name = "LockFileReCheckDataDirRead";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC:
+ event_name = "LogicalRewriteCheckpointSync";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC:
+ event_name = "LogicalRewriteMappingSync";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE:
+ event_name = "LogicalRewriteMappingWrite";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_SYNC:
+ event_name = "LogicalRewriteSync";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE:
+ event_name = "LogicalRewriteTruncate";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_WRITE:
+ event_name = "LogicalRewriteWrite";
+ break;
+ case WAIT_EVENT_RELATION_MAP_READ:
+ event_name = "RelationMapRead";
+ break;
+ case WAIT_EVENT_RELATION_MAP_SYNC:
+ event_name = "RelationMapSync";
+ break;
+ case WAIT_EVENT_RELATION_MAP_WRITE:
+ event_name = "RelationMapWrite";
+ break;
+ case WAIT_EVENT_REORDER_BUFFER_READ:
+ event_name = "ReorderBufferRead";
+ break;
+ case WAIT_EVENT_REORDER_BUFFER_WRITE:
+ event_name = "ReorderBufferWrite";
+ break;
+ case WAIT_EVENT_REORDER_LOGICAL_MAPPING_READ:
+ event_name = "ReorderLogicalMappingRead";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_READ:
+ event_name = "ReplicationSlotRead";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC:
+ event_name = "ReplicationSlotRestoreSync";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_SYNC:
+ event_name = "ReplicationSlotSync";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_WRITE:
+ event_name = "ReplicationSlotWrite";
+ break;
+ case WAIT_EVENT_SLRU_FLUSH_SYNC:
+ event_name = "SLRUFlushSync";
+ break;
+ case WAIT_EVENT_SLRU_READ:
+ event_name = "SLRURead";
+ break;
+ case WAIT_EVENT_SLRU_SYNC:
+ event_name = "SLRUSync";
+ break;
+ case WAIT_EVENT_SLRU_WRITE:
+ event_name = "SLRUWrite";
+ break;
+ case WAIT_EVENT_SNAPBUILD_READ:
+ event_name = "SnapbuildRead";
+ break;
+ case WAIT_EVENT_SNAPBUILD_SYNC:
+ event_name = "SnapbuildSync";
+ break;
+ case WAIT_EVENT_SNAPBUILD_WRITE:
+ event_name = "SnapbuildWrite";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_FILE_SYNC:
+ event_name = "TimelineHistoryFileSync";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_FILE_WRITE:
+ event_name = "TimelineHistoryFileWrite";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_READ:
+ event_name = "TimelineHistoryRead";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_SYNC:
+ event_name = "TimelineHistorySync";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_WRITE:
+ event_name = "TimelineHistoryWrite";
+ break;
+ case WAIT_EVENT_TWOPHASE_FILE_READ:
+ event_name = "TwophaseFileRead";
+ break;
+ case WAIT_EVENT_TWOPHASE_FILE_SYNC:
+ event_name = "TwophaseFileSync";
+ break;
+ case WAIT_EVENT_TWOPHASE_FILE_WRITE:
+ event_name = "TwophaseFileWrite";
+ break;
+ case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
+ event_name = "WALSenderTimelineHistoryRead";
+ break;
+ case WAIT_EVENT_WAL_BOOTSTRAP_SYNC:
+ event_name = "WALBootstrapSync";
+ break;
+ case WAIT_EVENT_WAL_BOOTSTRAP_WRITE:
+ event_name = "WALBootstrapWrite";
+ break;
+ case WAIT_EVENT_WAL_COPY_READ:
+ event_name = "WALCopyRead";
+ break;
+ case WAIT_EVENT_WAL_COPY_SYNC:
+ event_name = "WALCopySync";
+ break;
+ case WAIT_EVENT_WAL_COPY_WRITE:
+ event_name = "WALCopyWrite";
+ break;
+ case WAIT_EVENT_WAL_INIT_SYNC:
+ event_name = "WALInitSync";
+ break;
+ case WAIT_EVENT_WAL_INIT_WRITE:
+ event_name = "WALInitWrite";
+ break;
+ case WAIT_EVENT_WAL_READ:
+ event_name = "WALRead";
+ break;
+ case WAIT_EVENT_WAL_SYNC:
+ event_name = "WALSync";
+ break;
+ case WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN:
+ event_name = "WALSyncMethodAssign";
+ break;
+ case WAIT_EVENT_WAL_WRITE:
+ event_name = "WALWrite";
+ break;
+
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+
+/* ----------
+ * pgstat_get_backend_current_activity() -
+ *
+ * Return a string representing the current activity of the backend with
+ * the specified PID.  This looks directly at the BackendStatusArray,
+ * and so will provide current information regardless of the age of our
+ * transaction's snapshot of the status array.
+ *
+ * It is the caller's responsibility to invoke this only for backends whose
+ * state is expected to remain stable while the result is in use.  The
+ * only current use is in deadlock reporting, where we can expect that
+ * the target backend is blocked on a lock.  (There are corner cases
+ * where the target's wait could get aborted while we are looking at it,
+ * but the very worst consequence is to return a pointer to a string
+ * that's been changed, so we won't worry too much.)
+ *
+ * Note: return strings for special cases match pg_stat_get_backend_activity.
+ * ----------
+ */
+const char *
+pgstat_get_backend_current_activity(int pid, bool checkUser)
+{
+ PgBackendStatus *beentry;
+ int i;
+
+ beentry = BackendStatusArray;
+ for (i = 1; i <= MaxBackends; i++)
+ {
+ /*
+ * Although we expect the target backend's entry to be stable, that
+ * doesn't imply that anyone else's is.  To avoid identifying the
+ * wrong backend, while we check for a match to the desired PID we
+ * must follow the protocol of retrying if st_changecount changes
+ * while we examine the entry, or if it's odd.  (This might be
+ * unnecessary, since fetching or storing an int is almost certainly
+ * atomic, but let's play it safe.)  We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_save_changecount_before(vbeentry, before_changecount);
+
+ found = (vbeentry->st_procpid == pid);
+
+ pgstat_save_changecount_after(vbeentry, after_changecount);
+
+ if (before_changecount == after_changecount &&
+ (before_changecount & 1) == 0)
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ {
+ /* Now it is safe to use the non-volatile pointer */
+ if (checkUser && !superuser() && beentry->st_userid != GetUserId())
+ return "<insufficient privilege>";
+ else if (*(beentry->st_activity_raw) == '\0')
+ return "<command string not enabled>";
+ else
+ {
+ /* this'll leak a bit of memory, but that seems acceptable */
+ return pgstat_clip_activity(beentry->st_activity_raw);
+ }
+ }
+
+ beentry++;
+ }
+
+ /* If we get here, caller is in error ... */
+ return "<backend information not available>";
+}
+
+/* ----------
+ * pgstat_get_crashed_backend_activity() -
+ *
+ * Return a string representing the current activity of the backend with
+ * the specified PID.  Like the function above, but reads shared memory with
+ * the expectation that it may be corrupt.  On success, copy the string
+ * into the "buffer" argument and return that pointer.  On failure,
+ * return NULL.
+ *
+ * This function is only intended to be used by the postmaster to report the
+ * query that crashed a backend.  In particular, no attempt is made to
+ * follow the correct concurrency protocol when accessing the
+ * BackendStatusArray.  But that's OK, in the worst case we'll return a
+ * corrupted message.  We also must take care not to trip on ereport(ERROR).
+ * ----------
+ */
+const char *
+pgstat_get_crashed_backend_activity(int pid, char *buffer, int buflen)
+{
+ volatile PgBackendStatus *beentry;
+ int i;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return NULL;
+
+ for (i = 1; i <= MaxBackends; i++)
+ {
+ if (beentry->st_procpid == pid)
+ {
+ /* Read pointer just once, so it can't change after validation */
+ const char *activity = beentry->st_activity_raw;
+ const char *activity_last;
+
+ /*
+ * We mustn't access activity string before we verify that it
+ * falls within the BackendActivityBuffer. To make sure that the
+ * entire string including its ending is contained within the
+ * buffer, subtract one activity length from the buffer size.
+ */
+ activity_last = BackendActivityBuffer + BackendActivityBufferSize
+ - pgstat_track_activity_query_size;
+
+ if (activity < BackendActivityBuffer ||
+ activity > activity_last)
+ return NULL;
+
+ /* If no string available, no point in a report */
+ if (activity[0] == '\0')
+ return NULL;
+
+ /*
+ * Copy only ASCII-safe characters so we don't run into encoding
+ * problems when reporting the message; and be sure not to run off
+ * the end of memory.  As only ASCII characters are reported, it
+ * doesn't seem necessary to perform multibyte aware clipping.
+ */
+ ascii_safe_strlcpy(buffer, activity,
+   Min(buflen, pgstat_track_activity_query_size));
+
+ return buffer;
+ }
+
+ beentry++;
+ }
+
+ /* PID not found */
+ return NULL;
+}
+
+const char *
+pgstat_get_backend_desc(BackendType backendType)
+{
+ const char *backendDesc = "unknown process type";
+
+ switch (backendType)
+ {
+ case B_AUTOVAC_LAUNCHER:
+ backendDesc = "autovacuum launcher";
+ break;
+ case B_AUTOVAC_WORKER:
+ backendDesc = "autovacuum worker";
+ break;
+ case B_BACKEND:
+ backendDesc = "client backend";
+ break;
+ case B_BG_WORKER:
+ backendDesc = "background worker";
+ break;
+ case B_BG_WRITER:
+ backendDesc = "background writer";
+ break;
+ case B_ARCHIVER:
+ backendDesc = "archiver";
+ break;
+ case B_CHECKPOINTER:
+ backendDesc = "checkpointer";
+ break;
+ case B_STARTUP:
+ backendDesc = "startup";
+ break;
+ case B_WAL_RECEIVER:
+ backendDesc = "walreceiver";
+ break;
+ case B_WAL_SENDER:
+ backendDesc = "walsender";
+ break;
+ case B_WAL_WRITER:
+ backendDesc = "walwriter";
+ break;
+ }
+
+ return backendDesc;
+}
+
+/* ----------
+ * pgstat_report_appname() -
+ *
+ * Called to update our application name.
+ * ----------
+ */
+void
+pgstat_report_appname(const char *appname)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+ int len;
+
+ if (!beentry)
+ return;
+
+ /* This should be unnecessary if GUC did its job, but be safe */
+ len = pg_mbcliplen(appname, strlen(appname), NAMEDATALEN - 1);
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after.  We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ pgstat_increment_changecount_before(beentry);
+
+ memcpy((char *) beentry->st_appname, appname, len);
+ beentry->st_appname[len] = '\0';
+
+ pgstat_increment_changecount_after(beentry);
+}
+
+/*
+ * Report current transaction start timestamp as the specified value.
+ * Zero means there is no active transaction.
+ */
+void
+pgstat_report_xact_timestamp(TimestampTz tstamp)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!pgstat_track_activities || !beentry)
+ return;
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after.  We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ pgstat_increment_changecount_before(beentry);
+ beentry->st_xact_start_timestamp = tstamp;
+ pgstat_increment_changecount_after(beentry);
+}
+
+/* ----------
+ * pgstat_setup_memcxt() -
+ *
+ * Create pgBeStatLocalContext, if not already done.
+ * ----------
+ */
+static void
+pgstat_setup_memcxt(void)
+{
+ if (!pgBeStatLocalContext)
+ pgBeStatLocalContext = AllocSetContextCreate(TopMemoryContext,
+ "Backend status snapshot",
+ ALLOCSET_SMALL_SIZES);
+}
+
+/* ----------
+ * pgstat_bestatus_clear_snapshot() -
+ *
+ * Discard any data collected in the current transaction.  Any subsequent
+ * request will cause new snapshots to be read.
+ *
+ * This is also invoked during transaction commit or abort to discard
+ * the no-longer-wanted snapshot.
+ * ----------
+ */
+void
+pgstat_bestatus_clear_snapshot(void)
+{
+ /* Release memory, if any was allocated */
+ if (pgBeStatLocalContext)
+ MemoryContextDelete(pgBeStatLocalContext);
+
+ /* Reset variables */
+ pgBeStatLocalContext = NULL;
+ localBackendStatusTable = NULL;
+ localNumBackends = 0;
+}
+
+
+
+/* ----------
+ * pgstat_report_activity() -
+ *
+ * Called from tcop/postgres.c to report what the backend is actually doing
+ * (but note cmd_str can be NULL for certain cases).
+ *
+ * All updates of the status entry follow the protocol of bumping
+ * st_changecount before and after.  We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ * ----------
+ */
+void
+pgstat_report_activity(BackendState state, const char *cmd_str)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+ TimestampTz start_timestamp;
+ TimestampTz current_timestamp;
+ int len = 0;
+
+ TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
+
+ if (!beentry)
+ return;
+
+ if (!pgstat_track_activities)
+ {
+ if (beentry->st_state != STATE_DISABLED)
+ {
+ volatile PGPROC *proc = MyProc;
+
+ /*
+ * track_activities is disabled, but we last reported a
+ * non-disabled state.  As our final update, change the state and
+ * clear fields we will not be updating anymore.
+ */
+ pgstat_increment_changecount_before(beentry);
+ beentry->st_state = STATE_DISABLED;
+ beentry->st_state_start_timestamp = 0;
+ beentry->st_activity_raw[0] = '\0';
+ beentry->st_activity_start_timestamp = 0;
+ /* st_xact_start_timestamp and wait_event_info are also disabled */
+ beentry->st_xact_start_timestamp = 0;
+ proc->wait_event_info = 0;
+ pgstat_increment_changecount_after(beentry);
+ }
+ return;
+ }
+
+ /*
+ * To minimize the time spent modifying the entry, fetch all the needed
+ * data first.
+ */
+ start_timestamp = GetCurrentStatementStartTimestamp();
+ if (cmd_str != NULL)
+ {
+ /*
+ * Compute length of to-be-stored string unaware of multi-byte
+ * characters. For speed reasons that'll get corrected on read, rather
+ * than computed every write.
+ */
+ len = Min(strlen(cmd_str), pgstat_track_activity_query_size - 1);
+ }
+ current_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Now update the status entry
+ */
+ pgstat_increment_changecount_before(beentry);
+
+ beentry->st_state = state;
+ beentry->st_state_start_timestamp = current_timestamp;
+
+ if (cmd_str != NULL)
+ {
+ memcpy((char *) beentry->st_activity_raw, cmd_str, len);
+ beentry->st_activity_raw[len] = '\0';
+ beentry->st_activity_start_timestamp = start_timestamp;
+ }
+
+ pgstat_increment_changecount_after(beentry);
+}
+
+/*-----------
+ * pgstat_progress_start_command() -
+ *
+ * Set st_progress_command (and st_progress_command_target) in own backend
+ * entry.  Also, zero-initialize st_progress_param array.
+ *-----------
+ */
+void
+pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry || !pgstat_track_activities)
+ return;
+
+ pgstat_increment_changecount_before(beentry);
+ beentry->st_progress_command = cmdtype;
+ beentry->st_progress_command_target = relid;
+ MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
+ pgstat_increment_changecount_after(beentry);
+}
+
+/*-----------
+ * pgstat_progress_update_param() -
+ *
+ * Update index'th member in st_progress_param[] of own backend entry.
+ *-----------
+ */
+void
+pgstat_progress_update_param(int index, int64 val)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+ if (!beentry || !pgstat_track_activities)
+ return;
+
+ pgstat_increment_changecount_before(beentry);
+ beentry->st_progress_param[index] = val;
+ pgstat_increment_changecount_after(beentry);
+}
+
+/*-----------
+ * pgstat_progress_update_multi_param() -
+ *
+ * Update multiple members in st_progress_param[] of own backend entry.
+ * This is atomic; readers won't see intermediate states.
+ *-----------
+ */
+void
+pgstat_progress_update_multi_param(int nparam, const int *index,
+   const int64 *val)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+ int i;
+
+ if (!beentry || !pgstat_track_activities || nparam == 0)
+ return;
+
+ pgstat_increment_changecount_before(beentry);
+
+ for (i = 0; i < nparam; ++i)
+ {
+ Assert(index[i] >= 0 && index[i] < PGSTAT_NUM_PROGRESS_PARAM);
+
+ beentry->st_progress_param[index[i]] = val[i];
+ }
+
+ pgstat_increment_changecount_after(beentry);
+}
+
+/*-----------
+ * pgstat_progress_end_command() -
+ *
+ * Reset st_progress_command (and st_progress_command_target) in own backend
+ * entry.  This signals the end of the command.
+ *-----------
+ */
+void
+pgstat_progress_end_command(void)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+ if (!pgstat_track_activities
+ && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
+ return;
+
+ pgstat_increment_changecount_before(beentry);
+ beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
+ beentry->st_progress_command_target = InvalidOid;
+ pgstat_increment_changecount_after(beentry);
+}
+
+
+/*
+ * Convert a potentially unsafely truncated activity string (see
+ * PgBackendStatus.st_activity_raw's documentation) into a correctly truncated
+ * one.
+ *
+ * The returned string is allocated in the caller's memory context and may be
+ * freed.
+ */
+char *
+pgstat_clip_activity(const char *raw_activity)
+{
+ char   *activity;
+ int rawlen;
+ int cliplen;
+
+ /*
+ * Some callers, like pgstat_get_backend_current_activity(), do not
+ * guarantee that the buffer isn't concurrently modified. We try to take
+ * care that the buffer is always terminated by a NUL byte regardless, but
+ * let's still be paranoid about the string's length. In those cases the
+ * underlying buffer is guaranteed to be pgstat_track_activity_query_size
+ * large.
+ */
+ activity = pnstrdup(raw_activity, pgstat_track_activity_query_size - 1);
+
+ /* now double-guaranteed to be NUL terminated */
+ rawlen = strlen(activity);
+
+ /*
+ * All supported server-encodings make it possible to determine the length
+ * of a multi-byte character from its first byte (this is not the case for
+ * client encodings, see GB18030). As st_activity is always stored using
+ * server encoding, this allows us to perform multi-byte aware truncation,
+ * even if the string earlier was truncated in the middle of a multi-byte
+ * character.
+ */
+ cliplen = pg_mbcliplen(activity, rawlen,
+   pgstat_track_activity_query_size - 1);
+
+ activity[cliplen] = '\0';
+
+ return activity;
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/statmon/pgstat.c
similarity index 70%
rename from src/backend/postmaster/pgstat.c
rename to src/backend/statmon/pgstat.c
index 78f0bbb558..7ec7a454c7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/statmon/pgstat.c
@@ -8,7 +8,7 @@
  *
  * Copyright (c) 2001-2019, PostgreSQL Global Development Group
  *
- * src/backend/postmaster/pgstat.c
+ * src/backend/statmon/pgstat.c
  * ----------
  */
 #include "postgres.h"
@@ -21,19 +21,14 @@
 #include "access/htup_details.h"
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
+#include "bestatus.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_proc.h"
-#include "libpq/libpq.h"
 #include "miscadmin.h"
-#include "pg_trace.h"
 #include "postmaster/autovacuum.h"
-#include "replication/walsender.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/procsignal.h"
-#include "storage/sinvaladt.h"
-#include "utils/ascii.h"
-#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -68,26 +63,12 @@ typedef enum
  PGSTAT_ENTRY_LOCK_FAILED
 } pg_stat_table_result_status;
 
-/* ----------
- * Total number of backends including auxiliary
- *
- * We reserve a slot for each possible BackendId, plus one for each
- * possible auxiliary process type.  (This scheme assumes there is not
- * more than one of any auxiliary process type at a time.) MaxBackends
- * includes autovacuum workers and background workers as well.
- * ----------
- */
-#define NumBackendStatSlots (MaxBackends + NUM_AUXPROCTYPES)
-
-
 /* ----------
  * GUC parameters
  * ----------
  */
-bool pgstat_track_activities = false;
 bool pgstat_track_counts = false;
 int pgstat_track_functions = TRACK_FUNC_OFF;
-int pgstat_track_activity_query_size = 1024;
 
 /*
  * This was a GUC parameter and no longer used in this file. But left alone
@@ -131,6 +112,8 @@ static bool pgstat_pending_recoveryconflict = false;
 static bool pgstat_pending_deadlock = false;
 static bool pgstat_pending_tempfile = false;
 
+static MemoryContext pgStatLocalContext = NULL;
+
 /* dshash parameter for each type of table */
 static const dshash_parameters dsh_dbparams = {
  sizeof(Oid),
@@ -242,15 +225,8 @@ typedef struct
 /*
  * Info about current "snapshot" of stats file
  */
-static MemoryContext pgStatLocalContext = NULL;
 static HTAB *pgStatDBHash = NULL;
 
-/* Status for backends including auxiliary */
-static LocalPgBackendStatus *localBackendStatusTable = NULL;
-
-/* Total number of backends including auxiliary */
-static int localNumBackends = 0;
-
 /*
  * Cluster wide statistics.
  * Contains statistics that are not collected per database or per table.
@@ -286,7 +262,6 @@ static void pgstat_read_db_statsfile(Oid databaseid, dshash_table *tabhash, dsha
 /* functions used in backends */
 static bool backend_snapshot_global_stats(void);
 static PgStat_StatFuncEntry *backend_get_func_etnry(PgStat_StatDBEntry *dbent, Oid funcid, bool oneshot);
-static void pgstat_read_current_status(void);
 
 static void pgstat_postmaster_shutdown(int code, Datum arg);
 static void pgstat_apply_pending_tabstats(bool shared, bool force,
@@ -313,12 +288,6 @@ static PgStat_TableStatus *get_tabstat_entry(Oid rel_id, bool isshared);
 
 static void pgstat_setup_memcxt(void);
 
-static const char *pgstat_get_wait_activity(WaitEventActivity w);
-static const char *pgstat_get_wait_client(WaitEventClient w);
-static const char *pgstat_get_wait_ipc(WaitEventIPC w);
-static const char *pgstat_get_wait_timeout(WaitEventTimeout w);
-static const char *pgstat_get_wait_io(WaitEventIO w);
-
 static bool pgstat_update_tabentry(dshash_table *tabhash,
    PgStat_TableStatus *stat, bool nowait);
 static void pgstat_update_dbentry(PgStat_StatDBEntry *dbentry,
@@ -329,6 +298,14 @@ static void pgstat_update_dbentry(PgStat_StatDBEntry *dbentry,
  * ------------------------------------------------------------
  */
 
+
+void
+pgstat_initialize(void)
+{
+ /* Set up a process-exit hook to clean up */
+ before_shmem_exit(pgstat_beshutdown_hook, 0);
+}
+
 /*
  * subroutine for pgstat_reset_all
  */
@@ -490,7 +467,7 @@ pgstat_update_stat(bool force)
  */
  TimestampDifference(last_report, now, &secs, &usecs);
  elapsed = secs * 1000 + usecs /1000;
-
+
  if(elapsed < PGSTAT_STAT_MIN_INTERVAL)
  {
  /* we know we have some statistics */
@@ -746,7 +723,7 @@ pgstat_apply_tabstat(pgstat_apply_tabstat_context *cxt,
  pgStatBlockReadTime = 0;
  pgStatBlockWriteTime = 0;
  }
-
+
  cxt->tabhash =
  dshash_attach(area, &dsh_tblparams, cxt->dbentry->tables, 0);
  }
@@ -806,7 +783,7 @@ pgstat_merge_tabentry(PgStat_TableStatus *deststat,
  dest->t_blocks_hit += src->t_blocks_hit;
  }
 }
-
+
 /*
  * pgstat_update_funcstats: subroutine for pgstat_update_stat
  *
@@ -926,7 +903,7 @@ pgstat_update_funcstats(bool force, PgStat_StatDBEntry *dbentry)
  hash_search(pgStatPendingFunctions,
  (void *) &(pendent->functionid), HASH_REMOVE, NULL);
  }
- }
+ }
 
  /* destroy the hash if no entry remains */
  if (hash_get_num_entries(pgStatPendingFunctions) == 0)
@@ -1064,7 +1041,7 @@ pgstat_vacuum_stat(void)
  dbentry = pgstat_get_db_entry(MyDatabaseId, PGSTAT_FETCH_EXCLUSIVE, NULL);
  if (!dbentry)
  return;
-
+
  /*
  * Similarly to above, make a list of all known relations in this DB.
  */
@@ -2621,66 +2598,6 @@ pgstat_fetch_stat_funcentry(Oid func_id)
  return funcentry;
 }
 
-
-/* ----------
- * pgstat_fetch_stat_beentry() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * our local copy of the current-activity entry for one backend.
- *
- * NB: caller is responsible for a check if the user is permitted to see
- * this info (especially the querystring).
- * ----------
- */
-PgBackendStatus *
-pgstat_fetch_stat_beentry(int beid)
-{
- pgstat_read_current_status();
-
- if (beid < 1 || beid > localNumBackends)
- return NULL;
-
- return &localBackendStatusTable[beid - 1].backendStatus;
-}
-
-
-/* ----------
- * pgstat_fetch_stat_local_beentry() -
- *
- * Like pgstat_fetch_stat_beentry() but with locally computed additions (like
- * xid and xmin values of the backend)
- *
- * NB: caller is responsible for a check if the user is permitted to see
- * this info (especially the querystring).
- * ----------
- */
-LocalPgBackendStatus *
-pgstat_fetch_stat_local_beentry(int beid)
-{
- pgstat_read_current_status();
-
- if (beid < 1 || beid > localNumBackends)
- return NULL;
-
- return &localBackendStatusTable[beid - 1];
-}
-
-
-/* ----------
- * pgstat_fetch_stat_numbackends() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * the maximum current backend id.
- * ----------
- */
-int
-pgstat_fetch_stat_numbackends(void)
-{
- pgstat_read_current_status();
-
- return localNumBackends;
-}
-
 /*
  * ---------
  * pgstat_fetch_stat_archiver() -
@@ -2718,364 +2635,6 @@ pgstat_fetch_global(void)
  return snapshot_globalStats;
 }
 
-
-/* ------------------------------------------------------------
- * Functions for management of the shared-memory PgBackendStatus array
- * ------------------------------------------------------------
- */
-
-static PgBackendStatus *BackendStatusArray = NULL;
-static PgBackendStatus *MyBEEntry = NULL;
-static char *BackendAppnameBuffer = NULL;
-static char *BackendClientHostnameBuffer = NULL;
-static char *BackendActivityBuffer = NULL;
-static Size BackendActivityBufferSize = 0;
-#ifdef USE_SSL
-static PgBackendSSLStatus *BackendSslStatusBuffer = NULL;
-#endif
-
-
-/*
- * Report shared-memory space needed by CreateSharedBackendStatus.
- */
-Size
-BackendStatusShmemSize(void)
-{
- Size size;
-
- /* BackendStatusArray: */
- size = mul_size(sizeof(PgBackendStatus), NumBackendStatSlots);
- /* BackendAppnameBuffer: */
- size = add_size(size,
- mul_size(NAMEDATALEN, NumBackendStatSlots));
- /* BackendClientHostnameBuffer: */
- size = add_size(size,
- mul_size(NAMEDATALEN, NumBackendStatSlots));
- /* BackendActivityBuffer: */
- size = add_size(size,
- mul_size(pgstat_track_activity_query_size, NumBackendStatSlots));
-#ifdef USE_SSL
- /* BackendSslStatusBuffer: */
- size = add_size(size,
- mul_size(sizeof(PgBackendSSLStatus), NumBackendStatSlots));
-#endif
- return size;
-}
-
-/*
- * Initialize the shared status array and several string buffers
- * during postmaster startup.
- */
-void
-CreateSharedBackendStatus(void)
-{
- Size size;
- bool found;
- int i;
- char   *buffer;
-
- /* Create or attach to the shared array */
- size = mul_size(sizeof(PgBackendStatus), NumBackendStatSlots);
- BackendStatusArray = (PgBackendStatus *)
- ShmemInitStruct("Backend Status Array", size, &found);
-
- if (!found)
- {
- /*
- * We're the first - initialize.
- */
- MemSet(BackendStatusArray, 0, size);
- }
-
- /* Create or attach to the shared appname buffer */
- size = mul_size(NAMEDATALEN, NumBackendStatSlots);
- BackendAppnameBuffer = (char *)
- ShmemInitStruct("Backend Application Name Buffer", size, &found);
-
- if (!found)
- {
- MemSet(BackendAppnameBuffer, 0, size);
-
- /* Initialize st_appname pointers. */
- buffer = BackendAppnameBuffer;
- for (i = 0; i < NumBackendStatSlots; i++)
- {
- BackendStatusArray[i].st_appname = buffer;
- buffer += NAMEDATALEN;
- }
- }
-
- /* Create or attach to the shared client hostname buffer */
- size = mul_size(NAMEDATALEN, NumBackendStatSlots);
- BackendClientHostnameBuffer = (char *)
- ShmemInitStruct("Backend Client Host Name Buffer", size, &found);
-
- if (!found)
- {
- MemSet(BackendClientHostnameBuffer, 0, size);
-
- /* Initialize st_clienthostname pointers. */
- buffer = BackendClientHostnameBuffer;
- for (i = 0; i < NumBackendStatSlots; i++)
- {
- BackendStatusArray[i].st_clienthostname = buffer;
- buffer += NAMEDATALEN;
- }
- }
-
- /* Create or attach to the shared activity buffer */
- BackendActivityBufferSize = mul_size(pgstat_track_activity_query_size,
- NumBackendStatSlots);
- BackendActivityBuffer = (char *)
- ShmemInitStruct("Backend Activity Buffer",
- BackendActivityBufferSize,
- &found);
-
- if (!found)
- {
- MemSet(BackendActivityBuffer, 0, BackendActivityBufferSize);
-
- /* Initialize st_activity pointers. */
- buffer = BackendActivityBuffer;
- for (i = 0; i < NumBackendStatSlots; i++)
- {
- BackendStatusArray[i].st_activity_raw = buffer;
- buffer += pgstat_track_activity_query_size;
- }
- }
-
-#ifdef USE_SSL
- /* Create or attach to the shared SSL status buffer */
- size = mul_size(sizeof(PgBackendSSLStatus), NumBackendStatSlots);
- BackendSslStatusBuffer = (PgBackendSSLStatus *)
- ShmemInitStruct("Backend SSL Status Buffer", size, &found);
-
- if (!found)
- {
- PgBackendSSLStatus *ptr;
-
- MemSet(BackendSslStatusBuffer, 0, size);
-
- /* Initialize st_sslstatus pointers. */
- ptr = BackendSslStatusBuffer;
- for (i = 0; i < NumBackendStatSlots; i++)
- {
- BackendStatusArray[i].st_sslstatus = ptr;
- ptr++;
- }
- }
-#endif
-}
-
-
-/* ----------
- * pgstat_initialize() -
- *
- * Initialize pgstats state, and set up our on-proc-exit hook.
- * Called from InitPostgres and AuxiliaryProcessMain. For auxiliary process,
- * MyBackendId is invalid. Otherwise, MyBackendId must be set,
- * but we must not have started any transaction yet (since the
- * exit hook must run after the last transaction exit).
- * NOTE: MyDatabaseId isn't set yet; so the shutdown hook has to be careful.
- * ----------
- */
-void
-pgstat_initialize(void)
-{
- /* Initialize MyBEEntry */
- if (MyBackendId != InvalidBackendId)
- {
- Assert(MyBackendId >= 1 && MyBackendId <= MaxBackends);
- MyBEEntry = &BackendStatusArray[MyBackendId - 1];
- }
- else
- {
- /* Must be an auxiliary process */
- Assert(MyAuxProcType != NotAnAuxProcess);
-
- /*
- * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
- * have a BackendId, the slot is statically allocated based on the
- * auxiliary process type (MyAuxProcType).  Backends use slots indexed
- * in the range from 1 to MaxBackends (inclusive), so we use
- * MaxBackends + AuxBackendType + 1 as the index of the slot for an
- * auxiliary process.
- */
- MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
- }
-
- /* Set up a process-exit hook to clean up */
- before_shmem_exit(pgstat_beshutdown_hook, 0);
-}
-
-/* ----------
- * pgstat_bestart() -
- *
- * Initialize this backend's entry in the PgBackendStatus array.
- * Called from InitPostgres.
- *
- * Apart from auxiliary processes, MyBackendId, MyDatabaseId,
- * session userid, and application_name must be set for a
- * backend (hence, this cannot be combined with pgstat_initialize).
- * ----------
- */
-void
-pgstat_bestart(void)
-{
- SockAddr clientaddr;
- volatile PgBackendStatus *beentry;
-
- /*
- * To minimize the time spent modifying the PgBackendStatus entry, fetch
- * all the needed data first.
- */
-
- /*
- * We may not have a MyProcPort (eg, if this is the autovacuum process).
- * If so, use all-zeroes client address, which is dealt with specially in
- * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
- */
- if (MyProcPort)
- memcpy(&clientaddr, &MyProcPort->raddr, sizeof(clientaddr));
- else
- MemSet(&clientaddr, 0, sizeof(clientaddr));
-
- /*
- * Initialize my status entry, following the protocol of bumping
- * st_changecount before and after; and make sure it's even afterwards. We
- * use a volatile pointer here to ensure the compiler doesn't try to get
- * cute.
- */
- beentry = MyBEEntry;
-
- /* pgstats state must be initialized from pgstat_initialize() */
- Assert(beentry != NULL);
-
- if (MyBackendId != InvalidBackendId)
- {
- if (IsAutoVacuumLauncherProcess())
- {
- /* Autovacuum Launcher */
- beentry->st_backendType = B_AUTOVAC_LAUNCHER;
- }
- else if (IsAutoVacuumWorkerProcess())
- {
- /* Autovacuum Worker */
- beentry->st_backendType = B_AUTOVAC_WORKER;
- }
- else if (am_walsender)
- {
- /* Wal sender */
- beentry->st_backendType = B_WAL_SENDER;
- }
- else if (IsBackgroundWorker)
- {
- /* bgworker */
- beentry->st_backendType = B_BG_WORKER;
- }
- else
- {
- /* client-backend */
- beentry->st_backendType = B_BACKEND;
- }
- }
- else
- {
- /* Must be an auxiliary process */
- Assert(MyAuxProcType != NotAnAuxProcess);
- switch (MyAuxProcType)
- {
- case StartupProcess:
- beentry->st_backendType = B_STARTUP;
- break;
- case BgWriterProcess:
- beentry->st_backendType = B_BG_WRITER;
- break;
- case ArchiverProcess:
- beentry->st_backendType = B_ARCHIVER;
- break;
- case CheckpointerProcess:
- beentry->st_backendType = B_CHECKPOINTER;
- break;
- case WalWriterProcess:
- beentry->st_backendType = B_WAL_WRITER;
- break;
- case WalReceiverProcess:
- beentry->st_backendType = B_WAL_RECEIVER;
- break;
- default:
- elog(FATAL, "unrecognized process type: %d",
- (int) MyAuxProcType);
- proc_exit(1);
- }
- }
-
- do
- {
- pgstat_increment_changecount_before(beentry);
- } while ((beentry->st_changecount & 1) == 0);
-
- beentry->st_procpid = MyProcPid;
- beentry->st_proc_start_timestamp = MyStartTimestamp;
- beentry->st_activity_start_timestamp = 0;
- beentry->st_state_start_timestamp = 0;
- beentry->st_xact_start_timestamp = 0;
- beentry->st_databaseid = MyDatabaseId;
-
- /* We have userid for client-backends, wal-sender and bgworker processes */
- if (beentry->st_backendType == B_BACKEND
- || beentry->st_backendType == B_WAL_SENDER
- || beentry->st_backendType == B_BG_WORKER)
- beentry->st_userid = GetSessionUserId();
- else
- beentry->st_userid = InvalidOid;
-
- beentry->st_clientaddr = clientaddr;
- if (MyProcPort && MyProcPort->remote_hostname)
- strlcpy(beentry->st_clienthostname, MyProcPort->remote_hostname,
- NAMEDATALEN);
- else
- beentry->st_clienthostname[0] = '\0';
-#ifdef USE_SSL
- if (MyProcPort && MyProcPort->ssl != NULL)
- {
- beentry->st_ssl = true;
- beentry->st_sslstatus->ssl_bits = be_tls_get_cipher_bits(MyProcPort);
- beentry->st_sslstatus->ssl_compression = be_tls_get_compression(MyProcPort);
- strlcpy(beentry->st_sslstatus->ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
- strlcpy(beentry->st_sslstatus->ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
- be_tls_get_peerdn_name(MyProcPort, beentry->st_sslstatus->ssl_clientdn, NAMEDATALEN);
- }
- else
- {
- beentry->st_ssl = false;
- }
-#else
- beentry->st_ssl = false;
-#endif
- beentry->st_state = STATE_UNDEFINED;
- beentry->st_appname[0] = '\0';
- beentry->st_activity_raw[0] = '\0';
- /* Also make sure the last byte in each string area is always 0 */
- beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
- beentry->st_appname[NAMEDATALEN - 1] = '\0';
- beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
- beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
- beentry->st_progress_command_target = InvalidOid;
-
- /*
- * we don't zero st_progress_param here to save cycles; nobody should
- * examine it until st_progress_command has been set to something other
- * than PROGRESS_COMMAND_INVALID
- */
-
- pgstat_increment_changecount_after(beentry);
-
- /* Update app name to current GUC setting */
- if (application_name)
- pgstat_report_appname(application_name);
-}
-
 /*
  * Shut down a single backend's statistics reporting at process exit.
  *
@@ -3088,8 +2647,6 @@ pgstat_bestart(void)
 static void
 pgstat_beshutdown_hook(int code, Datum arg)
 {
- volatile PgBackendStatus *beentry = MyBEEntry;
-
  /*
  * If we got as far as discovering our own database ID, we can report what
  * we did to the collector.  Otherwise, we'd be sending an invalid
@@ -3098,1188 +2655,9 @@ pgstat_beshutdown_hook(int code, Datum arg)
  */
  if (OidIsValid(MyDatabaseId))
  pgstat_update_stat(true);
-
- /*
- * Clear my status entry, following the protocol of bumping st_changecount
- * before and after.  We use a volatile pointer here to ensure the
- * compiler doesn't try to get cute.
- */
- pgstat_increment_changecount_before(beentry);
-
- beentry->st_procpid = 0; /* mark invalid */
-
- pgstat_increment_changecount_after(beentry);
 }
 
 
-/* ----------
- * pgstat_report_activity() -
- *
- * Called from tcop/postgres.c to report what the backend is actually doing
- * (but note cmd_str can be NULL for certain cases).
- *
- * All updates of the status entry follow the protocol of bumping
- * st_changecount before and after.  We use a volatile pointer here to
- * ensure the compiler doesn't try to get cute.
- * ----------
- */
-void
-pgstat_report_activity(BackendState state, const char *cmd_str)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
- TimestampTz start_timestamp;
- TimestampTz current_timestamp;
- int len = 0;
-
- TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
-
- if (!beentry)
- return;
-
- if (!pgstat_track_activities)
- {
- if (beentry->st_state != STATE_DISABLED)
- {
- volatile PGPROC *proc = MyProc;
-
- /*
- * track_activities is disabled, but we last reported a
- * non-disabled state.  As our final update, change the state and
- * clear fields we will not be updating anymore.
- */
- pgstat_increment_changecount_before(beentry);
- beentry->st_state = STATE_DISABLED;
- beentry->st_state_start_timestamp = 0;
- beentry->st_activity_raw[0] = '\0';
- beentry->st_activity_start_timestamp = 0;
- /* st_xact_start_timestamp and wait_event_info are also disabled */
- beentry->st_xact_start_timestamp = 0;
- proc->wait_event_info = 0;
- pgstat_increment_changecount_after(beentry);
- }
- return;
- }
-
- /*
- * To minimize the time spent modifying the entry, fetch all the needed
- * data first.
- */
- start_timestamp = GetCurrentStatementStartTimestamp();
- if (cmd_str != NULL)
- {
- /*
- * Compute length of to-be-stored string unaware of multi-byte
- * characters. For speed reasons that'll get corrected on read, rather
- * than computed every write.
- */
- len = Min(strlen(cmd_str), pgstat_track_activity_query_size - 1);
- }
- current_timestamp = GetCurrentTimestamp();
-
- /*
- * Now update the status entry
- */
- pgstat_increment_changecount_before(beentry);
-
- beentry->st_state = state;
- beentry->st_state_start_timestamp = current_timestamp;
-
- if (cmd_str != NULL)
- {
- memcpy((char *) beentry->st_activity_raw, cmd_str, len);
- beentry->st_activity_raw[len] = '\0';
- beentry->st_activity_start_timestamp = start_timestamp;
- }
-
- pgstat_increment_changecount_after(beentry);
-}
-
-/*-----------
- * pgstat_progress_start_command() -
- *
- * Set st_progress_command (and st_progress_command_target) in own backend
- * entry.  Also, zero-initialize st_progress_param array.
- *-----------
- */
-void
-pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
-
- if (!beentry || !pgstat_track_activities)
- return;
-
- pgstat_increment_changecount_before(beentry);
- beentry->st_progress_command = cmdtype;
- beentry->st_progress_command_target = relid;
- MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
- pgstat_increment_changecount_after(beentry);
-}
-
-/*-----------
- * pgstat_progress_update_param() -
- *
- * Update index'th member in st_progress_param[] of own backend entry.
- *-----------
- */
-void
-pgstat_progress_update_param(int index, int64 val)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
-
- Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
-
- if (!beentry || !pgstat_track_activities)
- return;
-
- pgstat_increment_changecount_before(beentry);
- beentry->st_progress_param[index] = val;
- pgstat_increment_changecount_after(beentry);
-}
-
-/*-----------
- * pgstat_progress_update_multi_param() -
- *
- * Update multiple members in st_progress_param[] of own backend entry.
- * This is atomic; readers won't see intermediate states.
- *-----------
- */
-void
-pgstat_progress_update_multi_param(int nparam, const int *index,
-   const int64 *val)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
- int i;
-
- if (!beentry || !pgstat_track_activities || nparam == 0)
- return;
-
- pgstat_increment_changecount_before(beentry);
-
- for (i = 0; i < nparam; ++i)
- {
- Assert(index[i] >= 0 && index[i] < PGSTAT_NUM_PROGRESS_PARAM);
-
- beentry->st_progress_param[index[i]] = val[i];
- }
-
- pgstat_increment_changecount_after(beentry);
-}
-
-/*-----------
- * pgstat_progress_end_command() -
- *
- * Reset st_progress_command (and st_progress_command_target) in own backend
- * entry.  This signals the end of the command.
- *-----------
- */
-void
-pgstat_progress_end_command(void)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
-
- if (!beentry)
- return;
- if (!pgstat_track_activities
- && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
- return;
-
- pgstat_increment_changecount_before(beentry);
- beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
- beentry->st_progress_command_target = InvalidOid;
- pgstat_increment_changecount_after(beentry);
-}
-
-/* ----------
- * pgstat_report_appname() -
- *
- * Called to update our application name.
- * ----------
- */
-void
-pgstat_report_appname(const char *appname)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
- int len;
-
- if (!beentry)
- return;
-
- /* This should be unnecessary if GUC did its job, but be safe */
- len = pg_mbcliplen(appname, strlen(appname), NAMEDATALEN - 1);
-
- /*
- * Update my status entry, following the protocol of bumping
- * st_changecount before and after.  We use a volatile pointer here to
- * ensure the compiler doesn't try to get cute.
- */
- pgstat_increment_changecount_before(beentry);
-
- memcpy((char *) beentry->st_appname, appname, len);
- beentry->st_appname[len] = '\0';
-
- pgstat_increment_changecount_after(beentry);
-}
-
-/*
- * Report current transaction start timestamp as the specified value.
- * Zero means there is no active transaction.
- */
-void
-pgstat_report_xact_timestamp(TimestampTz tstamp)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
-
- if (!pgstat_track_activities || !beentry)
- return;
-
- /*
- * Update my status entry, following the protocol of bumping
- * st_changecount before and after.  We use a volatile pointer here to
- * ensure the compiler doesn't try to get cute.
- */
- pgstat_increment_changecount_before(beentry);
- beentry->st_xact_start_timestamp = tstamp;
- pgstat_increment_changecount_after(beentry);
-}
-
-/* ----------
- * pgstat_read_current_status() -
- *
- * Copy the current contents of the PgBackendStatus array to local memory,
- * if not already done in this transaction.
- * ----------
- */
-static void
-pgstat_read_current_status(void)
-{
- volatile PgBackendStatus *beentry;
- LocalPgBackendStatus *localtable;
- LocalPgBackendStatus *localentry;
- char   *localappname,
-   *localclienthostname,
-   *localactivity;
-#ifdef USE_SSL
- PgBackendSSLStatus *localsslstatus;
-#endif
- int i;
-
- Assert(IsUnderPostmaster);
-
- if (localBackendStatusTable)
- return; /* already done */
-
- pgstat_setup_memcxt();
-
- localtable = (LocalPgBackendStatus *)
- MemoryContextAlloc(pgStatLocalContext,
-   sizeof(LocalPgBackendStatus) * NumBackendStatSlots);
- localappname = (char *)
- MemoryContextAlloc(pgStatLocalContext,
-   NAMEDATALEN * NumBackendStatSlots);
- localclienthostname = (char *)
- MemoryContextAlloc(pgStatLocalContext,
-   NAMEDATALEN * NumBackendStatSlots);
- localactivity = (char *)
- MemoryContextAlloc(pgStatLocalContext,
-   pgstat_track_activity_query_size * NumBackendStatSlots);
-#ifdef USE_SSL
- localsslstatus = (PgBackendSSLStatus *)
- MemoryContextAlloc(pgStatLocalContext,
-   sizeof(PgBackendSSLStatus) * NumBackendStatSlots);
-#endif
-
- localNumBackends = 0;
-
- beentry = BackendStatusArray;
- localentry = localtable;
- for (i = 1; i <= NumBackendStatSlots; i++)
- {
- /*
- * Follow the protocol of retrying if st_changecount changes while we
- * copy the entry, or if it's odd.  (The check for odd is needed to
- * cover the case where we are able to completely copy the entry while
- * the source backend is between increment steps.) We use a volatile
- * pointer here to ensure the compiler doesn't try to get cute.
- */
- for (;;)
- {
- int before_changecount;
- int after_changecount;
-
- pgstat_save_changecount_before(beentry, before_changecount);
-
- localentry->backendStatus.st_procpid = beentry->st_procpid;
- if (localentry->backendStatus.st_procpid > 0)
- {
- memcpy(&localentry->backendStatus, (char *) beentry, sizeof(PgBackendStatus));
-
- /*
- * strcpy is safe even if the string is modified concurrently,
- * because there's always a \0 at the end of the buffer.
- */
- strcpy(localappname, (char *) beentry->st_appname);
- localentry->backendStatus.st_appname = localappname;
- strcpy(localclienthostname, (char *) beentry->st_clienthostname);
- localentry->backendStatus.st_clienthostname = localclienthostname;
- strcpy(localactivity, (char *) beentry->st_activity_raw);
- localentry->backendStatus.st_activity_raw = localactivity;
- localentry->backendStatus.st_ssl = beentry->st_ssl;
-#ifdef USE_SSL
- if (beentry->st_ssl)
- {
- memcpy(localsslstatus, beentry->st_sslstatus, sizeof(PgBackendSSLStatus));
- localentry->backendStatus.st_sslstatus = localsslstatus;
- }
-#endif
- }
-
- pgstat_save_changecount_after(beentry, after_changecount);
- if (before_changecount == after_changecount &&
- (before_changecount & 1) == 0)
- break;
-
- /* Make sure we can break out of loop if stuck... */
- CHECK_FOR_INTERRUPTS();
- }
-
- beentry++;
- /* Only valid entries get included into the local array */
- if (localentry->backendStatus.st_procpid > 0)
- {
- BackendIdGetTransactionIds(i,
-   &localentry->backend_xid,
-   &localentry->backend_xmin);
-
- localentry++;
- localappname += NAMEDATALEN;
- localclienthostname += NAMEDATALEN;
- localactivity += pgstat_track_activity_query_size;
-#ifdef USE_SSL
- localsslstatus++;
-#endif
- localNumBackends++;
- }
- }
-
- /* Set the pointer only after completion of a valid table */
- localBackendStatusTable = localtable;
-}
-
-/* ----------
- * pgstat_get_wait_event_type() -
- *
- * Return a string representing the current wait event type, backend is
- * waiting on.
- */
-const char *
-pgstat_get_wait_event_type(uint32 wait_event_info)
-{
- uint32 classId;
- const char *event_type;
-
- /* report process as not waiting. */
- if (wait_event_info == 0)
- return NULL;
-
- classId = wait_event_info & 0xFF000000;
-
- switch (classId)
- {
- case PG_WAIT_LWLOCK:
- event_type = "LWLock";
- break;
- case PG_WAIT_LOCK:
- event_type = "Lock";
- break;
- case PG_WAIT_BUFFER_PIN:
- event_type = "BufferPin";
- break;
- case PG_WAIT_ACTIVITY:
- event_type = "Activity";
- break;
- case PG_WAIT_CLIENT:
- event_type = "Client";
- break;
- case PG_WAIT_EXTENSION:
- event_type = "Extension";
- break;
- case PG_WAIT_IPC:
- event_type = "IPC";
- break;
- case PG_WAIT_TIMEOUT:
- event_type = "Timeout";
- break;
- case PG_WAIT_IO:
- event_type = "IO";
- break;
- default:
- event_type = "???";
- break;
- }
-
- return event_type;
-}
-
-/* ----------
- * pgstat_get_wait_event() -
- *
- * Return a string representing the current wait event, backend is
- * waiting on.
- */
-const char *
-pgstat_get_wait_event(uint32 wait_event_info)
-{
- uint32 classId;
- uint16 eventId;
- const char *event_name;
-
- /* report process as not waiting. */
- if (wait_event_info == 0)
- return NULL;
-
- classId = wait_event_info & 0xFF000000;
- eventId = wait_event_info & 0x0000FFFF;
-
- switch (classId)
- {
- case PG_WAIT_LWLOCK:
- event_name = GetLWLockIdentifier(classId, eventId);
- break;
- case PG_WAIT_LOCK:
- event_name = GetLockNameFromTagType(eventId);
- break;
- case PG_WAIT_BUFFER_PIN:
- event_name = "BufferPin";
- break;
- case PG_WAIT_ACTIVITY:
- {
- WaitEventActivity w = (WaitEventActivity) wait_event_info;
-
- event_name = pgstat_get_wait_activity(w);
- break;
- }
- case PG_WAIT_CLIENT:
- {
- WaitEventClient w = (WaitEventClient) wait_event_info;
-
- event_name = pgstat_get_wait_client(w);
- break;
- }
- case PG_WAIT_EXTENSION:
- event_name = "Extension";
- break;
- case PG_WAIT_IPC:
- {
- WaitEventIPC w = (WaitEventIPC) wait_event_info;
-
- event_name = pgstat_get_wait_ipc(w);
- break;
- }
- case PG_WAIT_TIMEOUT:
- {
- WaitEventTimeout w = (WaitEventTimeout) wait_event_info;
-
- event_name = pgstat_get_wait_timeout(w);
- break;
- }
- case PG_WAIT_IO:
- {
- WaitEventIO w = (WaitEventIO) wait_event_info;
-
- event_name = pgstat_get_wait_io(w);
- break;
- }
- default:
- event_name = "unknown wait event";
- break;
- }
-
- return event_name;
-}
-
-/* ----------
- * pgstat_get_wait_activity() -
- *
- * Convert WaitEventActivity to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_activity(WaitEventActivity w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_ARCHIVER_MAIN:
- event_name = "ArchiverMain";
- break;
- case WAIT_EVENT_AUTOVACUUM_MAIN:
- event_name = "AutoVacuumMain";
- break;
- case WAIT_EVENT_BGWRITER_HIBERNATE:
- event_name = "BgWriterHibernate";
- break;
- case WAIT_EVENT_BGWRITER_MAIN:
- event_name = "BgWriterMain";
- break;
- case WAIT_EVENT_CHECKPOINTER_MAIN:
- event_name = "CheckpointerMain";
- break;
- case WAIT_EVENT_LOGICAL_APPLY_MAIN:
- event_name = "LogicalApplyMain";
- break;
- case WAIT_EVENT_LOGICAL_LAUNCHER_MAIN:
- event_name = "LogicalLauncherMain";
- break;
- case WAIT_EVENT_PGSTAT_MAIN:
- event_name = "PgStatMain";
- break;
- case WAIT_EVENT_RECOVERY_WAL_ALL:
- event_name = "RecoveryWalAll";
- break;
- case WAIT_EVENT_RECOVERY_WAL_STREAM:
- event_name = "RecoveryWalStream";
- break;
- case WAIT_EVENT_SYSLOGGER_MAIN:
- event_name = "SysLoggerMain";
- break;
- case WAIT_EVENT_WAL_RECEIVER_MAIN:
- event_name = "WalReceiverMain";
- break;
- case WAIT_EVENT_WAL_SENDER_MAIN:
- event_name = "WalSenderMain";
- break;
- case WAIT_EVENT_WAL_WRITER_MAIN:
- event_name = "WalWriterMain";
- break;
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-/* ----------
- * pgstat_get_wait_client() -
- *
- * Convert WaitEventClient to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_client(WaitEventClient w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_CLIENT_READ:
- event_name = "ClientRead";
- break;
- case WAIT_EVENT_CLIENT_WRITE:
- event_name = "ClientWrite";
- break;
- case WAIT_EVENT_LIBPQWALRECEIVER_CONNECT:
- event_name = "LibPQWalReceiverConnect";
- break;
- case WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE:
- event_name = "LibPQWalReceiverReceive";
- break;
- case WAIT_EVENT_SSL_OPEN_SERVER:
- event_name = "SSLOpenServer";
- break;
- case WAIT_EVENT_WAL_RECEIVER_WAIT_START:
- event_name = "WalReceiverWaitStart";
- break;
- case WAIT_EVENT_WAL_SENDER_WAIT_WAL:
- event_name = "WalSenderWaitForWAL";
- break;
- case WAIT_EVENT_WAL_SENDER_WRITE_DATA:
- event_name = "WalSenderWriteData";
- break;
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-/* ----------
- * pgstat_get_wait_ipc() -
- *
- * Convert WaitEventIPC to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_ipc(WaitEventIPC w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_BGWORKER_SHUTDOWN:
- event_name = "BgWorkerShutdown";
- break;
- case WAIT_EVENT_BGWORKER_STARTUP:
- event_name = "BgWorkerStartup";
- break;
- case WAIT_EVENT_BTREE_PAGE:
- event_name = "BtreePage";
- break;
- case WAIT_EVENT_CLOG_GROUP_UPDATE:
- event_name = "ClogGroupUpdate";
- break;
- case WAIT_EVENT_EXECUTE_GATHER:
- event_name = "ExecuteGather";
- break;
- case WAIT_EVENT_HASH_BATCH_ALLOCATING:
- event_name = "Hash/Batch/Allocating";
- break;
- case WAIT_EVENT_HASH_BATCH_ELECTING:
- event_name = "Hash/Batch/Electing";
- break;
- case WAIT_EVENT_HASH_BATCH_LOADING:
- event_name = "Hash/Batch/Loading";
- break;
- case WAIT_EVENT_HASH_BUILD_ALLOCATING:
- event_name = "Hash/Build/Allocating";
- break;
- case WAIT_EVENT_HASH_BUILD_ELECTING:
- event_name = "Hash/Build/Electing";
- break;
- case WAIT_EVENT_HASH_BUILD_HASHING_INNER:
- event_name = "Hash/Build/HashingInner";
- break;
- case WAIT_EVENT_HASH_BUILD_HASHING_OUTER:
- event_name = "Hash/Build/HashingOuter";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_ALLOCATING:
- event_name = "Hash/GrowBatches/Allocating";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_DECIDING:
- event_name = "Hash/GrowBatches/Deciding";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_ELECTING:
- event_name = "Hash/GrowBatches/Electing";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_FINISHING:
- event_name = "Hash/GrowBatches/Finishing";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_REPARTITIONING:
- event_name = "Hash/GrowBatches/Repartitioning";
- break;
- case WAIT_EVENT_HASH_GROW_BUCKETS_ALLOCATING:
- event_name = "Hash/GrowBuckets/Allocating";
- break;
- case WAIT_EVENT_HASH_GROW_BUCKETS_ELECTING:
- event_name = "Hash/GrowBuckets/Electing";
- break;
- case WAIT_EVENT_HASH_GROW_BUCKETS_REINSERTING:
- event_name = "Hash/GrowBuckets/Reinserting";
- break;
- case WAIT_EVENT_LOGICAL_SYNC_DATA:
- event_name = "LogicalSyncData";
- break;
- case WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE:
- event_name = "LogicalSyncStateChange";
- break;
- case WAIT_EVENT_MQ_INTERNAL:
- event_name = "MessageQueueInternal";
- break;
- case WAIT_EVENT_MQ_PUT_MESSAGE:
- event_name = "MessageQueuePutMessage";
- break;
- case WAIT_EVENT_MQ_RECEIVE:
- event_name = "MessageQueueReceive";
- break;
- case WAIT_EVENT_MQ_SEND:
- event_name = "MessageQueueSend";
- break;
- case WAIT_EVENT_PARALLEL_BITMAP_SCAN:
- event_name = "ParallelBitmapScan";
- break;
- case WAIT_EVENT_PARALLEL_CREATE_INDEX_SCAN:
- event_name = "ParallelCreateIndexScan";
- break;
- case WAIT_EVENT_PARALLEL_FINISH:
- event_name = "ParallelFinish";
- break;
- case WAIT_EVENT_PROCARRAY_GROUP_UPDATE:
- event_name = "ProcArrayGroupUpdate";
- break;
- case WAIT_EVENT_PROMOTE:
- event_name = "Promote";
- break;
- case WAIT_EVENT_REPLICATION_ORIGIN_DROP:
- event_name = "ReplicationOriginDrop";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_DROP:
- event_name = "ReplicationSlotDrop";
- break;
- case WAIT_EVENT_SAFE_SNAPSHOT:
- event_name = "SafeSnapshot";
- break;
- case WAIT_EVENT_SYNC_REP:
- event_name = "SyncRep";
- break;
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-/* ----------
- * pgstat_get_wait_timeout() -
- *
- * Convert WaitEventTimeout to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_timeout(WaitEventTimeout w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_BASE_BACKUP_THROTTLE:
- event_name = "BaseBackupThrottle";
- break;
- case WAIT_EVENT_PG_SLEEP:
- event_name = "PgSleep";
- break;
- case WAIT_EVENT_RECOVERY_APPLY_DELAY:
- event_name = "RecoveryApplyDelay";
- break;
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-/* ----------
- * pgstat_get_wait_io() -
- *
- * Convert WaitEventIO to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_io(WaitEventIO w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_BUFFILE_READ:
- event_name = "BufFileRead";
- break;
- case WAIT_EVENT_BUFFILE_WRITE:
- event_name = "BufFileWrite";
- break;
- case WAIT_EVENT_CONTROL_FILE_READ:
- event_name = "ControlFileRead";
- break;
- case WAIT_EVENT_CONTROL_FILE_SYNC:
- event_name = "ControlFileSync";
- break;
- case WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE:
- event_name = "ControlFileSyncUpdate";
- break;
- case WAIT_EVENT_CONTROL_FILE_WRITE:
- event_name = "ControlFileWrite";
- break;
- case WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE:
- event_name = "ControlFileWriteUpdate";
- break;
- case WAIT_EVENT_COPY_FILE_READ:
- event_name = "CopyFileRead";
- break;
- case WAIT_EVENT_COPY_FILE_WRITE:
- event_name = "CopyFileWrite";
- break;
- case WAIT_EVENT_DATA_FILE_EXTEND:
- event_name = "DataFileExtend";
- break;
- case WAIT_EVENT_DATA_FILE_FLUSH:
- event_name = "DataFileFlush";
- break;
- case WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC:
- event_name = "DataFileImmediateSync";
- break;
- case WAIT_EVENT_DATA_FILE_PREFETCH:
- event_name = "DataFilePrefetch";
- break;
- case WAIT_EVENT_DATA_FILE_READ:
- event_name = "DataFileRead";
- break;
- case WAIT_EVENT_DATA_FILE_SYNC:
- event_name = "DataFileSync";
- break;
- case WAIT_EVENT_DATA_FILE_TRUNCATE:
- event_name = "DataFileTruncate";
- break;
- case WAIT_EVENT_DATA_FILE_WRITE:
- event_name = "DataFileWrite";
- break;
- case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
- event_name = "DSMFillZeroWrite";
- break;
- case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
- event_name = "LockFileAddToDataDirRead";
- break;
- case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC:
- event_name = "LockFileAddToDataDirSync";
- break;
- case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE:
- event_name = "LockFileAddToDataDirWrite";
- break;
- case WAIT_EVENT_LOCK_FILE_CREATE_READ:
- event_name = "LockFileCreateRead";
- break;
- case WAIT_EVENT_LOCK_FILE_CREATE_SYNC:
- event_name = "LockFileCreateSync";
- break;
- case WAIT_EVENT_LOCK_FILE_CREATE_WRITE:
- event_name = "LockFileCreateWrite";
- break;
- case WAIT_EVENT_LOCK_FILE_RECHECKDATADIR_READ:
- event_name = "LockFileReCheckDataDirRead";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC:
- event_name = "LogicalRewriteCheckpointSync";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC:
- event_name = "LogicalRewriteMappingSync";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE:
- event_name = "LogicalRewriteMappingWrite";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_SYNC:
- event_name = "LogicalRewriteSync";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE:
- event_name = "LogicalRewriteTruncate";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_WRITE:
- event_name = "LogicalRewriteWrite";
- break;
- case WAIT_EVENT_RELATION_MAP_READ:
- event_name = "RelationMapRead";
- break;
- case WAIT_EVENT_RELATION_MAP_SYNC:
- event_name = "RelationMapSync";
- break;
- case WAIT_EVENT_RELATION_MAP_WRITE:
- event_name = "RelationMapWrite";
- break;
- case WAIT_EVENT_REORDER_BUFFER_READ:
- event_name = "ReorderBufferRead";
- break;
- case WAIT_EVENT_REORDER_BUFFER_WRITE:
- event_name = "ReorderBufferWrite";
- break;
- case WAIT_EVENT_REORDER_LOGICAL_MAPPING_READ:
- event_name = "ReorderLogicalMappingRead";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_READ:
- event_name = "ReplicationSlotRead";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC:
- event_name = "ReplicationSlotRestoreSync";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_SYNC:
- event_name = "ReplicationSlotSync";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_WRITE:
- event_name = "ReplicationSlotWrite";
- break;
- case WAIT_EVENT_SLRU_FLUSH_SYNC:
- event_name = "SLRUFlushSync";
- break;
- case WAIT_EVENT_SLRU_READ:
- event_name = "SLRURead";
- break;
- case WAIT_EVENT_SLRU_SYNC:
- event_name = "SLRUSync";
- break;
- case WAIT_EVENT_SLRU_WRITE:
- event_name = "SLRUWrite";
- break;
- case WAIT_EVENT_SNAPBUILD_READ:
- event_name = "SnapbuildRead";
- break;
- case WAIT_EVENT_SNAPBUILD_SYNC:
- event_name = "SnapbuildSync";
- break;
- case WAIT_EVENT_SNAPBUILD_WRITE:
- event_name = "SnapbuildWrite";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_FILE_SYNC:
- event_name = "TimelineHistoryFileSync";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_FILE_WRITE:
- event_name = "TimelineHistoryFileWrite";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_READ:
- event_name = "TimelineHistoryRead";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_SYNC:
- event_name = "TimelineHistorySync";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_WRITE:
- event_name = "TimelineHistoryWrite";
- break;
- case WAIT_EVENT_TWOPHASE_FILE_READ:
- event_name = "TwophaseFileRead";
- break;
- case WAIT_EVENT_TWOPHASE_FILE_SYNC:
- event_name = "TwophaseFileSync";
- break;
- case WAIT_EVENT_TWOPHASE_FILE_WRITE:
- event_name = "TwophaseFileWrite";
- break;
- case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
- event_name = "WALSenderTimelineHistoryRead";
- break;
- case WAIT_EVENT_WAL_BOOTSTRAP_SYNC:
- event_name = "WALBootstrapSync";
- break;
- case WAIT_EVENT_WAL_BOOTSTRAP_WRITE:
- event_name = "WALBootstrapWrite";
- break;
- case WAIT_EVENT_WAL_COPY_READ:
- event_name = "WALCopyRead";
- break;
- case WAIT_EVENT_WAL_COPY_SYNC:
- event_name = "WALCopySync";
- break;
- case WAIT_EVENT_WAL_COPY_WRITE:
- event_name = "WALCopyWrite";
- break;
- case WAIT_EVENT_WAL_INIT_SYNC:
- event_name = "WALInitSync";
- break;
- case WAIT_EVENT_WAL_INIT_WRITE:
- event_name = "WALInitWrite";
- break;
- case WAIT_EVENT_WAL_READ:
- event_name = "WALRead";
- break;
- case WAIT_EVENT_WAL_SYNC:
- event_name = "WALSync";
- break;
- case WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN:
- event_name = "WALSyncMethodAssign";
- break;
- case WAIT_EVENT_WAL_WRITE:
- event_name = "WALWrite";
- break;
-
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-
-/* ----------
- * pgstat_get_backend_current_activity() -
- *
- * Return a string representing the current activity of the backend with
- * the specified PID.  This looks directly at the BackendStatusArray,
- * and so will provide current information regardless of the age of our
- * transaction's snapshot of the status array.
- *
- * It is the caller's responsibility to invoke this only for backends whose
- * state is expected to remain stable while the result is in use.  The
- * only current use is in deadlock reporting, where we can expect that
- * the target backend is blocked on a lock.  (There are corner cases
- * where the target's wait could get aborted while we are looking at it,
- * but the very worst consequence is to return a pointer to a string
- * that's been changed, so we won't worry too much.)
- *
- * Note: return strings for special cases match pg_stat_get_backend_activity.
- * ----------
- */
-const char *
-pgstat_get_backend_current_activity(int pid, bool checkUser)
-{
- PgBackendStatus *beentry;
- int i;
-
- beentry = BackendStatusArray;
- for (i = 1; i <= MaxBackends; i++)
- {
- /*
- * Although we expect the target backend's entry to be stable, that
- * doesn't imply that anyone else's is.  To avoid identifying the
- * wrong backend, while we check for a match to the desired PID we
- * must follow the protocol of retrying if st_changecount changes
- * while we examine the entry, or if it's odd.  (This might be
- * unnecessary, since fetching or storing an int is almost certainly
- * atomic, but let's play it safe.)  We use a volatile pointer here to
- * ensure the compiler doesn't try to get cute.
- */
- volatile PgBackendStatus *vbeentry = beentry;
- bool found;
-
- for (;;)
- {
- int before_changecount;
- int after_changecount;
-
- pgstat_save_changecount_before(vbeentry, before_changecount);
-
- found = (vbeentry->st_procpid == pid);
-
- pgstat_save_changecount_after(vbeentry, after_changecount);
-
- if (before_changecount == after_changecount &&
- (before_changecount & 1) == 0)
- break;
-
- /* Make sure we can break out of loop if stuck... */
- CHECK_FOR_INTERRUPTS();
- }
-
- if (found)
- {
- /* Now it is safe to use the non-volatile pointer */
- if (checkUser && !superuser() && beentry->st_userid != GetUserId())
- return "<insufficient privilege>";
- else if (*(beentry->st_activity_raw) == '\0')
- return "<command string not enabled>";
- else
- {
- /* this'll leak a bit of memory, but that seems acceptable */
- return pgstat_clip_activity(beentry->st_activity_raw);
- }
- }
-
- beentry++;
- }
-
- /* If we get here, caller is in error ... */
- return "<backend information not available>";
-}
-
-/* ----------
- * pgstat_get_crashed_backend_activity() -
- *
- * Return a string representing the current activity of the backend with
- * the specified PID.  Like the function above, but reads shared memory with
- * the expectation that it may be corrupt.  On success, copy the string
- * into the "buffer" argument and return that pointer.  On failure,
- * return NULL.
- *
- * This function is only intended to be used by the postmaster to report the
- * query that crashed a backend.  In particular, no attempt is made to
- * follow the correct concurrency protocol when accessing the
- * BackendStatusArray.  But that's OK, in the worst case we'll return a
- * corrupted message.  We also must take care not to trip on ereport(ERROR).
- * ----------
- */
-const char *
-pgstat_get_crashed_backend_activity(int pid, char *buffer, int buflen)
-{
- volatile PgBackendStatus *beentry;
- int i;
-
- beentry = BackendStatusArray;
-
- /*
- * We probably shouldn't get here before shared memory has been set up,
- * but be safe.
- */
- if (beentry == NULL || BackendActivityBuffer == NULL)
- return NULL;
-
- for (i = 1; i <= MaxBackends; i++)
- {
- if (beentry->st_procpid == pid)
- {
- /* Read pointer just once, so it can't change after validation */
- const char *activity = beentry->st_activity_raw;
- const char *activity_last;
-
- /*
- * We mustn't access activity string before we verify that it
- * falls within the BackendActivityBuffer. To make sure that the
- * entire string including its ending is contained within the
- * buffer, subtract one activity length from the buffer size.
- */
- activity_last = BackendActivityBuffer + BackendActivityBufferSize
- - pgstat_track_activity_query_size;
-
- if (activity < BackendActivityBuffer ||
- activity > activity_last)
- return NULL;
-
- /* If no string available, no point in a report */
- if (activity[0] == '\0')
- return NULL;
-
- /*
- * Copy only ASCII-safe characters so we don't run into encoding
- * problems when reporting the message; and be sure not to run off
- * the end of memory.  As only ASCII characters are reported, it
- * doesn't seem necessary to perform multibyte aware clipping.
- */
- ascii_safe_strlcpy(buffer, activity,
-   Min(buflen, pgstat_track_activity_query_size));
-
- return buffer;
- }
-
- beentry++;
- }
-
- /* PID not found */
- return NULL;
-}
-
-const char *
-pgstat_get_backend_desc(BackendType backendType)
-{
- const char *backendDesc = "unknown process type";
-
- switch (backendType)
- {
- case B_AUTOVAC_LAUNCHER:
- backendDesc = "autovacuum launcher";
- break;
- case B_AUTOVAC_WORKER:
- backendDesc = "autovacuum worker";
- break;
- case B_BACKEND:
- backendDesc = "client backend";
- break;
- case B_BG_WORKER:
- backendDesc = "background worker";
- break;
- case B_BG_WRITER:
- backendDesc = "background writer";
- break;
- case B_ARCHIVER:
- backendDesc = "archiver";
- break;
- case B_CHECKPOINTER:
- backendDesc = "checkpointer";
- break;
- case B_STARTUP:
- backendDesc = "startup";
- break;
- case B_WAL_RECEIVER:
- backendDesc = "walreceiver";
- break;
- case B_WAL_SENDER:
- backendDesc = "walsender";
- break;
- case B_WAL_WRITER:
- backendDesc = "walwriter";
- break;
- }
-
- return backendDesc;
-}
-
 /* ------------------------------------------------------------
  * Local support functions follow
  * ------------------------------------------------------------
@@ -5422,22 +3800,6 @@ backend_get_func_etnry(PgStat_StatDBEntry *dbent, Oid funcid, bool oneshot)
   funcid);
 }
 
-/* ----------
- * pgstat_setup_memcxt() -
- *
- * Create pgStatLocalContext, if not already done.
- * ----------
- */
-static void
-pgstat_setup_memcxt(void)
-{
- if (!pgStatLocalContext)
- pgStatLocalContext = AllocSetContextCreate(TopMemoryContext,
-   "Statistics snapshot",
-   ALLOCSET_SMALL_SIZES);
-}
-
-
 /* ----------
  * pgstat_clear_snapshot() -
  *
@@ -5453,6 +3815,8 @@ pgstat_clear_snapshot(void)
 {
  int param = 0; /* only the address is significant */
 
+ pgstat_bestatus_clear_snapshot();
+
  /* Release memory, if any was allocated */
  if (pgStatLocalContext)
  MemoryContextDelete(pgStatLocalContext);
@@ -5460,8 +3824,6 @@ pgstat_clear_snapshot(void)
  /* Reset variables */
  pgStatLocalContext = NULL;
  pgStatDBHash = NULL;
- localBackendStatusTable = NULL;
- localNumBackends = 0;
 
  /*
  * the parameter inform the function that it is not called from
@@ -5567,47 +3929,18 @@ pgstat_update_dbentry(PgStat_StatDBEntry *dbentry, PgStat_TableStatus *stat)
  dbentry->n_blocks_hit += stat->t_counts.t_blocks_hit;
 }
 
-
-/*
- * Convert a potentially unsafely truncated activity string (see
- * PgBackendStatus.st_activity_raw's documentation) into a correctly truncated
- * one.
+/* ----------
+ * pgstat_setup_memcxt() -
  *
- * The returned string is allocated in the caller's memory context and may be
- * freed.
+ * Create pgStatLocalContext, if not already done.
+ * ----------
  */
-char *
-pgstat_clip_activity(const char *raw_activity)
+static void
+pgstat_setup_memcxt(void)
 {
- char   *activity;
- int rawlen;
- int cliplen;
-
- /*
- * Some callers, like pgstat_get_backend_current_activity(), do not
- * guarantee that the buffer isn't concurrently modified. We try to take
- * care that the buffer is always terminated by a NUL byte regardless, but
- * let's still be paranoid about the string's length. In those cases the
- * underlying buffer is guaranteed to be pgstat_track_activity_query_size
- * large.
- */
- activity = pnstrdup(raw_activity, pgstat_track_activity_query_size - 1);
-
- /* now double-guaranteed to be NUL terminated */
- rawlen = strlen(activity);
-
- /*
- * All supported server-encodings make it possible to determine the length
- * of a multi-byte character from its first byte (this is not the case for
- * client encodings, see GB18030). As st_activity is always stored using
- * server encoding, this allows us to perform multi-byte aware truncation,
- * even if the string earlier was truncated in the middle of a multi-byte
- * character.
- */
- cliplen = pg_mbcliplen(activity, rawlen,
-   pgstat_track_activity_query_size - 1);
-
- activity[cliplen] = '\0';
-
- return activity;
+ if (!pgStatLocalContext)
+ pgStatLocalContext = AllocSetContextCreate(TopMemoryContext,
+   "Statistics snapshot",
+   ALLOCSET_SMALL_SIZES);
 }
+
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index d8d0ad2487..cb11dc6ffb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -34,6 +34,7 @@
 #include <unistd.h>
 
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
 #include "executor/instrument.h"
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index c2c445dbf4..0bb2132c71 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -41,9 +41,9 @@
 
 #include "postgres.h"
 
+#include "bestatus.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/buffile.h"
 #include "storage/buf_internals.h"
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 1f766d20d1..a0401ee494 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -22,10 +22,10 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
+#include "bestatus.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 
 /*
  * copydir: copy a directory
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 213de7698a..6bc5fd6089 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -82,6 +82,7 @@
 #include "miscadmin.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/pg_tablespace.h"
 #include "common/file_perm.h"
 #include "pgstat.h"
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index aeda32c9c5..e84275d4c2 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -61,8 +61,8 @@
 #ifdef HAVE_SYS_SHM_H
 #include <sys/shm.h>
 #endif
+#include "bestatus.h"
 #include "common/file_perm.h"
-#include "pgstat.h"
 
 #include "portability/mem.h"
 #include "storage/dsm_impl.h"
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 7da337d11f..97526f1c72 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -43,8 +43,8 @@
 #include <poll.h>
 #endif
 
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "port/atomics.h"
 #include "portability/instr_time.h"
 #include "postmaster/postmaster.h"
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 43110e57b6..9d88d8c023 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -51,9 +51,9 @@
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/catalog.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/spin.h"
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 6e471c3e43..cfa5c9089f 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -18,8 +18,8 @@
 
 #include "postgres.h"
 
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "storage/procsignal.h"
 #include "storage/shm_mq.h"
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4d10e57a80..243da57c49 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -21,8 +21,8 @@
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/proc.h"
diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index 74eb449060..dd76088a29 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -25,6 +25,7 @@
  */
 #include "postgres.h"
 
+#include "bestatus.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 979478e2e5..2cd4d5531e 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -76,8 +76,8 @@
  */
 #include "postgres.h"
 
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "pg_trace.h"
 #include "postmaster/postmaster.h"
 #include "replication/slot.h"
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a962034753..718232ae18 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -193,8 +193,8 @@
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/predicate.h"
 #include "storage/predicate_internals.h"
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 89c80fb687..c8198d7311 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,8 +38,8 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "replication/slot.h"
 #include "replication/syncrep.h"
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index c37dd1290b..a09d4f5313 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -28,7 +28,7 @@
 #include "miscadmin.h"
 #include "access/xlogutils.h"
 #include "access/xlog.h"
-#include "pgstat.h"
+#include "bestatus.h"
 #include "portability/instr_time.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8b9142461a..acbbef36a5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -39,6 +39,7 @@
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
+#include "bestatus.h"
 #include "catalog/pg_type.h"
 #include "commands/async.h"
 #include "commands/prepare.h"
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index f4d3eab2ea..0e3abeba36 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -21,6 +21,7 @@
 
 #include "access/heapam.h"
 #include "access/sysattr.h"
+#include "bestatus.h"
 #include "catalog/catalog.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
@@ -29,7 +30,6 @@
 #include "common/keywords.h"
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "parser/scansup.h"
 #include "postmaster/syslogger.h"
 #include "rewrite/rewriteHandler.h"
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 6eac39fb57..6054581fe4 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -15,6 +15,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "bestatus.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
 #include "common/ip.h"
diff --git a/src/backend/utils/cache/relmapper.c b/src/backend/utils/cache/relmapper.c
index 5e61d908fd..2dd99f935d 100644
--- a/src/backend/utils/cache/relmapper.c
+++ b/src/backend/utils/cache/relmapper.c
@@ -46,11 +46,11 @@
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "bestatus.h"
 #include "catalog/catalog.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/storage.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/lwlock.h"
 #include "utils/inval.h"
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index bd2e4e89d8..1eabc0f41d 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -31,12 +31,12 @@
 #endif
 
 #include "access/htup_details.h"
+#include "bestatus.h"
 #include "catalog/pg_authid.h"
 #include "common/file_perm.h"
 #include "libpq/libpq.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 626a4326a4..e07ca89065 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -26,6 +26,7 @@
 #include "access/sysattr.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/catalog.h"
 #include "catalog/indexing.h"
 #include "catalog/namespace.h"
@@ -689,7 +690,10 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 
  /* Initialize stats collection --- must happen before first xact */
  if (!bootstrap)
+ {
+ pgstat_bearray_initialize();
  pgstat_initialize();
+ }
 
  /*
  * Load relcache entries for the shared system catalogs.  This must create
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 099afd0724..0fd4db5cb8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "commands/async.h"
diff --git a/src/include/bestatus.h b/src/include/bestatus.h
new file mode 100644
index 0000000000..3b47e9c063
--- /dev/null
+++ b/src/include/bestatus.h
@@ -0,0 +1,544 @@
+/* ----------
+ * bestatus.h
+ *
+ * Definitions for the PostgreSQL backend status monitor facility
+ *
+ * Copyright (c) 2001-2018, PostgreSQL Global Development Group
+ *
+ * src/include/bestatus.h
+ * ----------
+ */
+#ifndef BESTATUS_H
+#define BESTATUS_H
+
+#include "datatype/timestamp.h"
+#include "libpq/pqcomm.h"
+#include "storage/proc.h"
+
+/* ----------
+ * Backend types
+ * ----------
+ */
+typedef enum BackendType
+{
+ B_AUTOVAC_LAUNCHER,
+ B_AUTOVAC_WORKER,
+ B_BACKEND,
+ B_BG_WORKER,
+ B_BG_WRITER,
+ B_CHECKPOINTER,
+ B_STARTUP,
+ B_WAL_RECEIVER,
+ B_WAL_SENDER,
+ B_WAL_WRITER,
+ B_ARCHIVER
+} BackendType;
+
+
+/* ----------
+ * Backend states
+ * ----------
+ */
+typedef enum BackendState
+{
+ STATE_UNDEFINED,
+ STATE_IDLE,
+ STATE_RUNNING,
+ STATE_IDLEINTRANSACTION,
+ STATE_FASTPATH,
+ STATE_IDLEINTRANSACTION_ABORTED,
+ STATE_DISABLED
+} BackendState;
+
+
+/* ----------
+ * Wait Classes
+ * ----------
+ */
+#define PG_WAIT_LWLOCK 0x01000000U
+#define PG_WAIT_LOCK 0x03000000U
+#define PG_WAIT_BUFFER_PIN 0x04000000U
+#define PG_WAIT_ACTIVITY 0x05000000U
+#define PG_WAIT_CLIENT 0x06000000U
+#define PG_WAIT_EXTENSION 0x07000000U
+#define PG_WAIT_IPC 0x08000000U
+#define PG_WAIT_TIMEOUT 0x09000000U
+#define PG_WAIT_IO 0x0A000000U
+
+/* ----------
+ * Wait Events - Activity
+ *
+ * Use this category when a process is waiting because it has no work to do,
+ * unless the "Client" or "Timeout" category describes the situation better.
+ * Typically, this should only be used for background processes.
+ * ----------
+ */
+typedef enum
+{
+ WAIT_EVENT_ARCHIVER_MAIN = PG_WAIT_ACTIVITY,
+ WAIT_EVENT_AUTOVACUUM_MAIN,
+ WAIT_EVENT_BGWRITER_HIBERNATE,
+ WAIT_EVENT_BGWRITER_MAIN,
+ WAIT_EVENT_CHECKPOINTER_MAIN,
+ WAIT_EVENT_LOGICAL_APPLY_MAIN,
+ WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
+ WAIT_EVENT_RECOVERY_WAL_ALL,
+ WAIT_EVENT_RECOVERY_WAL_STREAM,
+ WAIT_EVENT_SYSLOGGER_MAIN,
+ WAIT_EVENT_WAL_RECEIVER_MAIN,
+ WAIT_EVENT_WAL_SENDER_MAIN,
+ WAIT_EVENT_WAL_WRITER_MAIN
+} WaitEventActivity;
+
+/* ----------
+ * Wait Events - Client
+ *
+ * Use this category when a process is waiting to send data to or receive data
+ * from the frontend process to which it is connected.  This is never used for
+ * a background process, which has no client connection.
+ * ----------
+ */
+typedef enum
+{
+ WAIT_EVENT_CLIENT_READ = PG_WAIT_CLIENT,
+ WAIT_EVENT_CLIENT_WRITE,
+ WAIT_EVENT_LIBPQWALRECEIVER_CONNECT,
+ WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE,
+ WAIT_EVENT_SSL_OPEN_SERVER,
+ WAIT_EVENT_WAL_RECEIVER_WAIT_START,
+ WAIT_EVENT_WAL_SENDER_WAIT_WAL,
+ WAIT_EVENT_WAL_SENDER_WRITE_DATA
+} WaitEventClient;
+
+/* ----------
+ * Wait Events - IPC
+ *
+ * Use this category when a process cannot complete the work it is doing because
+ * it is waiting for a notification from another process.
+ * ----------
+ */
+typedef enum
+{
+ WAIT_EVENT_BGWORKER_SHUTDOWN = PG_WAIT_IPC,
+ WAIT_EVENT_BGWORKER_STARTUP,
+ WAIT_EVENT_BTREE_PAGE,
+ WAIT_EVENT_CLOG_GROUP_UPDATE,
+ WAIT_EVENT_EXECUTE_GATHER,
+ WAIT_EVENT_HASH_BATCH_ALLOCATING,
+ WAIT_EVENT_HASH_BATCH_ELECTING,
+ WAIT_EVENT_HASH_BATCH_LOADING,
+ WAIT_EVENT_HASH_BUILD_ALLOCATING,
+ WAIT_EVENT_HASH_BUILD_ELECTING,
+ WAIT_EVENT_HASH_BUILD_HASHING_INNER,
+ WAIT_EVENT_HASH_BUILD_HASHING_OUTER,
+ WAIT_EVENT_HASH_GROW_BATCHES_ALLOCATING,
+ WAIT_EVENT_HASH_GROW_BATCHES_DECIDING,
+ WAIT_EVENT_HASH_GROW_BATCHES_ELECTING,
+ WAIT_EVENT_HASH_GROW_BATCHES_FINISHING,
+ WAIT_EVENT_HASH_GROW_BATCHES_REPARTITIONING,
+ WAIT_EVENT_HASH_GROW_BUCKETS_ALLOCATING,
+ WAIT_EVENT_HASH_GROW_BUCKETS_ELECTING,
+ WAIT_EVENT_HASH_GROW_BUCKETS_REINSERTING,
+ WAIT_EVENT_LOGICAL_SYNC_DATA,
+ WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
+ WAIT_EVENT_MQ_INTERNAL,
+ WAIT_EVENT_MQ_PUT_MESSAGE,
+ WAIT_EVENT_MQ_RECEIVE,
+ WAIT_EVENT_MQ_SEND,
+ WAIT_EVENT_PARALLEL_BITMAP_SCAN,
+ WAIT_EVENT_PARALLEL_CREATE_INDEX_SCAN,
+ WAIT_EVENT_PARALLEL_FINISH,
+ WAIT_EVENT_PROCARRAY_GROUP_UPDATE,
+ WAIT_EVENT_PROMOTE,
+ WAIT_EVENT_REPLICATION_ORIGIN_DROP,
+ WAIT_EVENT_REPLICATION_SLOT_DROP,
+ WAIT_EVENT_SAFE_SNAPSHOT,
+ WAIT_EVENT_SYNC_REP
+} WaitEventIPC;
+
+/* ----------
+ * Wait Events - Timeout
+ *
+ * Use this category when a process is waiting for a timeout to expire.
+ * ----------
+ */
+typedef enum
+{
+ WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
+ WAIT_EVENT_PG_SLEEP,
+ WAIT_EVENT_RECOVERY_APPLY_DELAY
+} WaitEventTimeout;
+
+/* ----------
+ * Wait Events - IO
+ *
+ * Use this category when a process is waiting for I/O.
+ * ----------
+ */
+typedef enum
+{
+ WAIT_EVENT_BUFFILE_READ = PG_WAIT_IO,
+ WAIT_EVENT_BUFFILE_WRITE,
+ WAIT_EVENT_CONTROL_FILE_READ,
+ WAIT_EVENT_CONTROL_FILE_SYNC,
+ WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE,
+ WAIT_EVENT_CONTROL_FILE_WRITE,
+ WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE,
+ WAIT_EVENT_COPY_FILE_READ,
+ WAIT_EVENT_COPY_FILE_WRITE,
+ WAIT_EVENT_DATA_FILE_EXTEND,
+ WAIT_EVENT_DATA_FILE_FLUSH,
+ WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC,
+ WAIT_EVENT_DATA_FILE_PREFETCH,
+ WAIT_EVENT_DATA_FILE_READ,
+ WAIT_EVENT_DATA_FILE_SYNC,
+ WAIT_EVENT_DATA_FILE_TRUNCATE,
+ WAIT_EVENT_DATA_FILE_WRITE,
+ WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+ WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
+ WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
+ WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
+ WAIT_EVENT_LOCK_FILE_CREATE_READ,
+ WAIT_EVENT_LOCK_FILE_CREATE_SYNC,
+ WAIT_EVENT_LOCK_FILE_CREATE_WRITE,
+ WAIT_EVENT_LOCK_FILE_RECHECKDATADIR_READ,
+ WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC,
+ WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC,
+ WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE,
+ WAIT_EVENT_LOGICAL_REWRITE_SYNC,
+ WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE,
+ WAIT_EVENT_LOGICAL_REWRITE_WRITE,
+ WAIT_EVENT_RELATION_MAP_READ,
+ WAIT_EVENT_RELATION_MAP_SYNC,
+ WAIT_EVENT_RELATION_MAP_WRITE,
+ WAIT_EVENT_REORDER_BUFFER_READ,
+ WAIT_EVENT_REORDER_BUFFER_WRITE,
+ WAIT_EVENT_REORDER_LOGICAL_MAPPING_READ,
+ WAIT_EVENT_REPLICATION_SLOT_READ,
+ WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC,
+ WAIT_EVENT_REPLICATION_SLOT_SYNC,
+ WAIT_EVENT_REPLICATION_SLOT_WRITE,
+ WAIT_EVENT_SLRU_FLUSH_SYNC,
+ WAIT_EVENT_SLRU_READ,
+ WAIT_EVENT_SLRU_SYNC,
+ WAIT_EVENT_SLRU_WRITE,
+ WAIT_EVENT_SNAPBUILD_READ,
+ WAIT_EVENT_SNAPBUILD_SYNC,
+ WAIT_EVENT_SNAPBUILD_WRITE,
+ WAIT_EVENT_TIMELINE_HISTORY_FILE_SYNC,
+ WAIT_EVENT_TIMELINE_HISTORY_FILE_WRITE,
+ WAIT_EVENT_TIMELINE_HISTORY_READ,
+ WAIT_EVENT_TIMELINE_HISTORY_SYNC,
+ WAIT_EVENT_TIMELINE_HISTORY_WRITE,
+ WAIT_EVENT_TWOPHASE_FILE_READ,
+ WAIT_EVENT_TWOPHASE_FILE_SYNC,
+ WAIT_EVENT_TWOPHASE_FILE_WRITE,
+ WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
+ WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
+ WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
+ WAIT_EVENT_WAL_COPY_READ,
+ WAIT_EVENT_WAL_COPY_SYNC,
+ WAIT_EVENT_WAL_COPY_WRITE,
+ WAIT_EVENT_WAL_INIT_SYNC,
+ WAIT_EVENT_WAL_INIT_WRITE,
+ WAIT_EVENT_WAL_READ,
+ WAIT_EVENT_WAL_SYNC,
+ WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN,
+ WAIT_EVENT_WAL_WRITE
+} WaitEventIO;
+
+/* ----------
+ * Command type for progress reporting purposes
+ * ----------
+ */
+typedef enum ProgressCommandType
+{
+ PROGRESS_COMMAND_INVALID,
+ PROGRESS_COMMAND_VACUUM
+} ProgressCommandType;
+
+#define PGSTAT_NUM_PROGRESS_PARAM 10
+
+/* ----------
+ * Shared-memory data structures
+ * ----------
+ */
+
+
+/*
+ * PgBackendSSLStatus
+ *
+ * For each backend, we keep the SSL status in a separate struct, that
+ * is only filled in if SSL is enabled.
+ */
+typedef struct PgBackendSSLStatus
+{
+ /* Information about SSL connection */
+ int ssl_bits;
+ bool ssl_compression;
+ char ssl_version[NAMEDATALEN]; /* MUST be null-terminated */
+ char ssl_cipher[NAMEDATALEN]; /* MUST be null-terminated */
+ char ssl_clientdn[NAMEDATALEN]; /* MUST be null-terminated */
+} PgBackendSSLStatus;
+
+
+/* ----------
+ * PgBackendStatus
+ *
+ * Each live backend maintains a PgBackendStatus struct in shared memory
+ * showing its current activity.  (The structs are allocated according to
+ * BackendId, but that is not critical.)  Note that the collector process
+ * has no involvement in, or even access to, these structs.
+ *
+ * Each auxiliary process also maintains a PgBackendStatus struct in shared
+ * memory.
+ * ----------
+ */
+typedef struct PgBackendStatus
+{
+ /*
+ * To avoid locking overhead, we use the following protocol: a backend
+ * increments st_changecount before modifying its entry, and again after
+ * finishing a modification.  A would-be reader should note the value of
+ * st_changecount, copy the entry into private memory, then check
+ * st_changecount again.  If the value hasn't changed, and if it's even,
+ * the copy is valid; otherwise start over.  This makes updates cheap
+ * while reads are potentially expensive, but that's the tradeoff we want.
+ *
+ * The above protocol needs the memory barriers to ensure that the
+ * apparent order of execution is as it desires. Otherwise, for example,
+ * the CPU might rearrange the code so that st_changecount is incremented
+ * twice before the modification on a machine with weak memory ordering.
+ * This surprising result can lead to bugs.
+ */
+ int st_changecount;
+
+ /* The entry is valid iff st_procpid > 0, unused if st_procpid == 0 */
+ int st_procpid;
+
+ /* Type of backends */
+ BackendType st_backendType;
+
+ /* Times when current backend, transaction, and activity started */
+ TimestampTz st_proc_start_timestamp;
+ TimestampTz st_xact_start_timestamp;
+ TimestampTz st_activity_start_timestamp;
+ TimestampTz st_state_start_timestamp;
+
+ /* Database OID, owning user's OID, connection client address */
+ Oid st_databaseid;
+ Oid st_userid;
+ SockAddr st_clientaddr;
+ char   *st_clienthostname; /* MUST be null-terminated */
+
+ /* Information about SSL connection */
+ bool st_ssl;
+ PgBackendSSLStatus *st_sslstatus;
+
+ /* current state */
+ BackendState st_state;
+
+ /* application name; MUST be null-terminated */
+ char   *st_appname;
+
+ /*
+ * Current command string; MUST be null-terminated. Note that this string
+ * possibly is truncated in the middle of a multi-byte character. As
+ * activity strings are stored more frequently than read, this moves the
+ * cost of correct truncation to the display side. Use
+ * pgstat_clip_activity() to truncate correctly.
+ */
+ char   *st_activity_raw;
+
+ /*
+ * Command progress reporting.  Any command which wishes can advertise
+ * that it is running by setting st_progress_command,
+ * st_progress_command_target, and st_progress_param[].
+ * st_progress_command_target should be the OID of the relation which the
+ * command targets (we assume there's just one, as this is meant for
+ * utility commands), but the meaning of each element in the
+ * st_progress_param array is command-specific.
+ */
+ ProgressCommandType st_progress_command;
+ Oid st_progress_command_target;
+ int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+} PgBackendStatus;
+
+/*
+ * Macros to load and store st_changecount with the memory barriers.
+ *
+ * pgstat_increment_changecount_before() and
+ * pgstat_increment_changecount_after() need to be called before and after
+ * PgBackendStatus entries are modified, respectively. This makes sure that
+ * st_changecount is incremented around the modification.
+ *
+ * Also pgstat_save_changecount_before() and pgstat_save_changecount_after()
+ * need to be called before and after PgBackendStatus entries are copied into
+ * private memory, respectively.
+ */
+#define pgstat_increment_changecount_before(beentry) \
+ do { \
+ beentry->st_changecount++; \
+ pg_write_barrier(); \
+ } while (0)
+
+#define pgstat_increment_changecount_after(beentry) \
+ do { \
+ pg_write_barrier(); \
+ beentry->st_changecount++; \
+ Assert((beentry->st_changecount & 1) == 0); \
+ } while (0)
+
+#define pgstat_save_changecount_before(beentry, save_changecount) \
+ do { \
+ save_changecount = beentry->st_changecount; \
+ pg_read_barrier(); \
+ } while (0)
+
+#define pgstat_save_changecount_after(beentry, save_changecount) \
+ do { \
+ pg_read_barrier(); \
+ save_changecount = beentry->st_changecount; \
+ } while (0)
+
+/* ----------
+ * LocalPgBackendStatus
+ *
+ * When we build the backend status array, we use LocalPgBackendStatus to be
+ * able to add new values to the struct when needed without adding new fields
+ * to the shared memory. It contains the backend status as a first member.
+ * ----------
+ */
+typedef struct LocalPgBackendStatus
+{
+ /*
+ * Local version of the backend status entry.
+ */
+ PgBackendStatus backendStatus;
+
+ /*
+ * The xid of the current transaction if available, InvalidTransactionId
+ * if not.
+ */
+ TransactionId backend_xid;
+
+ /*
+ * The xmin of the current session if available, InvalidTransactionId if
+ * not.
+ */
+ TransactionId backend_xmin;
+} LocalPgBackendStatus;
+
+/* ----------
+ * GUC parameters
+ * ----------
+ */
+extern bool pgstat_track_activities;
+extern PGDLLIMPORT int pgstat_track_activity_query_size;
+
+/* ----------
+ * Functions called from backends
+ * ----------
+ */
+extern void pgstat_bestatus_clear_snapshot(void);
+extern void pgstat_bearray_initialize(void);
+extern void pgstat_bestart(void);
+
+extern const char *pgstat_get_wait_event(uint32 wait_event_info);
+extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
+extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
+extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
+ int buflen);
+extern const char *pgstat_get_backend_desc(BackendType backendType);
+
+extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
+  Oid relid);
+extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_update_multi_param(int nparam, const int *index,
+   const int64 *val);
+extern void pgstat_progress_end_command(void);
+
+extern char *pgstat_clip_activity(const char *raw_activity);
+
+/* ----------
+ * pgstat_report_wait_start() -
+ *
+ * Called from places where server process needs to wait.  This is called
+ * to report wait event information.  The wait information is stored
+ * as 4-bytes where first byte represents the wait event class (type of
+ * wait, for different types of wait, refer WaitClass) and the next
+ * 3-bytes represent the actual wait event.  Currently 2-bytes are used
+ * for wait event which is sufficient for current usage, 1-byte is
+ * reserved for future usage.
+ *
+ * NB: this *must* be able to survive being called before MyProc has been
+ * initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_wait_start(uint32 wait_event_info)
+{
+ volatile PGPROC *proc = MyProc;
+
+ if (!pgstat_track_activities || !proc)
+ return;
+
+ /*
+ * Since this is a four-byte field which is always read and written as
+ * four-bytes, updates are atomic.
+ */
+ proc->wait_event_info = wait_event_info;
+}
+
+/* ----------
+ * pgstat_report_wait_end() -
+ *
+ * Called to report end of a wait.
+ *
+ * NB: this *must* be able to survive being called before MyProc has been
+ * initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_wait_end(void)
+{
+ volatile PGPROC *proc = MyProc;
+
+ if (!pgstat_track_activities || !proc)
+ return;
+
+ /*
+ * Since this is a four-byte field which is always read and written as
+ * four-bytes, updates are atomic.
+ */
+ proc->wait_event_info = 0;
+}
+extern PgBackendStatus *pgstat_fetch_stat_beentry(int beid);
+extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
+extern int pgstat_fetch_stat_numbackends(void);
+
+/* For shared memory allocation/initialize */
+extern Size BackendStatusShmemSize(void);
+extern void CreateSharedBackendStatus(void);
+
+void pgstat_report_xact_timestamp(TimestampTz tstamp);
+void pgstat_bestat_initialize(void);
+
+extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_appname(const char *appname);
+extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
+extern const char *pgstat_get_wait_event(uint32 wait_event_info);
+extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
+extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
+extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
+ int buflen);
+extern const char *pgstat_get_backend_desc(BackendType backendType);
+
+extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
+  Oid relid);
+extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_update_multi_param(int nparam, const int *index,
+   const int64 *val);
+extern void pgstat_progress_end_command(void);
+
+#endif /* BESTATUS_H */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d10ea5389b..6f4e94ab5b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1,7 +1,7 @@
 /* ----------
  * pgstat.h
  *
- * Definitions for the PostgreSQL statistics collector daemon.
+ * Definitions for the PostgreSQL statistics collector facility.
  *
  * Copyright (c) 2001-2019, PostgreSQL Global Development Group
  *
@@ -14,11 +14,8 @@
 #include "datatype/timestamp.h"
 #include "fmgr.h"
 #include "lib/dshash.h"
-#include "libpq/pqcomm.h"
-#include "port/atomics.h"
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"
-#include "storage/proc.h"
 #include "utils/hsearch.h"
 #include "utils/relcache.h"
 
@@ -100,12 +97,11 @@ typedef enum PgStat_Single_Reset_Type
  RESET_FUNCTION
 } PgStat_Single_Reset_Type;
 
+
 /* ------------------------------------------------------------
  * Structures kept in backend local memory while accumulating counts
  * ------------------------------------------------------------
  */
-
-
 /* ----------
  * PgStat_TableStatus Per-table status within a backend
  *
@@ -173,10 +169,10 @@ typedef struct PgStat_BgWriter
  * PgStat_FunctionCounts The actual per-function counts kept by a backend
  *
  * This struct should contain only actual event counters, because we memcmp
- * it against zeroes to detect whether there are any counts to transmit.
+ * it against zeroes to detect whether there are any counts to apply.
  *
  * Note that the time counters are in instr_time format here.  We convert to
- * microseconds in PgStat_Counter format when transmitting to the collector.
+ * microseconds in PgStat_Counter format when applying to shared statistics.
  * ----------
  */
 typedef struct PgStat_FunctionCounts
@@ -209,7 +205,7 @@ typedef struct PgStat_FunctionEntry
 } PgStat_FunctionEntry;
 
 /* ------------------------------------------------------------
- * Statistic collector data structures follow
+ * Statistic collector data structures on file and shared memory follow
  *
  * PGSTAT_FILE_FORMAT_ID should be changed whenever any of these
  * data structures change.
@@ -313,7 +309,7 @@ typedef struct PgStat_StatFuncEntry
 
 
 /*
- * Archiver statistics kept in the stats collector
+ * Archiver statistics kept in the shared stats
  */
 typedef struct PgStat_ArchiverStats
 {
@@ -329,7 +325,7 @@ typedef struct PgStat_ArchiverStats
 } PgStat_ArchiverStats;
 
 /*
- * Global statistics kept in the stats collector
+ * Global statistics kept in the shared stats
  */
 typedef struct PgStat_GlobalStats
 {
@@ -347,422 +343,6 @@ typedef struct PgStat_GlobalStats
  TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
-
-/* ----------
- * Backend types
- * ----------
- */
-typedef enum BackendType
-{
- B_AUTOVAC_LAUNCHER,
- B_AUTOVAC_WORKER,
- B_BACKEND,
- B_BG_WORKER,
- B_BG_WRITER,
- B_ARCHIVER,
- B_CHECKPOINTER,
- B_STARTUP,
- B_WAL_RECEIVER,
- B_WAL_SENDER,
- B_WAL_WRITER
-} BackendType;
-
-
-/* ----------
- * Backend states
- * ----------
- */
-typedef enum BackendState
-{
- STATE_UNDEFINED,
- STATE_IDLE,
- STATE_RUNNING,
- STATE_IDLEINTRANSACTION,
- STATE_FASTPATH,
- STATE_IDLEINTRANSACTION_ABORTED,
- STATE_DISABLED
-} BackendState;
-
-
-/* ----------
- * Wait Classes
- * ----------
- */
-#define PG_WAIT_LWLOCK 0x01000000U
-#define PG_WAIT_LOCK 0x03000000U
-#define PG_WAIT_BUFFER_PIN 0x04000000U
-#define PG_WAIT_ACTIVITY 0x05000000U
-#define PG_WAIT_CLIENT 0x06000000U
-#define PG_WAIT_EXTENSION 0x07000000U
-#define PG_WAIT_IPC 0x08000000U
-#define PG_WAIT_TIMEOUT 0x09000000U
-#define PG_WAIT_IO 0x0A000000U
-
-/* ----------
- * Wait Events - Activity
- *
- * Use this category when a process is waiting because it has no work to do,
- * unless the "Client" or "Timeout" category describes the situation better.
- * Typically, this should only be used for background processes.
- * ----------
- */
-typedef enum
-{
- WAIT_EVENT_ARCHIVER_MAIN = PG_WAIT_ACTIVITY,
- WAIT_EVENT_AUTOVACUUM_MAIN,
- WAIT_EVENT_BGWRITER_HIBERNATE,
- WAIT_EVENT_BGWRITER_MAIN,
- WAIT_EVENT_CHECKPOINTER_MAIN,
- WAIT_EVENT_LOGICAL_APPLY_MAIN,
- WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
- WAIT_EVENT_PGSTAT_MAIN,
- WAIT_EVENT_RECOVERY_WAL_ALL,
- WAIT_EVENT_RECOVERY_WAL_STREAM,
- WAIT_EVENT_SYSLOGGER_MAIN,
- WAIT_EVENT_WAL_RECEIVER_MAIN,
- WAIT_EVENT_WAL_SENDER_MAIN,
- WAIT_EVENT_WAL_WRITER_MAIN
-} WaitEventActivity;
-
-/* ----------
- * Wait Events - Client
- *
- * Use this category when a process is waiting to send data to or receive data
- * from the frontend process to which it is connected.  This is never used for
- * a background process, which has no client connection.
- * ----------
- */
-typedef enum
-{
- WAIT_EVENT_CLIENT_READ = PG_WAIT_CLIENT,
- WAIT_EVENT_CLIENT_WRITE,
- WAIT_EVENT_LIBPQWALRECEIVER_CONNECT,
- WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE,
- WAIT_EVENT_SSL_OPEN_SERVER,
- WAIT_EVENT_WAL_RECEIVER_WAIT_START,
- WAIT_EVENT_WAL_SENDER_WAIT_WAL,
- WAIT_EVENT_WAL_SENDER_WRITE_DATA
-} WaitEventClient;
-
-/* ----------
- * Wait Events - IPC
- *
- * Use this category when a process cannot complete the work it is doing because
- * it is waiting for a notification from another process.
- * ----------
- */
-typedef enum
-{
- WAIT_EVENT_BGWORKER_SHUTDOWN = PG_WAIT_IPC,
- WAIT_EVENT_BGWORKER_STARTUP,
- WAIT_EVENT_BTREE_PAGE,
- WAIT_EVENT_CLOG_GROUP_UPDATE,
- WAIT_EVENT_EXECUTE_GATHER,
- WAIT_EVENT_HASH_BATCH_ALLOCATING,
- WAIT_EVENT_HASH_BATCH_ELECTING,
- WAIT_EVENT_HASH_BATCH_LOADING,
- WAIT_EVENT_HASH_BUILD_ALLOCATING,
- WAIT_EVENT_HASH_BUILD_ELECTING,
- WAIT_EVENT_HASH_BUILD_HASHING_INNER,
- WAIT_EVENT_HASH_BUILD_HASHING_OUTER,
- WAIT_EVENT_HASH_GROW_BATCHES_ALLOCATING,
- WAIT_EVENT_HASH_GROW_BATCHES_DECIDING,
- WAIT_EVENT_HASH_GROW_BATCHES_ELECTING,
- WAIT_EVENT_HASH_GROW_BATCHES_FINISHING,
- WAIT_EVENT_HASH_GROW_BATCHES_REPARTITIONING,
- WAIT_EVENT_HASH_GROW_BUCKETS_ALLOCATING,
- WAIT_EVENT_HASH_GROW_BUCKETS_ELECTING,
- WAIT_EVENT_HASH_GROW_BUCKETS_REINSERTING,
- WAIT_EVENT_LOGICAL_SYNC_DATA,
- WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
- WAIT_EVENT_MQ_INTERNAL,
- WAIT_EVENT_MQ_PUT_MESSAGE,
- WAIT_EVENT_MQ_RECEIVE,
- WAIT_EVENT_MQ_SEND,
- WAIT_EVENT_PARALLEL_BITMAP_SCAN,
- WAIT_EVENT_PARALLEL_CREATE_INDEX_SCAN,
- WAIT_EVENT_PARALLEL_FINISH,
- WAIT_EVENT_PROCARRAY_GROUP_UPDATE,
- WAIT_EVENT_PROMOTE,
- WAIT_EVENT_REPLICATION_ORIGIN_DROP,
- WAIT_EVENT_REPLICATION_SLOT_DROP,
- WAIT_EVENT_SAFE_SNAPSHOT,
- WAIT_EVENT_SYNC_REP
-} WaitEventIPC;
-
-/* ----------
- * Wait Events - Timeout
- *
- * Use this category when a process is waiting for a timeout to expire.
- * ----------
- */
-typedef enum
-{
- WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
- WAIT_EVENT_PG_SLEEP,
- WAIT_EVENT_RECOVERY_APPLY_DELAY
-} WaitEventTimeout;
-
-/* ----------
- * Wait Events - IO
- *
- * Use this category when a process is waiting for a IO.
- * ----------
- */
-typedef enum
-{
- WAIT_EVENT_BUFFILE_READ = PG_WAIT_IO,
- WAIT_EVENT_BUFFILE_WRITE,
- WAIT_EVENT_CONTROL_FILE_READ,
- WAIT_EVENT_CONTROL_FILE_SYNC,
- WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE,
- WAIT_EVENT_CONTROL_FILE_WRITE,
- WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE,
- WAIT_EVENT_COPY_FILE_READ,
- WAIT_EVENT_COPY_FILE_WRITE,
- WAIT_EVENT_DATA_FILE_EXTEND,
- WAIT_EVENT_DATA_FILE_FLUSH,
- WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC,
- WAIT_EVENT_DATA_FILE_PREFETCH,
- WAIT_EVENT_DATA_FILE_READ,
- WAIT_EVENT_DATA_FILE_SYNC,
- WAIT_EVENT_DATA_FILE_TRUNCATE,
- WAIT_EVENT_DATA_FILE_WRITE,
- WAIT_EVENT_DSM_FILL_ZERO_WRITE,
- WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
- WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
- WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
- WAIT_EVENT_LOCK_FILE_CREATE_READ,
- WAIT_EVENT_LOCK_FILE_CREATE_SYNC,
- WAIT_EVENT_LOCK_FILE_CREATE_WRITE,
- WAIT_EVENT_LOCK_FILE_RECHECKDATADIR_READ,
- WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC,
- WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC,
- WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE,
- WAIT_EVENT_LOGICAL_REWRITE_SYNC,
- WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE,
- WAIT_EVENT_LOGICAL_REWRITE_WRITE,
- WAIT_EVENT_RELATION_MAP_READ,
- WAIT_EVENT_RELATION_MAP_SYNC,
- WAIT_EVENT_RELATION_MAP_WRITE,
- WAIT_EVENT_REORDER_BUFFER_READ,
- WAIT_EVENT_REORDER_BUFFER_WRITE,
- WAIT_EVENT_REORDER_LOGICAL_MAPPING_READ,
- WAIT_EVENT_REPLICATION_SLOT_READ,
- WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC,
- WAIT_EVENT_REPLICATION_SLOT_SYNC,
- WAIT_EVENT_REPLICATION_SLOT_WRITE,
- WAIT_EVENT_SLRU_FLUSH_SYNC,
- WAIT_EVENT_SLRU_READ,
- WAIT_EVENT_SLRU_SYNC,
- WAIT_EVENT_SLRU_WRITE,
- WAIT_EVENT_SNAPBUILD_READ,
- WAIT_EVENT_SNAPBUILD_SYNC,
- WAIT_EVENT_SNAPBUILD_WRITE,
- WAIT_EVENT_TIMELINE_HISTORY_FILE_SYNC,
- WAIT_EVENT_TIMELINE_HISTORY_FILE_WRITE,
- WAIT_EVENT_TIMELINE_HISTORY_READ,
- WAIT_EVENT_TIMELINE_HISTORY_SYNC,
- WAIT_EVENT_TIMELINE_HISTORY_WRITE,
- WAIT_EVENT_TWOPHASE_FILE_READ,
- WAIT_EVENT_TWOPHASE_FILE_SYNC,
- WAIT_EVENT_TWOPHASE_FILE_WRITE,
- WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
- WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
- WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
- WAIT_EVENT_WAL_COPY_READ,
- WAIT_EVENT_WAL_COPY_SYNC,
- WAIT_EVENT_WAL_COPY_WRITE,
- WAIT_EVENT_WAL_INIT_SYNC,
- WAIT_EVENT_WAL_INIT_WRITE,
- WAIT_EVENT_WAL_READ,
- WAIT_EVENT_WAL_SYNC,
- WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN,
- WAIT_EVENT_WAL_WRITE
-} WaitEventIO;
-
-/* ----------
- * Command type for progress reporting purposes
- * ----------
- */
-typedef enum ProgressCommandType
-{
- PROGRESS_COMMAND_INVALID,
- PROGRESS_COMMAND_VACUUM
-} ProgressCommandType;
-
-#define PGSTAT_NUM_PROGRESS_PARAM 10
-
-/* ----------
- * Shared-memory data structures
- * ----------
- */
-
-
-/*
- * PgBackendSSLStatus
- *
- * For each backend, we keep the SSL status in a separate struct, that
- * is only filled in if SSL is enabled.
- */
-typedef struct PgBackendSSLStatus
-{
- /* Information about SSL connection */
- int ssl_bits;
- bool ssl_compression;
- char ssl_version[NAMEDATALEN]; /* MUST be null-terminated */
- char ssl_cipher[NAMEDATALEN]; /* MUST be null-terminated */
- char ssl_clientdn[NAMEDATALEN]; /* MUST be null-terminated */
-} PgBackendSSLStatus;
-
-
-/* ----------
- * PgBackendStatus
- *
- * Each live backend maintains a PgBackendStatus struct in shared memory
- * showing its current activity.  (The structs are allocated according to
- * BackendId, but that is not critical.)  Note that the collector process
- * has no involvement in, or even access to, these structs.
- *
- * Each auxiliary process also maintains a PgBackendStatus struct in shared
- * memory.
- * ----------
- */
-typedef struct PgBackendStatus
-{
- /*
- * To avoid locking overhead, we use the following protocol: a backend
- * increments st_changecount before modifying its entry, and again after
- * finishing a modification.  A would-be reader should note the value of
- * st_changecount, copy the entry into private memory, then check
- * st_changecount again.  If the value hasn't changed, and if it's even,
- * the copy is valid; otherwise start over.  This makes updates cheap
- * while reads are potentially expensive, but that's the tradeoff we want.
- *
- * The above protocol needs the memory barriers to ensure that the
- * apparent order of execution is as it desires. Otherwise, for example,
- * the CPU might rearrange the code so that st_changecount is incremented
- * twice before the modification on a machine with weak memory ordering.
- * This surprising result can lead to bugs.
- */
- int st_changecount;
-
- /* The entry is valid iff st_procpid > 0, unused if st_procpid == 0 */
- int st_procpid;
-
- /* Type of backends */
- BackendType st_backendType;
-
- /* Times when current backend, transaction, and activity started */
- TimestampTz st_proc_start_timestamp;
- TimestampTz st_xact_start_timestamp;
- TimestampTz st_activity_start_timestamp;
- TimestampTz st_state_start_timestamp;
-
- /* Database OID, owning user's OID, connection client address */
- Oid st_databaseid;
- Oid st_userid;
- SockAddr st_clientaddr;
- char   *st_clienthostname; /* MUST be null-terminated */
-
- /* Information about SSL connection */
- bool st_ssl;
- PgBackendSSLStatus *st_sslstatus;
-
- /* current state */
- BackendState st_state;
-
- /* application name; MUST be null-terminated */
- char   *st_appname;
-
- /*
- * Current command string; MUST be null-terminated. Note that this string
- * possibly is truncated in the middle of a multi-byte character. As
- * activity strings are stored more frequently than read, that allows to
- * move the cost of correct truncation to the display side. Use
- * pgstat_clip_activity() to truncate correctly.
- */
- char   *st_activity_raw;
-
- /*
- * Command progress reporting.  Any command which wishes can advertise
- * that it is running by setting st_progress_command,
- * st_progress_command_target, and st_progress_param[].
- * st_progress_command_target should be the OID of the relation which the
- * command targets (we assume there's just one, as this is meant for
- * utility commands), but the meaning of each element in the
- * st_progress_param array is command-specific.
- */
- ProgressCommandType st_progress_command;
- Oid st_progress_command_target;
- int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
-} PgBackendStatus;
-
-/*
- * Macros to load and store st_changecount with the memory barriers.
- *
- * pgstat_increment_changecount_before() and
- * pgstat_increment_changecount_after() need to be called before and after
- * PgBackendStatus entries are modified, respectively. This makes sure that
- * st_changecount is incremented around the modification.
- *
- * Also pgstat_save_changecount_before() and pgstat_save_changecount_after()
- * need to be called before and after PgBackendStatus entries are copied into
- * private memory, respectively.
- */
-#define pgstat_increment_changecount_before(beentry) \
- do { \
- beentry->st_changecount++; \
- pg_write_barrier(); \
- } while (0)
-
-#define pgstat_increment_changecount_after(beentry) \
- do { \
- pg_write_barrier(); \
- beentry->st_changecount++; \
- Assert((beentry->st_changecount & 1) == 0); \
- } while (0)
-
-#define pgstat_save_changecount_before(beentry, save_changecount) \
- do { \
- save_changecount = beentry->st_changecount; \
- pg_read_barrier(); \
- } while (0)
-
-#define pgstat_save_changecount_after(beentry, save_changecount) \
- do { \
- pg_read_barrier(); \
- save_changecount = beentry->st_changecount; \
- } while (0)
-
-/* ----------
- * LocalPgBackendStatus
- *
- * When we build the backend status array, we use LocalPgBackendStatus to be
- * able to add new values to the struct when needed without adding new fields
- * to the shared memory. It contains the backend status as a first member.
- * ----------
- */
-typedef struct LocalPgBackendStatus
-{
- /*
- * Local version of the backend status entry.
- */
- PgBackendStatus backendStatus;
-
- /*
- * The xid of the current transaction if available, InvalidTransactionId
- * if not.
- */
- TransactionId backend_xid;
-
- /*
- * The xmin of the current session if available, InvalidTransactionId if
- * not.
- */
- TransactionId backend_xmin;
-} LocalPgBackendStatus;
-
 /*
  * Working state needed to accumulate per-function-call timing statistics.
  */
@@ -784,10 +364,8 @@ typedef struct PgStat_FunctionCallUsage
  * GUC parameters
  * ----------
  */
-extern bool pgstat_track_activities;
 extern bool pgstat_track_counts;
 extern int pgstat_track_functions;
-extern PGDLLIMPORT int pgstat_track_activity_query_size;
 extern char *pgstat_stat_directory;
 
 /* No longer used, but will be removed with GUC */
@@ -836,26 +414,9 @@ extern void pgstat_report_deadlock(void);
 extern void pgstat_clear_snapshot(void);
 
 extern void pgstat_initialize(void);
+extern void pgstat_bearray_initialize(void);
 extern void pgstat_bestart(void);
 
-extern void pgstat_report_activity(BackendState state, const char *cmd_str);
-extern void pgstat_report_tempfile(size_t filesize);
-extern void pgstat_report_appname(const char *appname);
-extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
-extern const char *pgstat_get_wait_event(uint32 wait_event_info);
-extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
-extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
-extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
- int buflen);
-extern const char *pgstat_get_backend_desc(BackendType backendType);
-
-extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
-  Oid relid);
-extern void pgstat_progress_update_param(int index, int64 val);
-extern void pgstat_progress_update_multi_param(int nparam, const int *index,
-   const int64 *val);
-extern void pgstat_progress_end_command(void);
-
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
 extern PgStat_BackendFunctionEntry *find_funcstat_entry(Oid func_id);
 
@@ -866,60 +427,6 @@ extern PgStat_StatDBEntry *backend_get_db_entry(Oid dbid, bool oneshot);
 extern HTAB *backend_snapshot_all_db_entries(void);
 extern PgStat_StatTabEntry *backend_get_tab_entry(PgStat_StatDBEntry *dbent, Oid relid, bool oneshot);
 
-/* ----------
- * pgstat_report_wait_start() -
- *
- * Called from places where server process needs to wait.  This is called
- * to report wait event information.  The wait information is stored
- * as 4-bytes where first byte represents the wait event class (type of
- * wait, for different types of wait, refer WaitClass) and the next
- * 3-bytes represent the actual wait event.  Currently 2-bytes are used
- * for wait event which is sufficient for current usage, 1-byte is
- * reserved for future usage.
- *
- * NB: this *must* be able to survive being called before MyProc has been
- * initialized.
- * ----------
- */
-static inline void
-pgstat_report_wait_start(uint32 wait_event_info)
-{
- volatile PGPROC *proc = MyProc;
-
- if (!pgstat_track_activities || !proc)
- return;
-
- /*
- * Since this is a four-byte field which is always read and written as
- * four-bytes, updates are atomic.
- */
- proc->wait_event_info = wait_event_info;
-}
-
-/* ----------
- * pgstat_report_wait_end() -
- *
- * Called to report end of a wait.
- *
- * NB: this *must* be able to survive being called before MyProc has been
- * initialized.
- * ----------
- */
-static inline void
-pgstat_report_wait_end(void)
-{
- volatile PGPROC *proc = MyProc;
-
- if (!pgstat_track_activities || !proc)
- return;
-
- /*
- * Since this is a four-byte field which is always read and written as
- * four-bytes, updates are atomic.
- */
- proc->wait_event_info = 0;
-}
-
 /* nontransactional event counts are simple enough to inline */
 
 #define pgstat_count_heap_scan(rel) \
@@ -987,6 +494,8 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 extern void pgstat_update_archiver(const char *xlog, bool failed);
 extern void pgstat_update_bgwriter(void);
 
+extern void pgstat_report_tempfile(size_t filesize);
+
 /* ----------
  * Support functions for the SQL-callable functions to
  * generate the pgstat* views.
@@ -994,10 +503,7 @@ extern void pgstat_update_bgwriter(void);
  */
 extern PgStat_StatDBEntry *pgstat_fetch_stat_dbentry(Oid relid, bool oneshot);
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
-extern PgBackendStatus *pgstat_fetch_stat_beentry(int beid);
-extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
 extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
-extern int pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 
--
2.16.3
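The st_changecount protocol documented in the PgBackendStatus comments above (the writer makes the counter odd while an update is in progress; a reader retries until it sees the same even value before and after copying) can be illustrated with a minimal single-file sketch. This is not the patch's code: the struct and function names are invented, and the memory barriers that the real pgstat_increment_changecount_* / pgstat_save_changecount_* macros insert are elided.

```c
#include <assert.h>
#include <string.h>

/*
 * Minimal single-threaded sketch of the st_changecount protocol.
 * Invented names; barriers elided.
 */
typedef struct DemoEntry
{
	int		changecount;
	char	activity[64];
} DemoEntry;

static void
entry_update(DemoEntry *e, const char *activity)
{
	e->changecount++;			/* now odd: update in progress */
	strncpy(e->activity, activity, sizeof(e->activity) - 1);
	e->activity[sizeof(e->activity) - 1] = '\0';
	e->changecount++;			/* even again: update complete */
}

static int
entry_read(const DemoEntry *e, char *buf, size_t buflen)
{
	for (;;)
	{
		int			before = e->changecount;

		if (before & 1)
			continue;			/* writer active; retry */
		memcpy(buf, e->activity, buflen);
		if (e->changecount == before)
			return before;		/* consistent copy obtained */
	}
}
```

In the real code a reader must also issue pg_read_barrier() between the loads, and a writer pg_write_barrier() around the modification, exactly as the macros in the removed hunk do; otherwise a weakly ordered CPU can reorder the counter updates around the data accesses.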


From 824320b129e55f4111b033b42225ecdbe1576f7d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Wed, 4 Jul 2018 11:44:31 +0900
Subject: [PATCH 7/7] Documentation update

Remove all description on pg_stat_tmp directory from documentation.
---
 doc/src/sgml/backup.sgml     |  2 --
 doc/src/sgml/config.sgml     | 19 -------------------
 doc/src/sgml/monitoring.sgml |  7 +------
 doc/src/sgml/storage.sgml    |  3 +--
 4 files changed, 2 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..95285809c2 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1119,8 +1119,6 @@ SELECT pg_stop_backup();
     <filename>pg_snapshots/</filename>, <filename>pg_stat_tmp/</filename>,
     and <filename>pg_subtrans/</filename> (but not the directories themselves) can be
     omitted from the backup as they will be initialized on postmaster startup.
-    If <xref linkend="guc-stats-temp-directory"/> is set and is under the data
-    directory then the contents of that directory can also be omitted.
    </para>
 
    <para>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b6f5822b84..8a5291a18d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6671,25 +6671,6 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-stats-temp-directory" xreflabel="stats_temp_directory">
-      <term><varname>stats_temp_directory</varname> (<type>string</type>)
-      <indexterm>
-       <primary><varname>stats_temp_directory</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Sets the directory to store temporary statistics data in. This can be
-        a path relative to the data directory or an absolute path. The default
-        is <filename>pg_stat_tmp</filename>. Pointing this at a RAM-based
-        file system will decrease physical I/O requirements and can lead to
-        improved performance.
-        This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.
-       </para>
-      </listitem>
-     </varlistentry>
-
      </variablelist>
     </sect2>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 60a85a7898..fa483ef0f7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -197,12 +197,7 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
 
   <para>
    The statistics collector transmits the collected information to other
-   <productname>PostgreSQL</productname> processes through temporary files.
-   These files are stored in the directory named by the
-   <xref linkend="guc-stats-temp-directory"/> parameter,
-   <filename>pg_stat_tmp</filename> by default.
-   For better performance, <varname>stats_temp_directory</varname> can be
-   pointed at a RAM-based file system, decreasing physical I/O requirements.
+   <productname>PostgreSQL</productname> processes through shared memory.
    When the server shuts down cleanly, a permanent copy of the statistics
    data is stored in the <filename>pg_stat</filename> subdirectory, so that
    statistics can be retained across server restarts.  When recovery is
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 8ef2ac8010..e137e6b494 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -122,8 +122,7 @@ Item
 
 <row>
  <entry><filename>pg_stat_tmp</filename></entry>
- <entry>Subdirectory containing temporary files for the statistics
-  subsystem</entry>
+ <entry>Subdirectory containing ephemeral files for extensions</entry>
 </row>
 
 <row>
--
2.16.3


Re: shared-memory based stats collector

Kyotaro HORIGUCHI-2
Hello.

At Mon, 21 Jan 2019 21:19:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <[hidden email]> wrote in <[hidden email]>
> I'll reconsider the referer side of the stats.

The most significant cause of the slowdown is the repeated search for
non-existent entries in both the local and shared hashes on every lookup.

A negative cache, in addition to the cache expiration interval,
eliminates the slowdown.

1000 repetitions with an -O2 binary:

 master : 124.99 tps
 patched: 125.48 tps (+0.4%)
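The negative-cache idea can be sketched in a few lines. All names here are invented for illustration (the real lookup path is snapshot_statentry over the pgstat hashes): a local slot records "this key is known to be absent from the shared hash", so a repeated lookup for a missing entry never probes shared memory again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of a negative cache; invented names. */
#define CACHE_SIZE 64

typedef struct CacheSlot
{
	unsigned int key;
	bool		valid;			/* slot in use */
	bool		negative;		/* key known absent from shared hash */
	int			value;			/* payload when !negative */
} CacheSlot;

static CacheSlot cache[CACHE_SIZE];
static int	shared_probes = 0;	/* counts expensive shared-hash visits */

/* stand-in for the shared dshash: only even keys exist */
static bool
shared_lookup(unsigned int key, int *value)
{
	shared_probes++;
	if (key % 2 == 0)
	{
		*value = (int) key * 10;
		return true;
	}
	return false;
}

static bool
cached_lookup(unsigned int key, int *value)
{
	CacheSlot  *slot = &cache[key % CACHE_SIZE];

	if (slot->valid && slot->key == key)
	{
		if (slot->negative)
			return false;		/* cached miss: no shared probe */
		*value = slot->value;
		return true;
	}

	slot->key = key;
	slot->valid = true;
	slot->negative = !shared_lookup(key, &slot->value);
	if (slot->negative)
		return false;
	*value = slot->value;
	return true;
}
```

The second lookup of a missing key is answered from the local slot without touching the shared hash, which is where the reported slowdown came from.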

> I didn't merge the suggested two pairs of commits. I'll do that
> after adressing the slowdown issue.


I agree to committing 0001-0003 separately. In the attached patch
set, old 0004+0006 are merged as 0004 and old 0005+0007 are
merged as new 0005.

Changed caching policy.
  Expired at every xact end
    -> Keeps at least for PGSTAT_STAT_MIN_INTERVAL (500ms).

Added negative cache feature (snapshot_statentry).

Improved separation between pgstat and bestat (separated AtEOXact_* functions).

Fixed dubious memory context usage.
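The new retention policy above (keep the snapshot for at least PGSTAT_STAT_MIN_INTERVAL instead of discarding it at every transaction end) amounts to a simple age check. A sketch, with the clock simplified to a caller-supplied millisecond value; only the constant name comes from the patch:

```c
#include <assert.h>
#include <stdbool.h>

#define PGSTAT_STAT_MIN_INTERVAL 500	/* ms */

static long snapshot_taken_at = -1;		/* -1: no snapshot cached */

static bool
snapshot_needs_refresh(long now_ms)
{
	if (snapshot_taken_at < 0)
		return true;
	return (now_ms - snapshot_taken_at) >= PGSTAT_STAT_MIN_INTERVAL;
}

static void
snapshot_refresh(long now_ms)
{
	/* ... rebuild the local stats snapshot from shared memory here ... */
	snapshot_taken_at = now_ms;
}
```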

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center




From 7149e93d7b41af0c7ce1cddc847a9bb7bc31b1e7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Fri, 29 Jun 2018 16:41:04 +0900
Subject: [PATCH 1/5] sequential scan for dshash

Add sequential scan feature to dshash.
---
 src/backend/lib/dshash.c | 188 ++++++++++++++++++++++++++++++++++++++++++++++-
 src/include/lib/dshash.h |  23 +++++-
 2 files changed, 206 insertions(+), 5 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index f095196fb6..d1908a6137 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -112,6 +112,7 @@ struct dshash_table
  size_t size_log2; /* log2(number of buckets) */
  bool find_locked; /* Is any partition lock held by 'find'? */
  bool find_exclusively_locked; /* ... exclusively? */
+ bool seqscan_running;/* now under sequential scan */
 };
 
 /* Given a pointer to an item, find the entry (user data) it holds. */
@@ -127,6 +128,10 @@ struct dshash_table
 #define NUM_SPLITS(size_log2) \
  (size_log2 - DSHASH_NUM_PARTITIONS_LOG2)
 
+/* How many buckets are there in a given size? */
+#define NUM_BUCKETS(size_log2) \
+ (((size_t) 1) << (size_log2))
+
 /* How many buckets are there in each partition at a given size? */
 #define BUCKETS_PER_PARTITION(size_log2) \
  (((size_t) 1) << NUM_SPLITS(size_log2))
@@ -153,6 +158,10 @@ struct dshash_table
 #define BUCKET_INDEX_FOR_PARTITION(partition, size_log2) \
  ((partition) << NUM_SPLITS(size_log2))
 
+/* Choose partition based on bucket index. */
+#define PARTITION_FOR_BUCKET_INDEX(bucket_idx, size_log2) \
+ ((bucket_idx) >> NUM_SPLITS(size_log2))
+
 /* The head of the active bucket for a given hash value (lvalue). */
 #define BUCKET_FOR_HASH(hash_table, hash) \
  (hash_table->buckets[ \
@@ -228,6 +237,7 @@ dshash_create(dsa_area *area, const dshash_parameters *params, void *arg)
 
  hash_table->find_locked = false;
  hash_table->find_exclusively_locked = false;
+ hash_table->seqscan_running = false;
 
  /*
  * Set up the initial array of buckets.  Our initial size is the same as
@@ -279,6 +289,7 @@ dshash_attach(dsa_area *area, const dshash_parameters *params,
  hash_table->control = dsa_get_address(area, control);
  hash_table->find_locked = false;
  hash_table->find_exclusively_locked = false;
+ hash_table->seqscan_running = false;
  Assert(hash_table->control->magic == DSHASH_MAGIC);
 
  /*
@@ -324,7 +335,7 @@ dshash_destroy(dshash_table *hash_table)
  ensure_valid_bucket_pointers(hash_table);
 
  /* Free all the entries. */
- size = ((size_t) 1) << hash_table->size_log2;
+ size = NUM_BUCKETS(hash_table->size_log2);
  for (i = 0; i < size; ++i)
  {
  dsa_pointer item_pointer = hash_table->buckets[i];
@@ -549,9 +560,14 @@ dshash_delete_entry(dshash_table *hash_table, void *entry)
  LW_EXCLUSIVE));
 
  delete_item(hash_table, item);
- hash_table->find_locked = false;
- hash_table->find_exclusively_locked = false;
- LWLockRelease(PARTITION_LOCK(hash_table, partition));
+
+ /* We need to keep the partition lock while a sequential scan is running */
+ if (!hash_table->seqscan_running)
+ {
+ hash_table->find_locked = false;
+ hash_table->find_exclusively_locked = false;
+ LWLockRelease(PARTITION_LOCK(hash_table, partition));
+ }
 }
 
 /*
@@ -568,6 +584,8 @@ dshash_release_lock(dshash_table *hash_table, void *entry)
  Assert(LWLockHeldByMeInMode(PARTITION_LOCK(hash_table, partition_index),
  hash_table->find_exclusively_locked
  ? LW_EXCLUSIVE : LW_SHARED));
+ /* the lock is under the control of the sequential scan */
+ Assert(!hash_table->seqscan_running);
 
  hash_table->find_locked = false;
  hash_table->find_exclusively_locked = false;
@@ -592,6 +610,168 @@ dshash_memhash(const void *v, size_t size, void *arg)
  return tag_hash(v, size);
 }
 
+/*
+ * dshash_seq_init/_next/_term
+ *           Sequentially scan through the dshash table, returning the
+ *           elements one by one; returns NULL when there are no more.
+ *
+ * dshash_seq_term should be called if and only if the scan is abandoned
+ * before completion; if dshash_seq_next returns NULL then it has already done
+ * the end-of-scan cleanup.
+ *
+ * Each returned element is locked, as with dshash_find.  However, the
+ * caller must not release the lock; it is released as necessary as the
+ * scan continues.
+ *
+ * As opposed to the equivalent for dynahash, the caller is not supposed to
+ * delete the returned element before continuing the scan.
+ *
+ * If consistent is set for dshash_seq_init, the whole hash table is
+ * non-exclusively locked. Otherwise a part of the hash table is locked in the
+ * same mode (partition lock).
+ */
+void
+dshash_seq_init(dshash_seq_status *status, dshash_table *hash_table,
+ bool consistent, bool exclusive)
+{
+ /* at most one scan is allowed at a time */
+ Assert(!hash_table->seqscan_running);
+
+ status->hash_table = hash_table;
+ status->curbucket = 0;
+ status->nbuckets = 0;
+ status->curitem = NULL;
+ status->pnextitem = InvalidDsaPointer;
+ status->curpartition = -1;
+ status->consistent = consistent;
+ status->exclusive = exclusive;
+ hash_table->seqscan_running = true;
+
+ /*
+ * Protect all partitions from modification if the caller wants a
+ * consistent result.
+ */
+ if (consistent)
+ {
+ int i;
+
+ for (i = 0; i < DSHASH_NUM_PARTITIONS; ++i)
+ {
+ Assert(!LWLockHeldByMe(PARTITION_LOCK(hash_table, i)));
+
+ LWLockAcquire(PARTITION_LOCK(hash_table, i),
+  exclusive ? LW_EXCLUSIVE : LW_SHARED);
+ }
+ ensure_valid_bucket_pointers(hash_table);
+ }
+}
+
+void *
+dshash_seq_next(dshash_seq_status *status)
+{
+ dsa_pointer next_item_pointer;
+
+ Assert(status->hash_table->seqscan_running);
+ if (status->curitem == NULL)
+ {
+ int partition;
+
+ Assert (status->curbucket == 0);
+ Assert(!status->hash_table->find_locked);
+
+ /* first shot. grab the first item. */
+ if (!status->consistent)
+ {
+ partition =
+ PARTITION_FOR_BUCKET_INDEX(status->curbucket,
+   status->hash_table->size_log2);
+ LWLockAcquire(PARTITION_LOCK(status->hash_table, partition),
+  status->exclusive ? LW_EXCLUSIVE : LW_SHARED);
+ status->curpartition = partition;
+
+ /* resize doesn't happen from now until seq scan ends */
+ status->nbuckets =
+ NUM_BUCKETS(status->hash_table->control->size_log2);
+ ensure_valid_bucket_pointers(status->hash_table);
+ }
+
+ next_item_pointer = status->hash_table->buckets[status->curbucket];
+ }
+ else
+ next_item_pointer = status->pnextitem;
+
+ /* Move to the next bucket if we finished the current bucket */
+ while (!DsaPointerIsValid(next_item_pointer))
+ {
+ if (++status->curbucket >= status->nbuckets)
+ {
+ /* all buckets have been scanned; finish. */
+ dshash_seq_term(status);
+ return NULL;
+ }
+
+ /* Also move the partition lock if needed */
+ if (!status->consistent)
+ {
+ int next_partition =
+ PARTITION_FOR_BUCKET_INDEX(status->curbucket,
+   status->hash_table->size_log2);
+
+ /* Move lock along with partition for the bucket */
+ if (status->curpartition != next_partition)
+ {
+ /*
+ * Take the lock on the next partition, then release the
+ * current one; not in the reverse order.  This is required to
+ * prevent a resize from happening during a sequential scan.
+ * Locks are taken in partition order, so no deadlock can occur
+ * with other seq scans or with resizing.
+ */
+ LWLockAcquire(PARTITION_LOCK(status->hash_table,
+ next_partition),
+  status->exclusive ? LW_EXCLUSIVE : LW_SHARED);
+ LWLockRelease(PARTITION_LOCK(status->hash_table,
+ status->curpartition));
+ status->curpartition = next_partition;
+ }
+ }
+
+ next_item_pointer = status->hash_table->buckets[status->curbucket];
+ }
+
+ status->curitem =
+ dsa_get_address(status->hash_table->area, next_item_pointer);
+ status->hash_table->find_locked = true;
+ status->hash_table->find_exclusively_locked = status->exclusive;
+
+ /*
+ * This item can be deleted by the caller. Store the next item for the
+ * next iteration for the occasion.
+ */
+ status->pnextitem = status->curitem->next;
+
+ return ENTRY_FROM_ITEM(status->curitem);
+}
+
+void
+dshash_seq_term(dshash_seq_status *status)
+{
+ Assert(status->hash_table->seqscan_running);
+ status->hash_table->find_locked = false;
+ status->hash_table->find_exclusively_locked = false;
+ status->hash_table->seqscan_running = false;
+
+ if (status->consistent)
+ {
+ int i;
+
+ for (i = 0; i < DSHASH_NUM_PARTITIONS; ++i)
+ LWLockRelease(PARTITION_LOCK(status->hash_table, i));
+ }
+ else if (status->curpartition >= 0)
+ LWLockRelease(PARTITION_LOCK(status->hash_table, status->curpartition));
+}
+
 /*
  * Print debugging information about the internal state of the hash table to
  * stderr.  The caller must hold no partition locks.
diff --git a/src/include/lib/dshash.h b/src/include/lib/dshash.h
index e5dfd57f0a..b80f3af995 100644
--- a/src/include/lib/dshash.h
+++ b/src/include/lib/dshash.h
@@ -59,6 +59,23 @@ typedef struct dshash_parameters
 struct dshash_table_item;
 typedef struct dshash_table_item dshash_table_item;
 
+/*
+ * Sequential scan state of dshash. The detail is exposed since the storage
+ * size should be known to users but it should be considered as an opaque
+ * type by callers.
+ */
+typedef struct dshash_seq_status
+{
+ dshash_table   *hash_table;
+ int curbucket;
+ int nbuckets;
+ dshash_table_item  *curitem;
+ dsa_pointer pnextitem;
+ int curpartition;
+ bool consistent;
+ bool exclusive;
+} dshash_seq_status;
+
 /* Creating, sharing and destroying from hash tables. */
 extern dshash_table *dshash_create(dsa_area *area,
   const dshash_parameters *params,
@@ -70,7 +87,6 @@ extern dshash_table *dshash_attach(dsa_area *area,
 extern void dshash_detach(dshash_table *hash_table);
 extern dshash_table_handle dshash_get_hash_table_handle(dshash_table *hash_table);
 extern void dshash_destroy(dshash_table *hash_table);
-
 /* Finding, creating, deleting entries. */
 extern void *dshash_find(dshash_table *hash_table,
  const void *key, bool exclusive);
@@ -80,6 +96,11 @@ extern bool dshash_delete_key(dshash_table *hash_table, const void *key);
 extern void dshash_delete_entry(dshash_table *hash_table, void *entry);
 extern void dshash_release_lock(dshash_table *hash_table, void *entry);
 
+/* seq scan support */
+extern void dshash_seq_init(dshash_seq_status *status, dshash_table *hash_table,
+ bool consistent, bool exclusive);
+extern void *dshash_seq_next(dshash_seq_status *status);
+extern void dshash_seq_term(dshash_seq_status *status);
 /* Convenience hash and compare functions wrapping memcmp and tag_hash. */
 extern int dshash_memcmp(const void *a, const void *b, size_t size, void *arg);
 extern dshash_hash dshash_memhash(const void *v, size_t size, void *arg);
--
2.16.3
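Not part of the patch: a toy model (plain flags stand in for LWLocks, all names invented) of the partition-lock handoff that dshash_seq_next performs when a non-consistent scan crosses a partition boundary. The next partition's lock is taken before the current one is released, so at most two partitions are ever locked at once and a resize can never slip in between.

```c
#include <assert.h>

/* Toy model of the seq-scan partition-lock handoff; invented names. */
#define NPARTITIONS 4
#define BUCKETS_PER_PARTITION 4
#define NBUCKETS (NPARTITIONS * BUCKETS_PER_PARTITION)

static int	lock_held[NPARTITIONS];
static int	max_locks_held = 0;

static void
note_lock_state(void)
{
	int			held = 0,
				i;

	for (i = 0; i < NPARTITIONS; i++)
		held += lock_held[i];
	if (held > max_locks_held)
		max_locks_held = held;
}

static int
scan_all_buckets(void)
{
	int			visited = 0;
	int			curpartition = -1;
	int			bucket;

	for (bucket = 0; bucket < NBUCKETS; bucket++)
	{
		int			partition = bucket / BUCKETS_PER_PARTITION;

		if (partition != curpartition)
		{
			lock_held[partition] = 1;		/* acquire next first ... */
			note_lock_state();
			if (curpartition >= 0)
				lock_held[curpartition] = 0;	/* ... then release current */
			curpartition = partition;
		}
		visited++;						/* visit the bucket's items here */
	}
	if (curpartition >= 0)
		lock_held[curpartition] = 0;
	return visited;
}
```

The same acquire-before-release ordering, combined with locks always being taken in partition order, is what the comment in dshash_seq_next relies on to rule out deadlocks between concurrent scans and resizes.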


From 8dafcc8293b856f42bc3a68fa792ea139fd8d0cf Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Thu, 27 Sep 2018 11:15:19 +0900
Subject: [PATCH 2/5] Add conditional lock feature to dshash

Dshash currently waits for locks unconditionally. This commit adds new
interfaces for dshash_find and dshash_find_or_insert. The new
interfaces have an extra parameter "nowait" that requests not to wait
for the lock.
---
 src/backend/lib/dshash.c | 69 +++++++++++++++++++++++++++++++++++++++++++-----
 src/include/lib/dshash.h |  6 ++++-
 2 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index d1908a6137..db8d6899af 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -394,19 +394,48 @@ dshash_get_hash_table_handle(dshash_table *hash_table)
  */
 void *
 dshash_find(dshash_table *hash_table, const void *key, bool exclusive)
+{
+ return dshash_find_extended(hash_table, key, exclusive, false, NULL);
+}
+
+/*
+ * Like dshash_find, but returns immediately when nowait is true and the lock
+ * could not be acquired. The lock status is stored in *lock_acquired, if
+ * non-NULL.
+ */
+void *
+dshash_find_extended(dshash_table *hash_table, const void *key,
+ bool exclusive, bool nowait, bool *lock_acquired)
 {
  dshash_hash hash;
  size_t partition;
  dshash_table_item *item;
 
+ /* asking for the lock result without nowait is just not sensible */
+ Assert(nowait || !lock_acquired);
+
  hash = hash_key(hash_table, key);
  partition = PARTITION_FOR_HASH(hash);
 
  Assert(hash_table->control->magic == DSHASH_MAGIC);
  Assert(!hash_table->find_locked);
 
- LWLockAcquire(PARTITION_LOCK(hash_table, partition),
-  exclusive ? LW_EXCLUSIVE : LW_SHARED);
+ if (nowait)
+ {
+ if (!LWLockConditionalAcquire(PARTITION_LOCK(hash_table, partition),
+  exclusive ? LW_EXCLUSIVE : LW_SHARED))
+ {
+ if (lock_acquired)
+ *lock_acquired = false;
+ return NULL;
+ }
+ }
+ else
+ LWLockAcquire(PARTITION_LOCK(hash_table, partition),
+  exclusive ? LW_EXCLUSIVE : LW_SHARED);
+
+ if (lock_acquired)
+ *lock_acquired = true;
+
  ensure_valid_bucket_pointers(hash_table);
 
  /* Search the active bucket. */
@@ -441,6 +470,22 @@ void *
 dshash_find_or_insert(dshash_table *hash_table,
   const void *key,
   bool *found)
+{
+ return dshash_find_or_insert_extended(hash_table, key, found, false);
+}
+
+/*
+ * Same as dshash_find_or_insert, but returns NULL if nowait is true and the
+ * lock could not be acquired.
+ *
+ * Notes above dshash_find_extended() regarding locking and error handling
+ * equally apply here.
+ */
+void *
+dshash_find_or_insert_extended(dshash_table *hash_table,
+   const void *key,
+   bool *found,
+   bool nowait)
 {
  dshash_hash hash;
  size_t partition_index;
@@ -455,8 +500,16 @@ dshash_find_or_insert(dshash_table *hash_table,
  Assert(!hash_table->find_locked);
 
 restart:
- LWLockAcquire(PARTITION_LOCK(hash_table, partition_index),
-  LW_EXCLUSIVE);
+ if (nowait)
+ {
+ if (!LWLockConditionalAcquire(
+ PARTITION_LOCK(hash_table, partition_index),
+ LW_EXCLUSIVE))
+ return NULL;
+ }
+ else
+ LWLockAcquire(PARTITION_LOCK(hash_table, partition_index),
+  LW_EXCLUSIVE);
  ensure_valid_bucket_pointers(hash_table);
 
  /* Search the active bucket. */
@@ -626,9 +679,11 @@ dshash_memhash(const void *v, size_t size, void *arg)
 * As opposed to the equivalent for dynahash, the caller is not supposed to
  * delete the returned element before continuing the scan.
  *
- * If consistent is set for dshash_seq_init, the whole hash table is
- * non-exclusively locked. Otherwise a part of the hash table is locked in the
- * same mode (partition lock).
+ * If consistent is set for dshash_seq_init, all hash table partitions are
+ * locked in the requested mode (as determined by the exclusive flag), and
+ * the locks are held until the end of the scan. Otherwise the partition
+ * locks are acquired and released as needed during the scan (up to two
+ * partitions may be locked at the same time).
  */
 void
 dshash_seq_init(dshash_seq_status *status, dshash_table *hash_table,
diff --git a/src/include/lib/dshash.h b/src/include/lib/dshash.h
index b80f3af995..fe1d4d75c5 100644
--- a/src/include/lib/dshash.h
+++ b/src/include/lib/dshash.h
@@ -90,8 +90,12 @@ extern void dshash_destroy(dshash_table *hash_table);
 /* Finding, creating, deleting entries. */
 extern void *dshash_find(dshash_table *hash_table,
  const void *key, bool exclusive);
+extern void *dshash_find_extended(dshash_table *hash_table, const void *key,
+ bool exclusive, bool nowait, bool *lock_acquired);
 extern void *dshash_find_or_insert(dshash_table *hash_table,
-  const void *key, bool *found);
+ const void *key, bool *found);
+extern void *dshash_find_or_insert_extended(dshash_table *hash_table,
+ const void *key, bool *found, bool nowait);
 extern bool dshash_delete_key(dshash_table *hash_table, const void *key);
 extern void dshash_delete_entry(dshash_table *hash_table, void *entry);
 extern void dshash_release_lock(dshash_table *hash_table, void *entry);
--
2.16.3


From 90522c1de96ac84ba2ad7cc1ada47c7bb9f95e10 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Wed, 7 Nov 2018 16:53:49 +0900
Subject: [PATCH 3/5] Make archiver process an auxiliary process

This is a preliminary patch for the shared-memory based stats
collector. The archiver process must be an auxiliary process, since it
uses shared memory once the stats data is moved there. Make it an
auxiliary process so that it can attach to shared memory.
---
 src/backend/bootstrap/bootstrap.c   |  8 +++
 src/backend/postmaster/pgarch.c     | 98 +++++++++----------------------------
 src/backend/postmaster/pgstat.c     |  6 +++
 src/backend/postmaster/postmaster.c | 35 +++++++++----
 src/include/miscadmin.h             |  2 +
 src/include/pgstat.h                |  1 +
 src/include/postmaster/pgarch.h     |  4 +-
 7 files changed, 67 insertions(+), 87 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 63bb134949..df926d8dea 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -329,6 +329,9 @@ AuxiliaryProcessMain(int argc, char *argv[])
  case BgWriterProcess:
  statmsg = pgstat_get_backend_desc(B_BG_WRITER);
  break;
+ case ArchiverProcess:
+ statmsg = pgstat_get_backend_desc(B_ARCHIVER);
+ break;
  case CheckpointerProcess:
  statmsg = pgstat_get_backend_desc(B_CHECKPOINTER);
  break;
@@ -456,6 +459,11 @@ AuxiliaryProcessMain(int argc, char *argv[])
  BackgroundWriterMain();
  proc_exit(1); /* should never return */
 
+ case ArchiverProcess:
+ /* don't set signals, archiver has its own agenda */
+ PgArchiverMain();
+ proc_exit(1); /* should never return */
+
  case CheckpointerProcess:
  /* don't set signals, checkpointer has its own agenda */
  CheckpointerMain();
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index f84f882c4c..4342ebdab4 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -77,7 +77,6 @@
  * Local data
  * ----------
  */
-static time_t last_pgarch_start_time;
 static time_t last_sigterm_time = 0;
 
 /*
@@ -96,7 +95,6 @@ static volatile sig_atomic_t ready_to_stop = false;
 static pid_t pgarch_forkexec(void);
 #endif
 
-NON_EXEC_STATIC void PgArchiverMain(int argc, char *argv[]) pg_attribute_noreturn();
 static void pgarch_exit(SIGNAL_ARGS);
 static void ArchSigHupHandler(SIGNAL_ARGS);
 static void ArchSigTermHandler(SIGNAL_ARGS);
@@ -114,75 +112,6 @@ static void pgarch_archiveDone(char *xlog);
  * ------------------------------------------------------------
  */
 
-/*
- * pgarch_start
- *
- * Called from postmaster at startup or after an existing archiver
- * died.  Attempt to fire up a fresh archiver process.
- *
- * Returns PID of child process, or 0 if fail.
- *
- * Note: if fail, we will be called again from the postmaster main loop.
- */
-int
-pgarch_start(void)
-{
- time_t curtime;
- pid_t pgArchPid;
-
- /*
- * Do nothing if no archiver needed
- */
- if (!XLogArchivingActive())
- return 0;
-
- /*
- * Do nothing if too soon since last archiver start.  This is a safety
- * valve to protect against continuous respawn attempts if the archiver is
- * dying immediately at launch. Note that since we will be re-called from
- * the postmaster main loop, we will get another chance later.
- */
- curtime = time(NULL);
- if ((unsigned int) (curtime - last_pgarch_start_time) <
- (unsigned int) PGARCH_RESTART_INTERVAL)
- return 0;
- last_pgarch_start_time = curtime;
-
-#ifdef EXEC_BACKEND
- switch ((pgArchPid = pgarch_forkexec()))
-#else
- switch ((pgArchPid = fork_process()))
-#endif
- {
- case -1:
- ereport(LOG,
- (errmsg("could not fork archiver: %m")));
- return 0;
-
-#ifndef EXEC_BACKEND
- case 0:
- /* in postmaster child ... */
- InitPostmasterChild();
-
- /* Close the postmaster's sockets */
- ClosePostmasterPorts(false);
-
- /* Drop our connection to postmaster's shared memory, as well */
- dsm_detach_all();
- PGSharedMemoryDetach();
-
- PgArchiverMain(0, NULL);
- break;
-#endif
-
- default:
- return (int) pgArchPid;
- }
-
- /* shouldn't get here */
- return 0;
-}
-
 /* ------------------------------------------------------------
  * Local functions called by archiver follow
  * ------------------------------------------------------------
@@ -222,8 +151,8 @@ pgarch_forkexec(void)
  * The argc/argv parameters are valid only in EXEC_BACKEND case.  However,
  * since we don't use 'em, it hardly matters...
  */
-NON_EXEC_STATIC void
-PgArchiverMain(int argc, char *argv[])
+void
+PgArchiverMain(void)
 {
  /*
  * Ignore all signals usually bound to some action in the postmaster,
@@ -255,8 +184,27 @@ PgArchiverMain(int argc, char *argv[])
 static void
 pgarch_exit(SIGNAL_ARGS)
 {
- /* SIGQUIT means curl up and die ... */
- exit(1);
+ PG_SETMASK(&BlockSig);
+
+ /*
+ * We DO NOT want to run proc_exit() callbacks -- we're here because
+ * shared memory may be corrupted, so we don't want to try to clean up our
+ * transaction.  Just nail the windows shut and get out of town.  Now that
+ * there's an atexit callback to prevent third-party code from breaking
+ * things by calling exit() directly, we have to reset the callbacks
+ * explicitly to make this work as intended.
+ */
+ on_exit_reset();
+
+ /*
+ * Note we do exit(2) not exit(0).  This is to force the postmaster into a
+ * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
+ * backend.  This is necessary precisely because we don't clean up our
+ * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
+ * should ensure the postmaster sees this as a crash, too, but no harm in
+ * being doubly sure.)
+ */
+ exit(2);
 }
 
 /* SIGHUP signal handler for archiver process */
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 13da412c59..d1fe052abf 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2857,6 +2857,9 @@ pgstat_bestart(void)
  case BgWriterProcess:
  beentry->st_backendType = B_BG_WRITER;
  break;
+ case ArchiverProcess:
+ beentry->st_backendType = B_ARCHIVER;
+ break;
  case CheckpointerProcess:
  beentry->st_backendType = B_CHECKPOINTER;
  break;
@@ -4119,6 +4122,9 @@ pgstat_get_backend_desc(BackendType backendType)
  case B_BG_WRITER:
  backendDesc = "background writer";
  break;
+ case B_ARCHIVER:
+ backendDesc = "archiver";
+ break;
  case B_CHECKPOINTER:
  backendDesc = "checkpointer";
  break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3052bbbc21..65eab02b3e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -146,7 +146,8 @@
 #define BACKEND_TYPE_AUTOVAC 0x0002 /* autovacuum worker process */
 #define BACKEND_TYPE_WALSND 0x0004 /* walsender process */
 #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */
-#define BACKEND_TYPE_ALL 0x000F /* OR of all the above */
+#define BACKEND_TYPE_ARCHIVER 0x0010 /* archiver process */
+#define BACKEND_TYPE_ALL 0x001F /* OR of all the above */
 
 #define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
 
@@ -539,6 +540,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
 
 #define StartupDataBase() StartChildProcess(StartupProcess)
 #define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
+#define StartArchiver() StartChildProcess(ArchiverProcess)
 #define StartCheckpointer() StartChildProcess(CheckpointerProcess)
 #define StartWalWriter() StartChildProcess(WalWriterProcess)
 #define StartWalReceiver() StartChildProcess(WalReceiverProcess)
@@ -1757,7 +1759,7 @@ ServerLoop(void)
 
  /* If we have lost the archiver, try to start a new one. */
  if (PgArchPID == 0 && PgArchStartupAllowed())
- PgArchPID = pgarch_start();
+ PgArchPID = StartArchiver();
 
  /* If we need to signal the autovacuum launcher, do so now */
  if (avlauncher_needs_signal)
@@ -2920,7 +2922,7 @@ reaper(SIGNAL_ARGS)
  if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
  AutoVacPID = StartAutoVacLauncher();
  if (PgArchStartupAllowed() && PgArchPID == 0)
- PgArchPID = pgarch_start();
+ PgArchPID = StartArchiver();
  if (PgStatPID == 0)
  PgStatPID = pgstat_start();
 
@@ -3065,10 +3067,8 @@ reaper(SIGNAL_ARGS)
  {
  PgArchPID = 0;
  if (!EXIT_STATUS_0(exitstatus))
- LogChildExit(LOG, _("archiver process"),
- pid, exitstatus);
- if (PgArchStartupAllowed())
- PgArchPID = pgarch_start();
+ HandleChildCrash(pid, exitstatus,
+ _("archiver process"));
  continue;
  }
 
@@ -3314,7 +3314,7 @@ CleanupBackend(int pid,
 
 /*
  * HandleChildCrash -- cleanup after failed backend, bgwriter, checkpointer,
- * walwriter, autovacuum, or background worker.
+ * walwriter, autovacuum, archiver or background worker.
  *
  * The objectives here are to clean up our local state about the child
  * process, and to signal all other remaining children to quickdie.
@@ -3519,6 +3519,18 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
  signal_child(AutoVacPID, (SendStop ? SIGSTOP : SIGQUIT));
  }
 
+ /* Take care of the archiver too */
+ if (pid == PgArchPID)
+ PgArchPID = 0;
+ else if (PgArchPID != 0 && take_action)
+ {
+ ereport(DEBUG2,
+ (errmsg_internal("sending %s to process %d",
+ (SendStop ? "SIGSTOP" : "SIGQUIT"),
+ (int) PgArchPID)));
+ signal_child(PgArchPID, (SendStop ? SIGSTOP : SIGQUIT));
+ }
+
  /*
  * Force a power-cycle of the pgarch process too.  (This isn't absolutely
  * necessary, but it seems like a good idea for robustness, and it
@@ -3795,6 +3807,7 @@ PostmasterStateMachine(void)
  Assert(CheckpointerPID == 0);
  Assert(WalWriterPID == 0);
  Assert(AutoVacPID == 0);
+ Assert(PgArchPID == 0);
  /* syslogger is not considered here */
  pmState = PM_NO_CHILDREN;
  }
@@ -5064,7 +5077,7 @@ sigusr1_handler(SIGNAL_ARGS)
  */
  Assert(PgArchPID == 0);
  if (XLogArchivingAlways())
- PgArchPID = pgarch_start();
+ PgArchPID = StartArchiver();
 
  /*
  * If we aren't planning to enter hot standby mode later, treat
@@ -5342,6 +5355,10 @@ StartChildProcess(AuxProcType type)
  ereport(LOG,
  (errmsg("could not fork background writer process: %m")));
  break;
+ case ArchiverProcess:
+ ereport(LOG,
+ (errmsg("could not fork archiver process: %m")));
+ break;
  case CheckpointerProcess:
  ereport(LOG,
  (errmsg("could not fork checkpointer process: %m")));
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index c9e35003a5..63a7653457 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -399,6 +399,7 @@ typedef enum
  BootstrapProcess,
  StartupProcess,
  BgWriterProcess,
+ ArchiverProcess,
  CheckpointerProcess,
  WalWriterProcess,
  WalReceiverProcess,
@@ -411,6 +412,7 @@ extern AuxProcType MyAuxProcType;
 #define AmBootstrapProcess() (MyAuxProcType == BootstrapProcess)
 #define AmStartupProcess() (MyAuxProcType == StartupProcess)
 #define AmBackgroundWriterProcess() (MyAuxProcType == BgWriterProcess)
+#define AmArchiverProcess() (MyAuxProcType == ArchiverProcess)
 #define AmCheckpointerProcess() (MyAuxProcType == CheckpointerProcess)
 #define AmWalWriterProcess() (MyAuxProcType == WalWriterProcess)
 #define AmWalReceiverProcess() (MyAuxProcType == WalReceiverProcess)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 313ca5f3c3..f299d1d601 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -706,6 +706,7 @@ typedef enum BackendType
  B_BACKEND,
  B_BG_WORKER,
  B_BG_WRITER,
+ B_ARCHIVER,
  B_CHECKPOINTER,
  B_STARTUP,
  B_WAL_RECEIVER,
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 2474eac26a..88f16863d4 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,8 +32,6 @@
  */
 extern int pgarch_start(void);
 
-#ifdef EXEC_BACKEND
-extern void PgArchiverMain(int argc, char *argv[]) pg_attribute_noreturn();
-#endif
+extern void PgArchiverMain(void) pg_attribute_noreturn();
 
 #endif /* _PGARCH_H */
--
2.16.3


From db229efcd159a0dfcc5d3420b7dcba7f918c8419 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Mon, 12 Nov 2018 17:26:33 +0900
Subject: [PATCH 4/5] Shared-memory based stats collector

Previously, activity statistics were shared via files on disk: every
backend sent its numbers to the stats collector process via a socket,
the collector wrote snapshots to a set of files on disk at a certain
interval, and every backend read them as necessary. That worked fine
for a comparatively small set of statistics, but the set keeps growing
and the file size has reached the order of megabytes. To cope with a
larger statistics set, this patch lets backends share the statistics
directly via shared memory.
---
 contrib/pg_prewarm/autoprewarm.c                   |    2 +-
 contrib/pg_stat_statements/pg_stat_statements.c    |    1 +
 contrib/postgres_fdw/connection.c                  |    2 +-
 src/backend/Makefile                               |    2 +-
 src/backend/access/heap/rewriteheap.c              |    4 +-
 src/backend/access/heap/vacuumlazy.c               |    1 +
 src/backend/access/nbtree/nbtree.c                 |    2 +-
 src/backend/access/nbtree/nbtsort.c                |    2 +-
 src/backend/access/transam/clog.c                  |    2 +-
 src/backend/access/transam/parallel.c              |    2 +-
 src/backend/access/transam/slru.c                  |    2 +-
 src/backend/access/transam/timeline.c              |    2 +-
 src/backend/access/transam/twophase.c              |    2 +
 src/backend/access/transam/xact.c                  |    3 +
 src/backend/access/transam/xlog.c                  |    5 +-
 src/backend/access/transam/xlogfuncs.c             |    2 +-
 src/backend/access/transam/xlogutils.c             |    2 +-
 src/backend/bootstrap/bootstrap.c                  |    8 +-
 src/backend/executor/execParallel.c                |    2 +-
 src/backend/executor/nodeBitmapHeapscan.c          |    1 +
 src/backend/executor/nodeGather.c                  |    2 +-
 src/backend/executor/nodeHash.c                    |    2 +-
 src/backend/executor/nodeHashjoin.c                |    2 +-
 src/backend/libpq/be-secure-openssl.c              |    2 +-
 src/backend/libpq/be-secure.c                      |    2 +-
 src/backend/libpq/pqmq.c                           |    2 +-
 src/backend/postmaster/Makefile                    |    2 +-
 src/backend/postmaster/autovacuum.c                |   60 +-
 src/backend/postmaster/bgworker.c                  |    2 +-
 src/backend/postmaster/bgwriter.c                  |    5 +-
 src/backend/postmaster/checkpointer.c              |   25 +-
 src/backend/postmaster/pgarch.c                    |    5 +-
 src/backend/postmaster/pgstat.c                    | 6384 --------------------
 src/backend/postmaster/postmaster.c                |   86 +-
 src/backend/postmaster/syslogger.c                 |    2 +-
 src/backend/postmaster/walwriter.c                 |    2 +-
 src/backend/replication/basebackup.c               |    1 +
 .../libpqwalreceiver/libpqwalreceiver.c            |    2 +-
 src/backend/replication/logical/launcher.c         |    2 +-
 src/backend/replication/logical/origin.c           |    3 +-
 src/backend/replication/logical/reorderbuffer.c    |    2 +-
 src/backend/replication/logical/snapbuild.c        |    2 +-
 src/backend/replication/logical/tablesync.c        |   15 +-
 src/backend/replication/logical/worker.c           |   11 +-
 src/backend/replication/slot.c                     |    2 +-
 src/backend/replication/syncrep.c                  |    2 +-
 src/backend/replication/walreceiver.c              |    2 +-
 src/backend/replication/walsender.c                |    2 +-
 src/backend/statmon/Makefile                       |   17 +
 src/backend/statmon/bestatus.c                     | 1779 ++++++
 src/backend/statmon/pgstat.c                       | 3935 ++++++++++++
 src/backend/storage/buffer/bufmgr.c                |    9 +-
 src/backend/storage/file/buffile.c                 |    2 +-
 src/backend/storage/file/copydir.c                 |    2 +-
 src/backend/storage/file/fd.c                      |    1 +
 src/backend/storage/ipc/dsm.c                      |   24 +-
 src/backend/storage/ipc/dsm_impl.c                 |    2 +-
 src/backend/storage/ipc/ipci.c                     |    6 +
 src/backend/storage/ipc/latch.c                    |    2 +-
 src/backend/storage/ipc/procarray.c                |    2 +-
 src/backend/storage/ipc/shm_mq.c                   |    2 +-
 src/backend/storage/ipc/standby.c                  |    2 +-
 src/backend/storage/lmgr/deadlock.c                |    1 +
 src/backend/storage/lmgr/lwlock.c                  |    5 +-
 src/backend/storage/lmgr/lwlocknames.txt           |    1 +
 src/backend/storage/lmgr/predicate.c               |    2 +-
 src/backend/storage/lmgr/proc.c                    |    2 +-
 src/backend/storage/smgr/md.c                      |    2 +-
 src/backend/tcop/postgres.c                        |   28 +-
 src/backend/utils/adt/misc.c                       |    2 +-
 src/backend/utils/adt/pgstatfuncs.c                |   51 +-
 src/backend/utils/cache/relmapper.c                |    2 +-
 src/backend/utils/init/globals.c                   |    1 +
 src/backend/utils/init/miscinit.c                  |    2 +-
 src/backend/utils/init/postinit.c                  |   15 +
 src/backend/utils/misc/guc.c                       |    1 +
 src/bin/pg_basebackup/t/010_pg_basebackup.pl       |    2 +-
 src/include/bestatus.h                             |  545 ++
 src/include/miscadmin.h                            |    2 +-
 src/include/pgstat.h                               |  951 +--
 src/include/storage/dsm.h                          |    3 +
 src/include/storage/lwlock.h                       |    3 +
 src/include/utils/timeout.h                        |    1 +
 src/test/modules/worker_spi/worker_spi.c           |    2 +-
 84 files changed, 6588 insertions(+), 7501 deletions(-)
 delete mode 100644 src/backend/postmaster/pgstat.c
 create mode 100644 src/backend/statmon/Makefile
 create mode 100644 src/backend/statmon/bestatus.c
 create mode 100644 src/backend/statmon/pgstat.c
 create mode 100644 src/include/bestatus.h

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 45a5a26337..6296401b25 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -30,10 +30,10 @@
 
 #include "access/heapam.h"
 #include "access/xact.h"
+#include "bestatus.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "storage/buf_internals.h"
 #include "storage/dsm.h"
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f177ebaa2c..188d034387 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -62,6 +62,7 @@
 #include <unistd.h>
 
 #include "access/hash.h"
+#include "bestatus.h"
 #include "catalog/pg_authid.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 239d220c24..1ea71245df 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -15,11 +15,11 @@
 #include "postgres_fdw.h"
 
 #include "access/htup_details.h"
+#include "bestatus.h"
 #include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/latch.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 478a96db9b..cc511672c9 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -20,7 +20,7 @@ include $(top_builddir)/src/Makefile.global
 SUBDIRS = access bootstrap catalog parser commands executor foreign lib libpq \
  main nodes optimizer partitioning port postmaster \
  regex replication rewrite \
- statistics storage tcop tsearch utils $(top_builddir)/src/timezone \
+ statistics statmon storage tcop tsearch utils $(top_builddir)/src/timezone \
  jit
 
 include $(srcdir)/common.mk
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f6b0f1b093..ef40a2e7a2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -115,12 +115,12 @@
 #include "access/xact.h"
 #include "access/xloginsert.h"
 
+#include "bestatus.h"
+
 #include "catalog/catalog.h"
 
 #include "lib/ilist.h"
 
-#include "pgstat.h"
-
 #include "replication/logical.h"
 #include "replication/slot.h"
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c09eb6eff8..189db9b8fd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -44,6 +44,7 @@
 #include "access/transam.h"
 #include "access/visibilitymap.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/storage.h"
 #include "commands/dbcommands.h"
 #include "commands/progress.h"
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 98917de2ef..69cd211369 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -22,10 +22,10 @@
 #include "access/nbtxlog.h"
 #include "access/relscan.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "nodes/execnodes.h"
-#include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "storage/condition_variable.h"
 #include "storage/indexfsm.h"
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 5cc3cf57e2..a0173c19a8 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -64,9 +64,9 @@
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "bestatus.h"
 #include "catalog/index.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/smgr.h"
 #include "tcop/tcopprot.h" /* pgrminclude ignore */
 #include "utils/rel.h"
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index aa089d83fa..cf034ba333 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -38,8 +38,8 @@
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "pg_trace.h"
 #include "storage/proc.h"
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 9c55c20d6b..26d30b8853 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -19,6 +19,7 @@
 #include "access/session.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/pg_enum.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
@@ -29,7 +30,6 @@
 #include "libpq/pqmq.h"
 #include "miscadmin.h"
 #include "optimizer/planmain.h"
-#include "pgstat.h"
 #include "storage/ipc.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3623352b9c..a28fe474aa 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -54,7 +54,7 @@
 #include "access/slru.h"
 #include "access/transam.h"
 #include "access/xlog.h"
-#include "pgstat.h"
+#include "bestatus.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
 #include "miscadmin.h"
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index c96c8b60ba..bbe9c0eb5f 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -38,7 +38,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogdefs.h"
-#include "pgstat.h"
+#include "bestatus.h"
 #include "storage/fd.h"
 
 /*
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 9a8a6bb119..0dc9f39424 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -87,6 +87,7 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "access/xlogreader.h"
+#include "bestatus.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage.h"
 #include "funcapi.h"
@@ -1569,6 +1570,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
  PredicateLockTwoPhaseFinish(xid, isCommit);
 
  /* Count the prepared xact as committed or aborted */
+ AtEOXact_BEStatus(isCommit);
  AtEOXact_PgStat(isCommit);
 
  /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 18467d96d2..837f7e2be6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -30,6 +30,7 @@
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
+#include "bestatus.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_enum.h"
 #include "catalog/storage.h"
@@ -2147,6 +2148,7 @@ CommitTransaction(void)
  AtEOXact_Files(true);
  AtEOXact_ComboCid();
  AtEOXact_HashTables(true);
+ AtEOXact_BEStatus(true);
  AtEOXact_PgStat(true);
  AtEOXact_Snapshot(true, false);
  AtEOXact_ApplyLauncher(true);
@@ -2641,6 +2643,7 @@ AbortTransaction(void)
  AtEOXact_Files(false);
  AtEOXact_ComboCid();
  AtEOXact_HashTables(false);
+ AtEOXact_BEStatus(false);
  AtEOXact_PgStat(false);
  AtEOXact_ApplyLauncher(false);
  pgstat_report_xact_timestamp(0);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ab7d804f0..4b4e3d07ac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -36,6 +36,7 @@
 #include "access/xloginsert.h"
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
+#include "bestatus.h"
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
@@ -8416,9 +8417,9 @@ LogCheckpointEnd(bool restartpoint)
  &sync_secs, &sync_usecs);
 
  /* Accumulate checkpoint timing summary data, in milliseconds. */
- BgWriterStats.m_checkpoint_write_time +=
+ BgWriterStats.checkpoint_write_time +=
  write_secs * 1000 + write_usecs / 1000;
- BgWriterStats.m_checkpoint_sync_time +=
+ BgWriterStats.checkpoint_sync_time +=
  sync_secs * 1000 + sync_usecs / 1000;
 
  /*
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index b35043bf71..683c41575f 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -23,9 +23,9 @@
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 #include "catalog/pg_type.h"
+#include "bestatus.h"
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 10a663bae6..53fa4890e9 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -23,8 +23,8 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/smgr.h"
 #include "utils/guc.h"
 #include "utils/hsearch.h"
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index df926d8dea..fca62770ac 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -22,6 +22,7 @@
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "bootstrap/bootstrap.h"
 #include "catalog/index.h"
 #include "catalog/pg_collation.h"
@@ -329,9 +330,6 @@ AuxiliaryProcessMain(int argc, char *argv[])
  case BgWriterProcess:
  statmsg = pgstat_get_backend_desc(B_BG_WRITER);
  break;
- case ArchiverProcess:
- statmsg = pgstat_get_backend_desc(B_ARCHIVER);
- break;
  case CheckpointerProcess:
  statmsg = pgstat_get_backend_desc(B_CHECKPOINTER);
  break;
@@ -341,6 +339,9 @@ AuxiliaryProcessMain(int argc, char *argv[])
  case WalReceiverProcess:
  statmsg = pgstat_get_backend_desc(B_WAL_RECEIVER);
  break;
+ case ArchiverProcess:
+ statmsg = pgstat_get_backend_desc(B_ARCHIVER);
+ break;
  default:
  statmsg = "??? process";
  break;
@@ -417,6 +418,7 @@ AuxiliaryProcessMain(int argc, char *argv[])
  CreateAuxProcessResourceOwner();
 
  /* Initialize backend status information */
+ pgstat_bearray_initialize();
  pgstat_initialize();
  pgstat_bestart();
 
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index d6cfd28ddc..a8d29d2d33 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -48,7 +48,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
-#include "pgstat.h"
+#include "bestatus.h"
 
 /*
  * Magic numbers for parallel executor communication.  We use constants
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index cd20abc141..3ad7238b5a 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -41,6 +41,7 @@
 #include "access/relscan.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "bestatus.h"
 #include "executor/execdebug.h"
 #include "executor/nodeBitmapHeapscan.h"
 #include "miscadmin.h"
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 70a4e90a05..02d58c463c 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
 
 #include "access/relscan.h"
 #include "access/xact.h"
+#include "bestatus.h"
 #include "executor/execdebug.h"
 #include "executor/execParallel.h"
 #include "executor/nodeGather.h"
@@ -39,7 +40,6 @@
 #include "executor/tqueue.h"
 #include "miscadmin.h"
 #include "optimizer/planmain.h"
-#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 856daf6a7f..5a47eb4601 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -28,6 +28,7 @@
 
 #include "access/htup_details.h"
 #include "access/parallel.h"
+#include "bestatus.h"
 #include "catalog/pg_statistic.h"
 #include "commands/tablespace.h"
 #include "executor/execdebug.h"
@@ -35,7 +36,6 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "port/atomics.h"
 #include "utils/dynahash.h"
 #include "utils/memutils.h"
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2098708864..898a7916b0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -108,12 +108,12 @@
 
 #include "access/htup_details.h"
 #include "access/parallel.h"
+#include "bestatus.h"
 #include "executor/executor.h"
 #include "executor/hashjoin.h"
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/sharedtuplestore.h"
 
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 789a975409..de15e0907f 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -36,9 +36,9 @@
 #include <openssl/ec.h>
 #endif
 
+#include "bestatus.h"
 #include "libpq/libpq.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/latch.h"
 #include "tcop/tcopprot.h"
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index a7def3168d..fa1cf6cffa 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -29,9 +29,9 @@
 #include <arpa/inet.h>
 #endif
 
+#include "bestatus.h"
 #include "libpq/libpq.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "tcop/tcopprot.h"
 #include "utils/memutils.h"
 #include "storage/ipc.h"
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index a9bd47d937..f79a70d6fe 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -13,11 +13,11 @@
 
 #include "postgres.h"
 
+#include "bestatus.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqmq.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..311e63017d 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
- pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+ pgarch.o postmaster.o startup.o syslogger.o walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 4cf67873b1..b1c723bf1c 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -71,6 +71,7 @@
 #include "access/reloptions.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "bestatus.h"
 #include "catalog/dependency.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_database.h"
@@ -969,7 +970,7 @@ rebuild_database_list(Oid newdb)
  PgStat_StatDBEntry *entry;
 
  /* only consider this database if it has a pgstat entry */
- entry = pgstat_fetch_stat_dbentry(newdb);
+ entry = pgstat_fetch_stat_dbentry(newdb, true);
  if (entry != NULL)
  {
  /* we assume it isn't found because the hash was just created */
@@ -978,6 +979,7 @@ rebuild_database_list(Oid newdb)
  /* hash_search already filled in the key */
  db->adl_score = score++;
  /* next_worker is filled in later */
+ pfree(entry);
  }
  }
 
@@ -993,7 +995,7 @@ rebuild_database_list(Oid newdb)
  * skip databases with no stat entries -- in particular, this gets rid
  * of dropped databases
  */
- entry = pgstat_fetch_stat_dbentry(avdb->adl_datid);
+ entry = pgstat_fetch_stat_dbentry(avdb->adl_datid, true);
  if (entry == NULL)
  continue;
 
@@ -1005,6 +1007,7 @@ rebuild_database_list(Oid newdb)
  db->adl_score = score++;
  /* next_worker is filled in later */
  }
+ pfree(entry);
  }
 
  /* finally, insert all qualifying databases not previously inserted */
@@ -1017,7 +1020,7 @@ rebuild_database_list(Oid newdb)
  PgStat_StatDBEntry *entry;
 
  /* only consider databases with a pgstat entry */
- entry = pgstat_fetch_stat_dbentry(avdb->adw_datid);
+ entry = pgstat_fetch_stat_dbentry(avdb->adw_datid, true);
  if (entry == NULL)
  continue;
 
@@ -1029,6 +1032,7 @@ rebuild_database_list(Oid newdb)
  db->adl_score = score++;
  /* next_worker is filled in later */
  }
+ pfree(entry);
  }
  nelems = score;
 
@@ -1227,7 +1231,7 @@ do_start_worker(void)
  continue; /* ignore not-at-risk DBs */
 
  /* Find pgstat entry if any */
- tmp->adw_entry = pgstat_fetch_stat_dbentry(tmp->adw_datid);
+ tmp->adw_entry = pgstat_fetch_stat_dbentry(tmp->adw_datid, true);
 
  /*
  * Skip a database with no pgstat entry; it means it hasn't seen any
@@ -1265,16 +1269,22 @@ do_start_worker(void)
  break;
  }
  }
- if (skipit)
- continue;
+ if (!skipit)
+ {
+ /* Remember the db with oldest autovac time. */
+ if (avdb == NULL ||
+ tmp->adw_entry->last_autovac_time <
+ avdb->adw_entry->last_autovac_time)
+ {
+ if (avdb)
+ pfree(avdb->adw_entry);
+ avdb = tmp;
+ }
+ }
 
- /*
- * Remember the db with oldest autovac time.  (If we are here, both
- * tmp->entry and db->entry must be non-null.)
- */
- if (avdb == NULL ||
- tmp->adw_entry->last_autovac_time < avdb->adw_entry->last_autovac_time)
- avdb = tmp;
+ /* Immediately free it if not used */
+ if (avdb != tmp)
+ pfree(tmp->adw_entry);
  }
 
  /* Found a database -- process it */
@@ -1963,7 +1973,7 @@ do_autovacuum(void)
  * may be NULL if we couldn't find an entry (only happens if we are
  * forcing a vacuum for anti-wrap purposes).
  */
- dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+ dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId, true);
 
  /* Start a transaction so our commands have one to play into. */
  StartTransactionCommand();
@@ -2013,7 +2023,7 @@ do_autovacuum(void)
  MemoryContextSwitchTo(AutovacMemCxt);
 
  /* The database hash where pgstat keeps shared relations */
- shared = pgstat_fetch_stat_dbentry(InvalidOid);
+ shared = pgstat_fetch_stat_dbentry(InvalidOid, true);
 
  classRel = heap_open(RelationRelationId, AccessShareLock);
 
@@ -2099,6 +2109,8 @@ do_autovacuum(void)
  relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
   effective_multixact_freeze_max_age,
   &dovacuum, &doanalyze, &wraparound);
+ if (tabentry)
+ pfree(tabentry);
 
  /* Relations that need work are added to table_oids */
  if (dovacuum || doanalyze)
@@ -2178,10 +2190,11 @@ do_autovacuum(void)
  /* Fetch the pgstat entry for this table */
  tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
  shared, dbentry);
-
  relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
   effective_multixact_freeze_max_age,
   &dovacuum, &doanalyze, &wraparound);
+ if (tabentry)
+ pfree(tabentry);
 
  /* ignore analyze for toast tables */
  if (dovacuum)
@@ -2750,12 +2763,10 @@ get_pgstat_tabentry_relid(Oid relid, bool isshared, PgStat_StatDBEntry *shared,
  if (isshared)
  {
  if (PointerIsValid(shared))
- tabentry = hash_search(shared->tables, &relid,
-   HASH_FIND, NULL);
+ tabentry = backend_get_tab_entry(shared, relid, true);
  }
  else if (PointerIsValid(dbentry))
- tabentry = hash_search(dbentry->tables, &relid,
-   HASH_FIND, NULL);
+ tabentry = backend_get_tab_entry(dbentry, relid, true);
 
  return tabentry;
 }
@@ -2787,8 +2798,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
  /* use fresh stats */
  autovac_refresh_stats();
 
- shared = pgstat_fetch_stat_dbentry(InvalidOid);
- dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+ shared = pgstat_fetch_stat_dbentry(InvalidOid, true);
+ dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId, true);
 
  /* fetch the relation's relcache entry */
  classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
@@ -2819,6 +2830,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
  relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
   effective_multixact_freeze_max_age,
   &dovacuum, &doanalyze, &wraparound);
+ if (tabentry)
+ pfree(tabentry);
 
  /* ignore ANALYZE for toast tables */
  if (classForm->relkind == RELKIND_TOASTVALUE)
@@ -2909,7 +2922,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
  }
 
  heap_freetuple(classTup);
-
+ pfree(shared);
+ pfree(dbentry);
  return tab;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f5db5a8c4a..7d7d55ef1a 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -16,8 +16,8 @@
 
 #include "libpq/pqsignal.h"
 #include "access/parallel.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index e6b6c549de..c820d35fbc 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -40,6 +40,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -267,9 +268,9 @@ BackgroundWriterMain(void)
  can_hibernate = BgBufferSync(&wb_context);
 
  /*
- * Send off activity statistics to the stats collector
+ * Update activity statistics.
  */
- pgstat_send_bgwriter();
+ pgstat_update_bgwriter();
 
  if (FirstCallSinceLastCheckpoint())
  {
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fe96c41359..b592560dd2 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -43,6 +43,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -371,7 +372,7 @@ CheckpointerMain(void)
  {
  checkpoint_requested = false;
  do_checkpoint = true;
- BgWriterStats.m_requested_checkpoints++;
+ BgWriterStats.requested_checkpoints++;
  }
  if (shutdown_requested)
  {
@@ -397,7 +398,7 @@ CheckpointerMain(void)
  if (elapsed_secs >= CheckPointTimeout)
  {
  if (!do_checkpoint)
- BgWriterStats.m_timed_checkpoints++;
+ BgWriterStats.timed_checkpoints++;
  do_checkpoint = true;
  flags |= CHECKPOINT_CAUSE_TIME;
  }
@@ -515,13 +516,13 @@ CheckpointerMain(void)
  CheckArchiveTimeout();
 
  /*
- * Send off activity statistics to the stats collector.  (The reason
- * why we re-use bgwriter-related code for this is that the bgwriter
- * and checkpointer used to be just one process.  It's probably not
- * worth the trouble to split the stats support into two independent
- * stats message types.)
+ * Update activity statistics.  (The reason why we re-use
+ * bgwriter-related code for this is that the bgwriter and
+ * checkpointer used to be just one process.  It's probably not worth
+ * the trouble to split the stats support into two independent
+ * functions.)
  */
- pgstat_send_bgwriter();
+ pgstat_update_bgwriter();
 
  /*
  * Sleep until we are signaled or it's time for another checkpoint or
@@ -682,9 +683,9 @@ CheckpointWriteDelay(int flags, double progress)
  CheckArchiveTimeout();
 
  /*
- * Report interim activity statistics to the stats collector.
+ * Register interim activity statistics.
  */
- pgstat_send_bgwriter();
+ pgstat_update_bgwriter();
 
  /*
  * This sleep used to be connected to bgwriter_delay, typically 200ms.
@@ -1284,8 +1285,8 @@ AbsorbFsyncRequests(void)
  LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
 
  /* Transfer stats counts into the pending pgstats counters */
- BgWriterStats.m_buf_written_backend += CheckpointerShmem->num_backend_writes;
- BgWriterStats.m_buf_fsync_backend += CheckpointerShmem->num_backend_fsync;
+ BgWriterStats.buf_written_backend += CheckpointerShmem->num_backend_writes;
+ BgWriterStats.buf_fsync_backend += CheckpointerShmem->num_backend_fsync;
 
  CheckpointerShmem->num_backend_writes = 0;
  CheckpointerShmem->num_backend_fsync = 0;
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 4342ebdab4..2a7c4fd1b1 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -35,6 +35,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -468,7 +469,7 @@ pgarch_ArchiverCopyLoop(void)
  * Tell the collector about the WAL file that we successfully
  * archived
  */
- pgstat_send_archiver(xlog, false);
+ pgstat_update_archiver(xlog, false);
 
  break; /* out of inner retry loop */
  }
@@ -478,7 +479,7 @@ pgarch_ArchiverCopyLoop(void)
  * Tell the collector about the WAL file that we failed to
  * archive
  */
- pgstat_send_archiver(xlog, true);
+ pgstat_update_archiver(xlog, true);
 
  if (++failures >= NUM_ARCHIVE_RETRIES)
  {
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
deleted file mode 100644
index d1fe052abf..0000000000
--- a/src/backend/postmaster/pgstat.c
+++ /dev/null
@@ -1,6384 +0,0 @@
-/* ----------
- * pgstat.c
- *
- * All the statistics collector stuff hacked up in one big, ugly file.
- *
- * TODO: - Separate collector, postmaster and backend stuff
- *  into different files.
- *
- * - Add some automatic call for pgstat vacuuming.
- *
- * - Add a pgstat config column to pg_database, so this
- *  entire thing can be enabled/disabled on a per db basis.
- *
- * Copyright (c) 2001-2019, PostgreSQL Global Development Group
- *
- * src/backend/postmaster/pgstat.c
- * ----------
- */
-#include "postgres.h"
-
-#include <unistd.h>
-#include <fcntl.h>
-#include <sys/param.h>
-#include <sys/time.h>
-#include <sys/socket.h>
-#include <netdb.h>
-#include <netinet/in.h>
-#include <arpa/inet.h>
-#include <signal.h>
-#include <time.h>
-#ifdef HAVE_SYS_SELECT_H
-#include <sys/select.h>
-#endif
-
-#include "pgstat.h"
-
-#include "access/heapam.h"
-#include "access/htup_details.h"
-#include "access/transam.h"
-#include "access/twophase_rmgr.h"
-#include "access/xact.h"
-#include "catalog/pg_database.h"
-#include "catalog/pg_proc.h"
-#include "common/ip.h"
-#include "libpq/libpq.h"
-#include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
-#include "miscadmin.h"
-#include "pg_trace.h"
-#include "postmaster/autovacuum.h"
-#include "postmaster/fork_process.h"
-#include "postmaster/postmaster.h"
-#include "replication/walsender.h"
-#include "storage/backendid.h"
-#include "storage/dsm.h"
-#include "storage/fd.h"
-#include "storage/ipc.h"
-#include "storage/latch.h"
-#include "storage/lmgr.h"
-#include "storage/pg_shmem.h"
-#include "storage/procsignal.h"
-#include "storage/sinvaladt.h"
-#include "utils/ascii.h"
-#include "utils/guc.h"
-#include "utils/memutils.h"
-#include "utils/ps_status.h"
-#include "utils/rel.h"
-#include "utils/snapmgr.h"
-#include "utils/timestamp.h"
-#include "utils/tqual.h"
-
-
-/* ----------
- * Timer definitions.
- * ----------
- */
-#define PGSTAT_STAT_INTERVAL 500 /* Minimum time between stats file
- * updates; in milliseconds. */
-
-#define PGSTAT_RETRY_DELAY 10 /* How long to wait between checks for a
- * new file; in milliseconds. */
-
-#define PGSTAT_MAX_WAIT_TIME 10000 /* Maximum time to wait for a stats
- * file update; in milliseconds. */
-
-#define PGSTAT_INQ_INTERVAL 640 /* How often to ping the collector for a
- * new file; in milliseconds. */
-
-#define PGSTAT_RESTART_INTERVAL 60 /* How often to attempt to restart a
- * failed statistics collector; in
- * seconds. */
-
-#define PGSTAT_POLL_LOOP_COUNT (PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY)
-#define PGSTAT_INQ_LOOP_COUNT (PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY)
-
-/* Minimum receive buffer size for the collector's socket. */
-#define PGSTAT_MIN_RCVBUF (100 * 1024)
-
-
-/* ----------
- * The initial size hints for the hash tables used in the collector.
- * ----------
- */
-#define PGSTAT_DB_HASH_SIZE 16
-#define PGSTAT_TAB_HASH_SIZE 512
-#define PGSTAT_FUNCTION_HASH_SIZE 512
-
-
-/* ----------
- * Total number of backends including auxiliary
- *
- * We reserve a slot for each possible BackendId, plus one for each
- * possible auxiliary process type.  (This scheme assumes there is not
- * more than one of any auxiliary process type at a time.) MaxBackends
- * includes autovacuum workers and background workers as well.
- * ----------
- */
-#define NumBackendStatSlots (MaxBackends + NUM_AUXPROCTYPES)
-
-
-/* ----------
- * GUC parameters
- * ----------
- */
-bool pgstat_track_activities = false;
-bool pgstat_track_counts = false;
-int pgstat_track_functions = TRACK_FUNC_OFF;
-int pgstat_track_activity_query_size = 1024;
-
-/* ----------
- * Built from GUC parameter
- * ----------
- */
-char   *pgstat_stat_directory = NULL;
-char   *pgstat_stat_filename = NULL;
-char   *pgstat_stat_tmpname = NULL;
-
-/*
- * BgWriter global statistics counters (unused in other processes).
- * Stored directly in a stats message structure so it can be sent
- * without needing to copy things around.  We assume this inits to zeroes.
- */
-PgStat_MsgBgWriter BgWriterStats;
-
-/* ----------
- * Local data
- * ----------
- */
-NON_EXEC_STATIC pgsocket pgStatSock = PGINVALID_SOCKET;
-
-static struct sockaddr_storage pgStatAddr;
-
-static time_t last_pgstat_start_time;
-
-static bool pgStatRunningInCollector = false;
-
-/*
- * Structures in which backends store per-table info that's waiting to be
- * sent to the collector.
- *
- * NOTE: once allocated, TabStatusArray structures are never moved or deleted
- * for the life of the backend.  Also, we zero out the t_id fields of the
- * contained PgStat_TableStatus structs whenever they are not actively in use.
- * This allows relcache pgstat_info pointers to be treated as long-lived data,
- * avoiding repeated searches in pgstat_initstats() when a relation is
- * repeatedly opened during a transaction.
- */
-#define TABSTAT_QUANTUM 100 /* we alloc this many at a time */
-
-typedef struct TabStatusArray
-{
- struct TabStatusArray *tsa_next; /* link to next array, if any */
- int tsa_used; /* # entries currently used */
- PgStat_TableStatus tsa_entries[TABSTAT_QUANTUM]; /* per-table data */
-} TabStatusArray;
-
-static TabStatusArray *pgStatTabList = NULL;
-
-/*
- * pgStatTabHash entry: map from relation OID to PgStat_TableStatus pointer
- */
-typedef struct TabStatHashEntry
-{
- Oid t_id;
- PgStat_TableStatus *tsa_entry;
-} TabStatHashEntry;
-
-/*
- * Hash table for O(1) t_id -> tsa_entry lookup
- */
-static HTAB *pgStatTabHash = NULL;
-
-/*
- * Backends store per-function info that's waiting to be sent to the collector
- * in this hash table (indexed by function OID).
- */
-static HTAB *pgStatFunctions = NULL;
-
-/*
- * Indicates if backend has some function stats that it hasn't yet
- * sent to the collector.
- */
-static bool have_function_stats = false;
-
-/*
- * Tuple insertion/deletion counts for an open transaction can't be propagated
- * into PgStat_TableStatus counters until we know if it is going to commit
- * or abort.  Hence, we keep these counts in per-subxact structs that live
- * in TopTransactionContext.  This data structure is designed on the assumption
- * that subxacts won't usually modify very many tables.
- */
-typedef struct PgStat_SubXactStatus
-{
- int nest_level; /* subtransaction nest level */
- struct PgStat_SubXactStatus *prev; /* higher-level subxact if any */
- PgStat_TableXactStatus *first; /* head of list for this subxact */
-} PgStat_SubXactStatus;
-
-static PgStat_SubXactStatus *pgStatXactStack = NULL;
-
-static int pgStatXactCommit = 0;
-static int pgStatXactRollback = 0;
-PgStat_Counter pgStatBlockReadTime = 0;
-PgStat_Counter pgStatBlockWriteTime = 0;
-
-/* Record that's written to 2PC state file when pgstat state is persisted */
-typedef struct TwoPhasePgStatRecord
-{
- PgStat_Counter tuples_inserted; /* tuples inserted in xact */
- PgStat_Counter tuples_updated; /* tuples updated in xact */
- PgStat_Counter tuples_deleted; /* tuples deleted in xact */
- PgStat_Counter inserted_pre_trunc; /* tuples inserted prior to truncate */
- PgStat_Counter updated_pre_trunc; /* tuples updated prior to truncate */
- PgStat_Counter deleted_pre_trunc; /* tuples deleted prior to truncate */
- Oid t_id; /* table's OID */
- bool t_shared; /* is it a shared catalog? */
- bool t_truncated; /* was the relation truncated? */
-} TwoPhasePgStatRecord;
-
-/*
- * Info about current "snapshot" of stats file
- */
-static MemoryContext pgStatLocalContext = NULL;
-static HTAB *pgStatDBHash = NULL;
-
-/* Status for backends including auxiliary */
-static LocalPgBackendStatus *localBackendStatusTable = NULL;
-
-/* Total number of backends including auxiliary */
-static int localNumBackends = 0;
-
-/*
- * Cluster wide statistics, kept in the stats collector.
- * Contains statistics that are not collected per database
- * or per table.
- */
-static PgStat_ArchiverStats archiverStats;
-static PgStat_GlobalStats globalStats;
-
-/*
- * List of OIDs of databases we need to write out.  If an entry is InvalidOid,
- * it means to write only the shared-catalog stats ("DB 0"); otherwise, we
- * will write both that DB's data and the shared stats.
- */
-static List *pending_write_requests = NIL;
-
-/* Signal handler flags */
-static volatile bool need_exit = false;
-static volatile bool got_SIGHUP = false;
-
-/*
- * Total time charged to functions so far in the current backend.
- * We use this to help separate "self" and "other" time charges.
- * (We assume this initializes to zero.)
- */
-static instr_time total_func_time;
-
-
-/* ----------
- * Local function forward declarations
- * ----------
- */
-#ifdef EXEC_BACKEND
-static pid_t pgstat_forkexec(void);
-#endif
-
-NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
-static void pgstat_exit(SIGNAL_ARGS);
-static void pgstat_beshutdown_hook(int code, Datum arg);
-static void pgstat_sighup_handler(SIGNAL_ARGS);
-
-static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
-static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
- Oid tableoid, bool create);
-static void pgstat_write_statsfiles(bool permanent, bool allDbs);
-static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
-static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
-static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
-static void backend_read_statsfile(void);
-static void pgstat_read_current_status(void);
-
-static bool pgstat_write_statsfile_needed(void);
-static bool pgstat_db_requested(Oid databaseid);
-
-static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
-static void pgstat_send_funcstats(void);
-static HTAB *pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid);
-
-static PgStat_TableStatus *get_tabstat_entry(Oid rel_id, bool isshared);
-
-static void pgstat_setup_memcxt(void);
-
-static const char *pgstat_get_wait_activity(WaitEventActivity w);
-static const char *pgstat_get_wait_client(WaitEventClient w);
-static const char *pgstat_get_wait_ipc(WaitEventIPC w);
-static const char *pgstat_get_wait_timeout(WaitEventTimeout w);
-static const char *pgstat_get_wait_io(WaitEventIO w);
-
-static void pgstat_setheader(PgStat_MsgHdr *hdr, StatMsgType mtype);
-static void pgstat_send(void *msg, int len);
-
-static void pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len);
-static void pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len);
-static void pgstat_recv_tabpurge(PgStat_MsgTabpurge *msg, int len);
-static void pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len);
-static void pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len);
-static void pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len);
-static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len);
-static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
-static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
-static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
-static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
-static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
-static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
-static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
-static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
-static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
-static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
-
-/* ------------------------------------------------------------
- * Public functions called from postmaster follow
- * ------------------------------------------------------------
- */
-
-/* ----------
- * pgstat_init() -
- *
- * Called from postmaster at startup. Create the resources required
- * by the statistics collector process.  If unable to do so, do not
- * fail --- better to let the postmaster start with stats collection
- * disabled.
- * ----------
- */
-void
-pgstat_init(void)
-{
- ACCEPT_TYPE_ARG3 alen;
- struct addrinfo *addrs = NULL,
-   *addr,
- hints;
- int ret;
- fd_set rset;
- struct timeval tv;
- char test_byte;
- int sel_res;
- int tries = 0;
-
-#define TESTBYTEVAL ((char) 199)
-
- /*
- * This static assertion verifies that we didn't mess up the calculations
- * involved in selecting maximum payload sizes for our UDP messages.
- * Because the only consequence of overrunning PGSTAT_MAX_MSG_SIZE would
- * be silent performance loss from fragmentation, it seems worth having a
- * compile-time cross-check that we didn't.
- */
- StaticAssertStmt(sizeof(PgStat_Msg) <= PGSTAT_MAX_MSG_SIZE,
- "maximum stats message size exceeds PGSTAT_MAX_MSG_SIZE");
-
- /*
- * Create the UDP socket for sending and receiving statistic messages
- */
- hints.ai_flags = AI_PASSIVE;
- hints.ai_family = AF_UNSPEC;
- hints.ai_socktype = SOCK_DGRAM;
- hints.ai_protocol = 0;
- hints.ai_addrlen = 0;
- hints.ai_addr = NULL;
- hints.ai_canonname = NULL;
- hints.ai_next = NULL;
- ret = pg_getaddrinfo_all("localhost", NULL, &hints, &addrs);
- if (ret || !addrs)
- {
- ereport(LOG,
- (errmsg("could not resolve \"localhost\": %s",
- gai_strerror(ret))));
- goto startup_failed;
- }
-
- /*
- * On some platforms, pg_getaddrinfo_all() may return multiple addresses
- * only one of which will actually work (eg, both IPv6 and IPv4 addresses
- * when kernel will reject IPv6).  Worse, the failure may occur at the
- * bind() or perhaps even connect() stage.  So we must loop through the
- * results till we find a working combination. We will generate LOG
- * messages, but no error, for bogus combinations.
- */
- for (addr = addrs; addr; addr = addr->ai_next)
- {
-#ifdef HAVE_UNIX_SOCKETS
- /* Ignore AF_UNIX sockets, if any are returned. */
- if (addr->ai_family == AF_UNIX)
- continue;
-#endif
-
- if (++tries > 1)
- ereport(LOG,
- (errmsg("trying another address for the statistics collector")));
-
- /*
- * Create the socket.
- */
- if ((pgStatSock = socket(addr->ai_family, SOCK_DGRAM, 0)) == PGINVALID_SOCKET)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not create socket for statistics collector: %m")));
- continue;
- }
-
- /*
- * Bind it to a kernel assigned port on localhost and get the assigned
- * port via getsockname().
- */
- if (bind(pgStatSock, addr->ai_addr, addr->ai_addrlen) < 0)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not bind socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- alen = sizeof(pgStatAddr);
- if (getsockname(pgStatSock, (struct sockaddr *) &pgStatAddr, &alen) < 0)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not get address of socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- /*
- * Connect the socket to its own address.  This saves a few cycles by
- * not having to respecify the target address on every send. This also
- * provides a kernel-level check that only packets from this same
- * address will be received.
- */
- if (connect(pgStatSock, (struct sockaddr *) &pgStatAddr, alen) < 0)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not connect socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- /*
- * Try to send and receive a one-byte test message on the socket. This
- * is to catch situations where the socket can be created but will not
- * actually pass data (for instance, because kernel packet filtering
- * rules prevent it).
- */
- test_byte = TESTBYTEVAL;
-
-retry1:
- if (send(pgStatSock, &test_byte, 1, 0) != 1)
- {
- if (errno == EINTR)
- goto retry1; /* if interrupted, just retry */
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not send test message on socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- /*
- * There could possibly be a little delay before the message can be
- * received.  We arbitrarily allow up to half a second before deciding
- * it's broken.
- */
- for (;;) /* need a loop to handle EINTR */
- {
- FD_ZERO(&rset);
- FD_SET(pgStatSock, &rset);
-
- tv.tv_sec = 0;
- tv.tv_usec = 500000;
- sel_res = select(pgStatSock + 1, &rset, NULL, NULL, &tv);
- if (sel_res >= 0 || errno != EINTR)
- break;
- }
- if (sel_res < 0)
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("select() failed in statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
- if (sel_res == 0 || !FD_ISSET(pgStatSock, &rset))
- {
- /*
- * This is the case we actually think is likely, so take pains to
- * give a specific message for it.
- *
- * errno will not be set meaningfully here, so don't use it.
- */
- ereport(LOG,
- (errcode(ERRCODE_CONNECTION_FAILURE),
- errmsg("test message did not get through on socket for statistics collector")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- test_byte++; /* just make sure variable is changed */
-
-retry2:
- if (recv(pgStatSock, &test_byte, 1, 0) != 1)
- {
- if (errno == EINTR)
- goto retry2; /* if interrupted, just retry */
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not receive test message on socket for statistics collector: %m")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- if (test_byte != TESTBYTEVAL) /* strictly paranoia ... */
- {
- ereport(LOG,
- (errcode(ERRCODE_INTERNAL_ERROR),
- errmsg("incorrect test message transmission on socket for statistics collector")));
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
- continue;
- }
-
- /* If we get here, we have a working socket */
- break;
- }
-
- /* Did we find a working address? */
- if (!addr || pgStatSock == PGINVALID_SOCKET)
- goto startup_failed;
-
- /*
- * Set the socket to non-blocking IO.  This ensures that if the collector
- * falls behind, statistics messages will be discarded; backends won't
- * block waiting to send messages to the collector.
- */
- if (!pg_set_noblock(pgStatSock))
- {
- ereport(LOG,
- (errcode_for_socket_access(),
- errmsg("could not set statistics collector socket to nonblocking mode: %m")));
- goto startup_failed;
- }
-
- /*
- * Try to ensure that the socket's receive buffer is at least
- * PGSTAT_MIN_RCVBUF bytes, so that it won't easily overflow and lose
- * data.  Use of UDP protocol means that we are willing to lose data under
- * heavy load, but we don't want it to happen just because of ridiculously
- * small default buffer sizes (such as 8KB on older Windows versions).
- */
- {
- int old_rcvbuf;
- int new_rcvbuf;
- ACCEPT_TYPE_ARG3 rcvbufsize = sizeof(old_rcvbuf);
-
- if (getsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
-   (char *) &old_rcvbuf, &rcvbufsize) < 0)
- {
- elog(LOG, "getsockopt(SO_RCVBUF) failed: %m");
- /* if we can't get existing size, always try to set it */
- old_rcvbuf = 0;
- }
-
- new_rcvbuf = PGSTAT_MIN_RCVBUF;
- if (old_rcvbuf < new_rcvbuf)
- {
- if (setsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
-   (char *) &new_rcvbuf, sizeof(new_rcvbuf)) < 0)
- elog(LOG, "setsockopt(SO_RCVBUF) failed: %m");
- }
- }
-
- pg_freeaddrinfo_all(hints.ai_family, addrs);
-
- return;
-
-startup_failed:
- ereport(LOG,
- (errmsg("disabling statistics collector for lack of working socket")));
-
- if (addrs)
- pg_freeaddrinfo_all(hints.ai_family, addrs);
-
- if (pgStatSock != PGINVALID_SOCKET)
- closesocket(pgStatSock);
- pgStatSock = PGINVALID_SOCKET;
-
- /*
- * Adjust GUC variables to suppress useless activity, and for debugging
- * purposes (seeing track_counts off is a clue that we failed here). We
- * use PGC_S_OVERRIDE because there is no point in trying to turn it back
- * on from postgresql.conf without a restart.
- */
- SetConfigOption("track_counts", "off", PGC_INTERNAL, PGC_S_OVERRIDE);
-}
-
-/*
- * subroutine for pgstat_reset_all
- */
-static void
-pgstat_reset_remove_files(const char *directory)
-{
- DIR   *dir;
- struct dirent *entry;
- char fname[MAXPGPATH * 2];
-
- dir = AllocateDir(directory);
- while ((entry = ReadDir(dir, directory)) != NULL)
- {
- int nchars;
- Oid tmp_oid;
-
- /*
- * Skip directory entries that don't match the file names we write.
- * See get_dbstat_filename for the database-specific pattern.
- */
- if (strncmp(entry->d_name, "global.", 7) == 0)
- nchars = 7;
- else
- {
- nchars = 0;
- (void) sscanf(entry->d_name, "db_%u.%n",
-  &tmp_oid, &nchars);
- if (nchars <= 0)
- continue;
- /* %u allows leading whitespace, so reject that */
- if (strchr("0123456789", entry->d_name[3]) == NULL)
- continue;
- }
-
- if (strcmp(entry->d_name + nchars, "tmp") != 0 &&
- strcmp(entry->d_name + nchars, "stat") != 0)
- continue;
-
- snprintf(fname, sizeof(fname), "%s/%s", directory,
- entry->d_name);
- unlink(fname);
- }
- FreeDir(dir);
-}
-
-/*
- * pgstat_reset_all() -
- *
- * Remove the stats files.  This is currently used only if WAL
- * recovery is needed after a crash.
- */
-void
-pgstat_reset_all(void)
-{
- pgstat_reset_remove_files(pgstat_stat_directory);
- pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY);
-}
-
-#ifdef EXEC_BACKEND
-
-/*
- * pgstat_forkexec() -
- *
- * Format up the arglist for, then fork and exec, statistics collector process
- */
-static pid_t
-pgstat_forkexec(void)
-{
- char   *av[10];
- int ac = 0;
-
- av[ac++] = "postgres";
- av[ac++] = "--forkcol";
- av[ac++] = NULL; /* filled in by postmaster_forkexec */
-
- av[ac] = NULL;
- Assert(ac < lengthof(av));
-
- return postmaster_forkexec(ac, av);
-}
-#endif /* EXEC_BACKEND */
-
-
-/*
- * pgstat_start() -
- *
- * Called from postmaster at startup or after an existing collector
- * died.  Attempt to fire up a fresh statistics collector.
- *
- * Returns PID of child process, or 0 if fail.
- *
- * Note: if fail, we will be called again from the postmaster main loop.
- */
-int
-pgstat_start(void)
-{
- time_t curtime;
- pid_t pgStatPid;
-
- /*
- * Check that the socket is there, else pgstat_init failed and we can do
- * nothing useful.
- */
- if (pgStatSock == PGINVALID_SOCKET)
- return 0;
-
- /*
- * Do nothing if too soon since last collector start.  This is a safety
- * valve to protect against continuous respawn attempts if the collector
- * is dying immediately at launch.  Note that since we will be re-called
- * from the postmaster main loop, we will get another chance later.
- */
- curtime = time(NULL);
- if ((unsigned int) (curtime - last_pgstat_start_time) <
- (unsigned int) PGSTAT_RESTART_INTERVAL)
- return 0;
- last_pgstat_start_time = curtime;
-
- /*
- * Okay, fork off the collector.
- */
-#ifdef EXEC_BACKEND
- switch ((pgStatPid = pgstat_forkexec()))
-#else
- switch ((pgStatPid = fork_process()))
-#endif
- {
- case -1:
- ereport(LOG,
- (errmsg("could not fork statistics collector: %m")));
- return 0;
-
-#ifndef EXEC_BACKEND
- case 0:
- /* in postmaster child ... */
- InitPostmasterChild();
-
- /* Close the postmaster's sockets */
- ClosePostmasterPorts(false);
-
- /* Drop our connection to postmaster's shared memory, as well */
- dsm_detach_all();
- PGSharedMemoryDetach();
-
- PgstatCollectorMain(0, NULL);
- break;
-#endif
-
- default:
- return (int) pgStatPid;
- }
-
- /* shouldn't get here */
- return 0;
-}
-
-void
-allow_immediate_pgstat_restart(void)
-{
- last_pgstat_start_time = 0;
-}
-
-/* ------------------------------------------------------------
- * Public functions used by backends follow
- *------------------------------------------------------------
- */
-
-
-/* ----------
- * pgstat_report_stat() -
- *
- * Must be called by processes that performs DML: tcop/postgres.c, logical
- * receiver processes, SPI worker, etc. to send the so far collected
- * per-table and function usage statistics to the collector.  Note that this
- * is called only when not within a transaction, so it is fair to use
- * transaction stop time as an approximation of current time.
- * ----------
- */
-void
-pgstat_report_stat(bool force)
-{
- /* we assume this inits to all zeroes: */
- static const PgStat_TableCounts all_zeroes;
- static TimestampTz last_report = 0;
-
- TimestampTz now;
- PgStat_MsgTabstat regular_msg;
- PgStat_MsgTabstat shared_msg;
- TabStatusArray *tsa;
- int i;
-
- /* Don't expend a clock check if nothing to do */
- if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
- pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
- !have_function_stats)
- return;
-
- /*
- * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
- * msec since we last sent one, or the caller wants to force stats out.
- */
- now = GetCurrentTransactionStopTimestamp();
- if (!force &&
- !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
- return;
- last_report = now;
-
- /*
- * Destroy pgStatTabHash before we start invalidating PgStat_TableEntry
- * entries it points to.  (Should we fail partway through the loop below,
- * it's okay to have removed the hashtable already --- the only
- * consequence is we'd get multiple entries for the same table in the
- * pgStatTabList, and that's safe.)
- */
- if (pgStatTabHash)
- hash_destroy(pgStatTabHash);
- pgStatTabHash = NULL;
-
- /*
- * Scan through the TabStatusArray struct(s) to find tables that actually
- * have counts, and build messages to send.  We have to separate shared
- * relations from regular ones because the databaseid field in the message
- * header has to depend on that.
- */
- regular_msg.m_databaseid = MyDatabaseId;
- shared_msg.m_databaseid = InvalidOid;
- regular_msg.m_nentries = 0;
- shared_msg.m_nentries = 0;
-
- for (tsa = pgStatTabList; tsa != NULL; tsa = tsa->tsa_next)
- {
- for (i = 0; i < tsa->tsa_used; i++)
- {
- PgStat_TableStatus *entry = &tsa->tsa_entries[i];
- PgStat_MsgTabstat *this_msg;
- PgStat_TableEntry *this_ent;
-
- /* Shouldn't have any pending transaction-dependent counts */
- Assert(entry->trans == NULL);
-
- /*
- * Ignore entries that didn't accumulate any actual counts, such
- * as indexes that were opened by the planner but not used.
- */
- if (memcmp(&entry->t_counts, &all_zeroes,
-   sizeof(PgStat_TableCounts)) == 0)
- continue;
-
- /*
- * OK, insert data into the appropriate message, and send if full.
- */
- this_msg = entry->t_shared ? &shared_msg : &regular_msg;
- this_ent = &this_msg->m_entry[this_msg->m_nentries];
- this_ent->t_id = entry->t_id;
- memcpy(&this_ent->t_counts, &entry->t_counts,
-   sizeof(PgStat_TableCounts));
- if (++this_msg->m_nentries >= PGSTAT_NUM_TABENTRIES)
- {
- pgstat_send_tabstat(this_msg);
- this_msg->m_nentries = 0;
- }
- }
- /* zero out TableStatus structs after use */
- MemSet(tsa->tsa_entries, 0,
-   tsa->tsa_used * sizeof(PgStat_TableStatus));
- tsa->tsa_used = 0;
- }
-
- /*
- * Send partial messages.  Make sure that any pending xact commit/abort
- * gets counted, even if there are no table stats to send.
- */
- if (regular_msg.m_nentries > 0 ||
- pgStatXactCommit > 0 || pgStatXactRollback > 0)
- pgstat_send_tabstat(&regular_msg);
- if (shared_msg.m_nentries > 0)
- pgstat_send_tabstat(&shared_msg);
-
- /* Now, send function statistics */
- pgstat_send_funcstats();
-}
-
-/*
- * Subroutine for pgstat_report_stat: finish and send a tabstat message
- */
-static void
-pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg)
-{
- int n;
- int len;
-
- /* It's unlikely we'd get here with no socket, but maybe not impossible */
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- /*
- * Report and reset accumulated xact commit/rollback and I/O timings
- * whenever we send a normal tabstat message
- */
- if (OidIsValid(tsmsg->m_databaseid))
- {
- tsmsg->m_xact_commit = pgStatXactCommit;
- tsmsg->m_xact_rollback = pgStatXactRollback;
- tsmsg->m_block_read_time = pgStatBlockReadTime;
- tsmsg->m_block_write_time = pgStatBlockWriteTime;
- pgStatXactCommit = 0;
- pgStatXactRollback = 0;
- pgStatBlockReadTime = 0;
- pgStatBlockWriteTime = 0;
- }
- else
- {
- tsmsg->m_xact_commit = 0;
- tsmsg->m_xact_rollback = 0;
- tsmsg->m_block_read_time = 0;
- tsmsg->m_block_write_time = 0;
- }
-
- n = tsmsg->m_nentries;
- len = offsetof(PgStat_MsgTabstat, m_entry[0]) +
- n * sizeof(PgStat_TableEntry);
-
- pgstat_setheader(&tsmsg->m_hdr, PGSTAT_MTYPE_TABSTAT);
- pgstat_send(tsmsg, len);
-}
-
-/*
- * Subroutine for pgstat_report_stat: populate and send a function stat message
- */
-static void
-pgstat_send_funcstats(void)
-{
- /* we assume this inits to all zeroes: */
- static const PgStat_FunctionCounts all_zeroes;
-
- PgStat_MsgFuncstat msg;
- PgStat_BackendFunctionEntry *entry;
- HASH_SEQ_STATUS fstat;
-
- if (pgStatFunctions == NULL)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_FUNCSTAT);
- msg.m_databaseid = MyDatabaseId;
- msg.m_nentries = 0;
-
- hash_seq_init(&fstat, pgStatFunctions);
- while ((entry = (PgStat_BackendFunctionEntry *) hash_seq_search(&fstat)) != NULL)
- {
- PgStat_FunctionEntry *m_ent;
-
- /* Skip it if no counts accumulated since last time */
- if (memcmp(&entry->f_counts, &all_zeroes,
-   sizeof(PgStat_FunctionCounts)) == 0)
- continue;
-
- /* need to convert format of time accumulators */
- m_ent = &msg.m_entry[msg.m_nentries];
- m_ent->f_id = entry->f_id;
- m_ent->f_numcalls = entry->f_counts.f_numcalls;
- m_ent->f_total_time = INSTR_TIME_GET_MICROSEC(entry->f_counts.f_total_time);
- m_ent->f_self_time = INSTR_TIME_GET_MICROSEC(entry->f_counts.f_self_time);
-
- if (++msg.m_nentries >= PGSTAT_NUM_FUNCENTRIES)
- {
- pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
- msg.m_nentries * sizeof(PgStat_FunctionEntry));
- msg.m_nentries = 0;
- }
-
- /* reset the entry's counts */
- MemSet(&entry->f_counts, 0, sizeof(PgStat_FunctionCounts));
- }
-
- if (msg.m_nentries > 0)
- pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
- msg.m_nentries * sizeof(PgStat_FunctionEntry));
-
- have_function_stats = false;
-}
-
-
-/* ----------
- * pgstat_vacuum_stat() -
- *
- * Will tell the collector about objects he can get rid of.
- * ----------
- */
-void
-pgstat_vacuum_stat(void)
-{
- HTAB   *htab;
- PgStat_MsgTabpurge msg;
- PgStat_MsgFuncpurge f_msg;
- HASH_SEQ_STATUS hstat;
- PgStat_StatDBEntry *dbentry;
- PgStat_StatTabEntry *tabentry;
- PgStat_StatFuncEntry *funcentry;
- int len;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- /*
- * If not done for this transaction, read the statistics collector stats
- * file into some hash tables.
- */
- backend_read_statsfile();
-
- /*
- * Read pg_database and make a list of OIDs of all existing databases
- */
- htab = pgstat_collect_oids(DatabaseRelationId, Anum_pg_database_oid);
-
- /*
- * Search the database hash table for dead databases and tell the
- * collector to drop them.
- */
- hash_seq_init(&hstat, pgStatDBHash);
- while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
- {
- Oid dbid = dbentry->databaseid;
-
- CHECK_FOR_INTERRUPTS();
-
- /* the DB entry for shared tables (with InvalidOid) is never dropped */
- if (OidIsValid(dbid) &&
- hash_search(htab, (void *) &dbid, HASH_FIND, NULL) == NULL)
- pgstat_drop_database(dbid);
- }
-
- /* Clean up */
- hash_destroy(htab);
-
- /*
- * Lookup our own database entry; if not found, nothing more to do.
- */
- dbentry = (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
- (void *) &MyDatabaseId,
- HASH_FIND, NULL);
- if (dbentry == NULL || dbentry->tables == NULL)
- return;
-
- /*
- * Similarly to above, make a list of all known relations in this DB.
- */
- htab = pgstat_collect_oids(RelationRelationId, Anum_pg_class_oid);
-
- /*
- * Initialize our messages table counter to zero
- */
- msg.m_nentries = 0;
-
- /*
- * Check for all tables listed in stats hashtable if they still exist.
- */
- hash_seq_init(&hstat, dbentry->tables);
- while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&hstat)) != NULL)
- {
- Oid tabid = tabentry->tableid;
-
- CHECK_FOR_INTERRUPTS();
-
- if (hash_search(htab, (void *) &tabid, HASH_FIND, NULL) != NULL)
- continue;
-
- /*
- * Not there, so add this table's Oid to the message
- */
- msg.m_tableid[msg.m_nentries++] = tabid;
-
- /*
- * If the message is full, send it out and reinitialize to empty
- */
- if (msg.m_nentries >= PGSTAT_NUM_TABPURGE)
- {
- len = offsetof(PgStat_MsgTabpurge, m_tableid[0])
- + msg.m_nentries * sizeof(Oid);
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TABPURGE);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, len);
-
- msg.m_nentries = 0;
- }
- }
-
- /*
- * Send the rest
- */
- if (msg.m_nentries > 0)
- {
- len = offsetof(PgStat_MsgTabpurge, m_tableid[0])
- + msg.m_nentries * sizeof(Oid);
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TABPURGE);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, len);
- }
-
- /* Clean up */
- hash_destroy(htab);
-
- /*
- * Now repeat the above steps for functions.  However, we needn't bother
- * in the common case where no function stats are being collected.
- */
- if (dbentry->functions != NULL &&
- hash_get_num_entries(dbentry->functions) > 0)
- {
- htab = pgstat_collect_oids(ProcedureRelationId, Anum_pg_proc_oid);
-
- pgstat_setheader(&f_msg.m_hdr, PGSTAT_MTYPE_FUNCPURGE);
- f_msg.m_databaseid = MyDatabaseId;
- f_msg.m_nentries = 0;
-
- hash_seq_init(&hstat, dbentry->functions);
- while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&hstat)) != NULL)
- {
- Oid funcid = funcentry->functionid;
-
- CHECK_FOR_INTERRUPTS();
-
- if (hash_search(htab, (void *) &funcid, HASH_FIND, NULL) != NULL)
- continue;
-
- /*
- * Not there, so add this function's Oid to the message
- */
- f_msg.m_functionid[f_msg.m_nentries++] = funcid;
-
- /*
- * If the message is full, send it out and reinitialize to empty
- */
- if (f_msg.m_nentries >= PGSTAT_NUM_FUNCPURGE)
- {
- len = offsetof(PgStat_MsgFuncpurge, m_functionid[0])
- + f_msg.m_nentries * sizeof(Oid);
-
- pgstat_send(&f_msg, len);
-
- f_msg.m_nentries = 0;
- }
- }
-
- /*
- * Send the rest
- */
- if (f_msg.m_nentries > 0)
- {
- len = offsetof(PgStat_MsgFuncpurge, m_functionid[0])
- + f_msg.m_nentries * sizeof(Oid);
-
- pgstat_send(&f_msg, len);
- }
-
- hash_destroy(htab);
- }
-}
-
-
-/* ----------
- * pgstat_collect_oids() -
- *
- * Collect the OIDs of all objects listed in the specified system catalog
- * into a temporary hash table.  Caller should hash_destroy the result
- * when done with it.  (However, we make the table in CurrentMemoryContext
- * so that it will be freed properly in event of an error.)
- * ----------
- */
-static HTAB *
-pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid)
-{
- HTAB   *htab;
- HASHCTL hash_ctl;
- Relation rel;
- HeapScanDesc scan;
- HeapTuple tup;
- Snapshot snapshot;
-
- memset(&hash_ctl, 0, sizeof(hash_ctl));
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(Oid);
- hash_ctl.hcxt = CurrentMemoryContext;
- htab = hash_create("Temporary table of OIDs",
-   PGSTAT_TAB_HASH_SIZE,
-   &hash_ctl,
-   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
- rel = heap_open(catalogid, AccessShareLock);
- snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = heap_beginscan(rel, snapshot, 0, NULL);
- while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
- {
- Oid thisoid;
- bool isnull;
-
- thisoid = heap_getattr(tup, anum_oid, RelationGetDescr(rel), &isnull);
- Assert(!isnull);
-
- CHECK_FOR_INTERRUPTS();
-
- (void) hash_search(htab, (void *) &thisoid, HASH_ENTER, NULL);
- }
- heap_endscan(scan);
- UnregisterSnapshot(snapshot);
- heap_close(rel, AccessShareLock);
-
- return htab;
-}
-
-
-/* ----------
- * pgstat_drop_database() -
- *
- * Tell the collector that we just dropped a database.
- * (If the message gets lost, we will still clean the dead DB eventually
- * via future invocations of pgstat_vacuum_stat().)
- * ----------
- */
-void
-pgstat_drop_database(Oid databaseid)
-{
- PgStat_MsgDropdb msg;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_DROPDB);
- msg.m_databaseid = databaseid;
- pgstat_send(&msg, sizeof(msg));
-}
-
-
-/* ----------
- * pgstat_drop_relation() -
- *
- * Tell the collector that we just dropped a relation.
- * (If the message gets lost, we will still clean the dead entry eventually
- * via future invocations of pgstat_vacuum_stat().)
- *
- * Currently not used for lack of any good place to call it; we rely
- * entirely on pgstat_vacuum_stat() to clean out stats for dead rels.
- * ----------
- */
-#ifdef NOT_USED
-void
-pgstat_drop_relation(Oid relid)
-{
- PgStat_MsgTabpurge msg;
- int len;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- msg.m_tableid[0] = relid;
- msg.m_nentries = 1;
-
- len = offsetof(PgStat_MsgTabpurge, m_tableid[0]) + sizeof(Oid);
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TABPURGE);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, len);
-}
-#endif /* NOT_USED */
-
-
-/* ----------
- * pgstat_reset_counters() -
- *
- * Tell the statistics collector to reset counters for our database.
- *
- * Permission checking for this function is managed through the normal
- * GRANT system.
- * ----------
- */
-void
-pgstat_reset_counters(void)
-{
- PgStat_MsgResetcounter msg;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETCOUNTER);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* ----------
- * pgstat_reset_shared_counters() -
- *
- * Tell the statistics collector to reset cluster-wide shared counters.
- *
- * Permission checking for this function is managed through the normal
- * GRANT system.
- * ----------
- */
-void
-pgstat_reset_shared_counters(const char *target)
-{
- PgStat_MsgResetsharedcounter msg;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- if (strcmp(target, "archiver") == 0)
- msg.m_resettarget = RESET_ARCHIVER;
- else if (strcmp(target, "bgwriter") == 0)
- msg.m_resettarget = RESET_BGWRITER;
- else
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized reset target: \"%s\"", target),
- errhint("Target must be \"archiver\" or \"bgwriter\".")));
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* ----------
- * pgstat_reset_single_counter() -
- *
- * Tell the statistics collector to reset a single counter.
- *
- * Permission checking for this function is managed through the normal
- * GRANT system.
- * ----------
- */
-void
-pgstat_reset_single_counter(Oid objoid, PgStat_Single_Reset_Type type)
-{
- PgStat_MsgResetsinglecounter msg;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSINGLECOUNTER);
- msg.m_databaseid = MyDatabaseId;
- msg.m_resettype = type;
- msg.m_objectid = objoid;
-
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* ----------
- * pgstat_report_autovac() -
- *
- * Called from autovacuum.c to report startup of an autovacuum process.
- * We are called before InitPostgres is done, so can't rely on MyDatabaseId;
- * the db OID must be passed in, instead.
- * ----------
- */
-void
-pgstat_report_autovac(Oid dboid)
-{
- PgStat_MsgAutovacStart msg;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_AUTOVAC_START);
- msg.m_databaseid = dboid;
- msg.m_start_time = GetCurrentTimestamp();
-
- pgstat_send(&msg, sizeof(msg));
-}
-
-
-/* ---------
- * pgstat_report_vacuum() -
- *
- * Tell the collector about the table we just vacuumed.
- * ---------
- */
-void
-pgstat_report_vacuum(Oid tableoid, bool shared,
- PgStat_Counter livetuples, PgStat_Counter deadtuples)
-{
- PgStat_MsgVacuum msg;
-
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_VACUUM);
- msg.m_databaseid = shared ? InvalidOid : MyDatabaseId;
- msg.m_tableoid = tableoid;
- msg.m_autovacuum = IsAutoVacuumWorkerProcess();
- msg.m_vacuumtime = GetCurrentTimestamp();
- msg.m_live_tuples = livetuples;
- msg.m_dead_tuples = deadtuples;
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* --------
- * pgstat_report_analyze() -
- *
- * Tell the collector about the table we just analyzed.
- *
- * Caller must provide new live- and dead-tuples estimates, as well as a
- * flag indicating whether to reset the changes_since_analyze counter.
- * --------
- */
-void
-pgstat_report_analyze(Relation rel,
-  PgStat_Counter livetuples, PgStat_Counter deadtuples,
-  bool resetcounter)
-{
- PgStat_MsgAnalyze msg;
-
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
- return;
-
- /*
- * Unlike VACUUM, ANALYZE might be running inside a transaction that has
- * already inserted and/or deleted rows in the target table. ANALYZE will
- * have counted such rows as live or dead respectively. Because we will
- * report our counts of such rows at transaction end, we should subtract
- * off these counts from what we send to the collector now, else they'll
- * be double-counted after commit.  (This approach also ensures that the
- * collector ends up with the right numbers if we abort instead of
- * committing.)
- */
- if (rel->pgstat_info != NULL)
- {
- PgStat_TableXactStatus *trans;
-
- for (trans = rel->pgstat_info->trans; trans; trans = trans->upper)
- {
- livetuples -= trans->tuples_inserted - trans->tuples_deleted;
- deadtuples -= trans->tuples_updated + trans->tuples_deleted;
- }
- /* count stuff inserted by already-aborted subxacts, too */
- deadtuples -= rel->pgstat_info->t_counts.t_delta_dead_tuples;
- /* Since ANALYZE's counts are estimates, we could have underflowed */
- livetuples = Max(livetuples, 0);
- deadtuples = Max(deadtuples, 0);
- }
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_ANALYZE);
- msg.m_databaseid = rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId;
- msg.m_tableoid = RelationGetRelid(rel);
- msg.m_autovacuum = IsAutoVacuumWorkerProcess();
- msg.m_resetcounter = resetcounter;
- msg.m_analyzetime = GetCurrentTimestamp();
- msg.m_live_tuples = livetuples;
- msg.m_dead_tuples = deadtuples;
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* --------
- * pgstat_report_recovery_conflict() -
- *
- * Tell the collector about a Hot Standby recovery conflict.
- * --------
- */
-void
-pgstat_report_recovery_conflict(int reason)
-{
- PgStat_MsgRecoveryConflict msg;
-
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RECOVERYCONFLICT);
- msg.m_databaseid = MyDatabaseId;
- msg.m_reason = reason;
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* --------
- * pgstat_report_deadlock() -
- *
- * Tell the collector about a deadlock detected.
- * --------
- */
-void
-pgstat_report_deadlock(void)
-{
- PgStat_MsgDeadlock msg;
-
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_DEADLOCK);
- msg.m_databaseid = MyDatabaseId;
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* --------
- * pgstat_report_tempfile() -
- *
- * Tell the collector about a temporary file.
- * --------
- */
-void
-pgstat_report_tempfile(size_t filesize)
-{
- PgStat_MsgTempFile msg;
-
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TEMPFILE);
- msg.m_databaseid = MyDatabaseId;
- msg.m_filesize = filesize;
- pgstat_send(&msg, sizeof(msg));
-}
-
-
-/* ----------
- * pgstat_ping() -
- *
- * Send some junk data to the collector to increase traffic.
- * ----------
- */
-void
-pgstat_ping(void)
-{
- PgStat_MsgDummy msg;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_DUMMY);
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* ----------
- * pgstat_send_inquiry() -
- *
- * Notify collector that we need fresh data.
- * ----------
- */
-static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
-{
- PgStat_MsgInquiry msg;
-
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
- msg.clock_time = clock_time;
- msg.cutoff_time = cutoff_time;
- msg.databaseid = databaseid;
- pgstat_send(&msg, sizeof(msg));
-}
-
-
-/*
- * Initialize function call usage data.
- * Called by the executor before invoking a function.
- */
-void
-pgstat_init_function_usage(FunctionCallInfoData *fcinfo,
-   PgStat_FunctionCallUsage *fcu)
-{
- PgStat_BackendFunctionEntry *htabent;
- bool found;
-
- if (pgstat_track_functions <= fcinfo->flinfo->fn_stats)
- {
- /* stats not wanted */
- fcu->fs = NULL;
- return;
- }
-
- if (!pgStatFunctions)
- {
- /* First time through - initialize function stat table */
- HASHCTL hash_ctl;
-
- memset(&hash_ctl, 0, sizeof(hash_ctl));
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_BackendFunctionEntry);
- pgStatFunctions = hash_create("Function stat entries",
-  PGSTAT_FUNCTION_HASH_SIZE,
-  &hash_ctl,
-  HASH_ELEM | HASH_BLOBS);
- }
-
- /* Get the stats entry for this function, create if necessary */
- htabent = hash_search(pgStatFunctions, &fcinfo->flinfo->fn_oid,
-  HASH_ENTER, &found);
- if (!found)
- MemSet(&htabent->f_counts, 0, sizeof(PgStat_FunctionCounts));
-
- fcu->fs = &htabent->f_counts;
-
- /* save stats for this function, later used to compensate for recursion */
- fcu->save_f_total_time = htabent->f_counts.f_total_time;
-
- /* save current backend-wide total time */
- fcu->save_total = total_func_time;
-
- /* get clock time as of function start */
- INSTR_TIME_SET_CURRENT(fcu->f_start);
-}
-
-/*
- * find_funcstat_entry - find any existing PgStat_BackendFunctionEntry entry
- * for specified function
- *
- * If no entry, return NULL, don't create a new one
- */
-PgStat_BackendFunctionEntry *
-find_funcstat_entry(Oid func_id)
-{
- if (pgStatFunctions == NULL)
- return NULL;
-
- return (PgStat_BackendFunctionEntry *) hash_search(pgStatFunctions,
-   (void *) &func_id,
-   HASH_FIND, NULL);
-}
-
-/*
- * Calculate function call usage and update stat counters.
- * Called by the executor after invoking a function.
- *
- * In the case of a set-returning function that runs in value-per-call mode,
- * we will see multiple pgstat_init_function_usage/pgstat_end_function_usage
- * calls for what the user considers a single call of the function.  The
- * finalize flag should be TRUE on the last call.
- */
-void
-pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, bool finalize)
-{
- PgStat_FunctionCounts *fs = fcu->fs;
- instr_time f_total;
- instr_time f_others;
- instr_time f_self;
-
- /* stats not wanted? */
- if (fs == NULL)
- return;
-
- /* total elapsed time in this function call */
- INSTR_TIME_SET_CURRENT(f_total);
- INSTR_TIME_SUBTRACT(f_total, fcu->f_start);
-
- /* self usage: elapsed minus anything already charged to other calls */
- f_others = total_func_time;
- INSTR_TIME_SUBTRACT(f_others, fcu->save_total);
- f_self = f_total;
- INSTR_TIME_SUBTRACT(f_self, f_others);
-
- /* update backend-wide total time */
- INSTR_TIME_ADD(total_func_time, f_self);
-
- /*
- * Compute the new f_total_time as the total elapsed time added to the
- * pre-call value of f_total_time.  This is necessary to avoid
- * double-counting any time taken by recursive calls of myself.  (We do
- * not need any similar kluge for self time, since that already excludes
- * any recursive calls.)
- */
- INSTR_TIME_ADD(f_total, fcu->save_f_total_time);
-
- /* update counters in function stats table */
- if (finalize)
- fs->f_numcalls++;
- fs->f_total_time = f_total;
- INSTR_TIME_ADD(fs->f_self_time, f_self);
-
- /* indicate that we have something to send */
- have_function_stats = true;
-}
-
-
-/* ----------
- * pgstat_initstats() -
- *
- * Initialize a relcache entry to count access statistics.
- * Called whenever a relation is opened.
- *
- * We assume that a relcache entry's pgstat_info field is zeroed by
- * relcache.c when the relcache entry is made; thereafter it is long-lived
- * data.  We can avoid repeated searches of the TabStatus arrays when the
- * same relation is touched repeatedly within a transaction.
- * ----------
- */
-void
-pgstat_initstats(Relation rel)
-{
- Oid rel_id = rel->rd_id;
- char relkind = rel->rd_rel->relkind;
-
- /* We only count stats for things that have storage */
- if (!(relkind == RELKIND_RELATION ||
-  relkind == RELKIND_MATVIEW ||
-  relkind == RELKIND_INDEX ||
-  relkind == RELKIND_TOASTVALUE ||
-  relkind == RELKIND_SEQUENCE))
- {
- rel->pgstat_info = NULL;
- return;
- }
-
- if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
- {
- /* We're not counting at all */
- rel->pgstat_info = NULL;
- return;
- }
-
- /*
- * If we already set up this relation in the current transaction, nothing
- * to do.
- */
- if (rel->pgstat_info != NULL &&
- rel->pgstat_info->t_id == rel_id)
- return;
-
- /* Else find or make the PgStat_TableStatus entry, and update link */
- rel->pgstat_info = get_tabstat_entry(rel_id, rel->rd_rel->relisshared);
-}
-
-/*
- * get_tabstat_entry - find or create a PgStat_TableStatus entry for rel
- */
-static PgStat_TableStatus *
-get_tabstat_entry(Oid rel_id, bool isshared)
-{
- TabStatHashEntry *hash_entry;
- PgStat_TableStatus *entry;
- TabStatusArray *tsa;
- bool found;
-
- /*
- * Create hash table if we don't have it already.
- */
- if (pgStatTabHash == NULL)
- {
- HASHCTL ctl;
-
- memset(&ctl, 0, sizeof(ctl));
- ctl.keysize = sizeof(Oid);
- ctl.entrysize = sizeof(TabStatHashEntry);
-
- pgStatTabHash = hash_create("pgstat TabStatusArray lookup hash table",
- TABSTAT_QUANTUM,
- &ctl,
- HASH_ELEM | HASH_BLOBS);
- }
-
- /*
- * Find an entry or create a new one.
- */
- hash_entry = hash_search(pgStatTabHash, &rel_id, HASH_ENTER, &found);
- if (!found)
- {
- /* initialize new entry with null pointer */
- hash_entry->tsa_entry = NULL;
- }
-
- /*
- * If entry is already valid, we're done.
- */
- if (hash_entry->tsa_entry)
- return hash_entry->tsa_entry;
-
- /*
- * Locate the first pgStatTabList entry with free space, making a new list
- * entry if needed.  Note that we could get an OOM failure here, but if so
- * we have left the hashtable and the list in a consistent state.
- */
- if (pgStatTabList == NULL)
- {
- /* Set up first pgStatTabList entry */
- pgStatTabList = (TabStatusArray *)
- MemoryContextAllocZero(TopMemoryContext,
-   sizeof(TabStatusArray));
- }
-
- tsa = pgStatTabList;
- while (tsa->tsa_used >= TABSTAT_QUANTUM)
- {
- if (tsa->tsa_next == NULL)
- tsa->tsa_next = (TabStatusArray *)
- MemoryContextAllocZero(TopMemoryContext,
-   sizeof(TabStatusArray));
- tsa = tsa->tsa_next;
- }
-
- /*
- * Allocate a PgStat_TableStatus entry within this list entry.  We assume
- * the entry was already zeroed, either at creation or after last use.
- */
- entry = &tsa->tsa_entries[tsa->tsa_used++];
- entry->t_id = rel_id;
- entry->t_shared = isshared;
-
- /*
- * Now we can fill the entry in pgStatTabHash.
- */
- hash_entry->tsa_entry = entry;
-
- return entry;
-}
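[Editor's note: the removed `get_tabstat_entry` above allocates per-relation status entries in fixed-size chunks of `TABSTAT_QUANTUM` entries, with a separate hash table for O(1) lookup by relation OID. A small Python model of that allocation scheme (illustrative names, not PostgreSQL code; the C code walks the chunk list from the head, but since this sketch is append-only, checking the last chunk is equivalent):]

```python
TABSTAT_QUANTUM = 100          # entries per allocated chunk, as in the patch

class TabStatus:
    def __init__(self):
        self.chunks = []       # list of lists, each at most TABSTAT_QUANTUM long
        self.index = {}        # rel_id -> entry (models pgStatTabHash)

    def get_entry(self, rel_id, isshared=False):
        entry = self.index.get(rel_id)
        if entry is not None:
            return entry                     # already set up this transaction
        # locate a chunk with free space, appending a new one if needed
        if not self.chunks or len(self.chunks[-1]) >= TABSTAT_QUANTUM:
            self.chunks.append([])
        entry = {"t_id": rel_id, "t_shared": isshared, "counts": {}}
        self.chunks[-1].append(entry)
        self.index[rel_id] = entry           # fill the hash entry last, as the C code does
        return entry

ts = TabStatus()
a = ts.get_entry(16384)
assert ts.get_entry(16384) is a              # repeat lookup returns the same entry
for oid in range(200):
    ts.get_entry(20000 + oid)
assert all(len(c) <= TABSTAT_QUANTUM for c in ts.chunks)
```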
-
-/*
- * find_tabstat_entry - find any existing PgStat_TableStatus entry for rel
- *
- * If no entry, return NULL, don't create a new one
- *
- * Note: if we got an error in the most recent execution of pgstat_report_stat,
- * it's possible that an entry exists but there's no hashtable entry for it.
- * That's okay, we'll treat this case as "doesn't exist".
- */
-PgStat_TableStatus *
-find_tabstat_entry(Oid rel_id)
-{
- TabStatHashEntry *hash_entry;
-
- /* If hashtable doesn't exist, there are no entries at all */
- if (!pgStatTabHash)
- return NULL;
-
- hash_entry = hash_search(pgStatTabHash, &rel_id, HASH_FIND, NULL);
- if (!hash_entry)
- return NULL;
-
- /* Note that this step could also return NULL, but that's correct */
- return hash_entry->tsa_entry;
-}
-
-/*
- * get_tabstat_stack_level - add a new (sub)transaction stack entry if needed
- */
-static PgStat_SubXactStatus *
-get_tabstat_stack_level(int nest_level)
-{
- PgStat_SubXactStatus *xact_state;
-
- xact_state = pgStatXactStack;
- if (xact_state == NULL || xact_state->nest_level != nest_level)
- {
- xact_state = (PgStat_SubXactStatus *)
- MemoryContextAlloc(TopTransactionContext,
-   sizeof(PgStat_SubXactStatus));
- xact_state->nest_level = nest_level;
- xact_state->prev = pgStatXactStack;
- xact_state->first = NULL;
- pgStatXactStack = xact_state;
- }
- return xact_state;
-}
-
-/*
- * add_tabstat_xact_level - add a new (sub)transaction state record
- */
-static void
-add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level)
-{
- PgStat_SubXactStatus *xact_state;
- PgStat_TableXactStatus *trans;
-
- /*
- * If this is the first rel to be modified at the current nest level, we
- * first have to push a transaction stack entry.
- */
- xact_state = get_tabstat_stack_level(nest_level);
-
- /* Now make a per-table stack entry */
- trans = (PgStat_TableXactStatus *)
- MemoryContextAllocZero(TopTransactionContext,
-   sizeof(PgStat_TableXactStatus));
- trans->nest_level = nest_level;
- trans->upper = pgstat_info->trans;
- trans->parent = pgstat_info;
- trans->next = xact_state->first;
- xact_state->first = trans;
- pgstat_info->trans = trans;
-}
-
-/*
- * pgstat_count_heap_insert - count a tuple insertion of n tuples
- */
-void
-pgstat_count_heap_insert(Relation rel, PgStat_Counter n)
-{
- PgStat_TableStatus *pgstat_info = rel->pgstat_info;
-
- if (pgstat_info != NULL)
- {
- /* We have to log the effect at the proper transactional level */
- int nest_level = GetCurrentTransactionNestLevel();
-
- if (pgstat_info->trans == NULL ||
- pgstat_info->trans->nest_level != nest_level)
- add_tabstat_xact_level(pgstat_info, nest_level);
-
- pgstat_info->trans->tuples_inserted += n;
- }
-}
-
-/*
- * pgstat_count_heap_update - count a tuple update
- */
-void
-pgstat_count_heap_update(Relation rel, bool hot)
-{
- PgStat_TableStatus *pgstat_info = rel->pgstat_info;
-
- if (pgstat_info != NULL)
- {
- /* We have to log the effect at the proper transactional level */
- int nest_level = GetCurrentTransactionNestLevel();
-
- if (pgstat_info->trans == NULL ||
- pgstat_info->trans->nest_level != nest_level)
- add_tabstat_xact_level(pgstat_info, nest_level);
-
- pgstat_info->trans->tuples_updated++;
-
- /* t_tuples_hot_updated is nontransactional, so just advance it */
- if (hot)
- pgstat_info->t_counts.t_tuples_hot_updated++;
- }
-}
-
-/*
- * pgstat_count_heap_delete - count a tuple deletion
- */
-void
-pgstat_count_heap_delete(Relation rel)
-{
- PgStat_TableStatus *pgstat_info = rel->pgstat_info;
-
- if (pgstat_info != NULL)
- {
- /* We have to log the effect at the proper transactional level */
- int nest_level = GetCurrentTransactionNestLevel();
-
- if (pgstat_info->trans == NULL ||
- pgstat_info->trans->nest_level != nest_level)
- add_tabstat_xact_level(pgstat_info, nest_level);
-
- pgstat_info->trans->tuples_deleted++;
- }
-}
-
-/*
- * pgstat_truncate_save_counters
- *
- * Whenever a table is truncated, we save its i/u/d counters so that they can
- * be cleared, and if the (sub)xact that executed the truncate later aborts,
- * the counters can be restored to the saved (pre-truncate) values.  Note we do
- * this on the first truncate in any particular subxact level only.
- */
-static void
-pgstat_truncate_save_counters(PgStat_TableXactStatus *trans)
-{
- if (!trans->truncated)
- {
- trans->inserted_pre_trunc = trans->tuples_inserted;
- trans->updated_pre_trunc = trans->tuples_updated;
- trans->deleted_pre_trunc = trans->tuples_deleted;
- trans->truncated = true;
- }
-}
-
-/*
- * pgstat_truncate_restore_counters - restore counters when a truncate aborts
- */
-static void
-pgstat_truncate_restore_counters(PgStat_TableXactStatus *trans)
-{
- if (trans->truncated)
- {
- trans->tuples_inserted = trans->inserted_pre_trunc;
- trans->tuples_updated = trans->updated_pre_trunc;
- trans->tuples_deleted = trans->deleted_pre_trunc;
- }
-}
-
-/*
- * pgstat_count_truncate - update tuple counters due to truncate
- */
-void
-pgstat_count_truncate(Relation rel)
-{
- PgStat_TableStatus *pgstat_info = rel->pgstat_info;
-
- if (pgstat_info != NULL)
- {
- /* We have to log the effect at the proper transactional level */
- int nest_level = GetCurrentTransactionNestLevel();
-
- if (pgstat_info->trans == NULL ||
- pgstat_info->trans->nest_level != nest_level)
- add_tabstat_xact_level(pgstat_info, nest_level);
-
- pgstat_truncate_save_counters(pgstat_info->trans);
- pgstat_info->trans->tuples_inserted = 0;
- pgstat_info->trans->tuples_updated = 0;
- pgstat_info->trans->tuples_deleted = 0;
- }
-}
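[Editor's note: the truncate bookkeeping removed above follows a save/zero/restore pattern: the first TRUNCATE at a given (sub)transaction level saves the running insert/update/delete counters and zeroes them, and an abort restores the saved values. A Python sketch of just that pattern (a model, not PostgreSQL code):]

```python
def count_truncate(trans):
    if not trans["truncated"]:                       # save only once per level
        trans["pre_trunc"] = (trans["inserted"], trans["updated"], trans["deleted"])
        trans["truncated"] = True
    trans["inserted"] = trans["updated"] = trans["deleted"] = 0

def restore_on_abort(trans):
    if trans["truncated"]:                           # put back pre-truncate values
        trans["inserted"], trans["updated"], trans["deleted"] = trans["pre_trunc"]

t = {"inserted": 7, "updated": 2, "deleted": 1, "truncated": False}
count_truncate(t)
assert (t["inserted"], t["updated"], t["deleted"]) == (0, 0, 0)
t["inserted"] = 3                                    # inserts after the truncate
count_truncate(t)                                    # second truncate: no re-save
assert t["pre_trunc"] == (7, 2, 1)
restore_on_abort(t)                                  # aborted xact: counters come back
assert t["inserted"] == 7
```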
-
-/*
- * pgstat_update_heap_dead_tuples - update dead-tuples count
- *
- * The semantics of this are that we are reporting the nontransactional
- * recovery of "delta" dead tuples; so t_delta_dead_tuples decreases
- * rather than increasing, and the change goes straight into the per-table
- * counter, not into transactional state.
- */
-void
-pgstat_update_heap_dead_tuples(Relation rel, int delta)
-{
- PgStat_TableStatus *pgstat_info = rel->pgstat_info;
-
- if (pgstat_info != NULL)
- pgstat_info->t_counts.t_delta_dead_tuples -= delta;
-}
-
-
-/* ----------
- * AtEOXact_PgStat
- *
- * Called from access/transam/xact.c at top-level transaction commit/abort.
- * ----------
- */
-void
-AtEOXact_PgStat(bool isCommit)
-{
- PgStat_SubXactStatus *xact_state;
-
- /*
- * Count transaction commit or abort.  (We use counters, not just bools,
- * in case the reporting message isn't sent right away.)
- */
- if (isCommit)
- pgStatXactCommit++;
- else
- pgStatXactRollback++;
-
- /*
- * Transfer transactional insert/update counts into the base tabstat
- * entries.  We don't bother to free any of the transactional state, since
- * it's all in TopTransactionContext and will go away anyway.
- */
- xact_state = pgStatXactStack;
- if (xact_state != NULL)
- {
- PgStat_TableXactStatus *trans;
-
- Assert(xact_state->nest_level == 1);
- Assert(xact_state->prev == NULL);
- for (trans = xact_state->first; trans != NULL; trans = trans->next)
- {
- PgStat_TableStatus *tabstat;
-
- Assert(trans->nest_level == 1);
- Assert(trans->upper == NULL);
- tabstat = trans->parent;
- Assert(tabstat->trans == trans);
- /* restore pre-truncate stats (if any) in case of aborted xact */
- if (!isCommit)
- pgstat_truncate_restore_counters(trans);
- /* count attempted actions regardless of commit/abort */
- tabstat->t_counts.t_tuples_inserted += trans->tuples_inserted;
- tabstat->t_counts.t_tuples_updated += trans->tuples_updated;
- tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
- if (isCommit)
- {
- tabstat->t_counts.t_truncated = trans->truncated;
- if (trans->truncated)
- {
- /* forget live/dead stats seen by backend thus far */
- tabstat->t_counts.t_delta_live_tuples = 0;
- tabstat->t_counts.t_delta_dead_tuples = 0;
- }
- /* insert adds a live tuple, delete removes one */
- tabstat->t_counts.t_delta_live_tuples +=
- trans->tuples_inserted - trans->tuples_deleted;
- /* update and delete each create a dead tuple */
- tabstat->t_counts.t_delta_dead_tuples +=
- trans->tuples_updated + trans->tuples_deleted;
- /* insert, update, delete each count as one change event */
- tabstat->t_counts.t_changed_tuples +=
- trans->tuples_inserted + trans->tuples_updated +
- trans->tuples_deleted;
- }
- else
- {
- /* inserted tuples are dead, deleted tuples are unaffected */
- tabstat->t_counts.t_delta_dead_tuples +=
- trans->tuples_inserted + trans->tuples_updated;
- /* an aborted xact generates no changed_tuple events */
- }
- tabstat->trans = NULL;
- }
- }
- pgStatXactStack = NULL;
-
- /* Make sure any stats snapshot is thrown away */
- pgstat_clear_snapshot();
-}
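[Editor's note: the per-table rollup in the removed `AtEOXact_PgStat` is easy to lose in the diff. Attempted actions always count; the live/dead-tuple deltas depend on commit vs. abort and on truncation. A Python sketch of that arithmetic (a model of the logic above, not PostgreSQL code):]

```python
def roll_up(counts, trans, is_commit):
    # attempted actions count regardless of commit/abort
    counts["tuples_inserted"] += trans["inserted"]
    counts["tuples_updated"]  += trans["updated"]
    counts["tuples_deleted"]  += trans["deleted"]
    if is_commit:
        if trans["truncated"]:
            counts["delta_live"] = 0     # truncate wipes live/dead seen so far
            counts["delta_dead"] = 0
        counts["delta_live"] += trans["inserted"] - trans["deleted"]
        counts["delta_dead"] += trans["updated"] + trans["deleted"]
        counts["changed"]    += trans["inserted"] + trans["updated"] + trans["deleted"]
    else:
        # aborted: inserted/updated tuples become dead; no changed_tuples events
        counts["delta_dead"] += trans["inserted"] + trans["updated"]
    return counts

base = {"tuples_inserted": 0, "tuples_updated": 0, "tuples_deleted": 0,
        "delta_live": 0, "delta_dead": 0, "changed": 0}
t = {"inserted": 10, "updated": 4, "deleted": 2, "truncated": False}
c = roll_up(dict(base), t, is_commit=True)
assert c["delta_live"] == 8 and c["delta_dead"] == 6 and c["changed"] == 16
a = roll_up(dict(base), t, is_commit=False)
assert a["delta_live"] == 0 and a["delta_dead"] == 14 and a["changed"] == 0
```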
-
-/* ----------
- * AtEOSubXact_PgStat
- *
- * Called from access/transam/xact.c at subtransaction commit/abort.
- * ----------
- */
-void
-AtEOSubXact_PgStat(bool isCommit, int nestDepth)
-{
- PgStat_SubXactStatus *xact_state;
-
- /*
- * Transfer transactional insert/update counts into the next higher
- * subtransaction state.
- */
- xact_state = pgStatXactStack;
- if (xact_state != NULL &&
- xact_state->nest_level >= nestDepth)
- {
- PgStat_TableXactStatus *trans;
- PgStat_TableXactStatus *next_trans;
-
- /* delink xact_state from stack immediately to simplify reuse case */
- pgStatXactStack = xact_state->prev;
-
- for (trans = xact_state->first; trans != NULL; trans = next_trans)
- {
- PgStat_TableStatus *tabstat;
-
- next_trans = trans->next;
- Assert(trans->nest_level == nestDepth);
- tabstat = trans->parent;
- Assert(tabstat->trans == trans);
- if (isCommit)
- {
- if (trans->upper && trans->upper->nest_level == nestDepth - 1)
- {
- if (trans->truncated)
- {
- /* propagate the truncate status one level up */
- pgstat_truncate_save_counters(trans->upper);
- /* replace upper xact stats with ours */
- trans->upper->tuples_inserted = trans->tuples_inserted;
- trans->upper->tuples_updated = trans->tuples_updated;
- trans->upper->tuples_deleted = trans->tuples_deleted;
- }
- else
- {
- trans->upper->tuples_inserted += trans->tuples_inserted;
- trans->upper->tuples_updated += trans->tuples_updated;
- trans->upper->tuples_deleted += trans->tuples_deleted;
- }
- tabstat->trans = trans->upper;
- pfree(trans);
- }
- else
- {
- /*
- * When there isn't an immediate parent state, we can just
- * reuse the record instead of going through a
- * palloc/pfree pushup (this works since it's all in
- * TopTransactionContext anyway).  We have to re-link it
- * into the parent level, though, and that might mean
- * pushing a new entry into the pgStatXactStack.
- */
- PgStat_SubXactStatus *upper_xact_state;
-
- upper_xact_state = get_tabstat_stack_level(nestDepth - 1);
- trans->next = upper_xact_state->first;
- upper_xact_state->first = trans;
- trans->nest_level = nestDepth - 1;
- }
- }
- else
- {
- /*
- * On abort, update top-level tabstat counts, then forget the
- * subtransaction
- */
-
- /* first restore values obliterated by truncate */
- pgstat_truncate_restore_counters(trans);
- /* count attempted actions regardless of commit/abort */
- tabstat->t_counts.t_tuples_inserted += trans->tuples_inserted;
- tabstat->t_counts.t_tuples_updated += trans->tuples_updated;
- tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
- /* inserted tuples are dead, deleted tuples are unaffected */
- tabstat->t_counts.t_delta_dead_tuples +=
- trans->tuples_inserted + trans->tuples_updated;
- tabstat->trans = trans->upper;
- pfree(trans);
- }
- }
- pfree(xact_state);
- }
-}
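[Editor's note: the subtransaction-commit branch of the removed `AtEOSubXact_PgStat` has one subtle rule: a truncate inside the subxact *replaces* the parent level's counters (after saving them once) rather than adding to them. A Python sketch of that merge rule (a model, not PostgreSQL code):]

```python
def merge_on_subcommit(upper, trans):
    if trans["truncated"]:
        # propagate truncate status one level up, saving pre-truncate values once
        if not upper["truncated"]:
            upper["pre_trunc"] = (upper["inserted"], upper["updated"], upper["deleted"])
            upper["truncated"] = True
        upper["inserted"] = trans["inserted"]    # replace, not add
        upper["updated"]  = trans["updated"]
        upper["deleted"]  = trans["deleted"]
    else:
        upper["inserted"] += trans["inserted"]   # plain accumulation
        upper["updated"]  += trans["updated"]
        upper["deleted"]  += trans["deleted"]
    return upper

up = {"inserted": 5, "updated": 1, "deleted": 0, "truncated": False}
merge_on_subcommit(up, {"inserted": 2, "updated": 0, "deleted": 1, "truncated": False})
assert (up["inserted"], up["deleted"]) == (7, 1)
merge_on_subcommit(up, {"inserted": 3, "updated": 0, "deleted": 0, "truncated": True})
assert up["inserted"] == 3 and up["pre_trunc"][0] == 7
```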
-
-
-/*
- * AtPrepare_PgStat
- * Save the transactional stats state at 2PC transaction prepare.
- *
- * In this phase we just generate 2PC records for all the pending
- * transaction-dependent stats work.
- */
-void
-AtPrepare_PgStat(void)
-{
- PgStat_SubXactStatus *xact_state;
-
- xact_state = pgStatXactStack;
- if (xact_state != NULL)
- {
- PgStat_TableXactStatus *trans;
-
- Assert(xact_state->nest_level == 1);
- Assert(xact_state->prev == NULL);
- for (trans = xact_state->first; trans != NULL; trans = trans->next)
- {
- PgStat_TableStatus *tabstat;
- TwoPhasePgStatRecord record;
-
- Assert(trans->nest_level == 1);
- Assert(trans->upper == NULL);
- tabstat = trans->parent;
- Assert(tabstat->trans == trans);
-
- record.tuples_inserted = trans->tuples_inserted;
- record.tuples_updated = trans->tuples_updated;
- record.tuples_deleted = trans->tuples_deleted;
- record.inserted_pre_trunc = trans->inserted_pre_trunc;
- record.updated_pre_trunc = trans->updated_pre_trunc;
- record.deleted_pre_trunc = trans->deleted_pre_trunc;
- record.t_id = tabstat->t_id;
- record.t_shared = tabstat->t_shared;
- record.t_truncated = trans->truncated;
-
- RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
-   &record, sizeof(TwoPhasePgStatRecord));
- }
- }
-}
-
-/*
- * PostPrepare_PgStat
- * Clean up after successful PREPARE.
- *
- * All we need do here is unlink the transaction stats state from the
- * nontransactional state.  The nontransactional action counts will be
- * reported to the stats collector immediately, while the effects on live
- * and dead tuple counts are preserved in the 2PC state file.
- *
- * Note: AtEOXact_PgStat is not called during PREPARE.
- */
-void
-PostPrepare_PgStat(void)
-{
- PgStat_SubXactStatus *xact_state;
-
- /*
- * We don't bother to free any of the transactional state, since it's all
- * in TopTransactionContext and will go away anyway.
- */
- xact_state = pgStatXactStack;
- if (xact_state != NULL)
- {
- PgStat_TableXactStatus *trans;
-
- for (trans = xact_state->first; trans != NULL; trans = trans->next)
- {
- PgStat_TableStatus *tabstat;
-
- tabstat = trans->parent;
- tabstat->trans = NULL;
- }
- }
- pgStatXactStack = NULL;
-
- /* Make sure any stats snapshot is thrown away */
- pgstat_clear_snapshot();
-}
-
-/*
- * 2PC processing routine for COMMIT PREPARED case.
- *
- * Load the saved counts into our local pgstats state.
- */
-void
-pgstat_twophase_postcommit(TransactionId xid, uint16 info,
-   void *recdata, uint32 len)
-{
- TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
- PgStat_TableStatus *pgstat_info;
-
- /* Find or create a tabstat entry for the rel */
- pgstat_info = get_tabstat_entry(rec->t_id, rec->t_shared);
-
- /* Same math as in AtEOXact_PgStat, commit case */
- pgstat_info->t_counts.t_tuples_inserted += rec->tuples_inserted;
- pgstat_info->t_counts.t_tuples_updated += rec->tuples_updated;
- pgstat_info->t_counts.t_tuples_deleted += rec->tuples_deleted;
- pgstat_info->t_counts.t_truncated = rec->t_truncated;
- if (rec->t_truncated)
- {
- /* forget live/dead stats seen by backend thus far */
- pgstat_info->t_counts.t_delta_live_tuples = 0;
- pgstat_info->t_counts.t_delta_dead_tuples = 0;
- }
- pgstat_info->t_counts.t_delta_live_tuples +=
- rec->tuples_inserted - rec->tuples_deleted;
- pgstat_info->t_counts.t_delta_dead_tuples +=
- rec->tuples_updated + rec->tuples_deleted;
- pgstat_info->t_counts.t_changed_tuples +=
- rec->tuples_inserted + rec->tuples_updated +
- rec->tuples_deleted;
-}
-
-/*
- * 2PC processing routine for ROLLBACK PREPARED case.
- *
- * Load the saved counts into our local pgstats state, but treat them
- * as aborted.
- */
-void
-pgstat_twophase_postabort(TransactionId xid, uint16 info,
-  void *recdata, uint32 len)
-{
- TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
- PgStat_TableStatus *pgstat_info;
-
- /* Find or create a tabstat entry for the rel */
- pgstat_info = get_tabstat_entry(rec->t_id, rec->t_shared);
-
- /* Same math as in AtEOXact_PgStat, abort case */
- if (rec->t_truncated)
- {
- rec->tuples_inserted = rec->inserted_pre_trunc;
- rec->tuples_updated = rec->updated_pre_trunc;
- rec->tuples_deleted = rec->deleted_pre_trunc;
- }
- pgstat_info->t_counts.t_tuples_inserted += rec->tuples_inserted;
- pgstat_info->t_counts.t_tuples_updated += rec->tuples_updated;
- pgstat_info->t_counts.t_tuples_deleted += rec->tuples_deleted;
- pgstat_info->t_counts.t_delta_dead_tuples +=
- rec->tuples_inserted + rec->tuples_updated;
-}
-
-
-/* ----------
- * pgstat_fetch_stat_dbentry() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * the collected statistics for one database or NULL. NULL doesn't mean
- * that the database doesn't exist, it is just not yet known by the
- * collector, so the caller is better off to report ZERO instead.
- * ----------
- */
-PgStat_StatDBEntry *
-pgstat_fetch_stat_dbentry(Oid dbid)
-{
- /*
- * If not done for this transaction, read the statistics collector stats
- * file into some hash tables.
- */
- backend_read_statsfile();
-
- /*
- * Lookup the requested database; return NULL if not found
- */
- return (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
-  (void *) &dbid,
-  HASH_FIND, NULL);
-}
-
-
-/* ----------
- * pgstat_fetch_stat_tabentry() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * the collected statistics for one table or NULL. NULL doesn't mean
- * that the table doesn't exist, it is just not yet known by the
- * collector, so the caller is better off to report ZERO instead.
- * ----------
- */
-PgStat_StatTabEntry *
-pgstat_fetch_stat_tabentry(Oid relid)
-{
- Oid dbid;
- PgStat_StatDBEntry *dbentry;
- PgStat_StatTabEntry *tabentry;
-
- /*
- * If not done for this transaction, read the statistics collector stats
- * file into some hash tables.
- */
- backend_read_statsfile();
-
- /*
- * Lookup our database, then look in its table hash table.
- */
- dbid = MyDatabaseId;
- dbentry = (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
- (void *) &dbid,
- HASH_FIND, NULL);
- if (dbentry != NULL && dbentry->tables != NULL)
- {
- tabentry = (PgStat_StatTabEntry *) hash_search(dbentry->tables,
-   (void *) &relid,
-   HASH_FIND, NULL);
- if (tabentry)
- return tabentry;
- }
-
- /*
- * If we didn't find it, maybe it's a shared table.
- */
- dbid = InvalidOid;
- dbentry = (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
- (void *) &dbid,
- HASH_FIND, NULL);
- if (dbentry != NULL && dbentry->tables != NULL)
- {
- tabentry = (PgStat_StatTabEntry *) hash_search(dbentry->tables,
-   (void *) &relid,
-   HASH_FIND, NULL);
- if (tabentry)
- return tabentry;
- }
-
- return NULL;
-}
-
-
-/* ----------
- * pgstat_fetch_stat_funcentry() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * the collected statistics for one function or NULL.
- * ----------
- */
-PgStat_StatFuncEntry *
-pgstat_fetch_stat_funcentry(Oid func_id)
-{
- PgStat_StatDBEntry *dbentry;
- PgStat_StatFuncEntry *funcentry = NULL;
-
- /* load the stats file if needed */
- backend_read_statsfile();
-
- /* Lookup our database, then find the requested function.  */
- dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
- if (dbentry != NULL && dbentry->functions != NULL)
- {
- funcentry = (PgStat_StatFuncEntry *) hash_search(dbentry->functions,
- (void *) &func_id,
- HASH_FIND, NULL);
- }
-
- return funcentry;
-}
-
-
-/* ----------
- * pgstat_fetch_stat_beentry() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * our local copy of the current-activity entry for one backend.
- *
- * NB: caller is responsible for a check if the user is permitted to see
- * this info (especially the querystring).
- * ----------
- */
-PgBackendStatus *
-pgstat_fetch_stat_beentry(int beid)
-{
- pgstat_read_current_status();
-
- if (beid < 1 || beid > localNumBackends)
- return NULL;
-
- return &localBackendStatusTable[beid - 1].backendStatus;
-}
-
-
-/* ----------
- * pgstat_fetch_stat_local_beentry() -
- *
- * Like pgstat_fetch_stat_beentry() but with locally computed additions (like
- * xid and xmin values of the backend)
- *
- * NB: caller is responsible for a check if the user is permitted to see
- * this info (especially the querystring).
- * ----------
- */
-LocalPgBackendStatus *
-pgstat_fetch_stat_local_beentry(int beid)
-{
- pgstat_read_current_status();
-
- if (beid < 1 || beid > localNumBackends)
- return NULL;
-
- return &localBackendStatusTable[beid - 1];
-}
-
-
-/* ----------
- * pgstat_fetch_stat_numbackends() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * the maximum current backend id.
- * ----------
- */
-int
-pgstat_fetch_stat_numbackends(void)
-{
- pgstat_read_current_status();
-
- return localNumBackends;
-}
-
-/*
- * ---------
- * pgstat_fetch_stat_archiver() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * a pointer to the archiver statistics struct.
- * ---------
- */
-PgStat_ArchiverStats *
-pgstat_fetch_stat_archiver(void)
-{
- backend_read_statsfile();
-
- return &archiverStats;
-}
-
-
-/*
- * ---------
- * pgstat_fetch_global() -
- *
- * Support function for the SQL-callable pgstat* functions. Returns
- * a pointer to the global statistics struct.
- * ---------
- */
-PgStat_GlobalStats *
-pgstat_fetch_global(void)
-{
- backend_read_statsfile();
-
- return &globalStats;
-}
-
-
-/* ------------------------------------------------------------
- * Functions for management of the shared-memory PgBackendStatus array
- * ------------------------------------------------------------
- */
-
-static PgBackendStatus *BackendStatusArray = NULL;
-static PgBackendStatus *MyBEEntry = NULL;
-static char *BackendAppnameBuffer = NULL;
-static char *BackendClientHostnameBuffer = NULL;
-static char *BackendActivityBuffer = NULL;
-static Size BackendActivityBufferSize = 0;
-#ifdef USE_SSL
-static PgBackendSSLStatus *BackendSslStatusBuffer = NULL;
-#endif
-
-
-/*
- * Report shared-memory space needed by CreateSharedBackendStatus.
- */
-Size
-BackendStatusShmemSize(void)
-{
- Size size;
-
- /* BackendStatusArray: */
- size = mul_size(sizeof(PgBackendStatus), NumBackendStatSlots);
- /* BackendAppnameBuffer: */
- size = add_size(size,
- mul_size(NAMEDATALEN, NumBackendStatSlots));
- /* BackendClientHostnameBuffer: */
- size = add_size(size,
- mul_size(NAMEDATALEN, NumBackendStatSlots));
- /* BackendActivityBuffer: */
- size = add_size(size,
- mul_size(pgstat_track_activity_query_size, NumBackendStatSlots));
-#ifdef USE_SSL
- /* BackendSslStatusBuffer: */
- size = add_size(size,
- mul_size(sizeof(PgBackendSSLStatus), NumBackendStatSlots));
-#endif
- return size;
-}
-
-/*
- * Initialize the shared status array and several string buffers
- * during postmaster startup.
- */
-void
-CreateSharedBackendStatus(void)
-{
- Size size;
- bool found;
- int i;
- char   *buffer;
-
- /* Create or attach to the shared array */
- size = mul_size(sizeof(PgBackendStatus), NumBackendStatSlots);
- BackendStatusArray = (PgBackendStatus *)
- ShmemInitStruct("Backend Status Array", size, &found);
-
- if (!found)
- {
- /*
- * We're the first - initialize.
- */
- MemSet(BackendStatusArray, 0, size);
- }
-
- /* Create or attach to the shared appname buffer */
- size = mul_size(NAMEDATALEN, NumBackendStatSlots);
- BackendAppnameBuffer = (char *)
- ShmemInitStruct("Backend Application Name Buffer", size, &found);
-
- if (!found)
- {
- MemSet(BackendAppnameBuffer, 0, size);
-
- /* Initialize st_appname pointers. */
- buffer = BackendAppnameBuffer;
- for (i = 0; i < NumBackendStatSlots; i++)
- {
- BackendStatusArray[i].st_appname = buffer;
- buffer += NAMEDATALEN;
- }
- }
-
- /* Create or attach to the shared client hostname buffer */
- size = mul_size(NAMEDATALEN, NumBackendStatSlots);
- BackendClientHostnameBuffer = (char *)
- ShmemInitStruct("Backend Client Host Name Buffer", size, &found);
-
- if (!found)
- {
- MemSet(BackendClientHostnameBuffer, 0, size);
-
- /* Initialize st_clienthostname pointers. */
- buffer = BackendClientHostnameBuffer;
- for (i = 0; i < NumBackendStatSlots; i++)
- {
- BackendStatusArray[i].st_clienthostname = buffer;
- buffer += NAMEDATALEN;
- }
- }
-
- /* Create or attach to the shared activity buffer */
- BackendActivityBufferSize = mul_size(pgstat_track_activity_query_size,
- NumBackendStatSlots);
- BackendActivityBuffer = (char *)
- ShmemInitStruct("Backend Activity Buffer",
- BackendActivityBufferSize,
- &found);
-
- if (!found)
- {
- MemSet(BackendActivityBuffer, 0, BackendActivityBufferSize);
-
- /* Initialize st_activity pointers. */
- buffer = BackendActivityBuffer;
- for (i = 0; i < NumBackendStatSlots; i++)
- {
- BackendStatusArray[i].st_activity_raw = buffer;
- buffer += pgstat_track_activity_query_size;
- }
- }
-
-#ifdef USE_SSL
- /* Create or attach to the shared SSL status buffer */
- size = mul_size(sizeof(PgBackendSSLStatus), NumBackendStatSlots);
- BackendSslStatusBuffer = (PgBackendSSLStatus *)
- ShmemInitStruct("Backend SSL Status Buffer", size, &found);
-
- if (!found)
- {
- PgBackendSSLStatus *ptr;
-
- MemSet(BackendSslStatusBuffer, 0, size);
-
- /* Initialize st_sslstatus pointers. */
- ptr = BackendSslStatusBuffer;
- for (i = 0; i < NumBackendStatSlots; i++)
- {
- BackendStatusArray[i].st_sslstatus = ptr;
- ptr++;
- }
- }
-#endif
-}
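[Editor's note: the shared-memory layout built by the removed `BackendStatusShmemSize`/`CreateSharedBackendStatus` is one status struct per slot plus parallel string buffers, each carved into fixed per-slot slices (`NAMEDATALEN` for names, the `track_activity_query_size` GUC for query text). A Python model of the size computation and slicing; the `sizeof_status` value is purely illustrative, not the real struct size:]

```python
NAMEDATALEN = 64                   # PostgreSQL's compiled-in name length
track_activity_query_size = 1024   # GUC default

def backend_status_shmem_size(num_slots, sizeof_status=400, sizeof_ssl=0):
    size = sizeof_status * num_slots                 # BackendStatusArray
    size += NAMEDATALEN * num_slots                  # appname buffer
    size += NAMEDATALEN * num_slots                  # client hostname buffer
    size += track_activity_query_size * num_slots    # activity buffer
    size += sizeof_ssl * num_slots                   # SSL status, if built with SSL
    return size

def slice_buffer(num_slots, slot_size):
    # slot i's string lives at a fixed offset i * slot_size within one big buffer
    return [i * slot_size for i in range(num_slots)]

offsets = slice_buffer(4, NAMEDATALEN)
assert offsets == [0, 64, 128, 192]
assert backend_status_shmem_size(4) == (400 + 64 + 64 + 1024) * 4
```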
-
-
-/* ----------
- * pgstat_initialize() -
- *
- * Initialize pgstats state, and set up our on-proc-exit hook.
- * Called from InitPostgres and AuxiliaryProcessMain. For auxiliary process,
- * MyBackendId is invalid. Otherwise, MyBackendId must be set,
- * but we must not have started any transaction yet (since the
- * exit hook must run after the last transaction exit).
- * NOTE: MyDatabaseId isn't set yet; so the shutdown hook has to be careful.
- * ----------
- */
-void
-pgstat_initialize(void)
-{
- /* Initialize MyBEEntry */
- if (MyBackendId != InvalidBackendId)
- {
- Assert(MyBackendId >= 1 && MyBackendId <= MaxBackends);
- MyBEEntry = &BackendStatusArray[MyBackendId - 1];
- }
- else
- {
- /* Must be an auxiliary process */
- Assert(MyAuxProcType != NotAnAuxProcess);
-
- /*
- * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
- * have a BackendId, the slot is statically allocated based on the
- * auxiliary process type (MyAuxProcType).  Backends use slots indexed
- * in the range from 1 to MaxBackends (inclusive), so we use
- * MaxBackends + AuxBackendType + 1 as the index of the slot for an
- * auxiliary process.
- */
- MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
- }
-
- /* Set up a process-exit hook to clean up */
- on_shmem_exit(pgstat_beshutdown_hook, 0);
-}
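[Editor's note: the removed `pgstat_initialize` maps each process to a `BackendStatusArray` slot: regular backends use `MyBackendId - 1` (BackendIds start at 1) and auxiliary processes are statically placed after them at `MaxBackends + MyAuxProcType` (0-based array offsets). A small Python model of that indexing; `NUM_AUX_TYPES` is an illustrative count, not the real one:]

```python
MAX_BACKENDS = 100
NUM_AUX_TYPES = 4            # illustrative; the real count depends on the build

def status_slot(backend_id=None, aux_type=None):
    if backend_id is not None:
        assert 1 <= backend_id <= MAX_BACKENDS       # backends are 1-based
        return backend_id - 1
    assert aux_type is not None and 0 <= aux_type < NUM_AUX_TYPES
    return MAX_BACKENDS + aux_type                   # aux slots follow backend slots

total_slots = MAX_BACKENDS + NUM_AUX_TYPES           # models NumBackendStatSlots
assert status_slot(backend_id=1) == 0
assert status_slot(backend_id=MAX_BACKENDS) == MAX_BACKENDS - 1
assert status_slot(aux_type=0) == MAX_BACKENDS
assert status_slot(aux_type=NUM_AUX_TYPES - 1) == total_slots - 1
```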
-
-/* ----------
- * pgstat_bestart() -
- *
- * Initialize this backend's entry in the PgBackendStatus array.
- * Called from InitPostgres.
- *
- * Apart from auxiliary processes, MyBackendId, MyDatabaseId,
- * session userid, and application_name must be set for a
- * backend (hence, this cannot be combined with pgstat_initialize).
- * ----------
- */
-void
-pgstat_bestart(void)
-{
- SockAddr clientaddr;
- volatile PgBackendStatus *beentry;
-
- /*
- * To minimize the time spent modifying the PgBackendStatus entry, fetch
- * all the needed data first.
- */
-
- /*
- * We may not have a MyProcPort (eg, if this is the autovacuum process).
- * If so, use all-zeroes client address, which is dealt with specially in
- * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
- */
- if (MyProcPort)
- memcpy(&clientaddr, &MyProcPort->raddr, sizeof(clientaddr));
- else
- MemSet(&clientaddr, 0, sizeof(clientaddr));
-
- /*
- * Initialize my status entry, following the protocol of bumping
- * st_changecount before and after; and make sure it's even afterwards. We
- * use a volatile pointer here to ensure the compiler doesn't try to get
- * cute.
- */
- beentry = MyBEEntry;
-
- /* pgstats state must be initialized from pgstat_initialize() */
- Assert(beentry != NULL);
-
- if (MyBackendId != InvalidBackendId)
- {
- if (IsAutoVacuumLauncherProcess())
- {
- /* Autovacuum Launcher */
- beentry->st_backendType = B_AUTOVAC_LAUNCHER;
- }
- else if (IsAutoVacuumWorkerProcess())
- {
- /* Autovacuum Worker */
- beentry->st_backendType = B_AUTOVAC_WORKER;
- }
- else if (am_walsender)
- {
- /* Wal sender */
- beentry->st_backendType = B_WAL_SENDER;
- }
- else if (IsBackgroundWorker)
- {
- /* bgworker */
- beentry->st_backendType = B_BG_WORKER;
- }
- else
- {
- /* client-backend */
- beentry->st_backendType = B_BACKEND;
- }
- }
- else
- {
- /* Must be an auxiliary process */
- Assert(MyAuxProcType != NotAnAuxProcess);
- switch (MyAuxProcType)
- {
- case StartupProcess:
- beentry->st_backendType = B_STARTUP;
- break;
- case BgWriterProcess:
- beentry->st_backendType = B_BG_WRITER;
- break;
- case ArchiverProcess:
- beentry->st_backendType = B_ARCHIVER;
- break;
- case CheckpointerProcess:
- beentry->st_backendType = B_CHECKPOINTER;
- break;
- case WalWriterProcess:
- beentry->st_backendType = B_WAL_WRITER;
- break;
- case WalReceiverProcess:
- beentry->st_backendType = B_WAL_RECEIVER;
- break;
- default:
- elog(FATAL, "unrecognized process type: %d",
- (int) MyAuxProcType);
- proc_exit(1);
- }
- }
-
- do
- {
- pgstat_increment_changecount_before(beentry);
- } while ((beentry->st_changecount & 1) == 0);
-
- beentry->st_procpid = MyProcPid;
- beentry->st_proc_start_timestamp = MyStartTimestamp;
- beentry->st_activity_start_timestamp = 0;
- beentry->st_state_start_timestamp = 0;
- beentry->st_xact_start_timestamp = 0;
- beentry->st_databaseid = MyDatabaseId;
-
- /* We have userid for client-backends, wal-sender and bgworker processes */
- if (beentry->st_backendType == B_BACKEND
- || beentry->st_backendType == B_WAL_SENDER
- || beentry->st_backendType == B_BG_WORKER)
- beentry->st_userid = GetSessionUserId();
- else
- beentry->st_userid = InvalidOid;
-
- beentry->st_clientaddr = clientaddr;
- if (MyProcPort && MyProcPort->remote_hostname)
- strlcpy(beentry->st_clienthostname, MyProcPort->remote_hostname,
- NAMEDATALEN);
- else
- beentry->st_clienthostname[0] = '\0';
-#ifdef USE_SSL
- if (MyProcPort && MyProcPort->ssl != NULL)
- {
- beentry->st_ssl = true;
- beentry->st_sslstatus->ssl_bits = be_tls_get_cipher_bits(MyProcPort);
- beentry->st_sslstatus->ssl_compression = be_tls_get_compression(MyProcPort);
- strlcpy(beentry->st_sslstatus->ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
- strlcpy(beentry->st_sslstatus->ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
- be_tls_get_peerdn_name(MyProcPort, beentry->st_sslstatus->ssl_clientdn, NAMEDATALEN);
- }
- else
- {
- beentry->st_ssl = false;
- }
-#else
- beentry->st_ssl = false;
-#endif
- beentry->st_state = STATE_UNDEFINED;
- beentry->st_appname[0] = '\0';
- beentry->st_activity_raw[0] = '\0';
- /* Also make sure the last byte in each string area is always 0 */
- beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
- beentry->st_appname[NAMEDATALEN - 1] = '\0';
- beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
- beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
- beentry->st_progress_command_target = InvalidOid;
-
- /*
- * we don't zero st_progress_param here to save cycles; nobody should
- * examine it until st_progress_command has been set to something other
- * than PROGRESS_COMMAND_INVALID
- */
-
- pgstat_increment_changecount_after(beentry);
-
- /* Update app name to current GUC setting */
- if (application_name)
- pgstat_report_appname(application_name);
-}
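[Editor's note: the `st_changecount` protocol that brackets the removed `pgstat_bestart` update is a seqlock-style scheme: the writer makes the counter odd before modifying the entry and even again after, and a reader retries until it sees the same even value before and after its copy. A single-threaded Python sketch of the protocol (a model, not PostgreSQL code):]

```python
class Entry:
    def __init__(self):
        self.changecount = 0
        self.fields = {}

def write(entry, updates):
    entry.changecount += 1           # now odd: "update in progress"
    assert entry.changecount % 2 == 1
    entry.fields.update(updates)
    entry.changecount += 1           # even again: entry is consistent
    assert entry.changecount % 2 == 0

def read_consistent(entry):
    while True:
        before = entry.changecount
        snapshot = dict(entry.fields)
        after = entry.changecount
        if before == after and before % 2 == 0:
            return snapshot          # no writer raced with our copy

e = Entry()
write(e, {"st_procpid": 1234, "st_state": "active"})
assert read_consistent(e) == {"st_procpid": 1234, "st_state": "active"}
assert e.changecount == 2
```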
-
-/*
- * Shut down a single backend's statistics reporting at process exit.
- *
- * Flush any remaining statistics counts out to the collector.
- * Without this, operations triggered during backend exit (such as
- * temp table deletions) won't be counted.
- *
- * Lastly, clear out our entry in the PgBackendStatus array.
- */
-static void
-pgstat_beshutdown_hook(int code, Datum arg)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
-
- /*
- * If we got as far as discovering our own database ID, we can report what
- * we did to the collector.  Otherwise, we'd be sending an invalid
- * database ID, so forget it.  (This means that accesses to pg_database
- * during failed backend starts might never get counted.)
- */
- if (OidIsValid(MyDatabaseId))
- pgstat_report_stat(true);
-
- /*
- * Clear my status entry, following the protocol of bumping st_changecount
- * before and after.  We use a volatile pointer here to ensure the
- * compiler doesn't try to get cute.
- */
- pgstat_increment_changecount_before(beentry);
-
- beentry->st_procpid = 0; /* mark invalid */
-
- pgstat_increment_changecount_after(beentry);
-}
-
-
-/* ----------
- * pgstat_report_activity() -
- *
- * Called from tcop/postgres.c to report what the backend is actually doing
- * (but note cmd_str can be NULL for certain cases).
- *
- * All updates of the status entry follow the protocol of bumping
- * st_changecount before and after.  We use a volatile pointer here to
- * ensure the compiler doesn't try to get cute.
- * ----------
- */
-void
-pgstat_report_activity(BackendState state, const char *cmd_str)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
- TimestampTz start_timestamp;
- TimestampTz current_timestamp;
- int len = 0;
-
- TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
-
- if (!beentry)
- return;
-
- if (!pgstat_track_activities)
- {
- if (beentry->st_state != STATE_DISABLED)
- {
- volatile PGPROC *proc = MyProc;
-
- /*
- * track_activities is disabled, but we last reported a
- * non-disabled state.  As our final update, change the state and
- * clear fields we will not be updating anymore.
- */
- pgstat_increment_changecount_before(beentry);
- beentry->st_state = STATE_DISABLED;
- beentry->st_state_start_timestamp = 0;
- beentry->st_activity_raw[0] = '\0';
- beentry->st_activity_start_timestamp = 0;
- /* st_xact_start_timestamp and wait_event_info are also disabled */
- beentry->st_xact_start_timestamp = 0;
- proc->wait_event_info = 0;
- pgstat_increment_changecount_after(beentry);
- }
- return;
- }
-
- /*
- * To minimize the time spent modifying the entry, fetch all the needed
- * data first.
- */
- start_timestamp = GetCurrentStatementStartTimestamp();
- if (cmd_str != NULL)
- {
- /*
- * Compute length of to-be-stored string unaware of multi-byte
- * characters. For speed reasons that'll get corrected on read, rather
- * than computed every write.
- */
- len = Min(strlen(cmd_str), pgstat_track_activity_query_size - 1);
- }
- current_timestamp = GetCurrentTimestamp();
-
- /*
- * Now update the status entry
- */
- pgstat_increment_changecount_before(beentry);
-
- beentry->st_state = state;
- beentry->st_state_start_timestamp = current_timestamp;
-
- if (cmd_str != NULL)
- {
- memcpy((char *) beentry->st_activity_raw, cmd_str, len);
- beentry->st_activity_raw[len] = '\0';
- beentry->st_activity_start_timestamp = start_timestamp;
- }
-
- pgstat_increment_changecount_after(beentry);
-}
-
-/*-----------
- * pgstat_progress_start_command() -
- *
- * Set st_progress_command (and st_progress_command_target) in own backend
- * entry.  Also, zero-initialize st_progress_param array.
- *-----------
- */
-void
-pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
-
- if (!beentry || !pgstat_track_activities)
- return;
-
- pgstat_increment_changecount_before(beentry);
- beentry->st_progress_command = cmdtype;
- beentry->st_progress_command_target = relid;
- MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
- pgstat_increment_changecount_after(beentry);
-}
-
-/*-----------
- * pgstat_progress_update_param() -
- *
- * Update index'th member in st_progress_param[] of own backend entry.
- *-----------
- */
-void
-pgstat_progress_update_param(int index, int64 val)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
-
- Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
-
- if (!beentry || !pgstat_track_activities)
- return;
-
- pgstat_increment_changecount_before(beentry);
- beentry->st_progress_param[index] = val;
- pgstat_increment_changecount_after(beentry);
-}
-
-/*-----------
- * pgstat_progress_update_multi_param() -
- *
- * Update multiple members in st_progress_param[] of own backend entry.
- * This is atomic; readers won't see intermediate states.
- *-----------
- */
-void
-pgstat_progress_update_multi_param(int nparam, const int *index,
-   const int64 *val)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
- int i;
-
- if (!beentry || !pgstat_track_activities || nparam == 0)
- return;
-
- pgstat_increment_changecount_before(beentry);
-
- for (i = 0; i < nparam; ++i)
- {
- Assert(index[i] >= 0 && index[i] < PGSTAT_NUM_PROGRESS_PARAM);
-
- beentry->st_progress_param[index[i]] = val[i];
- }
-
- pgstat_increment_changecount_after(beentry);
-}
-
-/*-----------
- * pgstat_progress_end_command() -
- *
- * Reset st_progress_command (and st_progress_command_target) in own backend
- * entry.  This signals the end of the command.
- *-----------
- */
-void
-pgstat_progress_end_command(void)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
-
- if (!beentry)
- return;
- if (!pgstat_track_activities
- && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
- return;
-
- pgstat_increment_changecount_before(beentry);
- beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
- beentry->st_progress_command_target = InvalidOid;
- pgstat_increment_changecount_after(beentry);
-}
-
-/* ----------
- * pgstat_report_appname() -
- *
- * Called to update our application name.
- * ----------
- */
-void
-pgstat_report_appname(const char *appname)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
- int len;
-
- if (!beentry)
- return;
-
- /* This should be unnecessary if GUC did its job, but be safe */
- len = pg_mbcliplen(appname, strlen(appname), NAMEDATALEN - 1);
-
- /*
- * Update my status entry, following the protocol of bumping
- * st_changecount before and after.  We use a volatile pointer here to
- * ensure the compiler doesn't try to get cute.
- */
- pgstat_increment_changecount_before(beentry);
-
- memcpy((char *) beentry->st_appname, appname, len);
- beentry->st_appname[len] = '\0';
-
- pgstat_increment_changecount_after(beentry);
-}
-
-/*
- * Report current transaction start timestamp as the specified value.
- * Zero means there is no active transaction.
- */
-void
-pgstat_report_xact_timestamp(TimestampTz tstamp)
-{
- volatile PgBackendStatus *beentry = MyBEEntry;
-
- if (!pgstat_track_activities || !beentry)
- return;
-
- /*
- * Update my status entry, following the protocol of bumping
- * st_changecount before and after.  We use a volatile pointer here to
- * ensure the compiler doesn't try to get cute.
- */
- pgstat_increment_changecount_before(beentry);
- beentry->st_xact_start_timestamp = tstamp;
- pgstat_increment_changecount_after(beentry);
-}
-
-/* ----------
- * pgstat_read_current_status() -
- *
- * Copy the current contents of the PgBackendStatus array to local memory,
- * if not already done in this transaction.
- * ----------
- */
-static void
-pgstat_read_current_status(void)
-{
- volatile PgBackendStatus *beentry;
- LocalPgBackendStatus *localtable;
- LocalPgBackendStatus *localentry;
- char   *localappname,
-   *localclienthostname,
-   *localactivity;
-#ifdef USE_SSL
- PgBackendSSLStatus *localsslstatus;
-#endif
- int i;
-
- Assert(!pgStatRunningInCollector);
- if (localBackendStatusTable)
- return; /* already done */
-
- pgstat_setup_memcxt();
-
- localtable = (LocalPgBackendStatus *)
- MemoryContextAlloc(pgStatLocalContext,
-   sizeof(LocalPgBackendStatus) * NumBackendStatSlots);
- localappname = (char *)
- MemoryContextAlloc(pgStatLocalContext,
-   NAMEDATALEN * NumBackendStatSlots);
- localclienthostname = (char *)
- MemoryContextAlloc(pgStatLocalContext,
-   NAMEDATALEN * NumBackendStatSlots);
- localactivity = (char *)
- MemoryContextAlloc(pgStatLocalContext,
-   pgstat_track_activity_query_size * NumBackendStatSlots);
-#ifdef USE_SSL
- localsslstatus = (PgBackendSSLStatus *)
- MemoryContextAlloc(pgStatLocalContext,
-   sizeof(PgBackendSSLStatus) * NumBackendStatSlots);
-#endif
-
- localNumBackends = 0;
-
- beentry = BackendStatusArray;
- localentry = localtable;
- for (i = 1; i <= NumBackendStatSlots; i++)
- {
- /*
- * Follow the protocol of retrying if st_changecount changes while we
- * copy the entry, or if it's odd.  (The check for odd is needed to
- * cover the case where we are able to completely copy the entry while
- * the source backend is between increment steps.) We use a volatile
- * pointer here to ensure the compiler doesn't try to get cute.
- */
- for (;;)
- {
- int before_changecount;
- int after_changecount;
-
- pgstat_save_changecount_before(beentry, before_changecount);
-
- localentry->backendStatus.st_procpid = beentry->st_procpid;
- if (localentry->backendStatus.st_procpid > 0)
- {
- memcpy(&localentry->backendStatus, (char *) beentry, sizeof(PgBackendStatus));
-
- /*
- * strcpy is safe even if the string is modified concurrently,
- * because there's always a \0 at the end of the buffer.
- */
- strcpy(localappname, (char *) beentry->st_appname);
- localentry->backendStatus.st_appname = localappname;
- strcpy(localclienthostname, (char *) beentry->st_clienthostname);
- localentry->backendStatus.st_clienthostname = localclienthostname;
- strcpy(localactivity, (char *) beentry->st_activity_raw);
- localentry->backendStatus.st_activity_raw = localactivity;
- localentry->backendStatus.st_ssl = beentry->st_ssl;
-#ifdef USE_SSL
- if (beentry->st_ssl)
- {
- memcpy(localsslstatus, beentry->st_sslstatus, sizeof(PgBackendSSLStatus));
- localentry->backendStatus.st_sslstatus = localsslstatus;
- }
-#endif
- }
-
- pgstat_save_changecount_after(beentry, after_changecount);
- if (before_changecount == after_changecount &&
- (before_changecount & 1) == 0)
- break;
-
- /* Make sure we can break out of loop if stuck... */
- CHECK_FOR_INTERRUPTS();
- }
-
- beentry++;
- /* Only valid entries get included into the local array */
- if (localentry->backendStatus.st_procpid > 0)
- {
- BackendIdGetTransactionIds(i,
-   &localentry->backend_xid,
-   &localentry->backend_xmin);
-
- localentry++;
- localappname += NAMEDATALEN;
- localclienthostname += NAMEDATALEN;
- localactivity += pgstat_track_activity_query_size;
-#ifdef USE_SSL
- localsslstatus++;
-#endif
- localNumBackends++;
- }
- }
-
- /* Set the pointer only after completion of a valid table */
- localBackendStatusTable = localtable;
-}
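[Editor's note: the st_changecount handling in the reader loop above is a seqlock-style protocol: writers bump the counter to an odd value before modifying an entry and back to even afterwards, while readers retry whenever the counter changed during the copy or was odd. A minimal standalone sketch of that protocol, with illustrative names rather than the real pgstat macros:]

```c
#include <assert.h>
#include <string.h>

/* Illustrative entry with a seqlock-style change counter. */
typedef struct Entry
{
    volatile int changecount;   /* odd while a write is in progress */
    char         activity[64];
} Entry;

static void
writer_update(Entry *e, const char *s)
{
    e->changecount++;           /* now odd: readers must retry */
    strncpy(e->activity, s, sizeof(e->activity) - 1);
    e->activity[sizeof(e->activity) - 1] = '\0';
    e->changecount++;           /* even again: entry is consistent */
}

static void
reader_copy(const Entry *e, char *out, int outlen)
{
    for (;;)
    {
        int before = e->changecount;

        strncpy(out, e->activity, outlen - 1);
        out[outlen - 1] = '\0';

        int after = e->changecount;

        /* accept only if the counter is unchanged and even */
        if (before == after && (before & 1) == 0)
            return;
    }
}
```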
-
-/* ----------
- * pgstat_get_wait_event_type() -
- *
- * Return a string representing the current wait event type the backend
- * is waiting on.
- */
-const char *
-pgstat_get_wait_event_type(uint32 wait_event_info)
-{
- uint32 classId;
- const char *event_type;
-
- /* report process as not waiting. */
- if (wait_event_info == 0)
- return NULL;
-
- classId = wait_event_info & 0xFF000000;
-
- switch (classId)
- {
- case PG_WAIT_LWLOCK:
- event_type = "LWLock";
- break;
- case PG_WAIT_LOCK:
- event_type = "Lock";
- break;
- case PG_WAIT_BUFFER_PIN:
- event_type = "BufferPin";
- break;
- case PG_WAIT_ACTIVITY:
- event_type = "Activity";
- break;
- case PG_WAIT_CLIENT:
- event_type = "Client";
- break;
- case PG_WAIT_EXTENSION:
- event_type = "Extension";
- break;
- case PG_WAIT_IPC:
- event_type = "IPC";
- break;
- case PG_WAIT_TIMEOUT:
- event_type = "Timeout";
- break;
- case PG_WAIT_IO:
- event_type = "IO";
- break;
- default:
- event_type = "???";
- break;
- }
-
- return event_type;
-}
-
-/* ----------
- * pgstat_get_wait_event() -
- *
- * Return a string representing the current wait event the backend is
- * waiting on.
- */
-const char *
-pgstat_get_wait_event(uint32 wait_event_info)
-{
- uint32 classId;
- uint16 eventId;
- const char *event_name;
-
- /* report process as not waiting. */
- if (wait_event_info == 0)
- return NULL;
-
- classId = wait_event_info & 0xFF000000;
- eventId = wait_event_info & 0x0000FFFF;
-
- switch (classId)
- {
- case PG_WAIT_LWLOCK:
- event_name = GetLWLockIdentifier(classId, eventId);
- break;
- case PG_WAIT_LOCK:
- event_name = GetLockNameFromTagType(eventId);
- break;
- case PG_WAIT_BUFFER_PIN:
- event_name = "BufferPin";
- break;
- case PG_WAIT_ACTIVITY:
- {
- WaitEventActivity w = (WaitEventActivity) wait_event_info;
-
- event_name = pgstat_get_wait_activity(w);
- break;
- }
- case PG_WAIT_CLIENT:
- {
- WaitEventClient w = (WaitEventClient) wait_event_info;
-
- event_name = pgstat_get_wait_client(w);
- break;
- }
- case PG_WAIT_EXTENSION:
- event_name = "Extension";
- break;
- case PG_WAIT_IPC:
- {
- WaitEventIPC w = (WaitEventIPC) wait_event_info;
-
- event_name = pgstat_get_wait_ipc(w);
- break;
- }
- case PG_WAIT_TIMEOUT:
- {
- WaitEventTimeout w = (WaitEventTimeout) wait_event_info;
-
- event_name = pgstat_get_wait_timeout(w);
- break;
- }
- case PG_WAIT_IO:
- {
- WaitEventIO w = (WaitEventIO) wait_event_info;
-
- event_name = pgstat_get_wait_io(w);
- break;
- }
- default:
- event_name = "unknown wait event";
- break;
- }
-
- return event_name;
-}
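[Editor's note: as the masks above show, wait_event_info packs the wait class into the top byte (0xFF000000) and the event id into the low 16 bits (0x0000FFFF). A hedged sketch of that packing; the MY_WAIT_* constants are made up, the real PG_WAIT_* values live in pgstat.h:]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative class values standing in for the PG_WAIT_* constants. */
#define MY_WAIT_LWLOCK  0x01000000U
#define MY_WAIT_CLIENT  0x06000000U

/* Pack a class byte and a 16-bit event id into one uint32. */
static inline uint32_t
pack_wait_event(uint32_t classId, uint16_t eventId)
{
    return classId | eventId;
}

/* Extract the class, as pgstat_get_wait_event_type() does above. */
static inline uint32_t
wait_event_class(uint32_t info)
{
    return info & 0xFF000000U;
}

/* Extract the event id, as pgstat_get_wait_event() does above. */
static inline uint16_t
wait_event_id(uint32_t info)
{
    return (uint16_t) (info & 0x0000FFFFU);
}
```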
-
-/* ----------
- * pgstat_get_wait_activity() -
- *
- * Convert WaitEventActivity to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_activity(WaitEventActivity w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_ARCHIVER_MAIN:
- event_name = "ArchiverMain";
- break;
- case WAIT_EVENT_AUTOVACUUM_MAIN:
- event_name = "AutoVacuumMain";
- break;
- case WAIT_EVENT_BGWRITER_HIBERNATE:
- event_name = "BgWriterHibernate";
- break;
- case WAIT_EVENT_BGWRITER_MAIN:
- event_name = "BgWriterMain";
- break;
- case WAIT_EVENT_CHECKPOINTER_MAIN:
- event_name = "CheckpointerMain";
- break;
- case WAIT_EVENT_LOGICAL_APPLY_MAIN:
- event_name = "LogicalApplyMain";
- break;
- case WAIT_EVENT_LOGICAL_LAUNCHER_MAIN:
- event_name = "LogicalLauncherMain";
- break;
- case WAIT_EVENT_PGSTAT_MAIN:
- event_name = "PgStatMain";
- break;
- case WAIT_EVENT_RECOVERY_WAL_ALL:
- event_name = "RecoveryWalAll";
- break;
- case WAIT_EVENT_RECOVERY_WAL_STREAM:
- event_name = "RecoveryWalStream";
- break;
- case WAIT_EVENT_SYSLOGGER_MAIN:
- event_name = "SysLoggerMain";
- break;
- case WAIT_EVENT_WAL_RECEIVER_MAIN:
- event_name = "WalReceiverMain";
- break;
- case WAIT_EVENT_WAL_SENDER_MAIN:
- event_name = "WalSenderMain";
- break;
- case WAIT_EVENT_WAL_WRITER_MAIN:
- event_name = "WalWriterMain";
- break;
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-/* ----------
- * pgstat_get_wait_client() -
- *
- * Convert WaitEventClient to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_client(WaitEventClient w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_CLIENT_READ:
- event_name = "ClientRead";
- break;
- case WAIT_EVENT_CLIENT_WRITE:
- event_name = "ClientWrite";
- break;
- case WAIT_EVENT_LIBPQWALRECEIVER_CONNECT:
- event_name = "LibPQWalReceiverConnect";
- break;
- case WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE:
- event_name = "LibPQWalReceiverReceive";
- break;
- case WAIT_EVENT_SSL_OPEN_SERVER:
- event_name = "SSLOpenServer";
- break;
- case WAIT_EVENT_WAL_RECEIVER_WAIT_START:
- event_name = "WalReceiverWaitStart";
- break;
- case WAIT_EVENT_WAL_SENDER_WAIT_WAL:
- event_name = "WalSenderWaitForWAL";
- break;
- case WAIT_EVENT_WAL_SENDER_WRITE_DATA:
- event_name = "WalSenderWriteData";
- break;
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-/* ----------
- * pgstat_get_wait_ipc() -
- *
- * Convert WaitEventIPC to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_ipc(WaitEventIPC w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_BGWORKER_SHUTDOWN:
- event_name = "BgWorkerShutdown";
- break;
- case WAIT_EVENT_BGWORKER_STARTUP:
- event_name = "BgWorkerStartup";
- break;
- case WAIT_EVENT_BTREE_PAGE:
- event_name = "BtreePage";
- break;
- case WAIT_EVENT_CLOG_GROUP_UPDATE:
- event_name = "ClogGroupUpdate";
- break;
- case WAIT_EVENT_EXECUTE_GATHER:
- event_name = "ExecuteGather";
- break;
- case WAIT_EVENT_HASH_BATCH_ALLOCATING:
- event_name = "Hash/Batch/Allocating";
- break;
- case WAIT_EVENT_HASH_BATCH_ELECTING:
- event_name = "Hash/Batch/Electing";
- break;
- case WAIT_EVENT_HASH_BATCH_LOADING:
- event_name = "Hash/Batch/Loading";
- break;
- case WAIT_EVENT_HASH_BUILD_ALLOCATING:
- event_name = "Hash/Build/Allocating";
- break;
- case WAIT_EVENT_HASH_BUILD_ELECTING:
- event_name = "Hash/Build/Electing";
- break;
- case WAIT_EVENT_HASH_BUILD_HASHING_INNER:
- event_name = "Hash/Build/HashingInner";
- break;
- case WAIT_EVENT_HASH_BUILD_HASHING_OUTER:
- event_name = "Hash/Build/HashingOuter";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_ALLOCATING:
- event_name = "Hash/GrowBatches/Allocating";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_DECIDING:
- event_name = "Hash/GrowBatches/Deciding";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_ELECTING:
- event_name = "Hash/GrowBatches/Electing";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_FINISHING:
- event_name = "Hash/GrowBatches/Finishing";
- break;
- case WAIT_EVENT_HASH_GROW_BATCHES_REPARTITIONING:
- event_name = "Hash/GrowBatches/Repartitioning";
- break;
- case WAIT_EVENT_HASH_GROW_BUCKETS_ALLOCATING:
- event_name = "Hash/GrowBuckets/Allocating";
- break;
- case WAIT_EVENT_HASH_GROW_BUCKETS_ELECTING:
- event_name = "Hash/GrowBuckets/Electing";
- break;
- case WAIT_EVENT_HASH_GROW_BUCKETS_REINSERTING:
- event_name = "Hash/GrowBuckets/Reinserting";
- break;
- case WAIT_EVENT_LOGICAL_SYNC_DATA:
- event_name = "LogicalSyncData";
- break;
- case WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE:
- event_name = "LogicalSyncStateChange";
- break;
- case WAIT_EVENT_MQ_INTERNAL:
- event_name = "MessageQueueInternal";
- break;
- case WAIT_EVENT_MQ_PUT_MESSAGE:
- event_name = "MessageQueuePutMessage";
- break;
- case WAIT_EVENT_MQ_RECEIVE:
- event_name = "MessageQueueReceive";
- break;
- case WAIT_EVENT_MQ_SEND:
- event_name = "MessageQueueSend";
- break;
- case WAIT_EVENT_PARALLEL_BITMAP_SCAN:
- event_name = "ParallelBitmapScan";
- break;
- case WAIT_EVENT_PARALLEL_CREATE_INDEX_SCAN:
- event_name = "ParallelCreateIndexScan";
- break;
- case WAIT_EVENT_PARALLEL_FINISH:
- event_name = "ParallelFinish";
- break;
- case WAIT_EVENT_PROCARRAY_GROUP_UPDATE:
- event_name = "ProcArrayGroupUpdate";
- break;
- case WAIT_EVENT_PROMOTE:
- event_name = "Promote";
- break;
- case WAIT_EVENT_REPLICATION_ORIGIN_DROP:
- event_name = "ReplicationOriginDrop";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_DROP:
- event_name = "ReplicationSlotDrop";
- break;
- case WAIT_EVENT_SAFE_SNAPSHOT:
- event_name = "SafeSnapshot";
- break;
- case WAIT_EVENT_SYNC_REP:
- event_name = "SyncRep";
- break;
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-/* ----------
- * pgstat_get_wait_timeout() -
- *
- * Convert WaitEventTimeout to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_timeout(WaitEventTimeout w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_BASE_BACKUP_THROTTLE:
- event_name = "BaseBackupThrottle";
- break;
- case WAIT_EVENT_PG_SLEEP:
- event_name = "PgSleep";
- break;
- case WAIT_EVENT_RECOVERY_APPLY_DELAY:
- event_name = "RecoveryApplyDelay";
- break;
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-/* ----------
- * pgstat_get_wait_io() -
- *
- * Convert WaitEventIO to string.
- * ----------
- */
-static const char *
-pgstat_get_wait_io(WaitEventIO w)
-{
- const char *event_name = "unknown wait event";
-
- switch (w)
- {
- case WAIT_EVENT_BUFFILE_READ:
- event_name = "BufFileRead";
- break;
- case WAIT_EVENT_BUFFILE_WRITE:
- event_name = "BufFileWrite";
- break;
- case WAIT_EVENT_CONTROL_FILE_READ:
- event_name = "ControlFileRead";
- break;
- case WAIT_EVENT_CONTROL_FILE_SYNC:
- event_name = "ControlFileSync";
- break;
- case WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE:
- event_name = "ControlFileSyncUpdate";
- break;
- case WAIT_EVENT_CONTROL_FILE_WRITE:
- event_name = "ControlFileWrite";
- break;
- case WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE:
- event_name = "ControlFileWriteUpdate";
- break;
- case WAIT_EVENT_COPY_FILE_READ:
- event_name = "CopyFileRead";
- break;
- case WAIT_EVENT_COPY_FILE_WRITE:
- event_name = "CopyFileWrite";
- break;
- case WAIT_EVENT_DATA_FILE_EXTEND:
- event_name = "DataFileExtend";
- break;
- case WAIT_EVENT_DATA_FILE_FLUSH:
- event_name = "DataFileFlush";
- break;
- case WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC:
- event_name = "DataFileImmediateSync";
- break;
- case WAIT_EVENT_DATA_FILE_PREFETCH:
- event_name = "DataFilePrefetch";
- break;
- case WAIT_EVENT_DATA_FILE_READ:
- event_name = "DataFileRead";
- break;
- case WAIT_EVENT_DATA_FILE_SYNC:
- event_name = "DataFileSync";
- break;
- case WAIT_EVENT_DATA_FILE_TRUNCATE:
- event_name = "DataFileTruncate";
- break;
- case WAIT_EVENT_DATA_FILE_WRITE:
- event_name = "DataFileWrite";
- break;
- case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
- event_name = "DSMFillZeroWrite";
- break;
- case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
- event_name = "LockFileAddToDataDirRead";
- break;
- case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC:
- event_name = "LockFileAddToDataDirSync";
- break;
- case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE:
- event_name = "LockFileAddToDataDirWrite";
- break;
- case WAIT_EVENT_LOCK_FILE_CREATE_READ:
- event_name = "LockFileCreateRead";
- break;
- case WAIT_EVENT_LOCK_FILE_CREATE_SYNC:
- event_name = "LockFileCreateSync";
- break;
- case WAIT_EVENT_LOCK_FILE_CREATE_WRITE:
- event_name = "LockFileCreateWrite";
- break;
- case WAIT_EVENT_LOCK_FILE_RECHECKDATADIR_READ:
- event_name = "LockFileReCheckDataDirRead";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC:
- event_name = "LogicalRewriteCheckpointSync";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC:
- event_name = "LogicalRewriteMappingSync";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE:
- event_name = "LogicalRewriteMappingWrite";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_SYNC:
- event_name = "LogicalRewriteSync";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE:
- event_name = "LogicalRewriteTruncate";
- break;
- case WAIT_EVENT_LOGICAL_REWRITE_WRITE:
- event_name = "LogicalRewriteWrite";
- break;
- case WAIT_EVENT_RELATION_MAP_READ:
- event_name = "RelationMapRead";
- break;
- case WAIT_EVENT_RELATION_MAP_SYNC:
- event_name = "RelationMapSync";
- break;
- case WAIT_EVENT_RELATION_MAP_WRITE:
- event_name = "RelationMapWrite";
- break;
- case WAIT_EVENT_REORDER_BUFFER_READ:
- event_name = "ReorderBufferRead";
- break;
- case WAIT_EVENT_REORDER_BUFFER_WRITE:
- event_name = "ReorderBufferWrite";
- break;
- case WAIT_EVENT_REORDER_LOGICAL_MAPPING_READ:
- event_name = "ReorderLogicalMappingRead";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_READ:
- event_name = "ReplicationSlotRead";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC:
- event_name = "ReplicationSlotRestoreSync";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_SYNC:
- event_name = "ReplicationSlotSync";
- break;
- case WAIT_EVENT_REPLICATION_SLOT_WRITE:
- event_name = "ReplicationSlotWrite";
- break;
- case WAIT_EVENT_SLRU_FLUSH_SYNC:
- event_name = "SLRUFlushSync";
- break;
- case WAIT_EVENT_SLRU_READ:
- event_name = "SLRURead";
- break;
- case WAIT_EVENT_SLRU_SYNC:
- event_name = "SLRUSync";
- break;
- case WAIT_EVENT_SLRU_WRITE:
- event_name = "SLRUWrite";
- break;
- case WAIT_EVENT_SNAPBUILD_READ:
- event_name = "SnapbuildRead";
- break;
- case WAIT_EVENT_SNAPBUILD_SYNC:
- event_name = "SnapbuildSync";
- break;
- case WAIT_EVENT_SNAPBUILD_WRITE:
- event_name = "SnapbuildWrite";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_FILE_SYNC:
- event_name = "TimelineHistoryFileSync";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_FILE_WRITE:
- event_name = "TimelineHistoryFileWrite";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_READ:
- event_name = "TimelineHistoryRead";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_SYNC:
- event_name = "TimelineHistorySync";
- break;
- case WAIT_EVENT_TIMELINE_HISTORY_WRITE:
- event_name = "TimelineHistoryWrite";
- break;
- case WAIT_EVENT_TWOPHASE_FILE_READ:
- event_name = "TwophaseFileRead";
- break;
- case WAIT_EVENT_TWOPHASE_FILE_SYNC:
- event_name = "TwophaseFileSync";
- break;
- case WAIT_EVENT_TWOPHASE_FILE_WRITE:
- event_name = "TwophaseFileWrite";
- break;
- case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
- event_name = "WALSenderTimelineHistoryRead";
- break;
- case WAIT_EVENT_WAL_BOOTSTRAP_SYNC:
- event_name = "WALBootstrapSync";
- break;
- case WAIT_EVENT_WAL_BOOTSTRAP_WRITE:
- event_name = "WALBootstrapWrite";
- break;
- case WAIT_EVENT_WAL_COPY_READ:
- event_name = "WALCopyRead";
- break;
- case WAIT_EVENT_WAL_COPY_SYNC:
- event_name = "WALCopySync";
- break;
- case WAIT_EVENT_WAL_COPY_WRITE:
- event_name = "WALCopyWrite";
- break;
- case WAIT_EVENT_WAL_INIT_SYNC:
- event_name = "WALInitSync";
- break;
- case WAIT_EVENT_WAL_INIT_WRITE:
- event_name = "WALInitWrite";
- break;
- case WAIT_EVENT_WAL_READ:
- event_name = "WALRead";
- break;
- case WAIT_EVENT_WAL_SYNC:
- event_name = "WALSync";
- break;
- case WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN:
- event_name = "WALSyncMethodAssign";
- break;
- case WAIT_EVENT_WAL_WRITE:
- event_name = "WALWrite";
- break;
-
- /* no default case, so that compiler will warn */
- }
-
- return event_name;
-}
-
-
-/* ----------
- * pgstat_get_backend_current_activity() -
- *
- * Return a string representing the current activity of the backend with
- * the specified PID.  This looks directly at the BackendStatusArray,
- * and so will provide current information regardless of the age of our
- * transaction's snapshot of the status array.
- *
- * It is the caller's responsibility to invoke this only for backends whose
- * state is expected to remain stable while the result is in use.  The
- * only current use is in deadlock reporting, where we can expect that
- * the target backend is blocked on a lock.  (There are corner cases
- * where the target's wait could get aborted while we are looking at it,
- * but the very worst consequence is to return a pointer to a string
- * that's been changed, so we won't worry too much.)
- *
- * Note: return strings for special cases match pg_stat_get_backend_activity.
- * ----------
- */
-const char *
-pgstat_get_backend_current_activity(int pid, bool checkUser)
-{
- PgBackendStatus *beentry;
- int i;
-
- beentry = BackendStatusArray;
- for (i = 1; i <= MaxBackends; i++)
- {
- /*
- * Although we expect the target backend's entry to be stable, that
- * doesn't imply that anyone else's is.  To avoid identifying the
- * wrong backend, while we check for a match to the desired PID we
- * must follow the protocol of retrying if st_changecount changes
- * while we examine the entry, or if it's odd.  (This might be
- * unnecessary, since fetching or storing an int is almost certainly
- * atomic, but let's play it safe.)  We use a volatile pointer here to
- * ensure the compiler doesn't try to get cute.
- */
- volatile PgBackendStatus *vbeentry = beentry;
- bool found;
-
- for (;;)
- {
- int before_changecount;
- int after_changecount;
-
- pgstat_save_changecount_before(vbeentry, before_changecount);
-
- found = (vbeentry->st_procpid == pid);
-
- pgstat_save_changecount_after(vbeentry, after_changecount);
-
- if (before_changecount == after_changecount &&
- (before_changecount & 1) == 0)
- break;
-
- /* Make sure we can break out of loop if stuck... */
- CHECK_FOR_INTERRUPTS();
- }
-
- if (found)
- {
- /* Now it is safe to use the non-volatile pointer */
- if (checkUser && !superuser() && beentry->st_userid != GetUserId())
- return "<insufficient privilege>";
- else if (*(beentry->st_activity_raw) == '\0')
- return "<command string not enabled>";
- else
- {
- /* this'll leak a bit of memory, but that seems acceptable */
- return pgstat_clip_activity(beentry->st_activity_raw);
- }
- }
-
- beentry++;
- }
-
- /* If we get here, caller is in error ... */
- return "<backend information not available>";
-}
-
-/* ----------
- * pgstat_get_crashed_backend_activity() -
- *
- * Return a string representing the current activity of the backend with
- * the specified PID.  Like the function above, but reads shared memory with
- * the expectation that it may be corrupt.  On success, copy the string
- * into the "buffer" argument and return that pointer.  On failure,
- * return NULL.
- *
- * This function is only intended to be used by the postmaster to report the
- * query that crashed a backend.  In particular, no attempt is made to
- * follow the correct concurrency protocol when accessing the
- * BackendStatusArray.  But that's OK, in the worst case we'll return a
- * corrupted message.  We also must take care not to trip on ereport(ERROR).
- * ----------
- */
-const char *
-pgstat_get_crashed_backend_activity(int pid, char *buffer, int buflen)
-{
- volatile PgBackendStatus *beentry;
- int i;
-
- beentry = BackendStatusArray;
-
- /*
- * We probably shouldn't get here before shared memory has been set up,
- * but be safe.
- */
- if (beentry == NULL || BackendActivityBuffer == NULL)
- return NULL;
-
- for (i = 1; i <= MaxBackends; i++)
- {
- if (beentry->st_procpid == pid)
- {
- /* Read pointer just once, so it can't change after validation */
- const char *activity = beentry->st_activity_raw;
- const char *activity_last;
-
- /*
- * We mustn't access activity string before we verify that it
- * falls within the BackendActivityBuffer. To make sure that the
- * entire string including its ending is contained within the
- * buffer, subtract one activity length from the buffer size.
- */
- activity_last = BackendActivityBuffer + BackendActivityBufferSize
- - pgstat_track_activity_query_size;
-
- if (activity < BackendActivityBuffer ||
- activity > activity_last)
- return NULL;
-
- /* If no string available, no point in a report */
- if (activity[0] == '\0')
- return NULL;
-
- /*
- * Copy only ASCII-safe characters so we don't run into encoding
- * problems when reporting the message; and be sure not to run off
- * the end of memory.  As only ASCII characters are reported, it
- * doesn't seem necessary to perform multibyte aware clipping.
- */
- ascii_safe_strlcpy(buffer, activity,
-   Min(buflen, pgstat_track_activity_query_size));
-
- return buffer;
- }
-
- beentry++;
- }
-
- /* PID not found */
- return NULL;
-}
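[Editor's note: the crash-reporting path above must validate a pointer read from possibly-corrupt shared memory before dereferencing it. A small sketch of that range check under illustrative names (activity_ptr_is_valid is not a PostgreSQL function); subtracting one slot size from the buffer end guarantees the whole fixed-size activity slot fits:]

```c
#include <assert.h>
#include <stddef.h>

/* Return nonzero if reading slot_size bytes starting at p stays inside
 * buf[0 .. bufsize-1] -- the same test applied to st_activity_raw above. */
static int
activity_ptr_is_valid(const char *p, const char *buf,
                      size_t bufsize, size_t slot_size)
{
    const char *last_valid = buf + bufsize - slot_size;

    return p >= buf && p <= last_valid;
}
```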
-
-const char *
-pgstat_get_backend_desc(BackendType backendType)
-{
- const char *backendDesc = "unknown process type";
-
- switch (backendType)
- {
- case B_AUTOVAC_LAUNCHER:
- backendDesc = "autovacuum launcher";
- break;
- case B_AUTOVAC_WORKER:
- backendDesc = "autovacuum worker";
- break;
- case B_BACKEND:
- backendDesc = "client backend";
- break;
- case B_BG_WORKER:
- backendDesc = "background worker";
- break;
- case B_BG_WRITER:
- backendDesc = "background writer";
- break;
- case B_ARCHIVER:
- backendDesc = "archiver";
- break;
- case B_CHECKPOINTER:
- backendDesc = "checkpointer";
- break;
- case B_STARTUP:
- backendDesc = "startup";
- break;
- case B_WAL_RECEIVER:
- backendDesc = "walreceiver";
- break;
- case B_WAL_SENDER:
- backendDesc = "walsender";
- break;
- case B_WAL_WRITER:
- backendDesc = "walwriter";
- break;
- }
-
- return backendDesc;
-}
-
-/* ------------------------------------------------------------
- * Local support functions follow
- * ------------------------------------------------------------
- */
-
-
-/* ----------
- * pgstat_setheader() -
- *
- * Set common header fields in a statistics message
- * ----------
- */
-static void
-pgstat_setheader(PgStat_MsgHdr *hdr, StatMsgType mtype)
-{
- hdr->m_type = mtype;
-}
-
-
-/* ----------
- * pgstat_send() -
- *
- * Send out one statistics message to the collector
- * ----------
- */
-static void
-pgstat_send(void *msg, int len)
-{
- int rc;
-
- if (pgStatSock == PGINVALID_SOCKET)
- return;
-
- ((PgStat_MsgHdr *) msg)->m_size = len;
-
- /* We'll retry after EINTR, but ignore all other failures */
- do
- {
- rc = send(pgStatSock, msg, len, 0);
- } while (rc < 0 && errno == EINTR);
-
-#ifdef USE_ASSERT_CHECKING
- /* In debug builds, log send failures ... */
- if (rc < 0)
- elog(LOG, "could not send to statistics collector: %m");
-#endif
-}
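
[Editorial aside, not part of the patch] The `pgstat_send()` body removed above relies on the classic retry-on-EINTR idiom: loop while the syscall fails with `EINTR`, and silently give up on any other error, matching the collector's best-effort, lossy reporting. A minimal standalone sketch of the same idiom, using `write()` on a pipe instead of `send()` on the stats socket so it needs no UDP setup (`write_retry_eintr` is a hypothetical name, not a PostgreSQL function):

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Retry-on-EINTR idiom as in pgstat_send(): a signal arriving
 * mid-syscall yields EINTR, which we retry; any other failure is
 * ignored because stats delivery is best-effort anyway.
 */
static ssize_t
write_retry_eintr(int fd, const void *buf, size_t len)
{
	ssize_t		rc;

	do
	{
		rc = write(fd, buf, len);
	} while (rc < 0 && errno == EINTR);

	return rc;
}
```

The same shape applies to any slow syscall that can be interrupted by a handler installed without `SA_RESTART`.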
-
-/* ----------
- * pgstat_send_archiver() -
- *
- * Tell the collector about the WAL file that we successfully
- * archived or failed to archive.
- * ----------
- */
-void
-pgstat_send_archiver(const char *xlog, bool failed)
-{
- PgStat_MsgArchiver msg;
-
- /*
- * Prepare and send the message
- */
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_ARCHIVER);
- msg.m_failed = failed;
- StrNCpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));
- msg.m_timestamp = GetCurrentTimestamp();
- pgstat_send(&msg, sizeof(msg));
-}
-
-/* ----------
- * pgstat_send_bgwriter() -
- *
- * Send bgwriter statistics to the collector
- * ----------
- */
-void
-pgstat_send_bgwriter(void)
-{
- /* We assume this initializes to zeroes */
- static const PgStat_MsgBgWriter all_zeroes;
-
- /*
- * This function can be called even if nothing at all has happened. In
- * this case, avoid sending a completely empty message to the stats
- * collector.
- */
- if (memcmp(&BgWriterStats, &all_zeroes, sizeof(PgStat_MsgBgWriter)) == 0)
- return;
-
- /*
- * Prepare and send the message
- */
- pgstat_setheader(&BgWriterStats.m_hdr, PGSTAT_MTYPE_BGWRITER);
- pgstat_send(&BgWriterStats, sizeof(BgWriterStats));
-
- /*
- * Clear out the statistics buffer, so it can be re-used.
- */
- MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
-}
-
-
-/* ----------
- * PgstatCollectorMain() -
- *
- * Start up the statistics collector process.  This is the body of the
- * postmaster child process.
- *
- * The argc/argv parameters are valid only in EXEC_BACKEND case.
- * ----------
- */
-NON_EXEC_STATIC void
-PgstatCollectorMain(int argc, char *argv[])
-{
- int len;
- PgStat_Msg msg;
- int wr;
-
- /*
- * Ignore all signals usually bound to some action in the postmaster,
- * except SIGHUP and SIGQUIT.  Note we don't need a SIGUSR1 handler to
- * support latch operations, because we only use a local latch.
- */
- pqsignal(SIGHUP, pgstat_sighup_handler);
- pqsignal(SIGINT, SIG_IGN);
- pqsignal(SIGTERM, SIG_IGN);
- pqsignal(SIGQUIT, pgstat_exit);
- pqsignal(SIGALRM, SIG_IGN);
- pqsignal(SIGPIPE, SIG_IGN);
- pqsignal(SIGUSR1, SIG_IGN);
- pqsignal(SIGUSR2, SIG_IGN);
- /* Reset some signals that are accepted by postmaster but not here */
- pqsignal(SIGCHLD, SIG_DFL);
- PG_SETMASK(&UnBlockSig);
-
- /*
- * Identify myself via ps
- */
- init_ps_display("stats collector", "", "", "");
-
- /*
- * Read in existing stats files or initialize the stats to zero.
- */
- pgStatRunningInCollector = true;
- pgStatDBHash = pgstat_read_statsfiles(InvalidOid, true, true);
-
- /*
- * Loop to process messages until we get SIGQUIT or detect ungraceful
- * death of our parent postmaster.
- *
- * For performance reasons, we don't want to do ResetLatch/WaitLatch after
- * every message; instead, do that only after a recv() fails to obtain a
- * message.  (This effectively means that if backends are sending us stuff
- * like mad, we won't notice postmaster death until things slack off a
- * bit; which seems fine.) To do that, we have an inner loop that
- * iterates as long as recv() succeeds.  We do recognize got_SIGHUP inside
- * the inner loop, which means that such interrupts will get serviced but
- * the latch won't get cleared until next time there is a break in the
- * action.
- */
- for (;;)
- {
- /* Clear any already-pending wakeups */
- ResetLatch(MyLatch);
-
- /*
- * Quit if we get SIGQUIT from the postmaster.
- */
- if (need_exit)
- break;
-
- /*
- * Inner loop iterates as long as we keep getting messages, or until
- * need_exit becomes set.
- */
- while (!need_exit)
- {
- /*
- * Reload configuration if we got SIGHUP from the postmaster.
- */
- if (got_SIGHUP)
- {
- got_SIGHUP = false;
- ProcessConfigFile(PGC_SIGHUP);
- }
-
- /*
- * Write the stats file(s) if a new request has arrived that is
- * not satisfied by existing file(s).
- */
- if (pgstat_write_statsfile_needed())
- pgstat_write_statsfiles(false, false);
-
- /*
- * Try to receive and process a message.  This will not block,
- * since the socket is set to non-blocking mode.
- *
- * XXX On Windows, we have to force pgwin32_recv to cooperate,
- * despite the previous use of pg_set_noblock() on the socket.
- * This is extremely broken and should be fixed someday.
- */
-#ifdef WIN32
- pgwin32_noblock = 1;
-#endif
-
- len = recv(pgStatSock, (char *) &msg,
-   sizeof(PgStat_Msg), 0);
-
-#ifdef WIN32
- pgwin32_noblock = 0;
-#endif
-
- if (len < 0)
- {
- if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
- break; /* out of inner loop */
- ereport(ERROR,
- (errcode_for_socket_access(),
- errmsg("could not read statistics message: %m")));
- }
-
- /*
- * We ignore messages that are smaller than our common header
- */
- if (len < sizeof(PgStat_MsgHdr))
- continue;
-
- /*
- * The received length must match the length in the header
- */
- if (msg.msg_hdr.m_size != len)
- continue;
-
- /*
- * O.K. - we accept this message.  Process it.
- */
- switch (msg.msg_hdr.m_type)
- {
- case PGSTAT_MTYPE_DUMMY:
- break;
-
- case PGSTAT_MTYPE_INQUIRY:
- pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_TABSTAT:
- pgstat_recv_tabstat((PgStat_MsgTabstat *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_TABPURGE:
- pgstat_recv_tabpurge((PgStat_MsgTabpurge *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_DROPDB:
- pgstat_recv_dropdb((PgStat_MsgDropdb *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_RESETCOUNTER:
- pgstat_recv_resetcounter((PgStat_MsgResetcounter *) &msg,
- len);
- break;
-
- case PGSTAT_MTYPE_RESETSHAREDCOUNTER:
- pgstat_recv_resetsharedcounter(
-   (PgStat_MsgResetsharedcounter *) &msg,
-   len);
- break;
-
- case PGSTAT_MTYPE_RESETSINGLECOUNTER:
- pgstat_recv_resetsinglecounter(
-   (PgStat_MsgResetsinglecounter *) &msg,
-   len);
- break;
-
- case PGSTAT_MTYPE_AUTOVAC_START:
- pgstat_recv_autovac((PgStat_MsgAutovacStart *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_VACUUM:
- pgstat_recv_vacuum((PgStat_MsgVacuum *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_ANALYZE:
- pgstat_recv_analyze((PgStat_MsgAnalyze *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_ARCHIVER:
- pgstat_recv_archiver((PgStat_MsgArchiver *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_BGWRITER:
- pgstat_recv_bgwriter((PgStat_MsgBgWriter *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_FUNCSTAT:
- pgstat_recv_funcstat((PgStat_MsgFuncstat *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_FUNCPURGE:
- pgstat_recv_funcpurge((PgStat_MsgFuncpurge *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_RECOVERYCONFLICT:
- pgstat_recv_recoveryconflict((PgStat_MsgRecoveryConflict *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_DEADLOCK:
- pgstat_recv_deadlock((PgStat_MsgDeadlock *) &msg, len);
- break;
-
- case PGSTAT_MTYPE_TEMPFILE:
- pgstat_recv_tempfile((PgStat_MsgTempFile *) &msg, len);
- break;
-
- default:
- break;
- }
- } /* end of inner message-processing loop */
-
- /* Sleep until there's something to do */
-#ifndef WIN32
- wr = WaitLatchOrSocket(MyLatch,
-   WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_SOCKET_READABLE,
-   pgStatSock, -1L,
-   WAIT_EVENT_PGSTAT_MAIN);
-#else
-
- /*
- * Windows, at least in its Windows Server 2003 R2 incarnation,
- * sometimes loses FD_READ events.  Waking up and retrying the recv()
- * fixes that, so don't sleep indefinitely.  This is a crock of the
- * first water, but until somebody wants to debug exactly what's
- * happening there, this is the best we can do.  The two-second
- * timeout matches our pre-9.2 behavior, and needs to be short enough
- * to not provoke "using stale statistics" complaints from
- * backend_read_statsfile.
- */
- wr = WaitLatchOrSocket(MyLatch,
-   WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_SOCKET_READABLE | WL_TIMEOUT,
-   pgStatSock,
-   2 * 1000L /* msec */ ,
-   WAIT_EVENT_PGSTAT_MAIN);
-#endif
-
- /*
- * Emergency bailout if postmaster has died.  This is to avoid the
- * necessity for manual cleanup of all postmaster children.
- */
- if (wr & WL_POSTMASTER_DEATH)
- break;
- } /* end of outer loop */
-
- /*
- * Save the final stats to reuse at next startup.
- */
- pgstat_write_statsfiles(true, true);
-
- exit(0);
-}
-
-
-/* SIGQUIT signal handler for collector process */
-static void
-pgstat_exit(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- need_exit = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
-/* SIGHUP handler for collector process */
-static void
-pgstat_sighup_handler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
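
[Editorial aside, not part of the patch] Both handlers removed above (`pgstat_exit` and `pgstat_sighup_handler`) follow the same safe-handler pattern: do nothing but set a `sig_atomic_t` flag for the main loop, wake it, and save/restore `errno`, since the handler may fire between a syscall and the `errno` check that follows it. A minimal sketch, with the `SetLatch(MyLatch)` wake-up elided and hypothetical demo names:

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>

/* Flag checked by the main loop; sig_atomic_t is safe to set here. */
static volatile sig_atomic_t demo_got_signal = 0;

/*
 * Handler pattern as in pgstat_exit()/pgstat_sighup_handler():
 * set a flag and preserve errno.  (A real collector handler would
 * also call SetLatch(MyLatch) to wake the wait loop.)
 */
static void
demo_handler(int signo)
{
	int			save_errno = errno;

	(void) signo;
	demo_got_signal = 1;
	errno = save_errno;
}
```

Everything else (config reload, exiting) happens in the main loop once it observes the flag, keeping the handler async-signal-safe.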
-
-/*
- * Subroutine to clear stats in a database entry
- *
- * Tables and functions hashes are initialized to empty.
- */
-static void
-reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
-{
- HASHCTL hash_ctl;
-
- dbentry->n_xact_commit = 0;
- dbentry->n_xact_rollback = 0;
- dbentry->n_blocks_fetched = 0;
- dbentry->n_blocks_hit = 0;
- dbentry->n_tuples_returned = 0;
- dbentry->n_tuples_fetched = 0;
- dbentry->n_tuples_inserted = 0;
- dbentry->n_tuples_updated = 0;
- dbentry->n_tuples_deleted = 0;
- dbentry->last_autovac_time = 0;
- dbentry->n_conflict_tablespace = 0;
- dbentry->n_conflict_lock = 0;
- dbentry->n_conflict_snapshot = 0;
- dbentry->n_conflict_bufferpin = 0;
- dbentry->n_conflict_startup_deadlock = 0;
- dbentry->n_temp_files = 0;
- dbentry->n_temp_bytes = 0;
- dbentry->n_deadlocks = 0;
- dbentry->n_block_read_time = 0;
- dbentry->n_block_write_time = 0;
-
- dbentry->stat_reset_timestamp = GetCurrentTimestamp();
- dbentry->stats_timestamp = 0;
-
- memset(&hash_ctl, 0, sizeof(hash_ctl));
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
- dbentry->tables = hash_create("Per-database table",
-  PGSTAT_TAB_HASH_SIZE,
-  &hash_ctl,
-  HASH_ELEM | HASH_BLOBS);
-
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatFuncEntry);
- dbentry->functions = hash_create("Per-database function",
- PGSTAT_FUNCTION_HASH_SIZE,
- &hash_ctl,
- HASH_ELEM | HASH_BLOBS);
-}
-
-/*
- * Lookup the hash table entry for the specified database. If no hash
- * table entry exists, initialize it, if the create parameter is true.
- * Else, return NULL.
- */
-static PgStat_StatDBEntry *
-pgstat_get_db_entry(Oid databaseid, bool create)
-{
- PgStat_StatDBEntry *result;
- bool found;
- HASHACTION action = (create ? HASH_ENTER : HASH_FIND);
-
- /* Lookup or create the hash table entry for this database */
- result = (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
- &databaseid,
- action, &found);
-
- if (!create && !found)
- return NULL;
-
- /*
- * If not found, initialize the new one.  This creates empty hash tables
- * for tables and functions, too.
- */
- if (!found)
- reset_dbentry_counters(result);
-
- return result;
-}
-
-
-/*
- * Lookup the hash table entry for the specified table. If no hash
- * table entry exists, initialize it, if the create parameter is true.
- * Else, return NULL.
- */
-static PgStat_StatTabEntry *
-pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
-{
- PgStat_StatTabEntry *result;
- bool found;
- HASHACTION action = (create ? HASH_ENTER : HASH_FIND);
-
- /* Lookup or create the hash table entry for this table */
- result = (PgStat_StatTabEntry *) hash_search(dbentry->tables,
- &tableoid,
- action, &found);
-
- if (!create && !found)
- return NULL;
-
- /* If not found, initialize the new one. */
- if (!found)
- {
- result->numscans = 0;
- result->tuples_returned = 0;
- result->tuples_fetched = 0;
- result->tuples_inserted = 0;
- result->tuples_updated = 0;
- result->tuples_deleted = 0;
- result->tuples_hot_updated = 0;
- result->n_live_tuples = 0;
- result->n_dead_tuples = 0;
- result->changes_since_analyze = 0;
- result->blocks_fetched = 0;
- result->blocks_hit = 0;
- result->vacuum_timestamp = 0;
- result->vacuum_count = 0;
- result->autovac_vacuum_timestamp = 0;
- result->autovac_vacuum_count = 0;
- result->analyze_timestamp = 0;
- result->analyze_count = 0;
- result->autovac_analyze_timestamp = 0;
- result->autovac_analyze_count = 0;
- }
-
- return result;
-}
-
-
-/* ----------
- * pgstat_write_statsfiles() -
- * Write the global statistics file, as well as requested DB files.
- *
- * 'permanent' specifies writing to the permanent files not temporary ones.
- * When true (happens only when the collector is shutting down), also remove
- * the temporary files so that backends starting up under a new postmaster
- * can't read old data before the new collector is ready.
- *
- * When 'allDbs' is false, only the requested databases (listed in
- * pending_write_requests) will be written; otherwise, all databases
- * will be written.
- * ----------
- */
-static void
-pgstat_write_statsfiles(bool permanent, bool allDbs)
-{
- HASH_SEQ_STATUS hstat;
- PgStat_StatDBEntry *dbentry;
- FILE   *fpout;
- int32 format_id;
- const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
- const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
- int rc;
-
- elog(DEBUG2, "writing stats file \"%s\"", statfile);
-
- /*
- * Open the statistics temp file to write out the current values.
- */
- fpout = AllocateFile(tmpfile, PG_BINARY_W);
- if (fpout == NULL)
- {
- ereport(LOG,
- (errcode_for_file_access(),
- errmsg("could not open temporary statistics file \"%s\": %m",
- tmpfile)));
- return;
- }
-
- /*
- * Set the timestamp of the stats file.
- */
- globalStats.stats_timestamp = GetCurrentTimestamp();
-
- /*
- * Write the file header --- currently just a format ID.
- */
- format_id = PGSTAT_FILE_FORMAT_ID;
- rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
- (void) rc; /* we'll check for error with ferror */
-
- /*
- * Write global stats struct
- */
- rc = fwrite(&globalStats, sizeof(globalStats), 1, fpout);
- (void) rc; /* we'll check for error with ferror */
-
- /*
- * Write archiver stats struct
- */
- rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
- (void) rc; /* we'll check for error with ferror */
-
- /*
- * Walk through the database table.
- */
- hash_seq_init(&hstat, pgStatDBHash);
- while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
- {
- /*
- * Write out the table and function stats for this DB into the
- * appropriate per-DB stat file, if required.
- */
- if (allDbs || pgstat_db_requested(dbentry->databaseid))
- {
- /* Make DB's timestamp consistent with the global stats */
- dbentry->stats_timestamp = globalStats.stats_timestamp;
-
- pgstat_write_db_statsfile(dbentry, permanent);
- }
-
- /*
- * Write out the DB entry. We don't write the tables or functions
- * pointers, since they're of no use to any other process.
- */
- fputc('D', fpout);
- rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, fpout);
- (void) rc; /* we'll check for error with ferror */
- }
-
- /*
- * No more output to be done. Close the temp file and replace the old
- * pgstat.stat with it.  The ferror() check replaces testing for error
- * after each individual fputc or fwrite above.
- */
- fputc('E', fpout);
-
- if (ferror(fpout))
- {
- ereport(LOG,
- (errcode_for_file_access(),
- errmsg("could not write temporary statistics file \"%s\": %m",
- tmpfile)));
- FreeFile(fpout);
- unlink(tmpfile);
- }
- else if (FreeFile(fpout) < 0)
- {
- ereport(LOG,
- (errcode_for_file_access(),
- errmsg("could not close temporary statistics file \"%s\": %m",
- tmpfile)));
- unlink(tmpfile);
- }
- else if (rename(tmpfile, statfile) < 0)
- {
- ereport(LOG,
- (errcode_for_file_access(),
- errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
- tmpfile, statfile)));
- unlink(tmpfile);
- }
-
- if (permanent)
- unlink(pgstat_stat_filename);
-
- /*
- * Now throw away the list of requests.  Note that requests sent after we
- * started the write are still waiting on the network socket.
- */
- list_free(pending_write_requests);
- pending_write_requests = NIL;
-}
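
[Editorial aside, not part of the patch] `pgstat_write_statsfiles()` above illustrates the write-temp-then-rename discipline: stream everything into a temporary file, check for errors once via `ferror()`/`FreeFile()` instead of after every `fputc`/`fwrite`, and only `rename()` over the real file on success, so readers always see either the old file or a complete new one. A minimal sketch under that assumption (`write_file_atomically` is a hypothetical helper, and `remove()` stands in for `unlink()` for portability):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write-temp-then-rename, as in pgstat_write_statsfiles(): errors are
 * checked once at the end, and rename() atomically replaces the target
 * only when the temp file was written and closed cleanly.
 */
static int
write_file_atomically(const char *tmppath, const char *path,
					  const char *data)
{
	FILE	   *fp = fopen(tmppath, "wb");

	if (fp == NULL)
		return -1;

	fputs(data, fp);
	fputc('E', fp);				/* trailing end marker, as in the stats file */

	if (ferror(fp))				/* one check covers all writes above */
	{
		fclose(fp);
		remove(tmppath);
		return -1;
	}
	if (fclose(fp) != 0 || rename(tmppath, path) != 0)
	{
		remove(tmppath);
		return -1;
	}
	return 0;
}
```

On POSIX filesystems `rename()` is atomic within a filesystem, which is exactly why the temp file must live in the same directory as the target.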
-
-/*
- * return the filename for a DB stat file; filename is the output buffer,
- * of length len.
- */
-static void
-get_dbstat_filename(bool permanent, bool tempname, Oid databaseid,
- char *filename, int len)
-{
- int printed;
-
- /* NB -- pgstat_reset_remove_files knows about the pattern this uses */
- printed = snprintf(filename, len, "%s/db_%u.%s",
-   permanent ? PGSTAT_STAT_PERMANENT_DIRECTORY :
-   pgstat_stat_directory,
-   databaseid,
-   tempname ? "tmp" : "stat");
- if (printed >= len)
- elog(ERROR, "overlength pgstat path");
-}
-
-/* ----------
- * pgstat_write_db_statsfile() -
- * Write the stat file for a single database.
- *
- * If writing to the permanent file (happens when the collector is
- * shutting down only), remove the temporary file so that backends
- * starting up under a new postmaster can't read the old data before
- * the new collector is ready.
- * ----------
- */
-static void
-pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent)
-{
- HASH_SEQ_STATUS tstat;
- HASH_SEQ_STATUS fstat;
- PgStat_StatTabEntry *tabentry;
- PgStat_StatFuncEntry *funcentry;
- FILE   *fpout;
- int32 format_id;
- Oid dbid = dbentry->databaseid;
- int rc;
- char tmpfile[MAXPGPATH];
- char statfile[MAXPGPATH];
-
- get_dbstat_filename(permanent, true, dbid, tmpfile, MAXPGPATH);
- get_dbstat_filename(permanent, false, dbid, statfile, MAXPGPATH);
-
- elog(DEBUG2, "writing stats file \"%s\"", statfile);
-
- /*
- * Open the statistics temp file to write out the current values.
- */
- fpout = AllocateFile(tmpfile, PG_BINARY_W);
- if (fpout == NULL)
- {
- ereport(LOG,
- (errcode_for_file_access(),
- errmsg("could not open temporary statistics file \"%s\": %m",
- tmpfile)));
- return;
- }
-
- /*
- * Write the file header --- currently just a format ID.
- */
- format_id = PGSTAT_FILE_FORMAT_ID;
- rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
- (void) rc; /* we'll check for error with ferror */
-
- /*
- * Walk through the database's access stats per table.
- */
- hash_seq_init(&tstat, dbentry->tables);
- while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
- {
- fputc('T', fpout);
- rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
- (void) rc; /* we'll check for error with ferror */
- }
-
- /*
- * Walk through the database's function stats table.
- */
- hash_seq_init(&fstat, dbentry->functions);
- while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
- {
- fputc('F', fpout);
- rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
- (void) rc; /* we'll check for error with ferror */
- }
-
- /*
- * No more output to be done. Close the temp file and replace the old
- * pgstat.stat with it.  The ferror() check replaces testing for error
- * after each individual fputc or fwrite above.
- */
- fputc('E', fpout);
-
- if (ferror(fpout))
- {
- ereport(LOG,
- (errcode_for_file_access(),
- errmsg("could not write temporary statistics file \"%s\": %m",
- tmpfile)));
- FreeFile(fpout);
- unlink(tmpfile);
- }
- else if (FreeFile(fpout) < 0)
- {
- ereport(LOG,
- (errcode_for_file_access(),
- errmsg("could not close temporary statistics file \"%s\": %m",
- tmpfile)));
- unlink(tmpfile);
- }
- else if (rename(tmpfile, statfile) < 0)
- {
- ereport(LOG,
- (errcode_for_file_access(),
- errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
- tmpfile, statfile)));
- unlink(tmpfile);
- }
-
- if (permanent)
- {
- get_dbstat_filename(false, false, dbid, statfile, MAXPGPATH);
-
- elog(DEBUG2, "removing temporary stats file \"%s\"", statfile);
- unlink(statfile);
- }
-}
-
-/* ----------
- * pgstat_read_statsfiles() -
- *
- * Reads in some existing statistics collector files and returns the
- * databases hash table that is the top level of the data.
- *
- * If 'onlydb' is not InvalidOid, it means we only want data for that DB
- * plus the shared catalogs ("DB 0").  We'll still populate the DB hash
- * table for all databases, but we don't bother even creating table/function
- * hash tables for other databases.
- *
- * 'permanent' specifies reading from the permanent files not temporary ones.
- * When true (happens only when the collector is starting up), remove the
- * files after reading; the in-memory status is now authoritative, and the
- * files would be out of date in case somebody else reads them.
- *
- * If a 'deep' read is requested, table/function stats are read, otherwise
- * the table/function hash tables remain empty.
- * ----------
- */
-static HTAB *
-pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
-{
- PgStat_StatDBEntry *dbentry;
- PgStat_StatDBEntry dbbuf;
- HASHCTL hash_ctl;
- HTAB   *dbhash;
- FILE   *fpin;
- int32 format_id;
- bool found;
- const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
-
- /*
- * The tables will live in pgStatLocalContext.
- */
- pgstat_setup_memcxt();
-
- /*
- * Create the DB hashtable
- */
- memset(&hash_ctl, 0, sizeof(hash_ctl));
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatDBEntry);
- hash_ctl.hcxt = pgStatLocalContext;
- dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
- /*
- * Clear out global and archiver statistics so they start from zero in
- * case we can't load an existing statsfile.
- */
- memset(&globalStats, 0, sizeof(globalStats));
- memset(&archiverStats, 0, sizeof(archiverStats));
-
- /*
- * Set the current timestamp (will be kept only in case we can't load an
- * existing statsfile).
- */
- globalStats.stat_reset_timestamp = GetCurrentTimestamp();
- archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
- /*
- * Try to open the stats file. If it doesn't exist, the backends simply
- * return zero for anything and the collector simply starts from scratch
- * with empty counters.
- *
- * ENOENT is a possibility if the stats collector is not running or has
- * not yet written the stats file the first time.  Any other failure
- * condition is suspicious.
- */
- if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
- {
- if (errno != ENOENT)
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errcode_for_file_access(),
- errmsg("could not open statistics file \"%s\": %m",
- statfile)));
- return dbhash;
- }
-
- /*
- * Verify it's of the expected format.
- */
- if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
- format_id != PGSTAT_FILE_FORMAT_ID)
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- goto done;
- }
-
- /*
- * Read global stats struct
- */
- if (fread(&globalStats, 1, sizeof(globalStats), fpin) != sizeof(globalStats))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- memset(&globalStats, 0, sizeof(globalStats));
- goto done;
- }
-
- /*
- * In the collector, disregard the timestamp we read from the permanent
- * stats file; we should be willing to write a temp stats file immediately
- * upon the first request from any backend.  This only matters if the old
- * file's timestamp is less than PGSTAT_STAT_INTERVAL ago, but that's not
- * an unusual scenario.
- */
- if (pgStatRunningInCollector)
- globalStats.stats_timestamp = 0;
-
- /*
- * Read archiver stats struct
- */
- if (fread(&archiverStats, 1, sizeof(archiverStats), fpin) != sizeof(archiverStats))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- memset(&archiverStats, 0, sizeof(archiverStats));
- goto done;
- }
-
- /*
- * We found an existing collector stats file. Read it and put all the
- * hashtable entries into place.
- */
- for (;;)
- {
- switch (fgetc(fpin))
- {
- /*
- * 'D' A PgStat_StatDBEntry struct describing a database
- * follows.
- */
- case 'D':
- if (fread(&dbbuf, 1, offsetof(PgStat_StatDBEntry, tables),
-  fpin) != offsetof(PgStat_StatDBEntry, tables))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
-
- /*
- * Add to the DB hash
- */
- dbentry = (PgStat_StatDBEntry *) hash_search(dbhash,
- (void *) &dbbuf.databaseid,
- HASH_ENTER,
- &found);
- if (found)
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
-
- memcpy(dbentry, &dbbuf, sizeof(PgStat_StatDBEntry));
- dbentry->tables = NULL;
- dbentry->functions = NULL;
-
- /*
- * In the collector, disregard the timestamp we read from the
- * permanent stats file; we should be willing to write a temp
- * stats file immediately upon the first request from any
- * backend.
- */
- if (pgStatRunningInCollector)
- dbentry->stats_timestamp = 0;
-
- /*
- * Don't create tables/functions hashtables for uninteresting
- * databases.
- */
- if (onlydb != InvalidOid)
- {
- if (dbbuf.databaseid != onlydb &&
- dbbuf.databaseid != InvalidOid)
- break;
- }
-
- memset(&hash_ctl, 0, sizeof(hash_ctl));
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
- hash_ctl.hcxt = pgStatLocalContext;
- dbentry->tables = hash_create("Per-database table",
-  PGSTAT_TAB_HASH_SIZE,
-  &hash_ctl,
-  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
- hash_ctl.keysize = sizeof(Oid);
- hash_ctl.entrysize = sizeof(PgStat_StatFuncEntry);
- hash_ctl.hcxt = pgStatLocalContext;
- dbentry->functions = hash_create("Per-database function",
- PGSTAT_FUNCTION_HASH_SIZE,
- &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
- /*
- * If requested, read the data from the database-specific
- * file.  Otherwise we just leave the hashtables empty.
- */
- if (deep)
- pgstat_read_db_statsfile(dbentry->databaseid,
- dbentry->tables,
- dbentry->functions,
- permanent);
-
- break;
-
- case 'E':
- goto done;
-
- default:
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
- }
-
-done:
- FreeFile(fpin);
-
- /* If requested to read the permanent file, also get rid of it. */
- if (permanent)
- {
- elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
- unlink(statfile);
- }
-
- return dbhash;
-}
-
-
-/* ----------
- * pgstat_read_db_statsfile() -
- *
- * Reads in the existing statistics collector file for the given database,
- * filling the passed-in tables and functions hash tables.
- *
- * As in pgstat_read_statsfiles, if the permanent file is requested, it is
- * removed after reading.
- *
- * Note: this code has the ability to skip storing per-table or per-function
- * data, if NULL is passed for the corresponding hashtable.  That's not used
- * at the moment though.
- * ----------
- */
-static void
-pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
- bool permanent)
-{
- PgStat_StatTabEntry *tabentry;
- PgStat_StatTabEntry tabbuf;
- PgStat_StatFuncEntry funcbuf;
- PgStat_StatFuncEntry *funcentry;
- FILE   *fpin;
- int32 format_id;
- bool found;
- char statfile[MAXPGPATH];
-
- get_dbstat_filename(permanent, false, databaseid, statfile, MAXPGPATH);
-
- /*
- * Try to open the stats file. If it doesn't exist, the backends simply
- * return zero for anything and the collector simply starts from scratch
- * with empty counters.
- *
- * ENOENT is a possibility if the stats collector is not running or has
- * not yet written the stats file the first time.  Any other failure
- * condition is suspicious.
- */
- if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
- {
- if (errno != ENOENT)
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errcode_for_file_access(),
- errmsg("could not open statistics file \"%s\": %m",
- statfile)));
- return;
- }
-
- /*
- * Verify it's of the expected format.
- */
- if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
- format_id != PGSTAT_FILE_FORMAT_ID)
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- goto done;
- }
-
- /*
- * We found an existing collector stats file. Read it and put all the
- * hashtable entries into place.
- */
- for (;;)
- {
- switch (fgetc(fpin))
- {
- /*
- * 'T' A PgStat_StatTabEntry follows.
- */
- case 'T':
- if (fread(&tabbuf, 1, sizeof(PgStat_StatTabEntry),
-  fpin) != sizeof(PgStat_StatTabEntry))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
-
- /*
- * Skip if table data not wanted.
- */
- if (tabhash == NULL)
- break;
-
- tabentry = (PgStat_StatTabEntry *) hash_search(tabhash,
-   (void *) &tabbuf.tableid,
-   HASH_ENTER, &found);
-
- if (found)
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
-
- memcpy(tabentry, &tabbuf, sizeof(tabbuf));
- break;
-
- /*
- * 'F' A PgStat_StatFuncEntry follows.
- */
- case 'F':
- if (fread(&funcbuf, 1, sizeof(PgStat_StatFuncEntry),
-  fpin) != sizeof(PgStat_StatFuncEntry))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
-
- /*
- * Skip if function data not wanted.
- */
- if (funchash == NULL)
- break;
-
- funcentry = (PgStat_StatFuncEntry *) hash_search(funchash,
- (void *) &funcbuf.functionid,
- HASH_ENTER, &found);
-
- if (found)
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
-
- memcpy(funcentry, &funcbuf, sizeof(funcbuf));
- break;
-
- /*
- * 'E' The EOF marker of a complete stats file.
- */
- case 'E':
- goto done;
-
- default:
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
- }
-
-done:
- FreeFile(fpin);
-
- if (permanent)
- {
- elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
- unlink(statfile);
- }
-}
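
[Editorial aside, not part of the patch] Every reader above (`pgstat_read_statsfiles`, `pgstat_read_db_statsfile`, and the timestamp probe below) opens with the same header check: read a fixed-size format ID and treat a short read or a mismatch as a corrupted file, bailing out rather than interpreting bytes written by a different `PGSTAT_FILE_FORMAT_ID` version. A minimal sketch of just that check, with a made-up format ID (`DEMO_FORMAT_ID` and `format_id_ok` are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_FORMAT_ID 0x01A5BC9D	/* stand-in for PGSTAT_FILE_FORMAT_ID */

/*
 * Header validation as in pgstat_read_statsfiles(): a short read or a
 * wrong format ID means "corrupted statistics file" and the caller
 * should discard the file rather than read further.
 */
static int
format_id_ok(FILE *fp)
{
	int32_t		format_id;

	if (fread(&format_id, 1, sizeof(format_id), fp) != sizeof(format_id) ||
		format_id != DEMO_FORMAT_ID)
		return 0;
	return 1;
}
```

Bumping the format ID whenever the on-disk struct layout changes is what lets a new binary safely start from empty counters instead of misreading an old file.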
-
-/* ----------
- * pgstat_read_db_statsfile_timestamp() -
- *
- * Attempt to determine the timestamp of the last db statfile write.
- * Returns true if successful; the timestamp is stored in *ts.
- *
- * This needs to be careful about handling databases for which no stats file
- * exists, such as databases without a stat entry or those not yet written:
- *
- * - if there's a database entry in the global file, return the corresponding
- * stats_timestamp value.
- *
- * - if there's no db stat entry (e.g. for a new or inactive database),
- * there's no stats_timestamp value, but also nothing to write so we return
- * the timestamp of the global statfile.
- * ----------
- */
-static bool
-pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
-   TimestampTz *ts)
-{
- PgStat_StatDBEntry dbentry;
- PgStat_GlobalStats myGlobalStats;
- PgStat_ArchiverStats myArchiverStats;
- FILE   *fpin;
- int32 format_id;
- const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
-
- /*
- * Try to open the stats file.  As above, anything but ENOENT is worthy of
- * complaining about.
- */
- if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
- {
- if (errno != ENOENT)
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errcode_for_file_access(),
- errmsg("could not open statistics file \"%s\": %m",
- statfile)));
- return false;
- }
-
- /*
- * Verify it's of the expected format.
- */
- if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
- format_id != PGSTAT_FILE_FORMAT_ID)
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- FreeFile(fpin);
- return false;
- }
-
- /*
- * Read global stats struct
- */
- if (fread(&myGlobalStats, 1, sizeof(myGlobalStats),
-  fpin) != sizeof(myGlobalStats))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- FreeFile(fpin);
- return false;
- }
-
- /*
- * Read archiver stats struct
- */
- if (fread(&myArchiverStats, 1, sizeof(myArchiverStats),
-  fpin) != sizeof(myArchiverStats))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"", statfile)));
- FreeFile(fpin);
- return false;
- }
-
- /* By default, we're going to return the timestamp of the global file. */
- *ts = myGlobalStats.stats_timestamp;
-
- /*
- * We found an existing collector stats file.  Read it and look for a
- * record for the requested database.  If found, use its timestamp.
- */
- for (;;)
- {
- switch (fgetc(fpin))
- {
- /*
- * 'D' A PgStat_StatDBEntry struct describing a database
- * follows.
- */
- case 'D':
- if (fread(&dbentry, 1, offsetof(PgStat_StatDBEntry, tables),
-  fpin) != offsetof(PgStat_StatDBEntry, tables))
- {
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
-
- /*
- * If this is the DB we're looking for, save its timestamp and
- * we're done.
- */
- if (dbentry.databaseid == databaseid)
- {
- *ts = dbentry.stats_timestamp;
- goto done;
- }
-
- break;
-
- case 'E':
- goto done;
-
- default:
- ereport(pgStatRunningInCollector ? LOG : WARNING,
- (errmsg("corrupted statistics file \"%s\"",
- statfile)));
- goto done;
- }
- }
-
-done:
- FreeFile(fpin);
- return true;
-}
-
-/*
- * If not already done, read the statistics collector stats file into
- * some hash tables.  The results will be kept until pgstat_clear_snapshot()
- * is called (typically, at end of transaction).
- */
-static void
-backend_read_statsfile(void)
-{
- TimestampTz min_ts = 0;
- TimestampTz ref_ts = 0;
- Oid inquiry_db;
- int count;
-
- /* already read it? */
- if (pgStatDBHash)
- return;
- Assert(!pgStatRunningInCollector);
-
- /*
- * In a normal backend, we check staleness of the data for our own DB, and
- * so we send MyDatabaseId in inquiry messages.  In the autovac launcher,
- * check staleness of the shared-catalog data, and send InvalidOid in
- * inquiry messages so as not to force writing unnecessary data.
- */
- if (IsAutoVacuumLauncherProcess())
- inquiry_db = InvalidOid;
- else
- inquiry_db = MyDatabaseId;
-
- /*
- * Loop until fresh enough stats file is available or we ran out of time.
- * The stats inquiry message is sent repeatedly in case collector drops
- * it; but not every single time, as that just swamps the collector.
- */
- for (count = 0; count < PGSTAT_POLL_LOOP_COUNT; count++)
- {
- bool ok;
- TimestampTz file_ts = 0;
- TimestampTz cur_ts;
-
- CHECK_FOR_INTERRUPTS();
-
- ok = pgstat_read_db_statsfile_timestamp(inquiry_db, false, &file_ts);
-
- cur_ts = GetCurrentTimestamp();
- /* Calculate min acceptable timestamp, if we didn't already */
- if (count == 0 || cur_ts < ref_ts)
- {
- /*
- * We set the minimum acceptable timestamp to PGSTAT_STAT_INTERVAL
- * msec before now.  This indirectly ensures that the collector
- * needn't write the file more often than PGSTAT_STAT_INTERVAL. In
- * an autovacuum worker, however, we want a lower delay to avoid
- * using stale data, so we use PGSTAT_RETRY_DELAY (since the
- * number of workers is low, this shouldn't be a problem).
- *
- * We don't recompute min_ts after sleeping, except in the
- * unlikely case that cur_ts went backwards.  So we might end up
- * accepting a file a bit older than PGSTAT_STAT_INTERVAL.  In
- * practice that shouldn't happen, though, as long as the sleep
- * time is less than PGSTAT_STAT_INTERVAL; and we don't want to
- * tell the collector that our cutoff time is less than what we'd
- * actually accept.
- */
- ref_ts = cur_ts;
- if (IsAutoVacuumWorkerProcess())
- min_ts = TimestampTzPlusMilliseconds(ref_ts,
- -PGSTAT_RETRY_DELAY);
- else
- min_ts = TimestampTzPlusMilliseconds(ref_ts,
- -PGSTAT_STAT_INTERVAL);
- }
-
- /*
- * If the file timestamp is actually newer than cur_ts, we must have
- * had a clock glitch (system time went backwards) or there is clock
- * skew between our processor and the stats collector's processor.
- * Accept the file, but send an inquiry message anyway to make
- * pgstat_recv_inquiry do a sanity check on the collector's time.
- */
- if (ok && file_ts > cur_ts)
- {
- /*
- * A small amount of clock skew between processors isn't terribly
- * surprising, but a large difference is worth logging.  We
- * arbitrarily define "large" as 1000 msec.
- */
- if (file_ts >= TimestampTzPlusMilliseconds(cur_ts, 1000))
- {
- char   *filetime;
- char   *mytime;
-
- /* Copy because timestamptz_to_str returns a static buffer */
- filetime = pstrdup(timestamptz_to_str(file_ts));
- mytime = pstrdup(timestamptz_to_str(cur_ts));
- elog(LOG, "stats collector's time %s is later than backend local time %s",
- filetime, mytime);
- pfree(filetime);
- pfree(mytime);
- }
-
- pgstat_send_inquiry(cur_ts, min_ts, inquiry_db);
- break;
- }
-
- /* Normal acceptance case: file is not older than cutoff time */
- if (ok && file_ts >= min_ts)
- break;
-
- /* Not there or too old, so kick the collector and wait a bit */
- if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
- pgstat_send_inquiry(cur_ts, min_ts, inquiry_db);
-
- pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
- }
-
- if (count >= PGSTAT_POLL_LOOP_COUNT)
- ereport(LOG,
- (errmsg("using stale statistics instead of current ones "
- "because stats collector is not responding")));
-
- /*
- * Autovacuum launcher wants stats about all databases, but a shallow read
- * is sufficient.  Regular backends want a deep read for just the tables
- * they can see (MyDatabaseId + shared catalogs).
- */
- if (IsAutoVacuumLauncherProcess())
- pgStatDBHash = pgstat_read_statsfiles(InvalidOid, false, false);
- else
- pgStatDBHash = pgstat_read_statsfiles(MyDatabaseId, false, true);
-}
-
-
-/* ----------
- * pgstat_setup_memcxt() -
- *
- * Create pgStatLocalContext, if not already done.
- * ----------
- */
-static void
-pgstat_setup_memcxt(void)
-{
- if (!pgStatLocalContext)
- pgStatLocalContext = AllocSetContextCreate(TopMemoryContext,
-   "Statistics snapshot",
-   ALLOCSET_SMALL_SIZES);
-}
-
-
-/* ----------
- * pgstat_clear_snapshot() -
- *
- * Discard any data collected in the current transaction.  Any subsequent
- * request will cause new snapshots to be read.
- *
- * This is also invoked during transaction commit or abort to discard
- * the no-longer-wanted snapshot.
- * ----------
- */
-void
-pgstat_clear_snapshot(void)
-{
- /* Release memory, if any was allocated */
- if (pgStatLocalContext)
- MemoryContextDelete(pgStatLocalContext);
-
- /* Reset variables */
- pgStatLocalContext = NULL;
- pgStatDBHash = NULL;
- localBackendStatusTable = NULL;
- localNumBackends = 0;
-}
-
-
-/* ----------
- * pgstat_recv_inquiry() -
- *
- * Process stat inquiry requests.
- * ----------
- */
-static void
-pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- elog(DEBUG2, "received inquiry for database %u", msg->databaseid);
-
- /*
- * If there's already a write request for this DB, there's nothing to do.
- *
- * Note that if a request is found, we return early and skip the below
- * check for clock skew.  This is okay, since the only way for a DB
- * request to be present in the list is that we have been here since the
- * last write round.  It seems sufficient to check for clock skew once per
- * write round.
- */
- if (list_member_oid(pending_write_requests, msg->databaseid))
- return;
-
- /*
- * Check to see if we last wrote this database at a time >= the requested
- * cutoff time.  If so, this is a stale request that was generated before
- * we updated the DB file, and we don't need to do so again.
- *
- * If the requestor's local clock time is older than stats_timestamp, we
- * should suspect a clock glitch, ie system time going backwards; though
- * the more likely explanation is just delayed message receipt.  It is
- * worth expending a GetCurrentTimestamp call to be sure, since a large
- * retreat in the system clock reading could otherwise cause us to neglect
- * to update the stats file for a long time.
- */
- dbentry = pgstat_get_db_entry(msg->databaseid, false);
- if (dbentry == NULL)
- {
- /*
- * We have no data for this DB.  Enter a write request anyway so that
- * the global stats will get updated.  This is needed to prevent
- * backend_read_statsfile from waiting for data that we cannot supply,
- * in the case of a new DB that nobody has yet reported any stats for.
- * See the behavior of pgstat_read_db_statsfile_timestamp.
- */
- }
- else if (msg->clock_time < dbentry->stats_timestamp)
- {
- TimestampTz cur_ts = GetCurrentTimestamp();
-
- if (cur_ts < dbentry->stats_timestamp)
- {
- /*
- * Sure enough, time went backwards.  Force a new stats file write
- * to get back in sync; but first, log a complaint.
- */
- char   *writetime;
- char   *mytime;
-
- /* Copy because timestamptz_to_str returns a static buffer */
- writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
- mytime = pstrdup(timestamptz_to_str(cur_ts));
- elog(LOG,
- "stats_timestamp %s is later than collector's time %s for database %u",
- writetime, mytime, dbentry->databaseid);
- pfree(writetime);
- pfree(mytime);
- }
- else
- {
- /*
- * Nope, it's just an old request.  Assuming msg's clock_time is
- * >= its cutoff_time, it must be stale, so we can ignore it.
- */
- return;
- }
- }
- else if (msg->cutoff_time <= dbentry->stats_timestamp)
- {
- /* Stale request, ignore it */
- return;
- }
-
- /*
- * We need to write this DB, so create a request.
- */
- pending_write_requests = lappend_oid(pending_write_requests,
- msg->databaseid);
-}
-
-
-/* ----------
- * pgstat_recv_tabstat() -
- *
- * Count what the backend has done.
- * ----------
- */
-static void
-pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- PgStat_StatTabEntry *tabentry;
- int i;
- bool found;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- /*
- * Update database-wide stats.
- */
- dbentry->n_xact_commit += (PgStat_Counter) (msg->m_xact_commit);
- dbentry->n_xact_rollback += (PgStat_Counter) (msg->m_xact_rollback);
- dbentry->n_block_read_time += msg->m_block_read_time;
- dbentry->n_block_write_time += msg->m_block_write_time;
-
- /*
- * Process all table entries in the message.
- */
- for (i = 0; i < msg->m_nentries; i++)
- {
- PgStat_TableEntry *tabmsg = &(msg->m_entry[i]);
-
- tabentry = (PgStat_StatTabEntry *) hash_search(dbentry->tables,
-   (void *) &(tabmsg->t_id),
-   HASH_ENTER, &found);
-
- if (!found)
- {
- /*
- * If it's a new table entry, initialize counters to the values we
- * just got.
- */
- tabentry->numscans = tabmsg->t_counts.t_numscans;
- tabentry->tuples_returned = tabmsg->t_counts.t_tuples_returned;
- tabentry->tuples_fetched = tabmsg->t_counts.t_tuples_fetched;
- tabentry->tuples_inserted = tabmsg->t_counts.t_tuples_inserted;
- tabentry->tuples_updated = tabmsg->t_counts.t_tuples_updated;
- tabentry->tuples_deleted = tabmsg->t_counts.t_tuples_deleted;
- tabentry->tuples_hot_updated = tabmsg->t_counts.t_tuples_hot_updated;
- tabentry->n_live_tuples = tabmsg->t_counts.t_delta_live_tuples;
- tabentry->n_dead_tuples = tabmsg->t_counts.t_delta_dead_tuples;
- tabentry->changes_since_analyze = tabmsg->t_counts.t_changed_tuples;
- tabentry->blocks_fetched = tabmsg->t_counts.t_blocks_fetched;
- tabentry->blocks_hit = tabmsg->t_counts.t_blocks_hit;
-
- tabentry->vacuum_timestamp = 0;
- tabentry->vacuum_count = 0;
- tabentry->autovac_vacuum_timestamp = 0;
- tabentry->autovac_vacuum_count = 0;
- tabentry->analyze_timestamp = 0;
- tabentry->analyze_count = 0;
- tabentry->autovac_analyze_timestamp = 0;
- tabentry->autovac_analyze_count = 0;
- }
- else
- {
- /*
- * Otherwise add the values to the existing entry.
- */
- tabentry->numscans += tabmsg->t_counts.t_numscans;
- tabentry->tuples_returned += tabmsg->t_counts.t_tuples_returned;
- tabentry->tuples_fetched += tabmsg->t_counts.t_tuples_fetched;
- tabentry->tuples_inserted += tabmsg->t_counts.t_tuples_inserted;
- tabentry->tuples_updated += tabmsg->t_counts.t_tuples_updated;
- tabentry->tuples_deleted += tabmsg->t_counts.t_tuples_deleted;
- tabentry->tuples_hot_updated += tabmsg->t_counts.t_tuples_hot_updated;
- /* If table was truncated, first reset the live/dead counters */
- if (tabmsg->t_counts.t_truncated)
- {
- tabentry->n_live_tuples = 0;
- tabentry->n_dead_tuples = 0;
- }
- tabentry->n_live_tuples += tabmsg->t_counts.t_delta_live_tuples;
- tabentry->n_dead_tuples += tabmsg->t_counts.t_delta_dead_tuples;
- tabentry->changes_since_analyze += tabmsg->t_counts.t_changed_tuples;
- tabentry->blocks_fetched += tabmsg->t_counts.t_blocks_fetched;
- tabentry->blocks_hit += tabmsg->t_counts.t_blocks_hit;
- }
-
- /* Clamp n_live_tuples in case of negative delta_live_tuples */
- tabentry->n_live_tuples = Max(tabentry->n_live_tuples, 0);
- /* Likewise for n_dead_tuples */
- tabentry->n_dead_tuples = Max(tabentry->n_dead_tuples, 0);
-
- /*
- * Add per-table stats to the per-database entry, too.
- */
- dbentry->n_tuples_returned += tabmsg->t_counts.t_tuples_returned;
- dbentry->n_tuples_fetched += tabmsg->t_counts.t_tuples_fetched;
- dbentry->n_tuples_inserted += tabmsg->t_counts.t_tuples_inserted;
- dbentry->n_tuples_updated += tabmsg->t_counts.t_tuples_updated;
- dbentry->n_tuples_deleted += tabmsg->t_counts.t_tuples_deleted;
- dbentry->n_blocks_fetched += tabmsg->t_counts.t_blocks_fetched;
- dbentry->n_blocks_hit += tabmsg->t_counts.t_blocks_hit;
- }
-}
-
-
-/* ----------
- * pgstat_recv_tabpurge() -
- *
- * Arrange for dead table removal.
- * ----------
- */
-static void
-pgstat_recv_tabpurge(PgStat_MsgTabpurge *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- int i;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
-
- /*
- * No need to purge if we don't even know the database.
- */
- if (!dbentry || !dbentry->tables)
- return;
-
- /*
- * Process all table entries in the message.
- */
- for (i = 0; i < msg->m_nentries; i++)
- {
- /* Remove from hashtable if present; we don't care if it's not. */
- (void) hash_search(dbentry->tables,
-   (void *) &(msg->m_tableid[i]),
-   HASH_REMOVE, NULL);
- }
-}
-
-
-/* ----------
- * pgstat_recv_dropdb() -
- *
- * Arrange for dead database removal
- * ----------
- */
-static void
-pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
-{
- Oid dbid = msg->m_databaseid;
- PgStat_StatDBEntry *dbentry;
-
- /*
- * Lookup the database in the hashtable.
- */
- dbentry = pgstat_get_db_entry(dbid, false);
-
- /*
- * If found, remove it (along with the db statfile).
- */
- if (dbentry)
- {
- char statfile[MAXPGPATH];
-
- get_dbstat_filename(false, false, dbid, statfile, MAXPGPATH);
-
- elog(DEBUG2, "removing stats file \"%s\"", statfile);
- unlink(statfile);
-
- if (dbentry->tables != NULL)
- hash_destroy(dbentry->tables);
- if (dbentry->functions != NULL)
- hash_destroy(dbentry->functions);
-
- if (hash_search(pgStatDBHash,
- (void *) &dbid,
- HASH_REMOVE, NULL) == NULL)
- ereport(ERROR,
- (errmsg("database hash table corrupted during cleanup --- abort")));
- }
-}
-
-
-/* ----------
- * pgstat_recv_resetcounter() -
- *
- * Reset the statistics for the specified database.
- * ----------
- */
-static void
-pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- /*
- * Lookup the database in the hashtable.  Nothing to do if not there.
- */
- dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
-
- if (!dbentry)
- return;
-
- /*
- * We simply throw away all the database's table entries by recreating a
- * new hash table for them.
- */
- if (dbentry->tables != NULL)
- hash_destroy(dbentry->tables);
- if (dbentry->functions != NULL)
- hash_destroy(dbentry->functions);
-
- dbentry->tables = NULL;
- dbentry->functions = NULL;
-
- /*
- * Reset database-level stats, too.  This creates empty hash tables for
- * tables and functions.
- */
- reset_dbentry_counters(dbentry);
-}
-
-/* ----------
- * pgstat_recv_resetshared() -
- *
- * Reset some shared statistics of the cluster.
- * ----------
- */
-static void
-pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
-{
- if (msg->m_resettarget == RESET_BGWRITER)
- {
- /* Reset the global background writer statistics for the cluster. */
- memset(&globalStats, 0, sizeof(globalStats));
- globalStats.stat_reset_timestamp = GetCurrentTimestamp();
- }
- else if (msg->m_resettarget == RESET_ARCHIVER)
- {
- /* Reset the archiver statistics for the cluster. */
- memset(&archiverStats, 0, sizeof(archiverStats));
- archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
- }
-
- /*
- * Presumably the sender of this message validated the target, don't
- * complain here if it's not valid
- */
-}
-
-/* ----------
- * pgstat_recv_resetsinglecounter() -
- *
- * Reset a statistics for a single object
- * ----------
- */
-static void
-pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
-
- if (!dbentry)
- return;
-
- /* Set the reset timestamp for the whole database */
- dbentry->stat_reset_timestamp = GetCurrentTimestamp();
-
- /* Remove object if it exists, ignore it if not */
- if (msg->m_resettype == RESET_TABLE)
- (void) hash_search(dbentry->tables, (void *) &(msg->m_objectid),
-   HASH_REMOVE, NULL);
- else if (msg->m_resettype == RESET_FUNCTION)
- (void) hash_search(dbentry->functions, (void *) &(msg->m_objectid),
-   HASH_REMOVE, NULL);
-}
-
-/* ----------
- * pgstat_recv_autovac() -
- *
- * Process an autovacuum signalling message.
- * ----------
- */
-static void
-pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- /*
- * Store the last autovacuum time in the database's hashtable entry.
- */
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- dbentry->last_autovac_time = msg->m_start_time;
-}
-
-/* ----------
- * pgstat_recv_vacuum() -
- *
- * Process a VACUUM message.
- * ----------
- */
-static void
-pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- PgStat_StatTabEntry *tabentry;
-
- /*
- * Store the data in the table's hashtable entry.
- */
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
-
- tabentry->n_live_tuples = msg->m_live_tuples;
- tabentry->n_dead_tuples = msg->m_dead_tuples;
-
- if (msg->m_autovacuum)
- {
- tabentry->autovac_vacuum_timestamp = msg->m_vacuumtime;
- tabentry->autovac_vacuum_count++;
- }
- else
- {
- tabentry->vacuum_timestamp = msg->m_vacuumtime;
- tabentry->vacuum_count++;
- }
-}
-
-/* ----------
- * pgstat_recv_analyze() -
- *
- * Process an ANALYZE message.
- * ----------
- */
-static void
-pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- PgStat_StatTabEntry *tabentry;
-
- /*
- * Store the data in the table's hashtable entry.
- */
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
-
- tabentry->n_live_tuples = msg->m_live_tuples;
- tabentry->n_dead_tuples = msg->m_dead_tuples;
-
- /*
- * If commanded, reset changes_since_analyze to zero.  This forgets any
- * changes that were committed while the ANALYZE was in progress, but we
- * have no good way to estimate how many of those there were.
- */
- if (msg->m_resetcounter)
- tabentry->changes_since_analyze = 0;
-
- if (msg->m_autovacuum)
- {
- tabentry->autovac_analyze_timestamp = msg->m_analyzetime;
- tabentry->autovac_analyze_count++;
- }
- else
- {
- tabentry->analyze_timestamp = msg->m_analyzetime;
- tabentry->analyze_count++;
- }
-}
-
-
-/* ----------
- * pgstat_recv_archiver() -
- *
- * Process a ARCHIVER message.
- * ----------
- */
-static void
-pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len)
-{
- if (msg->m_failed)
- {
- /* Failed archival attempt */
- ++archiverStats.failed_count;
- memcpy(archiverStats.last_failed_wal, msg->m_xlog,
-   sizeof(archiverStats.last_failed_wal));
- archiverStats.last_failed_timestamp = msg->m_timestamp;
- }
- else
- {
- /* Successful archival operation */
- ++archiverStats.archived_count;
- memcpy(archiverStats.last_archived_wal, msg->m_xlog,
-   sizeof(archiverStats.last_archived_wal));
- archiverStats.last_archived_timestamp = msg->m_timestamp;
- }
-}
-
-/* ----------
- * pgstat_recv_bgwriter() -
- *
- * Process a BGWRITER message.
- * ----------
- */
-static void
-pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
-{
- globalStats.timed_checkpoints += msg->m_timed_checkpoints;
- globalStats.requested_checkpoints += msg->m_requested_checkpoints;
- globalStats.checkpoint_write_time += msg->m_checkpoint_write_time;
- globalStats.checkpoint_sync_time += msg->m_checkpoint_sync_time;
- globalStats.buf_written_checkpoints += msg->m_buf_written_checkpoints;
- globalStats.buf_written_clean += msg->m_buf_written_clean;
- globalStats.maxwritten_clean += msg->m_maxwritten_clean;
- globalStats.buf_written_backend += msg->m_buf_written_backend;
- globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
- globalStats.buf_alloc += msg->m_buf_alloc;
-}
-
-/* ----------
- * pgstat_recv_recoveryconflict() -
- *
- * Process a RECOVERYCONFLICT message.
- * ----------
- */
-static void
-pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- switch (msg->m_reason)
- {
- case PROCSIG_RECOVERY_CONFLICT_DATABASE:
-
- /*
- * Since we drop the information about the database as soon as it
- * replicates, there is no point in counting these conflicts.
- */
- break;
- case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
- dbentry->n_conflict_tablespace++;
- break;
- case PROCSIG_RECOVERY_CONFLICT_LOCK:
- dbentry->n_conflict_lock++;
- break;
- case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
- dbentry->n_conflict_snapshot++;
- break;
- case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
- dbentry->n_conflict_bufferpin++;
- break;
- case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
- dbentry->n_conflict_startup_deadlock++;
- break;
- }
-}
-
-/* ----------
- * pgstat_recv_deadlock() -
- *
- * Process a DEADLOCK message.
- * ----------
- */
-static void
-pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- dbentry->n_deadlocks++;
-}
-
-/* ----------
- * pgstat_recv_tempfile() -
- *
- * Process a TEMPFILE message.
- * ----------
- */
-static void
-pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- dbentry->n_temp_bytes += msg->m_filesize;
- dbentry->n_temp_files += 1;
-}
-
-/* ----------
- * pgstat_recv_funcstat() -
- *
- * Count what the backend has done.
- * ----------
- */
-static void
-pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len)
-{
- PgStat_FunctionEntry *funcmsg = &(msg->m_entry[0]);
- PgStat_StatDBEntry *dbentry;
- PgStat_StatFuncEntry *funcentry;
- int i;
- bool found;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
- /*
- * Process all function entries in the message.
- */
- for (i = 0; i < msg->m_nentries; i++, funcmsg++)
- {
- funcentry = (PgStat_StatFuncEntry *) hash_search(dbentry->functions,
- (void *) &(funcmsg->f_id),
- HASH_ENTER, &found);
-
- if (!found)
- {
- /*
- * If it's a new function entry, initialize counters to the values
- * we just got.
- */
- funcentry->f_numcalls = funcmsg->f_numcalls;
- funcentry->f_total_time = funcmsg->f_total_time;
- funcentry->f_self_time = funcmsg->f_self_time;
- }
- else
- {
- /*
- * Otherwise add the values to the existing entry.
- */
- funcentry->f_numcalls += funcmsg->f_numcalls;
- funcentry->f_total_time += funcmsg->f_total_time;
- funcentry->f_self_time += funcmsg->f_self_time;
- }
- }
-}
-
-/* ----------
- * pgstat_recv_funcpurge() -
- *
- * Arrange for dead function removal.
- * ----------
- */
-static void
-pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
-{
- PgStat_StatDBEntry *dbentry;
- int i;
-
- dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
-
- /*
- * No need to purge if we don't even know the database.
- */
- if (!dbentry || !dbentry->functions)
- return;
-
- /*
- * Process all function entries in the message.
- */
- for (i = 0; i < msg->m_nentries; i++)
- {
- /* Remove from hashtable if present; we don't care if it's not. */
- (void) hash_search(dbentry->functions,
-   (void *) &(msg->m_functionid[i]),
-   HASH_REMOVE, NULL);
- }
-}
-
-/* ----------
- * pgstat_write_statsfile_needed() -
- *
- * Do we need to write out any stats files?
- * ----------
- */
-static bool
-pgstat_write_statsfile_needed(void)
-{
- if (pending_write_requests != NIL)
- return true;
-
- /* Everything was written recently */
- return false;
-}
-
-/* ----------
- * pgstat_db_requested() -
- *
- * Checks whether stats for a particular DB need to be written to a file.
- * ----------
- */
-static bool
-pgstat_db_requested(Oid databaseid)
-{
- /*
- * If any requests are outstanding at all, we should write the stats for
- * shared catalogs (the "database" with OID 0).  This ensures that
- * backends will see up-to-date stats for shared catalogs, even though
- * they send inquiry messages mentioning only their own DB.
- */
- if (databaseid == InvalidOid && pending_write_requests != NIL)
- return true;
-
- /* Search to see if there's an open request to write this database. */
- if (list_member_oid(pending_write_requests, databaseid))
- return true;
-
- return false;
-}
-
-/*
- * Convert a potentially unsafely truncated activity string (see
- * PgBackendStatus.st_activity_raw's documentation) into a correctly truncated
- * one.
- *
- * The returned string is allocated in the caller's memory context and may be
- * freed.
- */
-char *
-pgstat_clip_activity(const char *raw_activity)
-{
- char   *activity;
- int rawlen;
- int cliplen;
-
- /*
- * Some callers, like pgstat_get_backend_current_activity(), do not
- * guarantee that the buffer isn't concurrently modified. We try to take
- * care that the buffer is always terminated by a NUL byte regardless, but
- * let's still be paranoid about the string's length. In those cases the
- * underlying buffer is guaranteed to be pgstat_track_activity_query_size
- * large.
- */
- activity = pnstrdup(raw_activity, pgstat_track_activity_query_size - 1);
-
- /* now double-guaranteed to be NUL terminated */
- rawlen = strlen(activity);
-
- /*
- * All supported server-encodings make it possible to determine the length
- * of a multi-byte character from its first byte (this is not the case for
- * client encodings, see GB18030). As st_activity is always stored using
- * server encoding, this allows us to perform multi-byte aware truncation,
- * even if the string earlier was truncated in the middle of a multi-byte
- * character.
- */
- cliplen = pg_mbcliplen(activity, rawlen,
-   pgstat_track_activity_query_size - 1);
-
- activity[cliplen] = '\0';
-
- return activity;
-}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 65eab02b3e..d3ec828657 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -95,6 +95,7 @@
 
 #include "access/transam.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "bootstrap/bootstrap.h"
 #include "catalog/pg_control.h"
 #include "common/file_perm.h"
@@ -255,7 +256,6 @@ static pid_t StartupPID = 0,
  WalReceiverPID = 0,
  AutoVacPID = 0,
  PgArchPID = 0,
- PgStatPID = 0,
  SysLoggerPID = 0;
 
 /* Startup process's status */
@@ -502,7 +502,6 @@ typedef struct
  PGPROC   *AuxiliaryProcs;
  PGPROC   *PreparedXactProcs;
  PMSignalData *PMSignalState;
- InheritableSocket pgStatSock;
  pid_t PostmasterPid;
  TimestampTz PgStartTime;
  TimestampTz PgReloadTime;
@@ -1298,12 +1297,6 @@ PostmasterMain(int argc, char *argv[])
 
  whereToSendOutput = DestNone;
 
- /*
- * Initialize stats collection subsystem (this does NOT start the
- * collector process!)
- */
- pgstat_init();
-
  /*
  * Initialize the autovacuum subsystem (again, no process start yet)
  */
@@ -1752,11 +1745,6 @@ ServerLoop(void)
  start_autovac_launcher = false; /* signal processed */
  }
 
- /* If we have lost the stats collector, try to start a new one */
- if (PgStatPID == 0 &&
- (pmState == PM_RUN || pmState == PM_HOT_STANDBY))
- PgStatPID = pgstat_start();
-
  /* If we have lost the archiver, try to start a new one. */
  if (PgArchPID == 0 && PgArchStartupAllowed())
  PgArchPID = StartArchiver();
@@ -2591,8 +2579,6 @@ SIGHUP_handler(SIGNAL_ARGS)
  signal_child(PgArchPID, SIGHUP);
  if (SysLoggerPID != 0)
  signal_child(SysLoggerPID, SIGHUP);
- if (PgStatPID != 0)
- signal_child(PgStatPID, SIGHUP);
 
  /* Reload authentication config files too */
  if (!load_hba())
@@ -2923,8 +2909,6 @@ reaper(SIGNAL_ARGS)
  AutoVacPID = StartAutoVacLauncher();
  if (PgArchStartupAllowed() && PgArchPID == 0)
  PgArchPID = StartArchiver();
- if (PgStatPID == 0)
- PgStatPID = pgstat_start();
 
  /* workers may be scheduled to start now */
  maybe_start_bgworkers();
@@ -2991,13 +2975,6 @@ reaper(SIGNAL_ARGS)
  SignalChildren(SIGUSR2);
 
  pmState = PM_SHUTDOWN_2;
-
- /*
- * We can also shut down the stats collector now; there's
- * nothing left for it to do.
- */
- if (PgStatPID != 0)
- signal_child(PgStatPID, SIGQUIT);
  }
  else
  {
@@ -3072,22 +3049,6 @@ reaper(SIGNAL_ARGS)
  continue;
  }
 
- /*
- * Was it the statistics collector?  If so, just try to start a new
- * one; no need to force reset of the rest of the system.  (If fail,
- * we'll try again in future cycles of the main loop.)
- */
- if (pid == PgStatPID)
- {
- PgStatPID = 0;
- if (!EXIT_STATUS_0(exitstatus))
- LogChildExit(LOG, _("statistics collector process"),
- pid, exitstatus);
- if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
- PgStatPID = pgstat_start();
- continue;
- }
-
  /* Was it the system logger?  If so, try to start a new one */
  if (pid == SysLoggerPID)
  {
@@ -3546,22 +3507,6 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
  signal_child(PgArchPID, SIGQUIT);
  }
 
- /*
- * Force a power-cycle of the pgstat process too.  (This isn't absolutely
- * necessary, but it seems like a good idea for robustness, and it
- * simplifies the state-machine logic in the case where a shutdown request
- * arrives during crash processing.)
- */
- if (PgStatPID != 0 && take_action)
- {
- ereport(DEBUG2,
- (errmsg_internal("sending %s to process %d",
- "SIGQUIT",
- (int) PgStatPID)));
- signal_child(PgStatPID, SIGQUIT);
- allow_immediate_pgstat_restart();
- }
-
  /* We do NOT restart the syslogger */
 
  if (Shutdown != ImmediateShutdown)
@@ -3757,8 +3702,6 @@ PostmasterStateMachine(void)
  SignalChildren(SIGQUIT);
  if (PgArchPID != 0)
  signal_child(PgArchPID, SIGQUIT);
- if (PgStatPID != 0)
- signal_child(PgStatPID, SIGQUIT);
  }
  }
  }
@@ -3797,8 +3740,7 @@ PostmasterStateMachine(void)
  * normal state transition leading up to PM_WAIT_DEAD_END, or during
  * FatalError processing.
  */
- if (dlist_is_empty(&BackendList) &&
- PgArchPID == 0 && PgStatPID == 0)
+ if (dlist_is_empty(&BackendList) && PgArchPID == 0)
  {
  /* These other guys should be dead already */
  Assert(StartupPID == 0);
@@ -3999,8 +3941,6 @@ TerminateChildren(int signal)
  signal_child(AutoVacPID, signal);
  if (PgArchPID != 0)
  signal_child(PgArchPID, signal);
- if (PgStatPID != 0)
- signal_child(PgStatPID, signal);
 }
 
 /*
@@ -4973,18 +4913,6 @@ SubPostmasterMain(int argc, char *argv[])
 
  StartBackgroundWorker();
  }
- if (strcmp(argv[1], "--forkarch") == 0)
- {
- /* Do not want to attach to shared memory */
-
- PgArchiverMain(argc, argv); /* does not return */
- }
- if (strcmp(argv[1], "--forkcol") == 0)
- {
- /* Do not want to attach to shared memory */
-
- PgstatCollectorMain(argc, argv); /* does not return */
- }
  if (strcmp(argv[1], "--forklog") == 0)
  {
  /* Do not want to attach to shared memory */
@@ -5097,12 +5025,6 @@ sigusr1_handler(SIGNAL_ARGS)
  if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
  pmState == PM_RECOVERY && Shutdown == NoShutdown)
  {
- /*
- * Likewise, start other special children as needed.
- */
- Assert(PgStatPID == 0);
- PgStatPID = pgstat_start();
-
  ereport(LOG,
  (errmsg("database system is ready to accept read only connections")));
 
@@ -5972,7 +5894,6 @@ extern slock_t *ShmemLock;
 extern slock_t *ProcStructLock;
 extern PGPROC *AuxiliaryProcs;
 extern PMSignalData *PMSignalState;
-extern pgsocket pgStatSock;
 extern pg_time_t first_syslogger_file_time;
 
 #ifndef WIN32
@@ -6025,8 +5946,6 @@ save_backend_variables(BackendParameters *param, Port *port,
  param->AuxiliaryProcs = AuxiliaryProcs;
  param->PreparedXactProcs = PreparedXactProcs;
  param->PMSignalState = PMSignalState;
- if (!write_inheritable_socket(&param->pgStatSock, pgStatSock, childPid))
- return false;
 
  param->PostmasterPid = PostmasterPid;
  param->PgStartTime = PgStartTime;
@@ -6258,7 +6177,6 @@ restore_backend_variables(BackendParameters *param, Port *port)
  AuxiliaryProcs = param->AuxiliaryProcs;
  PreparedXactProcs = param->PreparedXactProcs;
  PMSignalState = param->PMSignalState;
- read_inheritable_socket(&pgStatSock, &param->pgStatSock);
 
  PostmasterPid = param->PostmasterPid;
  PgStartTime = param->PgStartTime;
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index d1ea46deb8..3accdf7bcf 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -31,11 +31,11 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "bestatus.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "nodes/pg_list.h"
-#include "pgstat.h"
 #include "pgtime.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/postmaster.h"
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index a6fdba3f41..0de04159d5 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -45,9 +45,9 @@
 #include <unistd.h>
 
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/walwriter.h"
 #include "storage/bufmgr.h"
 #include "storage/condition_variable.h"
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index def6c03dd0..e30b2dbcf0 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,6 +17,7 @@
 #include <time.h>
 
 #include "access/xlog_internal.h" /* for pg_start/stop_backup */
+#include "bestatus.h"
 #include "catalog/pg_type.h"
 #include "common/file_perm.h"
 #include "lib/stringinfo.h"
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 7027737e67..75a3208f74 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -22,11 +22,11 @@
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "access/xlog.h"
+#include "bestatus.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 2b0d889c3b..ab967d7d65 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -19,7 +19,7 @@
 
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "pgstat.h"
+#include "bestatus.h"
 
 #include "access/heapam.h"
 #include "access/htup.h"
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index ca51318dbb..a5685b8e7e 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -77,13 +77,12 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
-
+#include "bestatus.h"
 #include "catalog/indexing.h"
 #include "nodes/execnodes.h"
 
 #include "replication/origin.h"
 #include "replication/logical.h"
-#include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index b79ce5db95..b90768be86 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -61,10 +61,10 @@
 #include "access/tuptoaster.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "catalog/catalog.h"
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
 #include "replication/slot.h"
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 4053482420..a30f1e012e 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -126,7 +126,7 @@
 #include "access/transam.h"
 #include "access/xact.h"
 
-#include "pgstat.h"
+#include "bestatus.h"
 
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index d87cf8afe5..fafef0578a 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -86,26 +86,28 @@
 #include "postgres.h"
 
 #include "miscadmin.h"
-#include "pgstat.h"
 
 #include "access/heapam.h"
 #include "access/xact.h"
 
+#include "bestatus.h"
+
 #include "catalog/pg_subscription_rel.h"
 #include "catalog/pg_type.h"
 
 #include "commands/copy.h"
 
 #include "parser/parse_relation.h"
+#include "pgstat.h"
 
 #include "replication/logicallauncher.h"
 #include "replication/logicalrelation.h"
 #include "replication/walreceiver.h"
 #include "replication/worker_internal.h"
 
-#include "utils/snapmgr.h"
 #include "storage/ipc.h"
 
+#include "utils/snapmgr.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -128,7 +130,7 @@ finish_sync_worker(void)
  if (IsTransactionState())
  {
  CommitTransactionCommand();
- pgstat_report_stat(false);
+ pgstat_update_stat(true);
  }
 
  /* And flush all writes. */
@@ -144,6 +146,9 @@ finish_sync_worker(void)
  /* Find the main apply worker and signal it. */
  logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
 
+ /* clean up retained statistics */
+ pgstat_update_stat(true);
+
  /* Stop gracefully */
  proc_exit(0);
 }
@@ -525,7 +530,7 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
  if (started_tx)
  {
  CommitTransactionCommand();
- pgstat_report_stat(false);
+ pgstat_update_stat(false);
  }
 }
 
@@ -863,7 +868,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
    MyLogicalRepWorker->relstate,
    MyLogicalRepWorker->relstate_lsn);
  CommitTransactionCommand();
- pgstat_report_stat(false);
+ pgstat_update_stat(false);
 
  /*
  * We want to do the table data sync in a single transaction.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index de23ced9af..d8d7b35058 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -31,6 +31,8 @@
 #include "access/xact.h"
 #include "access/xlog_internal.h"
 
+#include "bestatus.h"
+
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_subscription.h"
@@ -42,17 +44,20 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 
+#include "funcapi.h"
+
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
 
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
 
 #include "nodes/makefuncs.h"
 
 #include "optimizer/planner.h"
 
 #include "parser/parse_relation.h"
-
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/walwriter.h"
@@ -493,7 +498,7 @@ apply_handle_commit(StringInfo s)
  replorigin_session_origin_timestamp = commit_data.committime;
 
  CommitTransactionCommand();
- pgstat_report_stat(false);
+ pgstat_update_stat(false);
 
  store_flush_position(commit_data.end_lsn);
  }
@@ -1327,6 +1332,8 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
  }
 
  send_feedback(last_received, requestReply, requestReply);
+
+ pgstat_update_stat(false);
  }
  }
 }
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 33b23b6b6d..c60e69302a 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -41,9 +41,9 @@
 
 #include "access/transam.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "common/string.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
 #include "storage/proc.h"
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 6c160c13c6..02ec91d98e 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -75,8 +75,8 @@
 #include <unistd.h>
 
 #include "access/xact.h"
+#include "bestatus.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/syncrep.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 2e90944ad5..bdca25499d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -50,6 +50,7 @@
 #include "access/timeline.h"
 #include "access/transam.h"
 #include "access/xlog_internal.h"
+#include "bestatus.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
 #include "common/ip.h"
@@ -57,7 +58,6 @@
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/ipc.h"
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2d2eb23eb7..3de752bd4c 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -56,6 +56,7 @@
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 
+#include "bestatus.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
@@ -65,7 +66,6 @@
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
 #include "nodes/replnodes.h"
-#include "pgstat.h"
 #include "replication/basebackup.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
diff --git a/src/backend/statmon/Makefile b/src/backend/statmon/Makefile
new file mode 100644
index 0000000000..64a04878e3
--- /dev/null
+++ b/src/backend/statmon/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/statmon
+#
+# IDENTIFICATION
+#    src/backend/statmon/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/statmon
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = pgstat.o bestatus.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/statmon/bestatus.c b/src/backend/statmon/bestatus.c
new file mode 100644
index 0000000000..2a251ae1b5
--- /dev/null
+++ b/src/backend/statmon/bestatus.c
@@ -0,0 +1,1779 @@
+/* ----------
+ * bestatus.c
+ *
+ * Backend status monitor
+ *
+ * Status data is stored in shared memory. Every backend updates and reads
+ * its own entry individually.
+ *
+ * Copyright (c) 2001-2019, PostgreSQL Global Development Group
+ *
+ * src/backend/statmon/bestatus.c
+ * ----------
+ */
+#include "postgres.h"
+
+#include "bestatus.h"
+
+#include "access/xact.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "postmaster/autovacuum.h"
+#include "replication/walsender.h"
+#include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/sinvaladt.h"
+#include "utils/ascii.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/probes.h"
+
+
+/* Status for backends including auxiliary */
+static LocalPgBackendStatus *localBackendStatusTable = NULL;
+
+/* Total number of backends including auxiliary */
+static int localNumBackends = 0;
+
+/* ----------
+ * Total number of backends including auxiliary
+ *
+ * We reserve a slot for each possible BackendId, plus one for each
+ * possible auxiliary process type.  (This scheme assumes there is not
+ * more than one of any auxiliary process type at a time.) MaxBackends
+ * includes autovacuum workers and background workers as well.
+ * ----------
+ */
+#define NumBackendStatSlots (MaxBackends + NUM_AUXPROCTYPES)
+
+
+/* ----------
+ * GUC parameters
+ * ----------
+ */
+bool pgstat_track_activities = false;
+int pgstat_track_activity_query_size = 1024;
+
+static MemoryContext pgBeStatLocalContext = NULL;
+
+/* ------------------------------------------------------------
+ * Functions for management of the shared-memory PgBackendStatus array
+ * ------------------------------------------------------------
+ */
+
+static PgBackendStatus *BackendStatusArray = NULL;
+static PgBackendStatus *MyBEEntry = NULL;
+static char *BackendAppnameBuffer = NULL;
+static char *BackendClientHostnameBuffer = NULL;
+static char *BackendActivityBuffer = NULL;
+static Size BackendActivityBufferSize = 0;
+#ifdef USE_SSL
+static PgBackendSSLStatus *BackendSslStatusBuffer = NULL;
+#endif
+
+static const char *pgstat_get_wait_activity(WaitEventActivity w);
+static const char *pgstat_get_wait_client(WaitEventClient w);
+static const char *pgstat_get_wait_ipc(WaitEventIPC w);
+static const char *pgstat_get_wait_timeout(WaitEventTimeout w);
+static const char *pgstat_get_wait_io(WaitEventIO w);
+static void pgstat_setup_memcxt(void);
+static void bestatus_clear_snapshot(void);
+static void pgstat_beshutdown_hook(int code, Datum arg);
+/*
+ * Report shared-memory space needed by CreateSharedBackendStatus.
+ */
+Size
+BackendStatusShmemSize(void)
+{
+ Size size;
+
+ /* BackendStatusArray: */
+ size = mul_size(sizeof(PgBackendStatus), NumBackendStatSlots);
+ /* BackendAppnameBuffer: */
+ size = add_size(size,
+ mul_size(NAMEDATALEN, NumBackendStatSlots));
+ /* BackendClientHostnameBuffer: */
+ size = add_size(size,
+ mul_size(NAMEDATALEN, NumBackendStatSlots));
+ /* BackendActivityBuffer: */
+ size = add_size(size,
+ mul_size(pgstat_track_activity_query_size, NumBackendStatSlots));
+#ifdef USE_SSL
+ /* BackendSslStatusBuffer: */
+ size = add_size(size,
+ mul_size(sizeof(PgBackendSSLStatus), NumBackendStatSlots));
+#endif
+ return size;
+}
+
+/*
+ * Initialize the shared status array and several string buffers
+ * during postmaster startup.
+ */
+void
+CreateSharedBackendStatus(void)
+{
+ Size size;
+ bool found;
+ int i;
+ char   *buffer;
+
+ /* Create or attach to the shared array */
+ size = mul_size(sizeof(PgBackendStatus), NumBackendStatSlots);
+ BackendStatusArray = (PgBackendStatus *)
+ ShmemInitStruct("Backend Status Array", size, &found);
+
+ if (!found)
+ {
+ /*
+ * We're the first - initialize.
+ */
+ MemSet(BackendStatusArray, 0, size);
+ }
+
+ /* Create or attach to the shared appname buffer */
+ size = mul_size(NAMEDATALEN, NumBackendStatSlots);
+ BackendAppnameBuffer = (char *)
+ ShmemInitStruct("Backend Application Name Buffer", size, &found);
+
+ if (!found)
+ {
+ MemSet(BackendAppnameBuffer, 0, size);
+
+ /* Initialize st_appname pointers. */
+ buffer = BackendAppnameBuffer;
+ for (i = 0; i < NumBackendStatSlots; i++)
+ {
+ BackendStatusArray[i].st_appname = buffer;
+ buffer += NAMEDATALEN;
+ }
+ }
+
+ /* Create or attach to the shared client hostname buffer */
+ size = mul_size(NAMEDATALEN, NumBackendStatSlots);
+ BackendClientHostnameBuffer = (char *)
+ ShmemInitStruct("Backend Client Host Name Buffer", size, &found);
+
+ if (!found)
+ {
+ MemSet(BackendClientHostnameBuffer, 0, size);
+
+ /* Initialize st_clienthostname pointers. */
+ buffer = BackendClientHostnameBuffer;
+ for (i = 0; i < NumBackendStatSlots; i++)
+ {
+ BackendStatusArray[i].st_clienthostname = buffer;
+ buffer += NAMEDATALEN;
+ }
+ }
+
+ /* Create or attach to the shared activity buffer */
+ BackendActivityBufferSize = mul_size(pgstat_track_activity_query_size,
+ NumBackendStatSlots);
+ BackendActivityBuffer = (char *)
+ ShmemInitStruct("Backend Activity Buffer",
+ BackendActivityBufferSize,
+ &found);
+
+ if (!found)
+ {
+ MemSet(BackendActivityBuffer, 0, BackendActivityBufferSize);
+
+ /* Initialize st_activity pointers. */
+ buffer = BackendActivityBuffer;
+ for (i = 0; i < NumBackendStatSlots; i++)
+ {
+ BackendStatusArray[i].st_activity_raw = buffer;
+ buffer += pgstat_track_activity_query_size;
+ }
+ }
+
+#ifdef USE_SSL
+ /* Create or attach to the shared SSL status buffer */
+ size = mul_size(sizeof(PgBackendSSLStatus), NumBackendStatSlots);
+ BackendSslStatusBuffer = (PgBackendSSLStatus *)
+ ShmemInitStruct("Backend SSL Status Buffer", size, &found);
+
+ if (!found)
+ {
+ PgBackendSSLStatus *ptr;
+
+ MemSet(BackendSslStatusBuffer, 0, size);
+
+ /* Initialize st_sslstatus pointers. */
+ ptr = BackendSslStatusBuffer;
+ for (i = 0; i < NumBackendStatSlots; i++)
+ {
+ BackendStatusArray[i].st_sslstatus = ptr;
+ ptr++;
+ }
+ }
+#endif
+}
+
+/* ----------
+ * pgstat_bearray_initialize() -
+ *
+ * Initialize pgstats state, and set up our on-proc-exit hook.
+ * Called from InitPostgres and AuxiliaryProcessMain. For auxiliary
+ * processes, MyBackendId is invalid. Otherwise, MyBackendId must be set,
+ * but we must not have started any transaction yet (since the
+ * exit hook must run after the last transaction exit).
+ * NOTE: MyDatabaseId isn't set yet; so the shutdown hook has to be careful.
+ * ----------
+ */
+void
+pgstat_bearray_initialize(void)
+{
+ /* Initialize MyBEEntry */
+ if (MyBackendId != InvalidBackendId)
+ {
+ Assert(MyBackendId >= 1 && MyBackendId <= MaxBackends);
+ MyBEEntry = &BackendStatusArray[MyBackendId - 1];
+ }
+ else
+ {
+ /* Must be an auxiliary process */
+ Assert(MyAuxProcType != NotAnAuxProcess);
+
+ /*
+ * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
+ * have a BackendId, the slot is statically allocated based on the
+ * auxiliary process type (MyAuxProcType).  Backends use slots indexed
+ * in the range from 1 to MaxBackends (inclusive), so we use
+ * MaxBackends + AuxBackendType + 1 as the index of the slot for an
+ * auxiliary process.
+ */
+ MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
+ }
+
+ /* Set up a process-exit hook to clean up */
+ before_shmem_exit(pgstat_beshutdown_hook, 0);
+}
+
+/*
+ * Shut down a single backend's status reporting at process exit.
+ *
+ * Clear out our entry in the PgBackendStatus array so that this process
+ * is no longer reported as a live backend.
+ */
+static void
+pgstat_beshutdown_hook(int code, Datum arg)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ /*
+ * Clear my status entry, following the protocol of bumping st_changecount
+ * before and after.  We use a volatile pointer here to ensure the
+ * compiler doesn't try to get cute.
+ */
+ pgstat_increment_changecount_before(beentry);
+
+ beentry->st_procpid = 0; /* mark invalid */
+
+ pgstat_increment_changecount_after(beentry);
+}
+
+
+/* ----------
+ * pgstat_bestart() -
+ *
+ * Initialize this backend's entry in the PgBackendStatus array.
+ * Called from InitPostgres.
+ *
+ * Apart from auxiliary processes, MyBackendId, MyDatabaseId,
+ * session userid, and application_name must be set for a
+ * backend (hence, this cannot be combined with pgstat_initialize).
+ * ----------
+ */
+void
+pgstat_bestart(void)
+{
+ SockAddr clientaddr;
+ volatile PgBackendStatus *beentry;
+
+ /*
+ * To minimize the time spent modifying the PgBackendStatus entry, fetch
+ * all the needed data first.
+ */
+
+ /*
+ * We may not have a MyProcPort (eg, if this is the autovacuum process).
+ * If so, use all-zeroes client address, which is dealt with specially in
+ * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
+ */
+ if (MyProcPort)
+ memcpy(&clientaddr, &MyProcPort->raddr, sizeof(clientaddr));
+ else
+ MemSet(&clientaddr, 0, sizeof(clientaddr));
+
+ /*
+ * Initialize my status entry, following the protocol of bumping
+ * st_changecount before and after; and make sure it's even afterwards. We
+ * use a volatile pointer here to ensure the compiler doesn't try to get
+ * cute.
+ */
+ beentry = MyBEEntry;
+
+ /* pgstats state must be initialized from pgstat_initialize() */
+ Assert(beentry != NULL);
+
+ if (MyBackendId != InvalidBackendId)
+ {
+ if (IsAutoVacuumLauncherProcess())
+ {
+ /* Autovacuum Launcher */
+ beentry->st_backendType = B_AUTOVAC_LAUNCHER;
+ }
+ else if (IsAutoVacuumWorkerProcess())
+ {
+ /* Autovacuum Worker */
+ beentry->st_backendType = B_AUTOVAC_WORKER;
+ }
+ else if (am_walsender)
+ {
+ /* Wal sender */
+ beentry->st_backendType = B_WAL_SENDER;
+ }
+ else if (IsBackgroundWorker)
+ {
+ /* bgworker */
+ beentry->st_backendType = B_BG_WORKER;
+ }
+ else
+ {
+ /* client-backend */
+ beentry->st_backendType = B_BACKEND;
+ }
+ }
+ else
+ {
+ /* Must be an auxiliary process */
+ Assert(MyAuxProcType != NotAnAuxProcess);
+ switch (MyAuxProcType)
+ {
+ case StartupProcess:
+ beentry->st_backendType = B_STARTUP;
+ break;
+ case BgWriterProcess:
+ beentry->st_backendType = B_BG_WRITER;
+ break;
+ case CheckpointerProcess:
+ beentry->st_backendType = B_CHECKPOINTER;
+ break;
+ case WalWriterProcess:
+ beentry->st_backendType = B_WAL_WRITER;
+ break;
+ case WalReceiverProcess:
+ beentry->st_backendType = B_WAL_RECEIVER;
+ break;
+ case ArchiverProcess:
+ beentry->st_backendType = B_ARCHIVER;
+ break;
+ default:
+ elog(FATAL, "unrecognized process type: %d",
+ (int) MyAuxProcType);
+ proc_exit(1);
+ }
+ }
+
+ do
+ {
+ pgstat_increment_changecount_before(beentry);
+ } while ((beentry->st_changecount & 1) == 0);
+
+ beentry->st_procpid = MyProcPid;
+ beentry->st_proc_start_timestamp = MyStartTimestamp;
+ beentry->st_activity_start_timestamp = 0;
+ beentry->st_state_start_timestamp = 0;
+ beentry->st_xact_start_timestamp = 0;
+ beentry->st_databaseid = MyDatabaseId;
+
+ /* We have userid for client-backends, wal-sender and bgworker processes */
+ if (beentry->st_backendType == B_BACKEND
+ || beentry->st_backendType == B_WAL_SENDER
+ || beentry->st_backendType == B_BG_WORKER)
+ beentry->st_userid = GetSessionUserId();
+ else
+ beentry->st_userid = InvalidOid;
+
+ beentry->st_clientaddr = clientaddr;
+ if (MyProcPort && MyProcPort->remote_hostname)
+ strlcpy(beentry->st_clienthostname, MyProcPort->remote_hostname,
+ NAMEDATALEN);
+ else
+ beentry->st_clienthostname[0] = '\0';
+#ifdef USE_SSL
+ if (MyProcPort && MyProcPort->ssl != NULL)
+ {
+ beentry->st_ssl = true;
+ beentry->st_sslstatus->ssl_bits = be_tls_get_cipher_bits(MyProcPort);
+ beentry->st_sslstatus->ssl_compression = be_tls_get_compression(MyProcPort);
+ strlcpy(beentry->st_sslstatus->ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
+ strlcpy(beentry->st_sslstatus->ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
+ be_tls_get_peerdn_name(MyProcPort, beentry->st_sslstatus->ssl_clientdn, NAMEDATALEN);
+ }
+ else
+ {
+ beentry->st_ssl = false;
+ }
+#else
+ beentry->st_ssl = false;
+#endif
+ beentry->st_state = STATE_UNDEFINED;
+ beentry->st_appname[0] = '\0';
+ beentry->st_activity_raw[0] = '\0';
+ /* Also make sure the last byte in each string area is always 0 */
+ beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
+ beentry->st_appname[NAMEDATALEN - 1] = '\0';
+ beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
+ beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
+ beentry->st_progress_command_target = InvalidOid;
+
+ /*
+ * we don't zero st_progress_param here to save cycles; nobody should
+ * examine it until st_progress_command has been set to something other
+ * than PROGRESS_COMMAND_INVALID
+ */
+
+ pgstat_increment_changecount_after(beentry);
+
+ /* Update app name to current GUC setting */
+ if (application_name)
+ pgstat_report_appname(application_name);
+}
+
+/* ----------
+ * pgstat_read_current_status() -
+ *
+ * Copy the current contents of the PgBackendStatus array to local memory,
+ * if not already done in this transaction.
+ * ----------
+ */
+static void
+pgstat_read_current_status(void)
+{
+ volatile PgBackendStatus *beentry;
+ LocalPgBackendStatus *localtable;
+ LocalPgBackendStatus *localentry;
+ char   *localappname,
+   *localclienthostname,
+   *localactivity;
+#ifdef USE_SSL
+ PgBackendSSLStatus *localsslstatus;
+#endif
+ int i;
+
+ Assert(IsUnderPostmaster);
+
+ if (localBackendStatusTable)
+ return; /* already done */
+
+ pgstat_setup_memcxt();
+
+ localtable = (LocalPgBackendStatus *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   sizeof(LocalPgBackendStatus) * NumBackendStatSlots);
+ localappname = (char *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   NAMEDATALEN * NumBackendStatSlots);
+ localclienthostname = (char *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   NAMEDATALEN * NumBackendStatSlots);
+ localactivity = (char *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   pgstat_track_activity_query_size * NumBackendStatSlots);
+#ifdef USE_SSL
+ localsslstatus = (PgBackendSSLStatus *)
+ MemoryContextAlloc(pgBeStatLocalContext,
+   sizeof(PgBackendSSLStatus) * NumBackendStatSlots);
+#endif
+
+ localNumBackends = 0;
+
+ beentry = BackendStatusArray;
+ localentry = localtable;
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * Follow the protocol of retrying if st_changecount changes while we
+ * copy the entry, or if it's odd.  (The check for odd is needed to
+ * cover the case where we are able to completely copy the entry while
+ * the source backend is between increment steps.) We use a volatile
+ * pointer here to ensure the compiler doesn't try to get cute.
+ */
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_save_changecount_before(beentry, before_changecount);
+
+ localentry->backendStatus.st_procpid = beentry->st_procpid;
+ if (localentry->backendStatus.st_procpid > 0)
+ {
+ memcpy(&localentry->backendStatus, (char *) beentry, sizeof(PgBackendStatus));
+
+ /*
+ * strcpy is safe even if the string is modified concurrently,
+ * because there's always a \0 at the end of the buffer.
+ */
+ strcpy(localappname, (char *) beentry->st_appname);
+ localentry->backendStatus.st_appname = localappname;
+ strcpy(localclienthostname, (char *) beentry->st_clienthostname);
+ localentry->backendStatus.st_clienthostname = localclienthostname;
+ strcpy(localactivity, (char *) beentry->st_activity_raw);
+ localentry->backendStatus.st_activity_raw = localactivity;
+ localentry->backendStatus.st_ssl = beentry->st_ssl;
+#ifdef USE_SSL
+ if (beentry->st_ssl)
+ {
+ memcpy(localsslstatus, beentry->st_sslstatus, sizeof(PgBackendSSLStatus));
+ localentry->backendStatus.st_sslstatus = localsslstatus;
+ }
+#endif
+ }
+
+ pgstat_save_changecount_after(beentry, after_changecount);
+ if (before_changecount == after_changecount &&
+ (before_changecount & 1) == 0)
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ beentry++;
+ /* Only valid entries get included into the local array */
+ if (localentry->backendStatus.st_procpid > 0)
+ {
+ BackendIdGetTransactionIds(i,
+   &localentry->backend_xid,
+   &localentry->backend_xmin);
+
+ localentry++;
+ localappname += NAMEDATALEN;
+ localclienthostname += NAMEDATALEN;
+ localactivity += pgstat_track_activity_query_size;
+#ifdef USE_SSL
+ localsslstatus++;
+#endif
+ localNumBackends++;
+ }
+ }
+
+ /* Set the pointer only after completion of a valid table */
+ localBackendStatusTable = localtable;
+}
+
+
+/* ----------
+ * pgstat_fetch_stat_beentry() -
+ *
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * our local copy of the current-activity entry for one backend.
+ *
+ * NB: caller is responsible for a check if the user is permitted to see
+ * this info (especially the querystring).
+ * ----------
+ */
+PgBackendStatus *
+pgstat_fetch_stat_beentry(int beid)
+{
+ pgstat_read_current_status();
+
+ if (beid < 1 || beid > localNumBackends)
+ return NULL;
+
+ return &localBackendStatusTable[beid - 1].backendStatus;
+}
+
+
+/* ----------
+ * pgstat_fetch_stat_local_beentry() -
+ *
+ * Like pgstat_fetch_stat_beentry() but with locally computed additions (like
+ * xid and xmin values of the backend)
+ *
+ * NB: caller is responsible for a check if the user is permitted to see
+ * this info (especially the querystring).
+ * ----------
+ */
+LocalPgBackendStatus *
+pgstat_fetch_stat_local_beentry(int beid)
+{
+ pgstat_read_current_status();
+
+ if (beid < 1 || beid > localNumBackends)
+ return NULL;
+
+ return &localBackendStatusTable[beid - 1];
+}
+
+
+/* ----------
+ * pgstat_fetch_stat_numbackends() -
+ *
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * the maximum current backend id.
+ * ----------
+ */
+int
+pgstat_fetch_stat_numbackends(void)
+{
+ pgstat_read_current_status();
+
+ return localNumBackends;
+}
+
+/* ----------
+ * pgstat_get_wait_event_type() -
+ *
+ * Return a string representing the type of wait event the backend is
+ * currently waiting on.
+ */
+const char *
+pgstat_get_wait_event_type(uint32 wait_event_info)
+{
+ uint32 classId;
+ const char *event_type;
+
+ /* report process as not waiting. */
+ if (wait_event_info == 0)
+ return NULL;
+
+ classId = wait_event_info & 0xFF000000;
+
+ switch (classId)
+ {
+ case PG_WAIT_LWLOCK:
+ event_type = "LWLock";
+ break;
+ case PG_WAIT_LOCK:
+ event_type = "Lock";
+ break;
+ case PG_WAIT_BUFFER_PIN:
+ event_type = "BufferPin";
+ break;
+ case PG_WAIT_ACTIVITY:
+ event_type = "Activity";
+ break;
+ case PG_WAIT_CLIENT:
+ event_type = "Client";
+ break;
+ case PG_WAIT_EXTENSION:
+ event_type = "Extension";
+ break;
+ case PG_WAIT_IPC:
+ event_type = "IPC";
+ break;
+ case PG_WAIT_TIMEOUT:
+ event_type = "Timeout";
+ break;
+ case PG_WAIT_IO:
+ event_type = "IO";
+ break;
+ default:
+ event_type = "???";
+ break;
+ }
+
+ return event_type;
+}
+
+/* ----------
+ * pgstat_get_wait_event() -
+ *
+ * Return a string representing the wait event the backend is currently
+ * waiting on.
+ */
+const char *
+pgstat_get_wait_event(uint32 wait_event_info)
+{
+ uint32 classId;
+ uint16 eventId;
+ const char *event_name;
+
+ /* report process as not waiting. */
+ if (wait_event_info == 0)
+ return NULL;
+
+ classId = wait_event_info & 0xFF000000;
+ eventId = wait_event_info & 0x0000FFFF;
+
+ switch (classId)
+ {
+ case PG_WAIT_LWLOCK:
+ event_name = GetLWLockIdentifier(classId, eventId);
+ break;
+ case PG_WAIT_LOCK:
+ event_name = GetLockNameFromTagType(eventId);
+ break;
+ case PG_WAIT_BUFFER_PIN:
+ event_name = "BufferPin";
+ break;
+ case PG_WAIT_ACTIVITY:
+ {
+ WaitEventActivity w = (WaitEventActivity) wait_event_info;
+
+ event_name = pgstat_get_wait_activity(w);
+ break;
+ }
+ case PG_WAIT_CLIENT:
+ {
+ WaitEventClient w = (WaitEventClient) wait_event_info;
+
+ event_name = pgstat_get_wait_client(w);
+ break;
+ }
+ case PG_WAIT_EXTENSION:
+ event_name = "Extension";
+ break;
+ case PG_WAIT_IPC:
+ {
+ WaitEventIPC w = (WaitEventIPC) wait_event_info;
+
+ event_name = pgstat_get_wait_ipc(w);
+ break;
+ }
+ case PG_WAIT_TIMEOUT:
+ {
+ WaitEventTimeout w = (WaitEventTimeout) wait_event_info;
+
+ event_name = pgstat_get_wait_timeout(w);
+ break;
+ }
+ case PG_WAIT_IO:
+ {
+ WaitEventIO w = (WaitEventIO) wait_event_info;
+
+ event_name = pgstat_get_wait_io(w);
+ break;
+ }
+ default:
+ event_name = "unknown wait event";
+ break;
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_activity() -
+ *
+ * Convert WaitEventActivity to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_activity(WaitEventActivity w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_ARCHIVER_MAIN:
+ event_name = "ArchiverMain";
+ break;
+ case WAIT_EVENT_AUTOVACUUM_MAIN:
+ event_name = "AutoVacuumMain";
+ break;
+ case WAIT_EVENT_BGWRITER_HIBERNATE:
+ event_name = "BgWriterHibernate";
+ break;
+ case WAIT_EVENT_BGWRITER_MAIN:
+ event_name = "BgWriterMain";
+ break;
+ case WAIT_EVENT_CHECKPOINTER_MAIN:
+ event_name = "CheckpointerMain";
+ break;
+ case WAIT_EVENT_LOGICAL_APPLY_MAIN:
+ event_name = "LogicalApplyMain";
+ break;
+ case WAIT_EVENT_LOGICAL_LAUNCHER_MAIN:
+ event_name = "LogicalLauncherMain";
+ break;
+ case WAIT_EVENT_RECOVERY_WAL_ALL:
+ event_name = "RecoveryWalAll";
+ break;
+ case WAIT_EVENT_RECOVERY_WAL_STREAM:
+ event_name = "RecoveryWalStream";
+ break;
+ case WAIT_EVENT_SYSLOGGER_MAIN:
+ event_name = "SysLoggerMain";
+ break;
+ case WAIT_EVENT_WAL_RECEIVER_MAIN:
+ event_name = "WalReceiverMain";
+ break;
+ case WAIT_EVENT_WAL_SENDER_MAIN:
+ event_name = "WalSenderMain";
+ break;
+ case WAIT_EVENT_WAL_WRITER_MAIN:
+ event_name = "WalWriterMain";
+ break;
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_client() -
+ *
+ * Convert WaitEventClient to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_client(WaitEventClient w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_CLIENT_READ:
+ event_name = "ClientRead";
+ break;
+ case WAIT_EVENT_CLIENT_WRITE:
+ event_name = "ClientWrite";
+ break;
+ case WAIT_EVENT_LIBPQWALRECEIVER_CONNECT:
+ event_name = "LibPQWalReceiverConnect";
+ break;
+ case WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE:
+ event_name = "LibPQWalReceiverReceive";
+ break;
+ case WAIT_EVENT_SSL_OPEN_SERVER:
+ event_name = "SSLOpenServer";
+ break;
+ case WAIT_EVENT_WAL_RECEIVER_WAIT_START:
+ event_name = "WalReceiverWaitStart";
+ break;
+ case WAIT_EVENT_WAL_SENDER_WAIT_WAL:
+ event_name = "WalSenderWaitForWAL";
+ break;
+ case WAIT_EVENT_WAL_SENDER_WRITE_DATA:
+ event_name = "WalSenderWriteData";
+ break;
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_ipc() -
+ *
+ * Convert WaitEventIPC to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_ipc(WaitEventIPC w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_BGWORKER_SHUTDOWN:
+ event_name = "BgWorkerShutdown";
+ break;
+ case WAIT_EVENT_BGWORKER_STARTUP:
+ event_name = "BgWorkerStartup";
+ break;
+ case WAIT_EVENT_BTREE_PAGE:
+ event_name = "BtreePage";
+ break;
+ case WAIT_EVENT_CLOG_GROUP_UPDATE:
+ event_name = "ClogGroupUpdate";
+ break;
+ case WAIT_EVENT_EXECUTE_GATHER:
+ event_name = "ExecuteGather";
+ break;
+ case WAIT_EVENT_HASH_BATCH_ALLOCATING:
+ event_name = "Hash/Batch/Allocating";
+ break;
+ case WAIT_EVENT_HASH_BATCH_ELECTING:
+ event_name = "Hash/Batch/Electing";
+ break;
+ case WAIT_EVENT_HASH_BATCH_LOADING:
+ event_name = "Hash/Batch/Loading";
+ break;
+ case WAIT_EVENT_HASH_BUILD_ALLOCATING:
+ event_name = "Hash/Build/Allocating";
+ break;
+ case WAIT_EVENT_HASH_BUILD_ELECTING:
+ event_name = "Hash/Build/Electing";
+ break;
+ case WAIT_EVENT_HASH_BUILD_HASHING_INNER:
+ event_name = "Hash/Build/HashingInner";
+ break;
+ case WAIT_EVENT_HASH_BUILD_HASHING_OUTER:
+ event_name = "Hash/Build/HashingOuter";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_ALLOCATING:
+ event_name = "Hash/GrowBatches/Allocating";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_DECIDING:
+ event_name = "Hash/GrowBatches/Deciding";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_ELECTING:
+ event_name = "Hash/GrowBatches/Electing";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_FINISHING:
+ event_name = "Hash/GrowBatches/Finishing";
+ break;
+ case WAIT_EVENT_HASH_GROW_BATCHES_REPARTITIONING:
+ event_name = "Hash/GrowBatches/Repartitioning";
+ break;
+ case WAIT_EVENT_HASH_GROW_BUCKETS_ALLOCATING:
+ event_name = "Hash/GrowBuckets/Allocating";
+ break;
+ case WAIT_EVENT_HASH_GROW_BUCKETS_ELECTING:
+ event_name = "Hash/GrowBuckets/Electing";
+ break;
+ case WAIT_EVENT_HASH_GROW_BUCKETS_REINSERTING:
+ event_name = "Hash/GrowBuckets/Reinserting";
+ break;
+ case WAIT_EVENT_LOGICAL_SYNC_DATA:
+ event_name = "LogicalSyncData";
+ break;
+ case WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE:
+ event_name = "LogicalSyncStateChange";
+ break;
+ case WAIT_EVENT_MQ_INTERNAL:
+ event_name = "MessageQueueInternal";
+ break;
+ case WAIT_EVENT_MQ_PUT_MESSAGE:
+ event_name = "MessageQueuePutMessage";
+ break;
+ case WAIT_EVENT_MQ_RECEIVE:
+ event_name = "MessageQueueReceive";
+ break;
+ case WAIT_EVENT_MQ_SEND:
+ event_name = "MessageQueueSend";
+ break;
+ case WAIT_EVENT_PARALLEL_BITMAP_SCAN:
+ event_name = "ParallelBitmapScan";
+ break;
+ case WAIT_EVENT_PARALLEL_CREATE_INDEX_SCAN:
+ event_name = "ParallelCreateIndexScan";
+ break;
+ case WAIT_EVENT_PARALLEL_FINISH:
+ event_name = "ParallelFinish";
+ break;
+ case WAIT_EVENT_PROCARRAY_GROUP_UPDATE:
+ event_name = "ProcArrayGroupUpdate";
+ break;
+ case WAIT_EVENT_PROMOTE:
+ event_name = "Promote";
+ break;
+ case WAIT_EVENT_REPLICATION_ORIGIN_DROP:
+ event_name = "ReplicationOriginDrop";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_DROP:
+ event_name = "ReplicationSlotDrop";
+ break;
+ case WAIT_EVENT_SAFE_SNAPSHOT:
+ event_name = "SafeSnapshot";
+ break;
+ case WAIT_EVENT_SYNC_REP:
+ event_name = "SyncRep";
+ break;
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_timeout() -
+ *
+ * Convert WaitEventTimeout to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_timeout(WaitEventTimeout w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_BASE_BACKUP_THROTTLE:
+ event_name = "BaseBackupThrottle";
+ break;
+ case WAIT_EVENT_PG_SLEEP:
+ event_name = "PgSleep";
+ break;
+ case WAIT_EVENT_RECOVERY_APPLY_DELAY:
+ event_name = "RecoveryApplyDelay";
+ break;
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+/* ----------
+ * pgstat_get_wait_io() -
+ *
+ * Convert WaitEventIO to string.
+ * ----------
+ */
+static const char *
+pgstat_get_wait_io(WaitEventIO w)
+{
+ const char *event_name = "unknown wait event";
+
+ switch (w)
+ {
+ case WAIT_EVENT_BUFFILE_READ:
+ event_name = "BufFileRead";
+ break;
+ case WAIT_EVENT_BUFFILE_WRITE:
+ event_name = "BufFileWrite";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_READ:
+ event_name = "ControlFileRead";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_SYNC:
+ event_name = "ControlFileSync";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE:
+ event_name = "ControlFileSyncUpdate";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_WRITE:
+ event_name = "ControlFileWrite";
+ break;
+ case WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE:
+ event_name = "ControlFileWriteUpdate";
+ break;
+ case WAIT_EVENT_COPY_FILE_READ:
+ event_name = "CopyFileRead";
+ break;
+ case WAIT_EVENT_COPY_FILE_WRITE:
+ event_name = "CopyFileWrite";
+ break;
+ case WAIT_EVENT_DATA_FILE_EXTEND:
+ event_name = "DataFileExtend";
+ break;
+ case WAIT_EVENT_DATA_FILE_FLUSH:
+ event_name = "DataFileFlush";
+ break;
+ case WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC:
+ event_name = "DataFileImmediateSync";
+ break;
+ case WAIT_EVENT_DATA_FILE_PREFETCH:
+ event_name = "DataFilePrefetch";
+ break;
+ case WAIT_EVENT_DATA_FILE_READ:
+ event_name = "DataFileRead";
+ break;
+ case WAIT_EVENT_DATA_FILE_SYNC:
+ event_name = "DataFileSync";
+ break;
+ case WAIT_EVENT_DATA_FILE_TRUNCATE:
+ event_name = "DataFileTruncate";
+ break;
+ case WAIT_EVENT_DATA_FILE_WRITE:
+ event_name = "DataFileWrite";
+ break;
+ case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
+ event_name = "DSMFillZeroWrite";
+ break;
+ case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
+ event_name = "LockFileAddToDataDirRead";
+ break;
+ case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC:
+ event_name = "LockFileAddToDataDirSync";
+ break;
+ case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE:
+ event_name = "LockFileAddToDataDirWrite";
+ break;
+ case WAIT_EVENT_LOCK_FILE_CREATE_READ:
+ event_name = "LockFileCreateRead";
+ break;
+ case WAIT_EVENT_LOCK_FILE_CREATE_SYNC:
+ event_name = "LockFileCreateSync";
+ break;
+ case WAIT_EVENT_LOCK_FILE_CREATE_WRITE:
+ event_name = "LockFileCreateWrite";
+ break;
+ case WAIT_EVENT_LOCK_FILE_RECHECKDATADIR_READ:
+ event_name = "LockFileReCheckDataDirRead";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC:
+ event_name = "LogicalRewriteCheckpointSync";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC:
+ event_name = "LogicalRewriteMappingSync";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE:
+ event_name = "LogicalRewriteMappingWrite";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_SYNC:
+ event_name = "LogicalRewriteSync";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE:
+ event_name = "LogicalRewriteTruncate";
+ break;
+ case WAIT_EVENT_LOGICAL_REWRITE_WRITE:
+ event_name = "LogicalRewriteWrite";
+ break;
+ case WAIT_EVENT_RELATION_MAP_READ:
+ event_name = "RelationMapRead";
+ break;
+ case WAIT_EVENT_RELATION_MAP_SYNC:
+ event_name = "RelationMapSync";
+ break;
+ case WAIT_EVENT_RELATION_MAP_WRITE:
+ event_name = "RelationMapWrite";
+ break;
+ case WAIT_EVENT_REORDER_BUFFER_READ:
+ event_name = "ReorderBufferRead";
+ break;
+ case WAIT_EVENT_REORDER_BUFFER_WRITE:
+ event_name = "ReorderBufferWrite";
+ break;
+ case WAIT_EVENT_REORDER_LOGICAL_MAPPING_READ:
+ event_name = "ReorderLogicalMappingRead";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_READ:
+ event_name = "ReplicationSlotRead";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC:
+ event_name = "ReplicationSlotRestoreSync";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_SYNC:
+ event_name = "ReplicationSlotSync";
+ break;
+ case WAIT_EVENT_REPLICATION_SLOT_WRITE:
+ event_name = "ReplicationSlotWrite";
+ break;
+ case WAIT_EVENT_SLRU_FLUSH_SYNC:
+ event_name = "SLRUFlushSync";
+ break;
+ case WAIT_EVENT_SLRU_READ:
+ event_name = "SLRURead";
+ break;
+ case WAIT_EVENT_SLRU_SYNC:
+ event_name = "SLRUSync";
+ break;
+ case WAIT_EVENT_SLRU_WRITE:
+ event_name = "SLRUWrite";
+ break;
+ case WAIT_EVENT_SNAPBUILD_READ:
+ event_name = "SnapbuildRead";
+ break;
+ case WAIT_EVENT_SNAPBUILD_SYNC:
+ event_name = "SnapbuildSync";
+ break;
+ case WAIT_EVENT_SNAPBUILD_WRITE:
+ event_name = "SnapbuildWrite";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_FILE_SYNC:
+ event_name = "TimelineHistoryFileSync";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_FILE_WRITE:
+ event_name = "TimelineHistoryFileWrite";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_READ:
+ event_name = "TimelineHistoryRead";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_SYNC:
+ event_name = "TimelineHistorySync";
+ break;
+ case WAIT_EVENT_TIMELINE_HISTORY_WRITE:
+ event_name = "TimelineHistoryWrite";
+ break;
+ case WAIT_EVENT_TWOPHASE_FILE_READ:
+ event_name = "TwophaseFileRead";
+ break;
+ case WAIT_EVENT_TWOPHASE_FILE_SYNC:
+ event_name = "TwophaseFileSync";
+ break;
+ case WAIT_EVENT_TWOPHASE_FILE_WRITE:
+ event_name = "TwophaseFileWrite";
+ break;
+ case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
+ event_name = "WALSenderTimelineHistoryRead";
+ break;
+ case WAIT_EVENT_WAL_BOOTSTRAP_SYNC:
+ event_name = "WALBootstrapSync";
+ break;
+ case WAIT_EVENT_WAL_BOOTSTRAP_WRITE:
+ event_name = "WALBootstrapWrite";
+ break;
+ case WAIT_EVENT_WAL_COPY_READ:
+ event_name = "WALCopyRead";
+ break;
+ case WAIT_EVENT_WAL_COPY_SYNC:
+ event_name = "WALCopySync";
+ break;
+ case WAIT_EVENT_WAL_COPY_WRITE:
+ event_name = "WALCopyWrite";
+ break;
+ case WAIT_EVENT_WAL_INIT_SYNC:
+ event_name = "WALInitSync";
+ break;
+ case WAIT_EVENT_WAL_INIT_WRITE:
+ event_name = "WALInitWrite";
+ break;
+ case WAIT_EVENT_WAL_READ:
+ event_name = "WALRead";
+ break;
+ case WAIT_EVENT_WAL_SYNC:
+ event_name = "WALSync";
+ break;
+ case WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN:
+ event_name = "WALSyncMethodAssign";
+ break;
+ case WAIT_EVENT_WAL_WRITE:
+ event_name = "WALWrite";
+ break;
+
+ /* no default case, so that compiler will warn */
+ }
+
+ return event_name;
+}
+
+
+/* ----------
+ * pgstat_get_backend_current_activity() -
+ *
+ * Return a string representing the current activity of the backend with
+ * the specified PID.  This looks directly at the BackendStatusArray,
+ * and so will provide current information regardless of the age of our
+ * transaction's snapshot of the status array.
+ *
+ * It is the caller's responsibility to invoke this only for backends whose
+ * state is expected to remain stable while the result is in use.  The
+ * only current use is in deadlock reporting, where we can expect that
+ * the target backend is blocked on a lock.  (There are corner cases
+ * where the target's wait could get aborted while we are looking at it,
+ * but the very worst consequence is to return a pointer to a string
+ * that's been changed, so we won't worry too much.)
+ *
+ * Note: return strings for special cases match pg_stat_get_backend_activity.
+ * ----------
+ */
+const char *
+pgstat_get_backend_current_activity(int pid, bool checkUser)
+{
+ PgBackendStatus *beentry;
+ int i;
+
+ beentry = BackendStatusArray;
+ for (i = 1; i <= MaxBackends; i++)
+ {
+ /*
+ * Although we expect the target backend's entry to be stable, that
+ * doesn't imply that anyone else's is.  To avoid identifying the
+ * wrong backend, while we check for a match to the desired PID we
+ * must follow the protocol of retrying if st_changecount changes
+ * while we examine the entry, or if it's odd.  (This might be
+ * unnecessary, since fetching or storing an int is almost certainly
+ * atomic, but let's play it safe.)  We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_save_changecount_before(vbeentry, before_changecount);
+
+ found = (vbeentry->st_procpid == pid);
+
+ pgstat_save_changecount_after(vbeentry, after_changecount);
+
+ if (before_changecount == after_changecount &&
+ (before_changecount & 1) == 0)
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ {
+ /* Now it is safe to use the non-volatile pointer */
+ if (checkUser && !superuser() && beentry->st_userid != GetUserId())
+ return "<insufficient privilege>";
+ else if (*(beentry->st_activity_raw) == '\0')
+ return "<command string not enabled>";
+ else
+ {
+ /* this'll leak a bit of memory, but that seems acceptable */
+ return pgstat_clip_activity(beentry->st_activity_raw);
+ }
+ }
+
+ beentry++;
+ }
+
+ /* If we get here, caller is in error ... */
+ return "<backend information not available>";
+}
+
+/* ----------
+ * pgstat_get_crashed_backend_activity() -
+ *
+ * Return a string representing the current activity of the backend with
+ * the specified PID.  Like the function above, but reads shared memory with
+ * the expectation that it may be corrupt.  On success, copy the string
+ * into the "buffer" argument and return that pointer.  On failure,
+ * return NULL.
+ *
+ * This function is only intended to be used by the postmaster to report the
+ * query that crashed a backend.  In particular, no attempt is made to
+ * follow the correct concurrency protocol when accessing the
+ * BackendStatusArray.  But that's OK, in the worst case we'll return a
+ * corrupted message.  We also must take care not to trip on ereport(ERROR).
+ * ----------
+ */
+const char *
+pgstat_get_crashed_backend_activity(int pid, char *buffer, int buflen)
+{
+ volatile PgBackendStatus *beentry;
+ int i;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return NULL;
+
+ for (i = 1; i <= MaxBackends; i++)
+ {
+ if (beentry->st_procpid == pid)
+ {
+ /* Read pointer just once, so it can't change after validation */
+ const char *activity = beentry->st_activity_raw;
+ const char *activity_last;
+
+ /*
+ * We mustn't access activity string before we verify that it
+ * falls within the BackendActivityBuffer. To make sure that the
+ * entire string including its ending is contained within the
+ * buffer, subtract one activity length from the buffer size.
+ */
+ activity_last = BackendActivityBuffer + BackendActivityBufferSize
+ - pgstat_track_activity_query_size;
+
+ if (activity < BackendActivityBuffer ||
+ activity > activity_last)
+ return NULL;
+
+ /* If no string available, no point in a report */
+ if (activity[0] == '\0')
+ return NULL;
+
+ /*
+ * Copy only ASCII-safe characters so we don't run into encoding
+ * problems when reporting the message; and be sure not to run off
+ * the end of memory.  As only ASCII characters are reported, it
+ * doesn't seem necessary to perform multibyte aware clipping.
+ */
+ ascii_safe_strlcpy(buffer, activity,
+   Min(buflen, pgstat_track_activity_query_size));
+
+ return buffer;
+ }
+
+ beentry++;
+ }
+
+ /* PID not found */
+ return NULL;
+}