More aggressive vacuuming of temporary tables

More aggressive vacuuming of temporary tables

Tom Lane-2
It strikes me that when we are vacuuming a temporary table (which
necessarily will be one of our own session), we don't really need
to care what the global xmin horizon is.  If we're not inside a
user transaction block, then there are no tuples in the table that
could be in-doubt anymore.  Neither are there any snapshots in our
session that could see any dead tuples.  Nor do we give a fig what
other sessions might think of those tuples.  So we could just set
the xmin cutoff as aggressively as possible, which is to say
equal to the nextXid counter.  While vacuuming a temp table is
perhaps not something people do very often, I think when they do
do it they would like us to clean out all the dead tuples not just
some.

Hence I propose 0001 attached.  80% of it is just API additions to allow
passing down the isTopLevel flag so that we can do the right thing in
the CLUSTER case; the important change is in vacuum_set_xid_limits.
(For ease of review, I didn't reindent the existing code in
vacuum_set_xid_limits, but that would be something to do before commit.)

The reason I got interested in this is that yesterday while fooling
with bug #16595, I was annoyed about our poor code test coverage in
access/gin/.  The 0002 patch attached brings coverage for ginvacuum.c
up from 59% to 93%; but as things stand, a long delay has to be inserted
between the DELETE and VACUUM steps, else the VACUUM won't remove the
dead tuples because of concurrent transactions, and we get no coverage
improvement.  Since the table in question is a temp table, that seems
pretty silly.

Thoughts?  Am I missing something important here?

                        regards, tom lane


diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a0da444af0..beafb60ac2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,6 +470,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
   params->freeze_table_age,
   params->multixact_freeze_min_age,
   params->multixact_freeze_table_age,
+  true, /* we must be a top-level command */
   &OldestXmin, &FreezeLimit, &xidFullScanLimit,
   &MultiXactCutoff, &mxactFullScanLimit);
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 04d12a7ece..0d647e912c 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,10 +67,13 @@ typedef struct
 } RelToCluster;
 
 
-static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
+static void rebuild_relation(Relation OldHeap, Oid indexOid,
+ bool isTopLevel, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
- bool verbose, bool *pSwapToastByContent,
- TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+ bool isTopLevel, bool verbose,
+ bool *pSwapToastByContent,
+ TransactionId *pFreezeXid,
+ MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 
 
@@ -170,7 +173,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
  table_close(rel, NoLock);
 
  /* Do the job. */
- cluster_rel(tableOid, indexOid, stmt->options);
+ cluster_rel(tableOid, indexOid, stmt->options, isTopLevel);
  }
  else
  {
@@ -219,7 +222,8 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
  PushActiveSnapshot(GetTransactionSnapshot());
  /* Do the job. */
  cluster_rel(rvtc->tableOid, rvtc->indexOid,
- stmt->options | CLUOPT_RECHECK);
+ stmt->options | CLUOPT_RECHECK,
+ isTopLevel);
  PopActiveSnapshot();
  CommitTransactionCommand();
  }
@@ -250,7 +254,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
  * and error messages should refer to the operation as VACUUM not CLUSTER.
  */
 void
-cluster_rel(Oid tableOid, Oid indexOid, int options)
+cluster_rel(Oid tableOid, Oid indexOid, int options, bool isTopLevel)
 {
  Relation OldHeap;
  bool verbose = ((options & CLUOPT_VERBOSE) != 0);
@@ -400,7 +404,7 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
  TransferPredicateLocksToHeapRelation(OldHeap);
 
  /* rebuild_relation does all the dirty work */
- rebuild_relation(OldHeap, indexOid, verbose);
+ rebuild_relation(OldHeap, indexOid, isTopLevel, verbose);
 
  /* NB: rebuild_relation does table_close() on OldHeap */
 
@@ -545,11 +549,12 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  *
  * OldHeap: table to rebuild --- must be opened and exclusive-locked!
  * indexOid: index to cluster by, or InvalidOid to rewrite in physical order.
+ * isTopLevel: should be passed down from ProcessUtility.
  *
  * NB: this routine closes OldHeap at the right time; caller should not.
  */
 static void
-rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
+rebuild_relation(Relation OldHeap, Oid indexOid, bool isTopLevel, bool verbose)
 {
  Oid tableOid = RelationGetRelid(OldHeap);
  Oid tableSpace = OldHeap->rd_rel->reltablespace;
@@ -577,7 +582,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
    AccessExclusiveLock);
 
  /* Copy the heap data into the new table in the desired order */
- copy_table_data(OIDNewHeap, tableOid, indexOid, verbose,
+ copy_table_data(OIDNewHeap, tableOid, indexOid, isTopLevel, verbose,
  &swap_toast_by_content, &frozenXid, &cutoffMulti);
 
  /*
@@ -728,7 +733,8 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
+copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
+ bool isTopLevel, bool verbose,
  bool *pSwapToastByContent, TransactionId *pFreezeXid,
  MultiXactId *pCutoffMulti)
 {
@@ -826,7 +832,7 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
  * Since we're going to rewrite the whole table anyway, there's no reason
  * not to be aggressive about this.
  */
- vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0,
+ vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0, isTopLevel,
   &OldestXmin, &FreezeXid, NULL, &MultiXactCutoff,
   NULL);
 
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 23eb605d4c..62b51b078d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -907,6 +907,9 @@ get_all_vacuum_rels(int options)
 /*
  * vacuum_set_xid_limits() -- compute oldestXmin and freeze cutoff points
  *
+ * Input parameters are the target relation, applicable freeze age settings,
+ * and isTopLevel which should be passed down from ProcessUtility.
+ *
  * The output parameters are:
  * - oldestXmin is the cutoff value used to distinguish whether tuples are
  * DEAD or RECENTLY_DEAD (see HeapTupleSatisfiesVacuum).
@@ -931,6 +934,7 @@ vacuum_set_xid_limits(Relation rel,
   int freeze_table_age,
   int multixact_freeze_min_age,
   int multixact_freeze_table_age,
+  bool isTopLevel,
   TransactionId *oldestXmin,
   TransactionId *freezeLimit,
   TransactionId *xidFullScanLimit,
@@ -946,7 +950,24 @@ vacuum_set_xid_limits(Relation rel,
  MultiXactId mxactLimit;
  MultiXactId safeMxactLimit;
 
+ if (RELATION_IS_LOCAL(rel) && !IsInTransactionBlock(isTopLevel))
+ {
+ /*
+ * If we are processing a temp relation (which by prior checks must be
+ * one belonging to our session), and we are not inside any
+ * transaction block, then there can be no tuples in the rel that are
+ * either in-doubt or dead but possibly still interesting to some
+ * snapshot our session holds.  We don't care whether other sessions
+ * could still care about such tuples, either.  So we can aggressively
+ * set the cutoff xmin to be the nextXid.
+ */
+ *oldestXmin = XidFromFullTransactionId(ReadNextFullTransactionId());
+ }
+ else
+ {
  /*
+ * Otherwise, calculate the cutoff xmin normally.
+ *
  * We can always ignore processes running lazy vacuum.  This is because we
  * use these values only for deciding which tuples we must keep in the
  * tables.  Since lazy vacuum doesn't write its XID anywhere (usually no
@@ -974,6 +995,7 @@ vacuum_set_xid_limits(Relation rel,
  *oldestXmin = limit_xmin;
  }
  }
+ }
 
  Assert(TransactionIdIsNormal(*oldestXmin));
 
@@ -1907,7 +1929,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
  cluster_options |= CLUOPT_VERBOSE;
 
  /* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
- cluster_rel(relid, InvalidOid, cluster_options);
+ cluster_rel(relid, InvalidOid, cluster_options, true);
  }
  else
  table_relation_vacuum(onerel, params, vac_strategy);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index e05884781b..1eb144204b 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -19,7 +19,8 @@
 
 
 extern void cluster(ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Oid tableOid, Oid indexOid, int options);
+extern void cluster_rel(Oid tableOid, Oid indexOid, int options,
+ bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
    bool recheck, LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index a4cd721400..d9475c9989 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -267,6 +267,7 @@ extern void vacuum_set_xid_limits(Relation rel,
   int freeze_min_age, int freeze_table_age,
   int multixact_freeze_min_age,
   int multixact_freeze_table_age,
+  bool isTopLevel,
   TransactionId *oldestXmin,
   TransactionId *freezeLimit,
   TransactionId *xidFullScanLimit,

diff --git a/src/test/regress/expected/gin.out b/src/test/regress/expected/gin.out
index b335466fc4..6402e89c7f 100644
--- a/src/test/regress/expected/gin.out
+++ b/src/test/regress/expected/gin.out
@@ -264,6 +264,27 @@ select count(*) from t_gin_test_tbl where j @> '{}'::int[];
  20006
 (1 row)
 
+-- test vacuuming of posting trees
+delete from t_gin_test_tbl where j @> array[2];
+vacuum t_gin_test_tbl;
+select count(*) from t_gin_test_tbl where j @> array[50];
+ count
+-------
+     0
+(1 row)
+
+select count(*) from t_gin_test_tbl where j @> array[2];
+ count
+-------
+     0
+(1 row)
+
+select count(*) from t_gin_test_tbl where j @> '{}'::int[];
+ count
+-------
+     6
+(1 row)
+
 reset enable_seqscan;
 reset enable_bitmapscan;
 drop table t_gin_test_tbl;
diff --git a/src/test/regress/sql/gin.sql b/src/test/regress/sql/gin.sql
index efb8ef3e96..5194afcc1f 100644
--- a/src/test/regress/sql/gin.sql
+++ b/src/test/regress/sql/gin.sql
@@ -159,6 +159,14 @@ explain (costs off)
 select count(*) from t_gin_test_tbl where j @> '{}'::int[];
 select count(*) from t_gin_test_tbl where j @> '{}'::int[];
 
+-- test vacuuming of posting trees
+delete from t_gin_test_tbl where j @> array[2];
+vacuum t_gin_test_tbl;
+
+select count(*) from t_gin_test_tbl where j @> array[50];
+select count(*) from t_gin_test_tbl where j @> array[2];
+select count(*) from t_gin_test_tbl where j @> '{}'::int[];
+
 reset enable_seqscan;
 reset enable_bitmapscan;
 

Re: More aggressive vacuuming of temporary tables

Michael Paquier-2
On Fri, Aug 28, 2020 at 11:46:49AM -0400, Tom Lane wrote:
> Hence I propose 0001 attached.  80% of it is just API additions to allow
> passing down the isTopLevel flag so that we can do the right thing in
> the CLUSTER case; the important change is in vacuum_set_xid_limits.
> (For ease of review, I didn't reindent the existing code in
> vacuum_set_xid_limits, but that would be something to do before commit.)

I got to wonder lately if we should not have a static state, like what
we do for MyXactFlags, to track whether a utility query is a top-level
one, as most of the time we just want to check the flag as passed down
by standard_ProcessUtility().  I have faced this problem as well lately
when pushing down a PreventInTransactionBlock() for some stuff with
REINDEX, for example.  Not sure how reliable this would be, though.
Passing isTopLevel down the road is a no-brainer, and perhaps having a
static value would create more problems than it solves in practice.
--
Michael


Re: More aggressive vacuuming of temporary tables

Tom Lane-2
Michael Paquier <[hidden email]> writes:
> I got to wonder lately if we should not have a static state like what
> we do for MyXactFlags to track down if a utility query is a top-level
> one or not as most of the time we just want to check the flag as
> passed down by standard_ProcessUtility().  I have faced this problem
> as well lately when pushing down a PreventInTransactionBlock() for
> some stuff with REINDEX for example.  Not sure how reliable this would
> be though..  Passing isTopLevel down the road is a no-brainer, and
> perhaps having a static value would create more problems than it
> solves in practice.

Hm.  I suppose you'd need a PG_TRY block to ensure that the setting
got restored correctly after an error, so maintaining it that way
would be rather expensive.  Also it just doesn't seem like
transaction-wide state, so having a static for it feels like the
wrong thing.

[thinks for awhile...]  Would it make any sense to mark Portals as being
top-level or not, and then code that needs this info could look to
"ActivePortal->isTopLevel"?  We are already paying the PG_TRY overhead
to maintain the ActivePortal variable safely.  I'm not sure though
whether the Portal is granular enough.  Do we set up a separate
Portal for subcommands?

In the big scheme of things, though, we don't need this info in so
many places that passing it down as a parameter is an undue burden.

                        regards, tom lane



Re: More aggressive vacuuming of temporary tables

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2020-08-28 11:46:49 -0400, Tom Lane wrote:

> It strikes me that when we are vacuuming a temporary table (which
> necessarily will be one of our own session), we don't really need
> to care what the global xmin horizon is.  If we're not inside a
> user transaction block, then there are no tuples in the table that
> could be in-doubt anymore.  Neither are there any snapshots in our
> session that could see any dead tuples.  Nor do we give a fig what
> other sessions might think of those tuples.  So we could just set
> the xmin cutoff as aggressively as possible, which is to say
> equal to the nextXid counter.  While vacuuming a temp table is
> perhaps not something people do very often, I think when they do
> do it they would like us to clean out all the dead tuples not just
> some.

That seems like a good idea.

I've been toying with a patch that introduces more smarts about when a
row is removable, by looking more closely at whether specific row
versions are visible (e.g. in the common case of one old snapshot and
lots of newer rows). But that's orders of magnitude more complicated. So
going for something as simple as this seems like a good idea.


> Hence I propose 0001 attached.  80% of it is just API additions to allow
> passing down the isTopLevel flag so that we can do the right thing in
> the CLUSTER case; the important change is in vacuum_set_xid_limits.
> (For ease of review, I didn't reindent the existing code in
> vacuum_set_xid_limits, but that would be something to do before commit.)

I did wonder for a moment if it could be worthwhile to just implement
this for VACUUM and leave CLUSTER alone, to avoid having to deal with
the toplevel stuff. But I think it's better to be consistent.

But now I do wonder why we need to know whether the command is top level
or not? Why isn't the correct thing to instead look at what the current
backend's xmin is? Seems like you could just replace
    *oldestXmin = XidFromFullTransactionId(ReadNextFullTransactionId());
with
    *oldestXmin = MyProc->xmin;
    Assert(TransactionIdIsValid(*oldestXmin));

and you'd not need to care about whether the CLUSTER is top-level or
not? Without, I think, losing any aggressiveness in the top-level case?
Perhaps even worth moving this logic into
GetOldestNonRemovableTransactionId().

I know you already pushed this, but I vote for revising it this way if
you don't see an issue?

Greetings,

Andres Freund



Re: More aggressive vacuuming of temporary tables

Tom Lane-2
Andres Freund <[hidden email]> writes:
> But now I do wonder why we need to know whether the command is top level
> or not? Why isn't the correct thing to instead look at what the current
> backend's xmin is? Seems like you could just replace
>     *oldestXmin = XidFromFullTransactionId(ReadNextFullTransactionId());
> with
>     *oldestXmin = MyProc->xmin;
>     Assert(TransactionIdIsValid(*oldestXmin));

Ummm ... since VACUUM doesn't run inside a transaction, it won't be
advertising an xmin, will it?

Maybe you could make something like this work, but I think it'd still
have to treat CLUSTER as a special case.  Not sure it's worth it.

                        regards, tom lane



Re: More aggressive vacuuming of temporary tables

Andres Freund
Hi,

On 2020-09-08 15:24:54 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > But now I do wonder why we need to know whether the command is top level
> > or not? Why isn't the correct thing to instead look at what the current
> > backend's xmin is? Seems like you could just replace
> >     *oldestXmin = XidFromFullTransactionId(ReadNextFullTransactionId());
> > with
> >     *oldestXmin = MyProc->xmin;
> >     Assert(TransactionIdIsValid(*oldestXmin));
>
> Ummm ... since VACUUM doesn't run inside a transaction, it won't be
> advertising an xmin will it?

We do run it in a transaction though:

static bool
vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
{
...
        /* Begin a transaction for vacuuming this relation */
        StartTransactionCommand();

        /*
         * Need to acquire a snapshot to prevent pg_subtrans from being truncated,
         * cutoff xids in local memory wrapping around, and to have updated xmin
         * horizons.
         */
        PushActiveSnapshot(GetTransactionSnapshot());


> Maybe you could make something like this work, but I think it'd still
> have to treat CLUSTER as a special case.  Not sure it's worth it.

Why would CLUSTER need to be special cased? We'd precisely retain the
rows we need to, I think? Given that we'd use exactly the snapshot
that determines which rows need to be retained?

Greetings,

Andres Freund



Re: More aggressive vacuuming of temporary tables

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2020-09-08 15:24:54 -0400, Tom Lane wrote:
>> Andres Freund <[hidden email]> writes:
>>> But now I do wonder why we need to know whether the command is top level
>>> or not? Why isn't the correct thing to instead look at what the current
>>> backend's xmin is? Seems like you could just replace
>>> *oldestXmin = XidFromFullTransactionId(ReadNextFullTransactionId());
>>> with
>>> *oldestXmin = MyProc->xmin;
>>> Assert(TransactionIdIsValid(*oldestXmin));

>> Ummm ... since VACUUM doesn't run inside a transaction, it won't be
>> advertising an xmin will it?

> We do run it in a transaction though:

Ah.  But still, this is not the semantics I want for VACUUM, because the
process xmin will involve other processes' behavior.  The point of the
change as far as I'm concerned is to ensure vacuuming of dead temp rows
independent of anything else in the system.

                        regards, tom lane



Re: More aggressive vacuuming of temporary tables

Stephen Frost
In reply to this post by Andres Freund
Greetings,

* Andres Freund ([hidden email]) wrote:

> On 2020-08-28 11:46:49 -0400, Tom Lane wrote:
> > It strikes me that when we are vacuuming a temporary table (which
> > necessarily will be one of our own session), we don't really need
> > to care what the global xmin horizon is.  If we're not inside a
> > user transaction block, then there are no tuples in the table that
> > could be in-doubt anymore.  Neither are there any snapshots in our
> > session that could see any dead tuples.  Nor do we give a fig what
> > other sessions might think of those tuples.  So we could just set
> > the xmin cutoff as aggressively as possible, which is to say
> > equal to the nextXid counter.  While vacuuming a temp table is
> > perhaps not something people do very often, I think when they do
> > do it they would like us to clean out all the dead tuples not just
> > some.
>
> That seems like a good idea.
Agreed.

> I've been toying with a patch that introduces more smarts about when a
> row is removable, by looking more closely whether a specific row
> versions are visible (e.g. in the common case of one old snapshot and
> lots of newer rows). But that's orders of magnitude more complicated. So
> going for something as simple as this seems like a good idea.

I've wondered about this for a long time - very cool that you've found
time to actually work on a patch.  A couple of different ideas were
discussed previously about how to do that kind of a check- mind talking
about what method you're using, or perhaps just sharing that patch? :)

The potential of such an improvement is huge.

Thanks!

Stephen


Re: More aggressive vacuuming of temporary tables

Andres Freund
Hi,

On 2020-09-09 10:14:04 -0400, Stephen Frost wrote:

> > I've been toying with a patch that introduces more smarts about when a
> > row is removable, by looking more closely whether a specific row
> > versions are visible (e.g. in the common case of one old snapshot and
> > lots of newer rows). But that's orders of magnitude more complicated. So
> > going for something as simple as this seems like a good idea.
>
> I've wondered about this for a long time- very cool that you've found
> time to actually work on a patch.  A couple of different ideas were
> discussed previously about how to do that kind of a check- mind talking
> about what method you're using, or perhaps just sharing that patch? :)

It's very, very early, and it doesn't really work yet. I basically tried
to just plug a bit more intelligence into the dead tuple detection
(which can now easily store more, incrementally built, state thanks to
the recent changes for snapshot scalability).  Detecting that tuples
newer than the horizon are dead isn't that complicated - what's hard is
not breaking things due to ctid chains lacking intermediate versions.
To avoid that I had to restrict it to inserted (not updated) tuples that
were subsequently deleted. And my heuristic supported only one old
snapshot.

Building a bsearchable list of valid (xmin, xmax] ranges isn't that
hard. Some care needs to be taken to make the list non-overlapping,
but that's easy enough by just merging entries.

Obviously lookup in such a more complicated structure isn't free. Nor is
building it. So we'd need some heuristics about when to do so. It'd
probably be OK to occasionally look at the age of the oldest in-progress
statement, to infer the time for old snapshots based on that. And then
we could have a GUC that says when it's worth doing more complicated
lookups.

I don't have a handle on how to deal with the ctid chaining for
intermediate row versions.

Greetings,

Andres Freund



Re: More aggressive vacuuming of temporary tables

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2020-09-08 18:13:58 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2020-09-08 15:24:54 -0400, Tom Lane wrote:
> >> Andres Freund <[hidden email]> writes:
> >>> But now I do wonder why we need to know whether the command is top level
> >>> or not? Why isn't the correct thing to instead look at what the current
> >>> backend's xmin is? Seems like you could just replace
> >>> *oldestXmin = XidFromFullTransactionId(ReadNextFullTransactionId());
> >>> with
> >>> *oldestXmin = MyProc->xmin;
> >>> Assert(TransactionIdIsValid(*oldestXmin));
>
> >> Ummm ... since VACUUM doesn't run inside a transaction, it won't be
> >> advertising an xmin will it?
>
> > We do run it in a transaction though:
>
> Ah.  But still, this is not the semantics I want for VACUUM, because the
> process xmin will involve other processes' behavior.  The point of the
> change as far as I'm concerned is to ensure vacuuming of dead temp rows
> independent of anything else in the system.

I was thinking about this a bit more, and I think the answer might be to
use Min(latestCompletedXid, MyProc->xid). That would, as far as I can
tell, never miss something vacuumable in a temporary table, and doesn't
require knowing whether we're running as the top-level command.

The reason for preferring latestCompletedXid over nextXid is that the
former is protected by ProcArrayLock and already accessed in
GetSnapshotData(), so we can cheaply compute the horizons as part of
pruning.

I think that cannot miss something vacuumable in a temp table for VACUUM
because that would have to have been left over by an already completed
transaction (by us, before the VACUUM).

In addition this allows HOT pruning etc on temp tables to be just as
aggressive as VACUUM is.

I wrote a patch to do so for [1], but it seemed topically more relevant
here. Running tests in a loop, no failures after the first few
iterations.

Greetings,

Andres Freund

[1] https://postgr.es/m/20200921212003.wrizvknpkim2whzo%40alap3.anarazel.de



Re: More aggressive vacuuming of temporary tables

Peter Geoghegan-4
In reply to this post by Andres Freund
On Wed, Sep 9, 2020 at 11:02 AM Andres Freund <[hidden email]> wrote:
> Obviously lookup in such a more complicated structure isn't free. Nor is
> building it. So we'd need some heuristics about when to do so. It'd
> probably be OK to occasionally look at the age of the oldest in-progress
> statement, to infer the time for old snapshots based on that. And then
> we could have a GUC that says when it's worth doing more complicated
> lookups.
>
> I don't have a handle on how to deal with the ctid chaining for
> intermediate row versions.

I wonder if it makes sense to add the feature to nbtree first. This
side-steps the practical problem of having to figure out ctid
chaining. You can prove the idea in a relatively low risk way, while
still creating significant value for users.

We can reason about versioning within indexes without having
authoritative information about it close at hand. The trick is to
delay everything until it looks like we'll have to split the page. I
can see very substantial improvements to index bloat from non-HOT
updates with the deduplication-deletion patch I've been working on:

https://postgr.es/m/CAH2-WznjQQwSPSxBiZ6wQyBiKqFmfdjOdeqp28otqw551T7jcQ@...

Importantly, the patch tends to bound the number of distinct index
entries *per logical row* in the presence of non-HOT updates (at least
for indexes that are not logically modified by the update). ISTM that
this is what really matters. The patch doesn't just reduce index bloat
-- it greatly caps the number of heap pages any given primary key
point lookup query will do. So in practice we eagerly kill precisely
the garbage index tuples that would have caused us the most trouble
without the new mechanism in place.

Avoiding a page split is a big win, so we can probably justify
relatively expensive lookup calls to make this work. This seems
especially likely to be true because we can probably ratchet up to
that only when we see that it will win. If a unique index has 10
entries for one value (10 versions), which is inevitable once HOT
pruning fails, then how many of those 10 do we really need? The answer
is perhaps just 2 or 3 maybe 99%+ of the time. I strongly suspect that
preventing page splits is more valuable than opportunistic heap
pruning (unless the pruning itself prevents page splits, which it may
or may not). Logically unnecessary page splits have obvious lasting
consequences, are usually highly correlated, and affect performance in
ways that are non-linear and likely very harmful in the real world.

--
Peter Geoghegan



Re: More aggressive vacuuming of temporary tables

Andres Freund
In reply to this post by Andres Freund
Hi,

On 2020-10-14 13:31:03 -0700, Andres Freund wrote:

> I was thinking about this a bit more, and I think the answer might be to
> use Min(latestCompletedXid, MyProc->xid). That would, as far as I can
> tell, never miss something vacuumable in a temporary table, doesn't
> require to know whether we're running as the top-level command.
>
> The reason for preferring latestCompletedXid over nextXid is that the
> former is protected by ProcArrayLock and already accessed in
> GetSnapshotData(), so we can cheaply compute the horizons as part of
> pruning.
>
> I think that cannot miss something vacuumable in a temp table for VACUUM
> because that would have to have been left over by an already completed
> transaction (by us, before the VACUUM).
>
> In addition this allows HOT pruning etc on temp tables to be just as
> aggressive as VACUUM is.
>
> I wrote a patch to do so for [1], but it seemed topically more relevant
> here. Running tests in a loop, no failures after the first few
> iterations.
>
> [1] https://postgr.es/m/20200921212003.wrizvknpkim2whzo%40alap3.anarazel.de

Pushed this change in logic. The only "real" change is that the horizon
for backends without an xid needs to be latestCompletedXid + 1, rather
than just latestCompletedXid. The horizon indicates the oldest
*non*-removable xid, and for temp tables latestCompletedXid can always
be vacuumed when no xid is assigned.

Greetings,

Andres Freund