recovering from "found xmin ... from before relfrozenxid ..."


Re: recovering from "found xmin ... from before relfrozenxid ..."

Robert Haas
On Wed, Sep 16, 2020 at 11:48 AM Tom Lane <[hidden email]> wrote:
> Robert Haas <[hidden email]> writes:
> > Tom, I know that you often have strong feelings about the exact
> > wording and details of this kind of stuff, so if you feel moved to
> > commit something that is fine with me. If not, I will take my best
> > shot at it.
>
> I'm not feeling terribly picky about it --- so it's all yours.

OK. After some more study of the thread and some more experimentation,
I came up with the attached. I'll go ahead and commit this if nobody
objects.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: 0001-pg_surgery-Try-to-stabilize-regression-tests.patch (5K)

Re: recovering from "found xmin ... from before relfrozenxid ..."

Tom Lane
Robert Haas <[hidden email]> writes:
> OK. After some more study of the thread and some more experimentation,
> I came up with the attached. I'll go ahead and commit this if nobody
> objects.

This is OK by me.

                        regards, tom lane



Re: recovering from "found xmin ... from before relfrozenxid ..."

Ashutosh Sharma
On Thu, Sep 17, 2020 at 12:14 AM Tom Lane <[hidden email]> wrote:
>
> Robert Haas <[hidden email]> writes:
> > OK. After some more study of the thread and some more experimentation,
> > I came up with the attached. I'll go ahead and commit this if nobody
> > objects.
>
> This is OK by me.
>

Looks good to me too.

Thanks,

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com



Re: recovering from "found xmin ... from before relfrozenxid ..."

Robert Haas
On Wed, Sep 16, 2020 at 10:27 PM Ashutosh Sharma <[hidden email]> wrote:
> > This is OK by me.
>
> Looks good to me too.

Cool, thanks to both of you for looking. Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: recovering from "found xmin ... from before relfrozenxid ..."

Tom Lane
Robert Haas <[hidden email]> writes:
> Cool, thanks to both of you for looking. Committed.

Hmph, according to whelk this is worse, not better:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=whelk&dt=2020-09-18%2017%3A42%3A11

I'm at a loss to understand what's going on there.

                        regards, tom lane



Re: recovering from "found xmin ... from before relfrozenxid ..."

Amit Kapila
On Sat, Sep 19, 2020 at 4:03 AM Tom Lane <[hidden email]> wrote:

>
> Robert Haas <[hidden email]> writes:
> > Cool, thanks to both of you for looking. Committed.
>
> Hmph, according to whelk this is worse, not better:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=whelk&dt=2020-09-18%2017%3A42%3A11
>
> I'm at a loss to understand what's going on there.
>

I think our assumption that changing the tests to use temp tables
will make them safe w.r.t. concurrent activity is not correct. We do
set OldestXmin for temp tables aggressively enough that it allows us
to remove all dead tuples, but the test case behavior relies on
whether we are able to prune the chain. AFAICS, we are using
different cut-offs in heap_page_prune when it is called via
lazy_scan_heap. So that seems to be causing both the failures.
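
To illustrate, the failing part of the test has roughly this shape (a
simplified sketch, not the actual heap_surgery.sql; only the temp
table htab is taken from the real script):

    create temp table htab (a int);
    insert into htab values (100), (200), (300);
    update htab set a = a + 1;  -- leaves committed-dead members in HOT chains
    vacuum freeze htab;         -- whether those members are gone afterwards
                                -- depends on the cut-off heap_page_prune used

If heap_page_prune and the rest of VACUUM disagree on the cut-off,
the "skipping TID" notices the test expects can come and go.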

--
With Regards,
Amit Kapila.



Re: recovering from "found xmin ... from before relfrozenxid ..."

Tom Lane
Amit Kapila <[hidden email]> writes:
> I think our assumption that changing the tests to use temp tables
> will make them safe w.r.t. concurrent activity is not correct. We do
> set OldestXmin for temp tables aggressively enough that it allows us
> to remove all dead tuples, but the test case behavior relies on
> whether we are able to prune the chain. AFAICS, we are using
> different cut-offs in heap_page_prune when it is called via
> lazy_scan_heap. So that seems to be causing both the failures.

Hm, reasonable theory.

I was able to partially reproduce whelk's failure here.  I got a
couple of cases of "cannot freeze committed xmax", which then leads
to the second NOTICE diff; but I couldn't reproduce the first
NOTICE diff.  That was out of about a thousand tries :-( so it's not
looking like a promising thing to reproduce without modifying the test.

I wonder whether "cannot freeze committed xmax" doesn't represent an
actual bug, ie is a7212be8b setting the cutoff *too* aggressively?
But if so, why's it so hard to reproduce?

                        regards, tom lane



Re: recovering from "found xmin ... from before relfrozenxid ..."

Tom Lane
I wrote:
> I was able to partially reproduce whelk's failure here.  I got a
> couple of cases of "cannot freeze committed xmax", which then leads
> to the second NOTICE diff; but I couldn't reproduce the first
> NOTICE diff.  That was out of about a thousand tries :-( so it's not
> looking like a promising thing to reproduce without modifying the test.

... however, it's trivial to reproduce via manual interference,
using the same strategy discussed recently for another case:
add a pg_sleep at the start of the heap_surgery.sql script,
run "make installcheck", and while that's running start another
session in which you begin a serializable transaction, execute
any old SELECT, and wait.  AFAICT this reproduces all of whelk's
symptoms with 100% reliability.
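
For reference, the interfering session amounts to nothing more than
this (a sketch; any statement that acquires a snapshot would do):

    -- second session, started before heap_surgery.sql gets going
    begin transaction isolation level serializable;
    select 1;   -- any old SELECT, just so the transaction takes a snapshot
    -- ...now simply wait, leaving the transaction open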

With a little more effort, this could be automated by putting
some long-running transaction (likely, it needn't be any more
complicated than "select pg_sleep(10)") in a second test script
launched in parallel with heap_surgery.sql.

So this confirms the suspicion that the cause of the buildfarm
failures is a concurrently-open transaction, presumably from
autovacuum.  I don't have time to poke further right now.

                        regards, tom lane



Re: recovering from "found xmin ... from before relfrozenxid ..."

Tom Lane
I wrote:
> So this confirms the suspicion that the cause of the buildfarm
> failures is a concurrently-open transaction, presumably from
> autovacuum.  I don't have time to poke further right now.

I spent some more time analyzing this, and there seem to be two distinct
issues:

1. My patch a7212be8b does indeed have a problem.  It will allow
vacuum_set_xid_limits to compute freezeLimit = nextXid for a temp
table if freeze_min_age is zero (ie VACUUM FREEZE).  If there's
any concurrent transactions, this falls foul of
heap_prepare_freeze_tuple's expectation that

 * NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
 * XID older than it could neither be running nor seen as running by any
 * open transaction.  This ensures that the replacement will not change
 * anyone's idea of the tuple state.

The "cannot freeze committed xmax" error message appears to be banking on
the assumption that we'd not reach heap_prepare_freeze_tuple for any
committed-dead tuple unless its xmax is past the specified cutoff_xid.

2. As Amit suspected, there's an inconsistency between pruneheap.c's
rules for which tuples are removable and vacuum.c's rules for that.
This seems like a massive bug in its own right: what's the point of
pruneheap.c going to huge effort to decide whether it should keep a
tuple if vacuum will then kill it anyway?  I do not understand why
whoever put in the GlobalVisState stuff only applied it in pruneheap.c
and not VACUUM proper.

These two points interact, in that we don't get to the "cannot freeze"
failure except when pruneheap has decided not to remove something that
is removable according to VACUUM's rules.  (VACUUM doesn't actually
remove it, because lazy_scan_heap won't try to remove HeapOnly tuples
even when it thinks they're HEAPTUPLE_DEAD; but then it tries to freeze
the tuple, and heap_prepare_freeze_tuple spits up.)  However, if I revert
a7212be8b then the pg_surgery test still fails in the presence of a
concurrent transaction (both of the expected "skipping TID" notices
disappear).  So reverting that patch wouldn't get us out of trouble.
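
Putting the two points together, the failing sequence is schematically
something like this (a sketch, assuming a concurrent transaction holds
an older snapshot; the statements are made up for illustration):

    create temp table htab (a int);
    insert into htab values (1);
    update htab set a = 2;  -- HOT update: leaves a committed-dead chain member
    -- meanwhile, another session holds a snapshot taken before the update
    vacuum freeze htab;
    -- freeze_min_age = 0 yields freezeLimit = nextXid (point 1); pruning
    -- keeps the committed-dead chain member under its GlobalVisState rule
    -- (point 2); lazy_scan_heap won't remove it either, tries to freeze it,
    -- and heap_prepare_freeze_tuple fails with "cannot freeze committed xmax"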

I think to move forward, we need to figure out what the freezing
behavior ought to be for temp tables.  We could make it the same
as it was before a7212be8b, which'd just require some more complexity
in vacuum_set_xid_limits.  However, that negates the idea that we'd
like VACUUM's behavior on a temp table to be fully independent of
whether concurrent transactions exist.  I'd prefer to allow a7212be8b's
behavior to stand, but then it seems we need to lobotomize the error
check in heap_prepare_freeze_tuple to some extent.

Independently of that, it seems like we need to fix things so that
when pruneheap.c is called by vacuum, it makes EXACTLY the same
dead-or-not-dead decisions that the main vacuum code makes.  This
business with applying some GlobalVisState rule or other instead
seems just as unsafe as can be.

AFAICS, there is no chance of the existing pg_surgery regression test
being fully stable if we don't fix both things.

BTW, attached is a quick-hack patch to allow automated testing
of this scenario, along the lines I sketched yesterday.  This
test passes if you run the two scripts serially, but not when
you run them in parallel.  I'm not proposing this for commit;
it's a hack and its timing behavior is probably not stable enough
for the buildfarm.  But it's pretty useful for poking at these
issues.

                        regards, tom lane


diff --git a/contrib/pg_surgery/Makefile b/contrib/pg_surgery/Makefile
index a66776c4c4..a7bcfca0fb 100644
--- a/contrib/pg_surgery/Makefile
+++ b/contrib/pg_surgery/Makefile
@@ -9,7 +9,7 @@ EXTENSION = pg_surgery
 DATA = pg_surgery--1.0.sql
 PGFILEDESC = "pg_surgery - perform surgery on a damaged relation"
 
-REGRESS = heap_surgery
+REGRESS = --schedule=surgery_schedule
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/pg_surgery/expected/heap_surgery.out b/contrib/pg_surgery/expected/heap_surgery.out
index d4a757ffa0..6ef5597e7c 100644
--- a/contrib/pg_surgery/expected/heap_surgery.out
+++ b/contrib/pg_surgery/expected/heap_surgery.out
@@ -1,4 +1,10 @@
 create extension pg_surgery;
+select pg_sleep(0.1);  -- ensure concurrent transaction is ready
+ pg_sleep
+----------
+
+(1 row)
+
 -- create a normal heap table and insert some rows.
 -- use a temp table so that vacuum behavior doesn't depend on global xmin
 create temp table htab (a int);
diff --git a/contrib/pg_surgery/expected/sleep.out b/contrib/pg_surgery/expected/sleep.out
new file mode 100644
index 0000000000..208e31163d
--- /dev/null
+++ b/contrib/pg_surgery/expected/sleep.out
@@ -0,0 +1,6 @@
+select pg_sleep(1.0);  -- long enough for whole heap_surgery test
+ pg_sleep
+----------
+
+(1 row)
+
diff --git a/contrib/pg_surgery/sql/heap_surgery.sql b/contrib/pg_surgery/sql/heap_surgery.sql
index 6526b27535..212c657f3c 100644
--- a/contrib/pg_surgery/sql/heap_surgery.sql
+++ b/contrib/pg_surgery/sql/heap_surgery.sql
@@ -1,5 +1,7 @@
 create extension pg_surgery;
 
+select pg_sleep(0.1);  -- ensure concurrent transaction is ready
+
 -- create a normal heap table and insert some rows.
 -- use a temp table so that vacuum behavior doesn't depend on global xmin
 create temp table htab (a int);
diff --git a/contrib/pg_surgery/sql/sleep.sql b/contrib/pg_surgery/sql/sleep.sql
new file mode 100644
index 0000000000..a85c25bca2
--- /dev/null
+++ b/contrib/pg_surgery/sql/sleep.sql
@@ -0,0 +1 @@
+select pg_sleep(1.0);  -- long enough for whole heap_surgery test
diff --git a/contrib/pg_surgery/surgery_schedule b/contrib/pg_surgery/surgery_schedule
new file mode 100644
index 0000000000..1bb3aa5e53
--- /dev/null
+++ b/contrib/pg_surgery/surgery_schedule
@@ -0,0 +1 @@
+test: heap_surgery sleep

Re: recovering from "found xmin ... from before relfrozenxid ..."

Ashutosh Sharma
Hi Tom,

On Sun, Sep 20, 2020 at 10:43 PM Tom Lane <[hidden email]> wrote:

>
> I wrote:
> > So this confirms the suspicion that the cause of the buildfarm
> > failures is a concurrently-open transaction, presumably from
> > autovacuum.  I don't have time to poke further right now.
>
> I spent some more time analyzing this, and there seem to be two distinct
> issues:
>
> 1. My patch a7212be8b does indeed have a problem.  It will allow
> vacuum_set_xid_limits to compute freezeLimit = nextXid for a temp
> table if freeze_min_age is zero (ie VACUUM FREEZE).  If there's
> any concurrent transactions, this falls foul of
> heap_prepare_freeze_tuple's expectation that
>
>  * NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
>  * XID older than it could neither be running nor seen as running by any
>  * open transaction.  This ensures that the replacement will not change
>  * anyone's idea of the tuple state.
>
> The "cannot freeze committed xmax" error message appears to be banking on
> the assumption that we'd not reach heap_prepare_freeze_tuple for any
> committed-dead tuple unless its xmax is past the specified cutoff_xid.
>
> 2. As Amit suspected, there's an inconsistency between pruneheap.c's
> rules for which tuples are removable and vacuum.c's rules for that.
> This seems like a massive bug in its own right: what's the point of
> pruneheap.c going to huge effort to decide whether it should keep a
> tuple if vacuum will then kill it anyway?  I do not understand why
> whoever put in the GlobalVisState stuff only applied it in pruneheap.c
> and not VACUUM proper.
>
> These two points interact, in that we don't get to the "cannot freeze"
> failure except when pruneheap has decided not to remove something that
> is removable according to VACUUM's rules.  (VACUUM doesn't actually
> remove it, because lazy_scan_heap won't try to remove HeapOnly tuples
> even when it thinks they're HEAPTUPLE_DEAD; but then it tries to freeze
> the tuple, and heap_prepare_freeze_tuple spits up.)  However, if I revert
> a7212be8b then the pg_surgery test still fails in the presence of a
> concurrent transaction (both of the expected "skipping TID" notices
> disappear).  So reverting that patch wouldn't get us out of trouble.
>
> I think to move forward, we need to figure out what the freezing
> behavior ought to be for temp tables.  We could make it the same
> as it was before a7212be8b, which'd just require some more complexity
> in vacuum_set_xid_limits.  However, that negates the idea that we'd
> like VACUUM's behavior on a temp table to be fully independent of
> whether concurrent transactions exist.  I'd prefer to allow a7212be8b's
> behavior to stand, but then it seems we need to lobotomize the error
> check in heap_prepare_freeze_tuple to some extent.
>
> Independently of that, it seems like we need to fix things so that
> when pruneheap.c is called by vacuum, it makes EXACTLY the same
> dead-or-not-dead decisions that the main vacuum code makes.  This
> business with applying some GlobalVisState rule or other instead
> seems just as unsafe as can be.
>
> AFAICS, there is no chance of the existing pg_surgery regression test
> being fully stable if we don't fix both things.
>

From the explanation you provided above, it seems like the test case
for the pg_surgery module is failing because there is some issue with
the changes done in commit a7212be8b (shown below). In other words, I
believe that the test case for pg_surgery has actually detected an
issue in this commit.

commit a7212be8b9e0885ee769e8c55f99ef742cda487b
Author: Tom Lane <[hidden email]>
Date:   Tue Sep 1 18:37:12 2020 -0400

    Set cutoff xmin more aggressively when vacuuming a temporary table.

    ....

So, do you mean to say that if the issues related to temp tables
induced by the above commit are fixed, the regression test for
pg_surgery will become stable?

Please let me know if I am missing something here. Thank you.

> BTW, attached is a quick-hack patch to allow automated testing
> of this scenario, along the lines I sketched yesterday.  This
> test passes if you run the two scripts serially, but not when
> you run them in parallel.  I'm not proposing this for commit;
> it's a hack and its timing behavior is probably not stable enough
> for the buildfarm.  But it's pretty useful for poking at these
> issues.
>

Yeah, understood, thanks for sharing this.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com



Re: recovering from "found xmin ... from before relfrozenxid ..."

Amit Kapila
In reply to this post by Tom Lane
On Sun, Sep 20, 2020 at 10:43 PM Tom Lane <[hidden email]> wrote:
>
> I wrote:
> > So this confirms the suspicion that the cause of the buildfarm
> > failures is a concurrently-open transaction, presumably from
> > autovacuum.  I don't have time to poke further right now.
>
..

> 2. As Amit suspected, there's an inconsistency between pruneheap.c's
> rules for which tuples are removable and vacuum.c's rules for that.
> This seems like a massive bug in its own right: what's the point of
> pruneheap.c going to huge effort to decide whether it should keep a
> tuple if vacuum will then kill it anyway?  I do not understand why
> whoever put in the GlobalVisState stuff only applied it in pruneheap.c
> and not VACUUM proper.
>
> These two points interact, in that we don't get to the "cannot freeze"
> failure except when pruneheap has decided not to remove something that
> is removable according to VACUUM's rules.  (VACUUM doesn't actually
> remove it, because lazy_scan_heap won't try to remove HeapOnly tuples
> even when it thinks they're HEAPTUPLE_DEAD; but then it tries to freeze
> the tuple, and heap_prepare_freeze_tuple spits up.)  However, if I revert
> a7212be8b then the pg_surgery test still fails in the presence of a
> concurrent transaction (both of the expected "skipping TID" notices
> disappear).  So reverting that patch wouldn't get us out of trouble.
>
> I think to move forward, we need to figure out what the freezing
> behavior ought to be for temp tables.  We could make it the same
> as it was before a7212be8b, which'd just require some more complexity
> in vacuum_set_xid_limits.  However, that negates the idea that we'd
> like VACUUM's behavior on a temp table to be fully independent of
> whether concurrent transactions exist.  I'd prefer to allow a7212be8b's
> behavior to stand, but then it seems we need to lobotomize the error
> check in heap_prepare_freeze_tuple to some extent.
>
> Independently of that, it seems like we need to fix things so that
> when pruneheap.c is called by vacuum, it makes EXACTLY the same
> dead-or-not-dead decisions that the main vacuum code makes.  This
> business with applying some GlobalVisState rule or other instead
> seems just as unsafe as can be.
>

Yeah, on a quick look it seems that before commit dc7420c2c9,
pruneheap.c and the main VACUUM code used to make the same decision,
and that is the commit which introduced the GlobalVisState stuff.

> AFAICS, there is no chance of the existing pg_surgery regression test
> being fully stable if we don't fix both things.
>

What if we ensure that it runs with autovacuum = off and with no
parallel test running? I am not sure about the second part, but if we
can do that then the test will probably be stable.

--
With Regards,
Amit Kapila.



Re: recovering from "found xmin ... from before relfrozenxid ..."

Tom Lane
Amit Kapila <[hidden email]> writes:
> On Sun, Sep 20, 2020 at 10:43 PM Tom Lane <[hidden email]> wrote:
>> AFAICS, there is no chance of the existing pg_surgery regression test
>> being fully stable if we don't fix both things.

> What if we ensure that it runs with autovacuum = off and with no
> parallel test running? I am not sure about the second part, but if we
> can do that then the test will probably be stable.

Then it'll not be usable under "make installcheck", which is not
very nice.  It's also arguable that you aren't testing pg_surgery
under real-world conditions if you do it like that.

Moreover, I think that both of these points need to be addressed
anyway, as they represent bugs that are reachable independently
of pg_surgery.  Admittedly, we do not have a test case that
proves that the inconsistency between pruneheap and vacuum has
any bad effects in the absence of a7212be8b.  But do you really
want to bet that there are none?

                        regards, tom lane



Re: recovering from "found xmin ... from before relfrozenxid ..."

Robert Haas
In reply to this post by Tom Lane
On Sun, Sep 20, 2020 at 1:13 PM Tom Lane <[hidden email]> wrote:

> 1. My patch a7212be8b does indeed have a problem.  It will allow
> vacuum_set_xid_limits to compute freezeLimit = nextXid for a temp
> table if freeze_min_age is zero (ie VACUUM FREEZE).  If there's
> any concurrent transactions, this falls foul of
> heap_prepare_freeze_tuple's expectation that
>
>  * NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
>  * XID older than it could neither be running nor seen as running by any
>  * open transaction.  This ensures that the replacement will not change
>  * anyone's idea of the tuple state.
>
> The "cannot freeze committed xmax" error message appears to be banking on
> the assumption that we'd not reach heap_prepare_freeze_tuple for any
> committed-dead tuple unless its xmax is past the specified cutoff_xid.
>
> 2. As Amit suspected, there's an inconsistency between pruneheap.c's
> rules for which tuples are removable and vacuum.c's rules for that.
> This seems like a massive bug in its own right: what's the point of
> pruneheap.c going to huge effort to decide whether it should keep a
> tuple if vacuum will then kill it anyway?  I do not understand why
> whoever put in the GlobalVisState stuff only applied it in pruneheap.c
> and not VACUUM proper.

I am not sure I fully understand why you're contrasting pruneheap.c
with vacuum here, because vacuum just does HOT pruning to remove dead
tuples - maybe calling the relevant functions with different
arguments, but it doesn't have its own independent logic for that.

The key point is that the freezing code isn't, or at least
historically wasn't, very smart about dead tuples. For example, I
think if you told it to freeze something that was dead it would just
do it, which is obviously bad. And that's why Andres stuck those
sanity checks in there. But it's still pretty fragile. I think perhaps
the pruning code should be rewritten in such a way that it can be
combined with the code that freezes and marks pages all-visible, so
that there's not so much action at a distance, but such an endeavor is
in itself pretty scary, and certainly not back-patchable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: recovering from "found xmin ... from before relfrozenxid ..."

Tom Lane
Robert Haas <[hidden email]> writes:
> On Sun, Sep 20, 2020 at 1:13 PM Tom Lane <[hidden email]> wrote:
>> 2. As Amit suspected, there's an inconsistency between pruneheap.c's
>> rules for which tuples are removable and vacuum.c's rules for that.
>> This seems like a massive bug in its own right: what's the point of
>> pruneheap.c going to huge effort to decide whether it should keep a
>> tuple if vacuum will then kill it anyway?  I do not understand why
>> whoever put in the GlobalVisState stuff only applied it in pruneheap.c
>> and not VACUUM proper.

> I am not sure I fully understand why you're contrasting pruneheap.c
> with vacuum here, because vacuum just does HOT pruning to remove dead
> tuples - maybe calling the relevant functions with different
> arguments, but it doesn't have its own independent logic for that.

Right, but what we end up with is that the very same tuple xmin and
xmax might result in pruning/deletion, or not, depending on whether
it's part of a HOT chain or not.  That's at best pretty weird, and
at worst it means that corner-case bugs in other places are triggered
in only one of the two scenarios ... which is what we have here.

> The key point is that the freezing code isn't, or at least
> historically wasn't, very smart about dead tuples. For example, I
> think if you told it to freeze something that was dead it would just
> do it, which is obviously bad. And that's why Andres stuck those
> sanity checks in there. But it's still pretty fragile. I think perhaps
> the pruning code should be rewritten in such a way that it can be
> combined with the code that freezes and marks pages all-visible, so
> that there's not so much action at a distance, but such an endeavor is
> in itself pretty scary, and certainly not back-patchable.

Not sure.  The pruning code is trying to serve two masters, that is
both VACUUM and on-the-fly cleanup during ordinary queries.  If you
try to merge it with other tasks that VACUUM does, you're going to
have a mess for the second usage.  I fear there's going to be pretty
strong conservation of cruft either way.

FWIW, weakening the sanity checks in heap_prepare_freeze_tuple is
*not* my preferred fix here.  But it'll take some work in other
places to preserve them.

                        regards, tom lane



Re: recovering from "found xmin ... from before relfrozenxid ..."

Robert Haas
On Mon, Sep 21, 2020 at 2:21 PM Tom Lane <[hidden email]> wrote:
> Right, but what we end up with is that the very same tuple xmin and
> xmax might result in pruning/deletion, or not, depending on whether
> it's part of a HOT chain or not.  That's at best pretty weird, and
> at worst it means that corner-case bugs in other places are triggered
> in only one of the two scenarios ... which is what we have here.

I'm not sure I really understand how that's happening, because surely
HOT chains and non-HOT chains are pruned by the same code, but it
doesn't sound good.

> FWIW, weakening the sanity checks in heap_prepare_freeze_tuple is
> *not* my preferred fix here.  But it'll take some work in other
> places to preserve them.

Make sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: recovering from "found xmin ... from before relfrozenxid ..."

Andres Freund
Hi,

On 2020-09-21 16:02:29 -0400, Robert Haas wrote:

> On Mon, Sep 21, 2020 at 2:21 PM Tom Lane <[hidden email]> wrote:
> > Right, but what we end up with is that the very same tuple xmin and
> > xmax might result in pruning/deletion, or not, depending on whether
> > it's part of a HOT chain or not.  That's at best pretty weird, and
> > at worst it means that corner-case bugs in other places are triggered
> > in only one of the two scenarios ... which is what we have here.
>
> I'm not sure I really understand how that's happening, because surely
> HOT chains and non-HOT chains are pruned by the same code, but it
> doesn't sound good.

Not necessarily, unfortunately:

        case HEAPTUPLE_DEAD:

            /*
             * Ordinarily, DEAD tuples would have been removed by
             * heap_page_prune(), but it's possible that the tuple
             * state changed since heap_page_prune() looked.  In
             * particular an INSERT_IN_PROGRESS tuple could have
             * changed to DEAD if the inserter aborted.  So this
             * cannot be considered an error condition.
             *
             * If the tuple is HOT-updated then it must only be
             * removed by a prune operation; so we keep it just as if
             * it were RECENTLY_DEAD.  Also, if it's a heap-only
             * tuple, we choose to keep it, because it'll be a lot
             * cheaper to get rid of it in the next pruning pass than
             * to treat it like an indexed tuple. Finally, if index
             * cleanup is disabled, the second heap pass will not
             * execute, and the tuple will not get removed, so we must
             * treat it like any other dead tuple that we choose to
             * keep.
             *
             * If this were to happen for a tuple that actually needed
             * to be deleted, we'd be in trouble, because it'd
             * possibly leave a tuple below the relation's xmin
             * horizon alive.  heap_prepare_freeze_tuple() is prepared
             * to detect that case and abort the transaction,
             * preventing corruption.
             */
            if (HeapTupleIsHotUpdated(&tuple) ||
                HeapTupleIsHeapOnly(&tuple) ||
                params->index_cleanup == VACOPT_TERNARY_DISABLED)
                nkeep += 1;
            else
                tupgone = true; /* we can delete the tuple */
            all_visible = false;


So if, e.g., a transaction aborts between the heap_page_prune() call
and this check, the pruning behaviour depends on whether the tuple is
part of a HOT chain or not.

Greetings,

Andres Freund



Re: recovering from "found xmin ... from before relfrozenxid ..."

Tom Lane
In reply to this post by Robert Haas
Robert Haas <[hidden email]> writes:
> On Mon, Sep 21, 2020 at 2:21 PM Tom Lane <[hidden email]> wrote:
>> Right, but what we end up with is that the very same tuple xmin and
>> xmax might result in pruning/deletion, or not, depending on whether
>> it's part of a HOT chain or not.  That's at best pretty weird, and
>> at worst it means that corner-case bugs in other places are triggered
>> in only one of the two scenarios ... which is what we have here.

> I'm not sure I really understand how that's happening, because surely
> HOT chains and non-HOT chains are pruned by the same code, but it
> doesn't sound good.

No, they're not.  lazy_scan_heap() will never remove a tuple that
is HeapTupleIsHotUpdated or HeapTupleIsHeapOnly, even if it thinks
it's DEAD -- cf. vacuumlazy.c, about line 1350.  So tuples in
a HOT chain are deleted exactly when pruneheap.c sees fit to do so.
OTOH, for tuples not in a HOT chain, the decision is (I believe)
entirely on lazy_scan_heap().  And the core of my complaint is that
pruneheap.c's decisions about what is DEAD are not reliably identical
to what HeapTupleSatisfiesVacuum thinks.

I don't mind if a free-standing prune operation has its own rules,
but when it's invoked by VACUUM it ought to follow VACUUM's rules about
what is dead or alive.  What remains unclear at this point is whether
we ought to import some of the intelligence added by the GlobalVisState
patch into VACUUM's behavior, or just dumb down pruneheap.c so that
it applies exactly the HeapTupleSatisfiesVacuum rules when invoked
by VACUUM.

                        regards, tom lane



Re: recovering from "found xmin ... from before relfrozenxid ..."

Andres Freund
In reply to this post by Tom Lane
Hi,

On 2020-09-20 13:13:16 -0400, Tom Lane wrote:
> 2. As Amit suspected, there's an inconsistency between pruneheap.c's
> rules for which tuples are removable and vacuum.c's rules for that.
> This seems like a massive bug in its own right: what's the point of
> pruneheap.c going to huge effort to decide whether it should keep a
> tuple if vacuum will then kill it anyway?  I do not understand why
> whoever put in the GlobalVisState stuff only applied it in pruneheap.c
> and not VACUUM proper.

The reason for that is that the GlobalVisState stuff is computed
heuristically (and then re-checked if that's not sufficient to prune a
tuple, unless that has already been done). That's done so
GetSnapshotData() doesn't have to look at each backend's ->xmin, which
is quite a massive speedup at higher connection counts, as each
backend's ->xmin changes much more often than each backend's xid.

But for VACUUM we need to do the accurate scan of the procarray anyway,
because we need an accurate value for things other than HOT pruning
decisions.

What exactly do you mean by the "going to huge effort to decide" bit?


> I think to move forward, we need to figure out what the freezing
> behavior ought to be for temp tables.  We could make it the same
> as it was before a7212be8b, which'd just require some more complexity
> in vacuum_set_xid_limits.  However, that negates the idea that we'd
> like VACUUM's behavior on a temp table to be fully independent of
> whether concurrent transactions exist.  I'd prefer to allow a7212be8b's
> behavior to stand, but then it seems we need to lobotomize the error
> check in heap_prepare_freeze_tuple to some extent.

I think that's an argument for what I suggested elsewhere, which is that
we should move the logic for a different horizon for temp tables out of
vacuum_set_xid_limits, and into procarray.


> Independently of that, it seems like we need to fix things so that
> when pruneheap.c is called by vacuum, it makes EXACTLY the same
> dead-or-not-dead decisions that the main vacuum code makes.  This
> business with applying some GlobalVisState rule or other instead
> seems just as unsafe as can be.

It's not great, I agree. Not sure there is a super nice answer
though. Note that, even before my changes, vacuumlazy can decide
differently than pruning whether a tuple is live. E.g. when an inserting
transaction aborts. That's pretty much unavoidable as long as we have
multiple HTSV calls for a tuple, since none of our locking can (nor
should) prevent concurrent transactions from aborting.

Before your new code avoiding the GetOldestNonRemovableTransactionId()
call for temp tables, GlobalVis* could never be more pessimistic than
decisions based on a prior GetOldestNonRemovableTransactionId() call (as
that internally updates the heuristic horizons used by GlobalVis).

Greetings,

Andres Freund



Re: recovering from "found xmin ... from before relfrozenxid ..."

Tom Lane
Andres Freund <[hidden email]> writes:
> The reason for that is that the GlobalVisState stuff is computed
> heuristically (and then re-checked if that's not sufficient to prune a
> tuple, unless that has already been done). That's done so
> GetSnapshotData() doesn't have to look at each backend's ->xmin, which
> is quite a massive speedup at higher connection counts, as each
> backend's ->xmin changes much more often than each backend's xid.

OK.

> What exactly do you mean by the "going to huge effort to decide" bit?

I'd looked at all the complexity around GlobalVisState, but failed to
register that it should be pretty cheap on a per-tuple basis.  So never
mind that complaint.  The point that remains is just that it's different
from HeapTupleSatisfiesVacuum's rules.

>> I think to move forward, we need to figure out what the freezing
>> behavior ought to be for temp tables.  We could make it the same
>> as it was before a7212be8b, which'd just require some more complexity
>> in vacuum_set_xid_limits.  However, that negates the idea that we'd
>> like VACUUM's behavior on a temp table to be fully independent of
>> whether concurrent transactions exist.  I'd prefer to allow a7212be8b's
>> behavior to stand, but then it seems we need to lobotomize the error
>> check in heap_prepare_freeze_tuple to some extent.

> I think that's an argument for what I suggested elsewhere, which is that
> we should move the logic for a different horizon for temp tables out of
> vacuum_set_xid_limits, and into procarray.

But procarray does not seem like a great place for
table-persistence-dependent decisions either?

>> Independently of that, it seems like we need to fix things so that
>> when pruneheap.c is called by vacuum, it makes EXACTLY the same
>> dead-or-not-dead decisions that the main vacuum code makes.  This
>> business with applying some GlobalVisState rule or other instead
>> seems just as unsafe as can be.

> It's not great, I agree. Not sure there is a super nice answer
> though. Note that, even before my changes, vacuumlazy can decide
> differently than pruning whether a tuple is live. E.g. when an inserting
> transaction aborts. That's pretty much unavoidable as long as we have
> multiple HTSV calls for a tuple, since none of our locking can (nor
> should) prevent concurrent transactions from aborting.

It's clear that if the environment changes between test A and test B,
we might get different results.  What I'm not happy about is that the
rules are different, so we might get different results even if the
environment did not change.

                        regards, tom lane



Re: recovering from "found xmin ... from before relfrozenxid ..."

Andres Freund
Hi,

On 2020-09-21 16:40:40 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> >> I think to move forward, we need to figure out what the freezing
> >> behavior ought to be for temp tables.  We could make it the same
> >> as it was before a7212be8b, which'd just require some more complexity
> >> in vacuum_set_xid_limits.  However, that negates the idea that we'd
> >> like VACUUM's behavior on a temp table to be fully independent of
> >> whether concurrent transactions exist.  I'd prefer to allow a7212be8b's
> >> behavior to stand, but then it seems we need to lobotomize the error
> >> check in heap_prepare_freeze_tuple to some extent.
>
> > I think that's an argument for what I suggested elsewhere, which is that
> > we should move the logic for a different horizon for temp tables out of
> > vacuum_set_xid_limits, and into procarray.
>
> But procarray does not seem like a great place for
> table-persistence-dependent decisions either?

That ship sailed a long, long time ago though. GetOldestXmin() has
looked at the passed-in relation for quite a while, and even before
that we had logic about 'allDbs' etc.  It doesn't easily seem possible
to avoid that, given how intimately that's coupled with how snapshots
are built and used, database & vacuumFlags checks, etc.

Greetings,

Andres Freund

