recovering from "found xmin ... from before relfrozenxid ..."

Re: recovering from "found xmin ... from before relfrozenxid ..."

Magnus Hagander
On Tue, Jul 14, 2020 at 1:52 PM Robert Haas <[hidden email]> wrote:
On Tue, Jul 14, 2020 at 4:59 AM Magnus Hagander <[hidden email]> wrote:
> The counterexample of this is pg_resetwal. The number of people who have broken their database with that tool is not small.

Very true.

> That said, we could have a separate "class" of extensions/tools in the distribution, and encourage packagers to pack them up as separate packages for example. Technically they don't have to be in the same source repository at all of course, but I have a feeling some of them might be a lot easier to maintain if they are. And then the user would just have to install something like "postgresql-14-wizardtools". They'd still be available to everybody, of course, but at least the knives would be in a closed drawer until intentionally picked up.

I don't think that does much to help with the immediate problem here,
because people are being bitten by this problem *now* and a packaging
change like this will take a long time to happen and become standard
out there, but I think it's a good idea.

I don't think that it necessarily has to be. As long as we're talking about adding something and not actually changing their existing packages, getting this into both yum and apt shouldn't be *that* hard, if it's coordinated well with Christoph and Devrim (obviously that's based on my experience and they will have to give a more complete answer themselves). It would be a lot more complicated if it involved changing an existing package.

--

Re: recovering from "found xmin ... from before relfrozenxid ..."

Robert Haas
In reply to this post by Fujii Masao
On Mon, Jul 13, 2020 at 9:29 PM Fujii Masao <[hidden email]> wrote:
> But can updating this tool fit the release schedule and
> policy of PostgreSQL?
>
> While investigating a problem using this tool, we may want to add a new
> feature to it because it's necessary for the investigation. But users
> would need to wait for the next minor version release to use this new
> feature.

Yeah, that's a point that needs careful thought. I don't think it
means that we shouldn't have something in core; after all, this is a
problem that is created in part by the way that PostgreSQL itself
works, and I think it would be quite unfriendly if we refused to do
anything about that in the core distribution. On the other hand, it
might be a good reason not to back-patch, which is something most
people don't seem enthusiastic about anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: recovering from "found xmin ... from before relfrozenxid ..."

Robert Haas
In reply to this post by Magnus Hagander
On Tue, Jul 14, 2020 at 8:25 AM Magnus Hagander <[hidden email]> wrote:
> I don't think that it necessarily has to be. As long as we're talking about adding something and not actually changing their existing packages, getting this into both yum and apt shouldn't be *that* hard, if it's coordinated well with Christoph and Devrim (obviously that's based on my experience and they will have to give a more complete answer themselves). It would be a lot more complicated if it involved changing an existing package.

I mean, you presumably could not move pg_resetwal to this new package
in existing branches, right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: recovering from "found xmin ... from before relfrozenxid ..."

Magnus Hagander


On Tue, Jul 14, 2020 at 4:09 PM Robert Haas <[hidden email]> wrote:
On Tue, Jul 14, 2020 at 8:25 AM Magnus Hagander <[hidden email]> wrote:
> I don't think that it necessarily has to be. As long as we're talking about adding something and not actually changing their existing packages, getting this into both yum and apt shouldn't be *that* hard, if it's coordinated well with Christoph and Devrim (obviously that's based on my experience and they will have to give a more complete answer themselves). It would be a lot more complicated if it involved changing an existing package.

I mean, you presumably could not move pg_resetwal to this new package
in existing branches, right?

Probably and eventually. But that can be done for 14+ (or 13+ depending on how "done" the packaging is there -- we should just make sure that hits the biggest platform in the same release). 

--

Re: recovering from "found xmin ... from before relfrozenxid ..."

Stephen Frost
Greetings,

* Magnus Hagander ([hidden email]) wrote:

> On Tue, Jul 14, 2020 at 4:09 PM Robert Haas <[hidden email]> wrote:
> > On Tue, Jul 14, 2020 at 8:25 AM Magnus Hagander <[hidden email]>
> > wrote:
> > > I don't think that it necessarily has to be. As long as we're talking
> > about adding something and not actually changing their existing packages,
> > getting this into both yum and apt shouldn't be *that* hard, if it's
> > coordinated well with Christoph and Devrim (obviously that's based on my
> > experience and they will have to give a more complete answer themselves).
> > It would be a lot more complicated if it involved changing an existing
> > package.
> >
> > I mean, you presumably could not move pg_resetwal to this new package
> > in existing branches, right?
>
> Probably and eventually. But that can be done for 14+ (or 13+ depending on
> how "done" the packaging is there -- we should just make sure that hits the
> biggest platform in the same release).
Considering we just got rid of the -contrib independent package on at
least Debian-based systems, it doesn't really seem likely that the
packagers are going to be anxious to create a new one -- they are not
without costs.

Also, in such dire straits as this thread is contemplating, I would
think we'd *really* like to have access to these tools with as small an
amount of change as absolutely possible to the system: what if
pg_extension itself got munged and we aren't able to install this new
contrib module, for example?

I would suggest that, instead, we make this part of core, but have it be
in a relatively clearly marked special schema that isn't part of
search_path by default -- e.g. pg_hacks or pg_dirty_hands (I kinda like
the latter though it seems a bit unprofessional for us).

I'd also certainly be in support of having a contrib module with the
same functions that's independent from core and available and able to be
installed on pre-v14 systems.  I'd further support having another repo
that's "core maintained" or however we want to phrase it which includes
this proposed module (and possibly all of contrib) and which has a
different release cadence and requirements for what gets into it, has
its own packages, etc.

Thanks,

Stephen


Re: recovering from "found xmin ... from before relfrozenxid ..."

Alvaro Herrera
In reply to this post by Andres Freund
On 2020-Jul-13, Andres Freund wrote:

> Hi,
>
> On 2020-07-13 17:12:18 -0400, Robert Haas wrote:
> > 1. There's nothing to identify the tuple that has the problem, and no
> > way to know how many more of them there might be. Back-patching
> > b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first
> > part of this.
>
> Not fully, I'm afraid. Afaict it doesn't currently tell you the item
> pointer offset, just the block number, right? We probably should extend
> it to also include the offset...
Just having the block number is already a tremendous step forward; with
that you can ask the customer to send a pageinspect dump of tuple
headers, and then the problem is obvious.  Now if you want to add block
number to that, by all means do so.
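
(As a sketch of the kind of dump I mean -- the table name here is
hypothetical, and the block number is the one from the VACUUM error
message:

create extension if not exists pageinspect;
select lp, lp_flags, t_xmin, t_xmax, t_infomask, t_infomask2, t_ctid
  from heap_page_items(get_raw_page('some_table', 0));

Comparing t_xmin against the table's relfrozenxid is usually enough to
spot the offending tuple.)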

FWIW I do support the idea of backpatching the vacuum errcontext commit.

One useful thing to do is to mark a tuple frozen unconditionally if it's
marked hinted XMIN_COMMITTED; no need to consult pg_clog in that case.
The attached (for 9.6) does that; IIRC it would have helped in a couple
of cases.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: 0001-Don-t-test-for-xact-commit-if-tuple-is-hinted.patch (945 bytes)

Re: recovering from "found xmin ... from before relfrozenxid ..."

Andres Freund
In reply to this post by Robert Haas
Hi,

On 2020-07-13 21:18:10 -0400, Robert Haas wrote:

> On Mon, Jul 13, 2020 at 9:10 PM Andres Freund <[hidden email]> wrote:
> > > What if clog has been truncated so that the xmin can't be looked up?
> >
> > That's possible, but probably only in cases where xmin actually
> > committed.
>
> Isn't that the normal case? I'm imagining something like:
>
> - Tuple gets inserted. Transaction commits.
> - VACUUM processes table.
> - Mischievous fairies mark page all-visible in the visibility map.
> - VACUUM runs lots more times, relfrozenxid advances, but without ever
> looking at the page in question, because it's all-visible.
> - clog is truncated, rendering xmin no longer accessible.
> - User runs VACUUM disabling page skipping, gets ERROR.
> - User deletes offending tuple.
> - At this point, I think the tuple is both invisible and unprunable?
> - Fairies happy, user sad.

I'm not saying it's impossible that that happens, but the cases I did
investigate didn't look like this. If something just roguely wrote to
the VM I'd expect a lot more "is not marked all-visible but visibility
map bit is set in relation" type WARNINGs, and I've not seen many of
those (they're WARNINGs though, so maybe we wouldn't). Presumably this
wouldn't always just happen with tuples that'd trigger an error first
during hot pruning.

I've definitely seen indications of both datfrozenxid and relfrozenxid
getting corrupted (in particular vac_update_datfrozenxid being racy as
hell), xid wraparound, indications of multixact problems (although it's
possible we've now fixed those) and some signs of corrupted relcache
entries for shared relations leading to vacuums being skipped.

Greetings,

Andres Freund



Re: recovering from "found xmin ... from before relfrozenxid ..."

Andres Freund
In reply to this post by Alvaro Herrera
Hi,

On 2020-07-14 13:20:25 -0400, Alvaro Herrera wrote:

> On 2020-Jul-13, Andres Freund wrote:
>
> > Hi,
> >
> > On 2020-07-13 17:12:18 -0400, Robert Haas wrote:
> > > 1. There's nothing to identify the tuple that has the problem, and no
> > > way to know how many more of them there might be. Back-patching
> > > b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first
> > > part of this.
> >
> > Not fully, I'm afraid. Afaict it doesn't currently tell you the item
> > pointer offset, just the block number, right? We probably should extend
> > it to also include the offset...
>
> Just having the block number is already a tremendous step forward; with
> that you can ask the customer to set a pageinspect dump of tuple
> headers, and then the problem is obvious.  Now if you want to add block
> number to that, by all means do so.

offset number I assume?


> One useful thing to do is to mark a tuple frozen unconditionally if it's
> marked hinted XMIN_COMMITTED; no need to consult pg_clog in that case.
> The attached (for 9.6) does that; IIRC it would have helped in a couple
> of cases.

I think it might also have hidden corruption in at least one case where
we subsequently fixed a bug (and helped detect at least one unfixed
bug). That should only be possible if either required clog has been
removed, or if relfrozenxid/datfrozenxid are corrupt, right?

Greetings,

Andres Freund



Re: recovering from "found xmin ... from before relfrozenxid ..."

Andres Freund
In reply to this post by Robert Haas
Hi,

On 2020-07-14 07:51:27 -0400, Robert Haas wrote:
> On Tue, Jul 14, 2020 at 3:08 AM Peter Eisentraut
> <[hidden email]> wrote:
> > In the meantime, if you're wizard enough to deal with this kind of
> > thing, you could also clone the module from the PG14 tree and build it
> > against older versions manually.
>
> But what if you are NOT a wizard, and a wizard is giving you
> directions? Then having to build from source is a real pain. And
> that's normally the situation I'm in when a customer has this issue.

The "found xmin ... from before relfrozenxid ..." cases should all be
fixable without needing such a function, and without it making fixing
them significantly easier, no? As far as I understand your suggested
solution, you need the tid(s) of these tuples, right? If you have those,
I don't think it's meaningfully harder to INSERT ... DELETE WHERE ctid =
.... or something like that.
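
A sketch of what I mean, with a hypothetical table and tid, and assuming
the damaged tuple is still visible:

insert into some_table select * from some_table where ctid = '(0,1)';
delete from some_table where ctid = '(0,1)';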

ISTM that the hard part is finding all problematic tuples in an
efficient manner (i.e. one that doesn't require one manual VACUUM for
each individual block + parsing VACUUM's error message), not "fixing"
those tuples.

Greetings,

Andres Freund



Re: recovering from "found xmin ... from before relfrozenxid ..."

Alvaro Herrera
In reply to this post by Andres Freund
On 2020-Jul-14, Andres Freund wrote:

> Hi,
>
> On 2020-07-14 13:20:25 -0400, Alvaro Herrera wrote:

> > Just having the block number is already a tremendous step forward; with
> > that you can ask the customer to set a pageinspect dump of tuple
> > headers, and then the problem is obvious.  Now if you want to add block
> > number to that, by all means do so.
>
> offset number I assume?

Eh, yeah, that.

> > One useful thing to do is to mark a tuple frozen unconditionally if it's
> > marked hinted XMIN_COMMITTED; no need to consult pg_clog in that case.
> > The attached (for 9.6) does that; IIRC it would have helped in a couple
> > of cases.
>
> I think it might also have hidden corruption in at least one case where
> we subsequently fixed a bug (and helped detect at least one unfixed
> bug). That should only be possible if either required clog has been
> removed, or if relfrozenxid/datfrozenxid are corrupt, right?

Yes, that's precisely the reason I never submitted it :-)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: recovering from "found xmin ... from before relfrozenxid ..."

Robert Haas
In reply to this post by Andres Freund
On Tue, Jul 14, 2020 at 3:42 PM Andres Freund <[hidden email]> wrote:

> The "found xmin ... from before relfrozenxid ..." cases should all be
> fixable without needing such a function, and without it making fixing
> them significantly easier, no? As far as I understand your suggested
> solution, you need the tid(s) of these tuples, right? If you have those,
> I don't think it's meaningfully harder to INSERT ... DELETE WHERE ctid =
> .... or something like that.
>
> ISTM that the hard part is finding all problematic tuples in an
> efficient manner (i.e. that doesn't require one manual VACUUM for each
> individual block + parsing VACUUM's error message), not "fixing" those
> tuples.

I haven't tried the INSERT ... DELETE approach, but I've definitely
seen a case where a straight UPDATE did not fix the problem; VACUUM
continued failing afterwards. In that case, it was a system catalog
that was affected, and not one where TRUNCATE + re-INSERT was remotely
practical. The only solution I could come up with was to drop the
database and recreate it. Fortunately in that case the affected
database didn't seem to have any actual data in it, but if it had been
a 1TB database I think we would have been in really bad trouble.

Do you have a reason for believing that INSERT ... DELETE is going to
be better than UPDATE? It seems to me that either way you can end up
with a deleted and thus invisible tuple that you still can't get rid
of.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: recovering from "found xmin ... from before relfrozenxid ..."

Andres Freund
Hi,

On 2020-07-14 15:59:21 -0400, Robert Haas wrote:

> On Tue, Jul 14, 2020 at 3:42 PM Andres Freund <[hidden email]> wrote:
> > The "found xmin ... from before relfrozenxid ..." cases should all be
> > fixable without needing such a function, and without it making fixing
> > them significantly easier, no? As far as I understand your suggested
> > solution, you need the tid(s) of these tuples, right? If you have those,
> > I don't think it's meaningfully harder to INSERT ... DELETE WHERE ctid =
> > .... or something like that.
> >
> > ISTM that the hard part is finding all problematic tuples in an
> > efficient manner (i.e. that doesn't require one manual VACUUM for each
> > individual block + parsing VACUUM's error message), not "fixing" those
> > tuples.
>
> I haven't tried the INSERT ... DELETE approach, but I've definitely
> seen a case where a straight UPDATE did not fix the problem; VACUUM
> continued failing afterwards.

The only way I can see that happening is if the old tuple's multixact
got copied forward. That'd not happen with INSERT ... DELETE.


> In that case, it was a system catalog
> that was affected, and not one where TRUNCATE + re-INSERT was remotely
> practical.

FWIW, a rewriting ALTER TABLE would likely also fix it. But obviously
that'd require allow_system_table_mods...



> Do you have a reason for believing that INSERT ... DELETE is going to
> be better than UPDATE? It seems to me that either way you can end up
> with a deleted and thus invisible tuple that you still can't get rid
> of.

None of the "new" checks around freezing would apply to deleted
tuples. So we shouldn't fail with an error like $subject.

Greetings,

Andres Freund



Re: recovering from "found xmin ... from before relfrozenxid ..."

Robert Haas
On Wed, Jul 15, 2020 at 11:41 AM Andres Freund <[hidden email]> wrote:
> > Do you have a reason for believing that INSERT ... DELETE is going to
> > be better than UPDATE? It seems to me that either way you can end up
> > with a deleted and thus invisible tuple that you still can't get rid
> > of.
>
> None of the "new" checks around freezing would apply to deleted
> tuples. So we shouldn't fail with an error like $subject.

It can definitely happen at least transiently:

S1:
rhaas=# create table wubble (a int, b text);
CREATE TABLE
rhaas=# insert into wubble values (1, 'glumpf');
INSERT 0 1

S2:
rhaas=# begin transaction isolation level repeatable read;
BEGIN
rhaas=*# select * from wubble;
 a |   b
---+--------
 1 | glumpf
(1 row)

S1:
rhaas=# delete from wubble;
DELETE 1
rhaas=# update pg_class set relfrozenxid =
(relfrozenxid::text::integer + 1000000)::text::xid where relname =
'wubble';
UPDATE 1
rhaas=# vacuum verbose wubble;
INFO:  vacuuming "public.wubble"
ERROR:  found xmin 528 from before relfrozenxid 1000527
CONTEXT:  while scanning block 0 of relation "public.wubble"

S2:
rhaas=*# commit;
COMMIT

S1:
rhaas=# vacuum verbose wubble;
INFO:  vacuuming "public.wubble"
INFO:  "wubble": removed 1 row versions in 1 pages
INFO:  "wubble": found 1 removable, 0 nonremovable row versions in 1
out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 531
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "wubble": truncated 1 to 0 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row
versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 532
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

I see your point, though: the tuple has to be able to survive
HOT-pruning in order to cause a problem when we check whether it needs
freezing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: recovering from "found xmin ... from before relfrozenxid ..."

Robert Haas
On Thu, Jul 16, 2020 at 10:00 AM Robert Haas <[hidden email]> wrote:
> I see your point, though: the tuple has to be able to survive
> HOT-pruning in order to cause a problem when we check whether it needs
> freezing.

Here's an example where the new sanity checks fail on an invisible
tuple without any concurrent transactions:

$ initdb
$ pg_ctl start -l ~/logfile
$ createdb
$ psql

create table simpsons (a int, b text);
vacuum freeze;

$ cat > txid.sql
select txid_current();
$ pgbench -t 131072 -c 8 -j 8 -n -f txid.sql
$ psql

insert into simpsons values (1, 'homer');

$ pg_ctl stop
$ pg_resetwal -x 1000 $PGDATA
$ pg_ctl start -l ~/logfile
$ psql

update pg_class set relfrozenxid = (relfrozenxid::text::integer +
2000000)::text::xid where relname = 'simpsons';

rhaas=# select * from simpsons;
 a | b
---+---
(0 rows)

rhaas=# vacuum simpsons;
ERROR:  found xmin 1049082 from before relfrozenxid 2000506
CONTEXT:  while scanning block 0 of relation "public.simpsons"

This is a fairly insane situation, because we should have relfrozenxid
< tuple xid < xid counter, but instead we have xid counter < tuple xid
< relfrozenxid. Still, it demonstrates that it's possible to have a
database which is sufficiently corrupt that you can't escape from the
new sanity checks using only INSERT, UPDATE, and DELETE.

Now, an even easier way to create a table with a tuple that prevents
vacuuming and also can't just be deleted is to simply remove a
required pg_clog file (and maybe restart the server to clear out any
cached data in the SLRUs). What we typically do with customers who
need to recover from that situation today is give them a script to
fabricate a bogus CLOG file that shows all transactions as committed
(or, perhaps, aborted). But I think that the tools proposed on this
thread might be a better approach in certain cases. If the problem is
that a pg_clog file vanished, then recreating it with whatever content
you think is closest to what was probably there before is likely the
best you can do. But if you've got some individual tuples with crazy
xmin values, you don't really want to drop matching files in pg_clog;
it's better to fix the tuples.
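
(To sketch what such a script does -- this is an illustration, not the
exact script we give out: a CLOG segment is 256kB, each transaction gets
2 status bits, and 01 means committed, so a segment file consisting
entirely of 0x55 bytes marks every transaction in it as committed:

dd if=/dev/zero bs=256k count=1 | tr '\0' '\125' > $PGDATA/pg_clog/0000

On v10 and later the directory is pg_xact rather than pg_clog.)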

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: recovering from "found xmin ... from before relfrozenxid ..."

Ashutosh Sharma
Hi All,

Attached is the patch that adds heap_force_kill(regclass, tid[]) and heap_force_freeze(regclass, tid[]) functions which Robert mentioned in the first email in this thread. The patch basically adds an extension named pg_surgery that contains these functions.  Please have a look and let me know your feedback. Thank you.
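
For example, usage looks roughly like this (the table name and tids here
are made up for illustration):

create extension pg_surgery;
select heap_force_kill('t1'::regclass, array['(0,1)']::tid[]);
select heap_force_freeze('t1'::regclass, array['(0,2)']::tid[]);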

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


On Thu, Jul 16, 2020 at 9:44 PM Robert Haas <[hidden email]> wrote:
> [...]



Attachment: 0001-Add-contrib-pg_surgery-to-perform-surgery-on-the-dam.patch (25K)

Re: recovering from "found xmin ... from before relfrozenxid ..."

Andrey M. Borodin


> On 24 July 2020, at 14:05, Ashutosh Sharma <[hidden email]> wrote:
>
> Attached is the patch that adds heap_force_kill(regclass, tid[]) and heap_force_freeze(regclass, tid[]) functions which Robert mentioned in the first email in this thread. The patch basically adds an extension named pg_surgery that contains these functions.  Please have a look and let me know your feedback. Thank you.

Thanks for the patch!
I have just few random thoughts.

I think here we should report that we haven't done what was asked.
+ /* Nothing to do if the itemid is unused or already dead. */
+ if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+ continue;

Also, should we try to fix VM along the way?
Are there any caveats with concurrent VACUUM? (I do not see any, just asking)
It would be good to have some checks for interrupts in safe places.

I think we should not trust the user entirely here. I'd prefer validation and a graceful exit, not a core dump.
+ Assert(noffs <= PageGetMaxOffsetNumber(page));

For some reason we had unlogged versions of these functions, but I do not recall the exact rationale.
Also, I'd be happy if we had something like "restore this tuple iff this does not break a unique constraint". To do so we need to sort tids by xmin/xmax, to revive the most recent data first.

Thanks!

Best regards, Andrey Borodin.


Re: recovering from "found xmin ... from before relfrozenxid ..."

Ashutosh Sharma
Hi,

Thanks for sharing your thoughts. Please find my comments inline below:

> I think here we should report that we haven't done what was asked.
> +                       /* Nothing to do if the itemid is unused or already dead. */
> +                       if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
> +                               continue;

Okay. Will add a log message saying "skipping tid ... because ..."

> Also, should we try to fix VM along the way?

I think we should let VACUUM do that.

> Are there any caveats with concurrent VACUUM? (I do not see any, just asking)

As of now, I don't see any.

> It would be good to have some checks for interrupts in safe places.

I think I have already added those wherever I felt it was required. If you feel it's missing somewhere, it would be good if you could point it out.

> I think we should not trust the user entirely here. I'd prefer validation and a graceful exit, not a core dump.
> +               Assert(noffs <= PageGetMaxOffsetNumber(page));

Yeah, sounds reasonable. Will do that.

> For some reason we had unlogged versions of these functions, but I do not recall the exact rationale.
> Also, I'd be happy if we had something like "restore this tuple iff this does not break a unique constraint". To do so we need to sort tids by xmin/xmax, to revive the most recent data first.

I didn't get this point. Could you please elaborate?

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Re: recovering from "found xmin ... from before relfrozenxid ..."

Andrey M. Borodin


> On 27 July 2020, at 09:36, Ashutosh Sharma <[hidden email]> wrote:
>
> > Also, should we try to fix VM along the way?
>
> I think we should let VACUUM do that.

Sometimes VACUUM will not get to these pages, because they are marked All
Frozen. And possibly some tuples will get stale on this page again.

> > Are there any caveats with concurrent VACUUM? (I do not see any, just asking)
>
> As of now, I don't see any.
VACUUM has a collection of dead item pointers. We will not resurrect any of them, right?

> > It would be good to have some checks for interrupts in safe places.
>
> I think I have already added those wherever I felt it was required. If you feel it's missing somewhere, it would be good if you could point it out.
Sorry, I just overlooked that call, everything is fine here.

> > Also, I'd be happy if we had something like "restore this tuple iff this does not break a unique constraint". To do so we need to sort tids by xmin/xmax, to revive the most recent data first.
>
> I didn't get this point. Could you please elaborate.
You may have 10 corrupted tuples for the same record that was updated 9 times. And if you have a unique constraint on the table you may want to keep only the latest version of the row. So you want to kill 9 tuples and freeze 1.

Thanks!

Best regards, Andrey Borodin.


Re: recovering from "found xmin ... from before relfrozenxid ..."

M Beena Emerson
In reply to this post by Ashutosh Sharma
Hello Ashutosh,

On Fri, 24 Jul 2020 at 14:35, Ashutosh Sharma <[hidden email]> wrote:
>
> Hi All,
>
> Attached is the patch that adds heap_force_kill(regclass, tid[]) and heap_force_freeze(regclass, tid[]) functions which Robert mentioned in the first email in this thread. The patch basically adds an extension named pg_surgery that contains these functions.  Please have a look and let me know your feedback. Thank you.
>

Thanks for the patch.

1. We would be marking the buffer dirty and writing WAL even if we have
not made any changes (e.g. if we pass invalid/dead tids). Maybe we can
handle this better?

Cosmetic changes:
1. Maybe "HTupleSurgicalOption" instead of "HTupleForceOption", and the
variable names could use surgery instead.
2. extension comment pg_surgery.control: "extension to perform surgery
the damaged heap table" -> "extension to perform surgery on the
damaged heap table"

> On Thu, Jul 16, 2020 at 9:44 PM Robert Haas <[hidden email]> wrote:
>> [...]


--
M Beena Emerson
Sr. Software Engineer
edbpostgres.com



Re: recovering from "found xmin ... from before relfrozenxid ..."

Ashutosh Sharma
In reply to this post by Andrey M. Borodin

> > I think we should let VACUUM do that.
>
> Sometimes VACUUM will not get to these pages, because they are marked All
> Frozen. And possibly some tuples will get stale on this page again.

Hmm, okay, will have a look into this. Thanks.

> > > Are there any caveats with concurrent VACUUM? (I do not see any, just asking)
> >
> > As of now, I don't see any.
>
> VACUUM has a collection of dead item pointers. We will not resurrect any of them, right?

We won't be doing any such things.

> > > It would be good to have some checks for interrupts in safe places.
> >
> > I think I have already added those wherever I felt it was required. If you feel it's missing somewhere, it would be good if you could point it out.
>
> Sorry, I just overlooked that call, everything is fine here.

Okay, thanks for confirming.

> > > Also, I'd be happy if we had something like "restore this tuple iff this does not break a unique constraint". To do so we need to sort tids by xmin/xmax, to revive the most recent data first.
> >
> > I didn't get this point. Could you please elaborate?
>
> You may have 10 corrupted tuples for the same record that was updated 9 times. And if you have a unique constraint on the table you may want to keep only the latest version of the row. So you want to kill 9 tuples and freeze 1.

Okay, in that case, users need to pass the tids of the 9 tuples they want to kill to heap_force_kill, and the tid of the tuple they want marked frozen to heap_force_freeze; a sketch follows below. Just to be clear, this tool is not meant to detect data corruption; it is only meant to remove or clean up corrupted data in a table so that operations like vacuum and pg_dump/restore succeed. It is the user's responsibility to identify the corrupted data and pass its tids to these functions as required.
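
As a concrete sketch of that scenario, with made-up tids where (0,10) is
the latest version of the row:

select heap_force_kill('t1'::regclass,
  array['(0,1)','(0,2)','(0,3)','(0,4)','(0,5)','(0,6)','(0,7)','(0,8)','(0,9)']::tid[]);
select heap_force_freeze('t1'::regclass, array['(0,10)']::tid[]);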

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com