Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Noah Misch
While testing an xidStopLimit corner case, I got this:

3656710 2019-01-05 00:05:13.910 GMT LOG:  automatic aggressive vacuum to prevent wraparound of table "test.pg_toast.pg_toast_826": index scans: 0
3656710 2019-01-05 00:05:16.912 GMT LOG:  could not truncate directory "pg_xact": apparent wraparound
3656710 2019-01-05 00:05:16.912 GMT DEBUG:  transaction ID wrap limit is 4294486400, limited by database with OID 1
3656710 2019-01-05 00:05:16.912 GMT WARNING:  database "template1" must be vacuumed within 481499 transactions
3656710 2019-01-05 00:05:16.912 GMT HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.

I think the WARNING was correct about having 481499 XIDs left before
xidWrapLimit, and the spurious "apparent wraparound" arose from this
rounding-down in SimpleLruTruncate():

        cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
...
        /*
         * While we are holding the lock, make an important safety check: the
         * planned cutoff point must be <= the current endpoint page. Otherwise we
         * have already wrapped around, and proceeding with the truncation would
         * risk removing the current segment.
         */
        if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
        {
                LWLockRelease(shared->ControlLock);
                ereport(LOG,
                                (errmsg("could not truncate directory \"%s\": apparent wraparound",
                                                ctl->Dir)));

We round "cutoffPage" to make ctl->PagePrecedes(segpage, cutoffPage) return
false for the segment containing the cutoff page.  CLOGPagePrecedes() (and
most SLRU PagePrecedes methods) implements a circular address space.  Hence,
the rounding also causes ctl->PagePrecedes(segpage, cutoffPage) to return true
for the segment furthest in the future relative to the unrounded cutoffPage
(if it exists).  That's bad.  Such a segment rarely exists, because
xidStopLimit protects 1000000 XIDs, and the rounding moves truncation by no
more than (BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT - 1) =
1048575 XIDs.  Thus, I expect to see this problem for about 4.9% of
xidStopLimit values.  I expect it is easier to see with multiStopLimit, which
protects only 100 mxids.

The main consequence is the false alarm.  A prudent DBA will want to react to
true wraparound, but no such wraparound has occurred.  Also, we temporarily
waste disk space in pg_xact.  This feels like a recipe for future bugs.  The
fix I have in mind, attached, is to change instances of
ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage).  I'm inclined not to
back-patch this; does anyone favor back-patching?

Thanks,
nm

Attachment: slru-truncate-modulo-v1.patch (1K)

Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Noah Misch
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
> The main consequence is the false alarm.  A prudent DBA will want to react to
> true wraparound, but no such wraparound has occurred.  Also, we temporarily
> waste disk space in pg_xact.  This feels like a recipe for future bugs.  The
> fix I have in mind, attached, is to change instances of
> ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
> ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage).  I'm inclined not to
> back-patch this; does anyone favor back-patching?

To avoid wasting more of anyone's time: that patch is bad; I'll update this
thread when I have something better.

Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Noah Misch
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
> The main consequence is the false alarm.

That conclusion was incorrect.  On further study, I was able to reproduce data
loss via either of two weaknesses in the "apparent wraparound" test:

1. The result of the test is valid only until we release the SLRU ControlLock,
   which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
   segments for deletion.  Once we release that lock, latest_page_number can
   advance.  This creates a TOCTOU race condition, allowing excess deletion:

   [local] test=# table trunc_clog_concurrency ;
   ERROR:  could not access status of transaction 2149484247
   DETAIL:  Could not open file "pg_xact/0801": No such file or directory.

2. By the time the "apparent wraparound" test fires, we've already WAL-logged
   the truncation.  clog_redo() suppresses the "apparent wraparound" test,
   then deletes too much.  Startup then fails:

   881997 2019-02-10 02:53:32.105 GMT FATAL:  could not access status of transaction 708112327
   881997 2019-02-10 02:53:32.105 GMT DETAIL:  Could not open file "pg_xact/02A3": No such file or directory.
   881855 2019-02-10 02:53:32.107 GMT LOG:  startup process (PID 881997) exited with exit code 1


Fixes are available:

a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
   wrong; I will correct it in a separate message.)

b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other than
   AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
   checkpoint, or in the startup process.  Hence, also arrange for only one
   backend to call SimpleLruTruncate(AsyncCtl) at a time.

c. Test "apparent wraparound" before writing WAL, and don't WAL-log
   truncations that "apparent wraparound" forces us to skip.

d. Hold the ControlLock for the entirety of SimpleLruTruncate().  This removes
   the TOCTOU race condition, but TransactionIdDidCommit() and other key
   operations would be waiting behind filesystem I/O.

e. Have the SLRU track a "low cutoff" for an ongoing truncation.  Initially,
   the low cutoff is the page furthest in the past relative to cutoffPage (the
   "high cutoff").  If SimpleLruZeroPage() wishes to use a page in the
   truncation range, it would acquire an LWLock and increment the low cutoff.
   Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
   same LWLock and recheck the segment against the latest low cutoff.

With both (a) and (b), the only way I'd know to reach the "apparent
wraparound" message is to restart in single-user mode and burn XIDs to the
point of bona fide wraparound.  Hence, I propose to back-patch (a) and (b),
and I propose (c) for HEAD only.  I don't want (d), which threatens
performance too much.  I would rather not have (e), because I expect it's more
complex than (b) and fixes strictly less than (b) fixes.

Can you see a way to improve on that plan?  Can you see other bugs of this
nature that this plan does not fix?

Thanks,
nm

Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Noah Misch
On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
> On further study, I was able to reproduce data loss

> Fixes are available:
>
> a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
>    wrong; I will correct it in a separate message.)

Here's a corrected version.  I now delete a segment only if both its first
page and its last page are considered to precede the cutoff; see the new
comment at SlruMayDeleteSegment().

Attachment: slru-truncate-modulo-v2.patch (4K)

Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Tomas Vondra
On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:

>On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
>> The main consequence is the false alarm.
>
>That conclusion was incorrect.  On further study, I was able to reproduce data
>loss via either of two weaknesses in the "apparent wraparound" test:
>
>1. The result of the test is valid only until we release the SLRU ControlLock,
>   which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
>   segments for deletion.  Once we release that lock, latest_page_number can
>   advance.  This creates a TOCTOU race condition, allowing excess deletion:
>
>   [local] test=# table trunc_clog_concurrency ;
>   ERROR:  could not access status of transaction 2149484247
>   DETAIL:  Could not open file "pg_xact/0801": No such file or directory.
>
>2. By the time the "apparent wraparound" test fires, we've already WAL-logged
>   the truncation.  clog_redo() suppresses the "apparent wraparound" test,
>   then deletes too much.  Startup then fails:
>
>   881997 2019-02-10 02:53:32.105 GMT FATAL:  could not access status of transaction 708112327
>   881997 2019-02-10 02:53:32.105 GMT DETAIL:  Could not open file "pg_xact/02A3": No such file or directory.
>   881855 2019-02-10 02:53:32.107 GMT LOG:  startup process (PID 881997) exited with exit code 1
>
>
>Fixes are available:
>
>a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
>   wrong; I will correct it in a separate message.)
>
>b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other than
>   AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
>   checkpoint, or in the startup process.  Hence, also arrange for only one
>   backend to call SimpleLruTruncate(AsyncCtl) at a time.
>
>c. Test "apparent wraparound" before writing WAL, and don't WAL-log
>   truncations that "apparent wraparound" forces us to skip.
>
>d. Hold the ControlLock for the entirety of SimpleLruTruncate().  This removes
>   the TOCTOU race condition, but TransactionIdDidCommit() and other key
>   operations would be waiting behind filesystem I/O.
>
>e. Have the SLRU track a "low cutoff" for an ongoing truncation.  Initially,
>   the low cutoff is the page furthest in the past relative to cutoffPage (the
>   "high cutoff").  If SimpleLruZeroPage() wishes to use a page in the
>   truncation range, it would acquire an LWLock and increment the low cutoff.
>   Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
>   same LWLock and recheck the segment against the latest low cutoff.
>
>With both (a) and (b), the only way I'd know to reach the "apparent
>wraparound" message is to restart in single-user mode and burn XIDs to the
>point of bona fide wraparound.  Hence, I propose to back-patch (a) and (b),
>and I propose (c) for HEAD only.  I don't want (d), which threatens
>performance too much.  I would rather not have (e), because I expect it's more
>complex than (b) and fixes strictly less than (b) fixes.
>
>Can you see a way to improve on that plan?  Can you see other bugs of this
>nature that this plan does not fix?
>

Seems reasonable, although I wonder how much more expensive just doing (d)
would be. It seems by far the least complex solution, and it moves "just" the
SlruScanDirectory() call under the lock. It's true it adds I/O requests, OTOH
it's just unlink() without fsync() and I'd expect the number of files to be
relatively low. Plus we already do SimpleLruWaitIO and SlruInternalWritePage
in the loop.

BTW isn't it an issue that SlruInternalDeleteSegment does not do any
fsync calls after unlinking the segments? If the system crashes/reboots
before this becomes persistent (i.e. some of the segments reappear),
won't that cause a problem?


It's a bit unfortunate that a patch for a data corruption / loss issue
(even if a low-probability one) fell through multiple commitfests.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Noah Misch
On Sun, Jan 05, 2020 at 01:33:55AM +0100, Tomas Vondra wrote:

> On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
> >a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
> >  wrong; I will correct it in a separate message.)
> >
> >b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other than
> >  AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
> >  checkpoint, or in the startup process.  Hence, also arrange for only one
> >  backend to call SimpleLruTruncate(AsyncCtl) at a time.
> >
> >c. Test "apparent wraparound" before writing WAL, and don't WAL-log
> >  truncations that "apparent wraparound" forces us to skip.
> >
> >d. Hold the ControlLock for the entirety of SimpleLruTruncate().  This removes
> >  the TOCTOU race condition, but TransactionIdDidCommit() and other key
> >  operations would be waiting behind filesystem I/O.

> >With both (a) and (b), the only way I'd know to reach the "apparent
> >wraparound" message is to restart in single-user mode and burn XIDs to the
> >point of bona fide wraparound.  Hence, I propose to back-patch (a) and (b),
> >and I propose (c) for HEAD only.  I don't want (d), which threatens
> >performance too much.  I would rather not have (e), because I expect it's more
> >complex than (b) and fixes strictly less than (b) fixes.

> Seems reasonable, although I wonder how much more expensive just doing
> (d) would be. It seems by far the least complex solution, and it moves
> "just" the SlruScanDirectory() call under the lock. It's true it adds
> I/O requests, OTOH it's just unlink() without fsync() and I'd expect the
> number of files to be relatively low. Plus we already do SimpleLruWaitIO
> and SlruInternalWritePage in the loop.

Trivial read-only transactions often need CLogControlLock to check tuple
visibility.  If an unlink() takes 1s, stalling read-only transactions for that
1s is a big problem.  SimpleLruWaitIO() and SlruInternalWritePage() release
the control lock during I/O, then re-acquire it.  (Moreover,
SimpleLruTruncate() rarely calls them.  Calling them implies a page old enough
to truncate was modified after the most recent checkpoint.)

> BTW isn't it an issue that SlruInternalDeleteSegment does not do any
> fsync calls after unlinking the segments? If the system crashes/reboots
> before this becomes persistent (i.e. some of the segments reappear),
> won't that cause a problem?

I think not; we could turn SlruInternalDeleteSegment() into a no-op, and the
only SQL-visible consequence would be extra disk usage.  CheckPoint fields
tell the server what region of slru data is meaningful, and segments outside
that range merely waste disk space.  (If that's not true and can't be made
true, we'd also need to stop ignoring the unlink() return value.)

> It's a bit unfortunate that a patch for a data corruption / loss issue
> (even if a low-probability one) fell through multiple commitfests.

Thanks for investing in steps to fix that.


Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Tom Lane
Noah Misch <[hidden email]> writes:
> On Sun, Jan 05, 2020 at 01:33:55AM +0100, Tomas Vondra wrote:
>> It's a bit unfortunate that a patch for a data corruption / loss issue
>> (even if a low-probability one) fell through multiple commitfests.

> Thanks for investing in steps to fix that.

Yeah, this patch has been waiting in the queue for way too long :-(.

I spent some time studying this, and I have a few comments/questions:

1. It seems like this discussion is conflating two different issues.
The original complaint was about a seemingly-bogus log message "could
not truncate directory "pg_xact": apparent wraparound".  Your theory
about that, IIUC, is that SimpleLruTruncate's initial round-back of
cutoffPage to a segment boundary moved us from a state where
shared->latest_page_number doesn't logically precede the cutoffPage to
one where it does, triggering the message.  So removing the initial
round-back, and then doing whatever's needful to compensate for that in
the later processing, is a reasonable fix to prevent the bogus warning.
However, you're also discussing whether or not an SLRU segment file that
is close to the wraparound boundary should get removed or not.  As far
as I can see that's 100% independent of issuance of the log message, no?
This might not affect the code substance of the patch at all; but it
seems like we need to be clear about it in our discussion, and maybe the
comments need to change too.

2. I wonder whether we have an issue even with rounding back to the
SLRU page boundary, as is done by each caller before we ever get to
SimpleLruTruncate.  I'm pretty sure that the actual anti-wraparound
protections are all precise to the XID level, so that there's some
daylight between what SimpleLruTruncate is told and the range of
data that the higher-level code thinks is of interest.  Maybe we
should restructure things so that we keep the original cutoff XID
(or equivalent) all the way through, and compare the start/end
positions of a segment file directly to that.

3. It feels like the proposed test of cutoff position against both
ends of a segment is a roundabout way of fixing the problem.  I
wonder whether we ought not pass *both* the cutoff and the current
endpoint (latest_page_number) down to the truncation logic, and
have it compare against both of those values.

To try to clarify this in my head, I thought about an image of
the modular XID space as an octagon, where each side would correspond to
a segment file if we chose numbers such that there were only 8 possible
segment files.  Let's say that nextXID is somewhere in the bottommost
octagon segment.  The oldest possible value for the truncation cutoff
XID is a bit less than "halfway around" from nextXID; so it could be
in the topmost octagon segment, if the minimum permitted daylight-
till-wraparound is less than the SLRU segment size (which it is).
Then, if we round the cutoff XID "back" to a segment boundary, most
of the current (bottommost) segment is now less than halfway around
from that cutoff point, and in particular the current segment's
starting page is exactly halfway around.  Because of the way that
TransactionIdPrecedes works, the current segment will be considered to
precede that cutoff point (the int32 difference comes out as exactly
2^31 which is negative).  Boom, data loss, because we'll decide the
current segment is removable.

I think that your proposed patch does fix this, but I'm not quite
sold that the corner cases (where the cutoff XID is itself exactly
at a page boundary) are right.  In any case, I think it'd be more
robust to be comparing explicitly against a notion of the latest
in-use page number, instead of backing into it from an assumption
that the cutoff XID itself is less than halfway around.

I wonder if we ought to dodge the problem by having a higher minimum
value of the required daylight-before-wraparound, so that the cutoff
point couldn't be in the diametrically-opposite-current segment but
would have to be at least one segment before that.  In the end,
I believe that all of this logic was written under an assumption
that we should never get into a situation where we are so close
to the wraparound threshold that considerations like these would
manifest.  Maybe we can get it right, but I don't have huge
faith in it.

It also bothers me that some of the callers of SimpleLruTruncate
have explicit one-count backoffs of the cutoff point and others
do not.  There's no obvious reason for the difference, so I wonder
if that isn't something we should have across-the-board, or else
adjust SimpleLruTruncate to do the equivalent thing internally.

I haven't thought much yet about your second point about race
conditions arising from nextXID possibly moving before we
finish the deletion scan.  Maybe we could integrate a fix for
that issue, along the lines of (1) see an SLRU segment file,
(2) determine that it appears to precede the cutoff XID, if so
(3) acquire the control lock and fetch latest_page_number,
compare against that to verify that the segment file is old
and not new, then (4) unlink if that still holds.

                        regards, tom lane


Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Noah Misch
On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
> Yeah, this patch has been waiting in the queue for way too long :-(.

Thanks for reviewing.

> I spent some time studying this, and I have a few comments/questions:
>
> 1. It seems like this discussion is conflating two different issues.
> The original complaint was about a seemingly-bogus log message "could
> not truncate directory "pg_xact": apparent wraparound".  Your theory
> about that, IIUC, is that SimpleLruTruncate's initial round-back of
> cutoffPage to a segment boundary moved us from a state where
> shared->latest_page_number doesn't logically precede the cutoffPage to
> one where it does, triggering the message.  So removing the initial
> round-back, and then doing whatever's needful to compensate for that in
> the later processing, is a reasonable fix to prevent the bogus warning.
> However, you're also discussing whether or not an SLRU segment file that
> is close to the wraparound boundary should get removed or not.  As far

When the newest XID and the oldest XID fall in "opposite" segments in the XID
space, we must not unlink the segment containing the newest XID.  That is the
chief goal at present.

> as I can see that's 100% independent of issuance of the log message, no?

Perhaps confusing is that the first message of the thread and the subject line
contain wrong claims, which I corrected in the 2019-02-13 message[1].  Due to
point (2) in [1], it's essential to make the "apparent wraparound" message a
can't-happen event.  Hence, I'm not looking to improve the message or its
firing conditions.  I want to fix the excess segment deletion, at which point
the message will become unreachable except under single-user mode.

> 2. I wonder whether we have an issue even with rounding back to the
> SLRU page boundary, as is done by each caller before we ever get to
> SimpleLruTruncate.  I'm pretty sure that the actual anti-wraparound
> protections are all precise to the XID level, so that there's some
> daylight between what SimpleLruTruncate is told and the range of
> data that the higher-level code thinks is of interest.  Maybe we
> should restructure things so that we keep the original cutoff XID
> (or equivalent) all the way through, and compare the start/end
> positions of a segment file directly to that.

Currently, slru.c knows nothing about the division of pages into records.
Hmm.  To keep oldestXact all the way through, I suppose the PagePrecedes
callback could become one or more record-oriented (e.g. XID-oriented)
callbacks.  The current scheme just relies on TransactionIdToPage() in
TruncateCLOG().  If TransactionIdToPage() had a bug, all sorts of CLOG lookups
would do wrong things.  Hence, I think today's scheme is tougher to get wrong.
Do you see it differently?

> 3. It feels like the proposed test of cutoff position against both
> ends of a segment is a roundabout way of fixing the problem.  I
> wonder whether we ought not pass *both* the cutoff and the current
> endpoint (latest_page_number) down to the truncation logic, and
> have it compare against both of those values.

Since latest_page_number can keep changing throughout SlruScanDirectory()
execution, that would give a false impression of control.  Better to
demonstrate that the xidWrapLimit machinery keeps latest_page_number within
acceptable constraints than to ascribe significance to a comparison with a
stale latest_page_number.

> To try to clarify this in my head, I thought about an image of
> the modular XID space as an octagon, where each side would correspond to
> a segment file if we chose numbers such that there were only 8 possible
> segment files.  Let's say that nextXID is somewhere in the bottommost
> octagon segment.  The oldest possible value for the truncation cutoff
> XID is a bit less than "halfway around" from nextXID; so it could be
> in the topmost octagon segment, if the minimum permitted daylight-
> till-wraparound is less than the SLRU segment size (which it is).
> Then, if we round the cutoff XID "back" to a segment boundary, most
> of the current (bottommost) segment is now less than halfway around
> from that cutoff point, and in particular the current segment's
> starting page is exactly halfway around.  Because of the way that
> TransactionIdPrecedes works, the current segment will be considered to
> precede that cutoff point (the int32 difference comes out as exactly
> 2^31 which is negative).  Boom, data loss, because we'll decide the
> current segment is removable.

Exactly.
https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
uses your octagon to show the behaviors before and after this patch.

> I think that your proposed patch does fix this, but I'm not quite
> sold that the corner cases (where the cutoff XID is itself exactly
> at a page boundary) are right.

That's a good thing to worry about.  More specifically, I think the edge case
to check is when oldestXact is the last XID of a _segment_.  That case
maximizes the XIDs we can delete.  At that time, xidWrapLimit should likewise
fall near the end of some opposing segment that we refuse to unlink.

> It also bothers me that some of the callers of SimpleLruTruncate
> have explicit one-count backoffs of the cutoff point and others
> do not.  There's no obvious reason for the difference, so I wonder
> if that isn't something we should have across-the-board, or else
> adjust SimpleLruTruncate to do the equivalent thing internally.

Consider the case of PerformOffsetsTruncation().  If newOldestMulti is the
first of a page, then SimpleLruTruncate() gets the previous page.  If that
page and the newOldestMulti page fall in different segments, could we unlink
the segment that contained newOldestMulti, or does some other off-by-one
compensate?  I'm not sure.  I do know that to lose mxact data this way, one
must reach multiStopLimit, restart in single user mode, and consume all the
way to the edge of multiWrapLimit.  Considering the rarity of those
preconditions, I am not inclined to bundle a fix with $SUBJECT.  (Even OID
reuse race conditions may present more risk than this does.)  If someone does
pursue a fix here, I recommend looking at other fixes before making the other
callers subtract one.  The subtraction in TruncateSUBTRANS() and
PerformOffsetsTruncation() is a hack.

[1] https://postgr.es/m/20190214072623.GA1139206@...


Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Tom Lane
Noah Misch <[hidden email]> writes:
> On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
>> 1. It seems like this discussion is conflating two different issues.

> When the newest XID and the oldest XID fall in "opposite" segments in the XID
> space, we must not unlink the segment containing the newest XID.  That is the
> chief goal at present.

Got it.  Thanks for clarifying the scope of the patch.

>> 3. It feels like the proposed test of cutoff position against both
>> ends of a segment is a roundabout way of fixing the problem.  I
>> wonder whether we ought not pass *both* the cutoff and the current
>> endpoint (latest_page_number) down to the truncation logic, and
>> have it compare against both of those values.

> Since latest_page_number can keep changing throughout SlruScanDirectory()
> execution, that would give a false impression of control.  Better to
> demonstrate that the xidWrapLimit machinery keeps latest_page_number within
> acceptable constraints than to ascribe significance to a comparison with a
> stale latest_page_number.

Perhaps.  I'm prepared to accept that line of argument so far as the clog
SLRU goes, but I'm not convinced that the other SLRUs have equally robust
defenses against advancing too far.  So on the whole I'd rather that the
SLRU logic handled this issue strictly on the basis of what it knows,
without assumptions about what calling code may be doing.  Still, maybe
we only really care about the risk for the clog SLRU?

>> To try to clarify this in my head, I thought about an image of
>> the modular XID space as an octagon, where each side would correspond to
>> a segment file if we chose numbers such that there were only 8 possible
>> segment files.

> Exactly.
> https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
> uses your octagon to show the behaviors before and after this patch.

Cool, thanks for drafting that up.  (My original sketch was not of
publishable quality ;-).)  To clarify, the upper annotations probably
ought to read "nextXid <= xidWrapLimit"?  And "cutoffPage" ought
to be affixed to the orange dot at lower right of the center image?

I agree that this diagram depicts why we have a problem right now,
and the right-hand image shows what we want to have happen.
What's a little less clear is whether the proposed patch achieves
that effect.

In particular, after studying this awhile, it seems like removal
of the initial "cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT"
adjustment isn't really affecting anything.  It's already the case
that just by allowing oldestXact to get rounded back to an SLRU page
boundary, we've created some daylight between oldestXact and the
cutoff point.  Rounding back further within the same SLRU segment
changes nothing.  (For example, suppose that oldestXact is already
within the oldest page of its SLRU segment.  Then either rounding
rule has the same effect.  But there's still a little bit of room for
xidWrapLimit to be in the opposite SLRU segment, allowing trouble.)

So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases.  Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.

In any case, it feels like the specific solution you have here,
of testing both ends of the segment, is a roundabout way of
providing that one-segment slop; and it doesn't help if we decide
we need two-segment slop.  Can we write the test in a way that
explicitly provides N segments of slop?

                        regards, tom lane


Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Noah Misch
On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:

> Noah Misch <[hidden email]> writes:
> > On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
> >> 3. It feels like the proposed test of cutoff position against both
> >> ends of a segment is a roundabout way of fixing the problem.  I
> >> wonder whether we ought not pass *both* the cutoff and the current
> >> endpoint (latest_page_number) down to the truncation logic, and
> >> have it compare against both of those values.
>
> > Since latest_page_number can keep changing throughout SlruScanDirectory()
> > execution, that would give a false impression of control.  Better to
> > demonstrate that the xidWrapLimit machinery keeps latest_page_number within
> > acceptable constraints than to ascribe significance to a comparison with a
> > stale latest_page_number.
>
> Perhaps.  I'm prepared to accept that line of argument so far as the clog
> SLRU goes, but I'm not convinced that the other SLRUs have equally robust
> defenses against advancing too far.  So on the whole I'd rather that the
> SLRU logic handled this issue strictly on the basis of what it knows,
> without assumptions about what calling code may be doing.  Still, maybe
> we only really care about the risk for the clog SLRU?

PerformOffsetsTruncation() is the most at-risk, since a single VACUUM could
burn millions of multixacts via FreezeMultiXactId() calls.  (To make that
happen in single-user mode, I suspect one could use prepared transactions as
active lockers and/or in-progress updaters.)  I'm not concerned about other
SLRUs.  TruncateCommitTs() moves in lockstep with TruncateCLOG().  The other
SimpleLruTruncate() callers handle data that becomes obsolete at every
postmaster restart.

> > Exactly.
> > https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
> > uses your octagon to show the behaviors before and after this patch.
>
> Cool, thanks for drafting that up.  (My original sketch was not of
> publishable quality ;-).)  To clarify, the upper annotations probably
> ought to read "nextXid <= xidWrapLimit"?

It diagrams the scenario of nextXid reaching xidWrapLimit, so the green dot
represents both values.

> And "cutoffPage" ought
> to be affixed to the orange dot at lower right of the center image?

No; oldestXact and cutoffPage have the same position in that diagram, because
the patch causes the cutoffPage variable to denote the page that contains
oldestXact.  I've now added an orange dot to show that.

> I agree that this diagram depicts why we have a problem right now,
> and the right-hand image shows what we want to have happen.
> What's a little less clear is whether the proposed patch achieves
> that effect.
>
> In particular, after studying this awhile, it seems like removal
> of the initial "cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT"
> adjustment isn't really affecting anything.

True.  The set of unlink() calls needs to be the same for oldestXact in the
first page of a segment, in the last page, or in some interior page.  Removing
the rounding neither helps nor hurts correctness.

> So I think what we're actually trying to accomplish here is to
> ensure that instead of deleting up to half of the SLRU space
> before the cutoff, we delete up to half-less-one-segment.
> Maybe it should be half-less-two-segments, just to provide some
> cushion against edge cases.  Reading the first comment in
> SetTransactionIdLimit makes one not want to trust too much in
> arguments based on the exact value of xidWrapLimit, while for
> the other SLRUs it was already unclear whether the edge cases
> were exactly right.

That could be interesting insurance.  While it would be sad for us to miss an
edge case and print "must be vacuumed within 2 transactions" when wrap has
already happened, reaching that message implies the DBA burned ~1M XIDs, all
in single-user mode.  More plausible is FreezeMultiXactId() overrunning the
limit by tens of segments.  Hence, if we do buy this insurance, let's skip far
more segments.  For example, instead of unlinking segments representing up to
2^31 past XIDs, we could divide that into an upper half that we unlink and a
lower half.  The lower half will stay in place; eventually, XID consumption
will overwrite it.  Truncation behavior won't change until the region of CLOG
for pre-oldestXact XIDs exceeds 256 MiB.  Beyond that threshold,
vac_truncate_clog() will unlink the upper 256 MiB and leave the rest.  CLOG
maximum would rise from 512 MiB to 768 MiB.  Would that be worthwhile?