Spurious "apparent wraparound" via SimpleLruTruncate() rounding
While testing an xidStopLimit corner case, I got this:
3656710 2019-01-05 00:05:13.910 GMT LOG: automatic aggressive vacuum to prevent wraparound of table "test.pg_toast.pg_toast_826": index scans: 0
3656710 2019-01-05 00:05:16.912 GMT LOG: could not truncate directory "pg_xact": apparent wraparound
3656710 2019-01-05 00:05:16.912 GMT DEBUG: transaction ID wrap limit is 4294486400, limited by database with OID 1
3656710 2019-01-05 00:05:16.912 GMT WARNING: database "template1" must be vacuumed within 481499 transactions
3656710 2019-01-05 00:05:16.912 GMT HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
I think the WARNING was correct about having 481499 XIDs left before
xidWrapLimit, and the spurious "apparent wraparound" arose from this
rounding-down in SimpleLruTruncate():
    cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
...
    /*
     * While we are holding the lock, make an important safety check: the
     * planned cutoff point must be <= the current endpoint page. Otherwise we
     * have already wrapped around, and proceeding with the truncation would
     * risk removing the current segment.
     */
    if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
...
            (errmsg("could not truncate directory \"%s\": apparent wraparound",
...
We round "cutoffPage" to make ctl->PagePrecedes(segpage, cutoffPage) return
false for the segment containing the cutoff page. CLOGPagePrecedes() (and
most SLRU PagePrecedes methods) implements a circular address space. Hence,
the rounding also causes ctl->PagePrecedes(segpage, cutoffPage) to return true
for the segment furthest in the future relative to the unrounded cutoffPage
(if it exists). That's bad. Such a segment rarely exists, because
xidStopLimit protects 1000000 XIDs, and the rounding moves truncation by no
more than (BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT - 1) =
1048575 XIDs. Thus, I expect to see this problem at 4.9% of xidStopLimit
values. I expect this is easier to see with multiStopLimit, which protects
only 100 mxids.
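The effect described above can be reproduced with a small standalone model. This is a sketch, not the PostgreSQL source: page_precedes() imitates the circular comparison that CLOGPagePrecedes() performs, the constants assume the default BLCKSZ of 8192, and every MODEL_* name, every helper name, and the page numbers used in the note below are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed defaults: BLCKSZ = 8192, CLOG_XACTS_PER_BYTE = 4. */
#define MODEL_BLCKSZ             8192u
#define MODEL_XACTS_PER_BYTE     4u
#define MODEL_XACTS_PER_PAGE     (MODEL_BLCKSZ * MODEL_XACTS_PER_BYTE)
#define MODEL_PAGES_PER_SEGMENT  32
#define MODEL_FIRST_NORMAL_XID   3u

/*
 * Circular comparison of two normal XIDs, modulo 2^32.  The cast to int32_t
 * relies on two's-complement conversion, as mainstream compilers provide.
 */
static bool
xid_precedes(uint32_t xid1, uint32_t xid2)
{
    return (int32_t) (xid1 - xid2) < 0;
}

/* Compare two pages by comparing the first normal XID stored on each. */
static bool
page_precedes(int page1, int page2)
{
    uint32_t xid1 = (uint32_t) page1 * MODEL_XACTS_PER_PAGE + MODEL_FIRST_NORMAL_XID;
    uint32_t xid2 = (uint32_t) page2 * MODEL_XACTS_PER_PAGE + MODEL_FIRST_NORMAL_XID;

    return xid_precedes(xid1, xid2);
}

/* Largest distance, in XIDs, that rounding down to a segment start can move the cutoff. */
static uint32_t
max_rounding_xids(void)
{
    return MODEL_XACTS_PER_PAGE * MODEL_PAGES_PER_SEGMENT - 1;
}

/* The safety check applied to the rounded cutoff: true means "apparent wraparound". */
static bool
check_with_rounding(int latest_page, int cutoff_page)
{
    assert(cutoff_page >= 0);
    int rounded = cutoff_page - cutoff_page % MODEL_PAGES_PER_SEGMENT;

    return page_precedes(latest_page, rounded);
}

/* The same check applied to the unrounded cutoff, for comparison. */
static bool
check_without_rounding(int latest_page, int cutoff_page)
{
    return page_precedes(latest_page, cutoff_page);
}
```

With a latest page of 65550 and an unrounded cutoff page of 31, check_without_rounding() returns false while check_with_rounding() returns true: the rounded cutoff (page 0) is slightly more than half the address space behind the latest page, so the circular comparison flips and the check reports a wraparound that has not happened.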
The main consequence is the false alarm. A prudent DBA will want to react to
true wraparound, but no such wraparound has occurred. Also, we temporarily
waste disk space in pg_xact. This feels like a recipe for future bugs. The
fix I have in mind, attached, is to change instances of
ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage). I'm inclined not to
back-patch this; does anyone favor back-patching?
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
> The main consequence is the false alarm. A prudent DBA will want to react to
> true wraparound, but no such wraparound has occurred. Also, we temporarily
> waste disk space in pg_xact. This feels like a recipe for future bugs. The
> fix I have in mind, attached, is to change instances of
> ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
> ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage). I'm inclined not to
> back-patch this; does anyone favor back-patching?
To avoid wasting more of anyone's time: that patch is bad; I'll update this
thread when I have something better.
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
> The main consequence is the false alarm.
That conclusion was incorrect. On further study, I was able to reproduce data
loss via either of two weaknesses in the "apparent wraparound" test:
1. The result of the test is valid only until we release the SLRU ControlLock,
which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
segments for deletion. Once we release that lock, latest_page_number can
advance. This creates a TOCTOU race condition, allowing excess deletion:
[local] test=# table trunc_clog_concurrency ;
ERROR: could not access status of transaction 2149484247
DETAIL: Could not open file "pg_xact/0801": No such file or directory.
2. By the time the "apparent wraparound" test fires, we've already WAL-logged
the truncation. clog_redo() suppresses the "apparent wraparound" test,
then deletes too much. Startup then fails:
881997 2019-02-10 02:53:32.105 GMT FATAL: could not access status of transaction 708112327
881997 2019-02-10 02:53:32.105 GMT DETAIL: Could not open file "pg_xact/02A3": No such file or directory.
881855 2019-02-10 02:53:32.107 GMT LOG: startup process (PID 881997) exited with exit code 1
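The first weakness can be illustrated with a deterministic, single-threaded model of the interleaving. This is only a sketch under stated assumptions: ModelSlru, the helper names, and the page numbers are hypothetical, locking is represented by comments rather than real LWLocks, and page_precedes() is a simplified stand-in for a circular PagePrecedes method with 32768 XIDs per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_XACTS_PER_PAGE     32768u
#define MODEL_PAGES_PER_SEGMENT  32

/* Hypothetical stand-in for the shared SLRU state. */
typedef struct ModelSlru
{
    int latest_page;
} ModelSlru;

/* Circular page comparison; the FirstNormalTransactionId offset cancels out. */
static bool
page_precedes(int page1, int page2)
{
    uint32_t xid1 = (uint32_t) page1 * MODEL_XACTS_PER_PAGE;
    uint32_t xid2 = (uint32_t) page2 * MODEL_XACTS_PER_PAGE;

    return (int32_t) (xid1 - xid2) < 0;
}

static int
segment_first_page(int page)
{
    return page - page % MODEL_PAGES_PER_SEGMENT;
}

/* Step 1: the "apparent wraparound" test, meaningful only while the lock is held. */
static bool
safety_check_passes(const ModelSlru *slru, int cutoff_page)
{
    return !page_precedes(slru->latest_page, cutoff_page);
}

/* Step 3: the per-segment deletion test, run after the lock is released. */
static bool
segment_would_be_deleted(int seg_page, int cutoff_page)
{
    return page_precedes(seg_page, cutoff_page);
}

/*
 * Returns true if the interleaving ends with the segment that holds the
 * latest page being selected for deletion.
 */
static bool
race_deletes_current_segment(int latest_at_check, int pages_advanced, int cutoff_page)
{
    ModelSlru slru = { latest_at_check };

    if (!safety_check_passes(&slru, cutoff_page))
        return false;           /* truncation skipped, nothing deleted */

    /* Step 2: lock released; other backends extend the SLRU meanwhile. */
    slru.latest_page += pages_advanced;

    return segment_would_be_deleted(segment_first_page(slru.latest_page), cutoff_page);
}
```

In this model, race_deletes_current_segment(65535, 1, 0) returns true: the check passes at page 65535, one more page is allocated, and the segment starting at page 65536 then satisfies the deletion test even though it holds the newest data. With no advance (pages_advanced of 0) the same call returns false.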
Fixes are available:
a. Fix the rounding in SimpleLruTruncate(). (The patch I posted upthread is
wrong; I will correct it in a separate message.)
b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
checkpoint, or in the startup process. Hence, also arrange for only one
backend to call SimpleLruTruncate(AsyncCtl) at a time.
c. Test "apparent wraparound" before writing WAL, and don't WAL-log
truncations that "apparent wraparound" forces us to skip.
d. Hold the ControlLock for the entirety of SimpleLruTruncate(). This removes
the TOCTOU race condition, but TransactionIdDidCommit() and other key
operations would be waiting behind filesystem I/O.
e. Have the SLRU track a "low cutoff" for an ongoing truncation. Initially,
the low cutoff is the page furthest in the past relative to cutoffPage (the
"high cutoff"). If SimpleLruZeroPage() wishes to use a page in the
truncation range, it would acquire an LWLock and increment the low cutoff.
Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
same LWLock and recheck the segment against the latest low cutoff.
With both (a) and (b), the only way I'd know to reach the "apparent
wraparound" message is to restart in single-user mode and burn XIDs to the
point of bona fide wraparound. Hence, I propose to back-patch (a) and (b),
and I propose (c) for HEAD only. I don't want (d), which threatens
performance too much. I would rather not have (e), because I expect it's more
complex than (b) and fixes strictly less than (b) fixes.
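To make the ordering issue behind (c) concrete, here is a minimal sketch contrasting "WAL first, then check" with "check first, then WAL". It is a model, not a patch: TruncateResult and both function names are hypothetical, the wal_logged flag merely stands in for writing a truncation record, and page_precedes() is a simplified circular comparison assuming 32768 XIDs per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_XACTS_PER_PAGE 32768u

/* Hypothetical record of what one truncation attempt did. */
typedef struct TruncateResult
{
    bool wal_logged;    /* a truncation record was emitted for replay */
    bool truncated;     /* segments were removed locally */
} TruncateResult;

/* Circular page comparison; true means page1 is "in the past" of page2. */
static bool
page_precedes(int page1, int page2)
{
    uint32_t xid1 = (uint32_t) page1 * MODEL_XACTS_PER_PAGE;
    uint32_t xid2 = (uint32_t) page2 * MODEL_XACTS_PER_PAGE;

    return (int32_t) (xid1 - xid2) < 0;
}

/* Ordering described in weakness 2: the record exists even when the check fires. */
static TruncateResult
truncate_wal_first(int latest_page, int cutoff_page)
{
    TruncateResult r = { false, false };

    r.wal_logged = true;
    if (page_precedes(latest_page, cutoff_page))
        return r;       /* skipped locally, but replay will still truncate */
    r.truncated = true;
    return r;
}

/* Ordering proposed in (c): a skipped truncation leaves no record behind. */
static TruncateResult
truncate_check_first(int latest_page, int cutoff_page)
{
    TruncateResult r = { false, false };

    if (page_precedes(latest_page, cutoff_page))
        return r;       /* nothing logged, nothing removed */
    r.wal_logged = true;
    r.truncated = true;
    return r;
}
```

For a latest page of 65536 and a cutoff page of 0, truncate_wal_first() reports wal_logged = true with truncated = false, the mismatch that lets replay delete what the primary declined to delete, while truncate_check_first() leaves both flags false. When the check passes (for example latest page 100, cutoff page 0), both orderings log and truncate.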
Can you see a way to improve on that plan? Can you see other bugs of this
nature that this plan does not fix?