This is my first time posting here ... Gilles Darold and I are developing a new FDW which is based on the contrib/postgres_fdw. The postgres_fdw logic uses a RegisterXactCallback function to send local transactions remote. But I found that a registered XactCallback is not always called for a successful client ROLLBACK statement. So, a successful local ROLLBACK does not get sent remote by FDW, and now the local and remote transaction states are messed up, out of sync. The local database is "eating" the successful rollback.
I attach a git format-patch file 0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch
The patch fixes the problem and is ready to commit as far as I can tell. It adds some comment lines and three lines of code to CommitTransactionCommand() in the TBLOCK_ABORT_END case. Line (1) to reset the transaction's blockState back to TBLOCK_ABORT, ahead of (2) a new call to callXactCallbacks(). If the callback returns successful (which is usually the case), (3) the new code switches back to TBLOCK_ABORT_END, then continues with old code to CleanupTransaction() as before. If any callback does error out, the TBLOCK_ABORT was the correct block state for an error.
I have not added a regression test since my scenario involves a C module ... I didn't see such a regression test, but somebody can teach me where to put the C module and Makefile if a regression test should be added. Heads up that the scenario to hit this is a bit complex (to me) ... I attach the following unit test files:
1) eat_rollback.c, _PG_init() installs my_utility_hook() for INFO log for debugging.
RegisterSubXactCallback(mySubtransactionCallback) which injects some logging and an ERROR for savepoints, i.e., negative testing, e.g., like a remote FDW savepoint failed.
And RegisterXactTransaction(myTransactionCallback) with info logging.
2) Makefile, to make the eat_rollback.so
3) eat_rollback2.sql, drives the fail sequence, especially the SAVEPOINT, which errors out, then later a successful ROLLBACK gets incorrectly eaten (no CALLBACK info logging, demonstrates the bug), then another successful ROLLBACK (now there is CALLBACK info logging).
4) eat_rollback2.out, output without the fix, the rollback at line 68 is successful but there is not myTransactionCallback() INFO output
5) eat_rollback2.fixed, output after the fix, the rollback at line 68 is successful, and now there is a myTransactionCallback() INFO log. Success!
It first failed for me in v11.1, I suspect it failed since before then too. And it is still failing in current master.
Bye the way, we worked around the bug in our FDW by handling sub/xact in the utility hook. A transaction callback is still needed for implicit, internal rollbacks. Another advantage of the workaround is (I think) it reduces the code complexity and improves performance because the entire subxact stack is not unwound to drive each and every subtransaction level to remote. Also sub/transaction statements are sent remote as they arrive (local and remote are more "transactionally" synced, not stacked by FDW for later). And of course, the workaround doesn't hit the above bug, since our utility hook correctly handles the client ROLLBACK. If it makes sense to the community, I could try and patch contrib/postgres_fdw to do what we are doing. But note that I don't need it myself: we have our own new FDW for remote DB2 z/OS (!) ... at LzLabs we are a building a software defined mainframe with PostgreSQL (of all things).
Hope it helps!
I don't necessarily agree with everything I say. (MM)
0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch (1K) Download Attachment
eat_rollback.c (4K) Download Attachment
Makefile (506 bytes) Download Attachment
eat_rollback2.sql (1K) Download Attachment
eat_rollback2.out (6K) Download Attachment
eat_rollback2.fixed (6K) Download Attachment
Le 25/03/2020 à 03:49, Dave Sharpe a écrit :
As I'm involved in this thread I have given a review to this bug report and I don't think that there is a problem here but as a disclaimer my knowledge on internal transaction management is probably not enough to have a definitive opinion.
Actually the callback function is called when the error is thrown:
this is probably why the callback is not called on the subsequent ROLLBACK execution because abort processing is already done (src/backend/access/transam/xact.c:3890).
With this simple test and the debug extension:
the callback function is also called at error level, not on the ROLLBACK:
So this is probably the expected behavior , however the proposed patch will throw the callback function two times which might not be what we want.
I may be wrong.
-- Gilles Darold http://www.darold.net/
From Gilles Darold <[hidden email]> on 2020-03-26T16:09:04.
this is probably why the callback is not called on the subsequent ROLLBACK execution because abort processing is
So I withdraw this patch and fix. The callback during the error will drive the ROLLBACK remote, as required in the fdw.
Great catch, thanks Gilles!
|Free forum by Nabble||Edit this page|