Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions
On 01/29/2018 03:17 PM, Simon Riggs wrote:
> On 29 January 2018 at 14:13, Tomas Vondra <[hidden email]> wrote:
>> 4) inspect the new row (which we still have in reorderbuffer)
>> 5) Kabooom! The row has column "c" which we don't see in the catalog.
> We don't use caches? Why does a cache miss cause it to explode?
We do use caches (and we invalidate them), of course.
But the problem is that by the time we get to look up the row, it may
have been either removed by VACUUM (because the catalog cleanup is more
aggressive) or made unreachable via an index (which is the HOT issue
Robert pointed out earlier in this thread).
>> Having this as responsibility of plugin sounds interesting. It certainly
>> narrows the scope for which we need to solve the abort issue. For 2PC
>> that may be okay as we need to somehow interact with transaction manager
>> as Simon noted. I am not sure if this helps streaming use-case though as
>> there is not going to be any external transaction management involved there.
>> In any case all this interlocking could potentially be made less
>> impactful by only doing it when we know the transaction did catalog
>> changes prior to currently decoded change (which we do during decoding)
>> since that's the only time we are interested in if it aborted or not.
>> This all leads me to another idea. What if logical decoding provided API
>> for "locking/unlocking" the currently decoded transaction against abort.
>> This function would then be called by both decoding and output plugin
>> before any catalog read. The function can be smart enough to be NOOP if
>> the transaction is not running (ie we are not doing 2PC decoding or
>> streaming) or when the transaction didn't do any catalog modifications
>> (we already have that info easily accessible as bool).
>> That would mean we'd never do any kind of heavy locking for prolonged
>> periods of time (ie network calls) but only during catalog access and
>> only when needed. It would also solve this for both 2PC and streaming
>> and it would be easy for plugin authors to use. Just document that the
>> call should be made before catalog access in an output plugin; we
>> could even Assert that the call was done.
> Yeah, this might work. We already have SET_LOCKTAG_TRANSACTION() via
> which every transaction takes an exclusive lock on its own
> transactionid when it starts, for example. We ideally want a single
> solution to handle 2PC and ongoing (streaming) transactions. We could
> introduce a new SET_LOCKTAG_LOGICALTRANSACTION(). The logical decoding
> process could take a SHARED lock on this, check if the XID is still ok
> to decode, read the catalog and unlock. Abort/Commit transaction
> processing could take this in EXCLUSIVE mode.
> As mentioned above, the plugin API which takes this lock will be smart
> enough to be a NOOP if the transaction is not running (i.e we are not
> doing 2PC decoding or streaming) or when the transaction didn't do any
> catalog modifications.
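To make the proposed SHARED/EXCLUSIVE interaction concrete, here is a minimal single-threaded sketch in C. All names (LogicalTxnSlot, decode_lock_shared, etc.) are invented for illustration; the real implementation would go through lmgr with a new SET_LOCKTAG_LOGICALTRANSACTION rather than a counter, and EXCLUSIVE acquisition would block instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: decoders take the per-XID lock SHARED around catalog
 * reads; abort/commit processing takes it EXCLUSIVE. */
typedef struct LogicalTxnSlot {
    int  shared_holders;   /* decoders currently reading catalogs */
    bool aborted;          /* set once the transaction has aborted */
} LogicalTxnSlot;

/* Decoder: acquire SHARED and re-check that the XID is still decodable. */
static bool decode_lock_shared(LogicalTxnSlot *slot) {
    if (slot->aborted)
        return false;      /* abort already happened; stop decoding */
    slot->shared_holders++;
    return true;
}

static void decode_unlock(LogicalTxnSlot *slot) {
    slot->shared_holders--;
}

/* Abort: EXCLUSIVE acquisition must wait out in-flight catalog reads;
 * here we just report whether it could proceed immediately. */
static bool abort_try_exclusive(LogicalTxnSlot *slot) {
    if (slot->shared_holders > 0)
        return false;      /* a decoder is mid catalog access: wait */
    slot->aborted = true;
    return true;
}
```

The point of the sketch is the ordering guarantee: an abort cannot complete while any decoder holds the SHARED lock, and any decoder arriving after the abort sees the aborted flag and gives up.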
This latest version takes care of the abort-while-decoding issue along
with additional test cases and documentation changes.
We now maintain a list of processes that are decoding a specific
transaction ID and make each one a decode groupmember of a decode
groupleader process. The decode groupleader process is basically the
PGPROC entry which points to the prepared 2PC transaction or an ongoing
regular transaction.
If the 2PC transaction is rolled back, then FinishPreparedTransactions
uses the decode groupleader process to let all the decode groupmember
processes know that it's aborting. Similar logic can be used for the
decoding of uncommitted transactions; the decode groupmember processes
are able to abort sanely in such a case. We also have two new APIs
"LogicalLockTransaction" and "LogicalUnlockTransaction" that the
decoding backends need to use while doing system or user catalog
tables access. The abort code interlocks with decoding backends that
might be in the process of accessing catalog tables and waits for
those few moments before aborting the transaction.
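As a rough sketch of how an output plugin might interact with these two APIs, here is a toy model in C. The struct and both function names carry a "Toy" suffix because they are illustrative stand-ins, not the actual patch; the key property from the thread is that the lock call is a no-op unless the transaction is still running and made catalog changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the transaction state the decoder tracks. */
typedef struct DecodedTxn {
    bool is_running;           /* 2PC or streamed txn still in progress? */
    bool has_catalog_changes;  /* did it modify system/user catalogs? */
    bool aborted;              /* concurrently rolled back? */
} DecodedTxn;

/* Returns false when decoding must stop because the txn aborted. */
static bool LogicalLockTransactionToy(DecodedTxn *txn) {
    if (!txn->is_running || !txn->has_catalog_changes)
        return true;           /* no interlock needed: plain NOOP */
    return !txn->aborted;      /* real code blocks abort while held */
}

static void LogicalUnlockTransactionToy(DecodedTxn *txn) {
    (void) txn;                /* real code releases the leader's lock */
}
```

An output plugin would bracket every catalog access with these calls: lock, read the catalog, unlock; if the lock call returns false, it abandons decoding of that transaction.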
The implementation uses the LockHashPartitionLockByProc on the decode
groupleader process to control access to these additional fields in
the PGPROC structure amongst the decode groupleader and the other
decode groupmember processes and does not need to use the
ProcArrayLock at all. The implementation is inspired by the *existing*
lockGroupLeader solution, which uses a similar technique to track
processes waiting on a leader holding that lock. I believe it's a good
fit for this problem of ours.
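The groupleader/groupmember bookkeeping can be sketched as follows. Field and function names here are invented for illustration; in the real patch the corresponding PGPROC fields are guarded by LockHashPartitionLockByProc on the leader, and the abort path waits for the member count to drain rather than merely setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the PGPROC fields added by the patch. */
typedef struct ToyProc {
    struct ToyProc *decodeGroupLeader;  /* NULL when not decoding */
    int  nDecodeMembers;                /* meaningful on the leader only */
    bool abortPending;                  /* leader tells members to bail */
} ToyProc;

/* A decoding backend joins the group for the txn it is decoding.
 * Real code would hold the leader's partition lock here. */
static bool decode_group_join(ToyProc *leader, ToyProc *member) {
    if (leader->abortPending)
        return false;                   /* txn already aborting: give up */
    member->decodeGroupLeader = leader;
    leader->nDecodeMembers++;
    return true;
}

static void decode_group_leave(ToyProc *member) {
    member->decodeGroupLeader->nDecodeMembers--;
    member->decodeGroupLeader = NULL;
}

/* Abort side: flag the abort; real code then waits for the member
 * count to reach zero before finishing the rollback. */
static void decode_group_signal_abort(ToyProc *leader) {
    leader->abortPending = true;
}
```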
I have added TAP tests to exercise multiple decoding backends working
on the same transaction. I used delays in the test_decoding plugin to
introduce waits after the LogicalLockTransaction call, and then called
ROLLBACK to ensure that it interlocks with decoding backends which are
in the middle of catalog access. The tests work as desired, and "make
check-world" passes with asserts enabled.
I have separated out the decode groupleader/groupmember changes from
the main logical-decoding-of-2PC patch into a separate patch for REVIEW
only. Both the main patch and this review-only patch are attached to
this email.