out-of-order XID insertion in KnownAssignedXids

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

out-of-order XID insertion in KnownAssignedXids

konstantin knizhnik
Hi hackers,

Looks like there is a bug with logging running transactions XIDs and
prepared transactions.
One of our customers get error "FATAL: out-of-order XID insertion in
KnownAssignedXids"
trying to apply backup.
WAL contains the following record:

rmgr: Standby     len (rec/tot):     98/    98, tx: 0, lsn:
1418/A9A76C90, prev 1418/A9A76C48, desc: RUNNING_XACTS nextXid
2004495309 latestCompletedXid 2004495307 oldestRunningXid 2004495290; 3
xacts: 2004495290 2004495308 2004495308

As you can notice, XID 2004495308 is encountered twice which cause error
in KnownAssignedXidsAdd:

     if (head > tail &&
         TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1],
from_xid))
     {
         KnownAssignedXidsDisplay(LOG);
         elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
     }

The probability of this error is very small but it can quite easily
reproduced: you should just set breakpoint in debugger after calling
MarkAsPrepared in twophase.c and then try to prepare any transaction.
MarkAsPrepared  will add GXACT to proc array and at this moment there
will be two entries in procarray with the same XID:

(gdb) p procArray->numProcs
$2 = 4
(gdb) p allPgXact[procArray->pgprocnos[0]]
$4 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0
'\000', delayChkpt = 1 '\001', nxids = 0 '\000',
   used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[1]]
$5 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000',
delayChkpt = 0 '\000', nxids = 0 '\000',
   used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[2]]
$6 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000',
delayChkpt = 0 '\000', nxids = 0 '\000',
   used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[3]]
$7 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0
'\000', delayChkpt = 0 '\000', nxids = 0 '\000',
   used = 0 '\000', parent = 0x0}

Then you should just wait for sometime until checkpoint timeout is
triggered and it  logs snapshot:

(gdb) bt
#0  0x00000000007f3dab in GetRunningTransactionData () at procarray.c:2240
#1  0x00000000007fab22 in LogStandbySnapshot () at standby.c:943
#2  0x000000000077cde8 in BackgroundWriterMain () at bgwriter.c:331
#3  0x00000000005377f3 in AuxiliaryProcessMain (argc=2,
argv=0x7ffe00aa00e0) at bootstrap.c:446
#4  0x000000000078e07e in StartChildProcess (type=BgWriterProcess) at
postmaster.c:5323
#5  0x000000000078b6f0 in reaper (postgres_signal_arg=17) at
postmaster.c:2948
#6  <signal handler called>
#7  0x00007f1356d665b3 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#8  0x0000000000789931 in ServerLoop () at postmaster.c:1765
#9  0x000000000078906c in PostmasterMain (argc=3, argv=0x1902640) at
postmaster.c:1406
#10 0x00000000006d0e4f in main (argc=3, argv=0x1902640) at main.c:228

Now generated RUNNING_XACTS record contains duplicated XIDs.

I want to ask opinion of community about the best way of fixing this
problem.
Should we avoid storing duplicated XIDs in procarray (by invalidating
XID in original pgaxct)
or eliminate/change check for duplicate in KnownAssignedXidsAdd (for
example just ignore duplicates)?




Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

Michael Paquier-2
On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:

> As you can notice, XID 2004495308 is encountered twice which cause error in
> KnownAssignedXidsAdd:
>
>     if (head > tail &&
>         TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
>     {
>         KnownAssignedXidsDisplay(LOG);
>         elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
>     }
>
> The probability of this error is very small but it can quite easily
> reproduced: you should just set breakpoint in debugger after calling
> MarkAsPrepared in twophase.c and then try to prepare any transaction.
> MarkAsPrepared  will add GXACT to proc array and at this moment there will
> be two entries in procarray with the same XID:
>
> [snip]
>
> Now generated RUNNING_XACTS record contains duplicated XIDs.
So, I have been doing exactly that, and if you trigger a manual
checkpoint then things happen quite correctly if you let the first
session finish:
rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn:
0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606

If you still maintain the debugger after calling MarkAsPrepared, then
the manual checkpoint would block.  Now if you actually keep the
debugger, and wait for a checkpoint timeout to happen, then I can see
the incorrect record.  It is impressive that your customer has been able
to see that first, and then that you have been able to get into that
state with simple steps.

> I want to ask opinion of community about the best way of fixing this
> problem.  Should we avoid storing duplicated XIDs in procarray (by
> invalidating XID in original pgaxct) or eliminate/change check for
> duplicate in KnownAssignedXidsAdd (for example just ignore
> duplicates)?

Hmmmmm...  Please let me think through that first.  It seems to me that
the record should not be generated to begin with.  At least I am able to
confirm what you see.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

konstantin knizhnik


On 05.10.2018 11:04, Michael Paquier wrote:

> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>> As you can notice, XID 2004495308 is encountered twice which cause error in
>> KnownAssignedXidsAdd:
>>
>>      if (head > tail &&
>>          TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
>>      {
>>          KnownAssignedXidsDisplay(LOG);
>>          elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
>>      }
>>
>> The probability of this error is very small but it can quite easily
>> reproduced: you should just set breakpoint in debugger after calling
>> MarkAsPrepared in twophase.c and then try to prepare any transaction.
>> MarkAsPrepared  will add GXACT to proc array and at this moment there will
>> be two entries in procarray with the same XID:
>>
>> [snip]
>>
>> Now generated RUNNING_XACTS record contains duplicated XIDs.
> So, I have been doing exactly that, and if you trigger a manual
> checkpoint then things happen quite correctly if you let the first
> session finish:
> rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn:
> 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
> latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
>
> If you still maintain the debugger after calling MarkAsPrepared, then
> the manual checkpoint would block.  Now if you actually keep the
> debugger, and wait for a checkpoint timeout to happen, then I can see
> the incorrect record.  It is impressive that your customer has been able
> to see that first, and then that you have been able to get into that
> state with simple steps.


There are about 1000 active clients performing 2PC transactions, so if
you perform backup (which does checkpoint)
then probability seems to be large enough.

I have reproduced this problem without using gdb by just running in
parallel many 2PC transactions and checkpoints:

for ((i=1;i<10;i++))
do
     pgbench -n -T 300000 -M prepared -f t$i.sql postgres > t$i.log &
done

pgbench -n -T 300000 -f checkpoint.sql postgres > checkpoint.log &
wait
------------------------------

tN.sql:

begin;
update t set val=val+1 where pk=N;
prepare transaction 'tN';
commit prepared 'tN';

------------------------------

checkpoint.sql:

checkpoint;



>
>> I want to ask opinion of community about the best way of fixing this
>> problem.  Should we avoid storing duplicated XIDs in procarray (by
>> invalidating XID in original pgaxct) or eliminate/change check for
>> duplicate in KnownAssignedXidsAdd (for example just ignore
>> duplicates)?
> Hmmmmm...  Please let me think through that first.  It seems to me that
> the record should not be generated to begin with.  At least I am able to
> confirm what you see.
> --
> Michael

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

konstantin knizhnik
In reply to this post by Michael Paquier-2


On 05.10.2018 11:04, Michael Paquier wrote:

> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>> As you can notice, XID 2004495308 is encountered twice which cause error in
>> KnownAssignedXidsAdd:
>>
>>      if (head > tail &&
>>          TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
>>      {
>>          KnownAssignedXidsDisplay(LOG);
>>          elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
>>      }
>>
>> The probability of this error is very small but it can quite easily
>> reproduced: you should just set breakpoint in debugger after calling
>> MarkAsPrepared in twophase.c and then try to prepare any transaction.
>> MarkAsPrepared  will add GXACT to proc array and at this moment there will
>> be two entries in procarray with the same XID:
>>
>> [snip]
>>
>> Now generated RUNNING_XACTS record contains duplicated XIDs.
> So, I have been doing exactly that, and if you trigger a manual
> checkpoint then things happen quite correctly if you let the first
> session finish:
> rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn:
> 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
> latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
>
> If you still maintain the debugger after calling MarkAsPrepared, then
> the manual checkpoint would block.  Now if you actually keep the
> debugger, and wait for a checkpoint timeout to happen, then I can see
> the incorrect record.  It is impressive that your customer has been able
> to see that first, and then that you have been able to get into that
> state with simple steps.
>
>> I want to ask opinion of community about the best way of fixing this
>> problem.  Should we avoid storing duplicated XIDs in procarray (by
>> invalidating XID in original pgaxct) or eliminate/change check for
>> duplicate in KnownAssignedXidsAdd (for example just ignore
>> duplicates)?
> Hmmmmm...  Please let me think through that first.  It seems to me that
> the record should not be generated to begin with.  At least I am able to
> confirm what you see.
The simplest way to fix the problem is to ignore duplicates before
adding them to KnownAssignedXids.
We in any case perform sort i this place...



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


runnings_xids.patch (677 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

Michael Paquier-2
On Mon, Oct 08, 2018 at 12:04:28PM +0300, Konstantin Knizhnik wrote:
> The simplest way to fix the problem is to ignore duplicates before adding
> them to KnownAssignedXids.
> We in any case perform sort i this place...

I may of course be missing something, but shouldn't we not have
duplicates in the first place?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

konstantin knizhnik


On 08.10.2018 12:14, Michael Paquier wrote:
> On Mon, Oct 08, 2018 at 12:04:28PM +0300, Konstantin Knizhnik wrote:
>> The simplest way to fix the problem is to ignore duplicates before adding
>> them to KnownAssignedXids.
>> We in any case perform sort i this place...
> I may of course be missing something, but shouldn't we not have
> duplicates in the first place?
The reason of appearing duplicated XIDs in case of 2PC seems to be clear.
It may be possible to eliminate it by clearing XID of MyPgxact for
prepared transaction.
But there are two problems with it:
1. I am not sure that it will not break something
2. There is obvious race condition between adding GXACT to ProcArrayAdd
and invalidating XID of current transaction.
If it is cleared before calling ProcArrayAdd, then there will be some
moment when XID is not present in procarray.
If it is done after calling ProcArrayAdd, then still it is possible to
see duplicated XID in procarray.

 From my point of view it is easier and less invasive to exclude
duplicates while replaying RUNNING_XIDS record.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

Andres Freund
In reply to this post by konstantin knizhnik


On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <[hidden email]> wrote:

>
>
>On 05.10.2018 11:04, Michael Paquier wrote:
>> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>>> As you can notice, XID 2004495308 is encountered twice which cause
>error in
>>> KnownAssignedXidsAdd:
>>>
>>>      if (head > tail &&
>>>          TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1],
>from_xid))
>>>      {
>>>          KnownAssignedXidsDisplay(LOG);
>>>          elog(ERROR, "out-of-order XID insertion in
>KnownAssignedXids");
>>>      }
>>>
>>> The probability of this error is very small but it can quite easily
>>> reproduced: you should just set breakpoint in debugger after calling
>>> MarkAsPrepared in twophase.c and then try to prepare any
>transaction.
>>> MarkAsPrepared  will add GXACT to proc array and at this moment
>there will
>>> be two entries in procarray with the same XID:
>>>
>>> [snip]
>>>
>>> Now generated RUNNING_XACTS record contains duplicated XIDs.
>> So, I have been doing exactly that, and if you trigger a manual
>> checkpoint then things happen quite correctly if you let the first
>> session finish:
>> rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn:
>> 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
>> latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
>>
>> If you still maintain the debugger after calling MarkAsPrepared, then
>> the manual checkpoint would block.  Now if you actually keep the
>> debugger, and wait for a checkpoint timeout to happen, then I can see
>> the incorrect record.  It is impressive that your customer has been
>able
>> to see that first, and then that you have been able to get into that
>> state with simple steps.
>>
>>> I want to ask opinion of community about the best way of fixing this
>>> problem.  Should we avoid storing duplicated XIDs in procarray (by
>>> invalidating XID in original pgaxct) or eliminate/change check for
>>> duplicate in KnownAssignedXidsAdd (for example just ignore
>>> duplicates)?
>> Hmmmmm...  Please let me think through that first.  It seems to me
>that
>> the record should not be generated to begin with.  At least I am able
>to
>> confirm what you see.
>
>The simplest way to fix the problem is to ignore duplicates before
>adding them to KnownAssignedXids.
>We in any case perform sort i this place...

I vehemently object to that as the proper course.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

konstantin knizhnik


On 08.10.2018 18:24, Andres Freund wrote:

>
> On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <[hidden email]> wrote:
>>
>> On 05.10.2018 11:04, Michael Paquier wrote:
>>> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>>>> As you can notice, XID 2004495308 is encountered twice which cause
>> error in
>>>> KnownAssignedXidsAdd:
>>>>
>>>>       if (head > tail &&
>>>>           TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1],
>> from_xid))
>>>>       {
>>>>           KnownAssignedXidsDisplay(LOG);
>>>>           elog(ERROR, "out-of-order XID insertion in
>> KnownAssignedXids");
>>>>       }
>>>>
>>>> The probability of this error is very small but it can quite easily
>>>> reproduced: you should just set breakpoint in debugger after calling
>>>> MarkAsPrepared in twophase.c and then try to prepare any
>> transaction.
>>>> MarkAsPrepared  will add GXACT to proc array and at this moment
>> there will
>>>> be two entries in procarray with the same XID:
>>>>
>>>> [snip]
>>>>
>>>> Now generated RUNNING_XACTS record contains duplicated XIDs.
>>> So, I have been doing exactly that, and if you trigger a manual
>>> checkpoint then things happen quite correctly if you let the first
>>> session finish:
>>> rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn:
>>> 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
>>> latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
>>>
>>> If you still maintain the debugger after calling MarkAsPrepared, then
>>> the manual checkpoint would block.  Now if you actually keep the
>>> debugger, and wait for a checkpoint timeout to happen, then I can see
>>> the incorrect record.  It is impressive that your customer has been
>> able
>>> to see that first, and then that you have been able to get into that
>>> state with simple steps.
>>>
>>>> I want to ask opinion of community about the best way of fixing this
>>>> problem.  Should we avoid storing duplicated XIDs in procarray (by
>>>> invalidating XID in original pgaxct) or eliminate/change check for
>>>> duplicate in KnownAssignedXidsAdd (for example just ignore
>>>> duplicates)?
>>> Hmmmmm...  Please let me think through that first.  It seems to me
>> that
>>> the record should not be generated to begin with.  At least I am able
>> to
>>> confirm what you see.
>> The simplest way to fix the problem is to ignore duplicates before
>> adding them to KnownAssignedXids.
>> We in any case perform sort i this place...
> I vehemently object to that as the proper course.
And what about adding qsort to GetRunningTransactionData or
LogCurrentRunningXacts and excluding duplicates here?


> Andres

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

Andres Freund
On 2018-10-08 18:28:52 +0300, Konstantin Knizhnik wrote:

>
>
> On 08.10.2018 18:24, Andres Freund wrote:
> >
> > On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <[hidden email]> wrote:
> > >
> > > On 05.10.2018 11:04, Michael Paquier wrote:
> > > > On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
> > > > > As you can notice, XID 2004495308 is encountered twice which cause
> > > error in
> > > > > KnownAssignedXidsAdd:
> > > > >
> > > > >       if (head > tail &&
> > > > >           TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1],
> > > from_xid))
> > > > >       {
> > > > >           KnownAssignedXidsDisplay(LOG);
> > > > >           elog(ERROR, "out-of-order XID insertion in
> > > KnownAssignedXids");
> > > > >       }
> > > > >
> > > > > The probability of this error is very small but it can quite easily
> > > > > reproduced: you should just set breakpoint in debugger after calling
> > > > > MarkAsPrepared in twophase.c and then try to prepare any
> > > transaction.
> > > > > MarkAsPrepared  will add GXACT to proc array and at this moment
> > > there will
> > > > > be two entries in procarray with the same XID:
> > > > >
> > > > > [snip]
> > > > >
> > > > > Now generated RUNNING_XACTS record contains duplicated XIDs.
> > > > So, I have been doing exactly that, and if you trigger a manual
> > > > checkpoint then things happen quite correctly if you let the first
> > > > session finish:
> > > > rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn:
> > > > 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
> > > > latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
> > > >
> > > > If you still maintain the debugger after calling MarkAsPrepared, then
> > > > the manual checkpoint would block.  Now if you actually keep the
> > > > debugger, and wait for a checkpoint timeout to happen, then I can see
> > > > the incorrect record.  It is impressive that your customer has been
> > > able
> > > > to see that first, and then that you have been able to get into that
> > > > state with simple steps.
> > > >
> > > > > I want to ask opinion of community about the best way of fixing this
> > > > > problem.  Should we avoid storing duplicated XIDs in procarray (by
> > > > > invalidating XID in original pgaxct) or eliminate/change check for
> > > > > duplicate in KnownAssignedXidsAdd (for example just ignore
> > > > > duplicates)?
> > > > Hmmmmm...  Please let me think through that first.  It seems to me
> > > that
> > > > the record should not be generated to begin with.  At least I am able
> > > to
> > > > confirm what you see.
> > > The simplest way to fix the problem is to ignore duplicates before
> > > adding them to KnownAssignedXids.
> > > We in any case perform sort i this place...
> > I vehemently object to that as the proper course.
> And what about adding qsort to GetRunningTransactionData or
> LogCurrentRunningXacts and excluding duplicates here?

Sounds less terrible, but still pretty bad.  I think we should fix the
underlying data inconsistency, not paper over it a couple hundred meters
away.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

Michael Paquier-2
On Mon, Oct 08, 2018 at 09:30:49AM -0700, Andres Freund wrote:
> Sounds less terrible, but still pretty bad.  I think we should fix the
> underlying data inconsistency, not paper over it a couple hundred meters
> away.

+1.  I am looking at how to make that possible.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

Michael Paquier-2
On Tue, Oct 09, 2018 at 02:59:00PM +0900, Michael Paquier wrote:
> +1.  I am looking at how to make that possible.

And so...  Going through a bit of investigation the problem is that each
2PC transaction preparing finishes by putting the procarray in a state
where there are two entries referring to the same transaction running as
MarkAsPrepared adds an extra entry on the top of the one already
existing.  This is expected though per the comments in
ProcArrayClearTransaction(), as it assumes that MyProc can be safely
cleaned and it assumes that there is a duplicated entry.  Then,
GetRunningTransactionData() ignores the assumption that those duplicate
entries can exist, which makes the snapshot data taken as incorrect,
generating those incorrect XLOG_RUNNING_XACT records.

The most simple fix I can think of first is to tweak the origin of the
problem in GetRunningTransactionData() so as those duplicate XIDs are
ignored if found as there has to be a state between the time
MarkAsPrepared() inserted the 2PC entry in the procarray and the time
ProcArrayClearTransaction() gets called.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

konstantin knizhnik


On 09.10.2018 10:52, Michael Paquier wrote:
On Tue, Oct 09, 2018 at 02:59:00PM +0900, Michael Paquier wrote:
+1.  I am looking at how to make that possible.
And so...  Going through a bit of investigation the problem is that each
2PC transaction preparing finishes by putting the procarray in a state
where there are two entries referring to the same transaction running as
MarkAsPrepared adds an extra entry on the top of the one already
existing.  This is expected though per the comments in
ProcArrayClearTransaction(), as it assumes that MyProc can be safely
cleaned and it assumes that there is a duplicated entry.  Then,
GetRunningTransactionData() ignores the assumption that those duplicate
entries can exist, which makes the snapshot data taken as incorrect,
generating those incorrect XLOG_RUNNING_XACT records.

The most simple fix I can think of first is to tweak the origin of the
problem in GetRunningTransactionData() so as those duplicate XIDs are
ignored if found as there has to be a state between the time
MarkAsPrepared() inserted the 2PC entry in the procarray and the time
ProcArrayClearTransaction() gets called.
--
Michael

Right now GetRunningTransactionData is used only once in Postgres code - in LogStandbySnapshot to log RUNNING_XACTS record.
So if this situation is not changed, there will be no difference whether to perform sort and exclude duplicates in LogStandbySnapsho
or GetRunningTransactionData.

But GetRunningTransactionData may be used in some other cases... For example I have used it in my snapfs Postgres extensions (fast file-level snapshots)
to check if there are no more active transactions in the system. In my case presence of duplicates is not important and ordering of XIDs is not needed.
But performance of GetRunningTransactionData is also not critical. But I can suspect that there can be cases when it is critical and doing unneeded qsort is not desired.
There may be thousands of active transactions and sorting their XIDs is not instantaneous.

Also please take in account that in most cases XIDs in RUNNINGS_XACTS record are not processed at all (only if we perform recovery from backup and do not reach consistent point yet). This is why this bug was not detected before.

So I completely understand the argument that it is better to eliminate source of the problem (presence of duplicates in RUNNING_XACTS record)
but not sure that in this particular case it is really the best solution. If presence of duplicates is considered to be acceptable for procarray, why we
can not accept it for RUNNING_XACTS record?
I am not insisting that skipping duplicates in ProcArrayApplyRecoveryInfo is the right place (but at least is seems to be the most efficient alternative). But I also do not fill that moving sort to GetRunningTransactionData and elimination of duplicates here (requires one more scan and copying of the whole array) is principally better...


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

Michael Paquier-2
On Tue, Oct 09, 2018 at 05:26:49PM +0300, Konstantin Knizhnik wrote:
> But GetRunningTransactionData may be used in some other cases... For
> example I have used it in my snapfs Postgres extensions (fast
> file-level snapshots) to check if there are no more active
> transactions in the system. In my case presence of duplicates is not
> important and ordering of XIDs is not needed.  But performance of
> GetRunningTransactionData is also not critical. But I can suspect that
> there can be cases when it is critical and doing unneeded qsort is not
> desired.  There may be thousands of active transactions and sorting
> their XIDs is notinstantaneous.

I am not sure if the performance argument is actually this much
sensible, it could be as you say, but another thing that we could argue
about is that the presence of duplicate entries in
GetRunningTransactionData() can be used as a point to understand that
2PC transactions are running, and that among the two, one of them is a
dummy entry while the other is pending for being cleared.

> So I completely understand the argument that it is better to eliminate
> source of the problem (presence of duplicates in RUNNING_XACTS record)
> but not sure that in this particular case it is really the best
> solution. If presence of duplicates is considered to be acceptable for
> procarray, why we can not accept it for RUNNING_XACTS record?

This won't solve the case where records have already been generated and
won't be processed correctly, but based on the chances this can happen
we can rather safely say that fixing only the source would be fine.

> I am not insisting that skipping duplicates in ProcArrayApplyRecoveryInfo is
> the right place (but at least is seems to be the most efficient
> alternative). But I also do not fill that moving sort to
> GetRunningTransactionData and elimination of duplicates here (requires one
> more scan and copying of the whole array) is principally better...

Making recovery a couple of instructions shorter is always a good thing
in my opinion.  Users don't care much if backups take a long time, ans
sweat less if restores take a shorter time.

Hence what about doing roughly the following:

1) Document that GetRunningTransactionData() fetches information also
about 2PC entries, like that:
  * GetRunningTransactionData -- returns information about running transactions.
  *
  * Similar to GetSnapshotData but returns more information. We include
- * all PGXACTs with an assigned TransactionId, even VACUUM processes.
+ * all PGXACTs with an assigned TransactionId, even VACUUM processes and
+ * dummy two-phase related entries created when preparing the transaction.

2) Update LogStandbySnapshot so as the list of XIDs fetched is sorted
and ordered.  This can be used as a sanity check for recovery via
ProcArrayApplyRecoveryInfo().
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

Michael Paquier-2
On Wed, Oct 10, 2018 at 11:22:45AM +0900, Michael Paquier wrote:
> I am not sure if the performance argument is actually this much
> sensible, it could be as you say, but another thing that we could argue
> about is that the presence of duplicate entries in
> GetRunningTransactionData() can be used as a point to understand that
> 2PC transactions are running, and that among the two, one of them is a
> dummy entry while the other is pending for being cleared.

Actually there would be a performance impact for any deployments if we
were to do so, as ProcArrayLock is hold and we would need to scan 4
times procArray instead of twice.  Many people already complain (justly)
that ProcArray is a performance bottleneck, it would be a bad idea to
make things worse...

> 1) Document that GetRunningTransactionData() fetches information also
> about 2PC entries, like that:
>   * GetRunningTransactionData -- returns information about running transactions.
>   *
>   * Similar to GetSnapshotData but returns more information. We include
> - * all PGXACTs with an assigned TransactionId, even VACUUM processes.
> + * all PGXACTs with an assigned TransactionId, even VACUUM processes and
> + * dummy two-phase related entries created when preparing the transaction.
>
> 2) Update LogStandbySnapshot so as the list of XIDs fetched is sorted
> and ordered.  This can be used as a sanity check for recovery via
> ProcArrayApplyRecoveryInfo().
This also would increase the pressure on ProcArrayLock with wal_level =
logical as the WAL record needs to be included while holding the lock.
So let's do as Konstantin is suggesting by skipping duplicated XIDs
after applying the qsort().  The current code is also doing a bad
documentation job, so we should mentioned that GetRunningTransactionData
may return duplicated XIDs because of the dummy 2PC entries which
overlap with the active ones, and also add a proper comment in
ProcArrayApplyRecoveryInfo().  Konstantin, do you want to give it a try
with a patch?  Or should I?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

konstantin knizhnik


On 11.10.2018 12:06, Michael Paquier wrote:

> On Wed, Oct 10, 2018 at 11:22:45AM +0900, Michael Paquier wrote:
>> I am not sure if the performance argument is actually this much
>> sensible, it could be as you say, but another thing that we could argue
>> about is that the presence of duplicate entries in
>> GetRunningTransactionData() can be used as a point to understand that
>> 2PC transactions are running, and that among the two, one of them is a
>> dummy entry while the other is pending for being cleared.
> Actually there would be a performance impact for any deployments if we
> were to do so, as ProcArrayLock is hold and we would need to scan 4
> times procArray instead of twice.  Many people already complain (justly)
> that ProcArray is a performance bottleneck, it would be a bad idea to
> make things worse...
>
>> 1) Document that GetRunningTransactionData() fetches information also
>> about 2PC entries, like that:
>>    * GetRunningTransactionData -- returns information about running transactions.
>>    *
>>    * Similar to GetSnapshotData but returns more information. We include
>> - * all PGXACTs with an assigned TransactionId, even VACUUM processes.
>> + * all PGXACTs with an assigned TransactionId, even VACUUM processes and
>> + * dummy two-phase related entries created when preparing the transaction.
>>
>> 2) Update LogStandbySnapshot so as the list of XIDs fetched is sorted
>> and ordered.  This can be used as a sanity check for recovery via
>> ProcArrayApplyRecoveryInfo().
> This also would increase the pressure on ProcArrayLock with wal_level =
> logical as the WAL record needs to be included while holding the lock.
> So let's do as Konstantin is suggesting by skipping duplicated XIDs
> after applying the qsort().  The current code is also doing a bad
> documentation job, so we should mentioned that GetRunningTransactionData
> may return duplicated XIDs because of the dummy 2PC entries which
> overlap with the active ones, and also add a proper comment in
> ProcArrayApplyRecoveryInfo().  Konstantin, do you want to give it a try
> with a patch?  Or should I?
> --
> Michael
Proposed patch is attached.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


runnings_xids-2.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: out-of-order XID insertion in KnownAssignedXids

Michael Paquier-2
On Thu, Oct 11, 2018 at 08:04:11PM +0300, Konstantin Knizhnik wrote:
> Proposed patch is attached.

The problem I have with this patch doing the duplication removal and
qsort work in LogCurrentRunningXacts is that it would still lock
ProcArrayLock until the WAL record has been written with wal_level =
logical.  With many sessions and 2PC transactions running around, this
would be a performance impact for any deployments every time a
checkpoint happens or every time the bgwriter decide to log a standby
snapshot.  Hence I would go instead with the attached, which does the
legwork at recovery, which is a one-time code path as you mentioned.
Okay, this makes the recovery a bit longer but that's way better than
impacting all deployments of Postgres, even those not using 2PC when
normally running.  And as the sorting phase already happens we just need
to do something like the attached.

One thing that we could also do for HEAD is to add in
RunningTransactionsData if the transaction comes from a 2PC entry, and
allow recovery to do more sanity checks.  This would require a WAL
format change.  The proposed patch needs to be back-patched down to
9.3.
--
Michael

duplicated-xids-recovery.patch (2K) Download Attachment
signature.asc (849 bytes) Download Attachment