POC: Cleaning up orphaned files using undo logs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
330 messages Options
12345 ... 17
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Dilip Kumar-2
On Thu, May 2, 2019 at 7:00 PM Robert Haas <[hidden email]> wrote:

>
> On Thu, May 2, 2019 at 5:32 AM Dilip Kumar <[hidden email]> wrote:
> > Yeah, at least in this patch it looks silly.  Actually, I added that
> > index to make the qsort stable when execute_undo_action sorts them in
> > block order.  But, as part of this patch we don't have that processing
> > so either we can remove this structure completely as you suggested but
> > undo processing patch has to add that structure or we can just add
> > comment that why we added this index field.
>
> Well, the qsort comparator could compute the index as ptr - array_base
> just like any other code, couldn't it?
>
I might be completely missing but (ptr - array_base) is only valid
when first time you get the array, but qsort will swap the element
around and after that you will never be able to make out which element
was at lower index and which one was at higher index.   Basically, our
goal is to preserve the order of the undo record for the same block
but their order might get changed due to swap when they are getting
compared with the undo record pointer of the another block and once
the order is swap we will never know what was their initial positions?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Robert Haas
On Fri, May 3, 2019 at 12:46 AM Dilip Kumar <[hidden email]> wrote:
> I might be completely missing but (ptr - array_base) is only valid
> when first time you get the array, but qsort will swap the element
> around and after that you will never be able to make out which element
> was at lower index and which one was at higher index.   Basically, our
> goal is to preserve the order of the undo record for the same block
> but their order might get changed due to swap when they are getting
> compared with the undo record pointer of the another block and once
> the order is swap we will never know what was their initial positions?

*facepalm*

Yeah, you're right.

Still, I think we should see if there's some way of getting rid of
that structure, or at least making it an internal detail that is used
by the code that's doing the sorting rather than something that is
exposed as an external interface.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

akapila
In reply to this post by akapila
On Wed, May 1, 2019 at 10:08 AM Amit Kapila <[hidden email]> wrote:
>
> Thomas told me offlist that this email of mine didn't hit
> pgsql-hackers, so trying it again by resending.
>

Attached is next version of the patch with minor improvements:
a. use FullTransactionId
b. improve comments
c. removed some functions

The branch can be accessed at
https://github.com/EnterpriseDB/zheap/tree/undoprocessing.  It is on
top of Thomas and Dilip's patches related to undo logs and undo
records, though still not everything is synced up from both the
branches as they are also actively working on their set of patches.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

0001-Allow-undo-actions-to-be-applied-on-rollbacks-and-di.v2.patch (274K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Dilip Kumar-2
In reply to this post by Robert Haas
On Tue, Apr 30, 2019 at 8:44 PM Robert Haas <[hidden email]> wrote:
>
> On Tue, Apr 30, 2019 at 2:16 AM Dilip Kumar <[hidden email]> wrote:
> > Like previous version these patch set also applies on:
> > https://github.com/EnterpriseDB/zheap/tree/undo
> > (b397d96176879ed5b09cf7322b8d6f2edd8043a5)
>
> Some more review of 0003:
>
> There is a whitespace-only hunk in xact.c.
Fixed

>
> It would be nice if AtAbort_ResetUndoBuffers didn't need to exist at
> all.  Then, this patch would make no changes whatsoever to xact.c.
> We'd still need such changes in other patches, because the whole idea
> of undo is tightly bound up with the concept of transactions.  Yet,
> this particular patch wouldn't touch that file, and that would be
> nice.  In general, the reason why we need AtCommit/AtAbort/AtEOXact
> callbacks is to adjust the values of global variables (or the data
> structures to which they point) at commit or abort time.  And that is
> also the case here.  The thing is, I'm not sure why we need these
> particular global variables.  Is there some way that we can get by
> without them? The obvious thing to do seems to be to invent yet
> another context object, allocated via a new function, which can then
> be passed to PrepareUndoInsert, InsertPreparedUndo,
> UndoLogBuffersSetLSN, UnlockReleaseUndoBuffers, etc.  This would
> obsolete UndoSetPrepareSize, since you'd instead pass the size to the
> context allocator function.
I have moved all the global variables to a context.  Now, I think we
don't need AtAbort_ResetUndoBuffers as memory will be freed with the
transaction context.

>
> UndoRecordUpdateTransInfo should declare a local variable
> XactUndoRecordInfo *something = &xact_urec_info[idx] and use that
> variable wherever possible.
>
Done.
> It should also probably use while (1) { ... } rather than do { ... }
> while (true).
Ok

>
> In UndoBufferGetSlot you could replace 'break;' with 'return i;' and
> then more than half the function would need one less level of
> indentation.  This function should also declare PreparedUndoBuffer
> *something and set that variable equal to &prepared_undo_buffers[i] at
> the top of the loop and again after the loop, and that variable should
> then be used whenever possible.

Done
>
> UndoRecordRelationDetails seems to need renaming now that it's down to
> a single member.
I have directly moved that context to UndoPackContext
>
> The comment for UndoRecordBlock needs updating, as it still refers to blkprev.
Done
>
> The comment for UndoRecordBlockPrev refers to "Identifying
> information" but it's not identifying anything.  I think you could
> just delete "Identifying information for" from this sentence and not
> lose anything.  And I think several of the nearby comments that refer
> to "Identifying information" could more simply and more correctly just
> refer to "Information".
Done
>
> I don't think that SizeOfUrecNext needs to exist.
Removed
>
> xact.h should not add an include for undolog.h.  There are no other
> changes to this header, so presumably the header does not depend on
> anything in undolog.h.  If .c files that include xact.h need
> undolog.h, then the header should be included in those files, not in
> the header itself.  That way, we avoid making partial recompiles more
> painful than necessary.

Right, fixed.
>
> UndoGetPrevUndoRecptr looks like a strange interface.  Why not just
> arrange not to call the function in the first place if prevurp is
> valid?
Done
>
> Every use of palloc0 in this code should be audited to check whether
> it is really necessary to zero the memory before use.  If it will be
> initialized before it's used for anything anyway, it doesn't need to
> be pre-zeroed.

Yeah, I found at few places it was not required so fixed.

>
> FinishUnpackUndo looks nice!  But it is missing a blank line in one
> place, and 'if it presents' should be 'if it is present' in a whole
> bunch of places.
>
> BeginInsertUndo also looks to be missing a few blank lines.
Fixed
>
> Still need to do some cleanup of the UnpackedUndoRecord, e.g. unifying
> uur_xid and uur_xidepoch into uur_fxid.
>
I will work on this.

> InsertUndoData ends in an unnecessary return, as does SkipInsertingUndoData.
Done

>
> I think the argument to SkipInsertingUndoData should be the number of
> bytes to skip, rather than the starting byte within the block.
Done
>
> Function header comment formatting is not consistent.  Compare
> FinishUnpackUndo (function name recapitulated on first line of
> comment) to ReadUndoBytes (not recapitulated) to UnpackUndoData
> (entire header comment jammed into one paragraph with function name at
> start).  I prefer the style where the function name is not restated,
> but YMMV.  Anyway, it has to be consistent.
>
Fixed

> UndoGetPrevRecordLen should not declare char *page instead of Page
> page, I think.
>
> UndoRecInfo looks a bit silly, I think.  Isn't index just the index of
> this entry in the array?  You can always figure that out by ptr -
> array_base. Instead of having UndoRecPtr urp in this structure, how
> about adding that to UnpackedUndoRecord?  When inserting, caller
> leaves it unset and InsertPreparedUndo sets it; when retrieving,
> UndoFetchRecord or UndoRecordBulkFetch sets it.  With those two
> changes, I think you can get rid of UndoRecInfo entirely and just
> return an array of UnpackedUndoRecords.  This might also eliminate the
> need for an 'urp' member in PreparedUndoSpace.
As discussed upthread,  I will work on fixing this.

>
> I'd probably use UREC_INFO_BLKPREV rather than UREC_INFO_BLOCKPREV for
> consistency.
>
> Similarly UndoFetchRecord and UndoRecordBulkFetch seems a bit
> inconsistent.  Perhaps UndoBulkFetchRecord.
Done
>
> In general, I find the code for updating transaction headers to be
> really hard to understand.  I'm not sure exactly what can be done
> about that.  Like, why is UndoRecordPrepareTransInfo unpacking undo?

It's only unpacking header.  But, yeah we can do better, instead of
unpacking we can just read the main header and from uur_info we can
calculate exact offset of the uur_next and in
UndoRecordUpdateTransInfo we can directly update only uur_next by
writing at that offset, instead of overwriting the complete header?

> Why does it take two undo record pointers as arguments and how are
> they different?
One is previous transaction's start header which we wants to update
and other is current transaction's urec pointer what we want to set as
uur_next in the previous transaction's start header.

Just for tracking, open comments which still needs to be worked on.

1.  Avoid special case in UndoRecordIsValid.
> Can we instead eliminate the special case?  It seems like the if
> (log->oldest_data == InvalidUndoRecPtr) case will be taken very
> rarely, so if it's buggy, we might not notice.

2. While updating the previous transaction header instead of unpacking
complete header and writing it back, we can just unpack main header
and calculate the offset of uur_next and then update it directly.

3. unifying uur_xid and uur_xidepoch into uur_fxid.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

0005-undo-page-consistency-checker_v4_WIP.patch (14K) Download Attachment
0004-Test-module-for-undo-api_v4.patch (12K) Download Attachment
0003-Provide-interfaces-to-store-and-fetch-undo-records_v4.patch (111K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Robert Haas
On Mon, May 6, 2019 at 8:13 AM Dilip Kumar <[hidden email]> wrote:
> > In general, I find the code for updating transaction headers to be
> > really hard to understand.  I'm not sure exactly what can be done
> > about that.  Like, why is UndoRecordPrepareTransInfo unpacking undo?
>
> It's only unpacking header.  But, yeah we can do better, instead of
> unpacking we can just read the main header and from uur_info we can
> calculate exact offset of the uur_next and in
> UndoRecordUpdateTransInfo we can directly update only uur_next by
> writing at that offset, instead of overwriting the complete header?

Hmm.  I think it's reasonable to use the unpack infrastructure to
figure out where uur_next is.  I don't know whether a bespoke method
of figuring that out would be any better.  At least the comments
probably need some work.

> > Why does it take two undo record pointers as arguments and how are
> > they different?
> One is previous transaction's start header which we wants to update
> and other is current transaction's urec pointer what we want to set as
> uur_next in the previous transaction's start header.

So put some comments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Dilip Kumar-2
In reply to this post by akapila
On Wed, May 1, 2019 at 9:29 AM Amit Kapila <[hidden email]> wrote:

>
> On Wed, May 1, 2019 at 6:02 AM Robert Haas <[hidden email]> wrote:
> >
> > Replying to myself to resend to the list, since my previous attempt
> > seems to have been eaten by a grue.
> >
> > On Tue, Apr 30, 2019 at 11:14 AM Robert Haas <[hidden email]> wrote:
> > >
> > > On Tue, Apr 30, 2019 at 2:16 AM Dilip Kumar <[hidden email]> wrote:
> > > > Like previous version these patch set also applies on:
> > > > https://github.com/EnterpriseDB/zheap/tree/undo
> > > > (b397d96176879ed5b09cf7322b8d6f2edd8043a5)
> > >
> > > Some more review of 0003:
> > >
>
> Another suggestion:
>
> +/*
> + * Insert a previously-prepared undo records.  This will write the actual undo
> + * record into the buffers already pinned and locked in PreparedUndoInsert,
> + * and mark them dirty.  This step should be performed after entering a
> + * criticalsection; it should never fail.
> + */
> +void
> +InsertPreparedUndo(void)
> +{
> ..
> ..
> +
> + /* Advance the insert pointer past this record. */
> + UndoLogAdvance(urp, size);
> + }
> ..
> }
>
> UndoLogAdvance internally takes LWLock and we don't recommend doing
> that in the critical section which will happen as this function is
> supposed to be invoked in the critical section as mentioned in
> comments.

I think we can call UndoLogAdvanceFinal in FinishUndoRecordInsert
because this function will be called outside the critical section.
And, now we already have the undo record size inside
UndoRecordInsertContext.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Dilip Kumar-2
In reply to this post by Dilip Kumar-2
On Mon, May 6, 2019 at 5:43 PM Dilip Kumar <[hidden email]> wrote:
>
> Just for tracking, open comments which still needs to be worked on.
>
> 1.  Avoid special case in UndoRecordIsValid.
> > Can we instead eliminate the special case?  It seems like the if
> > (log->oldest_data == InvalidUndoRecPtr) case will be taken very
> > rarely, so if it's buggy, we might not notice.

I have worked on this comments and added changes in the latest patch.
>
> 2. While updating the previous transaction header instead of unpacking
> complete header and writing it back, we can just unpack main header
> and calculate the offset of uur_next and then update it directly.

For this as you suggested I am not changing, updated the comments.
>
> 3. unifying uur_xid and uur_xidepoch into uur_fxid.
Still open.

I have also added the README.

Patches can be applied on top of undo branch [1] commit:
(cb777466d008e656f03771cf16ec7ef9d6f2778b)

[1] https://github.com/EnterpriseDB/zheap/tree/undo

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

0001-Update-oldest_data-on-startup_v5.patch (1K) Download Attachment
0003-Test-module-for-undo-api_v5.patch (12K) Download Attachment
0004-undo-page-consistency-checker_v5.patch (14K) Download Attachment
0002-Provide-interfaces-to-store-and-fetch-undo-records_v5.patch (114K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Thomas Munro-5
On Thu, May 9, 2019 at 6:34 PM Dilip Kumar <[hidden email]> wrote:
> Patches can be applied on top of undo branch [1] commit:
> (cb777466d008e656f03771cf16ec7ef9d6f2778b)

Hello all,

Here is a new patch set which includes all of the patches discussed in
this thread in one go, rebased on today's master.  To summarise the
main layers, from the top down we have:

 0013:       undo-based orphaned file clean-up ($SUBJECT, a demo of
undo technology)
 0009-0010:  undo processing (execution of undo actions when rolling back)
 0008:       undo records
 0001-0007:  undo storage

The main changes to the storage layer since the last time I posted the
full patch stack:

* pg_upgrade support: you can't have any live undo logs (much like 2PC
transactions, we want to be free to change the format), but some work
was required to make sure that all "discarded" undo record pointers
from the old cluster still appear as discarded in the new cluster, as
well as any from the new cluster

* tweaks to various other src/bin tools that are aware of files under
pgdata and were confused by undo segment files

* the fsync of undo log segment files when they're created or recycled
is now handed off to the checkpointer (this was identified as a
performance problem for zheap)

* code tidy-up, removing dead code (undo log rewind, prevlen, prevlog
were no longer needed by patches higher up in the stack), removing
global variables, noisy LOG messages about undo segment files now
reduced to DEBUG1

* new extension contrib/undoinspect, for developer use, showing what
will be undone if you abort:

postgres=# begin;
BEGIN
postgres=# create table t();
CREATE TABLE
postgres=# select * from undoinspect();
     urecptr      |  rmgr   | flags | xid |
description
------------------+---------+-------+-----+---------------------------------------------
 00000000000032FA | Storage | P,T   | 487 | CREATE dbid=12934,
tsid=1663, relfile=16393
(1 row)

One silly detail: I had to change the default max_worker_processes
from 8 to 12, because otherwise a couple of tests run with fewer
parallel workers than they expect, due to undo worker processes using
up slots.  There is probably a better solution to that problem.

I put the patches in a tarball here, but they are also available from
https://github.com/EnterpriseDB/zheap/tree/undo.

--
Thomas Munro
https://enterprisedb.com

undo-2019-05-10.tgz (207K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Kuntal Ghosh
Hello Thomas,

In pg_buffercache contrib module, the file pg_buffercache--1.3--1.4.sql is missing. AFAICS, this file should be added as part of the following commit:
Add SmgrId to smgropen() and BufferTag

Otherwise, I'm not able to compile the contrib modules. I've also attached the patch to fix the same.


On Fri, May 10, 2019 at 11:48 AM Thomas Munro <[hidden email]> wrote:
On Thu, May 9, 2019 at 6:34 PM Dilip Kumar <[hidden email]> wrote:
> Patches can be applied on top of undo branch [1] commit:
> (cb777466d008e656f03771cf16ec7ef9d6f2778b)

Hello all,

Here is a new patch set which includes all of the patches discussed in
this thread in one go, rebased on today's master.  To summarise the
main layers, from the top down we have:

 0013:       undo-based orphaned file clean-up ($SUBJECT, a demo of
undo technology)
 0009-0010:  undo processing (execution of undo actions when rolling back)
 0008:       undo records
 0001-0007:  undo storage

The main changes to the storage layer since the last time I posted the
full patch stack:

* pg_upgrade support: you can't have any live undo logs (much like 2PC
transactions, we want to be free to change the format), but some work
was required to make sure that all "discarded" undo record pointers
from the old cluster still appear as discarded in the new cluster, as
well as any from the new cluster

* tweaks to various other src/bin tools that are aware of files under
pgdata and were confused by undo segment files

* the fsync of undo log segment files when they're created or recycled
is now handed off to the checkpointer (this was identified as a
performance problem for zheap)

* code tidy-up, removing dead code (undo log rewind, prevlen, prevlog
were no longer needed by patches higher up in the stack), removing
global variables, noisy LOG messages about undo segment files now
reduced to DEBUG1

* new extension contrib/undoinspect, for developer use, showing what
will be undone if you abort:

postgres=# begin;
BEGIN
postgres=# create table t();
CREATE TABLE
postgres=# select * from undoinspect();
     urecptr      |  rmgr   | flags | xid |
description
------------------+---------+-------+-----+---------------------------------------------
 00000000000032FA | Storage | P,T   | 487 | CREATE dbid=12934,
tsid=1663, relfile=16393
(1 row)

One silly detail: I had to change the default max_worker_processes
from 8 to 12, because otherwise a couple of tests run with fewer
parallel workers than they expect, due to undo worker processes using
up slots.  There is probably a better solution to that problem.

I put the patches in a tarball here, but they are also available from
https://github.com/EnterpriseDB/zheap/tree/undo.

--
Thomas Munro
https://enterprisedb.com


--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

0001-Add-missing-file-in-pg_buffercache.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Thomas Munro-5
On Fri, May 10, 2019 at 10:46 PM Kuntal Ghosh
<[hidden email]> wrote:
> In pg_buffercache contrib module, the file pg_buffercache--1.3--1.4.sql is missing. AFAICS, this file should be added as part of the following commit:
> Add SmgrId to smgropen() and BufferTag
>
> Otherwise, I'm not able to compile the contrib modules. I've also attached the patch to fix the same.

Oops, thanks Kuntal.  Fixed, along with some compiler warnings from
MSVC and GCC.  I added a quick tour of this to a README.md visible
here:

https://github.com/EnterpriseDB/zheap/tree/undo

--
Thomas Munro
https://enterprisedb.com

undo-2019-05-11.tgz (208K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Dilip Kumar-2
In reply to this post by Dilip Kumar-2
On Thu, May 9, 2019 at 12:04 PM Dilip Kumar <[hidden email]> wrote:

>
> On Mon, May 6, 2019 at 5:43 PM Dilip Kumar <[hidden email]> wrote:
> >
> > Just for tracking, open comments which still needs to be worked on.
> >
> > 1.  Avoid special case in UndoRecordIsValid.
> > > Can we instead eliminate the special case?  It seems like the if
> > > (log->oldest_data == InvalidUndoRecPtr) case will be taken very
> > > rarely, so if it's buggy, we might not notice.
>
> I have worked on this comments and added changes in the latest patch.
> >
> > 2. While updating the previous transaction header instead of unpacking
> > complete header and writing it back, we can just unpack main header
> > and calculate the offset of uur_next and then update it directly.
>
> For this as you suggested I am not changing, updated the comments.
> >
> > 3. unifying uur_xid and uur_xidepoch into uur_fxid.
> Still open.
>
> I have also added the README.
>
> Patches can be applied on top of undo branch [1] commit:
> (cb777466d008e656f03771cf16ec7ef9d6f2778b)
>
> [1] https://github.com/EnterpriseDB/zheap/tree/undo
>
I have removed some of the globals and also improved some comments.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

0002-Provide-interfaces-to-store-and-fetch-undo-records_v6.patch (116K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Robert Haas
On Sun, May 12, 2019 at 2:15 AM Dilip Kumar <[hidden email]> wrote:
> I have removed some of the globals and also improved some comments.

I don't like the discard_lock very much.  Perhaps it's OK, but I hope
that there are better alternatives.  One problem with Thomas Munro
pointed out to me in off-list discussion is that the discard_lock has
to be held by anyone reading undo even if the undo they are reading
and the undo that the discard worker wants to discard are in
completely different parts of the undo log.  Somebody could be trying
to read an undo page written 1 second ago while the discard worker is
trying to discard an undo page written to the same undo log 1 hour
ago.  Those things need not block each other, but with this design
they will.  Another problem is that we end up holding it across an
I/O; there's precedent for that, but it's not particularly good
precedent.  Let's see if we can do better.

My first idea was that we should just make this the caller's problem
instead of handling it in this layer.  Undo is retained for committed
transactions until they are all-visible, and the reason for that is
that we presume that nobody can be interested in the data for MVCC
purposes unless there's a snapshot that can't see the results of the
transaction in question.  Once the committed transaction is
all-visible, that's nobody, so it should be fine to just discard the
undo any time we like.  That won't work with the existing zheap code,
which currently sometimes follows undo chains for transactions that
are all-visible, but I think that's a problem we should fix rather
than something we should force the undo layer to support.  We'd still
need something kinda like the discard_lock for aborted transactions,
though, because as soon as you release the buffer lock on a table
page, the undo workers could apply all the undo to that page and then
discard it, and then you could afterwards try to look up the undo
pointer which you had retrieved from that page and stored in
backend-local memory.  One thing we could probably do is make that a
heavyweight lock on the XID itself, so if you observe that an XID is
aborted, you have to go get this lock in ShareLock mode, then recheck
the page, and only then consult the undo; discarding the undo for an
aborted transaction would require AccessExclusiveLock on the XID.
This solution gets rid of the LWLock for committed undo; for aborted
undo, it avoids the false sharing and non-interruptibility that an
LWLock imposes.

But then I had what I think may be a better idea.  Let's add a new
ReadBufferMode that suppresses the actual I/O; if the buffer is not
already present in shared_buffers, it allocates a buffer but returns
it without doing any I/O, so the caller must be prepared for BM_VALID
to be unset.  I don't know what to call this, so I'll call it
RBM_ALLOCATE (leaving room for possible future variants like
RBM_ALLOCATE_AND_LOCK).  Then, the protocol for reading an undo buffer
would go like this:

1. Read the buffer with RBM_ALLOCATE, thus acquiring a pin on the
relevant buffer.
2. Check whether the buffer precedes the discard horizon for that undo
log stored in shared memory.
3. If so, use the ForgetBuffer() code we have in the zheap branch to
deallocate the buffer and stop here.  The undo is not available to be
read, whether it's still physically present or not.
4. Otherwise, if the buffer is not valid, call ReadBufferExtended
again, or some new function, to make it so.  Remember to release all
of our pins.

The protocol for discarding an undo buffer would go like this:

1. Advance the discard horizon in shared memory.
2. Take a cleanup lock on each buffer that ought to be discarded.
Remember the dirty ones and forget the others.
3. WAL-log the discard operation.
4. Revisit the dirty buffers we remembered in step 2 and forget them.

The idea is that, once we've advanced the discard horizon in shared
memory, any readers that come along later are responsible for making
sure that they never do I/O on any older undo.  They may create some
invalid buffers in shared memory, but they'll hopefully also get rid
of them if they do, and if they error out for some reason before doing
so, that buffer should age out naturally.  So, the discard worker just
needs to worry about buffers that already exist.  Once it's taken a
cleanup lock on each buffer, it knows that there are no I/O operations
and in fact no buffer usage of any kind still in progress from before
it moved the in-memory discard horizon.  Anyone new that comes along
will clean up after themselves.  We postpone forgetting dirty buffers
until after we've successfully WAL-logged the discard, in case we fail
to do so.

With this design, we don't add any new cases where a lock of any kind
must be held across an I/O, and there's also no false sharing.
Furthermore, unlike the previous proposal, this will work nicely with
something like old_snapshot_threshold.  The previous design relies on
undo not getting discarded while anyone still cares about it, but
old_snapshot_threshold, if applied to zheap, would have the express
goal of discarding undo while somebody still cares about it.  With
this design, we could support old_snapshot_threshold by having undo
readers error out in step #2 if the transaction is committed and not
visible to our snapshot but yet the undo is discarded.  Heck, we can
do that anyway as a safety check, basically for free, and just tailor
the error message depending on whether old_snapshot_threshold is such
that the condition is expected to be possible.

While I'm kvetching, I can't help noticing that undoinsert.c contains
functions both for inserting undo and also for reading it, which seems
like a loose end that needs to be tied up somehow.  I'm mildly
inclined to think that we should rename the file to something more
generic (e.g. undoaccess.h) rather than splitting it into two files
(e.g. undoinsert.c and undoread.c).  Also, it looks to me like you
need to go through what is currently undoinsert.h and look for stuff
that can be made private to the .c file.  I don't see why thing like
MAX_PREPARED_UNDO need to be exposed at all, and for things like
PreparedUndoSpace it seems like it would suffice to just do 'struct
PreparedUndoSpace; typedef struct PreparedUndoSpace
PreparedUndoSpace;' in the header and put the actual 'struct
PreparedUndoSpace { ... };' definition in the .c file.  And
UnlockReleaseUndoBuffers has a declaration but no longer has a
definition, so I think that can go away too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

akapila
On Mon, May 13, 2019 at 11:36 PM Robert Haas <[hidden email]> wrote:

>
> On Sun, May 12, 2019 at 2:15 AM Dilip Kumar <[hidden email]> wrote:
> > I have removed some of the globals and also improved some comments.
>
> I don't like the discard_lock very much.  Perhaps it's OK, but I hope
> that there are better alternatives.  One problem with Thomas Munro
> pointed out to me in off-list discussion is that the discard_lock has
> to be held by anyone reading undo even if the undo they are reading
> and the undo that the discard worker wants to discard are in
> completely different parts of the undo log.  Somebody could be trying
> to read an undo page written 1 second ago while the discard worker is
> trying to discard an undo page written to the same undo log 1 hour
> ago.  Those things need not block each other, but with this design
> they will.
>

Yeah, this doesn't appear to be a good way to deal with the problem.

>  Another problem is that we end up holding it across an
> I/O; there's precedent for that, but it's not particularly good
> precedent.  Let's see if we can do better.
>
> But then I had what I think may be a better idea.
>

+1.  I also think the below idea is better than the previous one.

>  Let's add a new
> ReadBufferMode that suppresses the actual I/O; if the buffer is not
> already present in shared_buffers, it allocates a buffer but returns
> it without doing any I/O, so the caller must be prepared for BM_VALID
> to be unset.  I don't know what to call this, so I'll call it
> RBM_ALLOCATE (leaving room for possible future variants like
> RBM_ALLOCATE_AND_LOCK).  Then, the protocol for reading an undo buffer
> would go like this:
>
> 1. Read the buffer with RBM_ALLOCATE, thus acquiring a pin on the
> relevant buffer.
> 2. Check whether the buffer precedes the discard horizon for that undo
> log stored in shared memory.
> 3. If so, use the ForgetBuffer() code we have in the zheap branch to
> deallocate the buffer and stop here.  The undo is not available to be
> read, whether it's still physically present or not.
> 4. Otherwise, if the buffer is not valid, call ReadBufferExtended
> again, or some new function, to make it so.  Remember to release all
> of our pins.
>
> The protocol for discarding an undo buffer would go like this:
>
> 1. Advance the discard horizon in shared memory.
> 2. Take a cleanup lock on each buffer that ought to be discarded.
> Remember the dirty ones and forget the others.
> 3. WAL-log the discard operation.
> 4. Revisit the dirty buffers we remembered in step 2 and forget them.
>
> The idea is that, once we've advanced the discard horizon in shared
> memory, any readers that come along later are responsible for making
> sure that they never do I/O on any older undo.  They may create some
> invalid buffers in shared memory, but they'll hopefully also get rid
> of them if they do, and if they error out for some reason before doing
> so, that buffer should age out naturally.  So, the discard worker just
> needs to worry about buffers that already exist.  Once it's taken a
> cleanup lock on each buffer, it knows that there are no I/O operations
> and in fact no buffer usage of any kind still in progress from before
> it moved the in-memory discard horizon.  Anyone new that comes along
> will clean up after themselves.  We postpone forgetting dirty buffers
> until after we've successfully WAL-logged the discard, in case we fail
> to do so.
>

I have spent some time thinking over this and couldn't see any problem
with this.  So, +1 for trying this out on the lines of what you have
described above.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Dilip Kumar-2
In reply to this post by Robert Haas
On Mon, May 13, 2019 at 11:36 PM Robert Haas <[hidden email]> wrote:
>
> While I'm kvetching, I can't help noticing that undoinsert.c contains
> functions both for inserting undo and also for reading it, which seems
> like a loose end that needs to be tied up somehow.  I'm mildly
> inclined to think that we should rename the file to something more
> generic (e.g. undoaccess.h) rather than splitting it into two files
> (e.g. undoinsert.c and undoread.c).
Changed to undoaccess
 Also, it looks to me like you
> need to go through what is currently undoinsert.h and look for stuff
> that can be made private to the .c file.  I don't see why thing like
> MAX_PREPARED_UNDO need to be exposed at all,
Ideally, my previous patch should have got rid of MAX_PREPARED_UNDO as
we are now always allocating memory for prepared space but by mistake
I left it in this file.  Now, I have removed it.

and for things like
> PreparedUndoSpace it seems like it would suffice to just do 'struct
> PreparedUndoSpace; typedef struct PreparedUndoSpace
> PreparedUndoSpace;' in the header and put the actual 'struct
> PreparedUndoSpace { ... };' definition in the .c file.
Changed, I think
typedef struct PreparedUndoSpace PreparedUndoSpace; in header and
PreparedUndoSpace { ... }; is fine.
And
> UnlockReleaseUndoBuffers has a declaration but no longer has a
> definition, so I think that can go away too.
Removed, and also cleaned some other such declarations.

Pending items to be worked upon:
a) Get rid of UndoRecInfo
b) Get rid of xid in generic undo code and unify epoch and xid to fxid
c) Get rid of discard lock
d) Move log switch related information from transaction header to new
log switch header

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

0001-Provide-interfaces-to-store-and-fetch-undo-records_v7.patch (116K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Andres Freund
In reply to this post by akapila
Hi,

On 2019-05-05 10:28:21 +0530, Amit Kapila wrote:
> From 5d9e179bd481b5ed574b6e7117bf3eb62b5dc003 Mon Sep 17 00:00:00 2001
> From: Amit Kapila <[hidden email]>
> Date: Sat, 4 May 2019 16:52:01 +0530
> Subject: [PATCH] Allow undo actions to be applied on rollbacks and discard
>  unwanted undo.

I think this needs to be split into some constituent parts, to be
reviewable. Discussing 270kb of patch at once is just too much. My first
guess for a viable split would be:

1) undoaction related infrastructure
2) xact.c integration et al
3) binaryheap changes etc
4) undo worker infrastructure

It probably should be split even further, by moving things like:
- oldestXidHavingUndo infrastructure
- discard infrastructure

Some small remarks:

>  
> + {
> + {"disable_undo_launcher", PGC_POSTMASTER, DEVELOPER_OPTIONS,
> + gettext_noop("Decides whether to launch an undo worker."),
> + NULL,
> + GUC_NOT_IN_SAMPLE
> + },
> + &disable_undo_launcher,
> + false,
> + NULL, NULL, NULL
> + },
> +

We don't normally formulate GUCs in the negative like that. C.F.
autovacuum etc.


> +/* Extract xid from a value comprised of epoch and xid  */
> +#define GetXidFromEpochXid(epochxid) \
> + ((uint32) (epochxid) & 0XFFFFFFFF)
> +
> +/* Extract epoch from a value comprised of epoch and xid  */
> +#define GetEpochFromEpochXid(epochxid) \
> + ((uint32) ((epochxid) >> 32))
> +

Why do these exist? This should all go through FullTransactionId.


>   /* End-of-list marker */
>   {
>   {NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
> @@ -2923,6 +2935,16 @@ static struct config_int ConfigureNamesInt[] =
>   5000, 1, INT_MAX,
>   NULL, NULL, NULL
>   },
> + {
> + {"rollback_overflow_size", PGC_USERSET, RESOURCES_MEM,
> + gettext_noop("Rollbacks greater than this size are done lazily"),
> + NULL,
> + GUC_UNIT_MB
> + },
> + &rollback_overflow_size,
> + 64, 0, MAX_KILOBYTES,
> + NULL, NULL, NULL
> + },

rollback_foreground_size? rollback_background_size? I don't think
overflow is particularly clear.


> @@ -1612,6 +1635,85 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
>  
>   MyLockedGxact = NULL;
>  
> + /*
> + * Perform undo actions, if there are undologs for this transaction. We
> + * need to perform undo actions while we are still in transaction. Never
> + * push rollbacks of temp tables to undo worker.
> + */
> + for (i = 0; i < UndoPersistenceLevels; i++)
> + {

This should be in a separate function. And it'd be good if more code
between this and ApplyUndoActions() would be shared.


> + /*
> + * Here, we just detect whether there are any pending undo actions so that
> + * we can skip releasing the locks during abort transaction.  We don't
> + * release the locks till we execute undo actions otherwise, there is a
> + * risk of deadlock.
> + */
> + SetUndoActionsInfo();

This function name is so generic that it gives the reader very little
information about why it's called here (and in other similar places).


Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Robert Haas
On Tue, May 21, 2019 at 1:18 PM Andres Freund <[hidden email]> wrote:
> I think this needs to be split into some constituent parts, to be
> reviewable. Discussing 270kb of patch at once is just too much.

+1.

> > +     {
> > +             {"rollback_overflow_size", PGC_USERSET, RESOURCES_MEM,
> > +                     gettext_noop("Rollbacks greater than this size are done lazily"),
> > +                     NULL,
> > +                     GUC_UNIT_MB
> > +             },
> > +             &rollback_overflow_size,
> > +             64, 0, MAX_KILOBYTES,
> > +             NULL, NULL, NULL
> > +     },
>
> rollback_foreground_size? rollback_background_size? I don't think
> overflow is particularly clear.

The problem with calling it 'rollback' is that a rollback is a general
PostgreSQL term that gives no hint the proposed undo facility is
involved.  I'm not exactly sure what to propose but I think it's got
to have the word 'undo' in there someplace (or some new term we invent
that is only used in connection with undo).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

akapila
In reply to this post by Andres Freund
On Tue, May 21, 2019 at 10:47 PM Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> On 2019-05-05 10:28:21 +0530, Amit Kapila wrote:
> > From 5d9e179bd481b5ed574b6e7117bf3eb62b5dc003 Mon Sep 17 00:00:00 2001
> > From: Amit Kapila <[hidden email]>
> > Date: Sat, 4 May 2019 16:52:01 +0530
> > Subject: [PATCH] Allow undo actions to be applied on rollbacks and discard
> >  unwanted undo.
>
> I think this needs to be split into some constituent parts, to be
> reviewable.

Okay.

> Discussing 270kb of patch at once is just too much. My first
> guess for a viable split would be:
>
> 1) undoaction related infrastructure
> 2) xact.c integration et al
> 3) binaryheap changes etc
> 4) undo worker infrastructure
>
> It probably should be split even further, by moving things like:
> - oldestXidHavingUndo infrastructure
> - discard infrastructure
>

Okay, I will think about this and split the patch.

> Some small remarks:
>
> >
> > +     {
> > +             {"disable_undo_launcher", PGC_POSTMASTER, DEVELOPER_OPTIONS,
> > +                     gettext_noop("Decides whether to launch an undo worker."),
> > +                     NULL,
> > +                     GUC_NOT_IN_SAMPLE
> > +             },
> > +             &disable_undo_launcher,
> > +             false,
> > +             NULL, NULL, NULL
> > +     },
> > +
>
> We don't normally formulate GUCs in the negative like that. C.F.
> autovacuum etc.
>

Okay, will change.  Actually, this is just for development purpose.
It can help us in testing cases where we have pushed the undo, but it
won't apply, so whenever the foreground process encounter such a
transaction, it will perform the page-wise undo.  I am not 100% sure
if we need this for the final version.  Similarly, for testing
purpose, we might need enable_discard_worker to test the cases where
discard doesn't happen for a long time.

>
> > +/* Extract xid from a value comprised of epoch and xid  */
> > +#define GetXidFromEpochXid(epochxid)                 \
> > +     ((uint32) (epochxid) & 0XFFFFFFFF)
> > +
> > +/* Extract epoch from a value comprised of epoch and xid  */
> > +#define GetEpochFromEpochXid(epochxid)                       \
> > +     ((uint32) ((epochxid) >> 32))
> > +
>
> Why do these exist?
>

We don't need the second one (GetEpochFromEpochXid), but the first one
is required.  Basically, the oldestXidHavingUndo computation does
consider oldestXmin (which is still a TransactionId) as we can't
retain undo which is 2^31 transactions old due to other limitations
like clog/snapshots still has a limit of 4-byte transaction ids.
Slightly unrelated, but we do want to improve the undo retention in a
subsequent version such that we won't allow pending undo for
transaction whose age is more than 2^31.

>
> >       /* End-of-list marker */
> >       {
> >               {NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
> > @@ -2923,6 +2935,16 @@ static struct config_int ConfigureNamesInt[] =
> >               5000, 1, INT_MAX,
> >               NULL, NULL, NULL
> >       },
> > +     {
> > +             {"rollback_overflow_size", PGC_USERSET, RESOURCES_MEM,
> > +                     gettext_noop("Rollbacks greater than this size are done lazily"),
> > +                     NULL,
> > +                     GUC_UNIT_MB
> > +             },
> > +             &rollback_overflow_size,
> > +             64, 0, MAX_KILOBYTES,
> > +             NULL, NULL, NULL
> > +     },
>
> rollback_foreground_size? rollback_background_size? I don't think
> overflow is particularly clear.
>

How about rollback_undo_size or abort_undo_size or
undo_foreground_size or pending_undo_size?

>
> > @@ -1612,6 +1635,85 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
> >
> >       MyLockedGxact = NULL;
> >
> > +     /*
> > +      * Perform undo actions, if there are undologs for this transaction. We
> > +      * need to perform undo actions while we are still in transaction. Never
> > +      * push rollbacks of temp tables to undo worker.
> > +      */
> > +     for (i = 0; i < UndoPersistenceLevels; i++)
> > +     {
>
> This should be in a separate function. And it'd be good if more code
> between this and ApplyUndoActions() would be shared.
>

makes sense, will try.

>
> > +     /*
> > +      * Here, we just detect whether there are any pending undo actions so that
> > +      * we can skip releasing the locks during abort transaction.  We don't
> > +      * release the locks till we execute undo actions otherwise, there is a
> > +      * risk of deadlock.
> > +      */
> > +     SetUndoActionsInfo();
>
> This function name is so generic that it gives the reader very little
> information about why it's called here (and in other similar places).
>

NeedToPerformUndoActions()? UndoActionsRequired()?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Robert Haas
On Wed, May 22, 2019 at 7:17 AM Amit Kapila <[hidden email]> wrote:

> > > +/* Extract xid from a value comprised of epoch and xid  */
> > > +#define GetXidFromEpochXid(epochxid)                 \
> > > +     ((uint32) (epochxid) & 0XFFFFFFFF)
> > > +
> > > +/* Extract epoch from a value comprised of epoch and xid  */
> > > +#define GetEpochFromEpochXid(epochxid)                       \
> > > +     ((uint32) ((epochxid) >> 32))
> > > +
> >
> > Why do these exist?
> >
>
> We don't need the second one (GetEpochFromEpochXid), but the first one
> is required.  Basically, the oldestXidHavingUndo computation does
> consider oldestXmin (which is still a TransactionId) as we can't
> retain undo which is 2^31 transactions old due to other limitations
> like clog/snapshots still has a limit of 4-byte transaction ids.
> Slightly unrelated, but we do want to improve the undo retention in a
> subsequent version such that we won't allow pending undo for
> transaction whose age is more than 2^31.

The point is that we now have EpochFromFullTransactionId and
XidFromFullTransactionId.  You shouldn't be inventing your own version
of that infrastructure.  Use FullTransactionId, not a uint64, and then
use the functions for dealing with full transaction IDs from
transam.h.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

akapila
On Wed, May 22, 2019 at 5:47 PM Robert Haas <[hidden email]> wrote:

>
> On Wed, May 22, 2019 at 7:17 AM Amit Kapila <[hidden email]> wrote:
> > > > +/* Extract xid from a value comprised of epoch and xid  */
> > > > +#define GetXidFromEpochXid(epochxid)                 \
> > > > +     ((uint32) (epochxid) & 0XFFFFFFFF)
> > > > +
> > > > +/* Extract epoch from a value comprised of epoch and xid  */
> > > > +#define GetEpochFromEpochXid(epochxid)                       \
> > > > +     ((uint32) ((epochxid) >> 32))
> > > > +
> > >
> > > Why do these exist?
> > >
> >
> > We don't need the second one (GetEpochFromEpochXid), but the first one
> > is required.  Basically, the oldestXidHavingUndo computation does
> > consider oldestXmin (which is still a TransactionId) as we can't
> > retain undo which is 2^31 transactions old due to other limitations
> > like clog/snapshots still has a limit of 4-byte transaction ids.
> > Slightly unrelated, but we do want to improve the undo retention in a
> > subsequent version such that we won't allow pending undo for
> > transaction whose age is more than 2^31.
>
> The point is that we now have EpochFromFullTransactionId and
> XidFromFullTransactionId.  You shouldn't be inventing your own version
> of that infrastructure.  Use FullTransactionId, not a uint64, and then
> use the functions for dealing with full transaction IDs from
> transam.h.
>

Okay, I misunderstood the comment.  I'll change accordingly.  Thanks
for pointing out.



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

akapila
In reply to this post by akapila
On Wed, May 22, 2019 at 4:47 PM Amit Kapila <[hidden email]> wrote:

>
> On Tue, May 21, 2019 at 10:47 PM Andres Freund <[hidden email]> wrote:
>
> > Some small remarks:
> >
> > >
> > > +     {
> > > +             {"disable_undo_launcher", PGC_POSTMASTER, DEVELOPER_OPTIONS,
> > > +                     gettext_noop("Decides whether to launch an undo worker."),
> > > +                     NULL,
> > > +                     GUC_NOT_IN_SAMPLE
> > > +             },
> > > +             &disable_undo_launcher,
> > > +             false,
> > > +             NULL, NULL, NULL
> > > +     },
> > > +
> >
> > We don't normally formulate GUCs in the negative like that. C.F.
> > autovacuum etc.
> >
>
> Okay, will change.  Actually, this is just for development purpose.
> It can help us in testing cases where we have pushed the undo, but it
> won't apply, so whenever the foreground process encounter such a
> transaction, it will perform the page-wise undo.  I am not 100% sure
> if we need this for the final version.  Similarly, for testing
> purpose, we might need enable_discard_worker to test the cases where
> discard doesn't happen for a long time.
>
Changed.

> >
> > > +/* Extract xid from a value comprised of epoch and xid  */
> > > +#define GetXidFromEpochXid(epochxid)                 \
> > > +     ((uint32) (epochxid) & 0XFFFFFFFF)
> > > +
> > > +/* Extract epoch from a value comprised of epoch and xid  */
> > > +#define GetEpochFromEpochXid(epochxid)                       \
> > > +     ((uint32) ((epochxid) >> 32))
> > > +
> >
> > Why do these exist?
> >
>
> We don't need the second one (GetEpochFromEpochXid), but the first one
> is required.  Basically, the oldestXidHavingUndo computation does
> consider oldestXmin (which is still a TransactionId) as we can't
> retain undo which is 2^31 transactions old due to other limitations
> like clog/snapshots still has a limit of 4-byte transaction ids.
> Slightly unrelated, but we do want to improve the undo retention in a
> subsequent version such that we won't allow pending undo for
> transaction whose age is more than 2^31.
>
Removed both the above defines.

> >
> > >       /* End-of-list marker */
> > >       {
> > >               {NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
> > > @@ -2923,6 +2935,16 @@ static struct config_int ConfigureNamesInt[] =
> > >               5000, 1, INT_MAX,
> > >               NULL, NULL, NULL
> > >       },
> > > +     {
> > > +             {"rollback_overflow_size", PGC_USERSET, RESOURCES_MEM,
> > > +                     gettext_noop("Rollbacks greater than this size are done lazily"),
> > > +                     NULL,
> > > +                     GUC_UNIT_MB
> > > +             },
> > > +             &rollback_overflow_size,
> > > +             64, 0, MAX_KILOBYTES,
> > > +             NULL, NULL, NULL
> > > +     },
> >
> > rollback_foreground_size? rollback_background_size? I don't think
> > overflow is particularly clear.
> >
>
> How about rollback_undo_size or abort_undo_size or
> undo_foreground_size or pending_undo_size?
>
I think we need some more discussion on this before we change as
Robert seems to feel that we should have 'undo' someplace in the name.
Please let me know your
preference.

> >
> > > @@ -1612,6 +1635,85 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
> > >
> > >       MyLockedGxact = NULL;
> > >
> > > +     /*
> > > +      * Perform undo actions, if there are undologs for this transaction. We
> > > +      * need to perform undo actions while we are still in transaction. Never
> > > +      * push rollbacks of temp tables to undo worker.
> > > +      */
> > > +     for (i = 0; i < UndoPersistenceLevels; i++)
> > > +     {
> >
> > This should be in a separate function. And it'd be good if more code
> > between this and ApplyUndoActions() would be shared.
> >
>
> makes sense, will try.
>
Done.  Now, there is a common function that is used in twophase.c and
ApplyUndoActions.

> >
> > > +     /*
> > > +      * Here, we just detect whether there are any pending undo actions so that
> > > +      * we can skip releasing the locks during abort transaction.  We don't
> > > +      * release the locks till we execute undo actions otherwise, there is a
> > > +      * risk of deadlock.
> > > +      */
> > > +     SetUndoActionsInfo();
> >
> > This function name is so generic that it gives the reader very little
> > information about why it's called here (and in other similar places).
> >
>
> NeedToPerformUndoActions()? UndoActionsRequired()?
>
Changed to UndoActionsRequired and added comments atop of the function
to make it clear why and when this function needs to use.

Apart from fixing the above comments, the patch is rebased on latest
undo patchset.  As of now, I have split the binaryheap.c changes into
a separate patch.  We are stilll enhancing the patch to compute
oldestXidHavingUnappliedUndo which touches various parts of patch, so
splitting further without completing that can make it a bit difficult
to work on that.

Pending work
-------------------
1. Enhance uur_progress so that it updates undo action apply progress
at regular intervals.
2. Enhance to support oldestXidHavingUnappliedUndo, more on that later.
3. Split the patch.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

0001-Extend-binary-heap-functionality.patch (7K) Download Attachment
0002-Allow-undo-actions-to-be-applied-on-rollbacks-and-di.patch (267K) Download Attachment
12345 ... 17