POC: Cleaning up orphaned files using undo logs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
330 messages Options
1 ... 14151617
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

akapila
On Fri, Dec 4, 2020 at 1:50 PM Antonin Houska <[hidden email]> wrote:

>
> Amit Kapila <[hidden email]> wrote:
>
>
> > The earlier version of the patch having all these ideas
> > implemented is attached
> > (Infrastructure-to-execute-pending-undo-actions and
> > Provide-interfaces-to-store-and-fetch-undo-records). The second one
> > has some APIs used by the first one but the main concepts were
> > implemented in the first one
> > (Infrastructure-to-execute-pending-undo-actions). I see that in the
> > current version these can't be used as it is but still it can give us
> > a good start point and we might be able to either re-use some code and
> > or ideas from these patches.
>
> Is there a branch with these patches applied? They reference some functions
> that I don't see in [1]. I'd like to examine if / how my approach can be
> aligned with the current zheap design.
>

Can you once check in the patch-set attached in the email [1]?

[1] - https://www.postgresql.org/message-id/CA%2BhUKG%2BMpzRsZFE7ChhRq-Br5VYYi6mafVQ73Af7ahioWo5o8w%40mail.gmail.com
--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Dmitry Dolgov
In reply to this post by Antonin Houska-2
> On Fri, Dec 04, 2020 at 10:22:42AM +0100, Antonin Houska wrote:
> Amit Kapila <[hidden email]> wrote:
>
> > On Fri, Nov 13, 2020 at 6:02 PM Antonin Houska <[hidden email]> wrote:
> > >
> > > Amit Kapila <[hidden email]> wrote:
> > >
> > > > On Thu, Nov 12, 2020 at 2:45 PM Antonin Houska <[hidden email]> wrote:
> > > >
> > > > If you want to track at undo record level, then won't it lead to
> > > > performance overhead and probably additional WAL overhead considering
> > > > this action needs to be WAL-logged. I think recording at page-level
> > > > might be a better idea.
> > >
> > > I'm not worried about WAL because the undo execution needs to be WAL-logged
> > > anyway - see smgr_undo() in the 0005- part of the patch set. What needs to be
> > > evaluated regarding performance is the (exclusive) locking of the page that
> > > carries the progress information.
> > >
> >
> > That is just for one kind of smgr, think how you will do it for
> > something like zheap. Their idea is to collect all the undo records
> > (unless the undo for a transaction is very large) for one zheap-page
> > and apply them together, so maintaining the status at each undo record
> > level will surely lead to a large amount of additional WAL. See below
> > how and why we have decided to do it differently.
> >
> > > I'm still not sure whether this info should
> > > be on every page or only in the chunk header. In either case, we have a
> > > problem if there are two or more chunks created by different transactions on
> > > the same page, and if more than on of these transactions need to perform
> > > undo. I tend to believe that this should happen rarely though.
> > >
> >
> > I think we need to maintain this information at the transaction level
> > and need to update it after processing a few blocks, at least that is
> > what was decided and implemented earlier. We also need to update it
> > when the log is switched or all the actions of the transaction were
> > applied. The reasoning is that for short transactions it won't matter
> > and for larger transactions, it is good to update it after a few pages
> > to avoid WAL and locking overhead. Also, it is better if we collect
> > the undo in bulk, this is proved to be beneficial for large
> > transactions.
>
> Attached is what I originally did not include in the patch series, see the
> part 0012. I have no better idea so far. The progress information is stored in
> the chunk header.
>
> To avoid too frequent locking, maybe the UpdateLastAppliedRecord() function
> can be modified so it recognizes when it's necessary to update the progress
> info. Also the user (zheap) should think when it should call the function.
> Since I've included 0012 now as a prerequisite for discarding (0013),
> currently it's only necessary to update the progress at undo log chunk
> boundary.
>
> In this version of the patch series I wanted to publish the remaining ideas I
> haven't published yet.

Thanks for the updated patch. As I've mentioned off the list I'm slowly
looking through it with the intent to concentrate on undo progress
tracking. But before I will post anything I want to mention couple of
strange issues I see, otherwise I will forget for sure. Maybe it's
already known, but running several times 'make installcheck' against a
freshly build postgres with the patch applied from time to time I
observe various errors.

This one happens on a crash recovery, seems like
UndoRecordSetXLogBufData has usr_type = USRT_INVALID and is involved in
the replay process:

    TRAP: FailedAssertion("page_offset + this_page_bytes <= uph->ud_insertion_point", File: "undopage.c", Line: 300)
    postgres: startup recovering 000000010000000000000012(ExceptionalCondition+0xa1)[0x558b38b8a350]
    postgres: startup recovering 000000010000000000000012(UndoPageSkipOverwrite+0x0)[0x558b38761b7e]
    postgres: startup recovering 000000010000000000000012(UndoReplay+0xa1d)[0x558b38766f32]
    postgres: startup recovering 000000010000000000000012(XactUndoReplay+0x77)[0x558b38769281]
    postgres: startup recovering 000000010000000000000012(smgr_redo+0x1af)[0x558b387aa7bd]

This one is somewhat similar:

    TRAP: FailedAssertion("page_offset >= SizeOfUndoPageHeaderData", File: "undopage.c", Line: 287)
    postgres: undo worker for database 36893 (ExceptionalCondition+0xa1)[0x5559c90f1350]
    postgres: undo worker for database 36893 (UndoPageOverwrite+0xa6)[0x5559c8cc8ae3]
    postgres: undo worker for database 36893 (UpdateLastAppliedRecord+0xbe)[0x5559c8ccd008]
    postgres: undo worker for database 36893 (smgr_undo+0xa6)[0x5559c8d11989]

There are also here and there messages about not found undo files:

    ERROR:  cannot open undo segment file 'base/undo/000008.0000020000': No such file or directory
    WARNING:  failed to undo transaction

I haven't found out the trigger yet, but got an impression that it
happens after create_table tests.


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Antonin Houska-2
Dmitry Dolgov <[hidden email]> wrote:

> Thanks for the updated patch. As I've mentioned off the list I'm slowly
> looking through it with the intent to concentrate on undo progress
> tracking. But before I will post anything I want to mention couple of
> strange issues I see, otherwise I will forget for sure. Maybe it's
> already known, but running several times 'make installcheck' against a
> freshly build postgres with the patch applied from time to time I
> observe various errors.
>
> This one happens on a crash recovery, seems like
> UndoRecordSetXLogBufData has usr_type = USRT_INVALID and is involved in
> the replay process:
>
>     TRAP: FailedAssertion("page_offset + this_page_bytes <= uph->ud_insertion_point", File: "undopage.c", Line: 300)
>     postgres: startup recovering 000000010000000000000012(ExceptionalCondition+0xa1)[0x558b38b8a350]
>     postgres: startup recovering 000000010000000000000012(UndoPageSkipOverwrite+0x0)[0x558b38761b7e]
>     postgres: startup recovering 000000010000000000000012(UndoReplay+0xa1d)[0x558b38766f32]
>     postgres: startup recovering 000000010000000000000012(XactUndoReplay+0x77)[0x558b38769281]
>     postgres: startup recovering 000000010000000000000012(smgr_redo+0x1af)[0x558b387aa7bd]
>
> This one is somewhat similar:
>
>     TRAP: FailedAssertion("page_offset >= SizeOfUndoPageHeaderData", File: "undopage.c", Line: 287)
>     postgres: undo worker for database 36893 (ExceptionalCondition+0xa1)[0x5559c90f1350]
>     postgres: undo worker for database 36893 (UndoPageOverwrite+0xa6)[0x5559c8cc8ae3]
>     postgres: undo worker for database 36893 (UpdateLastAppliedRecord+0xbe)[0x5559c8ccd008]
>     postgres: undo worker for database 36893 (smgr_undo+0xa6)[0x5559c8d11989]

Well, on repeated run of the test I could also hit the first one. I could fix
it and will post a new version of the patch (along with some other small
changes) this week.

> There are also here and there messages about not found undo files:
>
>     ERROR:  cannot open undo segment file 'base/undo/000008.0000020000': No such file or directory
>     WARNING:  failed to undo transaction

I don't see this one in the log so far, will try again.

Thanks for the report!

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Antonin Houska-2
Antonin Houska <[hidden email]> wrote:

> Dmitry Dolgov <[hidden email]> wrote:
>
> > Thanks for the updated patch. As I've mentioned off the list I'm slowly
> > looking through it with the intent to concentrate on undo progress
> > tracking. But before I will post anything I want to mention couple of
> > strange issues I see, otherwise I will forget for sure. Maybe it's
> > already known, but running several times 'make installcheck' against a
> > freshly build postgres with the patch applied from time to time I
> > observe various errors.
> >
> > This one happens on a crash recovery, seems like
> > UndoRecordSetXLogBufData has usr_type = USRT_INVALID and is involved in
> > the replay process:
> >
> >     TRAP: FailedAssertion("page_offset + this_page_bytes <= uph->ud_insertion_point", File: "undopage.c", Line: 300)
> >     postgres: startup recovering 000000010000000000000012(ExceptionalCondition+0xa1)[0x558b38b8a350]
> >     postgres: startup recovering 000000010000000000000012(UndoPageSkipOverwrite+0x0)[0x558b38761b7e]
> >     postgres: startup recovering 000000010000000000000012(UndoReplay+0xa1d)[0x558b38766f32]
> >     postgres: startup recovering 000000010000000000000012(XactUndoReplay+0x77)[0x558b38769281]
> >     postgres: startup recovering 000000010000000000000012(smgr_redo+0x1af)[0x558b387aa7bd]
> >
> > This one is somewhat similar:
> >
> >     TRAP: FailedAssertion("page_offset >= SizeOfUndoPageHeaderData", File: "undopage.c", Line: 287)
> >     postgres: undo worker for database 36893 (ExceptionalCondition+0xa1)[0x5559c90f1350]
> >     postgres: undo worker for database 36893 (UndoPageOverwrite+0xa6)[0x5559c8cc8ae3]
> >     postgres: undo worker for database 36893 (UpdateLastAppliedRecord+0xbe)[0x5559c8ccd008]
> >     postgres: undo worker for database 36893 (smgr_undo+0xa6)[0x5559c8d11989]
>
> Well, on repeated run of the test I could also hit the first one. I could fix
> it and will post a new version of the patch (along with some other small
> changes) this week.
Attached is the next version. Changes done:

  * Removed the progress tracking and implemented undo discarding in a simpler
    way. Now, instead of maintaining the pointer to the last record applied,
    only a boolean field in the chunk header is set when ROLLBACK is
    done. This helps to determine whether the undo of a non-committed
    transaction can be discarded.

  * Removed the "undo worker" that the previous version only used to apply the
    undo after crash recovery. The startup process does the work now.

  * Umplemented cleanup after crashed CREATE DATABASE and ALTER DATABASE ... SET TABLESPACE.

    BTW, I wonder if this change allows these commands to be executed in a
    transaction block. I think the reason to prohibit that is to minimize the
    window between creation of the files and transaction commit - if the
    server crashes in that window, the new database files survive but the
    catalog changes don't. But maybe there are other reasons. (I don't claim
    it's terribly useful to create database in a transaction block though
    because the client cannot connect to it w/o leaving the current
    transaction.)

  * Reordered the diffs, i.e. moved the discarding in front of the actual
    features.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


undo-20210129.tgz (250K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Bruce Momjian
On Fri, Jan 29, 2021 at 06:30:15PM +0100, Antonin Houska wrote:
> Antonin Houska <[hidden email]> wrote:
> > Well, on repeated run of the test I could also hit the first one. I could fix
> > it and will post a new version of the patch (along with some other small
> > changes) this week.
>
> Attached is the next version. Changes done:

Yikes, this patch is 23k lines, and most of it looks like added lines of
code.  Is this size expected?

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EDB                                      https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Antonin Houska-2
Bruce Momjian <[hidden email]> wrote:

> On Fri, Jan 29, 2021 at 06:30:15PM +0100, Antonin Houska wrote:
> > Antonin Houska <[hidden email]> wrote:
> > > Well, on repeated run of the test I could also hit the first one. I could fix
> > > it and will post a new version of the patch (along with some other small
> > > changes) this week.
> >
> > Attached is the next version. Changes done:
>
> Yikes, this patch is 23k lines, and most of it looks like added lines of
> code.  Is this size expected?

Yes, earlier versions of this patch, e.g. [1], were of comparable size. It's
not really an "ordinary patch".

[1] https://www.postgresql.org/message-id/CA%2BhUKG%2BMpzRsZFE7ChhRq-Br5VYYi6mafVQ73Af7ahioWo5o8w%40mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply | Threaded
Open this post in threaded view
|

RE: POC: Cleaning up orphaned files using undo logs

tsunakawa.takay@fujitsu.com
From: Antonin Houska <[hidden email]>
> not really an "ordinary patch".
>
> [1]
> https://www.postgresql.org/message-id/CA%2BhUKG%2BMpzRsZFE7ChhR
> q-Br5VYYi6mafVQ73Af7ahioWo5o8w%40mail.gmail.com

I'm a bit interested in zheap-related topics.  I'm reading this discussion to see what I can do.  (But this thread is too long... there are still 13,000 lines out of 45,000 lines.)

What's the latest patch set to look at to achieve the undo infrastructure and its would-be first user, orphan file cleanup?  As far as I've read, multiple people posted multiple patch sets, and I don't see how they are related.


Regards
Takayuki Tsunakawa




Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

akapila
On Wed, Feb 3, 2021 at 2:45 PM [hidden email]
<[hidden email]> wrote:

>
> From: Antonin Houska <[hidden email]>
> > not really an "ordinary patch".
> >
> > [1]
> > https://www.postgresql.org/message-id/CA%2BhUKG%2BMpzRsZFE7ChhR
> > q-Br5VYYi6mafVQ73Af7ahioWo5o8w%40mail.gmail.com
>
> I'm a bit interested in zheap-related topics.  I'm reading this discussion to see what I can do.  (But this thread is too long... there are still 13,000 lines out of 45,000 lines.)
>
> What's the latest patch set to look at to achieve the undo infrastructure and its would-be first user, orphan file cleanup?  As far as I've read, multiple people posted multiple patch sets, and I don't see how they are related.
>

I feel it is good to start with the latest patch-set posted by Antonin [1].

[1] - https://www.postgresql.org/message-id/87363.1611941415%40antos

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

RE: POC: Cleaning up orphaned files using undo logs

tsunakawa.takay@fujitsu.com
I'm crawling like a snail to read the patch set.  Below are my first set of review comments, which are all minor.


(1)
+     <indexterm><primary>tablespace</primary><secondary>temporary</secondary></indexterm>

temporary -> undo


(2)
      <term><varname>undo_tablespaces</varname> (<type>string</type>)
+
...
+        The value is a list of names of tablespaces.  When there is more than
+        one name in the list, <productname>PostgreSQL</productname> chooses an
+        arbitrary one.  If the name doesn't correspond to an existing
+        tablespace, the next name is tried, and so on until all names have
+        been tried.  If no valid tablespace is specified, an error is raised.
+        The validation of the name doesn't happen until the first attempt to
+        write undo data.

CREATE privilege needs to be mentioned like temp_tablespaces.


(3)
+        The variable can only be changed before the first statement is
+        executed in a transaction.

Does it include any first statement that doesn't emit undo?

(4)
+      <entry>One row for each undo log, showing current pointers,
+       transactions and backends.
+       See <xref linkend="pg-stat-undo-logs-view"/> for details.

I think this can just be written like "showing usage information about the undo log" just like other statistics views.  That way, we can avoid having to modify this sentence when we want to change the content of the view later.


(5)
+     <entry><structfield>discard</structfield></entry>
+     <entry><type>text</type></entry>
+     <entry>Location of the oldest data in this undo log.</entry>

The name does not match the description intuitively.  Maybe "oldest"?

BTW, how does this information help users?  (I don't mean to say we shouldn't output information that users cannot interpret; other DBMSs output such information probably for technical support staff.)


(6)
+     <entry><structfield>insert</structfield></entry>
+     <entry><type>text</type></entry>
+     <entry>Location where the next data will be written in this undo
+      log.</entry>
...
+     <entry><structfield>end</structfield></entry>
+     <entry><type>text</type></entry>
+     <entry>Location one byte past the end of the allocated physical storage
+      backing this undo log.</entry>

Again, how can these be used?  If they are useful to calculate the amount of used space, shouldn't they be bigint?


(7)
@@ -65,7 +65,7 @@
        <structfield>smgrid</structfield> <type>integer</type>
         </para>
         <para>
-         Block storage manager ID.  0 for regular relation data.</entry>
+         Block storage manager ID.  0 for regular relation data.
         </para></entry>
      </row>

I guess this change is mistakenly included?


(8)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
@@ -216,6 +216,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
 <!ENTITY pgwaldump          SYSTEM "pg_waldump.sgml">
+<!ENTITY pgundodump         SYSTEM "pg_undo_dump.sgml">
 <!ENTITY postgres           SYSTEM "postgres-ref.sgml">

@@ -286,6 +286,7 @@
    &pgtesttiming;
    &pgupgrade;
    &pgwaldump;
+   &pgundodump;
    &postgres;

It looks like this list needs to be ordered alphabetically.  So, the new line is better placed between pg_test_timing and pg_upgrade?


(9)
I don't want to be disliked because of being picky, but maybe pg_undo_dump should be pg_undodump.  Existing commands don't use '_' to separate words after pg_, except for pg_test_fsync and pg_test_timing.


(10)
+   This utility can only be run by the user who installed the server, because
+   it requires read-only access to the data directory.

I guess you copied this from pg_waldump or pg_resetwal, but I'm afraid this should be as follows, which is an excerpt from pg_controldata's page.  (The pages for pg_waldump and pg_resetwal should be fixed in a separate thread.)

This utility can only be run by the user who initialized the cluster because it requires read access to the data directory.


(11)
+    The <option>-m</option> option cannot be used if
+    either <option>-c</option> or <option>-l</option> is used.

-l -> -r
Or, why don't we align option characters with pg_waldump?  pg_waldump uses -r to filter by rmgr.  pg_undodump can output record contents by default like pg_waldump.  Considering pg_dump and pg_dumpall also output all data by default, that seems how PostgreSQL commands behave.


(12)
+   <arg choice="opt"><option>startseg</option><arg choice="opt"><option>endseg</option></arg></arg>

startseg and endseg are not described.


(13)
+Undo files backing undo logs in the default tablespace are stored under
...
+Undo log files contain standard page headers as described in the next section,

Fluctuations in expressions can be seen: undo file and undo log file.  I think the following "undo data file" fits best.  What do you think?

+      <entry><literal>UndoFileRead</literal></entry>
+      <entry>Waiting for a read from an undo data file.</entry>


(14)
+Undo data exists in a 64-bit address space divided into 2^34 undo
+logs, each with a theoretical capacity of 1TB.  The first time a
+backend writes undo, it attaches to an existing undo log whose
+capacity is not yet exhausted and which is not currently being used by
+any other backend; or if no suitable undo log already exists, it
+creates a new one.  To avoid wasting space, each undo log is further
+divided into 1MB segment files, so that segments which are no longer
+needed can be removed (possibly recycling the underlying file by
+renaming it) and segments which are not yet needed do not need to be
+physically created on disk.  An undo segment file has a name like
+<filename>000004.0001200000</filename>, where
+<filename>000004</filename> is the undo log number and
+<filename>0001200000</filename> is the offset of the first byte
+held in the file.

The number of undo logs is not 2^34 but 2^24 (2^64 - 2^40 (1 TB)).


(15) src/backend/access/undo/README
\ No newline at end of file

Let's add a newline.


Regards
Takayuki Tsunakawa



Reply | Threaded
Open this post in threaded view
|

Re: POC: Cleaning up orphaned files using undo logs

Antonin Houska-2
[hidden email] <[hidden email]> wrote:

> I'm crawling like a snail to read the patch set.  Below are my first set of review comments, which are all minor.

Thanks. I will check your comments when I'll be preparing the next version of
the patch.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


1 ... 14151617