Speedup of relation deletes during recovery

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
38 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Speedup of relation deletes during recovery

Fujii Masao-2
Hi,

When multiple relations are deleted at the same transaction,
the files of those relations are deleted by one call to smgrdounlinkall(),
which leads to scan whole shared_buffers only one time. OTOH,
during recovery, smgrdounlink() (not smgrdounlinkall()) is called
for each file to delete, which leads to scan shared_buffers multiple times.
Obviously this can cause to increase the WAL replay time very much
especially when shared_buffers is huge.

To alleviate this situation, I'd like to propose to change the recovery
so that it also calls smgrdounlinkall() only one time to delete multiple
relation files. Patch attached. Thought?

Regards,

--
Fujii Masao

speedup_relation_deletes_during_recovery_v1.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Kyotaro HORIGUCHI-2
Hello.

At Fri, 30 Mar 2018 08:31:29 +0900, Fujii Masao <[hidden email]> wrote in <[hidden email]>

> Hi,
>
> When multiple relations are deleted at the same transaction,
> the files of those relations are deleted by one call to smgrdounlinkall(),
> which leads to scan whole shared_buffers only one time. OTOH,
> during recovery, smgrdounlink() (not smgrdounlinkall()) is called
> for each file to delete, which leads to scan shared_buffers multiple times.
> Obviously this can cause to increase the WAL replay time very much
> especially when shared_buffers is huge.
>
> To alleviate this situation, I'd like to propose to change the recovery
> so that it also calls smgrdounlinkall() only one time to delete multiple
> relation files. Patch attached. Thought?

It is obviously a left-over of 279628a0a7. This patch applies the
same change with the patch and looks fine for me. Note that
FinishPreparedTransaction has the same loop over smgrdounlink.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Michael Paquier-2
On Fri, Mar 30, 2018 at 11:19:58AM +0900, Kyotaro HORIGUCHI wrote:

> At Fri, 30 Mar 2018 08:31:29 +0900, Fujii Masao <[hidden email]> wrote in <[hidden email]>
>> When multiple relations are deleted at the same transaction,
>> the files of those relations are deleted by one call to smgrdounlinkall(),
>> which leads to scan whole shared_buffers only one time. OTOH,
>> during recovery, smgrdounlink() (not smgrdounlinkall()) is called
>> for each file to delete, which leads to scan shared_buffers multiple times.
>> Obviously this can cause to increase the WAL replay time very much
>> especially when shared_buffers is huge.
>>
>> To alleviate this situation, I'd like to propose to change the recovery
>> so that it also calls smgrdounlinkall() only one time to delete multiple
>> relation files. Patch attached. Thought?
>
> It is obviously a left-over of 279628a0a7. This patch applies the
> same change with the patch and looks fine for me. Note that
> FinishPreparedTransaction has the same loop over smgrdounlink.
Yeah, I was just going to say the same after looking at Fujii-san's
patch.  This would also cause smgrdounlink() to become unused in the
core code.  So this could just be... Removed?

/me Preparing shelter against incoming wraith.

That's not v11 material at this point in any case.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Speedup of relation deletes during recovery

Tsunakawa, Takayuki
In reply to this post by Fujii Masao-2
From: Fujii Masao [mailto:[hidden email]]

> When multiple relations are deleted at the same transaction, the files of
> those relations are deleted by one call to smgrdounlinkall(), which leads
> to scan whole shared_buffers only one time. OTOH, during recovery,
> smgrdounlink() (not smgrdounlinkall()) is called for each file to delete,
> which leads to scan shared_buffers multiple times.
> Obviously this can cause to increase the WAL replay time very much especially
> when shared_buffers is huge.
>
> To alleviate this situation, I'd like to propose to change the recovery
> so that it also calls smgrdounlinkall() only one time to delete multiple
> relation files. Patch attached. Thought?

Nice catch.  As Horiguchi-san and Michael already commented, the patch looks good.

As a related improvement, the following proposal is effective for shortening WAL replay time of DROP TABLE (and possibly TRUNCATE as well.)  How should we proceed with this?

https://www.postgresql.org/message-id/A1CF58A8CBA14341B3F3AC6A468F18454545E4F3@g01jpexmbyt23


Furthermore, TRUNCATE has a similar and worse issue.  While DROP TABLE scans the shared buffers once for each table, TRUNCATE does that for each fork, resulting in three scans per table.  How about changing the following functions

smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
                                           BlockNumber firstDelBlock)

to

smgrtruncate(SMgrRelation reln, ForkNumber *forknum, BlockNumber *nblocks, int nforks)
DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
                                           BlockNumber *firstDelBlock, int nforks)

to perform the scan only once per table?  If there doesn't seem to be a problem, I think I'll submit the patch next month.  I'd welcome if anyone could do that.


Regards
Takayuki Tsunakawa



Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Jerry Sievers-3
In reply to this post by Fujii Masao-2
Fujii Masao <[hidden email]> writes:

> Hi,
>
> When multiple relations are deleted at the same transaction,
> the files of those relations are deleted by one call to smgrdounlinkall(),
> which leads to scan whole shared_buffers only one time. OTOH,
> during recovery, smgrdounlink() (not smgrdounlinkall()) is called
> for each file to delete, which leads to scan shared_buffers multiple times.
> Obviously this can cause to increase the WAL replay time very much
> especially when shared_buffers is huge.

Wonder if this is the case for streaming standbys replaying truncates
also?

Just couple days ago I was running a pg_upgrade test scenario but did
not reach the point of upgrade yet.

We made snapshots of our large reporting system putting them into a
master and 1-slave streaming replication configuration.

There are hundreds of unlogged tables that we wish to trunc before the
upgrade to save time during the rsyncing of standbys, since a standard
rsync replicates all data which is timeconsumeing and useless sending to
the standbys.

Indeed the tables are already trunc'd on the test master since it too
was a recovered system upon initial start but the truncates took place
anyway since it's part of our framework.  It ran in just a few seconds
on master.

The standby however was $slow replaying these truncates which we noticed
because the upgrade wil not proceed until master and 1 or more standbys
are confirmed all at same checkpoint.

I straced the standby's startup process to find it unlinking lots of
tables, getting -1 on the unlink syscall since the non-init fork files
were already missing.

I can't describe just how slow if was but took minutes and the lines
being output by strace were *not* blowing up my display as happens
generally when any busy process is straced.

And, the test systems were config'd with $large shared buffers of 64G.

Please advise


>
> To alleviate this situation, I'd like to propose to change the recovery
> so that it also calls smgrdounlinkall() only one time to delete multiple
> relation files. Patch attached. Thought?
>
> Regards,

--
Jerry Sievers
Postgres DBA/Development Consulting
e: [hidden email]
p: 312.241.7800

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Jerry Sievers-3
In reply to this post by Fujii Masao-2
Fujii Masao <[hidden email]> writes:

> Hi,
>
> When multiple relations are deleted at the same transaction,
> the files of those relations are deleted by one call to smgrdounlinkall(),
> which leads to scan whole shared_buffers only one time. OTOH,
> during recovery, smgrdounlink() (not smgrdounlinkall()) is called
> for each file to delete, which leads to scan shared_buffers multiple times.
> Obviously this can cause to increase the WAL replay time very much
> especially when shared_buffers is huge.

Forgot to mention version in my TLDR prev reply :-)

                                                                    version                                                                    
------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5.12 on x86_64-pc-linux-gnu (Ubuntu 9.5.12-1.pgdg16.04+1), compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit
(1 row)


>
> To alleviate this situation, I'd like to propose to change the recovery
> so that it also calls smgrdounlinkall() only one time to delete multiple
> relation files. Patch attached. Thought?
>
> Regards,

--
Jerry Sievers
Postgres DBA/Development Consulting
e: [hidden email]
p: 312.241.7800

Reply | Threaded
Open this post in threaded view
|

RE: Speedup of relation deletes during recovery

Tsunakawa, Takayuki
In reply to this post by Jerry Sievers-3
From: Jerry Sievers [mailto:[hidden email]]
> Wonder if this is the case for streaming standbys replaying truncates
> also?

Yes, As I wrote in my previous mail, TRUNCATE is worse than DROP TABLE.

Regards
Takayuki Tsunakawa



Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Fujii Masao-2
In reply to this post by Michael Paquier-2
On Fri, Mar 30, 2018 at 11:46 AM, Michael Paquier <[hidden email]> wrote:

> On Fri, Mar 30, 2018 at 11:19:58AM +0900, Kyotaro HORIGUCHI wrote:
>> At Fri, 30 Mar 2018 08:31:29 +0900, Fujii Masao <[hidden email]> wrote in <[hidden email]>
>>> When multiple relations are deleted at the same transaction,
>>> the files of those relations are deleted by one call to smgrdounlinkall(),
>>> which leads to scan whole shared_buffers only one time. OTOH,
>>> during recovery, smgrdounlink() (not smgrdounlinkall()) is called
>>> for each file to delete, which leads to scan shared_buffers multiple times.
>>> Obviously this can cause to increase the WAL replay time very much
>>> especially when shared_buffers is huge.
>>>
>>> To alleviate this situation, I'd like to propose to change the recovery
>>> so that it also calls smgrdounlinkall() only one time to delete multiple
>>> relation files. Patch attached. Thought?
>>
>> It is obviously a left-over of 279628a0a7. This patch applies the
>> same change with the patch and looks fine for me. Note that
>> FinishPreparedTransaction has the same loop over smgrdounlink.
Thanks for the review! I also changed FinishPreparedTransaction() so that
it uses smgrdounlinkall(). Patch attached.

> Yeah, I was just going to say the same after looking at Fujii-san's
> patch.  This would also cause smgrdounlink() to become unused in the
> core code.  So this could just be... Removed?

Yes, I think. And, I found that smgrdounlinkfork() is also dead code.
Per the discussion [1], this unused function was left intentionally.
But it's still dead code since 2012, so I'd like to remove it. Patch attached.

[1]
https://www.postgresql.org/message-id/1471.1339106082@...

Regards,

--
Fujii Masao

speedup_relation_deletes_during_recovery_v2.patch (4K) Download Attachment
remove_smgrdounlink_v1.patch (4K) Download Attachment
remove_smgrdounlinkfork_v1.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Fujii Masao-2
In reply to this post by Tsunakawa, Takayuki
On Fri, Mar 30, 2018 at 12:18 PM, Tsunakawa, Takayuki
<[hidden email]> wrote:

> From: Fujii Masao [mailto:[hidden email]]
>> When multiple relations are deleted at the same transaction, the files of
>> those relations are deleted by one call to smgrdounlinkall(), which leads
>> to scan whole shared_buffers only one time. OTOH, during recovery,
>> smgrdounlink() (not smgrdounlinkall()) is called for each file to delete,
>> which leads to scan shared_buffers multiple times.
>> Obviously this can cause to increase the WAL replay time very much especially
>> when shared_buffers is huge.
>>
>> To alleviate this situation, I'd like to propose to change the recovery
>> so that it also calls smgrdounlinkall() only one time to delete multiple
>> relation files. Patch attached. Thought?
>
> Nice catch.  As Horiguchi-san and Michael already commented, the patch looks good.
>
> As a related improvement, the following proposal is effective for shortening WAL replay time of DROP TABLE (and possibly TRUNCATE as well.)  How should we proceed with this?
>
> https://www.postgresql.org/message-id/A1CF58A8CBA14341B3F3AC6A468F18454545E4F3@g01jpexmbyt23
>
>
> Furthermore, TRUNCATE has a similar and worse issue.  While DROP TABLE scans the shared buffers once for each table, TRUNCATE does that for each fork, resulting in three scans per table.  How about changing the following functions
>
> smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
> DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
>                                            BlockNumber firstDelBlock)
>
> to
>
> smgrtruncate(SMgrRelation reln, ForkNumber *forknum, BlockNumber *nblocks, int nforks)
> DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
>                                            BlockNumber *firstDelBlock, int nforks)
>
> to perform the scan only once per table?  If there doesn't seem to be a problem, I think I'll submit the patch next month.  I'd welcome if anyone could do that.

Yeah, it's worth working on this problem. To decrease the number of scans of
shared_buffers, you would need to change the order of truncations of files and
WAL logging. In RelationTruncate(), currently WAL is logged after FSM and VM
are truncated. IOW, with the patch, FSM and VM would need to be truncated after
WAL logging. You would need to check whether this reordering is valid.

Regards,

--
Fujii Masao

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Michael Paquier-2
In reply to this post by Fujii Masao-2
On Wed, Apr 18, 2018 at 12:46:58AM +0900, Fujii Masao wrote:
> Yes, I think. And, I found that smgrdounlinkfork() is also dead code.
> Per the discussion [1], this unused function was left intentionally.
> But it's still dead code since 2012, so I'd like to remove it. Patch attached.

Indeed, it's close to six years and the status is the same.  So let's
drop it.  I have been surrounding the area to see if any modules
actually use those, particularly on github, but I could not find
callers.

The patch looks logically fine to me.  In your first message, you
mentioned that the replay time increased a lot.  Do you have numbers to
share with some large settings of shared_buffers?

It would be better to wait for v12 branch to open before pushing
anything, as the focus is on stabililizing things on v11.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Fujii Masao-2
On Wed, Apr 18, 2018 at 10:44 AM, Michael Paquier <[hidden email]> wrote:

> On Wed, Apr 18, 2018 at 12:46:58AM +0900, Fujii Masao wrote:
>> Yes, I think. And, I found that smgrdounlinkfork() is also dead code.
>> Per the discussion [1], this unused function was left intentionally.
>> But it's still dead code since 2012, so I'd like to remove it. Patch attached.
>
> Indeed, it's close to six years and the status is the same.  So let's
> drop it.  I have been surrounding the area to see if any modules
> actually use those, particularly on github, but I could not find
> callers.
>
> The patch looks logically fine to me.  In your first message, you
> mentioned that the replay time increased a lot.  Do you have numbers to
> share with some large settings of shared_buffers?

No. But after my colleague truncated more than one hundred tables on
the server with shared_buffers = 300GB, the recovery could not finish
even after 10 minutes since the startup of the recovery. So I had to
shutdown the server immediately, set shared_buffers to very small
temporarily and start the server to cause the recovery to finish soon.

> It would be better to wait for v12 branch to open before pushing
> anything, as the focus is on stabililizing things on v11.

Yes, so I added this to next CommitFest.

Regards,

--
Fujii Masao

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Michael Paquier-2
On Thu, Apr 19, 2018 at 01:52:26AM +0900, Fujii Masao wrote:
> No. But after my colleague truncated more than one hundred tables on
> the server with shared_buffers = 300GB, the recovery could not finish
> even after 10 minutes since the startup of the recovery. So I had to
> shutdown the server immediately, set shared_buffers to very small
> temporarily and start the server to cause the recovery to finish soon.

Hm, OK.  Here are simple functions to create and drop many relations in
a single transaction:
create or replace function create_tables(numtabs int)
returns void as $$
declare query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'create table tab_' || i::text || ' (a int);';
    execute query_string;
  end loop;
end;
$$ language plpgsql;

create or replace function drop_tables(numtabs int)
returns void as $$
declare query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'drop table tab_' || i::text;
    execute query_string;
  end loop;
end;
$$ language plpgsql;

With a server running 8GB of shared buffers (you likely need to increase
max_locks_per_transaction) and 10k relations created and dropped, it
took 50 seconds to finish redo of roughly 4 segments:
2018-04-19 11:17:15 JST [7472]: [14-1] db=,user=,app=,client= LOG:  redo
starts at 0/15DF2E8
2018-04-19 11:18:05 JST [7472]: [15-1] db=,user=,app=,client= LOG:
invalid record length at 0/4E46160: wanted 24, got 0
2018-04-19 11:18:05 JST [7472]: [16-1] db=,user=,app=,client= LOG:  redo
done at 0/4A7C6E

Then with your patch it took... Barely 1 second.
2018-04-19 11:20:33 JST [11690]: [14-1] db=,user=,app=,client= LOG:
redo starts at 0/15DF2E8
2018-04-19 11:20:34 JST [11690]: [15-1] db=,user=,app=,client= LOG:
invalid record length at 0/4E299B0: wanted 24, got 0
2018-04-19 11:20:34 JST [11690]: [16-1] db=,user=,app=,client= LOG:
redo done at 0/4E29978

Looking at profiles with perf I can also see that 95% of the time is
spent in DropRelFileNodesAllBuffers which is where the startup process
spends most of its CPU.  So HEAD is booh.  And your patch is excellent
in this context.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Speedup of relation deletes during recovery

Tsunakawa, Takayuki
In reply to this post by Fujii Masao-2
From: Fujii Masao [mailto:[hidden email]]
> Yeah, it's worth working on this problem. To decrease the number of scans
> of
> shared_buffers, you would need to change the order of truncations of files
> and
> WAL logging. In RelationTruncate(), currently WAL is logged after FSM and
> VM
> are truncated. IOW, with the patch, FSM and VM would need to be truncated
> after
> WAL logging. You would need to check whether this reordering is valid.

Sure, thank you for advice.

Takayuki Tsunakawa

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Andres Freund
In reply to this post by Fujii Masao-2
Hi,

We just had a customer hit this issue. I kind of wonder whether this
shouldn't be backpatched: Currently the execution on the primary is
O(NBuffers * log(ndrels)) whereas it's O(NBuffers * ndrels) on the
standby - with a lot higher constants to boot.  That means it's very
easy to get into situations where the standy starts to lag behind very significantly.

> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -1445,6 +1445,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
>   int ndelrels;
>   SharedInvalidationMessage *invalmsgs;
>   int i;
> + SMgrRelation *srels = NULL;
>  
>   /*
>   * Validate the GID, and lock the GXACT to ensure that two backends do not
> @@ -1534,13 +1535,16 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
>   delrels = abortrels;
>   ndelrels = hdr->nabortrels;
>   }
> +
> + srels = palloc(sizeof(SMgrRelation) * ndelrels);
>   for (i = 0; i < ndelrels; i++)
> - {
> - SMgrRelation srel = smgropen(delrels[i], InvalidBackendId);
> + srels[i] = smgropen(delrels[i], InvalidBackendId);
>  
> - smgrdounlink(srel, false);
> - smgrclose(srel);
> - }
> + smgrdounlinkall(srels, ndelrels, false);
> +
> + for (i = 0; i < ndelrels; i++)
> + smgrclose(srels[i]);
> + pfree(srels);

This code is now duplicated three times - shouldn't we just add a
function that encapsulates dropping relations in a commit/abort record?

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Teodor Sigaev
> We just had a customer hit this issue. I kind of wonder whether this
> shouldn't be backpatched: Currently the execution on the primary is
> O(NBuffers * log(ndrels)) whereas it's O(NBuffers * ndrels) on the
> standby - with a lot higher constants to boot.  That means it's very
> easy to get into situations where the standy starts to lag behind very significantly.
+1, we faced with that too
--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Andres Freund
In reply to this post by Andres Freund
Hi,

On 2018-06-15 10:45:04 -0700, Andres Freund wrote:

> > +
> > + srels = palloc(sizeof(SMgrRelation) * ndelrels);
> >   for (i = 0; i < ndelrels; i++)
> > - {
> > - SMgrRelation srel = smgropen(delrels[i], InvalidBackendId);
> > + srels[i] = smgropen(delrels[i], InvalidBackendId);
> >  
> > - smgrdounlink(srel, false);
> > - smgrclose(srel);
> > - }
> > + smgrdounlinkall(srels, ndelrels, false);
> > +
> > + for (i = 0; i < ndelrels; i++)
> > + smgrclose(srels[i]);
> > + pfree(srels);

Hm. This will probably cause another complexity issue. If we just
smgropen() the relation will be unowned. Which means smgrclose() will
call remove_from_unowned_list(), which is O(open-relations). Which means
this change afaict creates a new O(ndrels^2) behaviour?

It seems like the easiest fix for that would be to have a local
SMgrRelationData in that loop, that we assign the relations to?  That's
a bit hacky, but afaict should work?

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Fujii Masao-2
On Sat, Jun 16, 2018 at 3:28 AM, Andres Freund <[hidden email]> wrote:

> Hi,
>
> On 2018-06-15 10:45:04 -0700, Andres Freund wrote:
>> > +
>> > +   srels = palloc(sizeof(SMgrRelation) * ndelrels);
>> >     for (i = 0; i < ndelrels; i++)
>> > -   {
>> > -           SMgrRelation srel = smgropen(delrels[i], InvalidBackendId);
>> > +           srels[i] = smgropen(delrels[i], InvalidBackendId);
>> >
>> > -           smgrdounlink(srel, false);
>> > -           smgrclose(srel);
>> > -   }
>> > +   smgrdounlinkall(srels, ndelrels, false);
>> > +
>> > +   for (i = 0; i < ndelrels; i++)
>> > +           smgrclose(srels[i]);
>> > +   pfree(srels);
>
> Hm. This will probably cause another complexity issue. If we just
> smgropen() the relation will be unowned. Which means smgrclose() will
> call remove_from_unowned_list(), which is O(open-relations). Which means
> this change afaict creates a new O(ndrels^2) behaviour?
>
> It seems like the easiest fix for that would be to have a local
> SMgrRelationData in that loop, that we assign the relations to?  That's
> a bit hacky, but afaict should work?

The easier (but tricky) fix for that is to call smgrclose() for
each SMgrRelation in the reverse order. That is, we should do

   for (i = ndelrels - 1; i >= 0; i--)
           smgrclose(srels[i]);

instead of

   for (i = 0; i < ndelrels; i++)
           smgrclose(srels[i]);

Since add_to_unowned_list() always adds the entry into the head of
the list, each SMgrRelation entry is added into the "unowned" list in
descending order of its identifier, i.e., SMgrRelation entry with larger
identifier should be placed ahead of one with smaller identifier.
So if we calls remove_from_unowed_list() in descending order of
SMgrRelation entry's identifier, the performance of
remove_from_unowned_list()'s search should O(1). IOW, we can
immediately find the SMgrRelation entry to remove, at the head of
the list.

Regards,

--
Fujii Masao

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Fujii Masao-2
In reply to this post by Teodor Sigaev
On Sat, Jun 16, 2018 at 2:54 AM, Teodor Sigaev <[hidden email]> wrote:
>> We just had a customer hit this issue. I kind of wonder whether this
>> shouldn't be backpatched: Currently the execution on the primary is
>> O(NBuffers * log(ndrels)) whereas it's O(NBuffers * ndrels) on the
>> standby - with a lot higher constants to boot.  That means it's very
>> easy to get into situations where the standy starts to lag behind very
>> significantly.
>
> +1, we faced with that too

+1 to back-patch. As Horiguchi-san pointed out, this is basically
the fix for oversight of commit 279628a0a7, not new feature.

Regards,

--
Fujii Masao

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Andres Freund
In reply to this post by Fujii Masao-2
On 2018-06-19 03:06:54 +0900, Fujii Masao wrote:

> On Sat, Jun 16, 2018 at 3:28 AM, Andres Freund <[hidden email]> wrote:
> > Hi,
> >
> > On 2018-06-15 10:45:04 -0700, Andres Freund wrote:
> >> > +
> >> > +   srels = palloc(sizeof(SMgrRelation) * ndelrels);
> >> >     for (i = 0; i < ndelrels; i++)
> >> > -   {
> >> > -           SMgrRelation srel = smgropen(delrels[i], InvalidBackendId);
> >> > +           srels[i] = smgropen(delrels[i], InvalidBackendId);
> >> >
> >> > -           smgrdounlink(srel, false);
> >> > -           smgrclose(srel);
> >> > -   }
> >> > +   smgrdounlinkall(srels, ndelrels, false);
> >> > +
> >> > +   for (i = 0; i < ndelrels; i++)
> >> > +           smgrclose(srels[i]);
> >> > +   pfree(srels);
> >
> > Hm. This will probably cause another complexity issue. If we just
> > smgropen() the relation will be unowned. Which means smgrclose() will
> > call remove_from_unowned_list(), which is O(open-relations). Which means
> > this change afaict creates a new O(ndrels^2) behaviour?
> >
> > It seems like the easiest fix for that would be to have a local
> > SMgrRelationData in that loop, that we assign the relations to?  That's
> > a bit hacky, but afaict should work?
>
> The easier (but tricky) fix for that is to call smgrclose() for
> each SMgrRelation in the reverse order. That is, we should do
>
>    for (i = ndelrels - 1; i >= 0; i--)
>            smgrclose(srels[i]);
>
> instead of
>
>    for (i = 0; i < ndelrels; i++)
>            smgrclose(srels[i]);

We could do that - but add_to_unowned_list() is actually a bottleneck in
other places during recovery too. We pretty much never (outside of
dropping relations / databases) close opened relations during recovery -
which is obviously problematic since nearly all of the entries are
unowned.  I've come to the conclusion that we should have a global
variable that just disables adding anything to the global lists.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Speedup of relation deletes during recovery

Thomas Munro-3
In reply to this post by Fujii Masao-2
On Tue, Jun 19, 2018 at 6:13 AM, Fujii Masao <[hidden email]> wrote:

> On Sat, Jun 16, 2018 at 2:54 AM, Teodor Sigaev <[hidden email]> wrote:
>>> We just had a customer hit this issue. I kind of wonder whether this
>>> shouldn't be backpatched: Currently the execution on the primary is
>>> O(NBuffers * log(ndrels)) whereas it's O(NBuffers * ndrels) on the
>>> standby - with a lot higher constants to boot.  That means it's very
>>> easy to get into situations where the standy starts to lag behind very
>>> significantly.
>>
>> +1, we faced with that too
>
> +1 to back-patch. As Horiguchi-san pointed out, this is basically
> the fix for oversight of commit 279628a0a7, not new feature.

+1

--
Thomas Munro
http://www.enterprisedb.com

12