Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Neil Chen
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hi, I have tested the feature and it worked well.
One thing that doesn't matter is that the modify here seems unnecessary, right?

> mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
> {
> char   *path;
> - int ret;
> + int ret = 0;
> path = relpath(rnode, forkNum
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Pavel Borisov
One thing that doesn't matter is that the modify here seems unnecessary, right?

> mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
> {
> char     *path;
> -     int                     ret;
> +     int                     ret = 0;
> path = relpath(rnode, forkNum

I suppose it is indeed necessary as otherwise the result of the comparison is not defined in case of 'else' block in the mdunlinkfork() :  
346     else
347     {
348         /* Prevent other backends' fds from holding on to the disk space */
349         do_truncate(path);
.....
356      * Delete any additional segments.
357      */
358     if (ret >= 0) 
----------^^^^^^^

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Neil Chen
Yes, It's my fault. You're right.  

Pavel Borisov <[hidden email]> 于2020年11月19日周四 下午11:55写道:
One thing that doesn't matter is that the modify here seems unnecessary, right?

> mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
> {
> char     *path;
> -     int                     ret;
> +     int                     ret = 0;
> path = relpath(rnode, forkNum

I suppose it is indeed necessary as otherwise the result of the comparison is not defined in case of 'else' block in the mdunlinkfork() :  
346     else
347     {
348         /* Prevent other backends' fds from holding on to the disk space */
349         do_truncate(path);
.....
356      * Delete any additional segments.
357      */
358     if (ret >= 0) 
----------^^^^^^^

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com
 
So in the present logic, *ret* is always 0 if it is not in recovery mode (and other *if* conditions are not satisfied). But when the *if* condition is satisfied, it is possible to skip the deletion of additional segments. 
Considering that our goal is to always try to unlink additional segments, *ret* seems unnecessary here. The code flow looks like:

> if (isRedo || .....)
> {
>     int ret;  /* move to here */
>     ....
> }
> else
> { }
>
> /* Delete any additional segments. */
> if (true)
> ...

Or is there any reason to allow us to skip the attempt to delete additional segments in recovery mode? 
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

David Zhang
In reply to this post by Neil Chen
I verified the patch "v2-0001-Free-disk-space-for-dropped-relations-on-commit.patch" on master branch "0cc99327888840f2bf572303b68438e4caf62de9". It works for me. Below is my test procedure and results.

=== Before the patch ===
#1 from psql console 1, create table and index then insert enough data
postgres=# CREATE TABLE test_tbl ( a int, b text);
postgres=# CREATE INDEX idx_test_tbl on test_tbl (a);
postgres=# INSERT INTO test_tbl SELECT generate_series(1,80000000),'Hello world!';
postgres=# INSERT INTO test_tbl SELECT generate_series(1,80000000),'Hello world!';

#2 check files size
david:12867$ du -h
12G .

#3 from psql console 2, drop the index
postgres=# drop index idx_test_tbl;

#4 check files size in different ways,
david:12867$ du -h
7.8G .
david:12867$ ls -l
...
-rw------- 1 david david          0 Nov 23 20:07 16402
...

$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  25736                  david   45u      REG              259,2          0   12592758 /home/david/sandbox/postgres/pgdata/base/12867/16402 (deleted)
postgres  25736                  david   49u      REG              259,2 1073741824   12592798 /home/david/sandbox/postgres/pgdata/base/12867/16402.1 (deleted)
postgres  25736                  david   53u      REG              259,2 1073741824   12592739 /home/david/sandbox/postgres/pgdata/base/12867/16402.2 (deleted)
postgres  25736                  david   59u      REG              259,2  372604928   12592800 /home/david/sandbox/postgres/pgdata/base/12867/16402.3 (deleted)
...

The index relnode id "16402" displays size "0" from postgres database folder, but when using lsof to check, all 16402.x are still in used by a psql connection except 16402 is set to 0. Check it again after an hour, lsof shows the same results.

=== After the patch ===
Repeat step 1 ~ 4, lsof shows all the index relnode files (in this case, the index relnode id 16389) are removed within about 1 minute.
$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  32707                  david   66u      REG              259,2          0   12592763 /home/david/sandbox/postgres/pgdata/base/12867/16389.1 (deleted)
postgres  32707                  david   70u      REG              259,2          0   12592823 /home/david/sandbox/postgres/pgdata/base/12867/16389.2 (deleted)
postgres  32707                  david   74u      REG              259,2          0   12592805 /home/david/sandbox/postgres/pgdata/base/12867/16389.3 (deleted)
...

One thing interesting for me is that, if the index is created after data records has been inserted, then lsof doesn't show this issue.
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Pavel Borisov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Given we got two other reviews from Neil and David, I think I can finalize my own review and mark the patch as ready for committer if nobody has objections.
Thank you!

Pavel Borisov

The new status of this patch is: Ready for Committer
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Thomas Munro-5
On Wed, Nov 25, 2020 at 8:00 AM Pavel Borisov <[hidden email]> wrote:
> The new status of this patch is: Ready for Committer

Thanks!  One small thing bothered me about the last version of the
patch.  It tried to unlink when ENOENT had already been enountered by
open(2), so COMMIT of a DROP looks like:

openat(AT_FDCWD, "base/14208/16384", O_RDWR) = 54
ftruncate(54, 0)                        = 0
close(54)                               = 0
openat(AT_FDCWD, "base/14208/16384.1", O_RDWR) = -1 ENOENT
unlink("base/14208/16384.1")            = -1 ENOENT
openat(AT_FDCWD, "base/14208/16384_fsm", O_RDWR) = -1 ENOENT
unlink("base/14208/16384_fsm")          = -1 ENOENT
openat(AT_FDCWD, "base/14208/16384_vm", O_RDWR) = -1 ENOENT
unlink("base/14208/16384_vm")           = -1 ENOENT
openat(AT_FDCWD, "base/14208/16384_init", O_RDWR) = -1 ENOENT
unlink("base/14208/16384_init")         = -1 ENOENT

So I fixed that, by adding a return value to do_truncate() and
checking it.  That's the version I plan to commit tomorrow, unless
there are further comments or objections.  I've also attached a
version suitable for REL_11_STABLE and earlier branches (with a name
that cfbot should ignore), where things are slightly different.  In
those branches, the register_forget_request() logic is elsewhere.

While looking at trace output, I figured we should just use
truncate(2) on non-Windows, on the master branch only.  It's not like
it really makes much difference, but I don't see why we shouldn't
allow ourselves to use ancient standardised Unix syscalls when we can.
That'd get us down to just the following when committing a DROP:

truncate("base/14208/16396", 0)         = 0
truncate("base/14208/16396.1", 0)       = -1 ENOENT
truncate("base/14208/16396_fsm", 0)     = -1 ENOENT
truncate("base/14208/16396_vm", 0)      = -1 ENOENT
truncate("base/14208/16396_init", 0)    = -1 ENOENT

v3-0001-Free-disk-space-for-dropped-relations-on-commit.patch (6K) Download Attachment
v3-0001-Free-disk-space-for-dropped-relations-on-commit.patch-REL_11_STABLE (5K) Download Attachment
v3-0002-Use-truncate-2-instead-of-open-ftruncate-close.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Thomas Munro-5
On Mon, Nov 30, 2020 at 6:59 PM Thomas Munro <[hidden email]> wrote:
> On Wed, Nov 25, 2020 at 8:00 AM Pavel Borisov <[hidden email]> wrote:
> > The new status of this patch is: Ready for Committer

> ...  That's the version I plan to commit tomorrow, unless
> there are further comments or objections.  ...

Done, and back-patched.

I thought a bit more about the fact that we fail to unlink
higher-numbered segments in certain error cases, potentially leaving
stray files behind.  As far as I can see, nothing we do in this
code-path is going to be a bullet-proof solution to that problem.  One
simple idea would be for the checkpointer to refuse to unlink segment
0 (thereby allowing the relfilenode to be recycled) until it has
scanned the parent directory for any related files that shouldn't be
there.

> While looking at trace output, I figured we should just use
> truncate(2) on non-Windows, on the master branch only.  It's not like
> it really makes much difference, but I don't see why we shouldn't
> allow ourselves to use ancient standardised Unix syscalls when we can.

Also pushed, but only to master.


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Michael Paquier-2
In reply to this post by Thomas Munro-5
On Mon, Nov 30, 2020 at 06:59:40PM +1300, Thomas Munro wrote:
> So I fixed that, by adding a return value to do_truncate() and
> checking it.  That's the version I plan to commit tomorrow, unless
> there are further comments or objections.  I've also attached a
> version suitable for REL_11_STABLE and earlier branches (with a name
> that cfbot should ignore), where things are slightly different.  In
> those branches, the register_forget_request() logic is elsewhere.

Hmm.  Sorry for arriving late at the party.  But is that really
something suitable for a backpatch?  Sure, it is not optimal to not
truncate all the segments when a transaction dropping a relation
commits, but this was not completely broken either.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Thomas Munro-5
On Tue, Dec 1, 2020 at 3:55 PM Michael Paquier <[hidden email]> wrote:

> On Mon, Nov 30, 2020 at 06:59:40PM +1300, Thomas Munro wrote:
> > So I fixed that, by adding a return value to do_truncate() and
> > checking it.  That's the version I plan to commit tomorrow, unless
> > there are further comments or objections.  I've also attached a
> > version suitable for REL_11_STABLE and earlier branches (with a name
> > that cfbot should ignore), where things are slightly different.  In
> > those branches, the register_forget_request() logic is elsewhere.
>
> Hmm.  Sorry for arriving late at the party.  But is that really
> something suitable for a backpatch?  Sure, it is not optimal to not
> truncate all the segments when a transaction dropping a relation
> commits, but this was not completely broken either.

I felt on balance it was a "bug", since it causes operational
difficulties for people and was clearly not our intended behaviour,
and I announced this intention 6 weeks ago.  Of course I'll be happy
to revert it from the back-branches if that's the consensus.  Any
other opinions?


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

Michael Paquier-2
On Tue, Dec 01, 2020 at 04:06:48PM +1300, Thomas Munro wrote:
> I felt on balance it was a "bug", since it causes operational
> difficulties for people and was clearly not our intended behaviour,
> and I announced this intention 6 weeks ago.

Oops, sorry for missing this discussion for such a long time :/

> Of course I'll be happy to revert it from the back-branches if
> that's the consensus.  Any > other opinions?

If there are no other opinions, I am also fine to rely on your
judgment.
--
Michael

signature.asc (849 bytes) Download Attachment