Unused files in the database directory after crashed VACUUM FULL


Unused files in the database directory after crashed VACUUM FULL

Hannes Erven
Hi,


I've just had a "VACUUM FULL <table>" crash due to 100% disk usage.
Clearly my fault, I was expecting the new table to be small enough.

After freeing up space, restarting the cluster and issuing another
VACUUM FULL, I noticed that the cluster was way bigger than it should be.
In the base/<db>/ folder, there was a large number of files sharing one
particular filenode number that pg_filenode_relation() could not turn into
a relation. As that number was just a bit smaller than the
pg_relation_filepath() of the table I was working on, I guess these were
the remains of the failed VACUUM FULL operation?
I removed those files, and a VACUUM ANALYZE of the whole database went fine.

So... is this the expected behaviour? Is there a safe procedure to
check for and clean up "unnecessary" files in the cluster directory?


Version:
('PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit')

Table definition:
(
     id serial PRIMARY KEY,
     data bytea,
     parent_id integer NOT NULL
)
The "data" column values are large enough to be TOASTed.
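(For anyone reproducing this: a quick way to confirm that the column really
is TOASTed is to look at the table's TOAST relation. The table name
"mytable" below is a placeholder, since the original name wasn't given.)

```sql
-- Look up the table's TOAST relation and its on-disk size.
-- "mytable" is a placeholder for the actual table name.
SELECT reltoastrelid::regclass AS toast_table,
       pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'mytable' AND reltoastrelid <> 0;
```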


Thanks & best regards,

        -hannes



Re: Unused files in the database directory after crashed VACUUM FULL

Tom Lane
Hannes Erven <[hidden email]> writes:
> I've just had a "VACUUM FULL <table>" crash due to 100% disk usage.
> Clearly my fault, I was expecting the new table to be small enough.

What do you mean by "crash" exactly?  A normal transactional failure
should've cleaned up orphaned files.  I suppose if the kernel decided to
kill -9 the vacuum process, that wouldn't happen --- but that's not
the usual response to out-of-disk-space.

> After freeing up space, restarting the cluster and issuing another
> VACUUM FULL, I noticed that the cluster was way bigger than it should be.
> In the base/<db>/ folder, there was a large number of files with one
> certain number that pg_filenode_relation() could not turn into a
> relation. As that number was just a bit smaller than the
> pg_relation_filepath() of the table I was working on, I guess these were
> the remains of the failed VACUUM FULL operation?
> I removed those files, and a VACUUM ANALYZE of the whole database went fine.

> So... is this the expected behaviour? Is there a safe procedure to
> check for and clean up "unnecessary" files in the cluster directory?

You just described it --- verify with pg_filenode_relation that the
file doesn't correspond to any pg_class entry, then manually delete.
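For instance, before manually removing a file named 612718 from base/<db>/
(the filenode number here is only illustrative), a check along these lines:

```sql
-- NULL means no relation in the current database maps to this filenode
-- (first argument 0 = the database's default tablespace).
SELECT pg_filenode_relation(0, 612718);

-- Cross-check: no pg_class row should claim it either.
SELECT relname FROM pg_class WHERE relfilenode = 612718;
```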

(There is a crash-recovery phase that does something similar, but
I think it's only designed to clean up temporary files.)

                        regards, tom lane


Re: Unused files in the database directory after crashed VACUUM FULL

Andrew Gierth
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 > Hannes Erven <[hidden email]> writes:
 >> I've just had a "VACUUM FULL <table>" crash due to 100% disk usage.
 >> Clearly my fault, I was expecting the new table to be small enough.

 Tom> What do you mean by "crash" exactly? A normal transactional
 Tom> failure should've cleaned up orphaned files. I suppose if the
 Tom> kernel decided to kill -9 the vacuum process, that wouldn't happen
 Tom> --- but that's not the usual response to out-of-disk-space.

The failure mode of this kind we see reported most often is getting a
PANIC failure from inability to extend the WAL.

--
Andrew (irc:RhodiumToad)


Re: Unused files in the database directory after crashed VACUUM FULL

Hannes Erven
Hi again,


On 10.02.19 at 16:41, Tom Lane wrote:
> Hannes Erven <[hidden email]> writes:
>> I've just had a "VACUUM FULL <table>" crash due to 100% disk usage.
>> Clearly my fault, I was expecting the new table to be small enough.
>
> What do you mean by "crash" exactly?  A normal transactional failure
> should've cleaned up orphaned files.  I suppose if the kernel decided to
> kill -9 the vacuum process, that wouldn't happen --- but that's not
> the usual response to out-of-disk-space.


Here's the exact log (the pgadmin process was running the VACUUM FULL):

2019-02-09 23:44:53.516 CET [20341] @/ <> PANIC:  could not write to
file "pg_wal/xlogtemp.20341": No space left on device
2019-02-09 23:44:53.516 CET [19181] @/ <> LOG:  WAL writer process (PID
20341) was terminated by signal 6: Aborted
2019-02-09 23:44:53.516 CET [19181] @/ <> LOG:  terminating any other
active server processes
2019-02-09 23:44:53.516 CET [20816] postgres@IP/db <pgAdmin 4 - DB:db>
WARNING:  terminating connection because of crash of another server process
2019-02-09 23:44:53.516 CET [20816] postgres@IP/db <pgAdmin 4 - DB:db>
DETAIL:  The postmaster has commanded this server process to roll
back the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
2019-02-09 23:44:53.516 CET [20816] postgres@IP/db <pgAdmin 4 - DB:db>
HINT:  In a moment you should be able to reconnect to the database
  and repeat your command.
2019-02-09 23:44:53.597 CET [19181] @/ <> LOG:  all server processes
terminated; reinitializing
2019-02-09 23:44:53.679 CET [11686] @/ <> LOG:  database system was
interrupted; last known up at 2019-02-09 23:44:40 CET
2019-02-09 23:44:58.761 CET [11686] @/ <> LOG:  database system was not
properly shut down; automatic recovery in progress
2019-02-09 23:44:58.807 CET [11686] @/ <> LOG:  redo starts at 1DA/B3400798
2019-02-09 23:44:58.811 CET [11686] @/ <> FATAL:  could not extend file
"base/16400/612716_fsm": No space left on device
2019-02-09 23:44:58.811 CET [11686] @/ <> HINT:  Check free disk space.
2019-02-09 23:44:58.811 CET [11686] @/ <> CONTEXT:  WAL redo at
1DA/B3406310 for Heap/INSERT: off 4


After freeing space and restarting:

2019-02-10 06:28:25.622 CET [14344] @/ <> LOG:  database system was
interrupted while in recovery at 2019-02-09 23:44:58 CET
2019-02-10 06:28:25.622 CET [14344] @/ <> HINT:  This probably means
that some data is corrupted and you will have to use the last backup for
recovery.
2019-02-10 06:28:26.762 CET [14344] @/ <> LOG:  database system was not
properly shut down; automatic recovery in progress
2019-02-10 06:28:26.831 CET [14344] @/ <> LOG:  redo starts at 1DA/B3400798
2019-02-10 06:28:27.624 CET [14344] @/ <> LOG:  redo done at 1DA/D5FFF9D8


>> Is there a safe procedure to
>> check for and clean up "unnecessary" files in the cluster directory?
>
> You just described it --- verify with pg_filenode_relation that the
> file doesn't correspond to any pg_class entry, then manually delete.


OK, I see. I came up with this:

with d0 as (
    select oid
    from pg_database
    where datname = current_database()
), d1 as (
    select pg_ls_dir as fn,
           regexp_match(pg_ls_dir, '^([0-9]+)(.*)$') as match
    from d0, pg_ls_dir('base/' || d0.oid)
    order by 1
), d2 as (
    select fn, match[1] as base, match[2] as ext
    from d1
    where fn not ilike 'pg_%'
), d3 as (
    select d.*, pg_filenode_relation(0, base::oid) as relation
    from d2 d
)
select fn,
       pg_size_pretty((pg_stat_file('base/' || d0.oid || '/' || fn)).size)
from d0, d3
where relation is null;



      fn     | pg_size_pretty
------------+----------------
  612713     | 25 MB
  612718     | 1024 MB
  612718.1   | 1024 MB
  612718.2   | 180 MB
  t11_621351 | 96 kB
  t11_621354 | 0 bytes
  t11_621356 | 8192 bytes
  t11_621357 | 160 kB
  t11_621360 | 0 bytes
  t11_621362 | 8192 bytes


So there's still 612713 and 612718* left to clean up; judging from the
numbers and timestamps, they probably belong to other VACUUM FULLs I
manually canceled in the pgadmin UI during my experiments.

Is this interesting enough to further investigate? If yes, I'll be happy
to work on a test case.
In any case, I've now learned how to check for orphaned files when a
database cluster's on-disk size is way off from the sum of
pg_total_relation_size() over all relations, plus WAL...
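A sketch of that size comparison (assumptions: main forks only, so FSM/VM
forks are not counted, and WAL is outside pg_database_size anyway):

```sql
-- Compare the database's total on-disk size with the sum of the main-fork
-- sizes of all relations; a large unexplained gap hints at orphaned files.
SELECT pg_size_pretty(pg_database_size(current_database())) AS on_disk,
       pg_size_pretty(sum(pg_relation_size(oid))::bigint)   AS relation_sum
FROM pg_class;
```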


Thanks again & best regards,

        -hannes




Re: Unused files in the database directory after crashed VACUUM FULL

Tom Lane
Hannes Erven <[hidden email]> writes:
> On 10.02.19 at 16:41, Tom Lane wrote:
>> What do you mean by "crash" exactly?

> Here's the exact log (the pgadmin process was running the VACUUM FULL):
> 2019-02-09 23:44:53.516 CET [20341] @/ <> PANIC:  could not write to
> file "pg_wal/xlogtemp.20341": No space left on device

Ah, so Andrew was correct: we panicked due to lack of WAL space, and
that explains why the vacuuming process didn't have an opportunity
to delete the files belonging to the uncommitted new relation.

If you're concerned about this sort of thing, you might consider putting
the WAL directory on a separate filesystem from the main database files.

>>> Is there a safe procedure to
>>> check for and clean up "unnecessary" files in the cluster directory?

>> You just described it --- verify with pg_filenode_relation that the
>> file doesn't correspond to any pg_class entry, then manually delete.

> OK, I see. I came up with this:

I think this isn't accounting for files in non-default
tablespaces, but maybe that's not a problem for your usage.

> Is this interesting enough to further investigate?

It's a pretty well-understood dynamic, I believe.  Perhaps we should
try harder to recover cleanly, but I don't know of anyone putting
effort into the case.

                        regards, tom lane


Re: Unused files in the database directory after crashed VACUUM FULL

Thomas Munro
On Mon, Feb 11, 2019 at 10:21 AM Tom Lane <[hidden email]> wrote:
> Ah, so Andrew was correct: we panicked due to lack of WAL space, and
> that explains why the vacuuming process didn't have an opportunity
> to delete the files belonging to the uncommitted new relation.

> It's a pretty well-understood dynamic, I believe.  Perhaps we should
> try harder to recover cleanly, but I don't know of anyone putting
> effort into the case.

FTR I am working on a PG13 patch that records relfilenodes of
uncommitted transactions in undo logs, so it can unlink them reliably,
even if you crash (at the cost of introducing a WAL flush before
creating files).  I haven't specifically studied the VACUUM FULL case
yet, but in principle this is exactly what my project aims to fix.
It's mostly intended as example code to demonstrate the undo log
machinery (defining undo record types, registering undo log action
functions that are invoked during rollback, rolling back aborted but
not-yet-rolled-back transactions at startup, ...) without having to
understand the whole zheap sandwich at once, but it's also a solution
to an age-old problem.  More on that soon.

--
Thomas Munro
http://www.enterprisedb.com