Issue with Postgres process startup after instance restart

Shishir Joshi
Hello,
I recently faced an issue with PG 11 where the VM that the PG process was running on was restarted because of a hardware issue. After the VM restart, the Postgres process failed to start on the first attempt with the error "LOG:  could not open directory "pg_tblspc/16388/PG_11_201809051": No such file or directory", even though that directory was present. On the second attempt it started up without issues. There didn't seem to be any disk corruption, and there were no other errors in the syslog either. Has anyone else faced such an issue, or does anyone have ideas on why this could have occurred?

Re: Issue with Postgres process startup after instance restart

Tom Lane-2
Shishir Joshi <[hidden email]> writes:
> I recently faced an issue with PG 11 where the VM that the PG process was
> running on got restarted because of a hardware issue. After the VM restart,
> the Postgres process failed to start on the 1st attempt with the error "*LOG:
>  could not open directory "pg_tblspc/16388/PG_11_201809051": No such file
> or directory*" even though that directory was present. But on the 2nd
> attempt it started up without issues. There didn't seem to be any disk
> corruption issues and there were no other errors in the syslog either. Has
> anyone else faced such an issue or has any ideas on why this could have
> occurred?

Maybe whatever the tablespace is pointing at wasn't mounted yet?
Slow remote mounts are the bane of PG DBAs --- I can recall at least
one famous incident in which someone's database became totally
corrupt because the NFS mount it was on came up after server start,
leading to the server having a mishmash of files on the NFS server
and files on the local disk, now hidden underneath the mount point.

If this is what your issue was, you got very lucky to escape without
damage.  Suggest adapting your PG server start script to make sure the
mounted file system is present before you allow the server to start.

                        regards, tom lane
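
A minimal sketch of the kind of pre-start check suggested above, written in Python. The function names and the idea of passing in the expected mount point are assumptions made for illustration; they are not part of PostgreSQL or of any standard start script. The check treats a tablespace location as ready only if it exists and actually sits on the file system that is supposed to hold it, rather than on the local disk underneath an unmounted mount point.

```python
import os


def find_mount_point(path):
    """Walk upward from path until a mount point is reached."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def tablespace_ready(location, expected_mount):
    """Return True only if location is a directory on expected_mount.

    Both arguments are hypothetical inputs: location is the directory a
    tablespace points at, expected_mount is where its file system
    should be mounted.
    """
    if not os.path.isdir(location):
        return False
    return find_mount_point(location) == os.path.realpath(expected_mount)
```

A start script could call `tablespace_ready()` for each tablespace location and exit with a non-zero status, without starting the server, if any call returns False.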


Re: Issue with Postgres process startup after instance restart

Shishir Joshi
Hi Tom,
I forgot to mention, but in this case it looks like the mount was completed before the PG process was started up. But we don't have an explicit check in the start script to make sure the file system is present. Thanks for the tip.


Re: Issue with Postgres process startup after instance restart

Laurenz Albe
On Mon, 2020-03-30 at 11:02 +0530, Shishir Joshi wrote:

> On Fri, 27 Mar 2020 at 19:30, Tom Lane <[hidden email]> wrote:
> > Shishir Joshi <[hidden email]> writes:
> > > I recently faced an issue with PG 11 where the VM that the PG process was
> > > running on got restarted because of a hardware issue. After the VM restart,
> > > the Postgres process failed to start on the 1st attempt with the error "*LOG:
> > >  could not open directory "pg_tblspc/16388/PG_11_201809051": No such file
> > > or directory*" even though that directory was present. But on the 2nd
> > > attempt it started up without issues. There didn't seem to be any disk
> > > corruption issues and there were no other errors in the syslog either. Has
> > > anyone else faced such an issue or has any ideas on why this could have
> > > occurred?
> >
> > Maybe whatever the tablespace is pointing at wasn't mounted yet?
> > Slow remote mounts are the bane of PG DBAs --- I can recall at least
> > one famous incident in which someone's database became totally
> > corrupt because the NFS mount it was on came up after server start,
> > leading to the server having a mishmash of files on the NFS server
> > and files on the local disk, now hidden underneath the mount point.
> >
> > If this is what your issue was, you got very lucky to escape without
> > damage.  Suggest adapting your PG server start script to make sure the
> > mounted file system is present before you allow the server to start.
>
> I forgot to mention, but in this case it looks like the mount was completed before
> the PG process was started up. But we don't have an explicit check in the start
> script to make sure the file system is present. Thanks for the tip.

If that is an NFS mount, make sure it is "fg", not "bg".
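
With "bg", a failed first mount attempt is retried in the background, so the boot sequence can carry on before the file system is there. As a rough sketch, assuming the mounts are declared in the usual fstab format, the following Python function lists NFS entries that use "bg". The function name and the sample entries in the usage note are hypothetical.

```python
def nfs_background_mounts(fstab_text):
    """Return the mount points of NFS entries whose options include 'bg'."""
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, options = fields[:4]
        # Matches "nfs" and "nfs4" file system types.
        if fs_type.startswith("nfs") and "bg" in options.split(","):
            flagged.append(mount_point)
    return flagged
```

Passing it the text of the mount table, for example an entry such as `nfs1:/export/pg /mnt/pg_tblspc nfs rw,bg,hard 0 0` (a made-up server and path), returns the mount points that would need to be switched to "fg".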

Also, check that your startup script simply fails if the file system is not
mounted yet, rather than automatically running "initdb".
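
One way to get that fail-fast behaviour, sketched in Python: an initialized data directory contains a PG_VERSION file, so a start script can refuse to continue when that file is missing instead of creating a new cluster. The function names here are illustrative assumptions, not taken from any existing script.

```python
import os


def data_directory_initialized(pgdata):
    """Return True if pgdata contains a PG_VERSION file."""
    return os.path.isfile(os.path.join(pgdata, "PG_VERSION"))


def check_before_start(pgdata):
    """Raise instead of letting the caller fall back to initdb."""
    if not data_directory_initialized(pgdata):
        raise RuntimeError(
            "%s has no PG_VERSION; is the file system mounted? "
            "Refusing to start or initialize." % pgdata
        )
```

The point of raising is that the script stops there: an empty directory under an unmounted mount point is reported as an error rather than silently turned into a fresh cluster.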

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com