PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

reiner peterke-2
Hi All,

We build Postgres on Power and x86. With the latest Postgres 11 release (11.2) we get an error on
power8 ppc64le (Red Hat and CentOS).  There is no error on SUSE on power8.

There is no error on x86_64 (RH, CentOS and SUSE).

From the log file:
2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv6 address "::", port 5432
2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-04-09 12:30:10 UTC   pid:204 xid:0 ip: LOG:  database system was shut down at 2019-04-09 12:27:09 UTC
2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  database system is ready to accept connections
2019-04-09 12:31:46 UTC   pid:203 xid:0 ip: LOG:  received SIGHUP, reloading configuration files
2019-04-09 12:35:10 UTC   pid:205 xid:0 ip: PANIC:  could not flush dirty data: Operation not permitted
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  checkpointer process (PID 205) was terminated by signal 6: Aborted
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  terminating any other active server processes
2019-04-09 12:35:10 UTC   pid:208 xid:0 ip: WARNING:  terminating connection because of crash of another server process
2019-04-09 12:35:10 UTC   pid:208 xid:0 ip: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-04-09 12:35:10 UTC   pid:208 xid:0 ip: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  all server processes terminated; reinitializing
2019-04-09 12:35:10 UTC   pid:224 xid:0 ip: LOG:  database system was interrupted; last known up at 2019-04-09 12:30:10 UTC
2019-04-09 12:35:10 UTC   pid:224 xid:0 ip: PANIC:  could not flush dirty data: Operation not permitted
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  startup process (PID 224) was terminated by signal 6: Aborted
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  aborting startup due to startup process failure
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  database system is shut down

Output of pg_config:

BINDIR = /usr/local/postgres/11/bin
DOCDIR = /usr/local/postgres/11/share/doc
HTMLDIR = /usr/local/postgres/11/share/doc
INCLUDEDIR = /usr/local/postgres/11/include
PKGINCLUDEDIR = /usr/local/postgres/11/include
INCLUDEDIR-SERVER = /usr/local/postgres/11/include/server
LIBDIR = /usr/local/postgres/11/lib
PKGLIBDIR = /usr/local/postgres/11/lib
LOCALEDIR = /usr/local/postgres/11/share/locale
MANDIR = /usr/local/postgres/11/share/man
SHAREDIR = /usr/local/postgres/11/share
SYSCONFDIR = /usr/local/postgres/etc
PGXS = /usr/local/postgres/11/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-tclconfig=/usr/lib64' '--with-perl' '--with-python' '--with-tcl' '--with-openssl' '--with-pam' '--with-gssapi' '--enable-nls' '--with-libxml' '--with-libxslt' '--with-ldap' '--prefix=/usr/local/postgres/11' 'CFLAGS=-O3 -g -pipe -Wall -D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -m64 -mcpu=power8 -mtune=power8 -DLINUX_OOM_SCORE_ADJ=0' '--with-libs=/usr/lib' '--with-includes=/usr/include' '--with-uuid=e2fs' '--sysconfdir=/usr/local/postgres/etc' '--with-llvm' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -g -pipe -Wall -D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -m64 -mcpu=power8 -mtune=power8 -DLINUX_OOM_SCORE_ADJ=0
CFLAGS_SL = -fPIC
LDFLAGS = -L/usr/local/lib -L/usr/lib -Wl,--as-needed -Wl,-rpath,'/usr/local/postgres/11/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 11.2

I get the feeling this is related to the fsync() issue.
Why is it happening on Power RH and CentOS, but not on the other platforms?

Let me know if I need to provide any more information.

Reiner


Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

Andres Freund
Hi,

On 2019-04-12 20:04:00 +0200, reiner peterke wrote:
> We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
> power8 ppc64le (Redhat and CentOS).  No error on SUSE on power8

> 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv4 address "0.0.0.0", port 5432
> 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv6 address "::", port 5432
> 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2019-04-09 12:30:10 UTC   pid:204 xid:0 ip: LOG:  database system was shut down at 2019-04-09 12:27:09 UTC
> 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  database system is ready to accept connections
> 2019-04-09 12:31:46 UTC   pid:203 xid:0 ip: LOG:  received SIGHUP, reloading configuration files
> 2019-04-09 12:35:10 UTC   pid:205 xid:0 ip: PANIC:  could not flush dirty data: Operation not permitted
> 2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  checkpointer process (PID 205) was terminated by signal 6: Aborted

Any chance you can strace this? Because I don't understand how you'd get
a permission error here.


> I get the feeling this is related to the fsync() issue.
> why is it happening on Power RH and CentOS, but not on the other platforms?

Yea, the PANIC is due to various OSs, including Linux, basically feeling
free to discard any dirty data after any integrity-related calls fail
(we could narrow it down, but it's hard, given the variability between
versions). That is, if they signal such issues at all :(
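
The failure mode described above can be illustrated with a toy model. This is
only a sketch, not kernel or PostgreSQL code: `ToyPageCache` and its methods
are hypothetical names, and the model assumes a kernel that reports a
writeback error once and then discards the affected dirty pages. Under that
assumption, retrying the flush appears to succeed even though the data never
reached disk, which is why a retry cannot be trusted.

```python
import errno


class ToyPageCache:
    """Toy page cache that drops dirty data when writeback fails.

    Hypothetical illustration only; real kernels differ between versions.
    """

    def __init__(self, fail_next_writeback=False):
        self.dirty = {}  # block number -> data not yet on "disk"
        self.disk = {}   # block number -> data safely written
        self.fail_next_writeback = fail_next_writeback

    def write(self, block, data):
        """Buffer a write; nothing is durable yet."""
        self.dirty[block] = data

    def fsync(self):
        """Return 0 on success or an errno value on failure."""
        if self.fail_next_writeback:
            # The error is reported once and the dirty pages are discarded.
            self.fail_next_writeback = False
            self.dirty.clear()
            return errno.EIO
        self.disk.update(self.dirty)
        self.dirty.clear()
        return 0


cache = ToyPageCache(fail_next_writeback=True)
cache.write(1, b"important row")
first = cache.fsync()   # fails: the buffered data is dropped
second = cache.fsync()  # reports success, but there is nothing left to write
print(first == errno.EIO, second == 0, 1 in cache.disk)  # True True False
```

In this model the only safe reaction to the first failure is to stop and
recover from the WAL, which is what the PANIC forces.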

Greetings,

Andres Freund


Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2019-04-12 20:04:00 +0200, reiner peterke wrote:
>> We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
>> power8 ppc64le (Redhat and CentOS).  No error on SUSE on power8

> Any chance you can strace this? Because I don't understand how you'd get
> a permission error here.

What kind of filesystem are the database files on?

                        regards, tom lane


Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

Thomas Munro-5
In reply to this post by Andres Freund
On Sat, Apr 13, 2019 at 7:23 AM Andres Freund <[hidden email]> wrote:
> On 2019-04-12 20:04:00 +0200, reiner peterke wrote:
> > We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
> > power8 ppc64le (Redhat and CentOS).  No error on SUSE on power8

Huh,  I wonder what is different.  I don't see this on EDB's CentOS
7.1 POWER8 system with an XFS filesystem.  I ran it under strace -f
and saw this:

[pid 51614] sync_file_range2(0x19, 0x2, 0x8000, 0x2000, 0x2, 0x8) = 0
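
The raw strace line above can be decoded by hand. The sketch below assumes the
argument order documented in the sync_file_range(2) man page for
`sync_file_range2()` (fd, flags, offset, nbytes) and assumes that the two
trailing values strace printed are not meaningful arguments. The helper name
`decode_sync_file_range2` is hypothetical; the flag values are the ones
defined in the Linux headers.

```python
# Flag bits for sync_file_range(), as defined in <linux/fs.h>.
SYNC_FILE_RANGE_WAIT_BEFORE = 1
SYNC_FILE_RANGE_WRITE = 2
SYNC_FILE_RANGE_WAIT_AFTER = 4

FLAG_NAMES = {
    SYNC_FILE_RANGE_WAIT_BEFORE: "SYNC_FILE_RANGE_WAIT_BEFORE",
    SYNC_FILE_RANGE_WRITE: "SYNC_FILE_RANGE_WRITE",
    SYNC_FILE_RANGE_WAIT_AFTER: "SYNC_FILE_RANGE_WAIT_AFTER",
}


def decode_sync_file_range2(raw_args):
    """Decode the first four raw values as (fd, flags, offset, nbytes)."""
    fd, flags, offset, nbytes = raw_args[:4]
    names = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    return {"fd": fd, "flags": names, "offset": offset, "nbytes": nbytes}


call = decode_sync_file_range2([0x19, 0x2, 0x8000, 0x2000, 0x2, 0x8])
print(call)
# {'fd': 25, 'flags': ['SYNC_FILE_RANGE_WRITE'], 'offset': 32768, 'nbytes': 8192}
```

Read this way, the call starts writeback of 8192 bytes at offset 32768 without
either of the waiting flags, and 8192 bytes matches PostgreSQL's default block
size.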

> > 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv4 address "0.0.0.0", port 5432
> > 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv6 address "::", port 5432
> > 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
> > 2019-04-09 12:30:10 UTC   pid:204 xid:0 ip: LOG:  database system was shut down at 2019-04-09 12:27:09 UTC
> > 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  database system is ready to accept connections
> > 2019-04-09 12:31:46 UTC   pid:203 xid:0 ip: LOG:  received SIGHUP, reloading configuration files
> > 2019-04-09 12:35:10 UTC   pid:205 xid:0 ip: PANIC:  could not flush dirty data: Operation not permitted
> > 2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  checkpointer process (PID 205) was terminated by signal 6: Aborted
>
> Any chance you can strace this? Because I don't understand how you'd get
> a permission error here.

Me neither.  I hacked my tree so that it would use the msync() version
instead of the sync_file_range() version but that worked too.

--
Thomas Munro
https://enterprisedb.com


Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

reiner peterke-2


sent by smoke signals at great danger to my self.

> On 12 Apr 2019, at 23:16, Thomas Munro <[hidden email]> wrote:
>
>> On Sat, Apr 13, 2019 at 7:23 AM Andres Freund <[hidden email]> wrote:
>>> On 2019-04-12 20:04:00 +0200, reiner peterke wrote:
>>> We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
>>> power8 ppc64le (Redhat and CentOS).  No error on SUSE on power8
>
> Huh,  I wonder what is different.  I don't see this on EDB's CentOS
> 7.1 POWER8 system with an XFS filesystem.  I ran it under strace -f
> and saw this:
>
> [pid 51614] sync_file_range2(0x19, 0x2, 0x8000, 0x2000, 0x2, 0x8) = 0
>
>>> 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv4 address "0.0.0.0", port 5432
>>> 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv6 address "::", port 5432
>>> 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
>>> 2019-04-09 12:30:10 UTC   pid:204 xid:0 ip: LOG:  database system was shut down at 2019-04-09 12:27:09 UTC
>>> 2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  database system is ready to accept connections
>>> 2019-04-09 12:31:46 UTC   pid:203 xid:0 ip: LOG:  received SIGHUP, reloading configuration files
>>> 2019-04-09 12:35:10 UTC   pid:205 xid:0 ip: PANIC:  could not flush dirty data: Operation not permitted
>>> 2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  checkpointer process (PID 205) was terminated by signal 6: Aborted
>>
>> Any chance you can strace this? Because I don't understand how you'd get
>> a permission error here.
>
> Me neither.  I hacked my tree so that it would use the msync() version
> instead of the sync_file_range() version but that worked too.
>
> --
> Thomas Munro
> https://enterprisedb.com

I forgot to mention that this is happening in a Docker container.
I want to test it on a VM to see if it is container related. I am sick at the moment, so I'm unable to run the test right now.

Reiner

Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

Justin Pryzby
In reply to this post by reiner peterke-2
On Fri, Apr 12, 2019 at 08:04:00PM +0200, reiner peterke wrote:
> We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
> power8 ppc64le (Redhat and CentOS).  No error on SUSE on power8
>
> No error on x86_64 (RH, Centos and  SUSE)

So there's an error on power8 with RH but not SUSE.

What kernel versions are used on the systems that succeed and on those that fail?

Justin


Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

Thomas Munro-5
In reply to this post by reiner peterke-2
On Mon, Apr 15, 2019 at 7:57 PM <[hidden email]> wrote:
> I forgot to mention that this is happening in a docker container.

Huh, so there may be some configuration of Linux container that can
fail here with EPERM, even though that error does not appear in
the man page, and doesn't make much intuitive sense.  Would be good to
figure out how that happens.

If we could somehow confirm* that sync_file_range() with the
non-waiting flags we are using is non-destructive of error state, as
Andres speculated (that is, it cannot eat the only error report we're
ever going to get to tell us that buffered dirty data may have been
dropped), then I suppose we could just remove the data_sync_elevel()
promotion here.  As with the WSL case (before the PANIC commit and the
subsequent don't-repeat-the-warning-forever patch), a user of this
posited EPERM-generating container configuration would then get
repeated warnings in the log forever (as they presumably did before).
Repeated WARNING messages are probably OK here, I think... I mean, if,
say, someone complains that FlubOS's Linux emulation fails here with
EIEIO, I'd say they should put up with the warnings and complain over
on the flub-hackers list, or whatever, and I'd say the same for
containers that generate EPERM: either the man page or the container
technology needs work.

But... I still think we should try to avoid making decisions based on
knowledge of kernel implementation details, if it can be avoided.  I'd
probably rather treat EPERM explicitly differently (and eventually
EIEIO too, if a report comes in) than drop the current paranoid coding
completely.
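
The "treat EPERM explicitly differently" idea can be sketched as a small
policy function. This is a hedged illustration and not PostgreSQL's actual
code: `Level` and `flush_hint_elevel` are hypothetical names, the numeric
level values are arbitrary, and `data_sync_retry` merely mirrors the setting
of the same name introduced alongside the PANIC behaviour. The sketch assumes
the flush hint's base level is WARNING, as the discussion of repeated warnings
suggests.

```python
import errno
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical log levels; the numeric values are arbitrary."""
    WARNING = 1
    PANIC = 2


def flush_hint_elevel(err, data_sync_retry=False):
    """Pick the log level for a failed flush hint.

    EPERM is assumed to mean "this environment refuses the call" rather
    than "dirty data may have been lost", so it stays a WARNING. Every
    other error keeps the paranoid promotion to PANIC unless retries
    were explicitly allowed.
    """
    if err == errno.EPERM:
        return Level.WARNING
    return Level.WARNING if data_sync_retry else Level.PANIC


print(flush_hint_elevel(errno.EPERM).name)  # WARNING
print(flush_hint_elevel(errno.EIO).name)    # PANIC
```

The design choice here is an explicit allow-list: only errors that are
believed to be harmless are demoted, and anything unrecognised still takes
the conservative path.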

*I'm not looking at it myself.  A sync_file_range() implementation is
on my list of potential FreeBSD projects for a rainy day, so I don't
want to study anything but the man page, even if it's wrong.

--
Thomas Munro
https://enterprisedb.com