BUG #16832: Interrupted system call when working with large data tables


BUG #16832: Interrupted system call when working with large data tables

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16832
Logged by:          Kimon Krenz
Email address:      [hidden email]
PostgreSQL version: 13.1
Operating system:   macOS Big Sur v. 11.0.1
Description:        

I recently updated macOS to Big Sur and, at the same time, PostgreSQL to
13.1.
Since these updates, PostgreSQL keeps throwing 'Interrupted system call'
errors in three situations, which produce similar 'could not open file'
errors, e.g. for files in pg_wal/ (see 1., 2. and 3. below).
The problem might be linked to BUG #16827: macOS interrupted syscall leads
to a crash.

Unfortunately, the 'Interrupted system call' error occurs frequently but
inconsistently, even when using the same table. The only common denominator
is that the error is more frequent when large tables (i.e. over 4GB) are
involved. I have spent some time trying, unsuccessfully, to build a
reproducible case, mainly because the same table does not always lead to an
error.

1. When running VACUUM FULL on the entire database or on single large tables:

terminal command:

udl=# VACUUM (FULL,VERBOSE);
INFO:  vacuuming "itn_2007.roadlink_ms"
INFO:  "roadlink_ms": found 0 removable, 4005223 nonremovable row versions
in 279046 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 6.03 s, system: 11.61 s, elapsed: 36.56 s.
WARNING:  terminating connection because of crash of another server
process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?>

postgres.log:
2021-01-21 21:14:45.470 GMT [58691] PANIC:  XX000: could not open file
"pg_wal/00000001000000AC00000045": Interrupted system call
2021-01-21 21:14:45.470 GMT [58691] LOCATION:  XLogFileInit, xlog.c:3277
2021-01-21 21:14:45.471 GMT [31083] LOG:  00000: WAL writer process (PID
58691) was terminated by signal 6: Abort trap: 6
2021-01-21 21:14:45.471 GMT [31083] LOCATION:  LogChildExit,
postmaster.c:3753
2021-01-21 21:14:45.471 GMT [31083] LOG:  00000: terminating any other
active server processes
2021-01-21 21:14:45.471 GMT [31083] LOCATION:  HandleChildCrash,
postmaster.c:3474
2021-01-21 21:14:45.471 GMT [58993] WARNING:  57P02: terminating connection
because of crash of another server process
2021-01-21 21:14:45.471 GMT [58993] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2021-01-21 21:14:45.471 GMT [58993] HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2021-01-21 21:14:45.471 GMT [58993] LOCATION:  quickdie, postgres.c:2802
2021-01-21 21:14:45.471 GMT [59697] WARNING:  57P02: terminating connection
because of crash of another server process
2021-01-21 21:14:45.471 GMT [59697] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.

OR

2021-01-21 21:51:51.093 GMT [60106] PANIC:  XX000: could not open file
"pg_wal/00000001000000B700000079": Interrupted system call
2021-01-21 21:51:51.093 GMT [60106] LOCATION:  XLogFileInit, xlog.c:3277
2021-01-21 21:51:51.094 GMT [31083] LOG:  00000: WAL writer process (PID
60106) was terminated by signal 6: Abort trap: 6
2021-01-21 21:51:51.094 GMT [31083] LOCATION:  LogChildExit,
postmaster.c:3753
2021-01-21 21:51:51.094 GMT [31083] LOG:  00000: terminating any other
active server processes
2021-01-21 21:51:51.094 GMT [31083] LOCATION:  HandleChildCrash,
postmaster.c:3474
2021-01-21 21:51:51.095 GMT [60228] WARNING:  57P02: terminating connection
because of crash of another server process
2021-01-21 21:51:51.095 GMT [60228] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2021-01-21 21:51:51.095 GMT [60228] HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2021-01-21 21:51:51.095 GMT [60228] LOCATION:  quickdie, postgres.c:2802

2. When performing a simple left join (table1 4.5GB, table2 900MB):

udl=# drop table if exists itn_2019.roadlink_msn_2;
        create table itn_2019.roadlink_msn_2 as
        select *
        from
        itn_2019.roadlink_ms left join itn_2019.road_names
        ON fid = fid2;


2021-01-21 21:01:51.545 GMT [58711] ERROR:  XX000: could not open temporary
file "base/pgsql_tmp/pgsql_tmp58711.0.sharedfileset/i2of128.p0.0":
Interrupted system call
2021-01-21 21:01:51.545 GMT [58711] LOCATION:  PathNameOpenTemporaryFile,
fd.c:1764
2021-01-21 21:01:51.545 GMT [58711] STATEMENT:  drop table if exists
itn_2019.roadlink_msn_2;
        create table itn_2019.roadlink_msn_2 as
        select *
        from
        itn_2019.roadlink_ms left join itn_2019.road_names
        ON fid = fid2;
2021-01-21 21:01:51.546 GMT [59276] ERROR:  XX000: could not open temporary
file "base/pgsql_tmp/pgsql_tmp58711.0.sharedfileset/i1of128.p2.0":
Interrupted system call
2021-01-21 21:01:51.546 GMT [59276] LOCATION:  PathNameOpenTemporaryFile,
fd.c:1764
2021-01-21 21:01:51.546 GMT [59276] STATEMENT:  drop table if exists
itn_2019.roadlink_msn_2;
        create table itn_2019.roadlink_msn_2 as
        select *
        from
        itn_2019.roadlink_ms left join itn_2019.road_names
        ON fid = fid2;
2021-01-21 21:01:51.548 GMT [31083] LOG:  00000: background worker "parallel
worker" (PID 59276) exited with exit code 1
2021-01-21 21:01:51.548 GMT [31083] LOCATION:  LogChildExit,
postmaster.c:3731

3. When restoring the entire database (150GB) using pg_restore. I ended up
restoring every table manually from the dump tar file, because the same
table sometimes restored without a problem and sometimes first threw an
'Interrupted system call' error, then restored without problems on the
second or third attempt.

Best,
Kimon


Re: BUG #16832: Interrupted system call when working with large data tables

Andres Freund
Hi,

On 2021-01-21 22:16:55 +0000, PG Bug reporting form wrote:
> I've recently updated macOS to Big Sur, and simultaneously to PostgreSQL
> 13.1.
> Since these updates PostgreSQL keeps throwing 'Interrupted system call'
> errors in three instances, which produce similar 'could not open file
> pg_wal/" errors (see below 1.,2. and 3.).
> The problem might be linked to BUG #16827: macOS interrupted syscall leads
> to a crash.

There is additional information in another bug report at
https://postgr.es/m/16827-7606aeb21d38c228%40postgresql.org

I don't really know what to do here in the short term. Adding EINTR handling
to syscalls that traditionally never returned EINTR (it used to happen
only for "blocking" system calls) will be a fair amount of work.
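
For illustration only, and not taken from the PostgreSQL source: a minimal
sketch of what "EINTR handling" around open() could look like. The helper
name open_retry_eintr and the bounded retry count are assumptions made for
this example; how PostgreSQL itself should handle the case is exactly what is
left open above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a PostgreSQL function): call open() and, if it
 * fails with EINTR, try again up to max_retries more times.
 *
 * Returns the file descriptor on success. On failure it returns -1 and
 * leaves errno as set by the last open() call, so the caller can still
 * report the real cause (ENOENT, EACCES, or EINTR if retries ran out).
 */
int
open_retry_eintr(const char *path, int flags, mode_t mode, int max_retries)
{
    int fd;
    int attempts = 0;

    for (;;)
    {
        fd = open(path, flags, mode);

        /* Stop on success, on any error other than EINTR, or when the
         * retry budget is used up. */
        if (fd >= 0 || errno != EINTR || attempts >= max_retries)
            return fd;

        attempts++;
    }
}
```

A caller would use it where it currently calls open() directly, e.g.
open_retry_eintr(path, O_RDWR, 0600, 5), and treat a -1 result as before.
The retry bound is a design choice of this sketch: it keeps a persistently
interrupted call from looping forever, at the cost of still surfacing EINTR
in the worst case.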

I'll also respond in the other thread, CCing you, as there's more
information there.

Greetings,

Andres Freund