logical replication: could not create file "state.tmp": File exists

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

logical replication: could not create file "state.tmp": File exists

Grigory Smolkin
Hello!

One of my colleagues encountered an out of space condition, which broke
his logical replication setup.
It`s manifested with the following errors:

ERROR:  could not receive data from WAL stream: ERROR:  could not create
file "pg_replslot/some_sub/state.tmp": File exists

I`ve digged a bit into this problem, and it`s turned out that in
SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT
| O_EXCL' flags, which makes this routine as not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.
Script to reproduce and patch are attached.

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


replication_bug.sh (1K) Download Attachment
0001-logical_replication_fix.patch (556 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: logical replication: could not create file "state.tmp": File exists

Michael Paquier-2
On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:
> I`ve digged a bit into this problem, and it`s turned out that in
> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
> O_EXCL' flags, which makes this routine as not very reentrant.

What did you see as I/O problem before facing the actual error
reported here?  Was it just ENOSPC, a fsync failure, or just a failure
in closing the fd?  The first pattern is mostly what I guess happened,
still a fsync failure would not trigger a PANIC here (actually we
really should do that!), but I am raising a different thread about
that issue.

> Since an exclusive lock is taken before temp file creation, I think it
> should be safe to replace O_EXCL with O_TRUNC.
> Script to reproduce and patch are attached.

Agreed.  I prefer the O_TRUNC option because that's less code churn.  
Also, as it can still be useful to have a look at the temporary state
file after a crash or a failure, doing unlink() in the error code
paths is no good option IMO.

Have others thoughts or objections to share?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: logical replication: could not create file "state.tmp": File exists

Grigory Smolkin

On 12/2/19 7:35 AM, Michael Paquier wrote:

> On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:
>> I`ve digged a bit into this problem, and it`s turned out that in
>> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
>> O_EXCL' flags, which makes this routine as not very reentrant.
> What did you see as I/O problem before facing the actual error
> reported here?  Was it just ENOSPC, a fsync failure, or just a failure
> in closing the fd?  The first pattern is mostly what I guess happened,
> still a fsync failure would not trigger a PANIC here (actually we
> really should do that!), but I am raising a different thread about
> that issue.

Hello!

I didn`t see the very first error that left behind the temp file.
I`ve requested it just now, but it will take some time to get it (there
are several terabytes of text log).
But I assume that it was out of space error, which, by the look of the
code, should produce ERROR and leave temp file behind, just as it
happened in aforementioned  case.


>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
>> Script to reproduce and patch are attached.
> Agreed.  I prefer the O_TRUNC option because that's less code churn.
> Also, as it can still be useful to have a look at the temporary state
> file after a crash or a failure, doing unlink() in the error code
> paths is no good option IMO.

I`m sorry, but it was an production system, so, as I understand it,
stale temp file was hastily deleted without long considerations.

Thank you for your interest in this topic.

>
> Have others thoughts or objections to share?
> --
> Michael

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: logical replication: could not create file "state.tmp": File exists

Andres Freund
In reply to this post by Grigory Smolkin
Hi,

On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
> One of my colleagues encountered an out of space condition, which broke his
> logical replication setup.
> It`s manifested with the following errors:
>
> ERROR:  could not receive data from WAL stream: ERROR:  could not create
> file "pg_replslot/some_sub/state.tmp": File exists

Hm. What was the log output leading to this state? Some cases of this
would end up in a PANIC, which'd remove the .tmp file during
recovery. But there's some where we won't - it seems the right fix for
this would be to unlink the tmp file in that case?


> I`ve digged a bit into this problem, and it`s turned out that in
> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
> O_EXCL' flags, which makes this routine as not very reentrant.
>
> Since an exclusive lock is taken before temp file creation, I think it
> should be safe to replace O_EXCL with O_TRUNC.

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.


> Script to reproduce and patch are attached.

Well:

> # Imitate out_of_space/write_operation_error
> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp

Doesn't really replicate how we got into this state...

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: logical replication: could not create file "state.tmp": File exists

Grigory Smolkin

On 12/2/19 7:12 PM, Andres Freund wrote:

> Hi,
>
> On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
>> One of my colleagues encountered an out of space condition, which broke his
>> logical replication setup.
>> It`s manifested with the following errors:
>>
>> ERROR:  could not receive data from WAL stream: ERROR:  could not create
>> file "pg_replslot/some_sub/state.tmp": File exists
> Hm. What was the log output leading to this state? Some cases of this
> would end up in a PANIC, which'd remove the .tmp file during
> recovery. But there's some where we won't - it seems the right fix for
> this would be to unlink the tmp file in that case?
>
>
>> I`ve digged a bit into this problem, and it`s turned out that in
>> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
>> O_EXCL' flags, which makes this routine as not very reentrant.
>>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.
Is it possible with exclusive lock taken before that?
>
>
>> Script to reproduce and patch are attached.
> Well:
>
>> # Imitate out_of_space/write_operation_error
>> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
> Doesn't really replicate how we got into this state...

But it replicate the exactly the same state we would get, if write() to
temp file would have failed with out of space.


>
> Greetings,
>
> Andres Freund
>
>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: logical replication: could not create file "state.tmp": File exists

Dmitry Vasiliev
Here's what happened from the publisher and subscriber point of view:

publisher:  (some query) ERROR: could not write to tuplestore temporary file: No space left on device
subscriber: db =, user =, app =, client = ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_logical/snapshots/2AE-F3E52FB8.snap.27574.tmp": No space left on device
subscriber: db =, user =, app =, client = LOG: background worker "logical replication worker" (PID 23114) exited with exit code 1
subscriber: db =, user =, app =, client = LOG: logical replication apply worker for subscription "<name> _sub" has started
publisher:  LOG: received replication command: IDENTIFY_SYSTEM
publisher:  LOG: received replication command: START_REPLICATION SLOT "<name> _sub" LOGICAL 2AE/F3C0B920 (proto_version '1', publication_names '"<name>_pub"')
publisher:  ERROR: could not create file "pg_replslot /<name>_sub/state.tmp": File exists

I think some publisher logs may not be available due to out of space problem.

On Mon, Dec 2, 2019 at 7:54 PM Grigory Smolkin <[hidden email]> wrote:

On 12/2/19 7:12 PM, Andres Freund wrote:
> Hi,
>
> On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
>> One of my colleagues encountered an out of space condition, which broke his
>> logical replication setup.
>> It`s manifested with the following errors:
>>
>> ERROR:  could not receive data from WAL stream: ERROR:  could not create
>> file "pg_replslot/some_sub/state.tmp": File exists
> Hm. What was the log output leading to this state? Some cases of this
> would end up in a PANIC, which'd remove the .tmp file during
> recovery. But there's some where we won't - it seems the right fix for
> this would be to unlink the tmp file in that case?
>
>
>> I`ve digged a bit into this problem, and it`s turned out that in
>> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
>> O_EXCL' flags, which makes this routine as not very reentrant.
>>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.
Is it possible with exclusive lock taken before that?
>
>
>> Script to reproduce and patch are attached.
> Well:
>
>> # Imitate out_of_space/write_operation_error
>> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
> Doesn't really replicate how we got into this state...

But it replicate the exactly the same state we would get, if write() to
temp file would have failed with out of space.


>
> Greetings,
>
> Andres Freund
>
>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: logical replication: could not create file "state.tmp": File exists

Grigory Smolkin
In reply to this post by Andres Freund

On 12/2/19 7:12 PM, Andres Freund wrote:

> Hi,
>
> On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
>> One of my colleagues encountered an out of space condition, which broke his
>> logical replication setup.
>> It`s manifested with the following errors:
>>
>> ERROR:  could not receive data from WAL stream: ERROR:  could not create
>> file "pg_replslot/some_sub/state.tmp": File exists
> Hm. What was the log output leading to this state? Some cases of this
> would end up in a PANIC, which'd remove the .tmp file during
> recovery. But there's some where we won't - it seems the right fix for
> this would be to unlink the tmp file in that case?
>
>
>> I`ve digged a bit into this problem, and it`s turned out that in
>> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
>> O_EXCL' flags, which makes this routine as not very reentrant.
>>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.
>
>
>> Script to reproduce and patch are attached.
> Well:
>
>> # Imitate out_of_space/write_operation_error
>> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
> Doesn't really replicate how we got into this state...

I`ve managed to reproduce the issue using the attached script:

2019-12-02 20:32:12.547 MSK [4180] ERROR:  could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device
2019-12-02 20:32:12.548 MSK [4178] ERROR:  could not receive data from
WAL stream: ERROR:  could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device
2019-12-02 20:32:12.557 MSK [4645] LOG:  logical replication apply
worker for subscription "mysub" has started
2019-12-02 20:32:12.562 MSK [4091] LOG:  background worker "logical
replication worker" (PID 4178) exited with exit code 1
2019-12-02 20:32:12.569 MSK [4648] LOG:  starting logical decoding for
slot "mysub"
2019-12-02 20:32:12.569 MSK [4648] DETAIL:  Streaming transactions
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:12.569 MSK [4648] LOG:  logical decoding found
consistent point at 0/1566A2D8
2019-12-02 20:32:12.569 MSK [4648] DETAIL:  Logical decoding will begin
using saved snapshot.
2019-12-02 20:32:12.672 MSK [4648] ERROR:  could not write to file
"pg_replslot/mysub/state.tmp": No space left on device
2019-12-02 20:32:12.672 MSK [4645] ERROR:  could not receive data from
WAL stream: ERROR:  could not write to file
"pg_replslot/mysub/state.tmp": No space left on device
2019-12-02 20:32:12.672 MSK [4091] LOG:  background worker "logical
replication worker" (PID 4645) exited with exit code 1
2019-12-02 20:32:17.684 MSK [4653] LOG:  logical replication apply
worker for subscription "mysub" has started
2019-12-02 20:32:17.693 MSK [4654] LOG:  starting logical decoding for
slot "mysub"
2019-12-02 20:32:17.693 MSK [4654] DETAIL:  Streaming transactions
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:17.693 MSK [4654] LOG:  logical decoding found
consistent point at 0/1566A2D8
2019-12-02 20:32:17.693 MSK [4654] DETAIL:  Logical decoding will begin
using saved snapshot.
2019-12-02 20:32:17.847 MSK [4654] ERROR:  could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device
2019-12-02 20:32:17.847 MSK [4653] ERROR:  could not receive data from
WAL stream: ERROR:  could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device
2019-12-02 20:32:17.848 MSK [4091] LOG:  background worker "logical
replication worker" (PID 4653) exited with exit code 1
2019-12-02 20:32:22.859 MSK [4658] LOG:  logical replication apply
worker for subscription "mysub" has started
2019-12-02 20:32:22.868 MSK [4659] LOG:  starting logical decoding for
slot "mysub"
2019-12-02 20:32:22.868 MSK [4659] DETAIL:  Streaming transactions
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:22.868 MSK [4659] LOG:  logical decoding found
consistent point at 0/1566A2D8
2019-12-02 20:32:22.868 MSK [4659] DETAIL:  Logical decoding will begin
using saved snapshot.
2019-12-02 20:32:22.868 MSK [4659] ERROR:  could not create file
"pg_replslot/mysub/state.tmp": File exists
2019-12-02 20:32:22.869 MSK [4658] ERROR:  could not receive data from
WAL stream: ERROR:  could not create file "pg_replslot/mysub/state.tmp":
File exists



>
> Greetings,
>
> Andres Freund

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


replication_bug.sh (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: logical replication: could not create file "state.tmp": File exists

Michael Paquier-2
In reply to this post by Andres Freund
On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote:
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.

During the checkpoint of replication slots, SaveSlotToPath() would
just *LOG* any failure while leaving around the state.tmp of a slot,
and then any follow-up attempt to create state.tmp would just fail
because of that, preventing the slot state file from being flushed
continuously.  I think that's wrong.  Concurrency is not a concern
either here as the slot's LWLock to track an I/O in progress is taken
in exclusive lock.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: logical replication: could not create file "state.tmp": File exists

Andres Freund
Hi,

On 2019-12-11 14:27:45 +0900, Michael Paquier wrote:

> On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote:
> > I'm very doubtful about this. I think it's a good safety measure to
> > ensure that there's no previous state file that we're somehow
> > overwriting.
>
> During the checkpoint of replication slots, SaveSlotToPath() would
> just *LOG* any failure while leaving around the state.tmp of a slot,
> and then any follow-up attempt to create state.tmp would just fail
> because of that, preventing the slot state file from being flushed
> continuously.  I think that's wrong.  Concurrency is not a concern
> either here as the slot's LWLock to track an I/O in progress is taken
> in exclusive lock.

I'm not clear on what you're saying? I don't see how I'm arguing for the
type of behaviour you seem to be describing?

Greetings,

Andres Freund