pendingOps table is not cleared with fsync=off

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

pendingOps table is not cleared with fsync=off

Heikki Linnakangas
Hi!

I noticed that commit 3eb77eba5a changed the logic in
ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
the entries are not removed from the pendingOps hash table. I don't
think that was intended.

I propose the attached patch to move the test for enableFsync so that
the entries are removed from pendingOps again. It looks larger than it
really is because it re-indents the block of code that is now inside the
"if (enableFsync)" condition.

- Heikki

clear-pendingOps-with-fsync-off-1.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pendingOps table is not cleared with fsync=off

Thomas Munro-5
On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <[hidden email]> wrote:
> I noticed that commit 3eb77eba5a changed the logic in
> ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
> the entries are not removed from the pendingOps hash table. I don't
> think that was intended.

Perhaps we got confused about what the comment "... so that changing
fsync on the fly behaves sensibly" means.  Fsyncing everything you
missed when you turn it back on after a period running with it off
does sound a bit like behaviour that someone might want or expect,
though it probably isn't really enough to guarantee durability,
because requests queued here aren't the only fsyncs you missed while
you had it off, among other problems.  Unfortunately, if you try that
on an assertion build, you get a failure anyway, so that probably
wasn't deliberate:

TRAP: FailedAssertion("(CycleCtr) (entry->cycle_ctr + 1) ==
sync_cycle_ctr", File: "sync.c", Line: 335)

> I propose the attached patch to move the test for enableFsync so that
> the entries are removed from pendingOps again. It looks larger than it
> really is because it re-indents the block of code that is now inside the
> "if (enableFsync)" condition.

Yeah, I found that git diff/show -w made it easier to understand that
change.  LGTM, though I'd be tempted to use "goto skip" instead of
incurring that much indentation but up to you ...


Reply | Threaded
Open this post in threaded view
|

Re: pendingOps table is not cleared with fsync=off

Heikki Linnakangas
On 09/05/2020 02:53, Thomas Munro wrote:

> On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <[hidden email]> wrote:
>> I noticed that commit 3eb77eba5a changed the logic in
>> ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
>> the entries are not removed from the pendingOps hash table. I don't
>> think that was intended.
>
> Perhaps we got confused about what the comment "... so that changing
> fsync on the fly behaves sensibly" means.  Fsyncing everything you
> missed when you turn it back on after a period running with it off
> does sound a bit like behaviour that someone might want or expect,
> though it probably isn't really enough to guarantee durability,
> because requests queued here aren't the only fsyncs you missed while
> you had it off, among other problems.

Yeah, you need to run "sync" after turning fsync=on to be safe, there's
no way around it.

> Unfortunately, if you try that on an assertion build, you get a
> failure anyway, so that probably wasn't deliberate:
>
> TRAP: FailedAssertion("(CycleCtr) (entry->cycle_ctr + 1) ==
> sync_cycle_ctr", File: "sync.c", Line: 335)

Ah, I didn't notice that.

>> I propose the attached patch to move the test for enableFsync so that
>> the entries are removed from pendingOps again. It looks larger than it
>> really is because it re-indents the block of code that is now inside the
>> "if (enableFsync)" condition.
>
> Yeah, I found that git diff/show -w made it easier to understand that
> change.  LGTM, though I'd be tempted to use "goto skip" instead of
> incurring that much indentation but up to you ...

I considered a goto too, but I found it confusing. If we need any more
nesting here in the future, I think extracting the inner parts into a
function would be good.

Committed, thanks!

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: pendingOps table is not cleared with fsync=off

Tom Lane-2
Heikki Linnakangas <[hidden email]> writes:
> On 09/05/2020 02:53, Thomas Munro wrote:
>> On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <[hidden email]> wrote:
>>> I noticed that commit 3eb77eba5a changed the logic in
>>> ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
>>> the entries are not removed from the pendingOps hash table. I don't
>>> think that was intended.

I'm looking at this commit in connection with writing release notes
for next week's releases.  Am I right in thinking that this bug leads
to indefinite bloat of the pendingOps hash table when fsync is off?
If so, that seems much more worth documenting than the assertion
failure.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pendingOps table is not cleared with fsync=off

Heikki Linnakangas
On 06/08/2020 18:42, Tom Lane wrote:

> Heikki Linnakangas <[hidden email]> writes:
>> On 09/05/2020 02:53, Thomas Munro wrote:
>>> On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <[hidden email]> wrote:
>>>> I noticed that commit 3eb77eba5a changed the logic in
>>>> ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
>>>> the entries are not removed from the pendingOps hash table. I don't
>>>> think that was intended.
>
> I'm looking at this commit in connection with writing release notes
> for next week's releases.  Am I right in thinking that this bug leads
> to indefinite bloat of the pendingOps hash table when fsync is off?
> If so, that seems much more worth documenting than the assertion
> failure.

That's correct.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: pendingOps table is not cleared with fsync=off

Tom Lane-2
Heikki Linnakangas <[hidden email]> writes:
> On 06/08/2020 18:42, Tom Lane wrote:
>> I'm looking at this commit in connection with writing release notes
>> for next week's releases.  Am I right in thinking that this bug leads
>> to indefinite bloat of the pendingOps hash table when fsync is off?
>> If so, that seems much more worth documenting than the assertion
>> failure.

> That's correct.

OK, thanks for confirming.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pendingOps table is not cleared with fsync=off

Shawn Debnath
In reply to this post by Thomas Munro-5
On Sat, May 09, 2020 at 11:53:13AM +1200, Thomas Munro wrote:

> On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <[hidden email]> wrote:
> > I noticed that commit 3eb77eba5a changed the logic in
> > ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
> > the entries are not removed from the pendingOps hash table. I don't
> > think that was intended.
>
> Perhaps we got confused about what the comment "... so that changing
> fsync on the fly behaves sensibly" means.  Fsyncing everything you
> missed when you turn it back on after a period running with it off
> does sound a bit like behaviour that someone might want or expect,
> though it probably isn't really enough to guarantee durability,
> because requests queued here aren't the only fsyncs you missed while
> you had it off, among other problems.

Good catch. Question is, are the users aware of the requirement to do a
manual fsync if they flip the fsync GUC off and then on? Should we do
this on their behalf to make a good faith attempt to ensure things are
flushed properly via an assign hook?


--
Shawn Debnath
Amazon Web Services (AWS)


Reply | Threaded
Open this post in threaded view
|

Re: pendingOps table is not cleared with fsync=off

Tom Lane-2
Shawn Debnath <[hidden email]> writes:
> Good catch. Question is, are the users aware of the requirement to do a
> manual fsync if they flip the fsync GUC off and then on? Should we do
> this on their behalf to make a good faith attempt to ensure things are
> flushed properly via an assign hook?

No.  Or at least, expecting that you can do that from an assign hook
is impossibly wrong-headed.  GUC assign hooks can't have failure modes.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pendingOps table is not cleared with fsync=off

Shawn Debnath
On Mon, Aug 10, 2020 at 02:50:26PM -0400, Tom Lane wrote:
>
> Shawn Debnath <[hidden email]> writes:
> > Good catch. Question is, are the users aware of the requirement to do a
> > manual fsync if they flip the fsync GUC off and then on? Should we do
> > this on their behalf to make a good faith attempt to ensure things are
> > flushed properly via an assign hook?
>
> No.  Or at least, expecting that you can do that from an assign hook
> is impossibly wrong-headed.  GUC assign hooks can't have failure modes.

Okay agree, will remind myself to drink more coffee next time.

If we think a fsync should be performed in this case, assign hook
could set a value to indicate parameter was reset via SIGHUP. Next call
to ProcessSyncRequests() could check for this, do a fsync prior to
absorbing the newly submitted sync requests, and reset the flag.
fsync_pgdata() comes to mind to be inclusive.

If folks are not inclined to do the fsync, the change is good as is.

--
Shawn Debnath
Amazon Web Services (AWS)