reloption to prevent VACUUM from truncating empty pages at the end of relation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
132 messages Options
1234 ... 7
Reply | Threaded
Open this post in threaded view
|

reloption to prevent VACUUM from truncating empty pages at the end of relation

Fujii Masao-2
Hi,

I'd like to propose to add $SUBJECT for performance improvement.

When VACUUM tries to truncate the trailing empty pages, it scans shared_buffers
to invalidate the pages-to-truncate during holding an AccessExclusive lock on
the relation. So if shared_buffers is huge, other transactions need to wait for
a very long time before accessing to the relation. Which would cause the
response-time spikes, for example, I observed such spikes several times on
the server with shared_buffers = 300GB while running the benchmark.
Therefore, I'm thinking to propose $SUBJECT and enable it to avoid such spikes
for that relation.

Also, first of all, if other transactions need to extend the relation
(i.e., need new pages) as soon as VACUUM truncates the empty pages at the end,
that truncation would not be so helpful for performance. In this case,
the truncation and extension of the relation are unnecessarily repeated,
which would decrease the performance. So, to alleviate this situation,
$SUBJECT is useful, I think.

Thought?

Regards,

--
Fujii Masao

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Tom Lane-2
Fujii Masao <[hidden email]> writes:
> When VACUUM tries to truncate the trailing empty pages, it scans shared_buffers
> to invalidate the pages-to-truncate during holding an AccessExclusive lock on
> the relation. So if shared_buffers is huge, other transactions need to wait for
> a very long time before accessing to the relation. Which would cause the
> response-time spikes, for example, I observed such spikes several times on
> the server with shared_buffers = 300GB while running the benchmark.
> Therefore, I'm thinking to propose $SUBJECT and enable it to avoid such spikes
> for that relation.

I think that the real problem here is having to do a scan of all of shared
buffers.  VACUUM's not the only thing that has to do that, there's also
e.g. DROP and TRUNCATE.  So rather than a klugy solution that only fixes
VACUUM (and not very well, requiring user intervention and an unpleasant
tradeoff), we ought to look at ways to avoid needing a whole-pool scan to
find the pages belonging to one relation.  In the past we've been able to
skate by without a decent solution for that because shared buffers were
customarily not all that big.  But if we're going to start considering
huge buffer pools to be a case we want to have good performance for,
that's got to change.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Peter Geoghegan-4
On Tue, Apr 17, 2018 at 11:09 AM, Tom Lane <[hidden email]> wrote:
> So rather than a klugy solution that only fixes
> VACUUM (and not very well, requiring user intervention and an unpleasant
> tradeoff), we ought to look at ways to avoid needing a whole-pool scan to
> find the pages belonging to one relation.  In the past we've been able to
> skate by without a decent solution for that because shared buffers were
> customarily not all that big.  But if we're going to start considering
> huge buffer pools to be a case we want to have good performance for,
> that's got to change.

Andres mentioned that he has prototyped an approach to buffer
management that uses a Radix tree, which is generally assumed to be
the right long-term fix.

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Álvaro Herrera
In reply to this post by Tom Lane-2
Tom Lane wrote:

> Fujii Masao <[hidden email]> writes:
> > When VACUUM tries to truncate the trailing empty pages, it scans shared_buffers
> > to invalidate the pages-to-truncate during holding an AccessExclusive lock on
> > the relation. So if shared_buffers is huge, other transactions need to wait for
> > a very long time before accessing to the relation. Which would cause the
> > response-time spikes, for example, I observed such spikes several times on
> > the server with shared_buffers = 300GB while running the benchmark.
> > Therefore, I'm thinking to propose $SUBJECT and enable it to avoid such spikes
> > for that relation.
>
> I think that the real problem here is having to do a scan of all of shared
> buffers.  VACUUM's not the only thing that has to do that, there's also
> e.g. DROP and TRUNCATE.  So rather than a klugy solution that only fixes
> VACUUM (and not very well, requiring user intervention and an unpleasant
> tradeoff), we ought to look at ways to avoid needing a whole-pool scan to
> find the pages belonging to one relation.  In the past we've been able to
> skate by without a decent solution for that because shared buffers were
> customarily not all that big.  But if we're going to start considering
> huge buffer pools to be a case we want to have good performance for,
> that's got to change.

Andres was working on a radix tree structure to fix this problem, but
that seems to be abandoned now, and it seems a major undertaking.  While
I agree that the proposed solution is a wart, it seems much better than
no solution at all.  Can we consider Fujii's proposal as a temporary
measure until we fix shared buffers?  I'm +1 on it myself.

We've seen this problem also affecting a production workload pretty
severely, though shared_buffers is not as big.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> Andres was working on a radix tree structure to fix this problem, but
> that seems to be abandoned now, and it seems a major undertaking.  While
> I agree that the proposed solution is a wart, it seems much better than
> no solution at all.  Can we consider Fujii's proposal as a temporary
> measure until we fix shared buffers?  I'm +1 on it myself.

Once we've introduced a user-visible reloption it's going to be
practically impossible to get rid of it, so I'm -1.  I'd much rather
see somebody put some effort into the radix-tree idea than introduce
a kluge that we'll be stuck with, and that doesn't even provide a
good user experience.  Disabling vacuum truncation is *not* something
that I think we should recommend.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Andres Freund
On 2018-04-17 15:09:18 -0400, Tom Lane wrote:
> Alvaro Herrera <[hidden email]> writes:
> > Andres was working on a radix tree structure to fix this problem, but
> > that seems to be abandoned now, and it seems a major undertaking.

I hope to re-ignite work on that later in the v12 cycle. But
realistically that means it's not going to be mergable for v12.


> > While I agree that the proposed solution is a wart, it seems much
> > better than no solution at all.  Can we consider Fujii's proposal as
> > a temporary measure until we fix shared buffers?  I'm +1 on it
> > myself.
>
> Once we've introduced a user-visible reloption it's going to be
> practically impossible to get rid of it, so I'm -1.

It's not much work to maintain though? And even the brief AEL lock can
cause troubles, leaving the scan aside. So I'm like +0.1 or such.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Michael Paquier-2
On Tue, Apr 17, 2018 at 12:12:26PM -0700, Andres Freund wrote:
> On 2018-04-17 15:09:18 -0400, Tom Lane wrote:
>> Alvaro Herrera <[hidden email]> writes:
>>> Andres was working on a radix tree structure to fix this problem, but
>>> that seems to be abandoned now, and it seems a major undertaking.
>
> I hope to re-ignite work on that later in the v12 cycle. But
> realistically that means it's not going to be mergable for v12.

Need a push of man-hours for that?

>>> While I agree that the proposed solution is a wart, it seems much
>>> better than no solution at all.  Can we consider Fujii's proposal as
>>> a temporary measure until we fix shared buffers?  I'm +1 on it
>>> myself.
>>
>> Once we've introduced a user-visible reloption it's going to be
>> practically impossible to get rid of it, so I'm -1.
>
> It's not much work to maintain though? And even the brief AEL lock can
> cause troubles, leaving the scan aside. So I'm like +0.1 or such.
I would say that if the radix tree patch can make it for the first
commit fest and has reviews, then there would be likely no need for this
reloption.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Andres Freund


On April 17, 2018 6:00:59 PM PDT, Michael Paquier <[hidden email]> wrote:

>On Tue, Apr 17, 2018 at 12:12:26PM -0700, Andres Freund wrote:
>> On 2018-04-17 15:09:18 -0400, Tom Lane wrote:
>>> Alvaro Herrera <[hidden email]> writes:
>>>> Andres was working on a radix tree structure to fix this problem,
>but
>>>> that seems to be abandoned now, and it seems a major undertaking.
>>
>> I hope to re-ignite work on that later in the v12 cycle. But
>> realistically that means it's not going to be mergable for v12.
>
>Need a push of man-hours for that?

Not sure what you mean?


>>>> While I agree that the proposed solution is a wart, it seems much
>>>> better than no solution at all.  Can we consider Fujii's proposal
>as
>>>> a temporary measure until we fix shared buffers?  I'm +1 on it
>>>> myself.
>>>
>>> Once we've introduced a user-visible reloption it's going to be
>>> practically impossible to get rid of it, so I'm -1.
>>
>> It's not much work to maintain though? And even the brief AEL lock
>can
>> cause troubles, leaving the scan aside. So I'm like +0.1 or such.
>
>I would say that if the radix tree patch can make it for the first
>commit fest and has reviews, then there would be likely no need for
>this
>reloption.

There's no way it can.


Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Michael Paquier-2
On Tue, Apr 17, 2018 at 06:13:31PM -0700, Andres Freund wrote:
> Not sure what you mean?

Do you need help on it?  I suggest that I could undertake the proposed
patch and submit it earlier in the development cycle of v12.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Andres Freund
On 2018-04-18 10:46:51 +0900, Michael Paquier wrote:
> On Tue, Apr 17, 2018 at 06:13:31PM -0700, Andres Freund wrote:
> > Not sure what you mean?
>
> Do you need help on it?  I suggest that I could undertake the proposed
> patch and submit it earlier in the development cycle of v12.

I think it's at the very least two months of serious development work to
get it into a state ready for submission. And a good chunk of that not
even sketched out.  Replacing the hashtable is the easy part, the memory
management (Complicated due to lock-freeness. I'm thinking of using a
variant of epoch based reclamation) isn't really there, the management
of shared "open relations" state are the hard parts...

So yes, I could use help on it, but it'll be a lot of actual design and
investigatory work.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

akapila
On Wed, Apr 18, 2018 at 7:46 AM, Andres Freund <[hidden email]> wrote:

> On 2018-04-18 10:46:51 +0900, Michael Paquier wrote:
>> On Tue, Apr 17, 2018 at 06:13:31PM -0700, Andres Freund wrote:
>> > Not sure what you mean?
>>
>> Do you need help on it?  I suggest that I could undertake the proposed
>> patch and submit it earlier in the development cycle of v12.
>
> I think it's at the very least two months of serious development work to
> get it into a state ready for submission. And a good chunk of that not
> even sketched out.  Replacing the hashtable is the easy part, the memory
> management (Complicated due to lock-freeness. I'm thinking of using a
> variant of epoch based reclamation) isn't really there, the management
> of shared "open relations" state are the hard parts...
>
> So yes, I could use help on it, but it'll be a lot of actual design and
> investigatory work.
>

I think it makes sense to pursue that approach, but it might be worth
considering some alternative till we have it.  I remember last time
(in 2015) we have discussed some another solution [1] to this problem
(or similar) and we have left it unattended in the hope that we will
get a better solution, but we are still in the same situation.  I
think in general it is better to go with the approach which can fix
the root cause of the problem, but if that is going to take a long
time, it is not terrible to provide some workable solution which can
help users.


[1] - https://www.postgresql.org/message-id/CAA4eK1JPLGjpMeJ5YLNE7bpNBhP2EQe_rDR%2BAw3atNfj9WkAGg%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Pavan Deolasee
In reply to this post by Fujii Masao-2


On Tue, Apr 17, 2018 at 11:05 PM, Fujii Masao <[hidden email]> wrote:
Hi,

I'd like to propose to add $SUBJECT for performance improvement.

When VACUUM tries to truncate the trailing empty pages, it scans shared_buffers
to invalidate the pages-to-truncate during holding an AccessExclusive lock on
the relation. So if shared_buffers is huge, other transactions need to wait for
a very long time before accessing to the relation. Which would cause the
response-time spikes, for example, I observed such spikes several times on
the server with shared_buffers = 300GB while running the benchmark.
Therefore, I'm thinking to propose $SUBJECT and enable it to avoid such spikes
for that relation.

Alvaro reminded me that we already have a mechanism in place which forces VACUUM to give up the exclusive lock if another backend is waiting on the lock for more than certain pre-defined duration. AFAICS we give up the lock, but again retry truncation from the previously left off position. What if we make that lock-wait duration configurable on a per-table basis? And may be a special value to never truncate (though it seems quite excessive to me and a possible footgun)

I was actually thinking in the other direction. So between the time VACUUM figures out it can possibly truncate last K pages, some backend may insert a tuple in some page and make the truncation impossible. What if we truncate the FSM before starting the backward scan so that new inserts go into the pages prior to the truncation point, if possible. That will increase the chances of VACUUM being able to truncate all the empty pages. Though I think in some cases it might lead to unnecessary further extension of the relation. May be we use some heuristic based on available free space in the table prior to the truncation point?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Simon Riggs
In reply to this post by Tom Lane-2
On 17 April 2018 at 20:09, Tom Lane <[hidden email]> wrote:

> Alvaro Herrera <[hidden email]> writes:
>> Andres was working on a radix tree structure to fix this problem, but
>> that seems to be abandoned now, and it seems a major undertaking.  While
>> I agree that the proposed solution is a wart, it seems much better than
>> no solution at all.  Can we consider Fujii's proposal as a temporary
>> measure until we fix shared buffers?  I'm +1 on it myself.
>
> Once we've introduced a user-visible reloption it's going to be
> practically impossible to get rid of it, so I'm -1.  I'd much rather
> see somebody put some effort into the radix-tree idea than introduce
> a kluge that we'll be stuck with, and that doesn't even provide a
> good user experience.  Disabling vacuum truncation is *not* something
> that I think we should recommend.

The truncation at the end of VACUUM takes an AccessExclusiveLock,
which is already user visible. Using a radix tree won't alter that.

ISTM the user might be interested in having the *lock* NOT happen, so
I am +1 for the suggestion regardless of whether radix tree ever
happens.

The lock itself can be cancelled, so the user would also be interested
in explicitly requesting a retry with a separate command/function.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Fujii Masao-2
In reply to this post by Pavan Deolasee
On Wed, Apr 18, 2018 at 11:29 PM, Pavan Deolasee
<[hidden email]> wrote:

>
>
> On Tue, Apr 17, 2018 at 11:05 PM, Fujii Masao <[hidden email]> wrote:
>>
>> Hi,
>>
>> I'd like to propose to add $SUBJECT for performance improvement.
>>
>> When VACUUM tries to truncate the trailing empty pages, it scans
>> shared_buffers
>> to invalidate the pages-to-truncate during holding an AccessExclusive lock
>> on
>> the relation. So if shared_buffers is huge, other transactions need to
>> wait for
>> a very long time before accessing to the relation. Which would cause the
>> response-time spikes, for example, I observed such spikes several times on
>> the server with shared_buffers = 300GB while running the benchmark.
>> Therefore, I'm thinking to propose $SUBJECT and enable it to avoid such
>> spikes
>> for that relation.
>
>
> Alvaro reminded me that we already have a mechanism in place which forces
> VACUUM to give up the exclusive lock if another backend is waiting on the
> lock for more than certain pre-defined duration. AFAICS we give up the lock,
> but again retry truncation from the previously left off position. What if we
> make that lock-wait duration configurable on a per-table basis? And may be a
> special value to never truncate (though it seems quite excessive to me and a
> possible footgun)

I'm not sure if it's safe to cancel forcibly VACUUM's truncation during
scaning shared_buffers. That scan happens after WAL-logging and before
the actual truncation.

> I was actually thinking in the other direction. So between the time VACUUM
> figures out it can possibly truncate last K pages, some backend may insert a
> tuple in some page and make the truncation impossible. What if we truncate
> the FSM before starting the backward scan so that new inserts go into the
> pages prior to the truncation point, if possible. That will increase the
> chances of VACUUM being able to truncate all the empty pages. Though I think
> in some cases it might lead to unnecessary further extension of the
> relation. May be we use some heuristic based on available free space in the
> table prior to the truncation point?

Isn't this too complicated? I wonder what heuristic we can use here.

Regards,

--
Fujii Masao

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Pavan Deolasee


On Wed, Apr 18, 2018 at 10:50 PM, Fujii Masao <[hidden email]> wrote:


I'm not sure if it's safe to cancel forcibly VACUUM's truncation during
scaning shared_buffers. That scan happens after WAL-logging and before
the actual truncation.


Ah ok. I misread your proposal. This is about the shared_buffers scan in DropRelFileNodeBuffers() and we can't cancel that operation.

What if we remember the buffers as seen by count_nondeletable_pages() and then just discard those specific buffers instead of scanning the entire shared_buffers again? Surely we revisit all to-be-truncated blocks before actual truncation. So we already know which buffers to discard. And we're holding exclusive lock at that point, so nothing can change underneath. Of course, we can't really remember a large number of buffers, so we can do this in small chunks. Scan last K blocks, remember those K buffers, discard those K buffers, truncate the relation and then try for next K blocks. If another backend requests lock on the table, we give up or retry after a while.   

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Tom Lane-2
Pavan Deolasee <[hidden email]> writes:
> What if we remember the buffers as seen by count_nondeletable_pages() and
> then just discard those specific buffers instead of scanning the entire
> shared_buffers again?

That's an idea.

> Surely we revisit all to-be-truncated blocks before
> actual truncation. So we already know which buffers to discard. And we're
> holding exclusive lock at that point, so nothing can change underneath. Of
> course, we can't really remember a large number of buffers, so we can do
> this in small chunks.

Hm?  We're deleting the last N consecutive blocks, so it seems like we
just need to think in terms of clearing that range.  I think this can
just be a local logic change inside DropRelFileNodeBuffers().

You could optimize it fairly easily with some heuristic that compares
N to sizeof shared buffers; if it's too large a fraction, the existing
implementation will be cheaper than a bunch of hashtable probes.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Tom Lane-2
I wrote:
> Pavan Deolasee <[hidden email]> writes:
>> What if we remember the buffers as seen by count_nondeletable_pages() and
>> then just discard those specific buffers instead of scanning the entire
>> shared_buffers again?

> That's an idea.

BTW, before pushing too hard on any of this, we need to think about the
data-corruption hazard that MauMau just reminded us about.  I'm afraid
what we're likely to end up with after the dust settles is worse
performance than today, not better :-(.

https://postgr.es/m/5BBC590AE8DF4ED1A170E4D48F1B53AC@tunaPC

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Michael Paquier-2
In reply to this post by akapila
On Wed, Apr 18, 2018 at 07:41:44PM +0530, Amit Kapila wrote:
> I think it makes sense to pursue that approach, but it might be worth
> considering some alternative till we have it.  I remember last time
> (in 2015) we have discussed some another solution [1] to this problem
> (or similar) and we have left it unattended in the hope that we will
> get a better solution, but we are still in the same situation.  I
> think in general it is better to go with the approach which can fix
> the root cause of the problem, but if that is going to take a long
> time, it is not terrible to provide some workable solution which can
> help users.

Yeah, I can understand that feeling.  When we talked about the
compression of FPWs back in 9.5, we discussed that if we had
double-writes then this would not be necessary, and we are still with
wal_compression but without double writes (actually, it happens that
compression of pages can also be used with double writes, but that's
enough highjacking for this thread..).

Then, let's consider the beginning of the first commit fest of v12 as
judgement.  Implementing radix tree for shared buffers is a long-term
project, which has no guarantee to get merged, while a visibly-simple
reloptions which helps in some cases...
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: reloption to prevent VACUUM from truncating empty pages at the end of relation

Tsunakawa, Takayuki
In reply to this post by Fujii Masao-2
From: Fujii Masao [mailto:[hidden email]]
> a very long time before accessing to the relation. Which would cause the
> response-time spikes, for example, I observed such spikes several times
> on
> the server with shared_buffers = 300GB while running the benchmark.

FYI, a long transaction took about 900 ms, while the average transaction response time was 150 ms or so.  (I'm working with Fujii-san in this performance benchmark.)


> Therefore, I'm thinking to propose $SUBJECT and enable it to avoid such
> spikes
> for that relation.

How about an integer variable to replace the following?

#define REL_TRUNCATE_FRACTION 16


> Also, first of all, if other transactions need to extend the relation
> (i.e., need new pages) as soon as VACUUM truncates the empty pages at the
> end,
> that truncation would not be so helpful for performance. In this case,
> the truncation and extension of the relation are unnecessarily repeated,
> which would decrease the performance. So, to alleviate this situation,
> $SUBJECT is useful, I think.

I wonder if fillfactor=50 would alleviate this situation.

Regards
Takayuki Tsunakawa

Reply | Threaded
Open this post in threaded view
|

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Álvaro Herrera
In reply to this post by Michael Paquier-2
Michael Paquier wrote:

> Then, let's consider the beginning of the first commit fest of v12 as
> judgement.  Implementing radix tree for shared buffers is a long-term
> project, which has no guarantee to get merged, while a visibly-simple
> reloptions which helps in some cases...

In the scenario we studied, the truncations were causing periodic
hiccups which were quite severe.  The truncations were completely
useless anyway because the table grew back to the original size daily (a
few dozen GBs I think).  That was a lot of unnecessary work, and under
exclusive lock no less.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

1234 ... 7