Vacuum: allow usage of more than 1GB of work mem


Vacuum: allow usage of more than 1GB of work mem

Claudio Freire
The attached patch allows setting maintenance_work_mem or
autovacuum_work_mem higher than 1GB (and be effective), by turning the
allocation of the dead_tuples into a huge allocation.
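
(For context, the allocation in question lives in lazy_space_alloc() in
vacuumlazy.c. A minimal sketch of the shape of the change follows; the
attached patch is authoritative, and the exact huge-allocation call
shown here is an assumption:)

    maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
    maxtuples = Min(maxtuples, INT_MAX);
    /* was: MaxAllocSize, which capped the dead_tuples array at 1GB */
    maxtuples = Min(maxtuples, MaxAllocHugeSize / sizeof(ItemPointerData));
    ...
    /* the array itself must then come from the huge-allocation API */
    vacrelstats->dead_tuples = (ItemPointer)
        palloc_extended(maxtuples * sizeof(ItemPointerData), MCXT_ALLOC_HUGE);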

This results in fewer index scans for heavily bloated tables, and
could be a lifesaver in many situations (in particular, the situation
I'm living right now in production, where we don't have enough room
for a vacuum full, and have just deleted 75% of a table to make room
but have to rely on regular lazy vacuum to free the space).

The patch also makes vacuum free the dead_tuples before starting
truncation. It didn't seem necessary to hold onto it beyond that
point, and it might help give the OS more cache, especially if work
mem is configured very high to avoid multiple index scans.

Tested with pgbench scale 4000 after deleting the whole
pgbench_accounts table, seemed to work fine.



Attachment: 0001-Vacuum-allow-using-more-than-1GB-work-mem.patch (3K)

Re: Vacuum: allow usage of more than 1GB of work mem

Simon Riggs
On 3 September 2016 at 04:25, Claudio Freire <[hidden email]> wrote:
> The attached patch allows setting maintenance_work_mem or
> autovacuum_work_mem higher than 1GB (and be effective), by turning the
> allocation of the dead_tuples into a huge allocation.
>
> This results in fewer index scans for heavily bloated tables, and
> could be a lifesaver in many situations (in particular, the situation
> I'm living right now in production, where we don't have enough room
> for a vacuum full, and have just deleted 75% of a table to make room
> but have to rely on regular lazy vacuum to free the space).

This part looks fine. I'm inclined to commit the attached patch soon.

> The patch also makes vacuum free the dead_tuples before starting
> truncation. It didn't seem necessary to hold onto it beyond that
> point, and it might help give the OS more cache, especially if work
> mem is configured very high to avoid multiple index scans.

How long does that part ever take? Is there any substantial gain from this?

Let's discuss that as a potential second patch.

> Tested with pgbench scale 4000 after deleting the whole
> pgbench_accounts table, seemed to work fine.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Attachment: allow_vacuum_work_mem_gt_1GB.v2.patch (1K)

Re: Vacuum: allow usage of more than 1GB of work mem

Jim Nasby
On 9/4/16 1:46 AM, Simon Riggs wrote:
>> > The patch also makes vacuum free the dead_tuples before starting
>> > truncation. It didn't seem necessary to hold onto it beyond that
>> > point, and it might help give the OS more cache, especially if work
>> > mem is configured very high to avoid multiple index scans.
> How long does that part ever take? Is there any substantial gain from this?

If you're asking about how long the dealloc takes, we're going to have
to pay that cost anyway when the context gets destroyed/reset, no? Doing
that sooner rather than later certainly seems like a good idea since
we've seen that truncation can take quite some time. Might as well give
the memory back to the OS ASAP.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Vacuum: allow usage of more than 1GB of work mem

Claudio Freire
In reply to this post by Simon Riggs
On Sun, Sep 4, 2016 at 3:46 AM, Simon Riggs <[hidden email]> wrote:
> On 3 September 2016 at 04:25, Claudio Freire <[hidden email]> wrote:
>> The patch also makes vacuum free the dead_tuples before starting
>> truncation. It didn't seem necessary to hold onto it beyond that
>> point, and it might help give the OS more cache, especially if work
>> mem is configured very high to avoid multiple index scans.
>
> How long does that part ever take? Is there any substantial gain from this?
>
> Let's discuss that as a potential second patch.

In the test case I mentioned, it takes longer than the vacuum part itself.

Other than freeing RAM there's no gain. I didn't measure any speed
difference while testing, but that's probably because the backward
scan doesn't benefit from the cache, but other activity on the system
might. So, depending on the workload on the server, extra available
RAM may be a significant gain on its own or not. It just didn't seem
there was a reason to keep that RAM reserved, especially after making
it a huge allocation.

I'm fine either way. I can remove that from the patch or leave it
as-is. It just seemed like a good idea at the time.



Re: Vacuum: allow usage of more than 1GB of work mem

Claudio Freire
On Mon, Sep 5, 2016 at 11:50 AM, Claudio Freire <[hidden email]> wrote:

> On Sun, Sep 4, 2016 at 3:46 AM, Simon Riggs <[hidden email]> wrote:
>> On 3 September 2016 at 04:25, Claudio Freire <[hidden email]> wrote:
>>> The patch also makes vacuum free the dead_tuples before starting
>>> truncation. It didn't seem necessary to hold onto it beyond that
>>> point, and it might help give the OS more cache, especially if work
>>> mem is configured very high to avoid multiple index scans.
>>
>> How long does that part ever take? Is there any substantial gain from this?
>>
>> Let's discuss that as a potential second patch.
>
> In the test case I mentioned, it takes longer than the vacuum part itself.
>
> Other than freeing RAM there's no gain. I didn't measure any speed
> difference while testing, but that's probably because the backward
> scan doesn't benefit from the cache, but other activity on the system
> might. So, depending on the workload on the server, extra available
> RAM may be a significant gain on its own or not. It just didn't seem
> there was a reason to keep that RAM reserved, especially after making
> it a huge allocation.
>
> I'm fine either way. I can remove that from the patch or leave it
> as-is. It just seemed like a good idea at the time.

Rebased and split versions attached



Attachment: 0001-Vacuum-allow-using-more-than-1GB-work-mem.patch (2K)
Attachment: 0002-Vacuum-free-dead_tuples-sooner.patch (2K)

Re: Vacuum: allow usage of more than 1GB of work mem

Simon Riggs
In reply to this post by Claudio Freire
On 5 September 2016 at 15:50, Claudio Freire <[hidden email]> wrote:

> On Sun, Sep 4, 2016 at 3:46 AM, Simon Riggs <[hidden email]> wrote:
>> On 3 September 2016 at 04:25, Claudio Freire <[hidden email]> wrote:
>>> The patch also makes vacuum free the dead_tuples before starting
>>> truncation. It didn't seem necessary to hold onto it beyond that
>>> point, and it might help give the OS more cache, especially if work
>>> mem is configured very high to avoid multiple index scans.
>>
>> How long does that part ever take? Is there any substantial gain from this?
>>
>> Let's discuss that as a potential second patch.
>
> In the test case I mentioned, it takes longer than the vacuum part itself.

Please provide a test case and timings so we can see what's happening.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Vacuum: allow usage of more than 1GB of work mem

Claudio Freire
On Mon, Sep 5, 2016 at 5:36 PM, Simon Riggs <[hidden email]> wrote:

> On 5 September 2016 at 15:50, Claudio Freire <[hidden email]> wrote:
>> On Sun, Sep 4, 2016 at 3:46 AM, Simon Riggs <[hidden email]> wrote:
>>> On 3 September 2016 at 04:25, Claudio Freire <[hidden email]> wrote:
>>>> The patch also makes vacuum free the dead_tuples before starting
>>>> truncation. It didn't seem necessary to hold onto it beyond that
>>>> point, and it might help give the OS more cache, especially if work
>>>> mem is configured very high to avoid multiple index scans.
>>>
>>> How long does that part ever take? Is there any substantial gain from this?
>>>
>>> Let's discuss that as a potential second patch.
>>
>> In the test case I mentioned, it takes longer than the vacuum part itself.
>
> Please provide a test case and timings so we can see what's happening.

The referenced test case is the one I mentioned in the OP:

- createdb pgbench
- pgbench -i -s 4000 pgbench
- psql pgbench -c 'delete from pgbench_accounts;'
- vacuumdb -v -t pgbench_accounts pgbench

fsync=off, autovacuum=off, maintenance_work_mem=4GB

From what I remember, it used ~2.7GB of RAM up until the truncate
phase, where it freed it. It performed a single index scan over the
PK.

I don't remember timings, and I didn't take them, so I'll have to
repeat the test to get them. It takes all day and makes my laptop
unusably slow, so I'll post them later, but they're not very
interesting. The only interesting bit is that it does a single index
scan instead of several, which is kinda nice on TB-or-larger tables.

Btw, without a further patch to prefetch pages on the backward scan
for truncate, my patience ran out before it finished truncating. I
haven't submitted that patch because an identical patch in an older
thread was discussed and more or less rejected, since it slightly
penalized SSDs. While I'm confident my version of the patch is a
little easier on SSDs, I haven't got an SSD at hand to confirm, so
that patch is still waiting in the queue until I get access to one.
The tests I'll run include that patch, so the truncate timings won't
be representative of HEAD (I really can't afford to run the tests on
a scale=4000 pgbench without that patch, since it crawls, and smaller
scales don't fill the dead_tuples array).



Re: Vacuum: allow usage of more than 1GB of work mem

Simon Riggs
On 5 September 2016 at 21:58, Claudio Freire <[hidden email]> wrote:

>>>> How long does that part ever take? Is there any substantial gain from this?

> Btw, without a further patch to prefetch pages on the backward scan
> for truncate, my patience ran out before it finished truncating. I
> haven't submitted that patch because an identical patch in an older
> thread was discussed and more or less rejected, since it slightly
> penalized SSDs.

OK, that's enough context. Sorry for being forgetful on that point.

Please post that new patch also.


This whole idea of backwards scanning to confirm truncation seems
wrong. What we want is an O(1) solution. Thinking.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Vacuum: allow usage of more than 1GB of work mem

Robert Haas
In reply to this post by Claudio Freire
On Sat, Sep 3, 2016 at 8:55 AM, Claudio Freire <[hidden email]> wrote:

> The attached patch allows setting maintenance_work_mem or
> autovacuum_work_mem higher than 1GB (and be effective), by turning the
> allocation of the dead_tuples into a huge allocation.
>
> This results in fewer index scans for heavily bloated tables, and
> could be a lifesaver in many situations (in particular, the situation
> I'm living right now in production, where we don't have enough room
> for a vacuum full, and have just deleted 75% of a table to make room
> but have to rely on regular lazy vacuum to free the space).
>
> The patch also makes vacuum free the dead_tuples before starting
> truncation. It didn't seem necessary to hold onto it beyond that
> point, and it might help give the OS more cache, especially if work
> mem is configured very high to avoid multiple index scans.
>
> Tested with pgbench scale 4000 after deleting the whole
> pgbench_accounts table, seemed to work fine.

The problem with this is that we allocate the entire amount of
maintenance_work_mem even when the number of actual dead tuples turns
out to be very small.  That's not so bad if the amount of memory we're
potentially wasting is limited to ~1 GB, but it seems pretty dangerous
to remove the 1 GB limit, because somebody might have
maintenance_work_mem set to tens or hundreds of gigabytes to speed
index creation, and allocating that much space for a VACUUM that
encounters 1 dead tuple does not seem like a good plan.

What I think we need to do is make some provision to initially
allocate only a small amount of memory and then grow the allocation
later if needed.  For example, instead of having
vacrelstats->dead_tuples be declared as ItemPointer, declare it as
ItemPointer * and allocate the array progressively in segments.  I'd
actually argue that the segment size should be substantially smaller
than 1 GB, like say 64MB; there are still some people running systems
which are small enough that allocating 1 GB when we may need only 6
bytes can drive the system into OOM.
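
(A minimal sketch of that shape, with hypothetical names: 64MB segments,
where growing the store only ever reallocates the small array of segment
pointers, never the TIDs themselves:)

    #define DT_SEG_BYTES    (64 * 1024 * 1024)
    #define DT_SEG_TUPLES   (DT_SEG_BYTES / sizeof(ItemPointerData))

    typedef struct DeadTuples
    {
        ItemPointerData **segs; /* growable array of fixed-size segments */
        int         nsegs;      /* segments allocated so far */
        int64       ntuples;    /* total TIDs stored across all segments */
    } DeadTuples;

    static void
    dt_append(DeadTuples *dt, ItemPointerData tid)
    {
        int64       off = dt->ntuples % DT_SEG_TUPLES;

        if (off == 0)           /* last segment full, or none allocated yet */
        {
            dt->segs = (dt->nsegs == 0)
                ? palloc(sizeof(ItemPointerData *))
                : repalloc(dt->segs, (dt->nsegs + 1) * sizeof(ItemPointerData *));
            dt->segs[dt->nsegs++] = palloc(DT_SEG_BYTES);
        }
        dt->segs[dt->nsegs - 1][off] = tid;     /* no bulk memcpy, ever */
        dt->ntuples++;
    }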

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Vacuum: allow usage of more than 1GB of work mem

Claudio Freire
In reply to this post by Jim Nasby
On Sun, Sep 4, 2016 at 8:10 PM, Jim Nasby <[hidden email]> wrote:

> On 9/4/16 1:46 AM, Simon Riggs wrote:
>>>
>>> > The patch also makes vacuum free the dead_tuples before starting
>>> > truncation. It didn't seem necessary to hold onto it beyond that
>>> > point, and it might help give the OS more cache, especially if work
>>> > mem is configured very high to avoid multiple index scans.
>>
>> How long does that part ever take? Is there any substantial gain from
>> this?
>
>
> If you're asking about how long the dealloc takes, we're going to have to
> pay that cost anyway when the context gets destroyed/reset, no? Doing that
> sooner rather than later certainly seems like a good idea since we've seen
> that truncation can take quite some time. Might as well give the memory back
> to the OS ASAP.

AFAIK, except on debug builds where it has to memset the whole thing,
the cost is constant (unrelated to the allocated block size), so it
should be rather small in this context.


On Tue, Sep 6, 2016 at 1:42 PM, Robert Haas <[hidden email]> wrote:

> On Sat, Sep 3, 2016 at 8:55 AM, Claudio Freire <[hidden email]> wrote:
>> The attached patch allows setting maintenance_work_mem or
>> autovacuum_work_mem higher than 1GB (and be effective), by turning the
>> allocation of the dead_tuples into a huge allocation.
>>
>> This results in fewer index scans for heavily bloated tables, and
>> could be a lifesaver in many situations (in particular, the situation
>> I'm living right now in production, where we don't have enough room
>> for a vacuum full, and have just deleted 75% of a table to make room
>> but have to rely on regular lazy vacuum to free the space).
>>
>> The patch also makes vacuum free the dead_tuples before starting
>> truncation. It didn't seem necessary to hold onto it beyond that
>> point, and it might help give the OS more cache, especially if work
>> mem is configured very high to avoid multiple index scans.
>>
>> Tested with pgbench scale 4000 after deleting the whole
>> pgbench_accounts table, seemed to work fine.
>
> The problem with this is that we allocate the entire amount of
> maintenance_work_mem even when the number of actual dead tuples turns
> out to be very small.  That's not so bad if the amount of memory we're
> potentially wasting is limited to ~1 GB, but it seems pretty dangerous
> to remove the 1 GB limit, because somebody might have
> maintenance_work_mem set to tens or hundreds of gigabytes to speed
> index creation, and allocating that much space for a VACUUM that
> encounters 1 dead tuple does not seem like a good plan.
>
> What I think we need to do is make some provision to initially
> allocate only a small amount of memory and then grow the allocation
> later if needed.  For example, instead of having
> vacrelstats->dead_tuples be declared as ItemPointer, declare it as
> ItemPointer * and allocate the array progressively in segments.  I'd
> actually argue that the segment size should be substantially smaller
> than 1 GB, like say 64MB; there are still some people running systems
> which are small enough that allocating 1 GB when we may need only 6
> bytes can drive the system into OOM.

This would however incur the cost of having to copy the whole GB-sized
chunk every time it's expanded. It wouldn't be cheap.

I've monitored the vacuum as it runs and the OS doesn't map the whole
block unless it's touched, which it isn't until dead tuples are found.
Surely, if overcommit is disabled (as it should be), it could exhaust the
virtual address space if set very high, but it wouldn't really use the
memory unless it's needed, it would merely reserve it.

To fix that, rather than repalloc the whole thing, dead_tuples would
have to be an ItemPointer** of sorted chunks. That'd be a
significantly more complex patch, but at least it wouldn't incur the
memcpy. I could attempt that, but I don't see the difference between
vacuum and create index in this case. Both could allocate a huge chunk
of the virtual address space if maintenance_work_mem says so, both
proportional to the size of the table. I can't see how that could take
any DBA by surprise.

A sensible compromise could be dividing the maintenance_work_mem by
autovacuum_max_workers when used in autovacuum, as is done for cost
limits, to protect those that set both rather high.
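
(Something along these lines, mirroring how the cost limit is split
across workers; a hypothetical sketch, not part of the patch:)

    int     vac_work_mem;

    if (IsAutoVacuumWorkerProcess())
        vac_work_mem = (autovacuum_work_mem != -1)
            ? autovacuum_work_mem
            : maintenance_work_mem / Max(autovacuum_max_workers, 1);
    else
        vac_work_mem = maintenance_work_mem;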



Re: Vacuum: allow usage of more than 1GB of work mem

Robert Haas
On Tue, Sep 6, 2016 at 10:28 PM, Claudio Freire <[hidden email]> wrote:

>> The problem with this is that we allocate the entire amount of
>> maintenance_work_mem even when the number of actual dead tuples turns
>> out to be very small.  That's not so bad if the amount of memory we're
>> potentially wasting is limited to ~1 GB, but it seems pretty dangerous
>> to remove the 1 GB limit, because somebody might have
>> maintenance_work_mem set to tens or hundreds of gigabytes to speed
>> index creation, and allocating that much space for a VACUUM that
>> encounters 1 dead tuple does not seem like a good plan.
>>
>> What I think we need to do is make some provision to initially
>> allocate only a small amount of memory and then grow the allocation
>> later if needed.  For example, instead of having
>> vacrelstats->dead_tuples be declared as ItemPointer, declare it as
>> ItemPointer * and allocate the array progressively in segments.  I'd
>> actually argue that the segment size should be substantially smaller
>> than 1 GB, like say 64MB; there are still some people running systems
>> which are small enough that allocating 1 GB when we may need only 6
>> bytes can drive the system into OOM.
>
> This would however incur the cost of having to copy the whole GB-sized
> chunk every time it's expanded. It wouldn't be cheap.

No, I don't want to end up copying the whole array; that's what I
meant by allocating it progressively in segments.  Something like what
you go on to propose.

> I've monitored the vacuum as it runs and the OS doesn't map the whole
> block unless it's touched, which it isn't until dead tuples are found.
> Surely, if overcommit is disabled (as it should be), it could exhaust the
> virtual address space if set very high, but it wouldn't really use the
> memory unless it's needed, it would merely reserve it.

Yeah, but I've seen actual breakage from exactly this issue on
customer systems even with the 1GB limit, and when we start allowing
100GB it's going to get a whole lot worse.

> To fix that, rather than repalloc the whole thing, dead_tuples would
> have to be an ItemPointer** of sorted chunks. That'd be a
> significantly more complex patch, but at least it wouldn't incur the
> memcpy.

Right, this is what I had in mind.  I don't think this is actually
very complicated, because the way we use this array is really simple.
We basically just keep appending to the array until we run out of
space, and that's not very hard to implement with an array-of-arrays.
The chunks are, in some sense, sorted, as you say, but you don't need
to do qsort() or anything like that.  You're just replacing a single
flat array with a data structure that can be grown incrementally in
fixed-size chunks.

> I could attempt that, but I don't see the difference between
> vacuum and create index in this case. Both could allocate a huge chunk
>> of the virtual address space if maintenance_work_mem says so, both
> proportional to the size of the table. I can't see how that could take
> any DBA by surprise.

Really?  CREATE INDEX isn't going to allocate more storage space than
the size of the data actually being sorted, because tuplesort.c is
smart about that kind of thing.  But VACUUM will very happily allocate
vastly more memory than the number of dead tuples.  It is thankfully
smart enough not to allocate more storage than the number of line
pointers that could theoretically exist in a relation of the given
size, but that only helps for very small relations.  In a large
relation that divergence between the amount of storage space that
could theoretically be needed and the amount that is actually needed
is likely to be extremely high.  1 TB relation = 2^27 blocks, each of
which can contain MaxHeapTuplesPerPage dead line pointers.  On my
system, MaxHeapTuplesPerPage is 291, so that's 291 * 2^27 possible
dead line pointers, which at 6 bytes each is 291 * 6 * 2^27 = ~218GB,
but the expected number of dead line pointers is much less than that.
Even if this is a vacuum triggered by autovacuum_vacuum_scale_factor
and you're using the default of 0.2 (probably too high for such a
large table), assuming there are about 60 tuples per page (which is
what I get with pgbench -i) the table would have about 2^27 * 60 = ~8
billion tuples of which 1.6 billion would be dead, meaning we need
about 9-10GB of space to store all of those dead tuples.  Allocating
as much as 218GB when we need 9-10GB is going to sting, and I don't
see how you will get a comparable distortion with CREATE INDEX.  I
might be missing something, though.

There's no real issue when there's only one process running on the
system at a time.  If the user set maintenance_work_mem to an amount
of memory that he can't afford to pay even once, then that's simple
misconfiguration and it's not really our problem.  The issue is that
when there are 3 or potentially more VACUUM processes running plus a
CREATE INDEX or two at the same time.  If you set maintenance_work_mem
to a value that is large enough to make the CREATE INDEX run fast, now
with your patch that is also going to cause each VACUUM process to
gobble up lots of extra memory that it probably doesn't need, and now
you may well start to get failures.  I've seen this happen even with
the current 1GB limit, though you need a pretty small system - e.g.
8GB RAM - for it to be a problem.  I think it is really really likely
to cause big problems for us if we dramatically increase that limit
without making the allocation algorithm smarter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Vacuum: allow usage of more than 1GB of work mem

Claudio Freire
On Tue, Sep 6, 2016 at 2:39 PM, Robert Haas <[hidden email]> wrote:

>> I could attempt that, but I don't see the difference between
>> vacuum and create index in this case. Both could allocate a huge chunk
>> of the virtual address space if maintenance_work_mem says so, both
>> proportional to the size of the table. I can't see how that could take
>> any DBA by surprise.
>
> Really?  CREATE INDEX isn't going to allocate more storage space than
> the size of the data actually being sorted, because tuplesort.c is
> smart about that kind of thing.  But VACUUM will very happily allocate
> vastly more memory than the number of dead tuples.  It is thankfully
> smart enough not to allocate more storage than the number of line
> pointers that could theoretically exist in a relation of the given
> size, but that only helps for very small relations.  In a large
> relation that divergence between the amount of storage space that
> could theoretically be needed and the amount that is actually needed
> is likely to be extremely high.  1 TB relation = 2^27 blocks, each of
> which can contain MaxHeapTuplesPerPage dead line pointers.  On my
> system, MaxHeapTuplesPerPage is 291, so that's 291 * 2^27 possible
> dead line pointers, which at 6 bytes each is 291 * 6 * 2^27 = ~218GB,
> but the expected number of dead line pointers is much less than that.
> Even if this is a vacuum triggered by autovacuum_vacuum_scale_factor
> and you're using the default of 0.2 (probably too high for such a
> large table), assuming there are about 60 tuples per page (which is
> what I get with pgbench -i) the table would have about 2^27 * 60 = ~8
> billion tuples of which 1.6 billion would be dead, meaning we need
> about 9-10GB of space to store all of those dead tuples.  Allocating
> as much as 218GB when we need 9-10GB is going to sting, and I don't
> see how you will get a comparable distortion with CREATE INDEX.  I
> might be missing something, though.

CREATE INDEX could also allocate 218GB, you just need to index enough
columns and you'll get that.

Aside from the fact that CREATE INDEX will only allocate what is going
to be used and VACUUM will overallocate, the potential to fully
allocate the amount given is still there for both cases.

> There's no real issue when there's only one process running on the
> system at a time.  If the user set maintenance_work_mem to an amount
> of memory that he can't afford to pay even once, then that's simple
> misconfiguration and it's not really our problem.  The issue is that
> when there are 3 or potentially more VACUUM processes running plus a
> CREATE INDEX or two at the same time.  If you set maintenance_work_mem
> to a value that is large enough to make the CREATE INDEX run fast, now
> with your patch that is also going to cause each VACUUM process to
> gobble up lots of extra memory that it probably doesn't need, and now
> you may well start to get failures.  I've seen this happen even with
> the current 1GB limit, though you need a pretty small system - e.g.
> 8GB RAM - for it to be a problem.  I think it is really really likely
> to cause big problems for us if we dramatically increase that limit
> without making the allocation algorithm smarter.

OK, a pity it will invalidate all the testing already done, though (I
was almost done with the testing).

I guess I'll send the results anyway.



Re: Vacuum: allow usage of more than 1GB of work mem

Robert Haas
On Tue, Sep 6, 2016 at 11:22 PM, Claudio Freire <[hidden email]> wrote:
> CREATE INDEX could also allocate 218GB, you just need to index enough
> columns and you'll get that.
>
> Aside from the fact that CREATE INDEX will only allocate what is going
> to be used and VACUUM will overallocate, the potential to fully
> allocate the amount given is still there for both cases.

I agree with that, but I think there's a big difference between
allocating the memory only when it's needed and allocating it whether
it is needed or not.  YMMV, of course, but that's what I think....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Vacuum: allow usage of more than 1GB of work mem

Tom Lane
In reply to this post by Robert Haas
Robert Haas <[hidden email]> writes:
> Yeah, but I've seen actual breakage from exactly this issue on
> customer systems even with the 1GB limit, and when we start allowing
> 100GB it's going to get a whole lot worse.

While it's not necessarily a bad idea to consider these things,
I think people are greatly overestimating the consequences of the
patch-as-proposed.  AFAICS, it does *not* let you tell VACUUM to
eat 100GB of workspace.  Note the line right in front of the one
being changed:

         maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
         maxtuples = Min(maxtuples, INT_MAX);
-        maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+        maxtuples = Min(maxtuples, MaxAllocHugeSize / sizeof(ItemPointerData));

Regardless of what vac_work_mem is, we aren't gonna let you have more
than INT_MAX ItemPointers, hence 12GB at the most.  So the worst-case
increase from the patch as given is 12X.  Maybe that's enough to cause
bad consequences on some systems, but it's not the sort of disaster
Robert posits above.

It's also worth re-reading the lines just after this, which constrain
the allocation a whole lot more for small tables.  Robert comments:

> ...  But VACUUM will very happily allocate
> vastly more memory than the number of dead tuples.  It is thankfully
> smart enough not to allocate more storage than the number of line
> pointers that could theoretically exist in a relation of the given
> size, but that only helps for very small relations.  In a large
> relation that divergence between the amount of storage space that
> could theoretically be needed and the amount that is actually needed
> is likely to be extremely high.  1 TB relation = 2^27 blocks, each of
> which can contain MaxHeapTuplesPerPage dead line pointers.  On my
> system, MaxHeapTuplesPerPage is 291, so that's 291 * 2^27 possible
> dead line pointers, which at 6 bytes each is 291 * 6 * 2^27 = ~218GB,
> but the expected number of dead line pointers is much less than that.

If we think the expected number of dead pointers is so much less than
that, why don't we just decrease LAZY_ALLOC_TUPLES, and take a hit in
extra index vacuum cycles when we're wrong?
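
(For reference, the clamp in question reads approximately like this in
lazy_space_alloc, with LAZY_ALLOC_TUPLES currently defined as
MaxHeapTuplesPerPage:)

    /* curious coding here to ensure the multiplication can't overflow */
    if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
        maxtuples = relblocks * LAZY_ALLOC_TUPLES;

    /* stay sane if small maintenance_work_mem */
    maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);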

(Actually, what I'd be inclined to do is let it have MaxHeapTuplesPerPage
slots per page up till a few meg, and then start tailing off the
space-per-page, figuring that the law of large numbers will probably kick
in.)
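
(A hypothetical shape for that taper; the threshold and tail rate are
invented purely for illustration:)

    #define FULL_RATE_PAGES 8192    /* worst-case slots for the first 8K pages */
    #define TAIL_RATE       10      /* assumed dead tuples per page beyond that */

    if (relblocks <= FULL_RATE_PAGES)
        maxtuples = (long) relblocks * MaxHeapTuplesPerPage;
    else
        maxtuples = (long) FULL_RATE_PAGES * MaxHeapTuplesPerPage +
                    (long) (relblocks - FULL_RATE_PAGES) * TAIL_RATE;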

                        regards, tom lane



Re: Vacuum: allow usage of more than 1GB of work mem

Simon Riggs
On 6 September 2016 at 19:00, Tom Lane <[hidden email]> wrote:

> Robert Haas <[hidden email]> writes:
>> Yeah, but I've seen actual breakage from exactly this issue on
>> customer systems even with the 1GB limit, and when we start allowing
>> 100GB it's going to get a whole lot worse.
>
> While it's not necessarily a bad idea to consider these things,
> I think people are greatly overestimating the consequences of the
> patch-as-proposed.  AFAICS, it does *not* let you tell VACUUM to
> eat 100GB of workspace.  Note the line right in front of the one
> being changed:
>
>          maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
>          maxtuples = Min(maxtuples, INT_MAX);
> -        maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> +        maxtuples = Min(maxtuples, MaxAllocHugeSize / sizeof(ItemPointerData));
>
> Regardless of what vac_work_mem is, we aren't gonna let you have more
> than INT_MAX ItemPointers, hence 12GB at the most.  So the worst-case
> increase from the patch as given is 12X.  Maybe that's enough to cause
> bad consequences on some systems, but it's not the sort of disaster
> Robert posits above.

Is there a reason we can't use repalloc here?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Vacuum: allow usage of more than 1GB of work mem

Robert Haas
In reply to this post by Tom Lane
On Tue, Sep 6, 2016 at 2:00 PM, Tom Lane <[hidden email]> wrote:

> Robert Haas <[hidden email]> writes:
>> Yeah, but I've seen actual breakage from exactly this issue on
>> customer systems even with the 1GB limit, and when we start allowing
>> 100GB it's going to get a whole lot worse.
>
> While it's not necessarily a bad idea to consider these things,
> I think people are greatly overestimating the consequences of the
> patch-as-proposed.  AFAICS, it does *not* let you tell VACUUM to
> eat 100GB of workspace.  Note the line right in front of the one
> being changed:
>
>          maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
>          maxtuples = Min(maxtuples, INT_MAX);
> -        maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> +        maxtuples = Min(maxtuples, MaxAllocHugeSize / sizeof(ItemPointerData));
>
> Regardless of what vac_work_mem is, we aren't gonna let you have more
> than INT_MAX ItemPointers, hence 12GB at the most.  So the worst-case
> increase from the patch as given is 12X.  Maybe that's enough to cause
> bad consequences on some systems, but it's not the sort of disaster
> Robert posits above.

Hmm, OK.  Yes, that is a lot less bad.  (I think it's still bad.)

> If we think the expected number of dead pointers is so much less than
> that, why don't we just decrease LAZY_ALLOC_TUPLES, and take a hit in
> extra index vacuum cycles when we're wrong?

Because that's really inefficient.  Growing the array, even with a
stupid approach that copies all of the TIDs every time, is a heck of a
lot faster than incurring an extra index vac cycle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Vacuum: allow usage of more than 1GB of work mem

Robert Haas
In reply to this post by Simon Riggs
On Tue, Sep 6, 2016 at 2:06 PM, Simon Riggs <[hidden email]> wrote:

> On 6 September 2016 at 19:00, Tom Lane <[hidden email]> wrote:
>> Robert Haas <[hidden email]> writes:
>>> Yeah, but I've seen actual breakage from exactly this issue on
>>> customer systems even with the 1GB limit, and when we start allowing
>>> 100GB it's going to get a whole lot worse.
>>
>> While it's not necessarily a bad idea to consider these things,
>> I think people are greatly overestimating the consequences of the
>> patch-as-proposed.  AFAICS, it does *not* let you tell VACUUM to
>> eat 100GB of workspace.  Note the line right in front of the one
>> being changed:
>>
>>          maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
>>          maxtuples = Min(maxtuples, INT_MAX);
>> -        maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
>> +        maxtuples = Min(maxtuples, MaxAllocHugeSize / sizeof(ItemPointerData));
>>
>> Regardless of what vac_work_mem is, we aren't gonna let you have more
>> than INT_MAX ItemPointers, hence 12GB at the most.  So the worst-case
>> increase from the patch as given is 12X.  Maybe that's enough to cause
>> bad consequences on some systems, but it's not the sort of disaster
>> Robert posits above.
>
> Is there a reason we can't use repalloc here?

There are two possible problems, either of which is necessarily fatal:

1. I expect repalloc probably works by allocating the new space,
copying from old to new, and freeing the old.  That could work out
badly if we are near the edge of the system's allocation limit.

2. It's slower than the approach proposed upthread of allocating the
array in segments.  With that approach, we never need to memcpy()
anything.

On the plus side, it's probably less code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Vacuum: allow usage of more than 1GB of work mem

Robert Haas
On Tue, Sep 6, 2016 at 2:09 PM, Robert Haas <[hidden email]> wrote:
> There are two possible problems, either of which is necessarily fatal:

I meant to write "neither of which" not "either of which".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Vacuum: allow usage of more than 1GB of work mem

Tom Lane
In reply to this post by Simon Riggs
Simon Riggs <[hidden email]> writes:
> Is there a reason we can't use repalloc here?

(1) repalloc will probably copy the data.

(2) that answer doesn't excuse you from choosing a limit.

We could get around (1) by something like Robert's idea of segmented
allocation, but TBH I've seen nothing on this thread to make me think
it's necessary or would even result in any performance improvement
at all.  The bigger we make that array, the worse index-cleaning
is going to perform, and complicating the data structure will add
another hit on top of that.

                        regards, tom lane



Re: Vacuum: allow usage of more than 1GB of work mem

Claudio Freire
On Tue, Sep 6, 2016 at 3:11 PM, Tom Lane <[hidden email]> wrote:
> We could get around (1) by something like Robert's idea of segmented
> allocation, but TBH I've seen nothing on this thread to make me think
> it's necessary or would even result in any performance improvement
> at all.  The bigger we make that array, the worse index-cleaning
> is going to perform, and complicating the data structure will add
> another hit on top of that.

I wouldn't be so sure; I've seen cases where two binary searches were
faster than a single binary search, especially when working with
humongous arrays like this tid array, because touching fewer (memory)
pages per search does pay off considerably.

I'd try before giving up on the idea.
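
(A sketch of what I mean, with hypothetical names; it assumes each chunk
holds TIDs in heap order, hence sorted, and a comparator like
vacuumlazy.c's vac_cmp_itemptr:)

    typedef struct DTSeg
    {
        ItemPointerData *tids;  /* TIDs in heap order, hence sorted */
        int         ntids;
    } DTSeg;

    static bool
    dt_lookup(DTSeg *segs, int nsegs, ItemPointer tid)
    {
        int     lo = 0,
                hi = nsegs - 1;

        if (nsegs == 0)
            return false;

        /* level 1: binary-search chunk upper bounds for the one candidate */
        while (lo < hi)
        {
            int         mid = (lo + hi) / 2;
            ItemPointer last = &segs[mid].tids[segs[mid].ntids - 1];

            if (ItemPointerCompare(last, tid) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }

        /* level 2: plain binary search within that single chunk */
        return bsearch(tid, segs[lo].tids, segs[lo].ntids,
                       sizeof(ItemPointerData), vac_cmp_itemptr) != NULL;
    }

The first search touches only chunk upper bounds; the second stays within
one chunk, so each lookup hits far fewer memory pages than one search over
a flat multi-GB array would.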

The test results (which I'll post in a second) do support your
expectation that making the array bigger/more complex does impact
index scan performance. It's still faster than scanning several times,
though.

