Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Simon Riggs
On Fri, 29 Mar 2019 at 16:12, Andres Freund <[hidden email]> wrote:
 
> On 2019-03-29 15:58:14 +0000, Simon Riggs wrote:
> > On Fri, 29 Mar 2019 at 15:29, Andres Freund <[hidden email]> wrote:
> > > That's far from a trivial feature imo. It seems quite possible that we'd
> > > end up with increased overhead, because the current logic can get away
> > > with only doing hint bit style writes - but would that be true if we
> > > started actually replacing the item pointers? Because I don't see any
> > > guarantee they couldn't cross a page boundary etc? So I think we'd need
> > > to do WAL logging during index searches, which seems prohibitively
> > > expensive.
> >
> > Don't see that.
> >
> > I was talking about reusing the first 4 bytes of an index tuple's
> > ItemPointerData, which is the first field of an index tuple. Index
> > tuples are MAXALIGNed, so I can't see how that would ever cross a
> > page boundary.

> They're 8 bytes, and MAXALIGN often is 4 bytes:

XIDs are 4 bytes, so we're good.

If MAXALIGN could ever be 2 bytes, we'd have a problem.

> So as a whole they definitely can cross sector boundaries. You might be
> able to argue your way out of that by saying that the blkid is going to
> be aligned, but that's not that trivial, as t_info isn't guaranteed
> that.

> But even so, you can't have unlogged changes that you then rely on. Even
> if there's no torn page issue. Currently BTP_HAS_GARBAGE and
> ItemIdMarkDead() are treated as hints - if we want to guarantee all
> these are accurate, I don't quite see how we'd get around WAL logging
> those.

You can have unlogged changes that you rely on - that is exactly how hints work.

If the hint is lost, we do the I/O. Worst case it would be the same as what you have now.

I'm talking about saving many I/Os - this doesn't need to provably avoid all I/Os to work; it's an incremental benefit all the way.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Peter Geoghegan
On Fri, Mar 29, 2019 at 9:12 AM Andres Freund <[hidden email]> wrote:
> But even so, you can't have unlogged changes that you then rely on. Even
> if there's no torn page issue. Currently BTP_HAS_GARBAGE and
> ItemIdMarkDead() are treated as hints - if we want to guarantee all
> these are accurate, I don't quite see how we'd get around WAL logging
> those.

It might be possible to WAL-log the _bt_check_unique() item killing.
That seems to be much more effective than the similar and better known
kill_prior_tuple optimization in practice. I don't think that that
should be in scope for v12, though. I for one am satisfied with your
explanation.

> > The code can do literally hundreds of random I/Os in an 8192 blocksize.
> > What happens with 16 or 32kB?
>
> It's really hard to construct such cases after the sorting changes, but
> obviously not impossible. But to make it actually painful you need a
> workload where the implied randomness of accesses isn't already a major
> bottleneck - and that's hard.

There is also the fact that in many cases you'll just have accessed
the same buffers from within _bt_check_unique() anyway.

--
Peter Geoghegan



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Robert Haas
On Sat, Mar 30, 2019 at 6:33 AM Thomas Munro <[hidden email]> wrote:

> I didn't understand that last sentence.
>
> Here's an attempt to write a suitable comment for the quick fix.  And
> I suppose effective_io_concurrency is a reasonable default.
>
> It's pretty hard to think of a good way to get your hands on the real
> value safely from here.  I wondered if there was a way to narrow this
> to just GLOBALTABLESPACE_OID since that's where pg_tablespace lives,
> but that doesn't work, we access other catalog too in that path.
>
> Hmm, it seems a bit odd that 0 is supposed to mean "disable issuance
> of asynchronous I/O requests" according to config.sgml, but here 0
> will prefetch 10 buffers.

Mmmph.  I'm starting to think we're not going to get a satisfactory
result here unless we make this controlled by something other than
effective_io_concurrency.  There's just no reason to suppose that the
same setting that we use to control prefetching for bitmap index scans
is also going to be right for what's basically a bulk operation.

Interestingly, Dilip Kumar ran into similar issues recently while
working on bulk processing for undo records for zheap.  In that case,
you definitely want to prefetch the undo aggressively, because you're
reading it front to back and backwards scans suck without prefetching.
And you possibly also want to prefetch the data pages to which the
undo that you are prefetching applies, but maybe not as aggressively
because you're going to be doing a WAL write for each data page and
flooding the system with too many reads could be counterproductive, at
least if pg_wal and the rest of $PGDATA are not on separate spindles.
And even if they are, it's possible that as you suck in undo pages and
the zheap pages that they need to update, you might evict dirty pages,
generating write activity against the data directory.

Overall I'm inclined to think that we're making the same mistake here
that we did with work_mem, namely, assuming that you can control a
bunch of different prefetching behaviors with a single GUC and things
will be OK.  Let's just create a new GUC for this and default it to 10
or something and go home.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Peter Geoghegan
On Sat, Mar 30, 2019 at 8:44 AM Robert Haas <[hidden email]> wrote:
> Overall I'm inclined to think that we're making the same mistake here
> that we did with work_mem, namely, assuming that you can control a
> bunch of different prefetching behaviors with a single GUC and things
> will be OK.  Let's just create a new GUC for this and default it to 10
> or something and go home.

I agree. If you invent a new GUC, then everybody notices, and it
usually has to be justified quite rigorously. There is a strong
incentive to use an existing GUC, if only because the problem that
this creates is harder to measure than the supposed problem that it
avoids. This can perversely work against the goal of making the system
easy to use. Stretching the original definition of a GUC is bad.

I take issue with the general assumption that not adding a GUC at
least makes things easier for users. In reality, it depends entirely
on the situation at hand.

--
Peter Geoghegan



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Thomas Munro
On Sun, Mar 31, 2019 at 8:20 AM Peter Geoghegan <[hidden email]> wrote:

> On Sat, Mar 30, 2019 at 8:44 AM Robert Haas <[hidden email]> wrote:
> > Overall I'm inclined to think that we're making the same mistake here
> > that we did with work_mem, namely, assuming that you can control a
> > bunch of different prefetching behaviors with a single GUC and things
> > will be OK.  Let's just create a new GUC for this and default it to 10
> > or something and go home.
>
> I agree. If you invent a new GUC, then everybody notices, and it
> usually has to be justified quite rigorously. There is a strong
> incentive to use an existing GUC, if only because the problem that
> this creates is harder to measure than the supposed problem that it
> avoids. This can perversely work against the goal of making the system
> easy to use. Stretching the original definition of a GUC is bad.
>
> I take issue with the general assumption that not adding a GUC at
> least makes things easier for users. In reality, it depends entirely
> on the situation at hand.

I'm not sure I understand why this is any different from the bitmap
heapscan case though, or in fact why we are adding 10 in this case.
In both cases we will soon be reading the referenced buffers, and it
makes sense to queue up prefetch requests for the blocks if they
aren't already in shared buffers.  In both cases, the number of
prefetch requests we want to send to the OS is somehow linked to the
number of I/O requests we think the OS can handle concurrently
(since that's one factor determining how fast it drains them), but
it's not necessarily the same as that number, AFAICS.  It's useful to
queue some number of prefetch requests even if you have no IO
concurrency at all (a single old school spindle), just because the OS
will chew on that queue in the background while we're also doing
stuff, which is probably what that "+ 10" is expressing.  But that
seems to apply to bitmap heapscan too, doesn't it?

--
Thomas Munro
https://enterprisedb.com



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Andres Freund
Hi,

On March 30, 2019 5:33:12 PM EDT, Thomas Munro <[hidden email]> wrote:

> On Sun, Mar 31, 2019 at 8:20 AM Peter Geoghegan <[hidden email]> wrote:
> > On Sat, Mar 30, 2019 at 8:44 AM Robert Haas <[hidden email]> wrote:
> > > Overall I'm inclined to think that we're making the same mistake here
> > > that we did with work_mem, namely, assuming that you can control a
> > > bunch of different prefetching behaviors with a single GUC and things
> > > will be OK.  Let's just create a new GUC for this and default it to 10
> > > or something and go home.
> >
> > I agree. If you invent a new GUC, then everybody notices, and it
> > usually has to be justified quite rigorously. There is a strong
> > incentive to use an existing GUC, if only because the problem that
> > this creates is harder to measure than the supposed problem that it
> > avoids. This can perversely work against the goal of making the system
> > easy to use. Stretching the original definition of a GUC is bad.
> >
> > I take issue with the general assumption that not adding a GUC at
> > least makes things easier for users. In reality, it depends entirely
> > on the situation at hand.
>
> I'm not sure I understand why this is any different from the bitmap
> heapscan case though, or in fact why we are adding 10 in this case.
> In both cases we will soon be reading the referenced buffers, and it
> makes sense to queue up prefetch requests for the blocks if they
> aren't already in shared buffers.  In both cases, the number of
> prefetch requests we want to send to the OS is somehow linked to the
> amount of IO requests we think the OS can handle concurrently at once
> (since that's one factor determining how fast it drains them), but
> it's not necessarily the same as that number, AFAICS.  It's useful to
> queue some number of prefetch requests even if you have no IO
> concurrency at all (a single old school spindle), just because the OS
> will chew on that queue in the background while we're also doing
> stuff, which is probably what that "+ 10" is expressing.  But that
> seems to apply to bitmap heapscan too, doesn't it?

The index page deletion code does work on behalf of multiple backends; bitmap scans don't. If your system is busy, it makes sense to limit the resource usage of per-backend work, but not really of work on shared resources like page reuse. A bit like work_mem vs. maintenance_work_mem.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Andres Freund
Hi,

On 2019-03-30 11:44:36 -0400, Robert Haas wrote:

> On Sat, Mar 30, 2019 at 6:33 AM Thomas Munro <[hidden email]> wrote:
> > I didn't understand that last sentence.
> >
> > Here's an attempt to write a suitable comment for the quick fix.  And
> > I suppose effective_io_concurrency is a reasonable default.
> >
> > It's pretty hard to think of a good way to get your hands on the real
> > value safely from here.  I wondered if there was a way to narrow this
> > to just GLOBALTABLESPACE_OID since that's where pg_tablespace lives,
> > but that doesn't work, we access other catalog too in that path.
> >
> > Hmm, it seems a bit odd that 0 is supposed to mean "disable issuance
> > of asynchronous I/O requests" according to config.sgml, but here 0
> > will prefetch 10 buffers.
>
> Mmmph.  I'm starting to think we're not going to get a satisfactory
> result here unless we make this controlled by something other than
> effective_io_concurrency.  There's just no reason to suppose that the
> same setting that we use to control prefetching for bitmap index scans
> is also going to be right for what's basically a bulk operation.
>
> Interestingly, Dilip Kumar ran into similar issues recently while
> working on bulk processing for undo records for zheap.  In that case,
> you definitely want to prefetch the undo aggressively, because you're
> reading it front to back and backwards scans suck without prefetching.
> And you possibly also want to prefetch the data pages to which the
> undo that you are prefetching applies, but maybe not as aggressively
> because you're going to be doing a WAL write for each data page and
> flooding the system with too many reads could be counterproductive, at
> least if pg_wal and the rest of $PGDATA are not on separate spindles.
> And even if they are, it's possible that as you suck in undo pages and
> the zheap pages that they need to update, you might evict dirty pages,
> generating write activity against the data directory.

I'm not yet convinced it's necessary to create a new GUC, but also not
strongly opposed.  I've created an open items issue for it, so we don't
forget.


> Overall I'm inclined to think that we're making the same mistake here
> that we did with work_mem, namely, assuming that you can control a
> bunch of different prefetching behaviors with a single GUC and things
> will be OK.  Let's just create a new GUC for this and default it to 10
> or something and go home.

I agree that we needed to split work_mem, but a) that was far less clear
for many years, and b) there was no logic to use more work_mem in
maintenance-y cases...

Greetings,

Andres Freund



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Andres Freund
Hi,

On 2019-04-01 18:26:59 -0700, Andres Freund wrote:
> I'm not yet convinced it's necessary to create a new GUC, but also not
> strongly opposed.  I've created an open items issue for it, so we don't
> forget.

My current inclination is to not do anything for v12. Robert, do you
disagree?

Greetings,

Andres Freund



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Robert Haas
On Wed, May 1, 2019 at 12:15 PM Andres Freund <[hidden email]> wrote:
> On 2019-04-01 18:26:59 -0700, Andres Freund wrote:
> > I'm not yet convinced it's necessary to create a new GUC, but also not
> > strongly opposed.  I've created an open items issue for it, so we don't
> > forget.
>
> My current inclination is to not do anything for v12. Robert, do you
> disagree?

Not strongly enough to argue about it very hard.  The current behavior
is a little weird, but it's a long way from being the weirdest thing
we ship, and it appears that we have no tangible evidence that it
causes a problem in practice.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Tom Lane
Robert Haas <[hidden email]> writes:
> On Wed, May 1, 2019 at 12:15 PM Andres Freund <[hidden email]> wrote:
>> My current inclination is to not do anything for v12. Robert, do you
>> disagree?

> Not strongly enough to argue about it very hard.  The current behavior
> is a little weird, but it's a long way from being the weirdest thing
> we ship, and it appears that we have no tangible evidence that it
> causes a problem in practice.

I think there's nothing that fails to suck about a hardwired "+ 10".

We should either remove that and use effective_io_concurrency as-is,
or decide that it's worth having a separate GUC for bulk operations.
At this stage of the cycle I'd incline to the former, but if somebody
is excited enough to prepare a patch for a new GUC, I wouldn't push
back on that solution.

                        regards, tom lane



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Robert Haas
On Wed, May 1, 2019 at 12:50 PM Tom Lane <[hidden email]> wrote:
> > Not strongly enough to argue about it very hard.  The current behavior
> > is a little weird, but it's a long way from being the weirdest thing
> > we ship, and it appears that we have no tangible evidence that it
> > causes a problem in practice.
>
> I think there's nothing that fails to suck about a hardwired "+ 10".

It avoids a performance regression without adding another GUC.

That may not be enough reason to keep it like that, but it is one
thing that does fail to suck.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Thomas Munro
On Thu, May 2, 2019 at 5:10 AM Robert Haas <[hidden email]> wrote:

> On Wed, May 1, 2019 at 12:50 PM Tom Lane <[hidden email]> wrote:
> > > Not strongly enough to argue about it very hard.  The current behavior
> > > is a little weird, but it's a long way from being the weirdest thing
> > > we ship, and it appears that we have no tangible evidence that it
> > > causes a problem in practice.
> >
> > I think there's nothing that fails to suck about a hardwired "+ 10".
>
> It avoids a performance regression without adding another GUC.
>
> That may not be enough reason to keep it like that, but it is one
> thing that does fail to suck.

This is listed as an open item to resolve for 12.  IIUC the options on
the table are:

1.  Do nothing, and ship with effective_io_concurrency + 10.
2.  Just use effective_io_concurrency without the hardwired boost.
3.  Switch to a new GUC maintenance_io_concurrency (or some better name).

The rationale for using a different number is that this backend is
working on behalf of multiple sessions, so you might want to give it
some more juice, much like maintenance_work_mem.

I vote for option 3.  I have no clue how to set it, but at least users
have a fighting chance of experimenting and figuring it out that way.
I volunteer to write the patch if we get a consensus.

--
Thomas Munro
https://enterprisedb.com



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Andres Freund
Hi,

On 2019-05-15 12:01:07 +1200, Thomas Munro wrote:

> On Thu, May 2, 2019 at 5:10 AM Robert Haas <[hidden email]> wrote:
> > On Wed, May 1, 2019 at 12:50 PM Tom Lane <[hidden email]> wrote:
> > > > Not strongly enough to argue about it very hard.  The current behavior
> > > > is a little weird, but it's a long way from being the weirdest thing
> > > > we ship, and it appears that we have no tangible evidence that it
> > > > causes a problem in practice.
> > >
> > > I think there's nothing that fails to suck about a hardwired "+ 10".
> >
> > It avoids a performance regression without adding another GUC.
> >
> > That may not be enough reason to keep it like that, but it is one
> > thing that does fail to suck.
>
> This is listed as an open item to resolve for 12.  IIUC the options on
> the table are:
>
> 1.  Do nothing, and ship with effective_io_concurrency + 10.
> 2.  Just use effective_io_concurrency without the hardwired boost.
> 3.  Switch to a new GUC maintenance_io_concurrency (or some better name).
>
> The rationale for using a different number is that this backend is
> working on behalf of multiple sessions, so you might want to give it
> some more juice, much like maintenance_work_mem.
>
> I vote for option 3.  I have no clue how to set it, but at least users
> have a fighting chance of experimenting and figuring it out that way.
> I volunteer to write the patch if we get a consensus.

I'd personally, unsurprisingly perhaps, go with 1 for v12. I think 3 is
also a good option - it's easy to imagine later using it for
VACUUM, ANALYZE, and the like.  I think 2 is a bad idea.

Greetings,

Andres Freund



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Tom Lane
Andres Freund <[hidden email]> writes:

> On 2019-05-15 12:01:07 +1200, Thomas Munro wrote:
>> This is listed as an open item to resolve for 12.  IIUC the options on
>> the table are:
>>
>> 1.  Do nothing, and ship with effective_io_concurrency + 10.
>> 2.  Just use effective_io_concurrency without the hardwired boost.
>> 3.  Switch to a new GUC maintenance_io_concurrency (or some better name).
>>
>> I vote for option 3.  I have no clue how to set it, but at least users
>> have a fighting chance of experimenting and figuring it out that way.
>> I volunteer to write the patch if we get a consensus.

> I'd personally, unsurprisingly perhaps, go with 1 for v12. I think 3 is
> also a good option - it's easy to imagine later using it for
> VACUUM, ANALYZE, and the like.  I think 2 is a bad idea.

FWIW, I also agree with settling for #1 at this point.  A new GUC would
make more sense if we have multiple use-cases for it, which we probably
will at some point, but not today.  I'm concerned that if we invent a
GUC now, we might find out that it's not really usable for other cases
in future (e.g., default value is no good for other cases).  It's the
old story that inventing an API with only one use-case in mind leads
to a bad API.

So yeah, let's leave this be for now, ugly as it is.  Improving it
can be future work.

                        regards, tom lane



Re: pgsql: Compute XID horizon for page level index vacuum on primary.

Thomas Munro
On Thu, May 16, 2019 at 3:53 AM Tom Lane <[hidden email]> wrote:

> Andres Freund <[hidden email]> writes:
> > On 2019-05-15 12:01:07 +1200, Thomas Munro wrote:
> >> This is listed as an open item to resolve for 12.  IIUC the options on
> >> the table are:
> >>
> >> 1.  Do nothing, and ship with effective_io_concurrency + 10.
> >> 2.  Just use effective_io_concurrency without the hardwired boost.
> >> 3.  Switch to a new GUC maintenance_io_concurrency (or some better name).
> >>
> >> I vote for option 3.  I have no clue how to set it, but at least users
> >> have a fighting chance of experimenting and figuring it out that way.
> >> I volunteer to write the patch if we get a consensus.
>
> > I'd personally, unsurprisingly perhaps, go with 1 for v12. I think 3 is
> > also a good option - it's easy to imagine later using it for
> > VACUUM, ANALYZE, and the like.  I think 2 is a bad idea.
>
> FWIW, I also agree with settling for #1 at this point.  A new GUC would
> make more sense if we have multiple use-cases for it, which we probably
> will at some point, but not today.  I'm concerned that if we invent a
> GUC now, we might find out that it's not really usable for other cases
> in future (e.g., default value is no good for other cases).  It's the
> old story that inventing an API with only one use-case in mind leads
> to a bad API.
>
> So yeah, let's leave this be for now, ugly as it is.  Improving it
> can be future work.

Cool, I moved it to the resolved section.

--
Thomas Munro
https://enterprisedb.com