error context for vacuum to include block number

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
139 messages Options
1234 ... 7
Reply | Threaded
Open this post in threaded view
|

error context for vacuum to include block number

Justin Pryzby
On Wed, Aug 07, 2019 at 04:51:54PM -0700, Andres Freund wrote:
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
| Ugh :(
|
| We really need to add a error context to vacuumlazy that shows which
| block is being processed.

I eeked out a minimal patch.

I renamed "StringInfoData buf", since it wasn't nice to mask it by
"Buffer buf".

postgres=# SET statement_timeout=99;vacuum t;
SET
2019-11-20 14:52:49.521 CST [6319] ERROR:  canceling statement due to statement timeout
2019-11-20 14:52:49.521 CST [6319] CONTEXT:  block 596
2019-11-20 14:52:49.521 CST [6319] STATEMENT:  vacuum t;

Justin

v1-0001-vacuum-errcontext-to-show-block-being-processed.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Justin Pryzby
Find attached updated patch:
 . Use structure to include relation name.
 . Split into a separate patch rename of "StringInfoData buf".

2019-11-27 20:04:53.640 CST [14244] ERROR:  canceling statement due to statement timeout
2019-11-27 20:04:53.640 CST [14244] CONTEXT:  block 2314 of relation t
2019-11-27 20:04:53.640 CST [14244] STATEMENT:  vacuum t;

I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
the buffer is not pinned.

v2-0001-vacuum-errcontext-to-show-block-being-processed.patch (3K) Download Attachment
v2-0002-Rename-buf-to-avoid-shadowing-buf-of-another-type.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Michael Paquier-2
On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:

> Find attached updated patch:
>  . Use structure to include relation name.
>  . Split into a separate patch rename of "StringInfoData buf".
>
> 2019-11-27 20:04:53.640 CST [14244] ERROR:  canceling statement due to statement timeout
> 2019-11-27 20:04:53.640 CST [14244] CONTEXT:  block 2314 of relation t
> 2019-11-27 20:04:53.640 CST [14244] STATEMENT:  vacuum t;
>
> I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
> the buffer is not pinned.
No problem from me to add more context directly in lazy_scan_heap().

+       // errcallback.arg = (void *) &buf;
The first patch is full of that, please make sure to clean it up.  

Let's keep also the message simple, still I think that it should be a
bit more explicative:
1) Let's the schema name, and quote the relation name.
2) Let's mention the scanning (or vacuuming) operation.

So I would suggest the following instead:
"while scanning block %u of relation \"%s.%s\""
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Justin Pryzby
On Wed, Dec 11, 2019 at 09:15:07PM +0900, Michael Paquier wrote:

> On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:
> > Find attached updated patch:
> >  . Use structure to include relation name.
> >  . Split into a separate patch rename of "StringInfoData buf".
> >
> > 2019-11-27 20:04:53.640 CST [14244] ERROR:  canceling statement due to statement timeout
> > 2019-11-27 20:04:53.640 CST [14244] CONTEXT:  block 2314 of relation t
> > 2019-11-27 20:04:53.640 CST [14244] STATEMENT:  vacuum t;
> >
> > I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
> > the buffer is not pinned.
>
> No problem from me to add more context directly in lazy_scan_heap().
Do you mean without a callback ?  I think that's necessary, since the IO errors
would happen within ReadBufferExtended, but we don't want to polute that with
errcontext.  And cannot call errcontext on its own:
FATAL:  errstart was not called

> So I would suggest the following instead:
> "while scanning block %u of relation \"%s.%s\""

Done in the attached.

v3-0001-vacuum-errcontext-to-show-block-being-processed.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Alvaro Herrera-9
On 2019-Dec-11, Justin Pryzby wrote:

> @@ -635,6 +644,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>   else
>   skipping_blocks = false;
>  
> + /* Setup error traceback support for ereport() */
> + errcallback.callback = vacuum_error_callback;
> + cbarg.relname = relname;
> + cbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
> + cbarg.blkno = 0; /* Not known yet */

Shouldn't you use InvalidBlockNumber for this initialization?

> @@ -658,6 +676,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>  
>   pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
>  
> + cbarg.blkno = blkno;

I would put this before pgstat_progress_update_param, just out of
paranoia.

> @@ -817,7 +837,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>  
>   buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
>   RBM_NORMAL, vac_strategy);
> -
>   /* We need buffer cleanup lock so that we can prune HOT chains. */
>   if (!ConditionalLockBufferForCleanup(buf))
>   {

Lose this hunk?

> @@ -2354,3 +2376,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
>  
>   return all_visible;
>  }
> +
> +/*
> + * Error context callback for errors occurring during vacuum.
> + */
> +static void
> +vacuum_error_callback(void *arg)
> +{
> + vacuum_error_callback_arg *cbarg = arg;
> +
> + errcontext("while scanning block %u of relation \"%s.%s\"",
> + cbarg->blkno, cbarg->relnamespace, cbarg->relname);
> +}

I would put this function around line 1512 (just after lazy_scan_heap)
rather than at bottom of file.  (And move its prototype accordingly, to
line 156.)  Or do you intend that this is going to be used for
lazy_vacuum_heap too?  Maybe it should.

Patch looks good to me otherwise.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Andres Freund
In reply to this post by Justin Pryzby
Hi,

Thanks for working on this!

On 2019-12-11 08:36:48 -0600, Justin Pryzby wrote:

> On Wed, Dec 11, 2019 at 09:15:07PM +0900, Michael Paquier wrote:
> > On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:
> > > Find attached updated patch:
> > >  . Use structure to include relation name.
> > >  . Split into a separate patch rename of "StringInfoData buf".
> > >
> > > 2019-11-27 20:04:53.640 CST [14244] ERROR:  canceling statement due to statement timeout
> > > 2019-11-27 20:04:53.640 CST [14244] CONTEXT:  block 2314 of relation t
> > > 2019-11-27 20:04:53.640 CST [14244] STATEMENT:  vacuum t;
> > >
> > > I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
> > > the buffer is not pinned.

The tag will not add all that informative details, because the
relfilenode isn't easily mappable to the table name or such.


> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 043ebb4..9376989 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -138,6 +138,12 @@ typedef struct LVRelStats
>   bool lock_waiter_detected;
>  } LVRelStats;
>  
> +typedef struct
> +{
> + char *relname;
> + char *relnamespace;
> + BlockNumber blkno;
> +} vacuum_error_callback_arg;

Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.



> @@ -524,6 +531,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>   PROGRESS_VACUUM_MAX_DEAD_TUPLES
>   };
>   int64 initprog_val[3];
> + ErrorContextCallback errcallback;
> + vacuum_error_callback_arg cbarg;

Not a fan of "cbarg", too generic.

>   pg_rusage_init(&ru0);
>  
> @@ -635,6 +644,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>   else
>   skipping_blocks = false;
>  
> + /* Setup error traceback support for ereport() */
> + errcallback.callback = vacuum_error_callback;
> + cbarg.relname = relname;
> + cbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
> + cbarg.blkno = 0; /* Not known yet */
> + errcallback.arg = (void *) &cbarg;
> + errcallback.previous = error_context_stack;
> + error_context_stack = &errcallback;
> +
>   for (blkno = 0; blkno < nblocks; blkno++)
>   {
>   Buffer buf;
> @@ -658,6 +676,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>  
>   pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
>  
> + cbarg.blkno = blkno;
> +
>   if (blkno == next_unskippable_block)
>   {
>   /* Time to advance next_unskippable_block */
> @@ -817,7 +837,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>  
>   buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
>   RBM_NORMAL, vac_strategy);
> -
>   /* We need buffer cleanup lock so that we can prune HOT chains. */
>   if (!ConditionalLockBufferForCleanup(buf))
>   {
> @@ -1388,6 +1407,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>   RecordPageWithFreeSpace(onerel, blkno, freespace);
>   }
>  
> + /* Pop the error context stack */
> + error_context_stack = errcallback.previous;
> +
>   /* report that everything is scanned and vacuumed */
>   pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
>  
> @@ -2354,3 +2376,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
>  
>   return all_visible;
>  }

I think this will misattribute errors that happen when in the:
                /*
                 * If we are close to overrunning the available space for dead-tuple
                 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
                 */
section of lazy_scan_heap(). That will

a) scan the index, during which we presumably don't want the same error
   context, as it'd be quite misleading: The block that was just scanned
   in the loop isn't actually likely to be the culprit for an index
   problem. And we'd not mention the fact that the problem is occurring
   in the index.
b) will report the wrong block, when in lazy_vacuum_heap().

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Justin Pryzby
On Wed, Dec 11, 2019 at 12:33:53PM -0300, Alvaro Herrera wrote:
> On 2019-Dec-11, Justin Pryzby wrote:
> > +   cbarg.blkno = 0; /* Not known yet */
> Shouldn't you use InvalidBlockNumber for this initialization?
..
> >             pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
> > +           cbarg.blkno = blkno;
> I would put this before pgstat_progress_update_param, just out of
> paranoia.
..
> Lose this hunk?

Addressed those.

> Or do you intend that this is going to be used for lazy_vacuum_heap too?
> Maybe it should.

Done in a separate patch.

On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:
> Hm, wonder if could be worthwhile to not use a separate struct here, but
> instead extend one of the existing structs to contain the necessary
> information. Or perhaps have one new struct with all the necessary
> information. There's already quite a few places that do
> get_namespace_name(), for example.

Didn't find a better struct to use yet.

> > +   vacuum_error_callback_arg cbarg;
> Not a fan of "cbarg", too generic.
..
> I think this will misattribute errors that happen when in the:

Probably right.  Attached should address it.

On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:

> > +typedef struct
> > +{
> > + char *relname;
> > + char *relnamespace;
> > + BlockNumber blkno;
> > +} vacuum_error_callback_arg;
>
> Hm, wonder if could be worthwhile to not use a separate struct here, but
> instead extend one of the existing structs to contain the necessary
> information. Or perhaps have one new struct with all the necessary
> information. There's already quite a few places that do
> get_namespace_name(), for example.

> Not a fan of "cbarg", too generic.

> I think this will misattribute errors that happen when in the:

I think that's addressed after deduplicating in attached.

Deduplication revealed 2nd progress call, which seems to have been included
redundantly at c16dc1aca.

-               /* Remove tuples from heap */
-               pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
-                                                                        PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

Justin

v4-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patch (2K) Download Attachment
v4-0002-Remove-redundant-call-to-vacuum-progress.patch (1004 bytes) Download Attachment
v4-0003-deduplication.patch (5K) Download Attachment
v4-0004-vacuum-errcontext-to-show-block-being-processed.patch (3K) Download Attachment
v4-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Michael Paquier-2
On Thu, Dec 12, 2019 at 09:08:31PM -0600, Justin Pryzby wrote:
> On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:
>> Hm, wonder if could be worthwhile to not use a separate struct here, but
>> instead extend one of the existing structs to contain the necessary
>> information. Or perhaps have one new struct with all the necessary
>> information. There's already quite a few places that do
>> get_namespace_name(), for example.
>
> Didn't find a better struct to use yet.

Yes, I am too wondering what Andres has in mind here.  It is not like
you can use VacuumRelation as the OID of the relation may not have
been stored.

> On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:>
> I think that's addressed after deduplicating in attached.
>
> Deduplication revealed 2nd progress call, which seems to have been included
> redundantly at c16dc1aca.
>
> -               /* Remove tuples from heap */
> -               pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
> -                                                                        PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

What is the purpose of 0001 in the context of this thread?  One could
say the same about 0002 and 0003.  Anyway, you are right with 0002 as
the progress value for PROGRESS_VACUUM_PHASE gets updated twice in a
row with the same value.  So let's clean up that.  

The refactoring in 0003 is interesting, so I would be tempted to merge
it.  Now you have one small issue in it:
-    /*
-     * Forget the now-vacuumed tuples, and press on, but be careful
-     * not to reset latestRemovedXid since we want that value to be
-     * valid.
-     */
+     lazy_vacuum_heap_index(onerel, vacrelstats, Irel, nindexes, indstats);
      vacrelstats->num_dead_tuples = 0;
-     vacrelstats->num_index_scans++;
You are moving this comment within lazy_vacuum_heap_index, but it
applies to num_dead_tuples and not num_index_scans, no?  You should
keep the incrementation of num_index_scans within the routine though.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Justin Pryzby
On Fri, Dec 13, 2019 at 10:28:50PM +0900, Michael Paquier wrote:

>> v4-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patch
>> v4-0002-Remove-redundant-call-to-vacuum-progress.patch
>> v4-0003-deduplication.patch
>> v4-0004-vacuum-errcontext-to-show-block-being-processed.patch
>> v4-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patch

> What is the purpose of 0001 in the context of this thread?  One could
> say the same about 0002 and 0003.  Anyway, you are right with 0002 as
> the progress value for PROGRESS_VACUUM_PHASE gets updated twice in a
> row with the same value.  So let's clean up that.  

It's related code which I cleaned up before adding new stuff.  Not essential,
thus separate (0002 should be backpatched).

> The refactoring in 0003 is interesting, so I would be tempted to merge
> it.  Now you have one small issue in it:
> -    /*
> -     * Forget the now-vacuumed tuples, and press on, but be careful
> -     * not to reset latestRemovedXid since we want that value to be
> -     * valid.
> -     */
> +     lazy_vacuum_heap_index(onerel, vacrelstats, Irel, nindexes, indstats);
>       vacrelstats->num_dead_tuples = 0;
> -     vacrelstats->num_index_scans++;
> You are moving this comment within lazy_vacuum_heap_index, but it
> applies to num_dead_tuples and not num_index_scans, no?  You should
> keep the incrementation of num_index_scans within the routine though.
Thank you, fixed.

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581

v5-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patch (2K) Download Attachment
v5-0002-Remove-redundant-call-to-vacuum-progress.patch (1K) Download Attachment
v5-0003-deduplication.patch (5K) Download Attachment
v5-0004-vacuum-errcontext-to-show-block-being-processed.patch (3K) Download Attachment
v5-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Michael Paquier-2
On Fri, Dec 13, 2019 at 04:47:35PM -0600, Justin Pryzby wrote:
> It's related code which I cleaned up before adding new stuff.  Not essential,
> thus separate (0002 should be backpatched).

The issue just causes some extra work and that's not a bug, so applied
without a backpatch.

>> The refactoring in 0003 is interesting, so I would be tempted to merge
>> it.  Now you have one small issue in it:
>> -    /*
>> -     * Forget the now-vacuumed tuples, and press on, but be careful
>> -     * not to reset latestRemovedXid since we want that value to be
>> -     * valid.
>> -     */
>> +     lazy_vacuum_heap_index(onerel, vacrelstats, Irel, nindexes, indstats);
>>       vacrelstats->num_dead_tuples = 0;
>> -     vacrelstats->num_index_scans++;
>> You are moving this comment within lazy_vacuum_heap_index, but it
>> applies to num_dead_tuples and not num_index_scans, no?  You should
>> keep the incrementation of num_index_scans within the routine though.
>
> Thank you, fixed.
For 0003, I think that lazy_vacuum_heap_index() can be confusing as
those indexes are unrelated to heap.  Why not naming it just
lazy_vacuum_all_indexes()?  The routine should also have a header
describing it.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Justin Pryzby
On Sun, Dec 15, 2019 at 10:07:08PM +0900, Michael Paquier wrote:
> On Fri, Dec 13, 2019 at 04:47:35PM -0600, Justin Pryzby wrote:
> > It's related code which I cleaned up before adding new stuff.  Not essential,
> > thus separate (0002 should be backpatched).
>
> The issue just causes some extra work and that's not a bug, so applied
> without a backpatch.

Thanks

> For 0003, I think that lazy_vacuum_heap_index() can be confusing as
> those indexes are unrelated to heap.  Why not naming it just
> lazy_vacuum_all_indexes()?  The routine should also have a header
> describing it.

I named it so because it calls both lazy_vacuum_index
("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")

I suppose you don't think the other way around is better?
lazy_vacuum_index_heap

Justin


Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Michael Paquier-2
On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:
> I named it so because it calls both lazy_vacuum_index
> ("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
> lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")
>
> I suppose you don't think the other way around is better?
> lazy_vacuum_index_heap

Dunno.  Let's see if others have other thoughts on the matter.  FWIW,
I have a long history at naming things in a way others don't like :)
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Kyotaro Horiguchi-4
At Mon, 16 Dec 2019 11:49:56 +0900, Michael Paquier <[hidden email]> wrote in

> On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:
> > I named it so because it calls both lazy_vacuum_index
> > ("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
> > lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")
> >
> > I suppose you don't think the other way around is better?
> > lazy_vacuum_index_heap
>
> Dunno.  Let's see if others have other thoughts on the matter.  FWIW,
> I have a long history at naming things in a way others don't like :)

lazy_vacuum_heap_index() seems confusing to me.  I read the name as
Michael did before looking the above explanation.

lazy_vacuum_heap_and_index() is clearer to me.
lazy_vacuum_heap_with_index() could also work but I'm not sure it's
further better.

I see some function names like that, and some others that have two
verbs bonded by "_and_".

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Justin Pryzby
In reply to this post by Michael Paquier-2
On Mon, Dec 16, 2019 at 11:49:56AM +0900, Michael Paquier wrote:

> On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:
> > I named it so because it calls both lazy_vacuum_index
> > ("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
> > lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")
> >
> > I suppose you don't think the other way around is better?
> > lazy_vacuum_index_heap
>
> Dunno.  Let's see if others have other thoughts on the matter.  FWIW,
> I have a long history at naming things in a way others don't like :)
I renamed.

And deduplicated two more hunks into a 2nd function.

(I'm also including the changes I mentioned here ... in case anyone cares to
comment or review).
https://www.postgresql.org/message-id/20191220171132.GB30414%40telsasoft.com

v6-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patch (2K) Download Attachment
v6-0002-deduplication.patch (5K) Download Attachment
v6-0003-dedup2-skip_blocks.patch (9K) Download Attachment
v6-0004-vacuum-errcontext-to-show-block-being-processed.patch (3K) Download Attachment
v6-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patch (1K) Download Attachment
v6-0006-Print-debug-line-before-starting-each-vacuum-step.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Michael Paquier-2
On Mon, Dec 23, 2019 at 07:24:28PM -0600, Justin Pryzby wrote:
> I renamed.

Hmm.  I have found what was partially itching me for patch 0002, and
that's actually the fact that we don't do the error reporting for heap
within lazy_vacuum_heap() because the code relies too much on updating
two progress parameters at the same time, on top of the fact that you
are mixing multiple concepts with this refactoring.  One problem is
that if this code is refactored in the future, future callers of
lazy_vacuum_heap() would miss the update of the progress reporting.
Splitting things improves also the readability of the code, so
attached is the refactoring I would do for this portion of the set.
It is also more natural to increment num_index_scans when the
reporting happens on consistency grounds.

(Please note that I have not indented yet the patch.)
--
Michael

vacuumlazy-index-refactor-v7.patch (5K) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Michael Paquier-2
On Tue, Dec 24, 2019 at 01:19:09PM +0900, Michael Paquier wrote:
> (Please note that I have not indented yet the patch.)

And one indentation later, committed this one after an extra lookup as
of 1ab41a3.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Justin Pryzby
In reply to this post by Michael Paquier-2
On Tue, Dec 24, 2019 at 01:19:09PM +0900, Michael Paquier wrote:

> On Mon, Dec 23, 2019 at 07:24:28PM -0600, Justin Pryzby wrote:
> > I renamed.
>
> Hmm.  I have found what was partially itching me for patch 0002, and
> that's actually the fact that we don't do the error reporting for heap
> within lazy_vacuum_heap() because the code relies too much on updating
> two progress parameters at the same time, on top of the fact that you
> are mixing multiple concepts with this refactoring.  One problem is
> that if this code is refactored in the future, future callers of
> lazy_vacuum_heap() would miss the update of the progress reporting.
> Splitting things improves also the readability of the code, so
> attached is the refactoring I would do for this portion of the set.
> It is also more natural to increment num_index_scans when the
I agree that's better.
I don't see any reason why the progress params need to be updated atomically.
So rebasified against your patch.

v7-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patch (2K) Download Attachment
v7-0002-michael-dedup.patch (5K) Download Attachment
v7-0003-dedup2-skip_blocks.patch (9K) Download Attachment
v7-0004-vacuum-errcontext-to-show-block-being-processed.patch (3K) Download Attachment
v7-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patch (1K) Download Attachment
v7-0006-Print-debug-line-before-starting-each-vacuum-step.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Robert Haas
On Thu, Dec 26, 2019 at 10:57 AM Justin Pryzby <[hidden email]> wrote:
> I agree that's better.
> I don't see any reason why the progress params need to be updated atomically.
> So rebasified against your patch.

I am not sure whether it's important enough to make a stink about, but
it bothers me a bit that this is being dismissed as unimportant. The
problem is that, if the updates are not atomic, then somebody might
see the data after one has been updated and the other has not yet been
updated. The result is that when the phase is
PROGRESS_VACUUM_PHASE_VACUUM_INDEX, someone reading the information
can't tell whether the number of index scans reported is the number
*previously* performed or the number performed including the one that
just finished. The race to see the latter state is narrow, so it
probably wouldn't come up often, but it does seem like it would be
confusing if it did happen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number (atomic progress update)

Justin Pryzby
On Sat, Dec 28, 2019 at 07:21:31PM -0500, Robert Haas wrote:

> On Thu, Dec 26, 2019 at 10:57 AM Justin Pryzby <[hidden email]> wrote:
> > I agree that's better.
> > I don't see any reason why the progress params need to be updated atomically.
> > So rebasified against your patch.
>
> I am not sure whether it's important enough to make a stink about, but
> it bothers me a bit that this is being dismissed as unimportant. The
> problem is that, if the updates are not atomic, then somebody might
> see the data after one has been updated and the other has not yet been
> updated. The result is that when the phase is
> PROGRESS_VACUUM_PHASE_VACUUM_INDEX, someone reading the information
> can't tell whether the number of index scans reported is the number
> *previously* performed or the number performed including the one that
> just finished. The race to see the latter state is narrow, so it
> probably wouldn't come up often, but it does seem like it would be
> confusing if it did happen.

What used to be atomic was this:

-               hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
-               hvp_val[1] = vacrelstats->num_index_scans + 1;

=> switch from PROGRESS_VACUUM_PHASE_VACUUM INDEX to HEAP and increment
index_vacuum_count, which is documented as the "Number of completed index
vacuum cycles."

Now, it 1) increments the number of completed scans; and, 2) then progresses
phase to HEAP, so there's a window where the number of completed scans is
incremented, and it still says VACUUM_INDEX.

Previously, if it said VACUUM_INDEX, one could assume that index_vacuum_count
would increase at least once more, and that's no longer true.  If someone sees
VACUUM_INDEX and some NUM_INDEX_VACUUMS, and then later sees VACUUM_HEAP or
other later stage, with same (maybe final) value of NUM_INDEX_VACUUMS, that's
different than previous behavior.

It seems to me that a someone or their tool monitoring pg_stat shouldn't be
confused by this change, since:
1) there's no promise about how high NUM_INDEX_VACUUMS will or won't go; and,
2) index_vacuum_count didn't do anything strange like decreasing, or increased
before the scans were done; and,
3) the vacuum can finish at any time, and the monitoring process presumably
knows that when the PID is gone, it's finished, even if it missed intermediate
updates;

The behavior is different from before, but I think that's ok: the number of
scans is accurate, and the PHASE is accurate, even though it'll change a moment
later.

I see there's similar case here:
|    /* report all blocks vacuumed; and that we're cleaning up */
|    pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
|    pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
|                                 PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);

heap_blks_scanned is documented as "Number of heap blocks SCANNED", and it
increments exactly to heap_blks_total.  Would someone be confused if
heap_blks_scanned==heap_blks_total AND phase=='scanning heap' ?  I think they'd
just expect PHASE to be updated a moment later.  (And if it wasn't, I agree they
should then be legitimately confused or concerned).

Actually, the doc says:
|If heap_blks_scanned is less than heap_blks_total, the system will return to
|scanning the heap after this phase is completed; otherwise, it will begin
|cleaning up indexes AFTER THIS PHASE IS COMPLETED.

I read that to mean that it's okay if heap_blks_scanned==heap_blks_total when
scanning/vacuuming heap.

Justin


Reply | Threaded
Open this post in threaded view
|

Re: error context for vacuum to include block number

Justin Pryzby
In reply to this post by Justin Pryzby
On Thu, Dec 26, 2019 at 09:57:04AM -0600, Justin Pryzby wrote:
> So rebasified against your patch.

Rebased against your patch in master this time.

v8-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patch (2K) Download Attachment
v8-0002-dedup2-skip_blocks.patch (9K) Download Attachment
v8-0003-vacuumlazy-typos-in-comments.patch (1K) Download Attachment
v8-0004-vacuum-errcontext-to-show-block-being-processed.patch (3K) Download Attachment
v8-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patch (1K) Download Attachment
1234 ... 7