display offset along with block number in vacuum errors

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

display offset along with block number in vacuum errors

Mahendra Singh Thalor
Hi hackers,
We discussed in another email thread[1], that it will be helpful if we can display offset along with block number in vacuum error. Here, proposing a patch to add offset along with block number in vacuum errors.

In commit b61d161(Introduce vacuum errcontext to display additional information), we added vacuum errcontext to display additional information(block number) so that in case of vacuum error, we can identify which block we are getting error.  Addition to block number, if we can display offset, then it will be more helpful for users. So to display offset, here proposing two different methods(Thanks Robert for suggesting these 2 methods):

Method 1: We can report the TID as well as the block number in  errcontext.
- errcontext("while scanning block %u of relation \"%s.%s\"",
-   errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
+   errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);

Above fix requires more calls to update_vacuum_error_info(). Attaching v01_0001 patch for this method.

Method 2: We can improve the error messages by passing the relevant TID to heap_prepare_freeze_tuple and having it report the TID as part of the error message or in the error detail.
  ereport(ERROR,
  (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg_internal("found xmin %u from before relfrozenxid %u",
+ errmsg_internal("for block %u and offnum %u, found xmin %u from before relfrozenxid %u",
+ ItemPointerGetBlockNumber(tid),
+ ItemPointerGetOffsetNumber(tid),
  xid, relfrozenxid)));

Attaching v01_0002 patch for this method.

Please let me know your thoughts.

[1] : http://postgr.es/m/20200713223822.az6fo3m2x4t42xz2@...
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v01_0001-Added-offset-with-block-number-in-vacuum-errcontext.patch (9K) Download Attachment
v01_0002-Added-block-and-offset-to-errors-of-heap_prepare_fre.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Michael Paquier-2
On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:
> In commit b61d161(Introduce vacuum errcontext to display additional
> information), we added vacuum errcontext to display additional
> information(block number) so that in case of vacuum error, we can identify
> which block we are getting error.  Addition to block number, if we can
> display offset, then it will be more helpful for users. So to display
> offset, here proposing two different methods(Thanks Robert for suggesting
> these 2 methods):

    new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
    vacrelstats->blkno = new_rel_pages;
+   vacrelstats->offnum = InvalidOffsetNumber;

Adding more context would be interesting for some cases, but not all
contrary to what your patch does in some code paths like
lazy_truncate_heap() as you would show up an offset of 0 for an
invalid value.  This could confuse some users.  Note that we are
careful enough to not print a context message if the block number is
invalid.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Masahiko Sawada-2
In reply to this post by Mahendra Singh Thalor
On Sat, 25 Jul 2020 at 02:49, Mahendra Singh Thalor <[hidden email]> wrote:

>
> Hi hackers,
> We discussed in another email thread[1], that it will be helpful if we can display offset along with block number in vacuum error. Here, proposing a patch to add offset along with block number in vacuum errors.
>
> In commit b61d161(Introduce vacuum errcontext to display additional information), we added vacuum errcontext to display additional information(block number) so that in case of vacuum error, we can identify which block we are getting error.  Addition to block number, if we can display offset, then it will be more helpful for users. So to display offset, here proposing two different methods(Thanks Robert for suggesting these 2 methods):
>
> Method 1: We can report the TID as well as the block number in  errcontext.
> - errcontext("while scanning block %u of relation \"%s.%s\"",
> -   errinfo->blkno, errinfo->relnamespace, errinfo->relname);
> + errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
> +   errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
>
> Above fix requires more calls to update_vacuum_error_info(). Attaching v01_0001 patch for this method.
>
> Method 2: We can improve the error messages by passing the relevant TID to heap_prepare_freeze_tuple and having it report the TID as part of the error message or in the error detail.
>   ereport(ERROR,
>   (errcode(ERRCODE_DATA_CORRUPTED),
> - errmsg_internal("found xmin %u from before relfrozenxid %u",
> + errmsg_internal("for block %u and offnum %u, found xmin %u from before relfrozenxid %u",
> + ItemPointerGetBlockNumber(tid),
> + ItemPointerGetOffsetNumber(tid),
>   xid, relfrozenxid)));
>
> Attaching v01_0002 patch for this method.
>
> Please let me know your thoughts.
>

+1 for adding offset in error messages.

I had a look at 0001 patch. You've set the vacuum error info but I
think an error won't happen during setting itemids unused:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber
blkno, Buffer buffer,
                BlockNumber tblk;
                OffsetNumber toff;
                ItemId          itemid;
+               LVSavedErrInfo loc_saved_err_info;

                tblk =
ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
                if (tblk != blkno)
                        break;                          /* past end of
tuples for this block */
                toff =
ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+               /* Update error traceback information */
+               update_vacuum_error_info(vacrelstats,
&loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                                                blkno, toff);
                itemid = PageGetItemId(page, toff);
                ItemIdSetUnused(itemid);
                unused[uncnt++] = toff;
+
+               /* Revert to the previous phase information for error
traceback */
+               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
        }

        PageRepairFragmentation(page);

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Mahendra Singh Thalor
In reply to this post by Michael Paquier-2
Thanks Michael for looking into this.

On Sat, 25 Jul 2020 at 15:02, Michael Paquier <[hidden email]> wrote:

>
> On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:
> > In commit b61d161(Introduce vacuum errcontext to display additional
> > information), we added vacuum errcontext to display additional
> > information(block number) so that in case of vacuum error, we can identify
> > which block we are getting error.  Addition to block number, if we can
> > display offset, then it will be more helpful for users. So to display
> > offset, here proposing two different methods(Thanks Robert for suggesting
> > these 2 methods):
>
>     new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
>     vacrelstats->blkno = new_rel_pages;
> +   vacrelstats->offnum = InvalidOffsetNumber;
>
> Adding more context would be interesting for some cases, but not all
> contrary to what your patch does in some code paths like
> lazy_truncate_heap() as you would show up an offset of 0 for an
> invalid value.  This could confuse some users.  Note that we are
> careful enough to not print a context message if the block number is
> invalid.

Okay. I agree with you. In case of inavlid offset, we can skip offset
printing. I will do this change in the next patch.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Justin Pryzby
In reply to this post by Mahendra Singh Thalor
On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:
> Hi hackers,
> We discussed in another email thread[1], that it will be helpful if we can
> display offset along with block number in vacuum error. Here, proposing a
> patch to add offset along with block number in vacuum errors.

Thanks.  I happened to see both threads, only by chance.

I'd already started writing the same as your 0001, which is essentially the
same as yours.

Here:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
  BlockNumber tblk;
  OffsetNumber toff;
  ItemId itemid;
+ LVSavedErrInfo loc_saved_err_info;
 
  tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
  if (tblk != blkno)
  break; /* past end of tuples for this block */
  toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+ /* Update error traceback information */
+ update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+ blkno, toff);
  itemid = PageGetItemId(page, toff);
  ItemIdSetUnused(itemid);
  unused[uncnt++] = toff;
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
  }

I'm not sure why you use restore_vacuum_error_info() at all.  It's already
called at the end of lazy_vacuum_page() (and others) to allow functions to
clean up after their own state changes, rather than requiring callers to do it.
I don't think you should use it in a loop, nor introduce another
LVSavedErrInfo.

Since phase and blkno are already set, I think you should just set
vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
Similar to whats done in lazy_vacuum_heap():

                tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
                vacrelstats->blkno = tblk;

I think you should also do:

@@ -2976,6 +2984,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
                ItemId          itemid;
                HeapTupleData tuple;
 
+               vacrelstats->offset = offnum;

I'm not sure, but maybe you'd also want to do the same in more places:

@@ -2024,6 +2030,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
@@ -2790,6 +2797,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Mahendra Singh Thalor
Thanks Justin, Sawada and Michael for reviewing.

On Mon, 27 Jul 2020 at 16:43, Justin Pryzby <[hidden email]> wrote:

>
> On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:
> > Hi hackers,
> > We discussed in another email thread[1], that it will be helpful if we can
> > display offset along with block number in vacuum error. Here, proposing a
> > patch to add offset along with block number in vacuum errors.
>
> Thanks.  I happened to see both threads, only by chance.
>
> I'd already started writing the same as your 0001, which is essentially the
> same as yours.
>
> Here:
>
> @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
>                 BlockNumber tblk;
>                 OffsetNumber toff;
>                 ItemId          itemid;
> +               LVSavedErrInfo loc_saved_err_info;
>
>                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
>                 if (tblk != blkno)
>                         break;                          /* past end of tuples for this block */
>                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> +
> +               /* Update error traceback information */
> +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> +                                                                blkno, toff);
>                 itemid = PageGetItemId(page, toff);
>                 ItemIdSetUnused(itemid);
>                 unused[uncnt++] = toff;
> +
> +               /* Revert to the previous phase information for error traceback */
> +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
>         }
>
> I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> called at the end of lazy_vacuum_page() (and others) to allow functions to
> clean up after their own state changes, rather than requiring callers to do it.
> I don't think you should use it in a loop, nor introduce another
> LVSavedErrInfo.
>
> Since phase and blkno are already set, I think you should just set
> vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> Similar to whats done in lazy_vacuum_heap():
>
>                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
>                 vacrelstats->blkno = tblk;
Fixed.

>
> I think you should also do:
>
> @@ -2976,6 +2984,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
>                 ItemId          itemid;
>                 HeapTupleData tuple;
>
> +               vacrelstats->offset = offnum;

Agreed and fixed.

>
> I'm not sure, but maybe you'd also want to do the same in more places:
>
> @@ -2024,6 +2030,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)

Fixed.

> @@ -2790,6 +2797,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)

I checked the code and I think there is no need to do similar changes
in count_nondeletable_pages. I will look again and will verify again.

Apart from these, I fixed comments given by Sawada and Michael in the
latest patch. Attaching v2 patch for review.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

v02_0001-Added-offset-with-block-number-in-vacuum-errcontext.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Justin Pryzby
On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> Apart from these, I fixed comments given by Sawada and Michael in the
> latest patch. Attaching v2 patch for review.

Thanks.

lazy_check_needs_freeze iterates over blocks and this patch changes it to
update vacrelstats.  I think it should do what
lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
its beginning (even though only the offset is changed), and then
restore_vacuum_error_info() at its end (to "revert back" to the item number it
started with).

The same is true of heap_page_is_all_visible(), except it's only called by
lazy_vacuum_page(), which already calls restore_vacuum_error_info() a few lines
later.

As for the message:

+                               if (OffsetNumberIsValid(errinfo->offnum))
+                                       errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
+                                                          errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
+                               else
+                                       errcontext("while scanning block %u of relation \"%s.%s\"",
+                                                          errinfo->blkno, errinfo->relnamespace, errinfo->relname);

I think that may be confusing.  A DBA should know what a "block" is, but
"offset" sounds like a byte offset, rather than an item number.  Here's what
I'd written.  Maybe it should say "offset number".

+                               errcontext("while vacuuming block %u of relation \"%s.%s\", item offset %u",


--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Justin Pryzby
In reply to this post by Mahendra Singh Thalor
Bcc:
Subject: Re: display offset along with block number in vacuum errors
Reply-To:
In-Reply-To: <[hidden email]>

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:

> > Here:
> >
> > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> >                 BlockNumber tblk;
> >                 OffsetNumber toff;
> >                 ItemId          itemid;
> > +               LVSavedErrInfo loc_saved_err_info;
> >
> >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> >                 if (tblk != blkno)
> >                         break;                          /* past end of tuples for this block */
> >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > +
> > +               /* Update error traceback information */
> > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > +                                                                blkno, toff);
> >                 itemid = PageGetItemId(page, toff);
> >                 ItemIdSetUnused(itemid);
> >                 unused[uncnt++] = toff;
> > +
> > +               /* Revert to the previous phase information for error traceback */
> > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> >         }
> >
> > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > clean up after their own state changes, rather than requiring callers to do it.
> > I don't think you should use it in a loop, nor introduce another
> > LVSavedErrInfo.
> >
> > Since phase and blkno are already set, I think you should just set
> > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > Similar to whats done in lazy_vacuum_heap():
> >
> >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> >                 vacrelstats->blkno = tblk;
>
> Fixed.

I rearead this thread and I think the earlier suggestion from Masahiko was
right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
the page, which has already been read from disk.  On my suggestion, your v2
patch sets offnum directly, but now I think it's not useful to set at all,
since the whole page is manipulated by PageRepairFragmentation() and
log_heap_clean().  An error there would misleadingly say "..at offset number
MM", but would always show the page's last offset, and not the offset where an
error occured.

I added this at:
https://commitfest.postgresql.org/29/2662/

If anyone is considering this patch for v13, I guess it should be completed by
next week.

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Justin Pryzby
On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> Bcc:
> Subject: Re: display offset along with block number in vacuum errors
> Reply-To:
> In-Reply-To: <[hidden email]>

whoops

> On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > Here:
> > >
> > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > >                 BlockNumber tblk;
> > >                 OffsetNumber toff;
> > >                 ItemId          itemid;
> > > +               LVSavedErrInfo loc_saved_err_info;
> > >
> > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > >                 if (tblk != blkno)
> > >                         break;                          /* past end of tuples for this block */
> > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > +
> > > +               /* Update error traceback information */
> > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > +                                                                blkno, toff);
> > >                 itemid = PageGetItemId(page, toff);
> > >                 ItemIdSetUnused(itemid);
> > >                 unused[uncnt++] = toff;
> > > +
> > > +               /* Revert to the previous phase information for error traceback */
> > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > >         }
> > >
> > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > clean up after their own state changes, rather than requiring callers to do it.
> > > I don't think you should use it in a loop, nor introduce another
> > > LVSavedErrInfo.
> > >
> > > Since phase and blkno are already set, I think you should just set
> > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > Similar to whats done in lazy_vacuum_heap():
> > >
> > >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > >                 vacrelstats->blkno = tblk;
> >
> > Fixed.
>
> I rearead this thread and I think the earlier suggestion from Masahiko was
> right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
> the page, which has already been read from disk.  On my suggestion, your v2
> patch sets offnum directly, but now I think it's not useful to set at all,
> since the whole page is manipulated by PageRepairFragmentation() and
> log_heap_clean().  An error there would misleadingly say "..at offset number
> MM", but would always show the page's last offset, and not the offset where an
> error occured.
This makes me question whether offset numbers are ever useful during
VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
internal functions that don't update vacrelstats->offno.  Note that my initial
problem report that led to the errcontext implementation was an ERROR in heap
*scan* (not vacuum).  So an offset number at that point would've been
sufficient.
https://www.postgresql.org/message-id/20190808012436.GG11185@...

I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
so an error in heap_page_prune() (for example) doesn't get the wrong offset
associated with it.

So see the attached changes on top of your v2 patch.

postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT INTO tt SELECT generate_series(1,99999); VACUUM tt; UPDATE tt SET a=1 WHERE ctid='(345,10)'; UPDATE pg_class SET relfrozenxid=(relfrozenxid::text::int + (1<<30))::int::text::xid WHERE oid='tt'::regclass; VACUUM tt;
ERROR:  found xmin 1961 from before relfrozenxid 1073743785
CONTEXT:  while scanning block 345 of relation "public.tt", item offset 205

Hmm.. is it confusing that the block number is 0-indexed but the offset is
1-indexed ?

--
Justin

0001-Added-offset-with-block-number-in-vacuum-errcontext.patch (10K) Download Attachment
0002-fix.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Mahendra Singh Thalor
Thanks Justin.

On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <[hidden email]> wrote:

>
> On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > Bcc:
> > Subject: Re: display offset along with block number in vacuum errors
> > Reply-To:
> > In-Reply-To: <[hidden email]>
>
> whoops
>
> > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > Here:
> > > >
> > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > >                 BlockNumber tblk;
> > > >                 OffsetNumber toff;
> > > >                 ItemId          itemid;
> > > > +               LVSavedErrInfo loc_saved_err_info;
> > > >
> > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > >                 if (tblk != blkno)
> > > >                         break;                          /* past end of tuples for this block */
> > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > +
> > > > +               /* Update error traceback information */
> > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > +                                                                blkno, toff);
> > > >                 itemid = PageGetItemId(page, toff);
> > > >                 ItemIdSetUnused(itemid);
> > > >                 unused[uncnt++] = toff;
> > > > +
> > > > +               /* Revert to the previous phase information for error traceback */
> > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > >         }
> > > >
> > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > I don't think you should use it in a loop, nor introduce another
> > > > LVSavedErrInfo.
> > > >
> > > > Since phase and blkno are already set, I think you should just set
> > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > Similar to whats done in lazy_vacuum_heap():
> > > >
> > > >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > >                 vacrelstats->blkno = tblk;
> > >
> > > Fixed.
> >
> > I rearead this thread and I think the earlier suggestion from Masahiko was
> > right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
> > the page, which has already been read from disk.  On my suggestion, your v2
> > patch sets offnum directly, but now I think it's not useful to set at all,
> > since the whole page is manipulated by PageRepairFragmentation() and
> > log_heap_clean().  An error there would misleadingly say "..at offset number
> > MM", but would always show the page's last offset, and not the offset where an
> > error occured.
>
> This makes me question whether offset numbers are ever useful during
> VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> internal functions that don't update vacrelstats->offno.  Note that my initial
> problem report that led to the errcontext implementation was an ERROR in heap
> *scan* (not vacuum).  So an offset number at that point would've been
> sufficient.
> https://www.postgresql.org/message-id/20190808012436.GG11185@...
>
> I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> so an error in heap_page_prune() (for example) doesn't get the wrong offset
> associated with it.
>
> So see the attached changes on top of your v2 patch.
Actually I was waiting for review comments from committer and other
people also and was planning to send a patch after that. I already
fixed your comments in my offline patch and was waiting for more
comments. Anyway, thanks for delta patch.

Here, attaching v3 patch for review.

>
> postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT INTO tt SELECT generate_series(1,99999); VACUUM tt; UPDATE tt SET a=1 WHERE ctid='(345,10)'; UPDATE pg_class SET relfrozenxid=(relfrozenxid::text::int + (1<<30))::int::text::xid WHERE oid='tt'::regclass; VACUUM tt;
> ERROR:  found xmin 1961 from before relfrozenxid 1073743785
> CONTEXT:  while scanning block 345 of relation "public.tt", item offset 205
>
> Hmm.. is it confusing that the block number is 0-indexed but the offset is
> 1-indexed ?
>
> --
> Justin

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.c

v03_0001-Added-offset-with-block-number-in-vacuum-errcontext.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Justin Pryzby
On Sat, Aug 01, 2020 at 12:31:53PM +0530, Mahendra Singh Thalor wrote:
> Actually I was waiting for review comments from committer and other
> people also and was planning to send a patch after that. I already
> fixed your comments in my offline patch and was waiting for more
> comments. Anyway, thanks for delta patch.
>
> Here, attaching v3 patch for review.

I wasn't being impatient but I spent enough time thinking about this that it
made sense to put it in patch form.  Your patch has a couple extaneous changes:

                case VACUUM_ERRCB_PHASE_VACUUM_HEAP:
                        if (BlockNumberIsValid(errinfo->blkno))
+                       {
                                errcontext("while vacuuming block %u of relation \"%s.%s\"",
                                                   errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+                       }
                        break;
 
                case VACUUM_ERRCB_PHASE_VACUUM_INDEX:
@@ -3589,6 +3618,7 @@ vacuum_error_callback(void *arg)
                                           errinfo->indname, errinfo->relnamespace, errinfo->relname);
                        break;
 
+
                case VACUUM_ERRCB_PHASE_INDEX_CLEANUP:
                        errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",

I would get rid of these by doing like: git reset -p HEAD~1 (say "n" to most
hunks and "y" to reset just the two, above), then git commit --amend (without
-a and without pathnames), then git diff will show local changes (including
those no-longer-committed hunks), which you can git checkout -p (or similar).
I'd be interested to hear if there's a better way.

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Masahiko Sawada-2
In reply to this post by Mahendra Singh Thalor
On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <[hidden email]> wrote:

>
> Thanks Justin.
>
> On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <[hidden email]> wrote:
> >
> > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > > Bcc:
> > > Subject: Re: display offset along with block number in vacuum errors
> > > Reply-To:
> > > In-Reply-To: <[hidden email]>
> >
> > whoops
> >
> > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > > Here:
> > > > >
> > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > > >                 BlockNumber tblk;
> > > > >                 OffsetNumber toff;
> > > > >                 ItemId          itemid;
> > > > > +               LVSavedErrInfo loc_saved_err_info;
> > > > >
> > > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > > >                 if (tblk != blkno)
> > > > >                         break;                          /* past end of tuples for this block */
> > > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > +
> > > > > +               /* Update error traceback information */
> > > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > > +                                                                blkno, toff);
> > > > >                 itemid = PageGetItemId(page, toff);
> > > > >                 ItemIdSetUnused(itemid);
> > > > >                 unused[uncnt++] = toff;
> > > > > +
> > > > > +               /* Revert to the previous phase information for error traceback */
> > > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > > >         }
> > > > >
> > > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > > I don't think you should use it in a loop, nor introduce another
> > > > > LVSavedErrInfo.
> > > > >
> > > > > Since phase and blkno are already set, I think you should just set
> > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > > Similar to whats done in lazy_vacuum_heap():
> > > > >
> > > > >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > > >                 vacrelstats->blkno = tblk;
> > > >
> > > > Fixed.
> > >
> > > I rearead this thread and I think the earlier suggestion from Masahiko was
> > > right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
> > > the page, which has already been read from disk.  On my suggestion, your v2
> > > patch sets offnum directly, but now I think it's not useful to set at all,
> > > since the whole page is manipulated by PageRepairFragmentation() and
> > > log_heap_clean().  An error there would misleadingly say "..at offset number
> > > MM", but would always show the page's last offset, and not the offset where an
> > > error occured.
> >
> > This makes me question whether offset numbers are ever useful during
> > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> > internal functions that don't update vacrelstats->offno.  Note that my initial
> > problem report that led to the errcontext implementation was an ERROR in heap
> > *scan* (not vacuum).  So an offset number at that point would've been
> > sufficient.
> > https://www.postgresql.org/message-id/20190808012436.GG11185@...
> >
> > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> > so an error in heap_page_prune() (for example) doesn't get the wrong offset
> > associated with it.
> >
> > So see the attached changes on top of your v2 patch.
>
> Actually I was waiting for review comments from committer and other
> people also and was planning to send a patch after that. I already
> fixed your comments in my offline patch and was waiting for more
> comments. Anyway, thanks for delta patch.
>
> Here, attaching v3 patch for review.

Thank you for updating the patch!

Here are my comments on v3 patch:

@@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
    if (PageIsNew(page) || PageIsEmpty(page))
        return false;

+   /* Update error traceback information */
+   update_vacuum_error_info(vacrelstats, &saved_err_info,
+           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
+           InvalidOffsetNumber);
+
    maxoff = PageGetMaxOffsetNumber(page);
    for (offnum = FirstOffsetNumber;
         offnum <= maxoff;

You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
but I think we're already in that phase. I'm okay with explicitly
setting it but on the other hand, we don't set the phase in
heap_page_is_all_visible(). Is there any reason for that?

Also, since we don't reset vacrelstats->offnum at the end of
heap_page_is_all_visible(), if an error occurs by the end of
lazy_vacuum_page(), the caller of  heap_page_is_all_visible(), we
report the error context with the last offset number in the page,
making the users confused.

---
@@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)

        if (heap_tuple_needs_freeze(tupleheader, FreezeLimit,
                                    MultiXactCutoff, buf))
-           return true;
+           break;
    }                           /* scan along page */

-   return false;
+   /* Revert to the previous phase information for error traceback */
+   restore_vacuum_error_info(vacrelstats, &saved_err_info);
+
+   return offnum <= maxoff ? true : false;
 }

I think we can write just "return (offnum <= maxoff)".

---
-   /* Revert back to the old phase information for error traceback */
+   /* Revert to the old phase information for error traceback */

If we want to modify this comment how about the following phrase for
consistency with other places?

/* Revert to the previous phase information for error traceback */

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Justin Pryzby
On Sun, Aug 02, 2020 at 01:02:42PM +0900, Masahiko Sawada wrote:

> Thank you for updating the patch!
>
> Here are my comments on v3 patch:
>
> @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
>     if (PageIsNew(page) || PageIsEmpty(page))
>         return false;
>
> +   /* Update error traceback information */
> +   update_vacuum_error_info(vacrelstats, &saved_err_info,
> +           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
> +           InvalidOffsetNumber);
> +
>     maxoff = PageGetMaxOffsetNumber(page);
>     for (offnum = FirstOffsetNumber;
>          offnum <= maxoff;
>
> You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
> but I think we're already in that phase. I'm okay with explicitly
> setting it but on the other hand, we don't set the phase in
> heap_page_is_all_visible(). Is there any reason for that?

That part was my suggestion, so I can answer that.  I added
update_vacuum_error_info() to lazy_check_needs_freeze() to allow it to later
call restore_vacuum_error_info().

> Also, since we don't reset vacrelstats->offnum at the end of
> heap_page_is_all_visible(), if an error occurs by the end of
> lazy_vacuum_page(), the caller of  heap_page_is_all_visible(), we
> report the error context with the last offset number in the page,
> making the users confused.

So it looks like heap_page_is_all_visible() should also call the update and
restore functions.

Do you agree with my suggestion that the VACUUM phase should never try to
report an offset ?

How do you think we can phrase the message to avoid confusion due to 0-based
block number and 1-based offset ?

Thanks,
--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Masahiko Sawada-2
On Sun, 2 Aug 2020 at 13:24, Justin Pryzby <[hidden email]> wrote:

>
> On Sun, Aug 02, 2020 at 01:02:42PM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch!
> >
> > Here are my comments on v3 patch:
> >
> > @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
> >     if (PageIsNew(page) || PageIsEmpty(page))
> >         return false;
> >
> > +   /* Update error traceback information */
> > +   update_vacuum_error_info(vacrelstats, &saved_err_info,
> > +           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
> > +           InvalidOffsetNumber);
> > +
> >     maxoff = PageGetMaxOffsetNumber(page);
> >     for (offnum = FirstOffsetNumber;
> >          offnum <= maxoff;
> >
> > You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
> > but I think we're already in that phase. I'm okay with explicitly
> > setting it but on the other hand, we don't set the phase in
> > heap_page_is_all_visible(). Is there any reason for that?
>
> That part was my suggestion, so I can answer that.  I added
> update_vacuum_error_info() to lazy_check_needs_freeze() to allow it to later
> call restore_vacuum_error_info().
>
> > Also, since we don't reset vacrelstats->offnum at the end of
> > heap_page_is_all_visible(), if an error occurs by the end of
> > lazy_vacuum_page(), the caller of  heap_page_is_all_visible(), we
> > report the error context with the last offset number in the page,
> > making the users confused.
>
> So it looks like heap_page_is_all_visible() should also call the update and
> restore functions.
>
> Do you agree with my suggestion that the VACUUM phase should never try to
> report an offset ?

Yeah, given the current heap vacuum implementation, I agree that
setting the offset number during VACUUM_HEAP phase doesn't help
anything. But setting the offset number during checking tuples'
visibility in heap_page_is_all_visible() might be useful, although it
might be unlikely to find a problem in heap_page_is_all_visible() as
the tuple visibility checking is already done in lazy_scan_heap(). I
wonder if we can set SCAN_HEAP phase and update the offset number in
heap_page_is_all_visible().

> How do you think we can phrase the message to avoid confusion due to 0-based
> block number and 1-based offset ?

I think that since the user who uses this errcontext information is
likely to know more or less the internal of PostgreSQL I think 0-based
block number and 1-based offset will not be a problem. However, I
expected these are documented but couldn't find. If not yet, I think
it’s a good time to document that.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Mahendra Singh Thalor
In reply to this post by Masahiko Sawada-2
Thanks Sawada and Justin.

On Sun, 2 Aug 2020 at 09:33, Masahiko Sawada
<[hidden email]> wrote:

>
> On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <[hidden email]> wrote:
> >
> > Thanks Justin.
> >
> > On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <[hidden email]> wrote:
> > >
> > > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > > > Bcc:
> > > > Subject: Re: display offset along with block number in vacuum errors
> > > > Reply-To:
> > > > In-Reply-To: <[hidden email]>
> > >
> > > whoops
> > >
> > > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > > > Here:
> > > > > >
> > > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > > > >                 BlockNumber tblk;
> > > > > >                 OffsetNumber toff;
> > > > > >                 ItemId          itemid;
> > > > > > +               LVSavedErrInfo loc_saved_err_info;
> > > > > >
> > > > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > >                 if (tblk != blkno)
> > > > > >                         break;                          /* past end of tuples for this block */
> > > > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > > +
> > > > > > +               /* Update error traceback information */
> > > > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > > > +                                                                blkno, toff);
> > > > > >                 itemid = PageGetItemId(page, toff);
> > > > > >                 ItemIdSetUnused(itemid);
> > > > > >                 unused[uncnt++] = toff;
> > > > > > +
> > > > > > +               /* Revert to the previous phase information for error traceback */
> > > > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > > > >         }
> > > > > >
> > > > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > > > I don't think you should use it in a loop, nor introduce another
> > > > > > LVSavedErrInfo.
> > > > > >
> > > > > > Since phase and blkno are already set, I think you should just set
> > > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > > > Similar to whats done in lazy_vacuum_heap():
> > > > > >
> > > > > >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > > > >                 vacrelstats->blkno = tblk;
> > > > >
> > > > > Fixed.
> > > >
> > > > I rearead this thread and I think the earlier suggestion from Masahiko was
> > > > right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
> > > > the page, which has already been read from disk.  On my suggestion, your v2
> > > > patch sets offnum directly, but now I think it's not useful to set at all,
> > > > since the whole page is manipulated by PageRepairFragmentation() and
> > > > log_heap_clean().  An error there would misleadingly say "..at offset number
> > > > MM", but would always show the page's last offset, and not the offset where an
> > > > error occured.
> > >
> > > This makes me question whether offset numbers are ever useful during
> > > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> > > internal functions that don't update vacrelstats->offno.  Note that my initial
> > > problem report that led to the errcontext implementation was an ERROR in heap
> > > *scan* (not vacuum).  So an offset number at that point would've been
> > > sufficient.
> > > https://www.postgresql.org/message-id/20190808012436.GG11185@...
> > >
> > > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> > > so an error in heap_page_prune() (for example) doesn't get the wrong offset
> > > associated with it.
> > >
> > > So see the attached changes on top of your v2 patch.
> >
> > Actually I was waiting for review comments from committer and other
> > people also and was planning to send a patch after that. I already
> > fixed your comments in my offline patch and was waiting for more
> > comments. Anyway, thanks for delta patch.
> >
> > Here, attaching v3 patch for review.
>
> Thank you for updating the patch!
>
> Here are my comments on v3 patch:
>
> @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
>     if (PageIsNew(page) || PageIsEmpty(page))
>         return false;
>
> +   /* Update error traceback information */
> +   update_vacuum_error_info(vacrelstats, &saved_err_info,
> +           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
> +           InvalidOffsetNumber);
> +
>     maxoff = PageGetMaxOffsetNumber(page);
>     for (offnum = FirstOffsetNumber;
>          offnum <= maxoff;
>
> You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
> but I think we're already in that phase. I'm okay with explicitly
> setting it but on the other hand, we don't set the phase in
> heap_page_is_all_visible(). Is there any reason for that?
>
> Also, since we don't reset vacrelstats->offnum at the end of
> heap_page_is_all_visible(), if an error occurs by the end of
> lazy_vacuum_page(), the caller of  heap_page_is_all_visible(), we
> report the error context with the last offset number in the page,
> making the users confused.
Your point is valid. Added update and restore functions in
heap_page_is_all_visible in the latest patch.

>
> ---
> @@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
>
>         if (heap_tuple_needs_freeze(tupleheader, FreezeLimit,
>                                     MultiXactCutoff, buf))
> -           return true;
> +           break;
>     }                           /* scan along page */
>
> -   return false;
> +   /* Revert to the previous phase information for error traceback */
> +   restore_vacuum_error_info(vacrelstats, &saved_err_info);
> +
> +   return offnum <= maxoff ? true : false;
>  }
>
> I think we can write just "return (offnum <= maxoff)".
Fixed this.

>
> ---
> -   /* Revert back to the old phase information for error traceback */
> +   /* Revert to the old phase information for error traceback */
>
> If we want to modify this comment how about the following phrase for
> consistency with other places?

Fixed this.

>
> /* Revert to the previous phase information for error traceback */
>
> Regards,
>
> --
> Masahiko Sawada            http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Apart from these, I fixed Justin's comment of extra brackets(That was
due to "patch -p 1 < file", as 002_fix was not applying directly). I
haven't updated the document for this(Sawada's comment). I will try in
the next patch.
Attaching v4 patch for review.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

v04_0001-Added-offset-with-block-number-in-vacuum-errcontext.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Robert Haas
In reply to this post by Masahiko Sawada-2
On Sun, Aug 2, 2020 at 10:43 PM Masahiko Sawada
<[hidden email]> wrote:
> I think that since the user who uses this errcontext information is
> likely to know more or less the internal of PostgreSQL I think 0-based
> block number and 1-based offset will not be a problem. However, I
> expected these are documented but couldn't find. If not yet, I think
> it’s a good time to document that.

I agree. That's just how TIDs are.


--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

akapila
In reply to this post by Justin Pryzby
On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <[hidden email]> wrote:

>
> On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > Apart from these, I fixed comments given by Sawada and Michael in the
> > latest patch. Attaching v2 patch for review.
>
> Thanks.
>
> lazy_check_needs_freeze iterates over blocks and this patch changes it to
> update vacrelstats.  I think it should do what
> lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> its beginning (even though only the offset is changed), and then
> restore_vacuum_error_info() at its end (to "revert back" to the item number it
> started with).
>

I see that Mahendra has changed patch as per this suggestion but I am
not convinced that it is a good idea to sprinkle
update_vacuum_error_info()/restore_vacuum_error_info() at places more
than required. I see that it might look a bit clean from the
perspective that if tomorrow we use the function
lazy_check_needs_freeze() for a different purpose then we don't need
to worry about the wrong phase information. If we are worried about
that then we should have an assert in that function to ensure that the
current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

akapila
In reply to this post by Mahendra Singh Thalor
On Wed, Aug 5, 2020 at 12:47 AM Mahendra Singh Thalor
<[hidden email]> wrote:
>
> Apart from these, I fixed Justin's comment of extra brackets(That was
> due to "patch -p 1 < file", as 002_fix was not applying directly). I
> haven't updated the document for this(Sawada's comment). I will try in
> the next patch.
> Attaching v4 patch for review.
>

Few comments on the latest patch:
1.
@@ -2640,6 +2659,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats
*vacrelstats)
  */
  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
  vacrelstats->blkno = new_rel_pages;
+ vacrelstats->offnum = InvalidOffsetNumber;

Do we really need to update the 'vacrelstats->offnum' here when we
have already set it to InvalidOffsetNumber in the caller?

2.
@@ -3574,8 +3605,14 @@ vacuum_error_callback(void *arg)
  {
  case VACUUM_ERRCB_PHASE_SCAN_HEAP:
  if (BlockNumberIsValid(errinfo->blkno))
- errcontext("while scanning block %u of relation \"%s.%s\"",
-    errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ {
+ if (OffsetNumberIsValid(errinfo->offnum))
+ errcontext("while scanning block %u of relation \"%s.%s\", item offset %u",
+

I am not completely sure if this error message is an improvement over
what you have in the initial version of patch "while scanning block %u
and offset %u of relation \"%s.%s\"",...". I see that Justin has
raised a concern above that whether users will be aware of 'offset'
but I also see that we have used it in a few other messages in the
code.  For example:

PageIndexTupleDeleteNoCompact()
{
..
nline = PageGetMaxOffsetNumber(page);
if ((int) offnum <= 0 || (int) offnum > nline)
elog(ERROR, "invalid index offnum: %u", offnum);
..
}

hash_desc
{
..
case XLOG_HASH_INSERT:
{
xl_hash_insert *xlrec = (xl_hash_insert *) rec;

appendStringInfo(buf, "off %u", xlrec->offnum);
break;
}

Similarly in other desc functions, we have used off or offnum.

I find the message in your initial patch better.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

Justin Pryzby
In reply to this post by akapila
On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:

> On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <[hidden email]> wrote:
> >
> > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > Apart from these, I fixed comments given by Sawada and Michael in the
> > > latest patch. Attaching v2 patch for review.
> >
> > Thanks.
> >
> > lazy_check_needs_freeze iterates over blocks and this patch changes it to
> > update vacrelstats.  I think it should do what
> > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> > its beginning (even though only the offset is changed), and then
> > restore_vacuum_error_info() at its end (to "revert back" to the item number it
> > started with).
> >
>
> I see that Mahendra has changed patch as per this suggestion but I am
> not convinced that it is a good idea to sprinkle
> update_vacuum_error_info()/restore_vacuum_error_info() at places more
> than required. I see that it might look a bit clean from the
> perspective that if tomorrow we use the function
> lazy_check_needs_freeze() for a different purpose then we don't need
> to worry about the wrong phase information. If we are worried about
> that then we should have an assert in that function to ensure that the
> current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.

The motivation was to restore the offnum, which is set to Invalid at the start
of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
should be restored or re-set to Invalid when returns to lazy_scan_heap().  If
you think it's important, we could just set vacrelstats->offnum = Invalid
before returning, but that's what the restore function was built for.  We do
direct assignment in 2 places to avoid a function call within a loop.

lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
                           Relation *Irel, int nindexes, bool aggressive)

...
        for (blkno = 0; blkno < nblocks; blkno++)
        {
...
                update_vacuum_error_info(vacrelstats, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
                                                                 blkno, InvalidOffsetNumber);
                if (!ConditionalLockBufferForCleanup(buf))
                {
...
                        if (!lazy_check_needs_freeze(buf, &hastup, vacrelstats))
                        {
...
                for (offnum = FirstOffsetNumber;
                         offnum <= maxoff;
                         offnum = OffsetNumberNext(offnum))


--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: display offset along with block number in vacuum errors

akapila
On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <[hidden email]> wrote:

>
> On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:
> > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <[hidden email]> wrote:
> > >
> > >
> > > lazy_check_needs_freeze iterates over blocks and this patch changes it to
> > > update vacrelstats.  I think it should do what
> > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> > > its beginning (even though only the offset is changed), and then
> > > restore_vacuum_error_info() at its end (to "revert back" to the item number it
> > > started with).
> > >
> >
> > I see that Mahendra has changed patch as per this suggestion but I am
> > not convinced that it is a good idea to sprinkle
> > update_vacuum_error_info()/restore_vacuum_error_info() at places more
> > than required. I see that it might look a bit clean from the
> > perspective that if tomorrow we use the function
> > lazy_check_needs_freeze() for a different purpose then we don't need
> > to worry about the wrong phase information. If we are worried about
> > that then we should have an assert in that function to ensure that the
> > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.
>
> The motivation was to restore the offnum, which is set to Invalid at the start
> of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
> should be restored or re-set to Invalid when returns to lazy_scan_heap().  If
> you think it's important, we could just set vacrelstats->offnum = Invalid
> before returning,
>

Yeah, I would prefer that and probably a comment to indicate why we
are doing that.

> but that's what the restore function was built for.
>

I think it would be better to call restore wherever we call update. I
see your point that there is some value doing it via update/restore
but I think we should try to avoid that at many places unless it is
required and we already update blockno information without
update/restore at few places.

--
With Regards,
Amit Kapila.


12