Backpatch b61d161c14

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Backpatch b61d161c14

akapila
I propose to backpatch b61d161c14 [1] (Introduce vacuum errcontext to
display additional information.).  In the recent past, we have seen an
error report similar to "ERROR: found xmin 2157740646 from before
relfrozenxid 1197" from multiple EDB customers.  A similar report is
seen on pgsql-bugs as well [2] which I think has triggered the
implementation of this feature for v13.  Such reports mostly indicate
database corruption rather than any postgres bug which is also
indicated by the error-code (from before relfrozenxid) for this
message. I think there is a good reason to back-patch this as multiple
users are facing similar issues.  This patch won't fix this issue but
it will help us in detecting the problematic part of the heap/index
and then if users wish they can delete the portion of data that
appeared to be corrupted and resume the operations on that relation.

I have tried to back-patch this for v12 and attached is the result.
The attached patch passes make check-world but I have yet to test it
manually and also prepare the patch for other branches once we agree
on this proposal.

Thoughts?

[1] -
commit b61d161c146328ae6ba9ed937862d66e5c8b035a
Author: Amit Kapila <[hidden email]>
Date:   Mon Mar 30 07:33:38 2020 +0530

    Introduce vacuum errcontext to display additional information.

    The additional information displayed will be block number for error
    occurring while processing heap and index name for error occurring
    while processing the index.

[2] - https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Introduce-vacuum-errcontext-to-display-additional-in.v12.patch (18K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14

akapila
On Mon, Jun 22, 2020 at 10:35 AM Amit Kapila <[hidden email]> wrote:

>
> I propose to backpatch b61d161c14 [1] (Introduce vacuum errcontext to
> display additional information.).  In the recent past, we have seen an
> error report similar to "ERROR: found xmin 2157740646 from before
> relfrozenxid 1197" from multiple EDB customers.  A similar report is
> seen on pgsql-bugs as well [2] which I think has triggered the
> implementation of this feature for v13.  Such reports mostly indicate
> database corruption rather than any postgres bug which is also
> indicated by the error-code (from before relfrozenxid) for this
> message.
>

Sorry, the error-code I want to refer to in above sentence was
ERRCODE_DATA_CORRUPTED.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14

Alvaro Herrera-9
In reply to this post by akapila
On 2020-Jun-22, Amit Kapila wrote:

> I propose to backpatch b61d161c14 [1] (Introduce vacuum errcontext to
> display additional information.).  In the recent past, we have seen an
> error report similar to "ERROR: found xmin 2157740646 from before
> relfrozenxid 1197" from multiple EDB customers.  A similar report is
> seen on pgsql-bugs as well [2] which I think has triggered the
> implementation of this feature for v13.

+1 to backpatching this change.  I did not review your actual patch,
though.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14

Andres Freund
In reply to this post by akapila
Hi,

On 2020-06-22 10:35:47 +0530, Amit Kapila wrote:

> I propose to backpatch b61d161c14 [1] (Introduce vacuum errcontext to
> display additional information.).  In the recent past, we have seen an
> error report similar to "ERROR: found xmin 2157740646 from before
> relfrozenxid 1197" from multiple EDB customers.  A similar report is
> seen on pgsql-bugs as well [2] which I think has triggered the
> implementation of this feature for v13.  Such reports mostly indicate
> database corruption rather than any postgres bug which is also
> indicated by the error-code (from before relfrozenxid) for this
> message. I think there is a good reason to back-patch this as multiple
> users are facing similar issues.  This patch won't fix this issue but
> it will help us in detecting the problematic part of the heap/index
> and then if users wish they can delete the portion of data that
> appeared to be corrupted and resume the operations on that relation.
>
> I have tried to back-patch this for v12 and attached is the result.
> The attached patch passes make check-world but I have yet to test it
> manually and also prepare the patch for other branches once we agree
> on this proposal.

I think having the additional information in the back branches would be
good. But on the other hand I think this is a somewhat large change
to backpatch, and it hasn't yet much real world exposure.

I'm also uncomfortable with the approach of just copying all of
LVRelStats in several places:

>  /*
> @@ -1580,9 +1648,15 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
>   int uncnt = 0;
>   TransactionId visibility_cutoff_xid;
>   bool all_frozen;
> + LVRelStats olderrinfo;
>  
>   pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
>  
> + /* Update error traceback information */
> + olderrinfo = *vacrelstats;
> + update_vacuum_error_info(vacrelstats, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> + blkno, NULL);
> +
>   START_CRIT_SECTION();
>  
>   for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
> @@ -1659,6 +1733,11 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
>    *vmbuffer, visibility_cutoff_xid, flags);
>   }
>  
> + /* Revert to the previous phase information for error traceback */
> + update_vacuum_error_info(vacrelstats,
> + olderrinfo.phase,
> + olderrinfo.blkno,
> + olderrinfo.indname);
>   return tupindex;
>  }

To me that's a very weird approach. It's fragile because we need to be
sure that there's no updates to the wrong LVRelStats for important
things, and it has a good bit of potential to be inefficient because
LVRelStats isn't exactly small. This pretty much relies on the compiler
doing good enough escape analysis to realize that most parts of
olderrinfo aren't touched.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Justin Pryzby
On Mon, Jun 22, 2020 at 01:09:39PM -0700, Andres Freund wrote:
> On 2020-06-22 10:35:47 +0530, Amit Kapila wrote:
> > I propose to backpatch b61d161c14 [1] (Introduce vacuum errcontext to
> > display additional information.).
...
> I think having the additional information in the back branches would be
> good. But on the other hand I think this is a somewhat large change
> to backpatch, and it hasn't yet much real world exposure.

I see that's nontrivial to cherry-pick due to parallel vacuum changes, and due
to re-arranging calls to pgstat_progress.

Since the next minor releases are in August, and PG13 expected to be released
~October, we could defer backpatching until November (or later).

> I'm also uncomfortable with the approach of just copying all of
> LVRelStats in several places:
>
> >  /*
> > @@ -1580,9 +1648,15 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> >   int uncnt = 0;
> >   TransactionId visibility_cutoff_xid;
> >   bool all_frozen;
> > + LVRelStats olderrinfo;

I guess the alternative is to write like

LVRelStats olderrinfo = {
        .phase = vacrelstats.phase,
        .blkno = vacrelstats.blkno,
        .indname = vacrelstats.indname,
};

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Andres Freund
Hi,

On 2020-06-22 15:43:11 -0500, Justin Pryzby wrote:

> On Mon, Jun 22, 2020 at 01:09:39PM -0700, Andres Freund wrote:
> > I'm also uncomfortable with the approach of just copying all of
> > LVRelStats in several places:
> >
> > >  /*
> > > @@ -1580,9 +1648,15 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > >   int uncnt = 0;
> > >   TransactionId visibility_cutoff_xid;
> > >   bool all_frozen;
> > > + LVRelStats olderrinfo;
>
> I guess the alternative is to write like
>
> LVRelStats olderrinfo = {
> .phase = vacrelstats.phase,
> .blkno = vacrelstats.blkno,
> .indname = vacrelstats.indname,
> };

No, I don't think that's a solution. I think it's wrong to have
something like olderrinfo in the first place. Using a struct with ~25
members to store the current state of three variables just doesn't make
sense.  Why isn't this just a LVSavedPosition struct or something like
that?

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Tom Lane-2
Andres Freund <[hidden email]> writes:
> No, I don't think that's a solution. I think it's wrong to have
> something like olderrinfo in the first place. Using a struct with ~25
> members to store the current state of three variables just doesn't make
> sense.  Why isn't this just a LVSavedPosition struct or something like
> that?

That seems like rather pointless micro-optimization really; the struct's
not *that* large.  But I have a different complaint now that I look at
this code: is it safe at all?  I see that the indname field is a pointer
to who-knows-where.  If it's possible in the first place for that to
change while this code runs, then what guarantees that we won't be
restoring a dangling pointer to freed memory?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Justin Pryzby
On Mon, Jun 22, 2020 at 06:15:27PM -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > No, I don't think that's a solution. I think it's wrong to have
> > something like olderrinfo in the first place. Using a struct with ~25
> > members to store the current state of three variables just doesn't make
> > sense.  Why isn't this just a LVSavedPosition struct or something like
> > that?
>
> That seems like rather pointless micro-optimization really; the struct's
> not *that* large.  But I have a different complaint now that I look at
> this code: is it safe at all?  I see that the indname field is a pointer
> to who-knows-where.  If it's possible in the first place for that to
> change while this code runs, then what guarantees that we won't be
> restoring a dangling pointer to freed memory?

I'm not sure it addresses your concern, but we talked a bunch about safety
starting here:
https://www.postgresql.org/message-id/20200326150457.GB17431%40telsasoft.com
..and concluding with an explanation about CHECK_FOR_INTERRUPTS.

[hidden email]
|And I think you're right: we only save state when the calling function has a
|indname=NULL, so we never "put back" a non-NULL indname.  We go from having a
|indname=NULL at lazy_scan_heap to not not-NULL at lazy_vacuum_index, and never
|the other way around.  So once we've "reverted back", 1) the pointer is null;
|and, 2) the callback function doesn't access it for the previous/reverted phase
|anyway.

When this function is called by lazy_vacuum_{heap,page,index}, it's also called
a 2nd time to restore the previous phase information.  When it's called the
first time by lazy_vacuum_index(), it does errinfo->indname = pstrdup(indname),
and on the 2nd call then does pfree(errinfo->indame), followed by
errinfo->indname = NULL.

|static void
|update_vacuum_error_info(LVSavedPosition *errinfo, int phase, BlockNumber blkno,
|                                                 char *indname)
|{
|        errinfo->blkno = blkno;
|        errinfo->phase = phase;
|
|        /* Free index name from any previous phase */
|        if (errinfo->indname)
|                pfree(errinfo->indname);
|
|        /* For index phases, save the name of the current index for the callback */
|        errinfo->indname = indname ? pstrdup(indname) : NULL;
|}

If it's inadequately clear, maybe we should do:

         if (errinfo->indname)
+        {
                 pfree(errinfo->indname);
+                Assert(indname == NULL);
+        }

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
> On Mon, Jun 22, 2020 at 06:15:27PM -0400, Tom Lane wrote:
>> That seems like rather pointless micro-optimization really; the struct's
>> not *that* large.  But I have a different complaint now that I look at
>> this code: is it safe at all?  I see that the indname field is a pointer
>> to who-knows-where.  If it's possible in the first place for that to
>> change while this code runs, then what guarantees that we won't be
>> restoring a dangling pointer to freed memory?

> I'm not sure it addresses your concern, but we talked a bunch about safety
> starting here:
> https://www.postgresql.org/message-id/20200326150457.GB17431%40telsasoft.com
> ..and concluding with an explanation about CHECK_FOR_INTERRUPTS.

> [hidden email]
> |And I think you're right: we only save state when the calling function has a
> |indname=NULL, so we never "put back" a non-NULL indname.  We go from having a
> |indname=NULL at lazy_scan_heap to not not-NULL at lazy_vacuum_index, and never
> |the other way around.  So once we've "reverted back", 1) the pointer is null;
> |and, 2) the callback function doesn't access it for the previous/reverted phase
> |anyway.

If we're relying on that, I'd replace the "save" action by an Assert that
indname is NULL, and the "restore" action by just assigning NULL again.
That eliminates all concern about whether the restored value is valid.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Justin Pryzby
In reply to this post by Andres Freund
On Mon, Jun 22, 2020 at 01:57:12PM -0700, Andres Freund wrote:

> On 2020-06-22 15:43:11 -0500, Justin Pryzby wrote:
> > On Mon, Jun 22, 2020 at 01:09:39PM -0700, Andres Freund wrote:
> > > I'm also uncomfortable with the approach of just copying all of
> > > LVRelStats in several places:
> > >
> > > >  /*
> > > > @@ -1580,9 +1648,15 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > >   int uncnt = 0;
> > > >   TransactionId visibility_cutoff_xid;
> > > >   bool all_frozen;
> > > > + LVRelStats olderrinfo;
> >
> > I guess the alternative is to write like
> >
> > LVRelStats olderrinfo = {
> > .phase = vacrelstats.phase,
> > .blkno = vacrelstats.blkno,
> > .indname = vacrelstats.indname,
> > };
>
> No, I don't think that's a solution. I think it's wrong to have
> something like olderrinfo in the first place. Using a struct with ~25
> members to store the current state of three variables just doesn't make
> sense.  Why isn't this just a LVSavedPosition struct or something like
> that?
I'd used LVRelStats on your suggestion:
https://www.postgresql.org/message-id/20191211165425.4ewww2s5k5cafi4l%40alap3.anarazel.de
https://www.postgresql.org/message-id/20200120191305.sxi44cedhtxwr3ag%40alap3.anarazel.de

I understood the goal to be avoiding the need to add a new struct, when most
functions are already passed LVRelStats *vacrelstats.

But maybe I misunderstood.  (Also, back in January, the callback was only used
for scan-heap phase, so it's increased in scope several times).

Anyway, I put together some patches for discussion purposes.

--
Justin

0001-Rename-from-errcbarg.patch (1K) Download Attachment
0002-Add-assert-and-document-why-indname-is-safe.patch (1K) Download Attachment
0003-Move-error-information-into-a-separate-struct.patch (15K) Download Attachment
0004-Update-functions-to-pass-only-errinfo-struct.patch (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

akapila
On Tue, Jun 23, 2020 at 7:13 AM Justin Pryzby <[hidden email]> wrote:

>
> On Mon, Jun 22, 2020 at 01:57:12PM -0700, Andres Freund wrote:
> >
> > No, I don't think that's a solution. I think it's wrong to have
> > something like olderrinfo in the first place. Using a struct with ~25
> > members to store the current state of three variables just doesn't make
> > sense.  Why isn't this just a LVSavedPosition struct or something like
> > that?
>
> I'd used LVRelStats on your suggestion:
> https://www.postgresql.org/message-id/20191211165425.4ewww2s5k5cafi4l%40alap3.anarazel.de
> https://www.postgresql.org/message-id/20200120191305.sxi44cedhtxwr3ag%40alap3.anarazel.de
>
> I understood the goal to be avoiding the need to add a new struct, when most
> functions are already passed LVRelStats *vacrelstats.
>

Yeah, I think this is a good point against adding a separate struct.
I also don't think that we can buy much by doing this optimization.
To me, the current code looks good in this regard.

> But maybe I misunderstood.  (Also, back in January, the callback was only used
> for scan-heap phase, so it's increased in scope several times).
>
> Anyway, I put together some patches for discussion purposes.
>

Few comments for 0002-Add-assert-and-document-why-indname-is-safe
-----------------------------------------------------------------------------------------------------
- /* Free index name from any previous phase */
  if (errinfo->indname)
+ {
+ /*
+ * indname is only ever saved during lazy_vacuum_index(), which
+ * during which the phase information is not not further
+ * manipulated, until it's restored before returning from
+ * lazy_vacuum_index().
+ */
+ Assert(indname == NULL);
+
  pfree(errinfo->indname);
+ errinfo->indname = NULL;
+ }

It is not very clear that this is the place where we are saving the
state.  I think it would be better to do in the caller (ex. in before
statement olderrinfo = *vacrelstats; in lazy_vacuum_index()) where it
is clear that we are saving the state for later use.

I guess for the restore case we are already assigning NULL via
"errinfo->indname = indname ? pstrdup(indname) : NULL;" in
update_vacuum_error_info.  I think some more comments in the function
update_vacuum_error_info would explain it better.

0001-Rename-from-errcbarg, looks fine to me but we can see if others
have any opinion on the naming (especially changing  VACUUM_ERRCB* to
VACUUM_ERRINFO*).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Andres Freund
In reply to this post by Justin Pryzby
Hi,

On 2020-06-22 20:43:47 -0500, Justin Pryzby wrote:

> On Mon, Jun 22, 2020 at 01:57:12PM -0700, Andres Freund wrote:
> > On 2020-06-22 15:43:11 -0500, Justin Pryzby wrote:
> > > On Mon, Jun 22, 2020 at 01:09:39PM -0700, Andres Freund wrote:
> > > > I'm also uncomfortable with the approach of just copying all of
> > > > LVRelStats in several places:
> > > >
> > > > >  /*
> > > > > @@ -1580,9 +1648,15 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > > >   int uncnt = 0;
> > > > >   TransactionId visibility_cutoff_xid;
> > > > >   bool all_frozen;
> > > > > + LVRelStats olderrinfo;
> > >
> > > I guess the alternative is to write like
> > >
> > > LVRelStats olderrinfo = {
> > > .phase = vacrelstats.phase,
> > > .blkno = vacrelstats.blkno,
> > > .indname = vacrelstats.indname,
> > > };
> >
> > No, I don't think that's a solution. I think it's wrong to have
> > something like olderrinfo in the first place. Using a struct with ~25
> > members to store the current state of three variables just doesn't make
> > sense.  Why isn't this just a LVSavedPosition struct or something like
> > that?
>
> I'd used LVRelStats on your suggestion:
> https://www.postgresql.org/message-id/20191211165425.4ewww2s5k5cafi4l%40alap3.anarazel.de
> https://www.postgresql.org/message-id/20200120191305.sxi44cedhtxwr3ag%40alap3.anarazel.de
>
> I understood the goal to be avoiding the need to add a new struct, when most
> functions are already passed LVRelStats *vacrelstats.

> But maybe I misunderstood.  (Also, back in January, the callback was only used
> for scan-heap phase, so it's increased in scope several times).

I am only suggesting that where you save the old location, as currently
done with LVRelStats olderrinfo, you instead use a more specific
type. Not that you should pass that anywhere (except for
update_vacuum_error_info).

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Tom Lane-2
Andres Freund <[hidden email]> writes:
> I am only suggesting that where you save the old location, as currently
> done with LVRelStats olderrinfo, you instead use a more specific
> type. Not that you should pass that anywhere (except for
> update_vacuum_error_info).

As things currently stand, I don't think we need another struct
type at all.  ISTM we should hard-wire the handling of indname
as I suggested above.  Then there are only two fields to be dealt
with, and we could just as well save them in simple local variables.

If there's a clear future path to needing to save/restore more
fields, then maybe another struct type would be useful ... but
right now the struct type declaration itself would take more
lines of code than it would save.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Justin Pryzby
On Tue, Jun 23, 2020 at 12:14:40AM -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > I am only suggesting that where you save the old location, as currently
> > done with LVRelStats olderrinfo, you instead use a more specific
> > type. Not that you should pass that anywhere (except for
> > update_vacuum_error_info).
>
> As things currently stand, I don't think we need another struct
> type at all.  ISTM we should hard-wire the handling of indname
> as I suggested above.  Then there are only two fields to be dealt
> with, and we could just as well save them in simple local variables.
>
> If there's a clear future path to needing to save/restore more
> fields, then maybe another struct type would be useful ... but
> right now the struct type declaration itself would take more
> lines of code than it would save.
Updated patches for consideration.  I left the "struct" patch there to show
what it'd look like.

--
Justin

v2-0001-Rename-from-errcbarg.patch (1K) Download Attachment
v2-0002-Specially-save-the-index-name-in-lazy_-vacuum-cle.patch (6K) Download Attachment
v2-0003-Save-phase-blkno-in-local-variables-for-consisten.patch (4K) Download Attachment
v2-0004-Move-error-information-into-a-separate-struct.patch (15K) Download Attachment
v2-0005-Update-functions-to-pass-only-errinfo-struct.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2020-06-23 00:14:40 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > I am only suggesting that where you save the old location, as currently
> > done with LVRelStats olderrinfo, you instead use a more specific
> > type. Not that you should pass that anywhere (except for
> > update_vacuum_error_info).
>
> As things currently stand, I don't think we need another struct
> type at all.  ISTM we should hard-wire the handling of indname
> as I suggested above.  Then there are only two fields to be dealt
> with, and we could just as well save them in simple local variables.
That's fine with me too.


> If there's a clear future path to needing to save/restore more
> fields, then maybe another struct type would be useful ... but
> right now the struct type declaration itself would take more
> lines of code than it would save.

FWIW, I started to be annoyed by this code when I was addding
prefetching support to vacuum, and wanted to change what's tracked
where. The number of places that needed to be touched was
disproportional.


Here's a *draft* for how I thought this roughly could look like. I think
it's nicer to not specify the exact saved state in multiple places, and
I think it's much clearer to use a separate function to restore the
state than to set a "fresh" state.

I've only applied a hacky fix for the way the indname is tracked, I
thought that'd best be discussed separately.

Greetings,

Andres Freund

draft-vacuumlazy-errcontext-tracking.diff (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

akapila
On Tue, Jun 23, 2020 at 11:49 PM Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> On 2020-06-23 00:14:40 -0400, Tom Lane wrote:
> > Andres Freund <[hidden email]> writes:
> > > I am only suggesting that where you save the old location, as currently
> > > done with LVRelStats olderrinfo, you instead use a more specific
> > > type. Not that you should pass that anywhere (except for
> > > update_vacuum_error_info).
> >
> > As things currently stand, I don't think we need another struct
> > type at all.  ISTM we should hard-wire the handling of indname
> > as I suggested above.  Then there are only two fields to be dealt
> > with, and we could just as well save them in simple local variables.
>
> That's fine with me too.
>
I have looked at both the patches (using separate variables (by
Justin) and using a struct (by Andres)) and found the second one bit
better.

>
> > If there's a clear future path to needing to save/restore more
> > fields, then maybe another struct type would be useful ... but
> > right now the struct type declaration itself would take more
> > lines of code than it would save.
>
> FWIW, I started to be annoyed by this code when I was addding
> prefetching support to vacuum, and wanted to change what's tracked
> where. The number of places that needed to be touched was
> disproportional.
>
>
> Here's a *draft* for how I thought this roughly could look like. I think
> it's nicer to not specify the exact saved state in multiple places, and
> I think it's much clearer to use a separate function to restore the
> state than to set a "fresh" state.
>
I think this is a good idea and makes code look better.  I think it is
better to name new struct as LVSavedErrInfo instead of LVSavedPos.

> I've only applied a hacky fix for the way the indname is tracked, I
> thought that'd best be discussed separately.
>

I think it is better to use Tom's idea here to save and restore index
information in-place.  I have used Justin's patch with some
improvements like adding Asserts and initializing with NULL for
indname while restoring to make things unambiguous.

I have improved some comments in the code and for now, kept as two
patches (a) one for improving the error info for index (mostly
Justin's patch based on Tom's idea) and (b) the other to generally
improve the code in this area (mostly Andres's patch).

I have done some testing with both the patches and would like to do
more unless there are objections with these.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

improve_vac_err_cb_v1.patch (6K) Download Attachment
improve_vac_err_cb_v2.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

Justin Pryzby
On Thu, Jun 25, 2020 at 02:31:51PM +0530, Amit Kapila wrote:
> I have looked at both the patches (using separate variables (by
> Justin) and using a struct (by Andres)) and found the second one bit
> better.

Thanks for looking.

> I have improved some comments in the code and for now, kept as two
> patches (a) one for improving the error info for index (mostly
> Justin's patch based on Tom's idea) and (b) the other to generally
> improve the code in this area (mostly Andres's patch).

And thanks for separate patchen :)

> I have done some testing with both the patches and would like to do
> more unless there are objections with these.

Comments:

>         * The index name is saved only during this phase and restored immediately

=> I wouldn't say "only" since it's saved during lazy_vacuum: index AND cleanup.

>update_vacuum_error_info(LVRelStats *errinfo, LVSavedErrInfo *oldpos, int phase,

=> You called your struct "LVSavedErrInfo" but the variables are still called
"pos".  I would call it olderrinfo or just old.

Also, this doesn't (re)rename the "cbarg" stuff that Alvaro didn't like, which
was my 0001 patch.

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

akapila
On Fri, Jun 26, 2020 at 7:25 AM Justin Pryzby <[hidden email]> wrote:

>
>
> > I have done some testing with both the patches and would like to do
> > more unless there are objections with these.
>
> Comments:
>
> >         * The index name is saved only during this phase and restored immediately
>
> => I wouldn't say "only" since it's saved during lazy_vacuum: index AND cleanup.
>
> >update_vacuum_error_info(LVRelStats *errinfo, LVSavedErrInfo *oldpos, int phase,
>
> => You called your struct "LVSavedErrInfo" but the variables are still called
> "pos".  I would call it olderrinfo or just old.
>
Fixed both of the above comments. I used the variable name as saved_err_info.

> Also, this doesn't (re)rename the "cbarg" stuff that Alvaro didn't like, which
> was my 0001 patch.
>

If I am not missing anything then that change was in
lazy_cleanup_index and after this patch, it won't be required because
we are using a different variable name.

I have combined both the patches now.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

0001-Improve-vacuum-error-context-handling.v1.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

akapila
On Fri, Jun 26, 2020 at 9:19 AM Amit Kapila <[hidden email]> wrote:

>
> On Fri, Jun 26, 2020 at 7:25 AM Justin Pryzby <[hidden email]> wrote:
> >
> >
> > Comments:
> >
> > >         * The index name is saved only during this phase and restored immediately
> >
> > => I wouldn't say "only" since it's saved during lazy_vacuum: index AND cleanup.
> >
> > >update_vacuum_error_info(LVRelStats *errinfo, LVSavedErrInfo *oldpos, int phase,
> >
> > => You called your struct "LVSavedErrInfo" but the variables are still called
> > "pos".  I would call it olderrinfo or just old.
> >
>
> Fixed both of the above comments. I used the variable name as saved_err_info.
>
> > Also, this doesn't (re)rename the "cbarg" stuff that Alvaro didn't like, which
> > was my 0001 patch.
> >
>
> If I am not missing anything then that change was in
> lazy_cleanup_index and after this patch, it won't be required because
> we are using a different variable name.
>
> I have combined both the patches now.
>

I am planning to push this tomorrow if there are no further
suggestions/comments.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)

akapila
On Tue, Jun 30, 2020 at 9:30 AM Amit Kapila <[hidden email]> wrote:

>
> >
> > If I am not missing anything then that change was in
> > lazy_cleanup_index and after this patch, it won't be required because
> > we are using a different variable name.
> >
> > I have combined both the patches now.
> >
>
> I am planning to push this tomorrow if there are no further
> suggestions/comments.
>

Pushed.  Now, coming back to the question of the back patch.  I see a
point in deferring this for 3-6 months or maybe more after PG13 is
released.  OTOH, this implementation is mainly triggered by issues
reported in this area and this doesn't seem to be a very invasive
patch which can cause some de-stabilization in back-branches. I am not
in a hurry to get this backpatched but still, it would be good if this
can be backpatched earlier as quite a few people (onlist and EDB
customers) have reported issues that could have been narrowed down if
this patch is present in back-branches.

It seems Alvaro and I are in favor of backpatch whereas Andres and
Justin seem to think it should be deferred until this change has seen
some real-world exposure.

Anyone else wants to weigh in?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com