progress report for ANALYZE

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

progress report for ANALYZE

Alvaro Herrera-9
Hello

Here's a patch that implements progress reporting for ANALYZE.


Thanks

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v1-0001-Report-progress-for-ANALYZE.patch (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Julien Rouhaud
On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera <[hidden email]> wrote:
>
> Here's a patch that implements progress reporting for ANALYZE.

Patch applies, code and doc and compiles cleanly.  I have few comments:

@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
    if (numrows > 0)
    {
        MemoryContext col_context,
-                   old_context;
+                     old_context;
+       const int   index[] = {
+           PROGRESS_ANALYZE_PHASE,
+           PROGRESS_ANALYZE_TOTAL_BLOCKS,
+           PROGRESS_ANALYZE_BLOCKS_DONE
+       };
+       const int64 val[] = {
+           PROGRESS_ANALYZE_PHASE_ANALYSIS,
+           0, 0
+       };
+
+       pgstat_progress_update_multi_param(3, index, val);
[...]
    }
+   pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+                                PROGRESS_ANALYZE_PHASE_COMPLETE);
+
If there wasn't any row but multiple blocks were scanned, the
PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about
the blocks that were scanned.  I'm not sure if we should stay
consistent here.

diff --git a/src/backend/utils/adt/pgstatfuncs.c
b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..98b01e54fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
    /* Translate command name into command type code. */
    if (pg_strcasecmp(cmd, "VACUUM") == 0)
        cmdtype = PROGRESS_COMMAND_VACUUM;
+   if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+       cmdtype = PROGRESS_COMMAND_ANALYZE;
    else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
        cmdtype = PROGRESS_COMMAND_CLUSTER;
    else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)

it should be an "else if" here.

Everything else LGTM.


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Anthony Nowocien
Hi,
In monitoring.sgml, "a" is missing in "row for ech backend that is currently running that command[...]".
Anthony


On Tuesday, July 2, 2019, Julien Rouhaud <[hidden email]> wrote:

> On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera <[hidden email]> wrote:
>>
>> Here's a patch that implements progress reporting for ANALYZE.
>
> Patch applies, code and doc and compiles cleanly.  I have few comments:
>
> @@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
>     if (numrows > 0)
>     {
>         MemoryContext col_context,
> -                   old_context;
> +                     old_context;
> +       const int   index[] = {
> +           PROGRESS_ANALYZE_PHASE,
> +           PROGRESS_ANALYZE_TOTAL_BLOCKS,
> +           PROGRESS_ANALYZE_BLOCKS_DONE
> +       };
> +       const int64 val[] = {
> +           PROGRESS_ANALYZE_PHASE_ANALYSIS,
> +           0, 0
> +       };
> +
> +       pgstat_progress_update_multi_param(3, index, val);
> [...]
>     }
> +   pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
> +                                PROGRESS_ANALYZE_PHASE_COMPLETE);
> +
> If there wasn't any row but multiple blocks were scanned, the
> PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about
> the blocks that were scanned.  I'm not sure if we should stay
> consistent here.
>
> diff --git a/src/backend/utils/adt/pgstatfuncs.c
> b/src/backend/utils/adt/pgstatfuncs.c
> index 05240bfd14..98b01e54fa 100644
> --- a/src/backend/utils/adt/pgstatfuncs.c
> +++ b/src/backend/utils/adt/pgstatfuncs.c
> @@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
>     /* Translate command name into command type code. */
>     if (pg_strcasecmp(cmd, "VACUUM") == 0)
>         cmdtype = PROGRESS_COMMAND_VACUUM;
> +   if (pg_strcasecmp(cmd, "ANALYZE") == 0)
> +       cmdtype = PROGRESS_COMMAND_ANALYZE;
>     else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
>         cmdtype = PROGRESS_COMMAND_CLUSTER;
>     else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
>
> it should be an "else if" here.
>
> Everything else LGTM.
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Tatsuro Yamada-2
In reply to this post by Alvaro Herrera-9
Hi Alvaro!

On 2019/06/22 3:52, Alvaro Herrera wrote:
> Hello
>
> Here's a patch that implements progress reporting for ANALYZE.

Sorry for the late reply.
My email address was changed to [hidden email].

I have a question about your patch.
My ex-colleague Vinayak created same patch in 2017 [1], and he
couldn't get commit because there are some reasons such as the
patch couldn't handle analyzing Foreign table. Therefore, I wonder
whether your patch is able to do that or not.

However, actually, I think it's okay because the feature is useful
for DBAs, even if your patch can't handle Foreign table.

I'll review your patch in this week. :)
 
[1] ANALYZE command progress checker
https://www.postgresql.org/message-id/flat/968b4eda-2417-8b7b-d468-71643cf088b6%40openscg.com#574488592fcc9708c38fa44b0dae9006

Regards,
Tatsuro Yamada


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Tatsuro Yamada-2
Hi Alvaro,

> I'll review your patch in this week. :)

I tested your patch on 6b854896.
Here is a result. See below:

---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;

[Session #2]
\a
\t
select * from pg_stat_progress_analyze; \watch 0.001

17520|13599|postgres|16387|f|16387|scanning table|4425|14
17520|13599|postgres|16387|f|16387|scanning table|4425|64
17520|13599|postgres|16387|f|16387|scanning table|4425|111
...
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|analyzing sample|0|0
17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??
---------------------------------------------------------

I have a question of the last line of the result.
I'm not sure it is able or not, but I guess it would be better
to keep the phase name (analyzing sample) on the view until the
end of this command. :)

Regards,
Tatsuro Yamada





Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Robert Haas
On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
<[hidden email]> wrote:
> 17520|13599|postgres|16387|f|16387|scanning table|4425|4425
> 17520|13599|postgres|16387|f|16387|analyzing sample|0|0
> 17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??

Why do we zero out the block numbers when we switch phases?  The
CREATE INDEX progress reporting patch does that kind of thing too, and
it seems like poor behavior to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Alvaro Herrera-9
On 2019-Jul-08, Robert Haas wrote:

> On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
> <[hidden email]> wrote:
> > 17520|13599|postgres|16387|f|16387|scanning table|4425|4425
> > 17520|13599|postgres|16387|f|16387|analyzing sample|0|0
> > 17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??
>
> Why do we zero out the block numbers when we switch phases?  The
> CREATE INDEX progress reporting patch does that kind of thing too, and
> it seems like poor behavior to me.

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either.  We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Robert Haas
On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <[hidden email]> wrote:
> Yeah, I got the impression that that was determined to be the desirable
> behavior, so I made it do that, but I'm not really happy about it
> either.  We're not too late to change the CREATE INDEX behavior, but
> let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the end.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Julien Rouhaud
On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <[hidden email]> wrote:

>
> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <[hidden email]> wrote:
> > Yeah, I got the impression that that was determined to be the desirable
> > behavior, so I made it do that, but I'm not really happy about it
> > either.  We're not too late to change the CREATE INDEX behavior, but
> > let's discuss what is it that we want.
>
> I don't think I intended to make any such determination -- which
> commit do you think established this as the canonical behavior?
>
> I propose that once a field is set, we should leave it set until the end.

+1

Note that this patch is already behaving like that if the table only
contains dead rows.


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Tatsuro Yamada-2
Hi Alvaro, Anthony, Julien and Robert,

On 2019/07/09 3:47, Julien Rouhaud wrote:

> On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <[hidden email]> wrote:
>>
>> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <[hidden email]> wrote:
>>> Yeah, I got the impression that that was determined to be the desirable
>>> behavior, so I made it do that, but I'm not really happy about it
>>> either.  We're not too late to change the CREATE INDEX behavior, but
>>> let's discuss what is it that we want.
>>
>> I don't think I intended to make any such determination -- which
>> commit do you think established this as the canonical behavior?
>>
>> I propose that once a field is set, we should leave it set until the end.
>
> +1
>
> Note that this patch is already behaving like that if the table only
> contains dead rows.

I fixed the patch including:

   - Replace "if" to "else if". (Suggested by Julien)
   - Fix typo s/ech/each/. (Suggested by Anthony)
   - Add Phase "analyzing complete" in the pgstat view. (Suggested by Julien, Robert and me)
     It was overlooked to add it in system_views.sql.

I share my re-test result, see below:

---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;

[Session #2]
\a \t
select * from pg_stat_progress_analyze; \watch 0.001

3785|13599|postgres|16384|f|16384|scanning table|4425|6
3785|13599|postgres|16384|f|16384|scanning table|4425|31
3785|13599|postgres|16384|f|16384|scanning table|4425|70
3785|13599|postgres|16384|f|16384|scanning table|4425|109
...
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|analyzing sample|0|0
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and fixed. :)
---------------------------------------------------------

Thanks,
Tatsuro Yamada




v2-0001-Report-progress-for-ANALYZE.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Alvaro Herrera-9
In reply to this post by Robert Haas
On 2019-Jul-08, Robert Haas wrote:

> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <[hidden email]> wrote:
> > Yeah, I got the impression that that was determined to be the desirable
> > behavior, so I made it do that, but I'm not really happy about it
> > either.  We're not too late to change the CREATE INDEX behavior, but
> > let's discuss what is it that we want.
>
> I don't think I intended to make any such determination -- which
> commit do you think established this as the canonical behavior?

No commit, just discussion in the CREATE INDEX thread.

> I propose that once a field is set, we should leave it set until the end.

Hmm, ok.  In CREATE INDEX, we use the block counters multiple times. We
can leave them set until the next time we need them, I suppose.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Robert Haas
On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <[hidden email]> wrote:
> Hmm, ok.  In CREATE INDEX, we use the block counters multiple times.

Why do we do that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Alvaro Herrera-9
On 2019-Jul-10, Robert Haas wrote:

> On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <[hidden email]> wrote:
> > Hmm, ok.  In CREATE INDEX, we use the block counters multiple times.
>
> Why do we do that?

Because we scan the table first, then the index, then the table again
(last two for the validation phase of CIC).  We count "block numbers"
separately for each of those, and keep those counters in the same pair
of columns.  I think we also do that for tuple counters in one case.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Robert Haas
On Wed, Jul 10, 2019 at 9:26 AM Alvaro Herrera <[hidden email]> wrote:

> On 2019-Jul-10, Robert Haas wrote:
> > On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <[hidden email]> wrote:
> > > Hmm, ok.  In CREATE INDEX, we use the block counters multiple times.
> >
> > Why do we do that?
>
> Because we scan the table first, then the index, then the table again
> (last two for the validation phase of CIC).  We count "block numbers"
> separately for each of those, and keep those counters in the same pair
> of columns.  I think we also do that for tuple counters in one case.

Hmm.  I think I would have been inclined to use different counter
numbers for table blocks and index blocks, but perhaps we don't have
room. Anyway, leaving them set until we need them again seems like the
best we can do as things stand.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Kyotaro Horiguchi-4
In reply to this post by Tatsuro Yamada-2
Hello.

At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <[hidden email]> wrote in <[hidden email]>

> Hi Alvaro, Anthony, Julien and Robert,
>
> On 2019/07/09 3:47, Julien Rouhaud wrote:
> > On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <[hidden email]>
> > wrote:
> >>
> >> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
> >> <[hidden email]> wrote:
> >>> Yeah, I got the impression that that was determined to be the
> >>> desirable
> >>> behavior, so I made it do that, but I'm not really happy about it
> >>> either.  We're not too late to change the CREATE INDEX behavior, but
> >>> let's discuss what is it that we want.
> >>
> >> I don't think I intended to make any such determination -- which
> >> commit do you think established this as the canonical behavior?
> >>
> >> I propose that once a field is set, we should leave it set until the
> >> end.
> > +1
> > Note that this patch is already behaving like that if the table only
> > contains dead rows.

+1 from me.

> I fixed the patch including:
>
>   - Replace "if" to "else if". (Suggested by Julien)
>   - Fix typo s/ech/each/. (Suggested by Anthony)
>   - Add Phase "analyzing complete" in the pgstat view. (Suggested by
>   - Julien, Robert and me)
>     It was overlooked to add it in system_views.sql.
>
> I share my re-test result, see below:
>
> ---------------------------------------------------------
> [Session #1]
> create table hoge as select * from generate_series(1, 1000000) a;
> analyze verbose hoge;
>
> [Session #2]
> \a \t
> select * from pg_stat_progress_analyze; \watch 0.001
>
> 3785|13599|postgres|16384|f|16384|scanning table|4425|6
> 3785|13599|postgres|16384|f|16384|scanning table|4425|31
> 3785|13599|postgres|16384|f|16384|scanning table|4425|70
> 3785|13599|postgres|16384|f|16384|scanning table|4425|109
> ...
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|analyzing sample|0|0
> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> fixed. :)
> ---------------------------------------------------------

I have some comments.

+ const int   index[] = {
+ PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_BLOCKS_DONE
+ };
+ const int64 val[] = {
+ PROGRESS_ANALYZE_PHASE_ANALYSIS,
+ 0, 0

Do you oppose to leaving the total/done blocks alone here:p?



+  BlockNumber  nblocks;
+  double    blksdone = 0;

Why is it a double even though blksdone is of the same domain
with nblocks? And finally:

+    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+                   ++blksdone);

It is converted into int64.



+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'

I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:

+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'



+                      WHEN 4 THEN 'analyzing complete'

And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:

+                      WHEN 4 THEN 'completing analyze'
+                      WHEN 4 THEN 'finalizing analyze'


+     <entry>Process ID of backend.</entry>

of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.

+      <entry>OID of the database to which this backend is connected.</entry>
+      <entry>Name of the database to which this backend is connected.</entry>

"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)


+      <entry>Whether the current scan includes legacy inheritance children.</entry>

This apparently excludes partition tables but actually it is
included.

  "Whether scanning through child tables" ?

I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..


+       The table being scanned (differs from <literal>relid</literal>
+       only when processing partitions or inheritance children).

Is <literal> needed? And I think the parentheses are not needed.

  OID of the table currently being scanned. Can differ from relid
  when analyzing tables that have child tables.


+       Total number of heap blocks to scan in the current table.

   Number of heap blocks on scanning_table to scan?

It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".


+       The command is currently scanning the table.

"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.


+       <command>ANALYZE</command> is currently extracting statistical data
+       from the sample obtained.

Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.


regards.

-
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Tatsuro Yamada-2
Hi Horiguchi-san!


On 2019/07/11 19:56, Kyotaro Horiguchi wrote:

> Hello.
>
> At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <[hidden email]> wrote in <[hidden email]>
>> Hi Alvaro, Anthony, Julien and Robert,
>>
>> On 2019/07/09 3:47, Julien Rouhaud wrote:
>>> On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <[hidden email]>
>>> wrote:
>>>>
>>>> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
>>>> <[hidden email]> wrote:
>>>>> Yeah, I got the impression that that was determined to be the
>>>>> desirable
>>>>> behavior, so I made it do that, but I'm not really happy about it
>>>>> either.  We're not too late to change the CREATE INDEX behavior, but
>>>>> let's discuss what is it that we want.
>>>>
>>>> I don't think I intended to make any such determination -- which
>>>> commit do you think established this as the canonical behavior?
>>>>
>>>> I propose that once a field is set, we should leave it set until the
>>>> end.
>>> +1
>>> Note that this patch is already behaving like that if the table only
>>> contains dead rows.
>
> +1 from me.
>
>> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
>> fixed. :)
>> ---------------------------------------------------------
>
> I have some comments.
>
> + const int   index[] = {
> + PROGRESS_ANALYZE_PHASE,
> + PROGRESS_ANALYZE_TOTAL_BLOCKS,
> + PROGRESS_ANALYZE_BLOCKS_DONE
> + };
> + const int64 val[] = {
> + PROGRESS_ANALYZE_PHASE_ANALYSIS,
> + 0, 0
>
> Do you oppose to leaving the total/done blocks alone here:p?

Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)


Ah, I revised it to remove "0, 0".



> +  BlockNumber  nblocks;
> +  double    blksdone = 0;
>
> Why is it a double even though blksdone is of the same domain
> with nblocks? And finally:
>
> +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> +                   ++blksdone);
>
> It is converted into int64.

Fixed.
But is it suitable to use BlockNumber instead int64?


 

> +                      WHEN 2 THEN 'analyzing sample'
> +                      WHEN 3 THEN 'analyzing sample (extended stats)'
>
> I think we should avoid parenthesized phrases as far as
> not-necessary. That makes the column unnecessarily long. The
> phase is internally called as "compute stats" so *I* would prefer
> something like the followings:
>
> +                      WHEN 2 THEN 'computing stats'
> +                      WHEN 3 THEN 'computing extended stats'
>
>
>
> +                      WHEN 4 THEN 'analyzing complete'
>
> And if you are intending by this that (as mentioned in the
> documentation) "working to complete this analyze", this would be:
>
> +                      WHEN 4 THEN 'completing analyze'
> +                      WHEN 4 THEN 'finalizing analyze'

I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

   WHEN 2 THEN 'computing stats'
   WHEN 3 THEN 'computing extended stats'
   WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.



> +     <entry>Process ID of backend.</entry>
>
> of "the" backend. ? And period is not attached on all
> descriptions consisting of a single sentence.
>
> +      <entry>OID of the database to which this backend is connected.</entry>
> +      <entry>Name of the database to which this backend is connected.</entry>
>
> "database to which .. is connected" is phrased as "database this
> backend is connected to" in pg_stat_acttivity. (Just for consistency)

I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)



> +      <entry>Whether the current scan includes legacy inheritance children.</entry>
>
> This apparently excludes partition tables but actually it is
> included.
>
>    "Whether scanning through child tables" ?
>
> I'm not sure "child tables" is established as the word to mean
> both "partition tables and inheritance children"..


Hmm... I'm also not sure but I fixed as you suggested.



> +       The table being scanned (differs from <literal>relid</literal>
> +       only when processing partitions or inheritance children).
>
> Is <literal> needed? And I think the parentheses are not needed.
>
>    OID of the table currently being scanned. Can differ from relid
>    when analyzing tables that have child tables.


How about:
   OID of the table currently being scanned.
   It might be different from relid when analyzing tables that have child tables.



> +       Total number of heap blocks to scan in the current table.
>
>     Number of heap blocks on scanning_table to scan?
>
> It might be better that this description describes that this
> and the next column is meaningful only while the phase
> "scanning table".


How about:
   Total number of heap blocks in the scanning_table.




> +       The command is currently scanning the table.
>
> "sample(s)" comes somewhat abruptly in the next item. Something
> like "The command is currently scanning the table
> <structfield>scanning_table</structfield> to obtain samples"
> might be better.


Fixed.

 
> +       <command>ANALYZE</command> is currently extracting statistical data
> +       from the sample obtained.
>
> Something like "The command is computing stats from the samples
> obtained in the previous phase" might be better.


Fixed.


Please find attached patch. :)

Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;

9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================


Thanks,
Tatsuro Yamada


v3-0001-Report-progress-for-ANALYZE.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Kyotaro Horiguchi-4
Hello.

# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..

At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <[hidden email]> wrote in <[hidden email]>

> >> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> >> fixed. :)
> >> ---------------------------------------------------------
> >
> > I have some comments.
> > + const int   index[] = {
> > + PROGRESS_ANALYZE_PHASE,
> > + PROGRESS_ANALYZE_TOTAL_BLOCKS,
> > + PROGRESS_ANALYZE_BLOCKS_DONE
> > + };
> > + const int64 val[] = {
> > + PROGRESS_ANALYZE_PHASE_ANALYSIS,
> > + 0, 0
> > Do you oppose to leaving the total/done blocks alone here:p?
>
>
> Thanks for your comments!
> I created a new patch based on your comments because Alvaro allowed me
> to work on revising the patch. :)
>
>
> Ah, I revised it to remove "0, 0".

Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.

> > +  BlockNumber  nblocks;
> > +  double    blksdone = 0;
> > Why is it a double even though blksdone is of the same domain
> > with nblocks? And finally:
> > +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> > +                   ++blksdone);
> > It is converted into int64.
>
>
> Fixed.
> But is it suitable to use BlockNumber instead int64?

Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.

> > +                      WHEN 2 THEN 'analyzing sample'
> > +                      WHEN 3 THEN 'analyzing sample (extended stats)'
> > I think we should avoid parenthesized phrases as far as
> > not-necessary. That makes the column unnecessarily long. The
> > phase is internally called as "compute stats" so *I* would prefer
> > something like the followings:
> > +                      WHEN 2 THEN 'computing stats'
> > +                      WHEN 3 THEN 'computing extended stats'
> > +                      WHEN 4 THEN 'analyzing complete'
> > And if you are intending by this that (as mentioned in the
> > documentation) "working to complete this analyze", this would be:
> > +                      WHEN 4 THEN 'completing analyze'
> > +                      WHEN 4 THEN 'finalizing analyze'
>
>
> I have no strong opinion, so I changed the phase-names based on
> your suggestions like following:
>
>   WHEN 2 THEN 'computing stats'
>   WHEN 3 THEN 'computing extended stats'
>   WHEN 4 THEN 'finalizing analyze'
>
> However, I'd like to get any comments from hackers to get a consensus
> about the names.

Agreed. Especially such word choosing is not suitable for me..

> > +     <entry>Process ID of backend.</entry>
> > of "the" backend. ? And period is not attached on all
> > descriptions consisting of a single sentence.
> >
> > + <entry>OID of the database to which this backend is
> > connected.</entry>
> > + <entry>Name of the database to which this backend is
> > connected.</entry>
> > "database to which .. is connected" is phrased as "database this
> > backend is connected to" in pg_stat_acttivity. (Just for consistency)
>
>
> I checked the sentences on other views of progress monitor (VACUUM,
> CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
> I'd like to keep it as is. :)

Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.

> > + <entry>Whether the current scan includes legacy inheritance
> > children.</entry>
> > This apparently excludes partition tables but actually it is
> > included.
> >
> >    "Whether scanning through child tables" ?
> > I'm not sure "child tables" is established as the word to mean
> > both "partition tables and inheritance children"..
>
>
> Hmm... I'm also not sure but I fixed as you suggested.

Yeah, I also am not sure the suggestion is good enough as is..


> > +       The table being scanned (differs from <literal>relid</literal>
> > +       only when processing partitions or inheritance children).
> > Is <literal> needed? And I think the parentheses are not needed.
> >    OID of the table currently being scanned. Can differ from relid
> >    when analyzing tables that have child tables.
>
>
> How about:
>   OID of the table currently being scanned.
>   It might be different from relid when analyzing tables that have child
>   tables.
>
>
>
> > +       Total number of heap blocks to scan in the current table.
> >     Number of heap blocks on scanning_table to scan?
> > It might be better that this description describes that this
> > and the next column is meaningful only while the phase
> > "scanning table".
>
>
> How about:
>   Total number of heap blocks in the scanning_table.

(For me, ) that seems like it shows blocks including all
descendents for inheritance parent. But I'm not sure..a

> > +       The command is currently scanning the table.
> > "sample(s)" comes somewhat abruptly in the next item. Something
> > like "The command is currently scanning the table
> > <structfield>scanning_table</structfield> to obtain samples"
> > might be better.
>
>
> Fixed.
>
>  
> > + <command>ANALYZE</command> is currently extracting statistical data
> > +       from the sample obtained.
> > Something like "The command is computing stats from the samples
> > obtained in the previous phase" might be better.
>
>
> Fixed.
>
>
> Please find attached patch. :)
>
> Here is a test result of the patch.
> ==========================================================
> # select * from pg_stat_progress_analyze ; \watch 0.0001;
>
> 9067|13599|postgres|16387|f|16387|scanning table|443|14
> 9067|13599|postgres|16387|f|16387|scanning table|443|44
> 9067|13599|postgres|16387|f|16387|scanning table|443|76
> 9067|13599|postgres|16387|f|16387|scanning table|443|100
> ...
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
> ==========================================================

Looks fine!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Tatsuro Yamada-2
Hi Horiguchi-san, Alvaro, Anthony, Julien and Robert,


On 2019/07/22 17:30, Kyotaro Horiguchi wrote:

> Hello.
>
> # It's very good timing, as you came in while I have a time after
> # finishing a quite nerve-wrackig task..
>
> At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <[hidden email]> wrote in <[hidden email]>
>>>> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
>>>> fixed. :)
>>>> ---------------------------------------------------------
>>>
>>> I have some comments.
>>> + const int   index[] = {
>>> + PROGRESS_ANALYZE_PHASE,
>>> + PROGRESS_ANALYZE_TOTAL_BLOCKS,
>>> + PROGRESS_ANALYZE_BLOCKS_DONE
>>> + };
>>> + const int64 val[] = {
>>> + PROGRESS_ANALYZE_PHASE_ANALYSIS,
>>> + 0, 0
>>> Do you oppose to leaving the total/done blocks alone here:p?
>>
>>
>> Thanks for your comments!
>> I created a new patch based on your comments because Alvaro allowed me
>> to work on revising the patch. :)
>>
>>
>> Ah, I revised it to remove "0, 0".
>
> Thanks. For a second I thought that
> PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
> renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.

"PROGRESS_ANALYZE_PHASE_ANALYSIS" was replaced with
"PROGRESS_ANALYZE_PHASE_COMPUTING" because I changed
the phase-name on v3.patch like this:

./src/include/commands/progress.h

+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE              1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING                       2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE                        4

Is it Okay?

 

>>> +  BlockNumber  nblocks;
>>> +  double    blksdone = 0;
>>> Why is it a double even though blksdone is of the same domain
>>> with nblocks? And finally:
>>> +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
>>> +                   ++blksdone);
>>> It is converted into int64.
>>
>>
>> Fixed.
>> But is it suitable to use BlockNumber instead int64?
>
> Yeah, I didn't meant that we should use int64 there. Sorry for
> the confusing comment. I meant that blksdone should be of
> BlockNumber.

Fixed. Thanks for the clarification. :)
Attached v4 patch file only includes this fix.
 

>>> +                      WHEN 2 THEN 'analyzing sample'
>>> +                      WHEN 3 THEN 'analyzing sample (extended stats)'
>>> I think we should avoid parenthesized phrases as far as
>>> not-necessary. That makes the column unnecessarily long. The
>>> phase is internally called as "compute stats" so *I* would prefer
>>> something like the followings:
>>> +                      WHEN 2 THEN 'computing stats'
>>> +                      WHEN 3 THEN 'computing extended stats'
>>> +                      WHEN 4 THEN 'analyzing complete'
>>> And if you are intending by this that (as mentioned in the
>>> documentation) "working to complete this analyze", this would be:
>>> +                      WHEN 4 THEN 'completing analyze'
>>> +                      WHEN 4 THEN 'finalizing analyze'
>>
>>
>> I have no strong opinion, so I changed the phase-names based on
>> your suggestions like following:
>>
>>    WHEN 2 THEN 'computing stats'
>>    WHEN 3 THEN 'computing extended stats'
>>    WHEN 4 THEN 'finalizing analyze'
>>
>> However, I'd like to get any comments from hackers to get a consensus
>> about the names.
>
> Agreed. Especially such word choosing is not suitable for me..

To Alvaro, Julien, Anthony and Robert,
Do you have any ideas? :)


 

>>> +     <entry>Process ID of backend.</entry>
>>> of "the" backend. ? And period is not attached on all
>>> descriptions consisting of a single sentence.
>>>
>>> + <entry>OID of the database to which this backend is
>>> connected.</entry>
>>> + <entry>Name of the database to which this backend is
>>> connected.</entry>
>>> "database to which .. is connected" is phrased as "database this
>>> backend is connected to" in pg_stat_acttivity. (Just for consistency)
>>
>>
>> I checked the sentences on other views of progress monitor (VACUUM,
>> CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
>> I'd like to keep it as is. :)
>
> Oh, I see from where the wordings came. But no periods seen after
> sentense when it is the only one in a description in other system
> views tables. I think the progress views tables should be
> corrected following convention.

Sounds reasonable. However, I'd like to create another patch after
this feature was committed since that document fix influence other
progress monitor's description on the document.

 

>>> + <entry>Whether the current scan includes legacy inheritance
>>> children.</entry>
>>> This apparently excludes partition tables but actually it is
>>> included.
>>>
>>>     "Whether scanning through child tables" ?
>>> I'm not sure "child tables" is established as the word to mean
>>> both "partition tables and inheritance children"..
>>
>>
>> Hmm... I'm also not sure but I fixed as you suggested.
>
> Yeah, I also am not sure the suggestion is good enough as is..
>
>>> +       Total number of heap blocks to scan in the current table.
>>>      Number of heap blocks on scanning_table to scan?
>>> It might be better that this description describes that this
>>> and the next column is meaningful only while the phase
>>> "scanning table".
>>
>>
>> How about:
>>    Total number of heap blocks in the scanning_table.
>
> (For me, ) that seems like it shows blocks including all
> descendents for inheritance parent. But I'm not sure..a

In the case of scanning_table is parent table, it doesn't
show the number. However, child tables shows the number.
I tested it using Declarative partitioning table, See the bottom
of this email. :)


>> Please find attached patch. :)
>>
>> Here is a test result of the patch.
>> ==========================================================
>> # select * from pg_stat_progress_analyze ; \watch 0.0001;
>>
>> 9067|13599|postgres|16387|f|16387|scanning table|443|14
>> ...
>> 9067|13599|postgres|16387|f|16387|scanning table|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
>> ==========================================================
>
> Looks fine!

I shared a test result using Declarative partitioning table.

==========================================================
## Create partitioning table
create table hoge as select a from generate_series(0, 40000) a;

create table hoge2(
   a integer
) partition by range(a);

create table hoge2_10000 partition of hoge2
for values from (0) to (10000);

create table hoge2_20000 partition of hoge2
for values from (10000) to (20000);

create table hoge2_30000 partition of hoge2
for values from (20000) to (30000);

create table hoge2_default partition of hoge2 default;


## Test
select oid,relname,relpages,reltuples from pg_class where relname like 'hoge%';

   oid  |    relname    | relpages | reltuples
-------+---------------+----------+-----------
  16538 | hoge          |      177 |     40001
  16541 | hoge2         |        0 |         0
  16544 | hoge2_10000   |       45 |     10000
  16547 | hoge2_20000   |       45 |     10000
  16550 | hoge2_30000   |       45 |     10000
  16553 | hoge2_default |       45 |     10001
(6 rows)

select * from pg_stat_progress_analyze ; \watch 0.00001;

27579|13599|postgres|16541|t|16544|scanning table|45|17
27579|13599|postgres|16541|t|16544|scanning table|45|38
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45

27579|13599|postgres|16541|t|16547|scanning table|45|17
27579|13599|postgres|16541|t|16547|scanning table|45|37
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45

27579|13599|postgres|16541|t|16550|scanning table|45|9
27579|13599|postgres|16541|t|16550|scanning table|45|30
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45

27579|13599|postgres|16541|t|16553|scanning table|45|5
27579|13599|postgres|16541|t|16553|scanning table|45|26
27579|13599|postgres|16541|t|16553|scanning table|45|42
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|finalizing analyze|45|45

27579|13599|postgres|16544|f|16544|scanning table|45|1
27579|13599|postgres|16544|f|16544|scanning table|45|30
27579|13599|postgres|16544|f|16544|computing stats|45|45

27579|13599|postgres|16547|f|16547|scanning table|45|25
27579|13599|postgres|16547|f|16547|computing stats|45|45

27579|13599|postgres|16550|f|16550|scanning table|45|11
27579|13599|postgres|16550|f|16550|scanning table|45|38
27579|13599|postgres|16550|f|16550|finalizing analyze|45|45

27579|13599|postgres|16553|f|16553|scanning table|45|25
27579|13599|postgres|16553|f|16553|computing stats|45|45
==========================================================

I'll share test result using partitioning table via
Inheritance tables on next email. :)

Thanks,
Tatsuro Yamada


v4-0001-Report-progress-for-ANALYZE.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Thomas Munro-5
On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
<[hidden email]> wrote:
> Attached v4 patch file only includes this fix.

Hello all,

I've moved this to the September CF, where it is in "Needs review" state.

Thanks,

--
Thomas Munro
https://enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: progress report for ANALYZE

Robert Haas
On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro <[hidden email]> wrote:
> On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
> <[hidden email]> wrote:
> > Attached v4 patch file only includes this fix.
>
> I've moved this to the September CF, where it is in "Needs review" state.

/me reviews.

+      <entry><structfield>scanning_table</structfield></entry>

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.

+       The command is computing extended stats from the samples
obtained in the previous phase.

I think you should change this (and the previous one) to say "from the
samples obtained during the table scan."

+       Total number of heap blocks in the scanning_table.

Perhaps I'm confused, but it looks to me like what you are advertising
is the number of blocks that will be sampled, not the total number of
blocks in the table.  I think that's the right thing to advertise, but
then the column should be named and documented that way.

+ {
+ const int   index[] = {
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_SCANREL
+ };
+ const int64 val[] = {
+ nblocks,
+ RelationGetRelid(onerel)
+ };
+
+ pgstat_progress_update_multi_param(2, index, val);
+ }

This block seems to be introduced just so you can declare variables; I
don't think that's good style. It's arguably unnecessary because we
now are selectively allowing variable declarations within functions,
but I think you should just move the first array to the top of the
function and the second declaration to the top of the function
dropping const, and then just do val[0] = nblocks and val[1] =
RelationGetRelid(onerel).  Maybe you can also come up with better
names than 'index' and 'val'.  Same comment applies to another place
where you have something similar.

Patch seems to need minor rebasing.

Maybe "scanning table" should be renamed "acquiring sample rows," to
match the names used in the code?

I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count.  That creates a race during which users could see the first bit
of data set and the second not set.  I don't see any reason not to set
all four fields together.

Please be sure to make the names of the constants you use match up
with the external names as far as it reasonably makes sense, e.g.

+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE       1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING            2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE         4

vs.

+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text

Not terrible, but it could be closer.

Similarly with the column names (include_children vs. INH).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


12