CLUSTER command progress monitor

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
76 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

CLUSTER command progress monitor

Tatsuro Yamada
Hi,

Following is a proposal for reporting the progress of CLUSTER command:

It seems that the following could be the phases of CLUSTER processing:

  1. scanning heap
  2. sort tuples
  3. writing new heap
  4. scan heap and write new heap
  5. swapping relation files
  6. rebuild index
  7. performing final cleanup

These phases are based on Rahila's presentation at PGCon 2017
 (https://www.pgcon.org/2017/schedule/attachments/472_Progress%20Measurement%20PostgreSQL.pdf)
and I added some phases on it.

CLUSTER command may use Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

  * Seq Scan
    1. scanning heap
    2. sort tuples
    3. writing new heap
    5. swapping relation files
    6. rebuild index
    7. performing final cleanup

  * Index Scan
    4. scan heap and write new heap
    5. swapping relation files
    6. rebuild index
    7. performing final cleanup

The view provides the information of CLUSTER command progress details as follows
postgres=# \d pg_stat_progress_cluster
           View "pg_catalog.pg_stat_progress_cluster"
       Column        |  Type   | Collation | Nullable | Default
---------------------+---------+-----------+----------+---------
 pid                 | integer |           |          |
 datid               | oid     |           |          |
 datname             | name    |           |          |
 relid               | oid     |           |          |
 phase               | text    |           |          |
 scan_method         | text    |           |          |
 scan_index_relid    | bigint  |           |          |
 heap_tuples_total   | bigint  |           |          |
 heap_tuples_scanned | bigint  |           |          |


Then I have questions.

  * Should we have separate views for them?  Or should both be covered by the
    same view with some indication of which command (CLUSTER or VACUUM FULL)
    is actually running?
    I mean this progress monitor could be covering not only CLUSTER command but also
    VACUUM FULL command.

  * I chose tuples as scan heap's counter (heap_tuples_scanned) since it's not
    easy to get current blocks from Index Scan. Is it Ok?


I'll add this patch to CF2017-09.
Any comments or suggestion are welcome.

Regards,
Tatsuro Yamada
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

progress_monitor_cluster_v1.patch (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Thomas Munro-3
On Thu, Aug 31, 2017 at 2:12 PM, Tatsuro Yamada
<[hidden email]> wrote:
> Any comments or suggestion are welcome.

Although this patch updates src/test/regress/expected/rules.out I
think perhaps you included the wrong version?  That regression test
fails for me

--
Thomas Munro
http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Tatsuro Yamada
Hi Thomas,

>> Any comments or suggestion are welcome.
>
> Although this patch updates src/test/regress/expected/rules.out I
> think perhaps you included the wrong version?  That regression test
> fails for me

Thanks for the comment.

I use the patch on 7b69b6ce and it's fine.
Did you use "initdb" command after "make install"?
The pg_stat_progress_cluster view is created in initdb, probably.

Regards,
Tatsuro Yamada



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Masahiko Sawada
On Fri, Sep 1, 2017 at 3:38 PM, Tatsuro Yamada
<[hidden email]> wrote:

> Hi Thomas,
>
>>> Any comments or suggestion are welcome.
>>
>>
>> Although this patch updates src/test/regress/expected/rules.out I
>> think perhaps you included the wrong version?  That regression test
>> fails for me
>
>
> Thanks for the comment.
>
> I use the patch on 7b69b6ce and it's fine.
> Did you use "initdb" command after "make install"?
> The pg_stat_progress_cluster view is created in initdb, probably.
>

I also got a regression test error (applied to abe85ef). Here is
regression.diff file.

*** /home/masahiko/source/postgresql/src/test/regress/expected/rules.out
       2017-09-01 17:27:33.680055612 -0700
--- /home/masahiko/source/postgresql/src/test/regress/results/rules.out
2017-09-01 17:28:10.410055596 -0700
***************
*** 1819,1824 ****
--- 1819,1849 ----
      pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
      pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
     FROM pg_database d;
+ pg_stat_progress_cluster| SELECT s.pid,
+     s.datid,
+     d.datname,
+     s.relid,
+         CASE s.param1
+             WHEN 0 THEN 'initializing'::text
+             WHEN 1 THEN 'scanning heap'::text
+             WHEN 2 THEN 'sorting tuples'::text
+             WHEN 3 THEN 'writing new heap'::text
+             WHEN 4 THEN 'scan heap and write new heap'::text
+             WHEN 5 THEN 'swapping relation files'::text
+             WHEN 6 THEN 'rebuilding index'::text
+             WHEN 7 THEN 'performing final cleanup'::text
+             ELSE NULL::text
+         END AS phase,
+         CASE s.param2
+             WHEN 0 THEN 'index scan'::text
+             WHEN 1 THEN 'seq scan'::text
+             ELSE NULL::text
+         END AS scan_method,
+     s.param3 AS scan_index_relid,
+     s.param4 AS heap_tuples_total,
+     s.param5 AS heap_tuples_scanned
+    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
relid, param1, param2, param3, param4, param5, param6, param7, param8,
param9, param10)
+      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
  pg_stat_progress_vacuum| SELECT s.pid,
      s.datid,
      d.datname,
***************
*** 1841,1871 ****
      s.param7 AS num_dead_tuples
     FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid,
relid, param1, param2, param3, param4, param5, param6, param7, param8,
param9, param10)
       LEFT JOIN pg_database d ON ((s.datid = d.oid)));
- pg_stat_progress_cluster| SELECT
-     s.pid,
-     s.datid,
-     d.datname,
-     s.relid,
-         CASE s.param1
-             WHEN 0 THEN 'initializing'::text
-             WHEN 1 THEN 'scanning heap'::text
-             WHEN 2 THEN 'sorting tuples'::text
-             WHEN 3 THEN 'writing new heap'::text
-             WHEN 4 THEN 'scan heap and write new heap'::text
-             WHEN 5 THEN 'swapping relation files'::text
-             WHEN 6 THEN 'rebuilding index'::text
-             WHEN 7 THEN 'performing final cleanup'::text
-             ELSE NULL::text
-         END AS phase,
-         CASE S.param2
-             WHEN 0 THEN 'index scan'
-             WHEN 1 THEN 'seq scan'
-             END AS scan_method,
-     s.param3 AS index_relid,
-     s.param4 AS heap_blks_total,
-     s.param5 AS heap_blks_scanned
-    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
relid, param1, param2, param3, param4, param5)
-      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
  pg_stat_replication| SELECT s.pid,
      s.usesysid,
      u.rolname AS usename,
--- 1866,1871 ----

======================================================================

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Tatsuro Yamada
Hi Sawada-san, Thomas,

Thanks for sharing the reggression.diff.
I realized Thomas's comment is right.

Attached patch is fixed version.
Could you try it?

Regards,

Tatsuro Yamada
NTT Open Source Software Center

On 2017/09/01 17:59, Masahiko Sawada wrote:

> On Fri, Sep 1, 2017 at 3:38 PM, Tatsuro Yamada
> <[hidden email]> wrote:
>> Hi Thomas,
>>
>>>> Any comments or suggestion are welcome.
>>>
>>>
>>> Although this patch updates src/test/regress/expected/rules.out I
>>> think perhaps you included the wrong version?  That regression test
>>> fails for me
>>
>>
>> Thanks for the comment.
>>
>> I use the patch on 7b69b6ce and it's fine.
>> Did you use "initdb" command after "make install"?
>> The pg_stat_progress_cluster view is created in initdb, probably.
>>
>
> I also got a regression test error (applied to abe85ef). Here is
> regression.diff file.
>
> *** /home/masahiko/source/postgresql/src/test/regress/expected/rules.out
>         2017-09-01 17:27:33.680055612 -0700
> --- /home/masahiko/source/postgresql/src/test/regress/results/rules.out
> 2017-09-01 17:28:10.410055596 -0700
> ***************
> *** 1819,1824 ****
> --- 1819,1849 ----
>        pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
>        pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
>       FROM pg_database d;
> + pg_stat_progress_cluster| SELECT s.pid,
> +     s.datid,
> +     d.datname,
> +     s.relid,
> +         CASE s.param1
> +             WHEN 0 THEN 'initializing'::text
> +             WHEN 1 THEN 'scanning heap'::text
> +             WHEN 2 THEN 'sorting tuples'::text
> +             WHEN 3 THEN 'writing new heap'::text
> +             WHEN 4 THEN 'scan heap and write new heap'::text
> +             WHEN 5 THEN 'swapping relation files'::text
> +             WHEN 6 THEN 'rebuilding index'::text
> +             WHEN 7 THEN 'performing final cleanup'::text
> +             ELSE NULL::text
> +         END AS phase,
> +         CASE s.param2
> +             WHEN 0 THEN 'index scan'::text
> +             WHEN 1 THEN 'seq scan'::text
> +             ELSE NULL::text
> +         END AS scan_method,
> +     s.param3 AS scan_index_relid,
> +     s.param4 AS heap_tuples_total,
> +     s.param5 AS heap_tuples_scanned
> +    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
> relid, param1, param2, param3, param4, param5, param6, param7, param8,
> param9, param10)
> +      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
>    pg_stat_progress_vacuum| SELECT s.pid,
>        s.datid,
>        d.datname,
> ***************
> *** 1841,1871 ****
>        s.param7 AS num_dead_tuples
>       FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid,
> relid, param1, param2, param3, param4, param5, param6, param7, param8,
> param9, param10)
>         LEFT JOIN pg_database d ON ((s.datid = d.oid)));
> - pg_stat_progress_cluster| SELECT
> -     s.pid,
> -     s.datid,
> -     d.datname,
> -     s.relid,
> -         CASE s.param1
> -             WHEN 0 THEN 'initializing'::text
> -             WHEN 1 THEN 'scanning heap'::text
> -             WHEN 2 THEN 'sorting tuples'::text
> -             WHEN 3 THEN 'writing new heap'::text
> -             WHEN 4 THEN 'scan heap and write new heap'::text
> -             WHEN 5 THEN 'swapping relation files'::text
> -             WHEN 6 THEN 'rebuilding index'::text
> -             WHEN 7 THEN 'performing final cleanup'::text
> -             ELSE NULL::text
> -         END AS phase,
> -         CASE S.param2
> -             WHEN 0 THEN 'index scan'
> -             WHEN 1 THEN 'seq scan'
> -             END AS scan_method,
> -     s.param3 AS index_relid,
> -     s.param4 AS heap_blks_total,
> -     s.param5 AS heap_blks_scanned
> -    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
> relid, param1, param2, param3, param4, param5)
> -      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
>    pg_stat_replication| SELECT s.pid,
>        s.usesysid,
>        u.rolname AS usename,
> --- 1866,1871 ----
>
> ======================================================================
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

progress_monitor_cluster_v2.patch (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Masahiko Sawada
On Mon, Sep 4, 2017 at 11:37 AM, Tatsuro Yamada
<[hidden email]> wrote:
> Hi Sawada-san, Thomas,
>
> Thanks for sharing the reggression.diff.
> I realized Thomas's comment is right.
>
> Attached patch is fixed version.
> Could you try it?
>

Yeah, in my environment the regression test passed. Thanks.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Tatsuro Yamada
Hi Sawada-san,

Thanks for taking your time.
I'll be more careful.

Regards,
Tatsuro Yamada

On 2017/09/04 11:51, Masahiko Sawada wrote:

> On Mon, Sep 4, 2017 at 11:37 AM, Tatsuro Yamada
> <[hidden email]> wrote:
>> Hi Sawada-san, Thomas,
>>
>> Thanks for sharing the reggression.diff.
>> I realized Thomas's comment is right.
>>
>> Attached patch is fixed version.
>> Could you try it?
>>
>
> Yeah, in my environment the regression test passed. Thanks.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>
>




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Michael Paquier
In reply to this post by Tatsuro Yamada
On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada
<[hidden email]> wrote:
> Then I have questions.
>
>   * Should we have separate views for them?  Or should both be covered by the
>     same view with some indication of which command (CLUSTER or VACUUM FULL)
>     is actually running?

Using the same view for both, and tell that this is rather VACUUM or
CLUSTER in the view, would be better IMO. Coming up with a name more
generic than pg_stat_progress_cluster may be better though if this
speaks with VACUUM FULL as well, user-facing documentation does not
say that VACUUM FULL is actually CLUSTER.

> I'll add this patch to CF2017-09.
> Any comments or suggestion are welcome.

Nice to see that you are taking the time to implement patches for
upstream, Yamada-san!
--
Michael


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Tatsuro Yamada
On 2017/09/04 15:38, Michael Paquier wrote:

> On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada
> <[hidden email]> wrote:
>> Then I have questions.
>>
>>    * Should we have separate views for them?  Or should both be covered by the
>>      same view with some indication of which command (CLUSTER or VACUUM FULL)
>>      is actually running?
>
> Using the same view for both, and tell that this is rather VACUUM or
> CLUSTER in the view, would be better IMO. Coming up with a name more
> generic than pg_stat_progress_cluster may be better though if this
> speaks with VACUUM FULL as well, user-facing documentation does not
> say that VACUUM FULL is actually CLUSTER.

Thanks for sharing your thoughts.
Agreed.
I'll add new column like a "command" to tell whether running CLUSTER or VACUUM.
And how about this new view name?
  - pg_stat_progress_reorg
Is it more general name than previous name if it covers both commands?


>> I'll add this patch to CF2017-09.
>> Any comments or suggestion are welcome.
>
> Nice to see that you are taking the time to implement patches for
> upstream, Yamada-san!

Same here. :)
I'd like to contribute creating feature that is for DBA and users.
Progress monitoring feature is important from my DBA experiences.
I'm happy if you lend your hand.

Thanks,
Tatsuro Yamada




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Tatsuro Yamada
Hi Hackers,

I revised the patch like this:

   - Add "command" column in the view
     It tells that the running command is CLUSTER or VACUUM FULL.

   - Enable VACUUM FULL progress monitor
     Add heap_tuples_vacuumed and heap_tuples_recently_dead as a counter in the view.
     Sequence of phases are below:
       1. scanning heap
       5. swapping relation files
       6. rebuild index
       7. performing final cleanup

I didn't change the name of view (pg_stat_progress_cluster) because I'm not sure
whether the new name (pg_stat_progress_reorg) is suitable or not.

Any comments or suggestion are welcome.

Thanks,
Tatsuro Yamada


On 2017/09/04 20:17, Tatsuro Yamada wrote:

> On 2017/09/04 15:38, Michael Paquier wrote:
>> On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada
>> <[hidden email]> wrote:
>>> Then I have questions.
>>>
>>>    * Should we have separate views for them?  Or should both be covered by the
>>>      same view with some indication of which command (CLUSTER or VACUUM FULL)
>>>      is actually running?
>>
>> Using the same view for both, and tell that this is rather VACUUM or
>> CLUSTER in the view, would be better IMO. Coming up with a name more
>> generic than pg_stat_progress_cluster may be better though if this
>> speaks with VACUUM FULL as well, user-facing documentation does not
>> say that VACUUM FULL is actually CLUSTER.
>
> Thanks for sharing your thoughts.
> Agreed.
> I'll add new column like a "command" to tell whether running CLUSTER or VACUUM.
> And how about this new view name?
>   - pg_stat_progress_reorg
> Is it more general name than previous name if it covers both commands?
>
>
>>> I'll add this patch to CF2017-09.
>>> Any comments or suggestion are welcome.
>>
>> Nice to see that you are taking the time to implement patches for
>> upstream, Yamada-san!
>
> Same here. :)
> I'd like to contribute creating feature that is for DBA and users.
> Progress monitoring feature is important from my DBA experiences.
> I'm happy if you lend your hand.
>
> Thanks,
> Tatsuro Yamada


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

progress_monitor_cluster_v3.patch (22K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Michael Paquier
On Wed, Sep 6, 2017 at 3:58 PM, Tatsuro Yamada
<[hidden email]> wrote:
> I revised the patch like this:

You should avoid top-posting.

> I didn't change the name of view (pg_stat_progress_cluster) because I'm not
> sure
> whether the new name (pg_stat_progress_reorg) is suitable or not.

Here are some ideas: rewrite (incorrect for ALTER TABLE), organize
(somewhat fine), order, operate (too general?), bundle, reform,
assemble.
--
Michael


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Tatsuro Yamada
On 2017/09/06 16:11, Michael Paquier wrote:
> On Wed, Sep 6, 2017 at 3:58 PM, Tatsuro Yamada
> <[hidden email]> wrote:
>> I revised the patch like this:
>
> You should avoid top-posting.

I see.

>> I didn't change the name of view (pg_stat_progress_cluster) because I'm not
>> sure
>> whether the new name (pg_stat_progress_reorg) is suitable or not.
>
> Here are some ideas: rewrite (incorrect for ALTER TABLE), organize
> (somewhat fine), order, operate (too general?), bundle, reform,
> assemble.

Thanks for sharing your ideas.
I searched the words like a "reform table", "reassemble table" and "reorganize table"
on google. I realized "reorganaize table" is good choice than others
because many DBMS uses this keyword. Therefore, I'll change the name to it like this:

before
   pg_stat_progress_cluster

after
   pg_stat_progress_reorg (I abbreviate reorganize to reorg.)

Does anyone have any suggestions?
I'll revise the patch.

Regards,
Tatsuro Yamada




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Robert Haas
In reply to this post by Tatsuro Yamada
On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
<[hidden email]> wrote:
>   1. scanning heap
>   2. sort tuples

These two phases overlap, though. I believe progress reporting for
sorts is really hard.  In the simple case where the data fits in
work_mem, none of the work of the sort gets done until all the data is
read.  Once you switch to an external sort, you're writing batch
files, so a lot of the work is now being done during data loading.
But as the number of batch files grows, the final merge at the end
becomes an increasingly noticeable part of the cost, and eventually
you end up needing multiple merge passes.  I think we need some smart
way to report on sorts so that we can tell how much of the work has
really been done, but I don't know how to do it.

>  heap_tuples_total   | bigint  |           |          |

The patch is getting the value reported as heap_tuples_total from
OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
see that value anyway if they wish.  The point of the progress
counters is to expose things the user couldn't otherwise see.  It's
also not necessarily accurate: it's only an estimate in the best case,
and may be way off if the relation has recently be extended by a large
amount.  I think it's pretty important that we try hard to only report
values that are known to be accurate, because users hate (and mock)
inaccurate progress reports.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Tatsuro Yamada
On 2017/09/08 18:55, Robert Haas wrote:

> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
> <[hidden email]> wrote:
>>    1. scanning heap
>>    2. sort tuples
>
> These two phases overlap, though. I believe progress reporting for
> sorts is really hard.  In the simple case where the data fits in
> work_mem, none of the work of the sort gets done until all the data is
> read.  Once you switch to an external sort, you're writing batch
> files, so a lot of the work is now being done during data loading.
> But as the number of batch files grows, the final merge at the end
> becomes an increasingly noticeable part of the cost, and eventually
> you end up needing multiple merge passes.  I think we need some smart
> way to report on sorts so that we can tell how much of the work has
> really been done, but I don't know how to do it.

Thanks for the comment.

As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
cost estimation. In the case of SEQ SCAN, these two phases not overlap.
However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
heap and write new heap" when INDEX SCAN was selected.

I agree that progress reporting for sort is difficult. So it only reports
the phase ("sorting tuples") in the current design of progress monitor of cluster.
It doesn't report counter of sort.


>>   heap_tuples_total   | bigint  |           |          |
>
> The patch is getting the value reported as heap_tuples_total from
> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
> see that value anyway if they wish.  The point of the progress
> counters is to expose things the user couldn't otherwise see.  It's
> also not necessarily accurate: it's only an estimate in the best case,
> and may be way off if the relation has recently be extended by a large
> amount.  I think it's pretty important that we try hard to only report
> values that are known to be accurate, because users hate (and mock)
> inaccurate progress reports.

Do you mean to use the number of rows by using below calculation instead
OldHeap->rd_rel->reltuples?

  estimate rows = physical table size / average row length

I understand that OldHeap->rd_rel->reltuples is sometimes useless because
it is correct by auto analyze and it can't perform when under a threshold.

I'll add it in next patch and also share more detailed the current design of
progress monitor for cluster.

Regards,
Tatsuro Yamada




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Robert Haas
On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
<[hidden email]> wrote:

> Thanks for the comment.
>
> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
> heap and write new heap" when INDEX SCAN was selected.
>
> I agree that progress reporting for sort is difficult. So it only reports
> the phase ("sorting tuples") in the current design of progress monitor of
> cluster.
> It doesn't report counter of sort.

Doesn't that make it almost useless?  I would guess that scanning the
heap and writing the new heap would ordinarily account for most of the
runtime, or at least enough that you're going to want something more
than just knowing that's the phase you're in.

>> The patch is getting the value reported as heap_tuples_total from
>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>> see that value anyway if they wish.  The point of the progress
>> counters is to expose things the user couldn't otherwise see.  It's
>> also not necessarily accurate: it's only an estimate in the best case,
>> and may be way off if the relation has recently be extended by a large
>> amount.  I think it's pretty important that we try hard to only report
>> values that are known to be accurate, because users hate (and mock)
>> inaccurate progress reports.
>
> Do you mean to use the number of rows by using below calculation instead
> OldHeap->rd_rel->reltuples?
>
>  estimate rows = physical table size / average row length

No, I mean don't report it at all.  The caller can do that calculation
if they wish, without any help from the progress reporting machinery.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Peter Geoghegan-4
On Mon, Sep 11, 2017 at 7:38 AM, Robert Haas <[hidden email]> wrote:

> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
> <[hidden email]> wrote:
>> Thanks for the comment.
>>
>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>> heap and write new heap" when INDEX SCAN was selected.
>>
>> I agree that progress reporting for sort is difficult. So it only reports
>> the phase ("sorting tuples") in the current design of progress monitor of
>> cluster.
>> It doesn't report counter of sort.
>
> Doesn't that make it almost useless?  I would guess that scanning the
> heap and writing the new heap would ordinarily account for most of the
> runtime, or at least enough that you're going to want something more
> than just knowing that's the phase you're in.

It's definitely my experience that CLUSTER is incredibly I/O bound.
You're shoveling the tuples through tuplesort.c, but the actual
sorting component isn't where the real costs are. Profiling shows that
writing out the new heap (including moderately complicated
bookkeeping) is the bottleneck, IIRC. That's why parallel CLUSTER
didn't look attractive, even though it would be a fairly
straightforward matter to add that on top of the parallel CREATE INDEX
structure from the patch that I wrote to do that.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Tatsuro Yamada
In reply to this post by Robert Haas
On 2017/09/11 23:38, Robert Haas wrote:

> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
> <[hidden email]> wrote:
>> Thanks for the comment.
>>
>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>> heap and write new heap" when INDEX SCAN was selected.
>>
>> I agree that progress reporting for sort is difficult. So it only reports
>> the phase ("sorting tuples") in the current design of progress monitor of
>> cluster.
>> It doesn't report counter of sort.
>
> Doesn't that make it almost useless?  I would guess that scanning the
> heap and writing the new heap would ordinarily account for most of the
> runtime, or at least enough that you're going to want something more
> than just knowing that's the phase you're in.

Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
I know that external merge sort takes a time than quick sort.
I'll try investigating how to get a counter from external merge sort processing.
Is this the right way?


>>> The patch is getting the value reported as heap_tuples_total from
>>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>>> see that value anyway if they wish.  The point of the progress
>>> counters is to expose things the user couldn't otherwise see.  It's
>>> also not necessarily accurate: it's only an estimate in the best case,
>>> and may be way off if the relation has recently be extended by a large
>>> amount.  I think it's pretty important that we try hard to only report
>>> values that are known to be accurate, because users hate (and mock)
>>> inaccurate progress reports.
>>
>> Do you mean to use the number of rows by using below calculation instead
>> OldHeap->rd_rel->reltuples?
>>
>>   estimate rows = physical table size / average row length
>
> No, I mean don't report it at all.  The caller can do that calculation
> if they wish, without any help from the progress reporting machinery.

I see. I'll remove that column on next patch.


Regards,
Tatsuro Yamada



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Tatsuro Yamada
On 2017/09/12 21:20, Tatsuro Yamada wrote:

> On 2017/09/11 23:38, Robert Haas wrote:
>> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
>> <[hidden email]> wrote:
>>> Thanks for the comment.
>>>
>>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>>> heap and write new heap" when INDEX SCAN was selected.
>>>
>>> I agree that progress reporting for sort is difficult. So it only reports
>>> the phase ("sorting tuples") in the current design of progress monitor of
>>> cluster.
>>> It doesn't report counter of sort.
>>
>> Doesn't that make it almost useless?  I would guess that scanning the
>> heap and writing the new heap would ordinarily account for most of the
>> runtime, or at least enough that you're going to want something more
>> than just knowing that's the phase you're in.
>
> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
> I know that external merge sort takes a time than quick sort.
> I'll try investigating how to get a counter from external merge sort processing.
> Is this the right way?
>
>
>>>> The patch is getting the value reported as heap_tuples_total from
>>>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>>>> see that value anyway if they wish.  The point of the progress
>>>> counters is to expose things the user couldn't otherwise see.  It's
>>>> also not necessarily accurate: it's only an estimate in the best case,
>>>> and may be way off if the relation has recently be extended by a large
>>>> amount.  I think it's pretty important that we try hard to only report
>>>> values that are known to be accurate, because users hate (and mock)
>>>> inaccurate progress reports.
>>>
>>> Do you mean to use the number of rows by using below calculation instead
>>> OldHeap->rd_rel->reltuples?
>>>
>>>   estimate rows = physical table size / average row length
>>
>> No, I mean don't report it at all.  The caller can do that calculation
>> if they wish, without any help from the progress reporting machinery.
>
> I see. I'll remove that column on next patch.


I will summarize the current design and future corrections before sending
the next patch.


=== Current design ===

CLUSTER command may use Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

   * Scan method: Seq Scan
     1. scanning heap                (*1)
     2. sorting tuples               (*2)
     3. writing new heap             (*1)
     5. swapping relation files      (*2)
     6. rebuilding index             (*2)
     7. performing final cleanup     (*2)

   * Scan method: Index Scan
     4. scan heap and write new heap (*1)
     5. swapping relation files      (*2)
     6. rebuilding index             (*2)
     7. performing final cleanup     (*2)

VACUUM FULL command will proceed in the following sequence of phases:

     1. scanning heap                (*1)
     5. swapping relation files      (*2)
     6. rebuilding index             (*2)
     7. performing final cleanup     (*2)

(*1): increasing the value in heap_tuples_scanned column
(*2): only shows the phase in the phase column

The view provides the information of CLUSTER command progress details as follows
# \d pg_stat_progress_cluster
               View "pg_catalog.pg_stat_progress_cluster"
           Column           |  Type   | Collation | Nullable | Default
---------------------------+---------+-----------+----------+---------
  pid                       | integer |           |          |
  datid                     | oid     |           |          |
  datname                   | name    |           |          |
  relid                     | oid     |           |          |
  command                   | text    |           |          |
  phase                     | text    |           |          |
  scan_method               | text    |           |          |
  scan_index_relid          | bigint  |           |          |
  heap_tuples_total         | bigint  |           |          |
  heap_tuples_scanned       | bigint  |           |          |
  heap_tuples_vacuumed      | bigint  |           |          |
  heap_tuples_recently_dead | bigint  |           |          |


=== It will be changed on next patch ===

  - Rename to pg_stat_progress_reolg from pg_stat_progress_cluster
  - Remove heap_tuples_total column from the view
  - Add a progress counter in the phase of "sorting tuples" (difficult?!)


=== My test case as a bonus ===

I share my test case of progress monitor.
If someone wants to watch the current progress monitor, you can use
this test case as a example.

[Terminal1]
Run this query on psql:

    select * from pg_stat_progress_cluster; \watch 0.05

[Terminal2]
Run these queries on psql:

drop table t1;

create table t1 as select a, random() * 1000 as b from generate_series(0, 99999999) a;
create index idx_t1 on t1(a);
create index idx_t1_b on t1(b);
analyze t1;

-- index scan
set enable_seqscan to off;
cluster verbose t1 using idx_t1;

-- seq scan
set enable_seqscan to on;
set enable_indexscan to off;
cluster verbose t1 using idx_t1;

-- only given table name to cluster command
cluster verbose t1;

-- only cluster command
cluster verbose;

-- vacuum full
vacuum full t1;

-- vacuum full
vacuum full;


Thanks,
Tatsuro Yamada



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Jeff Janes
In reply to this post by Tatsuro Yamada
On Wed, Aug 30, 2017 at 7:12 PM, Tatsuro Yamada <[hidden email]> wrote:

The view provides the information of CLUSTER command progress details as follows
postgres=# \d pg_stat_progress_cluster
           View "pg_catalog.pg_stat_progress_cluster"
       Column        |  Type   | Collation | Nullable | Default
---------------------+---------+-----------+----------+---------
 pid                 | integer |           |          |
 datid               | oid     |           |          |
 datname             | name    |           |          |
 relid               | oid     |           |          |
 phase               | text    |           |          |
 scan_method         | text    |           |          |
 scan_index_relid    | bigint  |           |          |
 heap_tuples_total   | bigint  |           |          |
 heap_tuples_scanned | bigint  |           |          |

I think it should be cluster_index_relid, not scan_index_relid.  If the scan_method is seq, then the index isn't being scanned.

Cheers,

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: CLUSTER command progress monitor

Daniel Gustafsson
In reply to this post by Tatsuro Yamada

> On 12 Sep 2017, at 14:57, Tatsuro Yamada <[hidden email]> wrote:
>
> On 2017/09/12 21:20, Tatsuro Yamada wrote:
>> On 2017/09/11 23:38, Robert Haas wrote:
>>> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
>>> <[hidden email]> wrote:
>>>> Thanks for the comment.
>>>>
>>>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>>>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>>>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>>>> heap and write new heap" when INDEX SCAN was selected.
>>>>
>>>> I agree that progress reporting for sort is difficult. So it only reports
>>>> the phase ("sorting tuples") in the current design of progress monitor of
>>>> cluster.
>>>> It doesn't report counter of sort.
>>>
>>> Doesn't that make it almost useless?  I would guess that scanning the
>>> heap and writing the new heap would ordinarily account for most of the
>>> runtime, or at least enough that you're going to want something more
>>> than just knowing that's the phase you're in.
>>
>> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
>> I know that external merge sort takes a time than quick sort.
>> I'll try investigating how to get a counter from external merge sort processing.
>> Is this the right way?
>>
>>
>>>>> The patch is getting the value reported as heap_tuples_total from
>>>>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>>>>> see that value anyway if they wish.  The point of the progress
>>>>> counters is to expose things the user couldn't otherwise see.  It's
>>>>> also not necessarily accurate: it's only an estimate in the best case,
>>>>> and may be way off if the relation has recently be extended by a large
>>>>> amount.  I think it's pretty important that we try hard to only report
>>>>> values that are known to be accurate, because users hate (and mock)
>>>>> inaccurate progress reports.
>>>>
>>>> Do you mean to use the number of rows by using below calculation instead
>>>> OldHeap->rd_rel->reltuples?
>>>>
>>>>  estimate rows = physical table size / average row length
>>>
>>> No, I mean don't report it at all.  The caller can do that calculation
>>> if they wish, without any help from the progress reporting machinery.
>>
>> I see. I'll remove that column on next patch.
>
>
> I will summarize the current design and future corrections before sending
> the next patch.
>
>
> === Current design ===
>
> CLUSTER command may use Index Scan or Seq Scan when scanning the heap.
> Depending on which one is chosen, the command will proceed in the
> following sequence of phases:
>
>  * Scan method: Seq Scan
>    1. scanning heap                (*1)
>    2. sorting tuples               (*2)
>    3. writing new heap             (*1)
>    5. swapping relation files      (*2)
>    6. rebuilding index             (*2)
>    7. performing final cleanup     (*2)
>
>  * Scan method: Index Scan
>    4. scan heap and write new heap (*1)
>    5. swapping relation files      (*2)
>    6. rebuilding index             (*2)
>    7. performing final cleanup     (*2)
>
> VACUUM FULL command will proceed in the following sequence of phases:
>
>    1. scanning heap                (*1)
>    5. swapping relation files      (*2)
>    6. rebuilding index             (*2)
>    7. performing final cleanup     (*2)
>
> (*1): increasing the value in heap_tuples_scanned column
> (*2): only shows the phase in the phase column
>
> The view provides the information of CLUSTER command progress details as follows
> # \d pg_stat_progress_cluster
>              View "pg_catalog.pg_stat_progress_cluster"
>          Column           |  Type   | Collation | Nullable | Default
> ---------------------------+---------+-----------+----------+---------
> pid                       | integer |           |          |
> datid                     | oid     |           |          |
> datname                   | name    |           |          |
> relid                     | oid     |           |          |
> command                   | text    |           |          |
> phase                     | text    |           |          |
> scan_method               | text    |           |          |
> scan_index_relid          | bigint  |           |          |
> heap_tuples_total         | bigint  |           |          |
> heap_tuples_scanned       | bigint  |           |          |
> heap_tuples_vacuumed      | bigint  |           |          |
> heap_tuples_recently_dead | bigint  |           |          |
>
>
> === It will be changed on next patch ===
>
> - Rename to pg_stat_progress_reolg from pg_stat_progress_cluster
> - Remove heap_tuples_total column from the view
> - Add a progress counter in the phase of "sorting tuples" (difficult?!)
>
>
> === My test case as a bonus ===
>
> I share my test case of progress monitor.
> If someone wants to watch the current progress monitor, you can use
> this test case as a example.
>
> [Terminal1]
> Run this query on psql:
>
>   select * from pg_stat_progress_cluster; \watch 0.05
>
> [Terminal2]
> Run these queries on psql:
>
> drop table t1;
>
> create table t1 as select a, random() * 1000 as b from generate_series(0, 99999999) a;
> create index idx_t1 on t1(a);
> create index idx_t1_b on t1(b);
> analyze t1;
>
> -- index scan
> set enable_seqscan to off;
> cluster verbose t1 using idx_t1;
>
> -- seq scan
> set enable_seqscan to on;
> set enable_indexscan to off;
> cluster verbose t1 using idx_t1;
>
> -- only given table name to cluster command
> cluster verbose t1;
>
> -- only cluster command
> cluster verbose;
>
> -- vacuum full
> vacuum full t1;
>
> -- vacuum full
> vacuum full;

Based on this thread, this patch has been marked Returned with Feedback.
Please re-submit a new version to a future commitfest.

cheers ./daniel

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
1234