monitoring CREATE INDEX [CONCURRENTLY]

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

monitoring CREATE INDEX [CONCURRENTLY]

Alvaro Herrera-9
Monitoring progress of CREATE INDEX [CONCURRENTLY] is sure to be welcome,
so here's a proposal.

There are three distinct interesting cases.  One is straight CREATE
INDEX of a standalone table; then we have CREATE INDEX CONCURRENTLY;
finally, CREATE INDEX on a partitioned table.  Note that there's no
CONCURRENTLY for partitioned tables.

A non-concurrent build is a very straightforward: we call create_index,
which does index_build, done.  See below for how to report for
index_build, which is the interesting part.  I propose not to report
anything else than that for non-concurrent build.  There's some
preparatory work that's identical than for CIC (see below).  Like
VACUUM, it seems a bit pointless to report an initial phase that's
almost immediate, so I propose we just don't report anything until the
actual index building starts.

CREATE INDEX CONCURRENTLY does these things first, which we would not
report (this is just like VACUUM, which only starts reporting once it
starts scanning blocks):

a. lock rel.  No metrics to report.
b. other prep; includes lots of catalog access.  Unlikely to lock, but
   not impossible.  No metrics to report.
c. create_index.  CIC skips index_build here, so there's no reason to
   report it either.

We would start reporting at this point, with these phases:

1. WaitForLockers 1.  Report how many xacts do we need to wait for, how
   many are done.
2. index_build.  See below.
3. WaitForLockers 2.  Report how many xacts do we need to wait for, how
   many are done.
4. validate_index.  Scans the whole rel again.  Report number of blocks
   scanned.
5. wait for virtual XIDs.  Like WaitForLockers: report how many xacts we
   need to wait for, how many are done.

We're done.

(Alternatively, we could have an initial "prep" phase for a/b/c for the
concurrent case and a/b for non-concurrent.  I'm just not sure it's
useful.)

index_build
-----------

The actual index building is an AM-specific undertaking, and we report
its progress separately from the AM-agnostic code.  That is, each AM has
freedom to define its own list of phases and counters, separate from the
generic code.  This avoids the need to provide a new AM method or invoke
callbacks.  So when you see that CREATE_INDEX_PHASE is either "index
build" you'll have a separate BTREE_CREATE_PHASE value set to either
"scanning heap" or "sorting" or "building upper layers"; equivalently
for other AMs.

Partitioned indexes
-------------------

For partitioned indexes, we only have the index build phase, but we
repeat it for each partition.  In addition to the index_build metrics
described above, we should report how many partitions we need to handle
in total and how many partitions are already done.  (I'm avoiding
getting in the trouble of reporting *which* partition we're currently
handling and have already handled.)

Thoughts?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Alvaro Herrera-9
For discussion, here's an preliminary patch.  This is just a first
skeleton; needs to grow a lot of flesh yet, per my previous proposal.
As far as the generic CREATE INDEX stuff goes, I think this is complete;
it's missing the AM-specific bits.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

create-index-progress.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Pavan Deolasee

On Tue, Jan 1, 2019 at 6:09 AM Alvaro Herrera <[hidden email]> wrote:
For discussion, here's an preliminary patch.  This is just a first
skeleton; needs to grow a lot of flesh yet, per my previous proposal.
As far as the generic CREATE INDEX stuff goes, I think this is complete;
it's missing the AM-specific bits.

Looks like it's missing the validate_index blocks-scanned report, which is not AM-specific and your original proposal does mention that. 

I thought a bit about index_build part. If most AMs follow a somewhat standard phases while building an index, it might be simpler to define those phases and have AM agnostic code report those phases. Or may be just report the most significant information, instead of reporting each sub-phase of index_build. I think the most important progress to know would be how far the heap is scanned for to-be-indexed tuples. AFAICS all AMs use IndexBuildHeapScan() to scan the heap. Can we simply do some reporting from that routine? Like number of blocks scanned against the total number of blocks requested?

Some minor comments on the patch, though I suspect you might be already updating the patch since you marked it as WIP.

+CREATE VIEW pg_stat_progress_create_index AS
+ SELECT
+ s.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE s.param1 WHEN 0 THEN 'initializing'
+   WHEN 1 THEN 'waiting for lockers 1'
+   WHEN 2 THEN 'building index'
+   WHEN 3 THEN 'waiting for lockers 2'
+   WHEN 4 THEN 'validating index'
+   WHEN 5 THEN 'waiting for lockers 3'

Can we have more descriptive text for waiters? Such as "waiting for existing
writers", "waiting for intermediate writers" and "waiting for old readers". Not
sure if I got those correct, something of that sort will definitely give more
insight into what the transaction is waiting for.

Can we actually also report the list of transactions the command is waiting on?
That could be useful to the user if CIC appears to be stuck too long.

 
+ pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+ nparts);
+

IMHO we should just use full term INDEX instead of IDX, such as
PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple of
extra characters won't make a difference. I did not see much precedence to
shortern to IDX for INDEX elsewhere in the code (though we tend to do that for
variable names etc).


@@ -1282,6 +1318,9 @@ DefineIndex(Oid relationId,
  old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
    PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
    &n_old_snapshots);
+ pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+ PROGRESS_CREATEIDX_PHASE_WAIT_3);
+ pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);

I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE
when the wait phase is over, to avoid any confusion. For example, I noticed that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers.

Shouldn't PROGRESS_WAITFOR_DONE be updated when we skip a snapshot in the code below?
        if (!VirtualTransactionIdIsValid(old_snapshots[i]))
            continue;           /* found uninteresting in previous cycle */


@@ -2817,7 +2818,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
  * uses of the result.
  */
 VirtualTransactionId *
-GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
+GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount)


Could that out variable be named something differently? "countp" or something
like that. I did not check if there is some practice that we follow, but I
remember suffixing with "p" rather than prefixing with "o" (for out I assume)


+
+/* Progress parameters for CREATE INDEX */
+#define PROGRESS_CREATEIDX_PHASE 0
+/* 1 and 2 reserved for "waitfor" metrics */
+#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3
+#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4
+

Is there a reason to leave those reserve placeholders, only to fill them a few
lines down?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Rahila Syed-2
In reply to this post by Alvaro Herrera-9
Hi Alvaro,

The WIP patch needs a rebase. Please see few in-line comments.

On Fri, Dec 21, 2018 at 3:30 AM Alvaro Herrera <[hidden email]> wrote:
Monitoring progress of CREATE INDEX [CONCURRENTLY] is sure to be welcome,
so here's a proposal.

There are three distinct interesting cases.  One is straight CREATE
INDEX of a standalone table; then we have CREATE INDEX CONCURRENTLY;
finally, CREATE INDEX on a partitioned table.  Note that there's no
CONCURRENTLY for partitioned tables.

A non-concurrent build is a very straightforward: we call create_index,
which does index_build, done.  See below for how to report for
index_build, which is the interesting part.  I propose not to report
anything else than that for non-concurrent build.  There's some
preparatory work that's identical than for CIC (see below).  Like
VACUUM, it seems a bit pointless to report an initial phase that's
almost immediate, so I propose we just don't report anything until the
actual index building starts.
 
 Aren't we reporting this initial preparatory work in the form of 'initializing' phase that you
have in current patch? IIUC, there are no metrics but the name of the phase.

CREATE INDEX CONCURRENTLY does these things first, which we would not
report (this is just like VACUUM, which only starts reporting once it
starts scanning blocks):

a. lock rel.  No metrics to report.
b. other prep; includes lots of catalog access.  Unlikely to lock, but
   not impossible.  No metrics to report.
c. create_index.  CIC skips index_build here, so there's no reason to
   report it either.

We would start reporting at this point, with these phases:

1. WaitForLockers 1.  Report how many xacts do we need to wait for, how
   many are done.
2. index_build.  See below.
3. WaitForLockers 2.  Report how many xacts do we need to wait for, how
   many are done.
4. validate_index.  Scans the whole rel again.  Report number of blocks
   scanned.
5. wait for virtual XIDs.  Like WaitForLockers: report how many xacts we
   need to wait for, how many are done.

We're done.

(Alternatively, we could have an initial "prep" phase for a/b/c for the
concurrent case and a/b for non-concurrent.  I'm just not sure it's
useful.)
 
index_build
-----------

The actual index building is an AM-specific undertaking, and we report
its progress separately from the AM-agnostic code.  That is, each AM has
freedom to define its own list of phases and counters, separate from the
generic code.  This avoids the need to provide a new AM method or invoke
callbacks.  So when you see that CREATE_INDEX_PHASE is either "index
build" you'll have a separate BTREE_CREATE_PHASE value set to either
"scanning heap" or "sorting" or "building upper layers"; equivalently
for other AMs.

OK. 
I think the main phases in which index_build for most AMs can be divided is as follows:
1. Scanning heap tuples for building index which is common for all AMs
2. Forming index tuples which is AM-specific
3. Writing tuples into the index which is AM-specific. 
Out of these, metrics for phase 1 can be heap_tuples_scanned / total_heap_tuples and 
for phase 3, it can be index_tuples_computed/ total_index_tuples.
I am not sure about metrics for phase 2 especially for Btree which involves
reporting progress of sorting. 

Partitioned indexes
-------------------

For partitioned indexes, we only have the index build phase, but we
repeat it for each partition.  In addition to the index_build metrics
described above, we should report how many partitions we need to handle
in total and how many partitions are already done.  (I'm avoiding
getting in the trouble of reporting *which* partition we're currently
handling and have already handled.)

Thoughts?

+CREATE VIEW pg_stat_progress_create_index AS
+ SELECT
+ s.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE s.param1 WHEN 0 THEN 'initializing'
+   WHEN 1 THEN 'waiting for lockers 1'
+   WHEN 2 THEN 'building index'
+   WHEN 3 THEN 'waiting for lockers 2'
+   WHEN 4 THEN 'validating index'
+   WHEN 5 THEN 'waiting for lockers 3'
+   END as phase,
+ S.param2 AS procs_to_wait_for,
+ S.param3 AS procs_waited_for,
+ S.param4 AS partitions_to_build,
+ S.param5 AS partitions_built
+ FROM pg_stat_get_progress_info('CREATE INDEX') AS S
+ LEFT JOIN pg_database D ON S.datid = D.oid;
+

1. In the above code, I think it will be useful if we report phases as
'Initializing phase 1 of n'
'Waiting for lockers phase 2 of n' etc. i.e reporting total number of phases as well.

+               holders = lappend(holders,
+                                                 GetLockConflicts(locktag, lockmode, &count));
+               total += count;
2. IIUC, the above code in WaitForLockersMultiple can be written under
if(progress) condition like rest of the progress checking code in that function
and pass NULL for count otherwise.

3. Will it be useful to report pid's of the backend's 
for the transactions which CREATE INDEX concurrently is waiting for?
I think it can be used to debug long running transactions.

4. Should we have additional statistics update phase before index_update_stats
like PROGRESS_VACUUM_PHASE_FINAL_CLEANUP?

5. I think it may be useful to report number of parallel workers requested and number of workers 
actually running index build in case of btree.

6. Also, this can be reported as an additional validation phase for exclusion constraint   
in index_build function as it involves scanning all live tuples of heap relation. 

 /*
         * If it's for an exclusion constraint, make a second pass over the heap
         * to verify that the constraint is satisfied.  We must not do this until
         * the index is fully valid.  (Broken HOT chains shouldn't matter, though;
         * see comments for IndexCheckExclusion.)
         */
        if (indexInfo->ii_ExclusionOps != NULL)
                IndexCheckExclusion(heapRelation, indexRelation, indexInfo);
*/

Thank you,
Rahila Syed
Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Alvaro Herrera-9
In reply to this post by Pavan Deolasee
On 2019-Jan-09, Pavan Deolasee wrote:

> Looks like it's missing the validate_index blocks-scanned report, which is
> not AM-specific and your original proposal does mention that.

True. That phase is actually 3 phases, which would be reported
separately:

  5. index_bulk_delete() scan
  6. performsort
  7. validate_index_heapscan


> I thought a bit about index_build part. If most AMs follow a somewhat
> standard phases while building an index, it might be simpler to define
> those phases and have AM agnostic code report those phases.

Well, think about btrees.  We first scan the whole table putting all
tuples in a spool (two spools actually), then we tell the spools to get
sorted, then we extract tuples from the spools, and finally we read the
spool to produce the leaf pages.  If we just report the table scan, the
reports gets to 100% complete in that phase and then waits for a very
long time during which nothing seems to happen.  That's not cool.

I'm adding a new AM support routine which turns the subphase number into
a textual description, so that we don't have to hardcode phase names in
the agnostic code.

> Can we have more descriptive text for waiters? Such as "waiting for
> existing writers", "waiting for intermediate writers" and "waiting for
> old readers".  Not sure if I got those correct, something of that sort
> will definitely give more insight into what the transaction is waiting
> for.

Can do.

> Can we actually also report the list of transactions the command is waiting
> on?
> That could be useful to the user if CIC appears to be stuck too long.

I'm afraid this is not possible, because the progress report interface
doesn't allow for something like that.

> IMHO we should just use full term INDEX instead of IDX, such as
> PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple
> of extra characters won't make a difference.

Really?  I though it was clear enough and it's *three* characters saved
even.

> I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE
> when the wait phase is over, to avoid any confusion. For example, I noticed
> that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers.

Yes, absolutely.

> +GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount)
>
> Could that out variable be named something differently? "countp" or
> something like that. I did not check if there is some practice that we
> follow, but I remember suffixing with "p" rather than prefixing with
> "o" (for out I assume)

Sure.

> +/* Progress parameters for CREATE INDEX */
> +#define PROGRESS_CREATEIDX_PHASE 0
> +/* 1 and 2 reserved for "waitfor" metrics */
> +#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3
> +#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4
> +
>
> Is there a reason to leave those reserve placeholders, only to fill them a
> few lines down?

Yes -- those are going to be used by other reports that also use similar
metrics, such as the CLUSTER report.


I'm running out of columns to put the numbers into :-(  Right now I have

1. phase
2. subphase (for index_build)
3. lockers total (to wait for)
4. lockers done
5. blocks total
6. blocks done
7. tapes total
8. tapes done
9. partitions total
10. partitions done

The "tapes total/done" bit is about reporting the performsort steps; I'm
not really sure yet it'll be tapes, but I hope it can be done with two
numbers.  Anyway, in btree build I have these subphases

1. spool heapscan using IndexBuildHeapScan
2. _bt_parallel_heapscan stuff (only one of these in a build)
3. bt_leafbuild, performsort spool 1
4. bt_leafbuild, performsort spool 2
5. bt_load

and I don't have columns to put the metrics for the btload thing,
assuming I figure out what those are.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Michael Paquier-2
Hi all,

Based on the latest emails exchanged, the patch got some feedback from
Pavan which has not been addressed.  So I am marking the patch as
returned with feedback.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Alvaro Herrera-9
In reply to this post by Rahila Syed-2
Hi Rahila, Pavan,

Thanks for the review.  I incorporated some fixes per your comments.
More fixes are needed still.  That said, the patch in attachment gives
good insight into how I think this should turn out.

> > index_build
> > -----------

> OK.
> I think the main phases in which index_build for most AMs can be divided is
> as follows:
> [...]

I ended up defining phases for the index_build phase in the AM itself;
the code reports a phase number using the regular API, and then the
pgstat_progress view obtains the name of each phase using a support
method.

For btree, I ended up not reporting anything about the sort per se; we
just scan the heap (reporting block numbers, which is easy because we
know the heap size in advance) and count heap tuples while scanning;
once that's done, we know how many tuples we need to write out to the
index, so that total becomes the next stage's target total.  That's
simpler. (It is wrong for partial indexes currently, though.)

So for btree we have this:

/*
 * btphasename() -- Return name of phase, for index build progress report
 */
char *
btphasename(int64 phasenum)
{
        switch (phasenum)
        {
                case PROGRESS_CREATEIDX_SUBPHASE_INITIALIZE:
                        return "initializing (1/5)";
                case PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN:
                        return "table scan (2/5)";
                case PROGRESS_BTREE_PHASE_PERFORMSORT_1:
                        return "sorting tuples, spool 1 (3/5)";
                case PROGRESS_BTREE_PHASE_PERFORMSORT_2:
                        return "sorting tuples, spool 2 (4/5)";
                case PROGRESS_BTREE_PHASE_LEAF_LOAD:
                        return "final btree sort & load (5/5)";
                default:
                        return NULL;
        }
}

Now this is a bit strange, because the report looks like this:

  phase                 | building index (3 of 8): initializing (1/5)
  [...]
  blocks total          | 442478
  blocks done           | 3267

So for phase 3, we always have phase and subphase counters in the phase
name.  However, I don't think there's any clean way to know from the
very beginning how many subphases are there going to be, and it would be
even more confusing to have the total "of N" number vary each time
depending on the access method.  So this leaves the phase counter going
from 1 to 8, and then for phase 3 you have a second part that goes from
1 to N.

Anyway, at some point it completes the heap scan, and the phase changes
to this:

  phase                 | building index (3 of 8): heap scan(2/5)

I think I should take Rahila's suggestion to report the number of
workers running (but I'm not sure I see the point in reporting number of
workers planned).

The heap scan quickly gives way to the next one,

  phase                 | building index (3 of 8): final btree sort & load (5/5)
  [...]
  tuples total          | 100000000
  tuples done           | 58103713

Finally,
  phase                 | validating index scan (phase 5 of 8)
and
  phase                 | validate index heapscan (phase 7 of 8)

and then it's over.

Now, it's clear that I haven't yet nailed all the progress updates
correctly, because there are some stalls where nothing seems to be
happening.  The final index_validate phases have been ignored so far.

> 1. In the above code, I think it will be useful if we report phases as
> 'Initializing phase 1 of n'
> 'Waiting for lockers phase 2 of n' etc. i.e reporting total number of
> phases as well.

Great idea, done.

> 2. IIUC, the above code in WaitForLockersMultiple can be written under
> if(progress) condition like rest of the progress checking code in that
> function
> and pass NULL for count otherwise.

Yep, good point.

> 3. Will it be useful to report pid's of the backend's
> for the transactions which CREATE INDEX concurrently is waiting for?
> I think it can be used to debug long running transactions.

Done.

> 4. Should we have additional statistics update phase before
> index_update_stats
> like PROGRESS_VACUUM_PHASE_FINAL_CLEANUP?

Not sure about this one ... it's supposed to be a fairly quick
operation.

> 5. I think it may be useful to report number of parallel workers requested
> and number of workers
> actually running index build in case of btree.

True, I haven't done this one.  I'll add it next.

> 6. Also, this can be reported as an additional validation phase for
> exclusion constraint
> in index_build function as it involves scanning all live tuples of heap
> relation.
>
>  /*
>          * If it's for an exclusion constraint, make a second pass over the
> heap
>          * to verify that the constraint is satisfied.  We must not do this
> until
>          * the index is fully valid.  (Broken HOT chains shouldn't matter,
> though;
>          * see comments for IndexCheckExclusion.)
>          */
>         if (indexInfo->ii_ExclusionOps != NULL)
>                 IndexCheckExclusion(heapRelation, indexRelation, indexInfo);
> */
Hmm, I haven't looked into exclusion constraints thus far ... I suppose
this is a critical point to keep in mind.

Thanks for the review!

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v2-create-index-progress.patch (40K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Alvaro Herrera-9
I added metrics for the validate_index phases; patch attached.  This is
still a bit raw, but it looks much better now.  Here's a sample
concurrent index build on a 100M tuple table.  There are no concurrent
transactions, so phases that wait for lockers don't appear.  I'm not
making any effort to report metrics during phase 6, note.  In phase 7
I'm currently reporting only tuple counts; I think it'd be better to
report block numbers.  

I don't show it here, but when an index is built on a partitioned table,
the "partitions done" number goes up all the way to "partitions total"
and the phases come and go for each partition.

One thing is clear: given the phase mechanics of varying durations, it
seems hard to use these numbers to predict total index build time.


     now      |                         phase                          | blocks total | blocks done | tuples total | tuples done | partitions total | partitions done
--------------+--------------------------------------------------------+--------------+-------------+--------------+-------------+------------------+-----------------
 15:56:33.792 | building index (3 of 8): initializing (1/5)            |       442478 |          32 |            0 |           0 |                0 |               0
 15:56:33.805 | building index (3 of 8): initializing (1/5)            |       442478 |         188 |            0 |           0 |                0 |               0
     [snip about 500 lines]
 15:56:41.345 | building index (3 of 8): initializing (1/5)            |       442478 |      439855 |            0 |           0 |                0 |               0
 15:56:41.356 | building index (3 of 8): initializing (1/5)            |       442478 |      440288 |            0 |           0 |                0 |               0
 15:56:41.367 | building index (3 of 8): initializing (1/5)            |       442478 |      440778 |            0 |           0 |                0 |               0
 15:56:41.378 | building index (3 of 8): initializing (1/5)            |       442478 |      441706 |            0 |           0 |                0 |               0
 15:56:41.389 | building index (3 of 8): initializing (1/5)            |       442478 |      442399 |            0 |           0 |                0 |               0
 15:56:41.4   | building index (3 of 8): initializing (1/5)            |       442478 |      442399 |            0 |           0 |                0 |               0

     [snip 300 lines] ... I'm not sure what happened in this 3 seconds period.  No metrics were being updated.

 15:56:44.65  | building index (3 of 8): initializing (1/5)            |       442478 |      442399 |            0 |           0 |                0 |               0
 15:56:44.661 | building index (3 of 8): initializing (1/5)            |       442478 |      442399 |            0 |           0 |                0 |               0
 15:56:44.672 | building index (3 of 8): initializing (1/5)            |       442478 |      442399 |            0 |           0 |                0 |               0
 15:56:44.682 | building index (3 of 8): initializing (1/5)            |       442478 |      442399 |            0 |           0 |                0 |               0
 15:56:44.694 | building index (3 of 8): initializing (1/5)            |       442478 |      442399 |            0 |           0 |                0 |               0
 15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) |       442478 |      442399 |    100000000 |           0 |                0 |               0
 15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) |       442478 |      442399 |    100000000 |           0 |                0 |               0
 15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |       79057 |                0 |               0
 15:56:44.738 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |      217018 |                0 |               0
 15:56:44.75  | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |      353804 |                0 |               0
 15:56:44.761 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |      487892 |                0 |               0
 15:56:44.773 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |      634136 |                0 |               0
    [snip 660 lines]
 15:56:52.47  | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |    99111337 |                0 |               0
 15:56:52.482 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |    99285701 |                0 |               0
 15:56:52.493 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |    99444763 |                0 |               0
 15:56:52.505 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |    99612313 |                0 |               0
 15:56:52.516 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |    99782666 |                0 |               0
 15:56:52.528 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |    99937524 |                0 |               0
 15:56:52.54  | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.551 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.561 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.572 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.583 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.594 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.604 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.615 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.626 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.637 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.648 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.658 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.669 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.68  | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.691 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.701 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.712 | building index (3 of 8): final btree sort & load (5/5) |       442478 |      442399 |    100000000 |   100000000 |                0 |               0
 15:56:52.723 | validating index scan (phase 5 of 8)                   |       274194 |        1049 |            0 |           0 |                0 |               0
 15:56:52.734 | validating index scan (phase 5 of 8)                   |       274194 |        2676 |            0 |           0 |                0 |               0
 15:56:52.744 | validating index scan (phase 5 of 8)                   |       274194 |        2876 |            0 |           0 |                0 |               0
 15:56:52.755 | validating index scan (phase 5 of 8)                   |       274194 |        4194 |            0 |           0 |                0 |               0
    [snip 400 lines]
 15:56:57.031 | validating index scan (phase 5 of 8)                   |       274194 |      268343 |            0 |           0 |                0 |               0
 15:56:57.042 | validating index scan (phase 5 of 8)                   |       274194 |      268479 |            0 |           0 |                0 |               0
 15:56:57.053 | validating index scan (phase 5 of 8)                   |       274194 |      270027 |            0 |           0 |                0 |               0
 15:56:57.064 | validating index scan (phase 5 of 8)                   |       274194 |      271580 |            0 |           0 |                0 |               0
 15:56:57.075 | validating index scan (phase 5 of 8)                   |       274194 |      273121 |            0 |           0 |                0 |               0
 15:56:57.085 | sorting index scan results (phase 6 of 8)              |       274194 |      274193 |            0 |           0 |                0 |               0
 15:56:57.096 | sorting index scan results (phase 6 of 8)              |       274194 |      274193 |            0 |           0 |                0 |               0
 15:56:57.107 | sorting index scan results (phase 6 of 8)              |       274194 |      274193 |            0 |           0 |                0 |               0
 15:56:57.118 | sorting index scan results (phase 6 of 8)              |       274194 |      274193 |            0 |           0 |                0 |               0
 15:56:57.128 | sorting index scan results (phase 6 of 8)              |       274194 |      274193 |            0 |           0 |                0 |               0
 15:56:57.139 | sorting index scan results (phase 6 of 8)              |       274194 |      274193 |            0 |           0 |                0 |               0
 15:56:57.15  | sorting index scan results (phase 6 of 8)              |       274194 |      274193 |            0 |           0 |                0 |               0
 15:56:57.161 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |       50152 |                0 |               0
 15:56:57.172 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |      175602 |                0 |               0
 15:56:57.182 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |      305326 |                0 |               0
 15:56:57.193 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |      430142 |                0 |               0
      [snip 730 lines]
 15:57:05.003 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |    99125556 |                0 |               0
 15:57:05.013 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |    99276471 |                0 |               0
 15:57:05.024 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |    99425041 |                0 |               0
 15:57:05.035 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |    99580174 |                0 |               0
 15:57:05.045 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |    99720505 |                0 |               0
 15:57:05.056 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |    99862311 |                0 |               0
 15:57:05.067 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.077 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.088 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.099 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.109 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.12  | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.131 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.142 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.152 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.163 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
 15:57:05.174 | validate index heapscan (phase 7 of 8)                 |            0 |           0 |    100000000 |   100000000 |                0 |               0
(2862 filas)

and then it's done.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v3-create-index-progress.patch (44K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Simon Riggs
On Wed, 13 Feb 2019 at 00:46, Alvaro Herrera <[hidden email]> wrote:
Here's a sample
concurrent index build on a 100M tuple table.

Cool
 
There are no concurrent
transactions, so phases that wait for lockers don't appear.

Would prefer to see them, so we know they are zero. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Amit Langote-2
On 2019/02/13 11:59, Simon Riggs wrote:
> On Wed, 13 Feb 2019 at 00:46, Alvaro Herrera <[hidden email]>
> wrote:
>
>> Here's a sample
>> concurrent index build on a 100M tuple table.
>
>
> Cool

+1

Looking at the "([phase] x of x)" in the sample output, I thought
pg_stat_progress_vacuum's output should've had it too (not a concern of
Alvaro's patch though.)

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Amit Langote-2
In reply to this post by Alvaro Herrera-9
Hi Alvaro,

On 2019/02/12 12:18, Alvaro Herrera wrote:
> I ended up defining phases for the index_build phase in the AM itself;
> the code reports a phase number using the regular API, and then the
> pgstat_progress view obtains the name of each phase using a support
> method.

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 05102724ead..189179a5667 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -127,6 +127,7 @@ typedef struct IndexAmRoutine
     amcostestimate_function amcostestimate;
     amoptions_function amoptions;
     amproperty_function amproperty;     /* can be NULL */
+    amphasename_function amphasename;   /* can be NULL */

Doesn't the name amphasename sound a bit too generic, given that it can
only describe the phases of ambuild?  Maybe ambuildphase?

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: monitoring CREATE INDEX [CONCURRENTLY]

Tatsuro Yamada
In reply to this post by Alvaro Herrera-9
On 2019/02/13 4:16, Alvaro Herrera wrote:
> I added metrics for the validate_index phases; patch attached.  This is
> still a bit raw, but it looks much better now.  Here's a sample
> concurrent index build on a 100M tuple table.  There are no concurrent

Great!

+ s.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE s.param2 WHEN 0 THEN 'initializing (phase 1 of 8)'
+  WHEN 1 THEN 'waiting for lockers 1 (phase 2 of 8)'
+  WHEN 2 THEN 'building index (3 of 8): ' ||
+ pg_indexam_progress_phasename(s.param1::oid, s.param3)

It would be better to replace "s" with "S".

s/s.pid/S.pid/
s/s.param2/S.param2/
s/s.param1::oid, s.param3/S.param1::oid, S.param3/

These are not important comments but for readability. :)

Thanks,
Tatsuro Yamada