Re: Block level parallel vacuum


Re: Block level parallel vacuum

Masahiko Sawada
On Thu, Mar 14, 2019 at 6:41 AM Robert Haas <[hidden email]> wrote:

>
> On Wed, Mar 13, 2019 at 1:56 AM Masahiko Sawada <[hidden email]> wrote:
> > I don't have a strong opinion but the using a Node would be more
> > suitable in the future when we add more options to vacuum. And it
> > seems to me that it's unlikely to change a Node to a plain struct. So
> > there is an idea of doing it now anyway if we might need to do it
> > someday.
>
> I just tried to apply 0001 again and noticed a conflict in the
> autovac_table structure in postmaster.c.
>
> That conflict got me thinking: aren't parameters and options an awful
> lot alike?  Why do we need to pass around a VacuumOptions structure
> *and* a VacuumParams structure to all of these functions?  Couldn't we
> just have one?  That led to the attached patch, which just gets rid of
> the separate options flag and folds it into VacuumParams.
Indeed. I like this approach. The comment of vacuum() says,

* options is a bitmask of VacuumOption flags, indicating what to do.
* (snip)
* params contains a set of parameters that can be used to customize the
* behavior.

It seems to me that the purposes of the two variables are different, but
it would be acceptable even if we merge them.
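
For illustration only, here is a rough sketch of what the merged structure
could look like. The pre-existing fields are those of the current
VacuumParams; the exact final layout is an assumption, not the committed
result:

/* Sketch only: VacuumParams with the former options bitmask folded in. */
typedef struct VacuumParams
{
    int         options;            /* bitmask of VacuumOption flags */
    int         freeze_min_age;     /* min freeze age, -1 to use default */
    int         freeze_table_age;   /* age at which to scan whole table */
    int         multixact_freeze_min_age;   /* min multixact freeze age */
    int         multixact_freeze_table_age; /* multixact age at which to
                                             * scan whole table */
    bool        is_wraparound;      /* force a for-wraparound vacuum */
    int         log_min_duration;   /* minimum execution threshold in ms */
} VacuumParams;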

BTW, your patch doesn't seem to apply cleanly to the current HEAD, and
the comment of vacuum() needs to be updated.

> If we took
> this approach, the degree of parallelism would just be another thing
> that would get added to VacuumParams, and VacuumOptions wouldn't end
> up existing at all.
>

Agreed.

> This patch does not address the question of what the *parse tree*
> representation of the PARALLEL option should look like; the idea would
> be that ExecVacuum() would need to extract the value for that option and
> put it into VacuumParams just as it already does for various other
> things in VacuumParams.  Maybe the most natural approach would be to
> convert the grammar productions for the VACUUM options list so that
> they just build a list of DefElems, and then have ExecVacuum() iterate
> over that list and make sense of it, as for example ExplainQuery()
> already does.
>
Agreed. That change would also help the discussion on changing the VACUUM
option syntax to a field-and-value style.

Attached are the updated patch you proposed and a patch that converts
the grammar productions for the VACUUM options on top of the former
patch. The latter patch moves VacuumOption to vacuum.h since the
parser no longer needs that information.

If we take this direction I will change the parallel vacuum patch so
that it adds a new PARALLEL option and adds 'nworkers' to VacuumParams.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: vacuum-grammer.patch (16K)
Attachment: vacuum-options-into-params_v2.patch (27K)

Re: Block level parallel vacuum

Masahiko Sawada
In reply to this post by Masahiko Sawada
On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <[hidden email]> wrote:

>
> On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <[hidden email]> wrote:
> >
> > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <[hidden email]> wrote:
> >>
> >> Thank you. Attached the rebased patch.
> >
> >
> > I ran some performance tests to compare the parallelism benefits,
>
> Thank you for testing!
>
> > but I got some strange results of performance overhead, may be it is
> > because, I tested it on my laptop.
>
> Hmm, I think the parallel vacuum would help for heavy workloads like a
> big table with multiple indexes. In your test result, all executions
> are completed within 1 sec, which seems to be one use case that the
> parallel vacuum wouldn't help. I suspect that the table is small,
> right? Anyway I'll also do performance tests.
>
Here are the performance test results. I've set up a 500MB table with
several indexes and made 10% of the table dirty before each vacuum.
I compared the execution time of the patched postgres with the current HEAD
(see the 'speed_up' column). In my environment,

 indexes | parallel_degree |  patched   |    head    | speed_up
---------+-----------------+------------+------------+----------
      0 |               0 |   238.2085 |   244.7625 |   1.0275
      0 |               1 |   237.7050 |   244.7625 |   1.0297
      0 |               2 |   238.0390 |   244.7625 |   1.0282
      0 |               4 |   238.1045 |   244.7625 |   1.0280
      0 |               8 |   237.8995 |   244.7625 |   1.0288
      0 |              16 |   237.7775 |   244.7625 |   1.0294
      1 |               0 |  1328.8590 |  1334.9125 |   1.0046
      1 |               1 |  1325.9140 |  1334.9125 |   1.0068
      1 |               2 |  1333.3665 |  1334.9125 |   1.0012
      1 |               4 |  1329.5205 |  1334.9125 |   1.0041
      1 |               8 |  1334.2255 |  1334.9125 |   1.0005
      1 |              16 |  1335.1510 |  1334.9125 |   0.9998
      2 |               0 |  2426.2905 |  2427.5165 |   1.0005
      2 |               1 |  1416.0595 |  2427.5165 |   1.7143
      2 |               2 |  1411.6270 |  2427.5165 |   1.7197
      2 |               4 |  1411.6490 |  2427.5165 |   1.7196
      2 |               8 |  1410.1750 |  2427.5165 |   1.7214
      2 |              16 |  1413.4985 |  2427.5165 |   1.7174
      4 |               0 |  4622.5060 |  4619.0340 |   0.9992
      4 |               1 |  2536.8435 |  4619.0340 |   1.8208
      4 |               2 |  2548.3615 |  4619.0340 |   1.8126
      4 |               4 |  1467.9655 |  4619.0340 |   3.1466
      4 |               8 |  1486.3155 |  4619.0340 |   3.1077
      4 |              16 |  1481.7150 |  4619.0340 |   3.1174
      8 |               0 |  9039.3810 |  8990.4735 |   0.9946
      8 |               1 |  4807.5880 |  8990.4735 |   1.8701
      8 |               2 |  3786.7620 |  8990.4735 |   2.3742
      8 |               4 |  2924.2205 |  8990.4735 |   3.0745
      8 |               8 |  2684.2545 |  8990.4735 |   3.3493
      8 |              16 |  2672.9800 |  8990.4735 |   3.3635
     16 |               0 | 17821.4715 | 17740.1300 |   0.9954
     16 |               1 |  9318.3810 | 17740.1300 |   1.9038
     16 |               2 |  7260.6315 | 17740.1300 |   2.4433
     16 |               4 |  5538.5225 | 17740.1300 |   3.2030
     16 |               8 |  5368.5255 | 17740.1300 |   3.3045
     16 |              16 |  5291.8510 | 17740.1300 |   3.3523
(36 rows)

Attached are the updated version patches. The patches apply to the current
HEAD cleanly, but the 0001 patch still changes the vacuum options to a
Node since that is under discussion. After the direction has been
decided, I'll update the patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: v17-0001-Make-vacuum-options-a-Node.patch (39K)
Attachment: v17-0002-Add-parallel-option-to-VACUUM-command.patch (74K)
Attachment: v17-0003-Add-paralell-P-option-to-vacuumdb-command.patch (8K)

Re: Block level parallel vacuum

Kyotaro HORIGUCHI-2
Hello.

At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>

> Here is the performance test results. I've setup a 500MB table with
> several indexes and made 10% of table dirty before each vacuum.
> Compared execution time of the patched postgrse with the current HEAD
> (at 'speed_up' column). In my environment,
>
>  indexes | parallel_degree |  patched   |    head    | speed_up
> ---------+-----------------+------------+------------+----------
>       0 |               0 |   238.2085 |   244.7625 |   1.0275
>       0 |               1 |   237.7050 |   244.7625 |   1.0297
>       0 |               2 |   238.0390 |   244.7625 |   1.0282
>       0 |               4 |   238.1045 |   244.7625 |   1.0280
>       0 |               8 |   237.8995 |   244.7625 |   1.0288
>       0 |              16 |   237.7775 |   244.7625 |   1.0294
>       1 |               0 |  1328.8590 |  1334.9125 |   1.0046
>       1 |               1 |  1325.9140 |  1334.9125 |   1.0068
>       1 |               2 |  1333.3665 |  1334.9125 |   1.0012
>       1 |               4 |  1329.5205 |  1334.9125 |   1.0041
>       1 |               8 |  1334.2255 |  1334.9125 |   1.0005
>       1 |              16 |  1335.1510 |  1334.9125 |   0.9998
>       2 |               0 |  2426.2905 |  2427.5165 |   1.0005
>       2 |               1 |  1416.0595 |  2427.5165 |   1.7143
>       2 |               2 |  1411.6270 |  2427.5165 |   1.7197
>       2 |               4 |  1411.6490 |  2427.5165 |   1.7196
>       2 |               8 |  1410.1750 |  2427.5165 |   1.7214
>       2 |              16 |  1413.4985 |  2427.5165 |   1.7174
>       4 |               0 |  4622.5060 |  4619.0340 |   0.9992
>       4 |               1 |  2536.8435 |  4619.0340 |   1.8208
>       4 |               2 |  2548.3615 |  4619.0340 |   1.8126
>       4 |               4 |  1467.9655 |  4619.0340 |   3.1466
>       4 |               8 |  1486.3155 |  4619.0340 |   3.1077
>       4 |              16 |  1481.7150 |  4619.0340 |   3.1174
>       8 |               0 |  9039.3810 |  8990.4735 |   0.9946
>       8 |               1 |  4807.5880 |  8990.4735 |   1.8701
>       8 |               2 |  3786.7620 |  8990.4735 |   2.3742
>       8 |               4 |  2924.2205 |  8990.4735 |   3.0745
>       8 |               8 |  2684.2545 |  8990.4735 |   3.3493
>       8 |              16 |  2672.9800 |  8990.4735 |   3.3635
>      16 |               0 | 17821.4715 | 17740.1300 |   0.9954
>      16 |               1 |  9318.3810 | 17740.1300 |   1.9038
>      16 |               2 |  7260.6315 | 17740.1300 |   2.4433
>      16 |               4 |  5538.5225 | 17740.1300 |   3.2030
>      16 |               8 |  5368.5255 | 17740.1300 |   3.3045
>      16 |              16 |  5291.8510 | 17740.1300 |   3.3523
> (36 rows)

For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
almost the same. I suspect that the indexes are too small, all the
index pages were in memory, and the CPU was saturated. Maybe you
had four cores, and parallel workers beyond that number had no
effect. Other normal backends would have been able to do almost
nothing meanwhile. Usually the number of parallel workers is
determined so that I/O capacity is filled up, but this feature
intermittently saturates CPU capacity under such a situation.

I'm not sure, but what if we did index vacuum in a one-tuple-by-one
manner? That is, heap vacuum passes dead tuples one by one (or
buffers a few tuples) to workers, and workers process them not by
bulkdelete but by a plain tuple_delete (which we don't have). That could
avoid the heap scan sleeping while the index bulkdelete runs.


> Attached the updated version patches. The patches apply to the current
> HEAD cleanly but the 0001 patch still changes the vacuum option to a
> Node since it's under the discussion. After the direction has been
> decided, I'll update the patches.

As for the to-be-or-not-to-be-a-Node problem, I don't think it is
needed, but from the point of view of consistency it seems reasonable,
and it is seen in other nodes that a *Stmt Node holds an option Node. But
makeVacOpt, its usage, and the subsequent operations on the node
look somewhat strange.. Why don't you just do
"makeNode(VacuumOptions)"?


>+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
>+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);

If I understand this correctly, nindexes is always > 1 there. At
least it should be asserted to be > 0 there.

>+ estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),

I don't think the name is good. (dt meant 'detach' at first look for me..)

>+        if (lps->nworkers_requested > 0)
>+            appendStringInfo(&buf,
>+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",

"planned"?


>+        /* Get the next index to vacuum */
>+        if (do_parallel)
>+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
>+        else
>+            idx = nprocessed++;

It seems that both of the two cases could be handled using
LVParallelState, and most of the branches on lps or do_parallel
could be removed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Block level parallel vacuum

Robert Haas
In reply to this post by Masahiko Sawada
On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <[hidden email]> wrote:
> BTW your patch seems to not apply to the current HEAD cleanly and to
> need to update the comment of vacuum().

Yeah, I omitted some hunks by being stupid with 'git'.

Since you seem to like the approach, I put back the hunks I intended
to have there, pulled in one change from your v2 that looked good,
made one other tweak, and committed this.  I think I like what I did
with vacuum_open_relation a bit better than what you did; actually, I
think it cannot be right to just pass 'params' when the current code
is passing params->options & ~(VACOPT_VACUUM).  My approach avoids
that particular pitfall.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Block level parallel vacuum

Robert Haas
In reply to this post by Masahiko Sawada
On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <[hidden email]> wrote:
> Attached the updated patch you proposed and the patch that converts
> the grammer productions for the VACUUM option on top of the former
> patch. The latter patch moves VacuumOption to vacuum.h since the
> parser no longer needs such information.

Committed.

> If we take this direction I will change the parallel vacuum patch so
> that it adds new PARALLEL option and adds 'nworkers' to VacuumParams.

Sounds good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Block level parallel vacuum

Masahiko Sawada
In reply to this post by Robert Haas
On Tue, Mar 19, 2019 at 3:05 AM Robert Haas <[hidden email]> wrote:

>
> On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <[hidden email]> wrote:
> > BTW your patch seems to not apply to the current HEAD cleanly and to
> > need to update the comment of vacuum().
>
> Yeah, I omitted some hunks by being stupid with 'git'.
>
> Since you seem to like the approach, I put back the hunks I intended
> to have there, pulled in one change from your v2 that looked good,
> made one other tweak, and committed this.

Thank you!

>   I think I like what I did
> with vacuum_open_relation a bit better than what you did; actually, I
> think it cannot be right to just pass 'params' when the current code
> is passing params->options & ~(VACOPT_VACUUM).  My approach avoids
> that particular pitfall.

Agreed. Thanks.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Block level parallel vacuum

Haribabu Kommi-2
In reply to this post by Masahiko Sawada

On Mon, Mar 18, 2019 at 1:58 PM Masahiko Sawada <[hidden email]> wrote:
On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <[hidden email]> wrote:
>
> On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <[hidden email]> wrote:
> >
> > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <[hidden email]> wrote:
> >>
> >> Thank you. Attached the rebased patch.
> >
> >
> > I ran some performance tests to compare the parallelism benefits,
>
> Thank you for testing!
>
> > but I got some strange results of performance overhead, may be it is
> > because, I tested it on my laptop.
>
> Hmm, I think the parallel vacuum would help for heavy workloads like a
> big table with multiple indexes. In your test result, all executions
> are completed within 1 sec, which seems to be one use case that the
> parallel vacuum wouldn't help. I suspect that the table is small,
> right? Anyway I'll also do performance tests.
>

Here is the performance test results. I've setup a 500MB table with
several indexes and made 10% of table dirty before each vacuum.
Compared execution time of the patched postgrse with the current HEAD
(at 'speed_up' column). In my environment,

 indexes | parallel_degree |  patched   |    head    | speed_up
---------+-----------------+------------+------------+----------
      0 |               0 |   238.2085 |   244.7625 |   1.0275
      0 |               1 |   237.7050 |   244.7625 |   1.0297
      0 |               2 |   238.0390 |   244.7625 |   1.0282
      0 |               4 |   238.1045 |   244.7625 |   1.0280
      0 |               8 |   237.8995 |   244.7625 |   1.0288
      0 |              16 |   237.7775 |   244.7625 |   1.0294
      1 |               0 |  1328.8590 |  1334.9125 |   1.0046
      1 |               1 |  1325.9140 |  1334.9125 |   1.0068
      1 |               2 |  1333.3665 |  1334.9125 |   1.0012
      1 |               4 |  1329.5205 |  1334.9125 |   1.0041
      1 |               8 |  1334.2255 |  1334.9125 |   1.0005
      1 |              16 |  1335.1510 |  1334.9125 |   0.9998
      2 |               0 |  2426.2905 |  2427.5165 |   1.0005
      2 |               1 |  1416.0595 |  2427.5165 |   1.7143
      2 |               2 |  1411.6270 |  2427.5165 |   1.7197
      2 |               4 |  1411.6490 |  2427.5165 |   1.7196
      2 |               8 |  1410.1750 |  2427.5165 |   1.7214
      2 |              16 |  1413.4985 |  2427.5165 |   1.7174
      4 |               0 |  4622.5060 |  4619.0340 |   0.9992
      4 |               1 |  2536.8435 |  4619.0340 |   1.8208
      4 |               2 |  2548.3615 |  4619.0340 |   1.8126
      4 |               4 |  1467.9655 |  4619.0340 |   3.1466
      4 |               8 |  1486.3155 |  4619.0340 |   3.1077
      4 |              16 |  1481.7150 |  4619.0340 |   3.1174
      8 |               0 |  9039.3810 |  8990.4735 |   0.9946
      8 |               1 |  4807.5880 |  8990.4735 |   1.8701
      8 |               2 |  3786.7620 |  8990.4735 |   2.3742
      8 |               4 |  2924.2205 |  8990.4735 |   3.0745
      8 |               8 |  2684.2545 |  8990.4735 |   3.3493
      8 |              16 |  2672.9800 |  8990.4735 |   3.3635
     16 |               0 | 17821.4715 | 17740.1300 |   0.9954
     16 |               1 |  9318.3810 | 17740.1300 |   1.9038
     16 |               2 |  7260.6315 | 17740.1300 |   2.4433
     16 |               4 |  5538.5225 | 17740.1300 |   3.2030
     16 |               8 |  5368.5255 | 17740.1300 |   3.3045
     16 |              16 |  5291.8510 | 17740.1300 |   3.3523
(36 rows)

The performance results are good. Do we want to add a recommended
size in the documentation for the parallel option? The parallel option for smaller
tables can lead to performance overhead.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: Block level parallel vacuum

Masahiko Sawada
In reply to this post by Kyotaro HORIGUCHI-2
On Mon, Mar 18, 2019 at 7:06 PM Kyotaro HORIGUCHI
<[hidden email]> wrote:

>
> Hello.
>
> At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>
> > Here is the performance test results. I've setup a 500MB table with
> > several indexes and made 10% of table dirty before each vacuum.
> > Compared execution time of the patched postgrse with the current HEAD
> > (at 'speed_up' column). In my environment,
> >
> >  indexes | parallel_degree |  patched   |    head    | speed_up
> > ---------+-----------------+------------+------------+----------
> >       0 |               0 |   238.2085 |   244.7625 |   1.0275
> >       0 |               1 |   237.7050 |   244.7625 |   1.0297
> >       0 |               2 |   238.0390 |   244.7625 |   1.0282
> >       0 |               4 |   238.1045 |   244.7625 |   1.0280
> >       0 |               8 |   237.8995 |   244.7625 |   1.0288
> >       0 |              16 |   237.7775 |   244.7625 |   1.0294
> >       1 |               0 |  1328.8590 |  1334.9125 |   1.0046
> >       1 |               1 |  1325.9140 |  1334.9125 |   1.0068
> >       1 |               2 |  1333.3665 |  1334.9125 |   1.0012
> >       1 |               4 |  1329.5205 |  1334.9125 |   1.0041
> >       1 |               8 |  1334.2255 |  1334.9125 |   1.0005
> >       1 |              16 |  1335.1510 |  1334.9125 |   0.9998
> >       2 |               0 |  2426.2905 |  2427.5165 |   1.0005
> >       2 |               1 |  1416.0595 |  2427.5165 |   1.7143
> >       2 |               2 |  1411.6270 |  2427.5165 |   1.7197
> >       2 |               4 |  1411.6490 |  2427.5165 |   1.7196
> >       2 |               8 |  1410.1750 |  2427.5165 |   1.7214
> >       2 |              16 |  1413.4985 |  2427.5165 |   1.7174
> >       4 |               0 |  4622.5060 |  4619.0340 |   0.9992
> >       4 |               1 |  2536.8435 |  4619.0340 |   1.8208
> >       4 |               2 |  2548.3615 |  4619.0340 |   1.8126
> >       4 |               4 |  1467.9655 |  4619.0340 |   3.1466
> >       4 |               8 |  1486.3155 |  4619.0340 |   3.1077
> >       4 |              16 |  1481.7150 |  4619.0340 |   3.1174
> >       8 |               0 |  9039.3810 |  8990.4735 |   0.9946
> >       8 |               1 |  4807.5880 |  8990.4735 |   1.8701
> >       8 |               2 |  3786.7620 |  8990.4735 |   2.3742
> >       8 |               4 |  2924.2205 |  8990.4735 |   3.0745
> >       8 |               8 |  2684.2545 |  8990.4735 |   3.3493
> >       8 |              16 |  2672.9800 |  8990.4735 |   3.3635
> >      16 |               0 | 17821.4715 | 17740.1300 |   0.9954
> >      16 |               1 |  9318.3810 | 17740.1300 |   1.9038
> >      16 |               2 |  7260.6315 | 17740.1300 |   2.4433
> >      16 |               4 |  5538.5225 | 17740.1300 |   3.2030
> >      16 |               8 |  5368.5255 | 17740.1300 |   3.3045
> >      16 |              16 |  5291.8510 | 17740.1300 |   3.3523
> > (36 rows)
>
> For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> almost the same. I suspect that the indexes are too-small and all
> the index pages were on memory and CPU is saturated. Maybe you
> had four cores and parallel workers more than the number had no
> effect.  Other normal backends should have been able do almost
> nothing meanwhile. Usually the number of parallel workers is
> determined so that IO capacity is filled up but this feature
> intermittently saturates CPU capacity very under such a
> situation.
>

I'm sorry I didn't make it clear enough. If the parallel degree is
higher than 'the number of indexes - 1', redundant workers are not
launched. So for indexes=4, 8, and 16 the number of actually launched
parallel workers is up to 3, 7, and 15 respectively. That's why the results
show almost the same execution time in the cases where nindexes <=
parallel_degree.
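
As a minimal sketch of that capping logic (the function and variable names
here are illustrative and may differ from the actual patch):

/*
 * Sketch: decide how many parallel workers to request.  The leader also
 * vacuums indexes, so at most (nindexes - 1) workers are useful, and the
 * result is further capped by max_parallel_maintenance_workers.
 */
static int
compute_parallel_workers(int nindexes, int nworkers_requested)
{
    int         nworkers;

    /* parallelism isn't useful with fewer than two indexes */
    if (nindexes < 2)
        return 0;

    nworkers = Min(nworkers_requested, nindexes - 1);
    nworkers = Min(nworkers, max_parallel_maintenance_workers);

    return nworkers;
}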

I'll share the performance test results for larger tables and indexes.

> I'm not sure, but what if we do index vacuum in one-tuple-by-one
> manner? That is, heap vacuum passes dead tuple one-by-one (or
> buffering few tuples) to workers and workers process it not by
> bulkdelete, but just tuple_delete (we don't have one). That could
> avoid the sleep time of heap-scan while index bulkdelete.
>

Just to be clear, in parallel lazy vacuum all parallel vacuum
processes, including the leader process, do index vacuuming; no one
sleeps during index vacuuming. The leader process does the heap
scan and launches parallel workers before index vacuuming. Each
process exclusively processes indexes one by one.
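
To illustrate how each process exclusively claims indexes, here is a sketch
built around the nprocessed counter quoted above (the surrounding names,
including vacuum_one_index, are placeholders rather than the patch's actual
code):

/*
 * Sketch: every participant, leader or worker, atomically claims the next
 * unprocessed index number from shared memory, so each index is vacuumed
 * by exactly one process and nobody sits idle while work remains.
 */
for (;;)
{
    int         idx;

    idx = pg_atomic_fetch_add_u32(&lvshared->nprocessed, 1);
    if (idx >= nindexes)
        break;                  /* all indexes have been claimed */

    /* bulkdelete or cleanup the claimed index (placeholder helper) */
    vacuum_one_index(Irel[idx]);
}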

Such an index deletion method could be an optimization, but I'm not sure
that calling tuple_delete many times would be faster than one
bulkdelete. If there are many dead tuples, vacuum has to call
tuple_delete as many times as there are dead tuples. In general one seqscan
is faster than tons of index scans. There is a proposal for such one-by-one
index deletions[1], but it's not a replacement for bulkdelete.

>
> > Attached the updated version patches. The patches apply to the current
> > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > Node since it's under the discussion. After the direction has been
> > decided, I'll update the patches.
>
> As for the to-be-or-not-to-be a node problem, I don't think it is
> needed but from the point of consistency, it seems reasonable and
> it is seen in other nodes that *Stmt Node holds option Node. But
> makeVacOpt and it's usage, and subsequent operations on the node
> look somewhat strange.. Why don't you just do
> "makeNode(VacuumOptions)"?

Thank you for the comment, but this part has gone away as the recent
commit changed the grammar production of the VACUUM command.

>
>
> >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
>
> If I understand this correctly, nindexes is always > 1 there. At
> lesat asserted that > 0 there.
>
> >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
>
> I don't think the name is good. (dt menant detach by the first look for me..)

Fixed.

>
> >+        if (lps->nworkers_requested > 0)
> >+            appendStringInfo(&buf,
> >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
>
> "planned"?

The 'planned' shows how many parallel workers we planned to launch.
The degree of parallelism is determined based on either user request
or the number of indexes that the table has.

>
>
> >+        /* Get the next index to vacuum */
> >+        if (do_parallel)
> >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> >+        else
> >+            idx = nprocessed++;
>
> It seems that both of the two cases can be handled using
> LVParallelState and most of the branches by lps or do_parallel
> can be removed.
>

Sorry, I couldn't get your comment. Did you mean to move nprocessed to
LVParallelState?

[1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Block level parallel vacuum

Kyotaro HORIGUCHI-2
At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>

> > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > almost the same. I suspect that the indexes are too-small and all
> > the index pages were on memory and CPU is saturated. Maybe you
> > had four cores and parallel workers more than the number had no
> > effect.  Other normal backends should have been able do almost
> > nothing meanwhile. Usually the number of parallel workers is
> > determined so that IO capacity is filled up but this feature
> > intermittently saturates CPU capacity very under such a
> > situation.
> >
>
> I'm sorry I didn't make it clear enough. If the parallel degree is
> higher than 'the number of indexes - 1' redundant workers are not
> launched. So for indexes=4, 8, 16 the number of actually launched
> parallel workers is up to 3, 7, 15 respectively. That's why the result
> shows almost the same execution time in the cases where nindexes <=
> parallel_degree.

In the 16 indexes case, the performance saturates at 4 workers,
which contradicts your explanation.

> I'll share the performance test result of more larger tables and indexes.
>
> > I'm not sure, but what if we do index vacuum in one-tuple-by-one
> > manner? That is, heap vacuum passes dead tuple one-by-one (or
> > buffering few tuples) to workers and workers process it not by
> > bulkdelete, but just tuple_delete (we don't have one). That could
> > avoid the sleep time of heap-scan while index bulkdelete.
> >
>
> Just to be clear, in parallel lazy vacuum all parallel vacuum
> processes including the leader process do index vacuuming, no one
> doesn't sleep during index vacuuming. The leader process does heap
> scan and launches parallel workers before index vacuuming. Each
> processes exclusively processes indexes one by one.

The leader doesn't continue the heap scan while index vacuuming is
running. And the index page scan seems to eat up CPU easily. If
index vacuum could run simultaneously with the next heap scan
phase, we could make the index scan finish at almost the same time as
the next round of heap scan. It would reduce the (possible) CPU
contention. But this requires twice as much shared
memory as the current implementation.

> Such index deletion method could be an optimization but I'm not sure
> that the calling tuple_delete many times would be faster than one
> bulkdelete. If there are many dead tuples vacuum has to call
> tuple_delete as much as dead tuples. In general one seqscan is faster
> than tons of indexscan. There is the proposal for such one by one
> index deletions[1] but it's not a replacement of bulkdelete.

I'm not sure what you mean by 'replacement', but it depends on how
large a part of the table is removed at once, as mentioned in the
thread. But unfortunately it doesn't seem easy to do..

> > > Attached the updated version patches. The patches apply to the current
> > > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > > Node since it's under the discussion. After the direction has been
> > > decided, I'll update the patches.
> >
> > As for the to-be-or-not-to-be a node problem, I don't think it is
> > needed but from the point of consistency, it seems reasonable and
> > it is seen in other nodes that *Stmt Node holds option Node. But
> > makeVacOpt and it's usage, and subsequent operations on the node
> > look somewhat strange.. Why don't you just do
> > "makeNode(VacuumOptions)"?
>
> Thank you for the comment but this part has gone away as the recent
> commit changed the grammar production of vacuum command.

Oops!


> > >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> > >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
> >
> > If I understand this correctly, nindexes is always > 1 there. At
> > lesat asserted that > 0 there.
> >
> > >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
> >
> > I don't think the name is good. (dt menant detach by the first look for me..)
>
> Fixed.
>
> >
> > >+        if (lps->nworkers_requested > 0)
> > >+            appendStringInfo(&buf,
> > >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
> >
> > "planned"?
>
> The 'planned' shows how many parallel workers we planned to launch.
> The degree of parallelism is determined based on either user request
> or the number of indexes that the table has.
>
> >
> >
> > >+        /* Get the next index to vacuum */
> > >+        if (do_parallel)
> > >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> > >+        else
> > >+            idx = nprocessed++;
> >
> > It seems that both of the two cases can be handled using
> > LVParallelState and most of the branches by lps or do_parallel
> > can be removed.
> >
>
> Sorry I couldn't get your comment. You meant to move nprocessed to
> LVParallelState?

Exactly. I meant letting lvshared point to private memory, but
it might introduce confusion.


> [1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Block level parallel vacuum

Masahiko Sawada
In reply to this post by Haribabu Kommi-2
On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
<[hidden email]> wrote:

>
>
> On Mon, Mar 18, 2019 at 1:58 PM Masahiko Sawada <[hidden email]> wrote:
>>
>> On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <[hidden email]> wrote:
>> >
>> > On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <[hidden email]> wrote:
>> > >
>> > > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <[hidden email]> wrote:
>> > >>
>> > >> Thank you. Attached the rebased patch.
>> > >
>> > >
>> > > I ran some performance tests to compare the parallelism benefits,
>> >
>> > Thank you for testing!
>> >
>> > > but I got some strange results of performance overhead, may be it is
>> > > because, I tested it on my laptop.
>> >
>> > Hmm, I think the parallel vacuum would help for heavy workloads like a
>> > big table with multiple indexes. In your test result, all executions
>> > are completed within 1 sec, which seems to be one use case that the
>> > parallel vacuum wouldn't help. I suspect that the table is small,
>> > right? Anyway I'll also do performance tests.
>> >
>>
>> Here is the performance test results. I've setup a 500MB table with
>> several indexes and made 10% of table dirty before each vacuum.
>> Compared execution time of the patched postgrse with the current HEAD
>> (at 'speed_up' column). In my environment,
>>
>>  indexes | parallel_degree |  patched   |    head    | speed_up
>> ---------+-----------------+------------+------------+----------
>>       0 |               0 |   238.2085 |   244.7625 |   1.0275
>>       0 |               1 |   237.7050 |   244.7625 |   1.0297
>>       0 |               2 |   238.0390 |   244.7625 |   1.0282
>>       0 |               4 |   238.1045 |   244.7625 |   1.0280
>>       0 |               8 |   237.8995 |   244.7625 |   1.0288
>>       0 |              16 |   237.7775 |   244.7625 |   1.0294
>>       1 |               0 |  1328.8590 |  1334.9125 |   1.0046
>>       1 |               1 |  1325.9140 |  1334.9125 |   1.0068
>>       1 |               2 |  1333.3665 |  1334.9125 |   1.0012
>>       1 |               4 |  1329.5205 |  1334.9125 |   1.0041
>>       1 |               8 |  1334.2255 |  1334.9125 |   1.0005
>>       1 |              16 |  1335.1510 |  1334.9125 |   0.9998
>>       2 |               0 |  2426.2905 |  2427.5165 |   1.0005
>>       2 |               1 |  1416.0595 |  2427.5165 |   1.7143
>>       2 |               2 |  1411.6270 |  2427.5165 |   1.7197
>>       2 |               4 |  1411.6490 |  2427.5165 |   1.7196
>>       2 |               8 |  1410.1750 |  2427.5165 |   1.7214
>>       2 |              16 |  1413.4985 |  2427.5165 |   1.7174
>>       4 |               0 |  4622.5060 |  4619.0340 |   0.9992
>>       4 |               1 |  2536.8435 |  4619.0340 |   1.8208
>>       4 |               2 |  2548.3615 |  4619.0340 |   1.8126
>>       4 |               4 |  1467.9655 |  4619.0340 |   3.1466
>>       4 |               8 |  1486.3155 |  4619.0340 |   3.1077
>>       4 |              16 |  1481.7150 |  4619.0340 |   3.1174
>>       8 |               0 |  9039.3810 |  8990.4735 |   0.9946
>>       8 |               1 |  4807.5880 |  8990.4735 |   1.8701
>>       8 |               2 |  3786.7620 |  8990.4735 |   2.3742
>>       8 |               4 |  2924.2205 |  8990.4735 |   3.0745
>>       8 |               8 |  2684.2545 |  8990.4735 |   3.3493
>>       8 |              16 |  2672.9800 |  8990.4735 |   3.3635
>>      16 |               0 | 17821.4715 | 17740.1300 |   0.9954
>>      16 |               1 |  9318.3810 | 17740.1300 |   1.9038
>>      16 |               2 |  7260.6315 | 17740.1300 |   2.4433
>>      16 |               4 |  5538.5225 | 17740.1300 |   3.2030
>>      16 |               8 |  5368.5255 | 17740.1300 |   3.3045
>>      16 |              16 |  5291.8510 | 17740.1300 |   3.3523
>> (36 rows)
>
>
> The performance results are good. Do we want to add the recommended
> size in the document for the parallel option? the parallel option for smaller
> tables can lead to performance overhead.
>

Hmm, I don't think we can add a specific recommended size because
the performance gain from parallel lazy vacuum depends on various things
such as CPU cores, the number of indexes, shared buffer size, index
types, and HDD or SSD. I suppose that users who want to use this option
have some sort of performance problem, such as vacuum taking a very
long time. They would use it for relatively large tables.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Block level parallel vacuum

Masahiko Sawada
In reply to this post by Kyotaro HORIGUCHI-2
On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
<[hidden email]> wrote:

>
> At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>
> > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > almost the same. I suspect that the indexes are too-small and all
> > > the index pages were on memory and CPU is saturated. Maybe you
> > > had four cores and parallel workers more than the number had no
> > > effect.  Other normal backends should have been able do almost
> > > nothing meanwhile. Usually the number of parallel workers is
> > > determined so that IO capacity is filled up but this feature
> > > intermittently saturates CPU capacity very under such a
> > > situation.
> > >
> >
> > I'm sorry I didn't make it clear enough. If the parallel degree is
> > higher than 'the number of indexes - 1' redundant workers are not
> > launched. So for indexes=4, 8, 16 the number of actually launched
> > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > shows almost the same execution time in the cases where nindexes <=
> > parallel_degree.
>
> In the 16 indexes case, the performance saturated at 4 workers
> which contradicts to your explanation.

Because the machine I used has 4 cores, the performance doesn't
improve even if more than 4 parallel workers are launched.

>
> > I'll share the performance test result of more larger tables and indexes.
> >
> > > I'm not sure, but what if we do index vacuum in one-tuple-by-one
> > > manner? That is, heap vacuum passes dead tuple one-by-one (or
> > > buffering few tuples) to workers and workers process it not by
> > > bulkdelete, but just tuple_delete (we don't have one). That could
> > > avoid the sleep time of heap-scan while index bulkdelete.
> > >
> >
> > Just to be clear, in parallel lazy vacuum all parallel vacuum
> > processes including the leader process do index vacuuming, no one
> > doesn't sleep during index vacuuming. The leader process does heap
> > scan and launches parallel workers before index vacuuming. Each
> > processes exclusively processes indexes one by one.
>
> The leader doesn't continue heap-scan while index vacuuming is
> running. And the index-page-scan seems eat up CPU easily. If
> index vacuum can run simultaneously with the next heap scan
> phase, we can make index scan finishes almost the same time with
> the next round of heap scan. It would reduce the (possible) CPU
> contention. But this requires as the twice size of shared
> memoryas the current implement.

Yeah, I've considered something like a pipelining approach in which
one process continues to queue the dead tuples and another process
fetches and processes them during index vacuuming, but the current
version of the patch employs the simplest approach as the first step.
Once we have the retail index deletion approach we might be able to use
it for parallel vacuum.

>
> > Such index deletion method could be an optimization but I'm not sure
> > that the calling tuple_delete many times would be faster than one
> > bulkdelete. If there are many dead tuples vacuum has to call
> > tuple_delete as much as dead tuples. In general one seqscan is faster
> > than tons of indexscan. There is the proposal for such one by one
> > index deletions[1] but it's not a replacement of bulkdelete.
>
> I'm not sure what you mean by 'replacement' but it depends on how
> large part of a table is removed at once. As mentioned in the
> thread. But unfortunately it doesn't seem easy to do..
>
> > > > Attached the updated version patches. The patches apply to the current
> > > > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > > > Node since it's under the discussion. After the direction has been
> > > > decided, I'll update the patches.
> > >
> > > As for the to-be-or-not-to-be a node problem, I don't think it is
> > > needed but from the point of consistency, it seems reasonable and
> > > it is seen in other nodes that *Stmt Node holds option Node. But
> > > makeVacOpt and it's usage, and subsequent operations on the node
> > > look somewhat strange.. Why don't you just do
> > > "makeNode(VacuumOptions)"?
> >
> > Thank you for the comment but this part has gone away as the recent
> > commit changed the grammar production of vacuum command.
>
> Oops!
>
>
> > > >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> > > >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
> > >
> > > If I understand this correctly, nindexes is always > 1 there. At
> > > lesat asserted that > 0 there.
> > >
> > > >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
> > >
> > > I don't think the name is good. (dt menant detach by the first look for me..)
> >
> > Fixed.
> >
> > >
> > > >+        if (lps->nworkers_requested > 0)
> > > >+            appendStringInfo(&buf,
> > > >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
> > >
> > > "planned"?
> >
> > The 'planned' shows how many parallel workers we planned to launch.
> > The degree of parallelism is determined based on either user request
> > or the number of indexes that the table has.
> >
> > >
> > >
> > > >+        /* Get the next index to vacuum */
> > > >+        if (do_parallel)
> > > >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> > > >+        else
> > > >+            idx = nprocessed++;
> > >
> > > It seems that both of the two cases can be handled using
> > > LVParallelState and most of the branches by lps or do_parallel
> > > can be removed.
> > >
> >
> > Sorry I couldn't get your comment. You meant to move nprocessed to
> > LVParallelState?
>
> Exactly. I meant letting lvshared points to private memory, but
> it might introduce confusion.

Hmm, I'm not sure it would be a good idea. It would introduce
confusion, as you mentioned. And since 'nprocessed' has to be a
pg_atomic_uint32 in parallel mode, we would end up having
another branch anyway.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Block level parallel vacuum

Kyotaro HORIGUCHI-2
At Tue, 19 Mar 2019 19:01:06 +0900, Masahiko Sawada <[hidden email]> wrote in <CAD21AoA3PpkcNNzcQmiNgFL3DudhdLRWoTvQE6=[hidden email]>

> On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
> <[hidden email]> wrote:
> >
> > At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>
> > > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > > almost the same. I suspect that the indexes are too-small and all
> > > > the index pages were on memory and CPU is saturated. Maybe you
> > > > had four cores and parallel workers more than the number had no
> > > > effect.  Other normal backends should have been able do almost
> > > > nothing meanwhile. Usually the number of parallel workers is
> > > > determined so that IO capacity is filled up but this feature
> > > > intermittently saturates CPU capacity very under such a
> > > > situation.
> > > >
> > >
> > > I'm sorry I didn't make it clear enough. If the parallel degree is
> > > higher than 'the number of indexes - 1' redundant workers are not
> > > launched. So for indexes=4, 8, 16 the number of actually launched
> > > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > > shows almost the same execution time in the cases where nindexes <=
> > > parallel_degree.
> >
> > In the 16 indexes case, the performance saturated at 4 workers
> > which contradicts to your explanation.
>
> Because the machine I used has 4 cores the performance doesn't get
> improved even if more than 4 parallel workers are launched.

That is what I mentioned in the cited text. Sorry for the perhaps
hard-to-read phrasing..

> >
> > > I'll share the performance test result of more larger tables and indexes.
> > >
> > > > I'm not sure, but what if we do index vacuum in one-tuple-by-one
> > > > manner? That is, heap vacuum passes dead tuple one-by-one (or
> > > > buffering few tuples) to workers and workers process it not by
> > > > bulkdelete, but just tuple_delete (we don't have one). That could
> > > > avoid the sleep time of heap-scan while index bulkdelete.
> > > >
> > >
> > > Just to be clear, in parallel lazy vacuum all parallel vacuum
> > > processes including the leader process do index vacuuming, no one
> > > doesn't sleep during index vacuuming. The leader process does heap
> > > scan and launches parallel workers before index vacuuming. Each
> > > processes exclusively processes indexes one by one.
> >
> > The leader doesn't continue heap-scan while index vacuuming is
> > running. And the index-page-scan seems eat up CPU easily. If
> > index vacuum can run simultaneously with the next heap scan
> > phase, we can make index scan finishes almost the same time with
> > the next round of heap scan. It would reduce the (possible) CPU
> > contention. But this requires as the twice size of shared
> > memoryas the current implement.
>
> Yeah, I've considered that something like pipe-lining approach that
> one process continue to queue the dead tuples and other process
> fetches and processes them during index vacuuming but the current
> version patch employed the most simple approach as the first step.
> Once we had the retail index deletion approach we might be able to use
> it for parallel vacuum.

Ok, I understood the direction.

...

> > > Sorry I couldn't get your comment. You meant to move nprocessed to
> > > LVParallelState?
> >
> > Exactly. I meant letting lvshared points to private memory, but
> > it might introduce confusion.
>
> Hmm, I'm not sure it would be a good idea. It would introduce
> confusion as you mentioned. And since 'nprocessed' have to be
> pg_atomic_uint32 in parallel mode we will end up with having an
> another branch.

Ok. Agreed. Thank you for your patience.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Block level parallel vacuum

Kyotaro HORIGUCHI-2
In reply to this post by Masahiko Sawada
At Tue, 19 Mar 2019 17:51:32 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>

> On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
> <[hidden email]> wrote:
> > The performance results are good. Do we want to add the recommended
> > size in the document for the parallel option? the parallel option for smaller
> > tables can lead to performance overhead.
> >
>
> Hmm, I don't think we can add the specific recommended size because
> the performance gain by parallel lazy vacuum depends on various things
> such as CPU cores, the number of indexes, shared buffer size, index
> types, HDD or SSD. I suppose that users who want to use this option
> have some sort of performance problem such as that vacuum takes a very
> long time. They would use it for relatively larger tables.

I agree that we have no recommended setting, but I strongly think that documentation on the downsides or possible side effects of this feature is required for those who are going to use it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Block level parallel vacuum

Masahiko Sawada
In reply to this post by Kyotaro HORIGUCHI-2
On Tue, Mar 19, 2019 at 7:15 PM Kyotaro HORIGUCHI
<[hidden email]> wrote:

>
> At Tue, 19 Mar 2019 19:01:06 +0900, Masahiko Sawada <[hidden email]> wrote in <CAD21AoA3PpkcNNzcQmiNgFL3DudhdLRWoTvQE6=[hidden email]>
> > On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
> > <[hidden email]> wrote:
> > >
> > > At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>
> > > > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > > > almost the same. I suspect that the indexes are too-small and all
> > > > > the index pages were on memory and CPU is saturated. Maybe you
> > > > > had four cores and parallel workers more than the number had no
> > > > > effect.  Other normal backends should have been able do almost
> > > > > nothing meanwhile. Usually the number of parallel workers is
> > > > > determined so that IO capacity is filled up but this feature
> > > > > intermittently saturates CPU capacity very under such a
> > > > > situation.
> > > > >
> > > >
> > > > I'm sorry I didn't make it clear enough. If the parallel degree is
> > > > higher than 'the number of indexes - 1' redundant workers are not
> > > > launched. So for indexes=4, 8, 16 the number of actually launched
> > > > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > > > shows almost the same execution time in the cases where nindexes <=
> > > > parallel_degree.
> > >
> > > In the 16 indexes case, the performance saturated at 4 workers
> > > which contradicts to your explanation.
> >
> > Because the machine I used has 4 cores the performance doesn't get
> > improved even if more than 4 parallel workers are launched.
>
> That is what I mentioned in the cited phrases. Sorry for perhaps
> hard-to-read phrases..

I understood now. Thank you!


Attached are the updated version patches, incorporating all review comments.

Commit 6776142 changed the grammar production of the VACUUM command. This
patch adds the PARALLEL option on top of that commit.

I realized that commit 6776142 breaks the indentation in ExecVacuum() and
that including nodes/parsenodes.h is no longer needed. Sorry, that's my
fault. The attached patch (vacuum_fix.patch) fixes them, although the
indentation issue will be resolved by pgindent before release.

When parsing the VACUUM command, since only the PARALLEL option can have an
argument, I've added a check in ExecVacuum to error out when other
options have an argument. But it might be good to make the other vacuum
options (perhaps except for the DISABLE_PAGE_SKIPPING option) accept an
argument, just like the EXPLAIN command.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: vacuum_fix.patch (1K)
Attachment: v18-0001-Add-parallel-option-to-VACUUM-command.patch (74K)
Attachment: v18-0002-Add-paralell-P-option-to-vacuumdb-command.patch (8K)

Re: Block level parallel vacuum

Masahiko Sawada
In reply to this post by Kyotaro HORIGUCHI-2
On Tue, Mar 19, 2019 at 7:29 PM Kyotaro HORIGUCHI
<[hidden email]> wrote:

>
> At Tue, 19 Mar 2019 17:51:32 +0900, Masahiko Sawada <[hidden email]> wrote in <[hidden email]>
> > On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
> > <[hidden email]> wrote:
> > > The performance results are good. Do we want to add the recommended
> > > size in the document for the parallel option? the parallel option for smaller
> > > tables can lead to performance overhead.
> > >
> >
> > Hmm, I don't think we can add the specific recommended size because
> > the performance gain by parallel lazy vacuum depends on various things
> > such as CPU cores, the number of indexes, shared buffer size, index
> > types, HDD or SSD. I suppose that users who want to use this option
> > have some sort of performance problem such as that vacuum takes a very
> > long time. They would use it for relatively larger tables.
>
> Agree that we have no recommended setting, but I strongly think that documentation on the downside or possible side effect of this feature is required for those who are to use the feature.
>

I think that the side effect of parallel lazy vacuum would be
consuming more CPU and I/O bandwidth, but that is also true for
other utility commands (e.g. parallel CREATE INDEX). The description of
max_parallel_maintenance_workers documents such things[1]. Anything
else to document?

[1] https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Block level parallel vacuum

Sergei Kornilov
In reply to this post by Masahiko Sawada
Hello

> * in_parallel is true if we're performing parallel lazy vacuum. Since any
> * updates are not allowed during parallel mode we don't update statistics
> * but set the index bulk-deletion result to *stats. Otherwise we update it
> * and set NULL.

lazy_cleanup_index has the in_parallel argument only for this purpose, but the caller still has to check in_parallel after the lazy_cleanup_index call and do something else with stats for parallel execution.
Would it be better to always return stats and update the statistics in the caller? It would be possible to update all index stats in lazy_vacuum_all_indexes, for example. This routine is always the parallel leader and has the comment /* Do post-vacuum cleanup and statistics update for each index */ on the for_cleanup=true call.

I think we need a note in the documentation that the parallel leader is not counted in the PARALLEL N option, so with PARALLEL 2 we would use 3 processes. Or should we even change the behavior, so that with PARALLEL 1 only the current backend runs in a single process, and with PARALLEL 2 the leader plus one parallel worker, i.e. two processes, work in parallel?

regards, Sergei


Re: Block level parallel vacuum

Robert Haas
In reply to this post by Kyotaro HORIGUCHI-2
On Tue, Mar 19, 2019 at 3:59 AM Kyotaro HORIGUCHI
<[hidden email]> wrote:
> The leader doesn't continue heap-scan while index vacuuming is
> running. And the index-page-scan seems eat up CPU easily. If
> index vacuum can run simultaneously with the next heap scan
> phase, we can make index scan finishes almost the same time with
> the next round of heap scan. It would reduce the (possible) CPU
> contention. But this requires as the twice size of shared
> memoryas the current implement.

I think you're approaching this from the wrong point of view.  If we
have a certain amount of memory available, is it better to (a) fill
the entire thing with dead tuples once, or (b) better to fill half of
it with dead tuples, start index vacuuming, and then fill the other
half of it with dead tuples for the next index-vacuum cycle while the
current one is running?  I think the answer is that (a) is clearly
better, because it results in half as many index vacuum cycles.
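(To put illustrative numbers on it: if the available memory can hold 10
million dead-tuple TIDs and the heap produces 100 million of them, (a)
needs 10 index-vacuum passes over every index while (b) needs 20.)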

We can't really ask the user how much memory it's OK to use and then
use twice as much.  But if we could, what you're proposing here is
probably still not the right way to use it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Block level parallel vacuum

Robert Haas
In reply to this post by Masahiko Sawada
On Tue, Mar 19, 2019 at 7:26 AM Masahiko Sawada <[hidden email]> wrote:
> In parsing vacuum command, since only PARALLEL option can have an
> argument I've added the check in ExecVacuum to erroring out when other
> options have an argument. But it might be good to make other vacuum
> options (perhaps except for DISABLE_PAGE_SKIPPING option) accept an
> argument just like EXPLAIN command.

I think all of the existing options, including DISABLE_PAGE_SKIPPING,
should permit an argument that is passed to defGetBoolean().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Block level parallel vacuum

Masahiko Sawada
On Fri, Mar 22, 2019 at 4:53 AM Robert Haas <[hidden email]> wrote:

>
> On Tue, Mar 19, 2019 at 7:26 AM Masahiko Sawada <[hidden email]> wrote:
> > In parsing vacuum command, since only PARALLEL option can have an
> > argument I've added the check in ExecVacuum to erroring out when other
> > options have an argument. But it might be good to make other vacuum
> > options (perhaps except for DISABLE_PAGE_SKIPPING option) accept an
> > argument just like EXPLAIN command.
>
> I think all of the existing options, including DISABLE_PAGE_SKIPPING,
> should permit an argument that is passed to defGetBoolean().
>
Agreed. The attached 0001 patch changes it accordingly.
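
For reference, here is a minimal sketch of how the per-option argument could
be interpreted in ExecVacuum() (params.nworkers follows the earlier
discussion; the exact code is an assumption, not the attached patch):

/*
 * Sketch: boolean options accept an optional argument via defGetBoolean(),
 * so VERBOSE, VERBOSE true and VERBOSE false all work, while PARALLEL takes
 * an optional integer worker count.
 */
ListCell   *lc;

foreach(lc, vacstmt->options)
{
    DefElem    *opt = (DefElem *) lfirst(lc);

    if (strcmp(opt->defname, "verbose") == 0)
    {
        if (defGetBoolean(opt))
            params.options |= VACOPT_VERBOSE;
    }
    else if (strcmp(opt->defname, "parallel") == 0)
    {
        if (opt->arg == NULL)
            params.nworkers = 0;    /* assumed: let VACUUM choose the degree */
        else
            params.nworkers = defGetInt32(opt);
    }
    /* ... remaining options handled like "verbose" ... */
}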

On Thu, Mar 21, 2019 at 8:05 PM Sergei Kornilov <[hidden email]> wrote:
>
> Hello
>

Thank you for reviewing the patch!

> > * in_parallel is true if we're performing parallel lazy vacuum. Since any
> > * updates are not allowed during parallel mode we don't update statistics
> > * but set the index bulk-deletion result to *stats. Otherwise we update it
> > * and set NULL.
>
> lazy_cleanup_index has in_parallel argument only for this purpose, but caller still should check in_parallel after lazy_cleanup_index call and do something else with stats for parallel execution.
> Would be better always return stats and update statistics in caller? It's possible to update all index stats in lazy_vacuum_all_indexes for example? This routine is always parallel leader and has comment /* Do post-vacuum cleanup and statistics update for each index */ on for_cleanup=true call.

Agreed. I've changed the patch so that we update index statistics in
lazy_vacuum_all_indexes().

>
> I think we need note in documentation that parallel leader is not counted in PARALLEL N option, so with PARALLEL 2 option we want use 3 processes. Or even change behavior? Default with PARALLEL 1 - only current backend in single process is running, PARALLEL 2 - leader + one parallel worker, two processes works in parallel.
>

Hmm, the documentation says "Perform vacuum index and cleanup index
phases of VACUUM in parallel using N background workers". Doesn't it
already explain that?

Attached is the updated version of the patches. The 0001 patch allows all existing
vacuum options to take a boolean argument. The 0002 patch introduces parallel
lazy vacuum. The 0003 patch adds the -P (--parallel) option to the vacuumdb
command.



Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: v19-0001-All-VACUUM-command-options-allow-an-argument.patch (7K)
Attachment: v19-0003-Add-paralell-P-option-to-vacuumdb-command.patch (8K)
Attachment: v19-0002-Add-parallel-option-to-VACUUM-command.patch (74K)

Re: Block level parallel vacuum

Haribabu Kommi-2

On Fri, Mar 22, 2019 at 4:06 PM Masahiko Sawada <[hidden email]> wrote:

Attached the updated version patch. 0001 patch allows all existing
vacuum options an boolean argument. 0002 patch introduces parallel
lazy vacuum. 0003 patch adds -P (--parallel) option to vacuumdb
command.

Thanks for sharing the updated patches.

0001 patch:

+    PARALLEL [ <replaceable class="parameter">N</replaceable> ]

But this patch contains the syntax of PARALLEL with no explanation; I saw that
it is explained in 0002. It is not a problem, just mentioning it.

+      Specifies parallel degree for <literal>PARALLEL</literal> option. The
+      value must be at least 1. If the parallel degree
+      <replaceable class="parameter">integer</replaceable> is omitted, then
+      <command>VACUUM</command> decides the number of workers based on number of
+      indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>.

Can we also add some more details about backend participation? Parallel workers
come into the picture only when there are at least 2 indexes on the table.

+ /*
+ * Do post-vacuum cleanup and statistics update for each index if
+ * we're not in parallel lazy vacuum. If in parallel lazy vacuum, do
+ * only post-vacum cleanup and then update statistics after exited
+ * from parallel mode.
+ */
+ lazy_vacuum_all_indexes(vacrelstats, Irel, nindexes, indstats,
+ lps, true);

How about renaming the above function, as it does the cleanup also?
lazy_vacuum_or_cleanup_all_indexes?


+ if (!IsInParallelVacuum(lps))
+ {
+ /*
+ * Update index statistics. If in parallel lazy vacuum, we will
+ * update them after exited from parallel mode.
+ */
+ lazy_update_index_statistics(Irel[idx], stats[idx]);
+
+ if (stats[idx])
+ pfree(stats[idx]);
+ }

The above check in lazy_vacuum_all_indexes can be combined with the outer
if check where the memcpy is happening. I still feel that the logic around the stats
is a little bit complex.

+ if (IsParallelWorker())
+ msg = "scanned index \"%s\" to remove %d row versions by parallel vacuum worker";
+ else
+ msg = "scanned index \"%s\" to remove %d row versions";

I feel this way of building the error message may not be picked up for translation.
Is there any problem with duplicating the entire ereport call with the changed message?

+ for (i = 0; i < nindexes; i++)
+ {
+ LVIndStats *s = &(copied_indstats[i]);
+
+ if (s->updated)
+ lazy_update_index_statistics(Irel[i], &(s->stats));
+ }
+
+ pfree(copied_indstats);

Why can't we use the shared memory directly to update the stats once all the workers
are finished, instead of copying them to local memory?

+ tab->at_params.nworkers = 0; /* parallel lazy autovacuum is not supported */

The user is not required to provide the number of workers for parallel vacuum to
work, so just setting the above parameter doesn't stop the parallel workers; the user
must also pass the PARALLEL option. So mentioning that as well would be helpful later
when we start supporting it, or for someone reading the code.

Regards,
Haribabu Kommi
Fujitsu Australia